Talk of the Town
The most powerful software of the future may be your very own
voice.
Illustration by Huan Tran.
Your wallet is empty and you're hungry, so you do the logical
thing: You ask your car, "Where's the nearest ATM, and, by the way,
where's the nearest sushi restaurant?" No, you're not crazy. You're
just driving a Honda Accord equipped with Touch by Voice, a
voice-recognition system powered by IBM. Seconds later, the car
talks back through its speaker system, telling you where to load up
on cash and also where to score a California roll. It all happens
so effortlessly that you forget you're talking to a computer.
Slowly, discreetly, but pervasively, voice recognition - where
computers hear us speak and know what we mean - has become a part
of our everyday lives. "It's amazing how often most of us now use
voice recognition, frequently without realizing we are," says
Peter Mahoney, vice president of marketing for Nuance
Communications, a Burlington, Massachusetts-based developer of
tools for what the trade calls "voice rec."
About a decade ago, when big companies first began experimenting
with voice recognition, we definitely knew we were tangling with
it, because most of the time the systems did not work. Computers
would whine, "Could you say that again?" "Sorry. I don't
understand." Our reaction was a swift no thanks - give us a person
to speak with. Now computers do understand us. "Accuracy is
much better today," says Mahoney, thanks to computers that are
smarter and more powerful. Underlying recognition algorithms (the
math that shapes the systems) have gotten better, too, as having 10
years of input has permitted researchers to tweak their formulas to
let us speak more naturally but still be understood.
Also fueling this recent consumer embrace of voice recognition is
what Mahoney delicately refers to as a "backlash against talking
with nonnative speakers," which, put more plainly, means that many
of us would rather talk to a computer than to someone at a call
center in a developing country. "We are a self-serve society," adds
Cambridge, Massachusetts-based Paul Kowal, coauthor of Enabling
IVR Self-Service with Speech Recognition. "Voice-recognition
systems are always friendly." They never tell us we are wrong, they
are unfailingly cheerful, and, increasingly, they're indeed giving
us what we want. What's not to like?
Voice recognition is also cheap - low costs are driving many
deployments as companies look for ways to save money on human
employees, who require salaries. But the real excitement swirling
around this software is the growing recognition that voice is one
data-input device most of us always have with us, particularly in
an age of ubiquitous wireless phones. Companies are now learning to
harness voice inputs so that we can do truly cool things more
quickly and easily than ever.
"Using your voice to get the information you want is 10 times
faster than using a mobile phone's keypad," says Dipanshu Sharma,
CTO of San Diego-based V-Enable, a pioneer in developing
speech-based search tools. Of course, a conventional wireless phone
can be used to search for, say, movie showtimes - but go ahead and
type in "Snakes on a Plane" and your zip code. Wouldn't it be much
faster just to ask your phone for this information? "With our
service, that's what you do," says Sharma, who also says Verizon
Wireless subscribers already can tap into this voice-powered
service.
"Voice is quicker and smarter than your finger is," agrees Tom
Freeman, a cofounder of VoiceBox Technologies in Bellevue,
Washington. "Using keystrokes, you'll probably need at least eight
to download a new ringtone for your phone. Using our tools, just
say, 'Show me ringtones by Usher.'" Freeman adds that, lately,
voice-rec tool providers have been upping the ante. "We don't want
the system ever to say, 'Sorry. I don't understand.' We are trying
to build in context awareness. The better we understand your
context, the more likely we are to understand what you want." He
provides this example: Say you call a help number and mumble into
your cell phone, "Blah-blah traffic." Are you asking about a 1970s
super band? Michael Douglas movies? Local road conditions? If you
were to call into a movie hotline, the system would have a head
start in giving you the right information. "We are getting smarter
about building a hypothesis about what the user wants to know.
That's as important in improving responses as are the gains in
understanding the spoken words," says Freeman.
Here's how smart voice rec has become. You are in a hopping, noisy
bar in Bangkok, and suddenly you're overcome with the urge to sing
"Jumpin' Jack Flash." Dial into Grammy Thailand, a Thai wireless
and entertainment provider, say the song title, and bam! - out
blasts a karaoke-perfect version of the Rolling Stones classic.
"Put your phone in speaker mode and sing right along," says Mike
Katz, director of product marketing at NMS Communications, a
Framingham, Massachusetts, developer of communications
technologies.
Hold on, though, because you really haven't heard anything yet.
IBM's voice-recognition guru, Brian Garr, says scientific ambitions
are white-water fast when it comes to what voice recognition will
do next. IBM's Superhuman Speech project, for instance, aims to
create computers that are better at understanding speech than
humans are, says Garr. That's right, better than we are. "We
believe we will be there by 2011," he says.
"We're still in the early days of speech recognition, comparable to
where the Internet was 10 years ago," adds Garr. But, watch out, he
says, because just as the Internet became integral to our lives, so
will speech rec, probably a lot faster than we expect. "We're just
now figuring out so many new uses."
The examples keep multiplying. A case in point, coming probably
within the next year to your cell phone: "You'll be able to dictate
SMS messages into your wireless," predicts Nuance's Mahoney, whose
company is far along in its development of that very tool.
Picture zapping this message to a coworker: "SMS iz kewl 2 uz, bt a
pain 2 typ, w aL d multi-tapping. It wud b so gr8 jst 2 spk it!"
How long would it have taken you to tap that into a phone? And
that's assuming you know the SMS shorthand that allows quicker
input. But it would be many times faster just to dictate your
message and let the smart phone do the typing for you.
"Big leaps are coming in the near future," promises Mahoney. The
technology, finally, is here - computers hear and understand us.
Now it comes down to creating tools we want to use - and that, says
Mahoney, is exactly what's going on. Can you hear it happening?
Operators Are Standing By
Sometimes voice-recognition software just can't get the job done.
Bill Andrews, general manager of Self-Service Speech Solutions for
Convergys, a Cincinnati-based provider of outsourced customer-care
solutions, says well-designed systems always give users an easy
opt-out because voice recognition does not work seamlessly for
everybody, particularly for those with speech impediments or heavy
accents.
If you're frustrated by a tin-eared computer, there are ways to
bypass the voice-rec maze. Go to Gethuman.com, click on the
Database tab, and you'll
find a long list of companies and the secret formulas for getting
through to a real, live person.
It Really Does Work
Imagine my voice as you read this. Why? Because this sidebar was
"written" using Nuance's Dragon NaturallySpeaking, a voice-to-text
computer program. What is impressive is that the start-up time
needed for getting the program to recognize my voice with a high
degree of accuracy was under 30 minutes. That's a vast improvement.
Four years ago, I broke an arm in a freak accident, and to stay on
deadline, I used a voice program for the eight weeks my arm was out
of action. Ramp-up time for that software was about a day - a
long day spent reading texts into the computer to teach it
how to recognize my accent, intonation, and other speaking quirks.
And the program was never very accurate - perhaps it got 80 percent
of my speech at best. With today's software, Nuance has cut the
learning time down to just a few minutes. How's the accuracy with
the new program? Pretty good. Not perfect, as text still needs a
close review and some polishing. But call the software 95 percent
on target.
Even better, this is software that can be used to do most tasks on
a computer. Integration with Microsoft Office is complete, meaning
you can write in Word, do e-mail in Outlook, and even use Excel and
PowerPoint, with all data input happening with your voice.
The disadvantage: Talking to a computer is significantly slower than typing, at least for me. At a guess, as I write this, I am speaking perhaps 25 words a minute, compared with a typing rate of twice that. Nuance promises that, with use, eventually we will learn to dictate much faster, perhaps as fast as 140 words per minute, a rate matched only by world-class typists. As a beginner, I am much slower, and, of course, that is a frustration.
For people with physical problems, however, NaturallySpeaking is a first-rate solution. And for anybody who is tired of hunting and pecking at a keyboard, this software is well worth a try.
List price for Dragon NaturallySpeaking 9 Professional, the version tested, is about $750. Cheaper versions, even under $100, are available, as are low-cost upgrade licenses for prior users. For more information, go to www.nuance.com/naturallyspeaking/preferred.