Talk of the Town
The most powerful software of the future may be your very own voice.
By Robert McGarvey. Illustration by Huan Tran.
Your
wallet is empty and you’re hungry, so you do the logical thing: You ask
your car, “Where’s the nearest ATM, and, by the way, where’s the
nearest sushi restaurant?” No, you’re not crazy. You’re just driving a
Honda Accord equipped with Touch by Voice, a voice-recognition system
powered by IBM. Seconds later, the car talks back through its speaker
system, telling you where to load up on cash and also where to score a
California roll. It all happens so effortlessly that you forget you’re
talking to a computer.
Slowly, discreetly but pervasively, voice
recognition — where computers hear us speak and know what we mean — has
become a part of our everyday lives. “It’s amazing how often most of us
now use voice recognition, frequently without realizing we are,” says
Peter Mahoney, vice president of marketing for Nuance Communications,
a Burlington, Massachusetts–based developer of tools for what the trade
calls “voice rec.”
About a decade ago, when big companies first
began experimenting with voice recognition, we definitely knew we were
tangling with it, because most of the time the systems did not work.
Computers would whine, “Could you say that again?” “Sorry. I don’t
understand.” Our reaction was a swift no thanks — give us a person to
speak with. Now computers do understand us. “Accuracy is much
better today,” says Mahoney, thanks to computers that are smarter and
more powerful. Underlying recognition algorithms (the math that shapes
the systems) have gotten better, too: 10 years of input have let
researchers tweak their formulas so that we can speak more
naturally and still be understood.
Also fueling this recent
consumer embrace of voice recognition is what Mahoney delicately refers
to as a “backlash against talking with nonnative speakers,” which, put
more plainly, means that many of us would rather talk to a computer
than to someone at a call center in a developing country. “We are a
self-serve society,” adds Cambridge, Massachusetts–based Paul Kowal,
coauthor of Enabling IVR Self-Service with Speech Recognition. “Voice-recognition
systems are always friendly.” They never tell us we are wrong, they are
unfailingly cheerful, and, increasingly, they’re indeed giving us what
we want. What’s not to like?
Voice recognition is also cheap —
low costs are driving many deployments as companies look for ways to
save money on human employees, who require salaries. But the real
excitement swirling around this software is the growing recognition
that voice is one data-input device most of us always have with us,
particularly in an age of ubiquitous wireless phones. Companies are now
learning to harness voice inputs so that we can do truly cool things
more quickly and easily than ever.
“Using your voice to get the
information you want is 10 times faster than using a mobile phone’s
keypad,” says Dipanshu Sharma, CTO of San Diego–based V-Enable, a
pioneer in developing speech-based search tools. Of course, a
conventional wireless phone can be used to search for, say, movie
showtimes — but go ahead and type in “Snakes on a Plane” and your zip
code. Wouldn’t it be much faster just to ask your phone for this
information? “With our service, that’s what you do,” says Sharma, who
also says Verizon Wireless subscribers already can tap into this
voice-powered service.
“Voice is quicker and smarter than your
finger is,” agrees Tom Freeman, a cofounder of VoiceBox Technologies
in Bellevue, Washington. “Using keystrokes, you’ll probably need at
least eight to download a new ringtone for your phone. Using our tools,
just say, ‘Show me ringtones by Usher.’ ” Freeman adds that, lately,
voice-rec tool providers have been upping the ante. “We don’t want the
system ever to say, ‘Sorry. I don’t understand.’ We are trying to build
in context awareness. The better we understand your context, the more
likely we are to understand what you want.” He provides this example:
Say you call a help number and mumble into your cell phone, “Blah-blah
traffic.” Are you asking about a 1970s super band? Michael Douglas
movies? Local road conditions? If you were to call into a movie
hotline, the system would have a head start in giving you the right
information. “We are getting smarter about building a hypothesis about
what the user wants to know. That’s as important in improving responses
as are the gains in understanding the spoken words,” says Freeman.
Here’s
how smart voice rec has become. You are in a hopping, noisy bar in
Bangkok, and suddenly you’re overcome with the urge to sing “Jumpin’
Jack Flash.” Dial into Grammy Thailand, a Thai wireless and
entertainment provider, say the song title, and bam! — out blasts a
karaoke-perfect version of the Rolling Stones classic. “Put your phone
in speaker mode and sing right along,” says Mike Katz, director of
product marketing at NMS Communications, a Framingham, Massachusetts,
developer of communications technologies.
Hold on, though, because you really haven’t heard anything yet.
IBM’s
voice-recognition guru, Brian Garr, says scientific ambitions are
white-water fast when it comes to what voice recognition will do next.
IBM’s Superhuman Speech project, for instance, aims to create computers
that are better at understanding speech than humans are, says Garr.
That’s right, better than we are. “We believe we will be there by 2011,” he says.
“We’re
still in the early days of speech recognition, comparable to where the
Internet was 10 years ago,” adds Garr. But, watch out, he says, because
just as the Internet became integral to our lives, so will speech rec,
probably a lot faster than we expect. “We’re just now figuring out so
many new uses.”
The examples keep multiplying. A case in point,
coming probably within the next year to your cell phone: “You’ll be
able to dictate SMS messages into your wireless,” predicts Nuance’s
Mahoney, whose company is far along in its development of that very
tool.
Picture zapping this message to a coworker: “SMS iz kewl 2
uz, bt a pain 2 typ, w aL d multi-tapping. It wud b so gr8 jst 2 spk
it!” How long would it have taken you to tap that into a phone? And
that’s assuming you know the SMS shorthand that allows quicker input.
But it would be many times faster just to dictate your message and let
the smart phone do the typing for you.
“Big leaps are coming
in the near future,” promises Mahoney. The technology, finally, is here
— computers hear and understand us. Now it comes down to creating tools
we want to use — and that, says Mahoney, is exactly what’s going on.
Can you hear it happening?
Operators Are Standing By
Sometimes voice-recognition
software just can’t get the job done. Bill Andrews, general manager of
Self-Service Speech Solutions for Convergys, a Cincinnati-based
provider of outsourced customer-care solutions, says well-designed
systems always give users an easy opt-out because voice recognition
does not work seamlessly for everybody, particularly for those with
speech impediments or heavy accents.
If you’re frustrated by a tin-eared computer, there are ways to bypass the voice-rec maze. Go to Gethuman.com,
click on the Database tab, and you’ll find a long list of companies and
the secret formulas for getting through to a real, live person.
It Really Does Work
Imagine my voice as you read this.
Why? Because this sidebar was “written” using Nuance’s Dragon
NaturallySpeaking, a voice-to-text computer program. What is impressive
is that the start-up time needed for getting the program to recognize
my voice with a high degree of accuracy was under 30 minutes. That’s a
vast improvement. Four years ago, I broke an arm in a freak accident,
and to stay on deadline, I used a voice program for the eight weeks my
arm was out of action. Ramp-up time for that software was about a day —
a long day spent reading texts into the computer to teach it
how to recognize my accent, intonation, and other speaking quirks. And
the program was never very accurate — perhaps it got 80 percent of my
speech at best. With today’s software, Nuance has cut the learning time
down to just a few minutes. How’s the accuracy with the new program?
Pretty good. Not perfect, as text still needs a close review and some
polishing. But call the software 95 percent on target.
Even
better, this is software that can be used to do most tasks on a
computer. Integration with Microsoft Office is complete, meaning you
can write in Word, do e-mail in Outlook, and even use Excel and
PowerPoint, with all data input happening with your voice.
The
disadvantage: Talking to a computer is significantly slower than
typing, at least for me. At a guess, as I write this, I am speaking
perhaps 25 words a minute, compared with a typing rate of twice that.
Nuance promises that, with use, eventually we will learn to dictate
much faster, perhaps as fast as 140 words per minute, a rate matched
only by world-class typists. As a beginner, I am much slower, and, of
course, that is a frustration. For people with physical
problems, however, NaturallySpeaking is a first-rate solution. And for
anybody who is tired of hunting and pecking at a keyboard, this
software is well worth a try.
List price for Dragon
NaturallySpeaking 9 Professional, the version tested, is about $750.
Cheaper versions — even under $100 — are available, as are low-cost
upgrade licenses for prior users. For more information, go to www.nuance.com/naturallyspeaking/preferred.