Talk of the Town
The most powerful software of the future may be your very own voice.
Illustration by Huan Tran.
Your wallet is empty and you’re hungry, so you do the logical thing: You ask your car, “Where’s the nearest ATM, and, by the way, where’s the nearest sushi restaurant?” No, you’re not crazy. You’re just driving a Honda Accord equipped with Touch by Voice, a voice-recognition system powered by IBM. Seconds later, the car talks back through its speaker system, telling you where to load up on cash and also where to score a California roll. It all happens so effortlessly that you forget you’re talking to a computer.
Slowly, discreetly, but pervasively, voice recognition, in which computers hear us speak and know what we mean, has become a part of our everyday lives. “It’s amazing how often most of us now use voice recognition, frequently without realizing we are,” says Peter Mahoney, vice president of marketing for Nuance Communications, a Burlington, Massachusetts–based developer of tools for what the trade calls “voice rec.”
About a decade ago, when big companies first began experimenting with voice recognition, we definitely knew we were tangling with it, because most of the time the systems did not work. Computers would whine, “Could you say that again?” or “Sorry. I don’t understand.” Our reaction was a swift no thanks: give us a person to speak with. Now computers do understand us. “Accuracy is much better today,” says Mahoney, thanks to computers that are smarter and more powerful. The underlying recognition algorithms (the math that shapes the systems) have gotten better, too: 10 years of input have let researchers tweak their formulas so that we can speak more naturally and still be understood.
Also fueling this recent consumer embrace of voice recognition is what Mahoney delicately refers to as a “backlash against talking with nonnative speakers,” which, put more plainly, means that many of us would rather talk to a computer than to someone at a call center in a developing country. “We are a self-serve society,” adds Cambridge, Massachusetts–based Paul Kowal, coauthor of Enabling IVR Self-Service with Speech Recognition. “Voice-recognition systems are always friendly.” They never tell us we are wrong, they are unfailingly cheerful, and, increasingly, they’re indeed giving us what we want. What’s not to like?
Voice recognition is also cheap — low costs are driving many deployments as companies look for ways to save money on human employees, who require salaries. But the real excitement swirling around this software is the growing recognition that voice is one data-input device most of us always have with us, particularly in an age of ubiquitous wireless phones. Companies are now learning to harness voice inputs so that we can do truly cool things more quickly and easily than ever.
“Using your voice to get the information you want is 10 times faster than using a mobile phone’s keypad,” says Dipanshu Sharma, CTO of San Diego–based V-Enable, a pioneer in developing speech-based search tools. Of course, a conventional wireless phone can be used to search for, say, movie showtimes — but go ahead and type in “Snakes on a Plane” and your zip code. Wouldn’t it be much faster just to ask your phone for this information? “With our service, that’s what you do,” says Sharma, who also says Verizon Wireless subscribers already can tap into this voice-powered service.
“Voice is quicker and smarter than your finger is,” agrees Tom Freeman, a cofounder of VoiceBox Technologies in Bellevue, Washington. “Using keystrokes, you’ll probably need at least eight to download a new ringtone for your phone. Using our tools, just say, ‘Show me ringtones by Usher.’” Freeman adds that, lately, voice-rec tool providers have been upping the ante. “We don’t want the system ever to say, ‘Sorry. I don’t understand.’ We are trying to build in context awareness. The better we understand your context, the more likely we are to understand what you want.” He provides this example: Say you call a help number and mumble into your cell phone, “Blah-blah traffic.” Are you asking about a 1970s super band? Michael Douglas movies? Local road conditions? If you were to call into a movie hotline, the system would have a head start in giving you the right information. “We are getting smarter about building a hypothesis about what the user wants to know. That’s as important in improving responses as are the gains in understanding the spoken words,” says Freeman.
Here’s how smart voice rec has become. You are in a hopping, noisy bar in Bangkok, and suddenly you’re overcome with the urge to sing “Jumpin’ Jack Flash.” Dial into Grammy Thailand, a Thai wireless and entertainment provider, say the song title, and bam! Out blasts a karaoke-perfect version of the Rolling Stones classic. “Put your phone in speaker mode and sing right along,” says Mike Katz, director of product marketing at NMS Communications, a Framingham, Massachusetts, developer of communications technologies.
Hold on, though, because you really haven’t heard anything yet.
IBM’s voice-recognition guru, Brian Garr, says scientific ambitions are white-water fast when it comes to what voice recognition will do next. IBM’s Superhuman Speech project, for instance, aims to create computers that are better at understanding speech than humans are, says Garr. That’s right, better than we are. “We believe we will be there by 2011,” he says.
“We’re still in the early days of speech recognition, comparable to where the Internet was 10 years ago,” adds Garr. But, watch out, he says, because just as the Internet became integral to our lives, so will speech rec, probably a lot faster than we expect. “We’re just now figuring out so many new uses.”
The examples keep multiplying. A case in point, coming probably within the next year to your cell phone: “You’ll be able to dictate SMS messages into your wireless,” predicts Nuance’s Mahoney, whose company is far along in its development of that very tool.
Picture zapping this message to a coworker: “SMS iz kewl 2 uz, bt a pain 2 typ, w aL d multi-tapping. It wud b so gr8 jst 2 spk it!” How long would it have taken you to tap that into a phone? And that’s assuming you know the SMS shorthand that allows quicker input. But it would be many times faster just to dictate your message and let the smart phone do the typing for you.
“Big leaps are coming in the near future,” promises Mahoney. The technology, finally, is here — computers hear and understand us. Now it comes down to creating tools we want to use — and that, says Mahoney, is exactly what’s going on. Can you hear it happening?
Operators Are Standing By
Sometimes voice-recognition software just can’t get the job done. Bill Andrews, general manager of Self-Service Speech Solutions for Convergys, a Cincinnati-based provider of outsourced customer-care solutions, says well-designed systems always give users an easy opt-out because voice recognition does not work seamlessly for everybody, particularly for those with speech impediments or heavy accents.
If you’re frustrated by a tin-eared computer, there are ways to bypass the voice-rec maze. Go to Gethuman.com, click on the Database tab, and you’ll find a long list of companies and the secret formulas for getting through to a real, live person.
It Really Does Work
Imagine my voice as you read this. Why? Because this sidebar was “written” using Nuance’s Dragon NaturallySpeaking, a voice-to-text computer program. What is impressive is that the start-up time needed for getting the program to recognize my voice with a high degree of accuracy was under 30 minutes. That’s a vast improvement. Four years ago, I broke an arm in a freak accident, and to stay on deadline, I used a voice program for the eight weeks my arm was out of action. Ramp-up time for that software was about a day: a long day spent reading texts into the computer to teach it how to recognize my accent, intonation, and other speaking quirks. And the program was never very accurate; perhaps it got 80 percent of my speech at best. With today’s software, Nuance has cut the learning time down to just a few minutes. How’s the accuracy with the new program? Pretty good. Not perfect, as text still needs a close review and some polishing. But call the software 95 percent on target.
Even better, this is software that can be used to do most tasks on a computer. Integration with Microsoft Office is complete, meaning you can write in Word, do e-mail in Outlook, and even use Excel and PowerPoint, with all data input happening with your voice.
The disadvantage: Talking to a computer is significantly slower than typing, at least for me. At a guess, as I write this, I am speaking perhaps 25 words a minute, compared with a typing rate of twice that. Nuance promises that, with use, eventually we will learn to dictate much faster, perhaps as fast as 140 words per minute, a rate matched only by world-class typists. As a beginner, I am much slower, and, of course, that is a frustration.
For people with physical problems, however, NaturallySpeaking is a first-rate solution. And for anybody who is tired of hunting and pecking at a keyboard, this software is well worth a try.
List price for Dragon NaturallySpeaking 9 Professional, the version tested, is about $750. Cheaper versions, even under $100, are available, as are low-cost upgrade licenses for prior users. For more information, go to www.nuance.com/naturallyspeaking/preferred.