Like text-input auto-correction, voice-to-text smartphone solutions are often a wellspring of laughs thanks to comical or bizarre transcription results. But Google is working to fix some of that with a new personalized recognition option in its Android Voice Search software, which should improve voice transcription over time. More importantly, it provides a glimpse at how future voice programs could operate and how we’ll end up interacting with increasingly personal cloud services.
Here’s how it works: Android users who opt in to the latest version of Voice Search on their device will have their voice input associated with their Google account. Over time, Google’s servers will build a speech model tailored to that specific user, applying what they learn about the user’s pronunciation to improve accuracy. In a blog post announcing the feature today, Google said the effect will be subtle at first but should become more apparent with continued use. The latest version of Voice Search also includes improvements to name recognition and to speed, particularly over 3G and slower EDGE networks.
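Google hasn’t published how its speech models actually adapt, but the basic idea, blending a shared recognition model with an account’s accumulated history, can be sketched in a few lines of Python. Everything below (the class name, the sample phrases, the 70/30 blend) is purely illustrative and not Google’s implementation:

```python
from collections import defaultdict

class PersonalizedRecognizer:
    """Toy sketch of per-account speech personalization.

    Each opted-in account accumulates its own usage counts, which
    bias future transcriptions toward phrases that user actually
    says. Illustration only; not Google's actual system.
    """

    def __init__(self):
        # Hypothetical shared (global) model: phrase -> baseline score.
        self.global_scores = {
            "recognize speech": 0.6,
            "wreck a nice beach": 0.4,
        }
        # Per-account usage history, built up over time.
        self.user_counts = defaultdict(lambda: defaultdict(int))

    def transcribe(self, account_id, candidates):
        """Pick the candidate phrase with the best blended score."""
        counts = self.user_counts[account_id]
        total = sum(counts.values()) or 1  # avoid dividing by zero

        def score(phrase):
            # Blend the shared model with this user's own history;
            # a brand-new account falls back to the global model alone.
            personal = counts[phrase] / total
            return 0.7 * self.global_scores.get(phrase, 0.0) + 0.3 * personal

        return max(candidates, key=score)

    def confirm(self, account_id, phrase):
        """Record what the user actually meant, improving future results."""
        self.user_counts[account_id][phrase] += 1


if __name__ == "__main__":
    recognizer = PersonalizedRecognizer()
    options = ["recognize speech", "wreck a nice beach"]
    # A beach-loving user keeps confirming the "wrong" global favorite...
    for _ in range(5):
        recognizer.confirm("user@example.com", "wreck a nice beach")
    # ...so the blended score now favors their phrase.
    print(recognizer.transcribe("user@example.com", options))
```

Note how a new account with no history gets exactly the global model’s answer, which matches Google’s point that the effect is subtle at first and compounds the more a user talks to the service.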
This is a powerful development that leverages the cloud not just to make remote services available in more places, but to make them more personalized as well. By tying all of a user’s utterances to a single account, opted-in users help Google learn how best to serve them. That can be unnerving, however, and could raise serious privacy issues for people worried about a technology company holding on to recordings of their voice. Google, for its part, says users can easily disable personalized recognition at any time. Users can also disassociate the voice recordings from their account through the Google Dashboard, though it’s unclear whether Google will then delete them.
Aside from such concerns, this is the wave of the future for voice actions. Right now, we turn to voice recognition apps for one-off voice commands, local searches, navigation or transcription. But because these services live in the cloud, companies can build a unified experience for a user that spans multiple devices.
In a recent interview about Microsoft’s voice business, the company’s Ilya Bukshteyn said a user’s interactions with a voice service, if the user allows it, will eventually be coordinated on the back end, so the machines can not only improve their results by learning a user’s speech patterns but also, in effect, carry on conversations with the user. Ultimately, Microsoft, Google or other providers will recognize users as they interact with voice services from a variety of smart devices, including phones, computers, cars and set-top boxes, and will start to anticipate what they want, much as a regular human would.
It may sound a little creepy, but that’s what the cloud can do, given remote processing power combined with connectivity. And that’s why this is going to be a key area for technology companies like Microsoft, Google and Apple, as Om recently noted. Voice recognition isn’t just for tapping discrete services; it can all be tied into one personal experience. Users will need to get used to the idea of servers in the cloud carrying on a conversation with them. That’s what’s possible, as long as people opt in, and Google’s personalized recognition is just the first step along that road.
Related content from GigaOM Pro (sub req’d):
- How Speech Technologies Will Transform Mobile Use
- Transient Apps: The Consumer Influence on Enterprise Mobility, Part 2
- Rogue Devices: The Consumer Influence on Enterprise Mobility, Part 1
Image courtesy Flickr user DJOtaku.