Google’s speech recognition technology now has a 4.9% word error rate

Authored by venturebeat.com and submitted by spsheridan

Google CEO Sundar Pichai today announced that the company’s speech recognition technology has now achieved a 4.9 percent word error rate. Put another way, Google transcribes roughly one in every 20 words incorrectly. That’s a big improvement over the 23 percent the company saw in 2013 and the 8 percent it shared two years ago at I/O 2015.
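For readers unfamiliar with the metric, word error rate is commonly defined as the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below follows that common definition; it is not Google's evaluation code, and the function name and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splitting on whitespace is an assumption, and real
    evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a transcript with one wrong word against a 20-word reference scores 0.05, which is close to the 4.9 percent figure quoted above.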

The tidbit was revealed at Google’s I/O 2017 developer conference, where artificial intelligence is a major emphasis. Deep learning, a type of AI, is used to achieve accurate image recognition and speech recognition. The method involves training systems called neural networks on large amounts of data, and then feeding new data to those systems so they can make predictions.

“We’ve been using voice as an input across many of our products,” Pichai said onstage. “That’s because computers are getting much better at understanding speech. We have had significant breakthroughs, but the pace even since last year has been pretty amazing to see. Our word error rate continues to improve even in very noisy environments. This is why if you speak to Google on your phone or Google Home, we can pick up your voice accurately.”

For the sake of comparison, Microsoft declared in October 2016 that it had reached speech recognition parity with humans. Its word error rate at the time was 5.9 percent, though it’s not clear if the two companies are following the same standards of evaluation.

Google has been touting its speech recognition improvements for a while now. Earlier this year, the company said it had slashed its speech recognition word error rate by more than 30 percent since 2012. The main reason for the drastic improvement? Google confirmed that it’s the use of neural networks.

Pichai also shared an interesting tidbit about Home’s development: “When we were shipping Google Home, we were originally planning to include eight microphones… But thanks to neural networks, using a technique called ‘neural beam forming’, we were able to ship it with just two microphones and achieve the same quality.”

So if you’re surprised at how well (or poorly) Google understands what you’re saying, this is why. Recognition is getting better and better, but there’s still room to get that word error rate closer to 0 percent.

voneiden on May 18th, 2017 at 10:19 UTC »

Something that doesn't get appreciated enough is its bilingual abilities. I can mix English and Finnish and say "Ok google, navigate to vesiperäntie kolmetoista" and it picks the road address up correctly as Vesiperäntie 13.

One interesting observation is that when my query is fully in English the answer usually comes back in English, but sometimes in Finnish, seemingly at random. I might say "set reminder to get the laundry in fifty five minutes" yet I'm greeted with the reminder screen localized into Finnish (and with the phone asking questions in Finnish). Not sure if it's a bug or if it's alive. Wouldn't be surprised if Google wonders the same.

Frexxia on May 18th, 2017 at 07:14 UTC »

The automated subtitles on YouTube are getting extremely impressive. Almost perfect if the sound quality is good.

sputnikv on May 18th, 2017 at 02:14 UTC »

Meanwhile, Siri just butchers normal queries into utter nonsense.