Hear No Evil - The Current State of Speech Recognition

By Bill Loguidice

There is a great article by Robert Fortner on the once-promising field of computer speech recognition, cheekily titled "Rest in Peas: The Unrecognized Death of Speech Recognition". This is something that I think many of us with an interest in technology have thought about at one time or another. While we've had many false starts on the personal computing end of the equation since the early 1980s, speech recognition has never taken off the way we all hoped, and that is still true now in late 2010. Even the original Star Trek series from the 1960s famously made it seem like computer-generated speech would be a more difficult task than understanding speech. How wrong they were!

Interestingly, I had recently experimented with Dragon NaturallySpeaking, which is mentioned in the article, and it certainly let me down. You see, we had interviewed quite a few game developers for our upcoming feature film documentary and had about 15 hours of raw interview footage. Since these interviews were conducted with professional audio/video equipment, I thought I'd be clever and use the software to transcribe them for me, even if it meant some cleanup on my part. Naturally, I ran into the very real problem of "training". Without being able to train the software on each interviewee's voice, Dragon NaturallySpeaking was hopelessly lost in trying to figure out the majority of what was said, making any type of automated transcription useless.

In any case, enough about that. Check out the article and wonder along with me whether the excellent speech recognition in Microsoft's Kinect implementation on the Xbox 360 - which requires no training whatsoever - will ever do more than understand single words and simple phrases. Certainly there have been interesting attempts even in the recent past, albeit some of them sloppy.