IBM announced an important milestone in conversational speech recognition last year. The company managed to develop a system that achieves a 6.9 percent word error rate. Despite the success, IBM continued to work hard on its speech recognition technology and has recently achieved a new industry record of 5.5 percent.
In an official blog post, the company said that the word error rate was measured on recorded conversations between people discussing everyday topics like buying a car. These recordings, known as the “SWITCHBOARD” corpus, have been used in the industry to benchmark speech recognition systems for more than 20 years. To achieve the new milestone, IBM combined LSTM (Long Short-Term Memory) and WaveNet language models with three strong acoustic models.
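For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. It is not IBM's scoring pipeline, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not IBM's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("about" -> "of") and one deletion ("a")
# against an 8-word reference.
print(word_error_rate(
    "i am thinking about buying a new car",
    "i am thinking of buying new car",
))  # → 0.25
```

On this scale, a 5.5 percent word error rate corresponds to roughly one wrong word in every 18 words of reference transcript.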
IBM and other companies have long aimed to reach human parity with their speech recognition technology. Some have stated in the past that a word error rate of 5.9 percent would be enough to match human performance, but IBM is not so sure. The company said it believes the human word error rate is lower than previously thought, at 5.1 percent, and that no system has matched it yet.
This means that IBM and other companies working on speech recognition still have more work to do before the technology reaches human parity. We hope that they will be able to make additional progress soon, as a lower word error rate means that digital assistants like Amazon’s Alexa will be able to understand your voice commands a lot better.