Mozilla Releases DeepSpeech 0.6 With Better Performance, Leaner Speech-To-Text Engine

Written by Michael Larabel in Mozilla on 8 December 2019 at 08:00 AM EST.
One of the side projects Mozilla continues to develop is DeepSpeech, a speech-to-text engine derived from research by Baidu and built atop TensorFlow, with both CPU and NVIDIA CUDA acceleration. This week marked the release of Mozilla DeepSpeech 0.6 with performance optimizations, Windows builds, a lighter language model, and other changes.

With DeepSpeech 0.6, this open-source speech-to-text engine achieves a 7.5% word error rate. The new release brings various API changes, better training performance thanks to TensorFlow 1.14 cuDNN RNN support for the training graph, a language model trimmed down to the top 500k words, various data augmentation techniques, a tool for bulk transcribing large audio files, and various other changes.
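For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the engine's output, divided by the number of words in the reference. Below is a minimal Python sketch of that conventional definition. It is not Mozilla's evaluation code, and the function name `word_error_rate` and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper only (hypothetical name); it splits on whitespace
    and does no text normalization such as lowercasing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing takes i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one substitution ("sat" -> "sit") and one deletion
    # ("the") against a six-word reference gives 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

By this definition, a 7.5% word error rate means roughly 7 to 8 word-level errors per 100 reference words.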

Those wanting to try DeepSpeech 0.6 for transcribing speech from audio files can grab the binary builds from GitHub. More details on the DeepSpeech 0.6 improvements are available on the Mozilla Hacks blog. Also on my TODO list is seeing whether DeepSpeech 0.6 could work well as another Phoronix Test Suite benchmark. (Update: DeepSpeech is now available as a PTS / OpenBenchmarking.org test profile.)
