Speech Recognition using Sphinx : Don’t Try This At Home
As part of my ongoing research I needed a quick and easy way to recognize speech. After seeing how effortlessly products like Siri handle recognition, I naively thought that the technology had been developing nicely, and that I was a few short clicks away from glorious, well-supported recognition with moderate accuracy. The reality of the situation was not quite this. Carnegie Mellon currently puts out the best open-source speech recognition toolkit, CMUSphinx. It’s great, but poorly documented for the beginner. When I visited the page I had one task I wanted to accomplish: recognize arbitrary English quickly, preferably from within a language like Python. While this is certainly possible with Sphinx, it’s not intuitive.
So many options.. which one to choose?
By far the hardest aspect of using Sphinx is installing it. It seems the authors, in an effort to cut down on support requests, have actively tried to make it unintuitive.
We must install Sphinx, but which one? On the downloads page the maintainer helpfully points out that it’s tough to know which package to install. We have a good half-dozen available to us: SphinxBase; Sphinx 1 through 3, written in C; Sphinx4, which has been rewritten in Java; and PocketSphinx, which seems as if it’s designed for a mobile platform.
Which one of these to install is not obvious. At first Sphinx4 seems like the natural choice, but because it’s written in Java and relatively new, it has no language bindings for Python and seems very beta-ish.
Looking back, Sphinx3 was written in C and seems decent, so I tried that next. No dice. It’s a mess, and a blurb hidden in a wiki page somewhere notes that it’s for research use only.
Finally, an obscure forum post somewhere mentioned that PocketSphinx is actually intended for desktop usage too, and has Python bindings! This makes a lot of sense. After face-palming myself for missing the connection, made obvious by the title, I decided PocketSphinx was the application I needed to install!
Luckily for us, Ubuntu has packages available. Pulling out my apt-get shotgun, a quick command installed everything I needed (and more).
sudo apt-get install sphinx*
Actually Doing Recognition
After installing things, life started looking up. Using the documentation buried in the CMUSphinx labyrinth, throwing together a quick Python script actually wasn’t too difficult.
You’ll need a test audio file. Raw 16-bit audio, formatted as a binary stream of signed integers (typically mono, sampled at 16 kHz to match the default models), works really well. A freely available utility called sox comes with Ubuntu and will help you convert almost anything into raw audio. I’d also suggest looking into Python Audio Tools for on-the-fly conversions. However, don’t try to use PCMConverter; it’s a pile of garbage.
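If you would rather stay inside Python for the conversion, here is a minimal sketch that strips the header from an uncompressed WAV file using only the standard library wave module. The function name wav_to_raw and the file names are my own placeholders, not anything from Sphinx, and the sketch assumes the WAV file is already in the format the decoder wants; it does not resample or mix down channels.

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Copy the PCM samples of an uncompressed WAV file into a headerless raw file.

    Returns a dict describing the audio so the caller can check that it is
    mono, 16-bit, and at the sample rate the acoustic model expects.
    """
    reader = wave.open(wav_path, 'rb')
    try:
        info = {
            'channels': reader.getnchannels(),
            'sample_width': reader.getsampwidth(),  # in bytes; 2 means 16-bit
            'rate': reader.getframerate(),
        }
        frames = reader.readframes(reader.getnframes())
    finally:
        reader.close()

    # The raw file is just the sample data with no header in front of it
    out = open(raw_path, 'wb')
    try:
        out.write(frames)
    finally:
        out.close()
    return info
```

Calling wav_to_raw('test.wav', 'tmp1.raw') should report 1 channel and a sample width of 2 for audio the decoder can use. If it reports anything else, convert the file with sox or an audio editor first.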
Just open up a raw binary audio file, and invoke the decoder:
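import pocketsphinx as ps

# Acoustic model directory, language model, and pronunciation dictionary
hmmd = '/usr/share/pocketsphinx/model/hmm/wsj1'
lmd = '/usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP'
dictd = '/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic'

# Raw audio is binary data, so open the file in binary mode
fRaw1 = open('tmp1.raw', 'rb')
speechRec = ps.Decoder(hmm=hmmd, lm=lmd, dict=dictd)
speechRec.decode_raw(fRaw1)
fRaw1.close()

# get_hyp() returns a tuple; the first element is the recognized text
result = speechRec.get_hyp()
print(result[0])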
hmmd, lmd, and dictd point to the acoustic model directory, the language model, and the pronunciation dictionary. The Decoder uses them to get the sense of the language necessary to decode words. By default PocketSphinx comes with general-purpose models that work alright. If you’re using Sphinx for domain-specific work, I’d highly recommend creating your own dictionary with a limited number of words. You’ll achieve much greater accuracy that way.
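As a rough illustration of the limited-dictionary idea, the sketch below keeps only the entries of an existing pronunciation dictionary that match a word list. It assumes the usual CMU dictionary layout of one entry per line, with the word first, then its phonemes, and alternate pronunciations marked as word(2), word(3), and so on. The name filter_dictionary is a placeholder of mine. A trimmed dictionary alone is probably not the whole story: the language model would need to cover the same vocabulary, which this sketch does not address.

```python
import re


def filter_dictionary(lines, vocabulary):
    """Return the dictionary entries whose word is in the given vocabulary.

    lines: iterable of dictionary lines such as 'hello HH AH L OW'
    vocabulary: iterable of words to keep (compared case-insensitively)
    """
    wanted = set(word.lower() for word in vocabulary)
    kept = []
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue  # skip blank lines and entries with no phonemes
        # Strip an alternate-pronunciation marker like '(2)' from the word
        base = re.sub(r'\(\d+\)$', '', parts[0]).lower()
        if base in wanted:
            kept.append(line.rstrip('\n'))
    return kept
```

For example, passing the lines of an opened dictionary file together with ['yes', 'no', 'stop'] returns just the entries for those three words, including any alternate pronunciations, ready to be written out to a new .dic file.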
And we’re done!
So hopefully by now, if you followed these steps loosely, you’ll have a working speech recognizer. Playing around with my own voice, I’ve found the accuracy to be alright, but not great. Training it to your voice apparently yields better results. From what I’ve read, commercial recognizers are using slightly more advanced algorithms than what Sphinx currently uses, and more community time is needed to bring open-source recognition up to speed with something like Siri.
Getting Python audiotools via http://sourceforge.net/projects/audiotools/files/latest/download?source=files may help too
then download it, extract it
cd audiotools*
sudo make install
cd ~
python
umm NameError: name ‘ps’ is not defined
Andrew, you’re missing something here.
Luke Stanley
3 Jul 12 at 9:38 pm
Ok I give up. How did you get this to work, given that the FAQ mentions that it won’t build with Python unless you either
* fix your python installation, or
* build it without Python
Seems you have a solution to a problem the rest of us have struggled with. C’mon out with it.
RJM
9 Nov 12 at 10:18 am
Help regarding the project topic.
I am a new user of cmusphinx. I am interested in building an application or improving an existing system feature in cmusphinx as a final-year institutional project of 4 months’ duration. We are a group of 3 people. I want an idea of what can be done in this domain. Basically my domain is NLP and AI.
thanks in advance.
Neha
5 Dec 12 at 8:31 am
I think you’re missing the following:-
import pocketsphinx as ps
then the ps.Decoder() will work
Also, I converted my input file into a 16 bit wav sampled at 16kHz with audacity.
My error level is still high but I’m working on it.
Andrew Prayle
28 Jan 13 at 1:33 pm