Moving Forward

Homepage of Andrew Robinson

Speech Recognition using Sphinx : Don’t Try This At Home


As part of my ongoing research I needed a quick and easy way to recognize speech. After seeing how effortless recognition looks in products like Siri, I naively thought that the technology had been developing nicely, and that I was a few short clicks away from glorious, well-supported recognition with moderate accuracy. The reality of the situation was not quite this. Carnegie Mellon currently puts out the best open-source speech recognition toolkit, CMUSphinx. It’s great, but poorly documented for beginners. When I visited the page I had one task I wanted to accomplish: recognize arbitrary English quickly, preferably from within a language like Python. While this is certainly possible with Sphinx, it’s not intuitive.

So many options.. which one to choose?

By far the hardest aspect of using Sphinx is installing it. It seems the authors, in an effort to cut down on support requests, have actively tried to make it unintuitive.

We must install Sphinx, but which one? On the downloads page the maintainer helpfully points out that it’s tough to know which package to install. We have a good half-dozen available to us: SphinxBase; Sphinx1-3, written in C; Sphinx4, which has been rewritten in Java; and PocketSphinx, which sounds as if it’s designed for a mobile platform.

It is not clear which of these to install. At first Sphinx4 seems like the obvious choice, but because it’s written in Java and relatively new, it has no language bindings for Python, and it seems very beta-ish.

Looking back, Sphinx3 was written in C and seems decent, so I tried that next. No dice. It’s a mess, and a blurb hidden in a wiki page somewhere notes that it’s for research use only.

Finally, an obscure forum post mentioned that PocketSphinx is actually intended for desktop usage too, and has Python bindings! This makes a lot of sense. After face-palming for missing the connection, made obvious by the title, I decided PocketSphinx was the package I needed to install!

Luckily for us, Ubuntu has packages available. Pulling out my apt-get shotgun, a quick command installed everything I needed (and more).

sudo apt-get install sphinx*

Actually Doing Recognition

After installing things, life started looking up. Throwing together a quick Python script, using the documentation buried in the CMUSphinx labyrinth, actually wasn’t too difficult.

You’ll need a test audio file. Raw 16-bit audio, stored as a headerless binary stream of signed integers, works really well. A freely available utility called sox is available on Ubuntu and will help you convert almost anything into raw audio. I’d also suggest looking into Python Audio Tools for on-the-fly conversions; however, don’t try to use PCMConverter, it’s a pile of garbage.
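If your test clip is already a WAV file, the conversion to raw audio can also be sketched in a few lines of Python. The example below uses only the standard-library wave module. The function name wav_to_raw and the file names are made up for illustration, and the mono, 16-bit, 16 kHz requirement is an assumption about the default models rather than something stated above. It rejects anything else instead of resampling.

```python
import wave


def wav_to_raw(wav_path, raw_path, expected_rate=16000):
    """Copy the PCM samples of a WAV file into a headerless raw file.

    Assumption: the decoder wants mono, 16-bit audio at `expected_rate` Hz.
    Files in any other format are rejected, not converted.
    """
    wav = wave.open(wav_path, 'rb')
    try:
        if wav.getnchannels() != 1:
            raise ValueError('expected mono audio')
        if wav.getsampwidth() != 2:
            raise ValueError('expected 16-bit samples')
        if wav.getframerate() != expected_rate:
            raise ValueError('expected %d Hz audio' % expected_rate)
        # readframes returns the sample data without the WAV header
        frames = wav.readframes(wav.getnframes())
    finally:
        wav.close()

    raw = open(raw_path, 'wb')
    try:
        raw.write(frames)
    finally:
        raw.close()

    # Two bytes per sample, so this is the number of samples written
    return len(frames) // 2
```

Something like wav_to_raw('tmp1.wav', 'tmp1.raw') would then produce a file to feed to the decoder; for any other input format, sox is still the easier route.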

Just open up a raw binary audio file, and invoke the decoder:

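import audiotools as at  # only for on-the-fly conversion; not used below
import pocketsphinx as ps

# Acoustic model directory, language model, and pronunciation dictionary
hmmd = '/usr/share/pocketsphinx/model/hmm/wsj1'
lmd = '/usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP'
dictd = '/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic'

# Raw audio is binary data, so open the file in binary mode
fRaw1 = open('tmp1.raw', 'rb')

speechRec = ps.Decoder(hmm = hmmd, lm = lmd, dict = dictd)

# Decode the whole file, then fetch the best hypothesis
speechRec.decode_raw(fRaw1)
result = speechRec.get_hyp()
fRaw1.close()

# The first element of the result is the recognized text
print(result[0])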

hmmd, lmd, and dictd point to the acoustic model, the language model, and the pronunciation dictionary, which together give the Decoder the sense of the language necessary to decode words. By default PocketSphinx comes with a corpus of general text that works alright. If you’re using Sphinx for domain-specific work I’d highly recommend creating your own dictionary with a limited number of words; you’ll achieve much greater accuracy that way.
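As a rough illustration of the limited-dictionary idea, here is a minimal sketch that trims an existing dictionary down to a chosen word list. It assumes the .dic file has one entry per line, with the word first and its phones after it, and alternate pronunciations marked with a suffix like (2). The function name filter_dictionary is hypothetical. It only covers the dictionary side; a matching restricted language model is a separate step that this does not attempt.

```python
import re

# Matches the "(2)", "(3)" suffix that marks alternate pronunciations
# (assumed dictionary convention, see the note above).
_ALT_SUFFIX = re.compile(r'\(\d+\)$')


def filter_dictionary(lines, vocabulary):
    """Return the dictionary lines whose word is in `vocabulary`.

    Matching is case-insensitive, and alternate pronunciations of a
    wanted word are kept along with the primary one.
    """
    wanted = set(word.lower() for word in vocabulary)
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = _ALT_SUFFIX.sub('', parts[0]).lower()
        if word in wanted:
            kept.append(line.rstrip('\n'))
    return kept
```

In practice you would read the lines from the stock dictionary, write the kept lines to a new file, and point dictd at that file instead.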

And we’re done!

So hopefully by now, if you followed these steps loosely, you’ll have a working speech recognizer. Playing around with my own voice, I’ve found the accuracy to be alright, but not great. Training it to your voice apparently yields better results. From what I’ve read, commercial recognizers use slightly more advanced algorithms than Sphinx currently does, and more community time is needed to bring open-source recognition up to speed with something like Siri.
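To put a number on "alright, but not great", one common yardstick is word error rate: the word-level edit distance between what was actually said and what the recognizer returned, divided by the number of words said. The sketch below is a plain illustration of that metric, not something that ships with Sphinx, and the function name word_error_rate is made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError('reference transcript is empty')

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # reference word was dropped
                           cur[j - 1] + 1,       # extra word was inserted
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / float(len(ref))
```

Comparing a transcript you typed by hand against result[0] from the decoder gives a figure you can track as you change models or dictionaries.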

Written by Andrew Robinson

February 29th, 2012 at 2:47 am


6 Responses to 'Speech Recognition using Sphinx : Don’t Try This At Home'


  1. Getting Python audiotools via http://sourceforge.net/projects/audiotools/files/latest/download?source=files may help too

    then download it, extract it
    cd audiotools*
    sudo make install
    cd ~
    python

    umm NameError: name ‘ps’ is not defined

    Andrew, you’re missing something here.

    Luke Stanley

    3 Jul 12 at 9:38 pm

  2. Ok I give up. How did you get this to work, given that the FAQ mentions that it won’t build with Python unless you either

    * fix your python installation, or
    * build it without Python

    Seems you have a solution to a problem the rest of us have struggled with. C’mon out with it.

    RJM

    9 Nov 12 at 10:18 am

  3. Help regarding the project topic.
    I am a new user of cmusphinx. I am interested in building an application or improving an existing system feature in cmusphinx as a final-year institutional project, which is of 4 months’ duration. We are a group of 3 people. I want an idea of what can be done in this domain. Basically my domain is NLP and AI.
    thanks in advance.

    Neha

    5 Dec 12 at 8:31 am

  4. [...] Speech Recognition using Sphinx : Don’t Try This At Home | Moving Forward – December 29th ( tags: sphinx speech recognition howto tutorial guide ) [...]

  5. I think you’re missing the following:-

    import pocketsphinx as ps

    then the ps.Decoder() will work

    Also, I converted my input file into a 16-bit WAV sampled at 16 kHz with Audacity.
    My error level is still high but I’m working on it.

    Andrew Prayle

    28 Jan 13 at 1:33 pm

  6. So what do people need to do and learn to get the right product?
