Moving Forward

Homepage of Andrew Robinson

Archive for February, 2012

Speech Recognition Using Sphinx: Don’t Try This at Home


As part of my ongoing research I needed a quick and easy way to recognize speech. After seeing how effortlessly products like Siri handle recognition, I naively thought the technology had been developing nicely and that I was a few short clicks away from glorious, well-supported recognition with moderate accuracy. The reality was not quite this. Carnegie Mellon currently puts out the best open-source speech recognition toolkit, CMUSphinx. It’s great, but poorly documented for the beginner. When I visited the page I had one task I wanted to accomplish: recognize arbitrary English quickly, preferably from within a language like Python. While this is certainly possible with Sphinx, it’s not intuitive.

So many options... which one to choose?

By far the hardest aspect of using Sphinx is installing it. It seems the authors, in an effort to cut down on support requests, have actively tried to make it unintuitive.

We must install Sphinx, but which one? On the downloads page the maintainer helpfully points out that it’s tough to know which package to install. We have a good half-dozen available to us, from SphinxBase, to Sphinx1-3 (written in C), to Sphinx4 (rewritten in Java), to PocketSphinx, which seems as if it’s designed for mobile platforms.

Which one of these to install is not obvious. At first Sphinx4 seems like the obvious choice, but because it’s written in Java and relatively new, it has no language bindings for Python and seems very beta-ish.

Stepping back a version, Sphinx3 is written in C and seems decent, so I tried that next. No dice. It’s a mess, and reading along I found a blurb hidden in a wiki page somewhere noting that it’s meant for research use only.

Finally, an obscure forum post mentioned that PocketSphinx is actually intended for desktop use too, and has Python bindings! This makes a lot of sense. After face-palming for missing the connection, made obvious by the title, I decided PocketSphinx was the package I needed to install!

Luckily for us, Ubuntu has packages available. Pulling out my apt-get shotgun, I installed everything I needed (and more) with one quick command:

sudo apt-get install 'sphinx*'

Actually Doing Recognition

After installing things, life started looking up. Throwing together a quick Python script using the documentation buried in the CMUSphinx labyrinth actually wasn’t too difficult.

You’ll need a test audio file. Raw 16-bit audio works really well. It should be formatted as a binary stream of signed little-endian integers, mono and sampled at 16 kHz to match the default acoustic models. A freely available utility called sox is packaged for Ubuntu and will help you convert almost anything into raw audio. I’d also suggest looking into Python Audio Tools for on-the-fly conversions. However, don’t try to use its PCMConverter; it’s a pile of garbage.
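If your recording is already a 16-bit, mono, 16 kHz WAV file, you may not need sox at all. A WAV file of that kind is essentially raw PCM with a header in front. The sketch below makes that assumption and uses the standard library's wave module to strip the header. The function name wav_to_raw is hypothetical. Any other sample rate or channel count is rejected rather than converted, so for those you would still reach for sox.

```python
import wave


def wav_to_raw(wav_path, raw_path, expected_rate=16000):
    """Strip the WAV header and write the bare PCM samples to raw_path.

    Assumes the WAV is already 16-bit mono at expected_rate; anything else is
    rejected instead of being silently written out as garbage.
    """
    with wave.open(wav_path, 'rb') as w:
        if w.getsampwidth() != 2:
            raise ValueError('expected 16-bit samples, got %d bytes/sample'
                             % w.getsampwidth())
        if w.getnchannels() != 1:
            raise ValueError('expected mono, got %d channels' % w.getnchannels())
        if w.getframerate() != expected_rate:
            raise ValueError('expected %d Hz, got %d Hz'
                             % (expected_rate, w.getframerate()))
        # 16-bit WAV data is already signed little-endian PCM, so the frames
        # can be copied to the raw file unchanged.
        frames = w.readframes(w.getnframes())
    with open(raw_path, 'wb') as out:
        out.write(frames)
    return len(frames) // 2  # number of 16-bit samples written
```

For example, wav_to_raw('recording.wav', 'tmp1.raw') produces a file the decoder below can read directly.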

Just open a raw binary audio file and invoke the decoder:

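import pocketsphinx as ps

# Acoustic model, language model and pronunciation dictionary installed by
# Ubuntu's PocketSphinx packages
hmmd = '/usr/share/pocketsphinx/model/hmm/wsj1'
lmd = '/usr/share/pocketsphinx/model/lm/wsj/wlist5o.3e-7.vp.tg.lm.DMP'
dictd = '/usr/share/pocketsphinx/model/lm/wsj/wlist5o.dic'

speechRec = ps.Decoder(hmm=hmmd, lm=lmd, dict=dictd)

# Raw audio must be opened in binary mode
with open('tmp1.raw', 'rb') as fRaw1:
    speechRec.decode_raw(fRaw1)

# get_hyp() returns a tuple whose first element is the recognized text
result = speechRec.get_hyp()
print(result[0])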

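hmmd, lmd, and dictd point to the acoustic model (a directory), the language model, and the pronunciation dictionary. Together these give the decoder the sense of the language it needs to decode words. By default PocketSphinx comes with models built from a corpus of general text (here the Wall Street Journal set, judging by the wsj in the paths), and they work alright. If you’re using Sphinx for domain-specific work, I’d highly recommend creating your own dictionary with a limited number of words, plus a language model to match. You’ll achieve much greater accuracy that way.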
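One way to get a limited dictionary is to cut the bundled one down to the words you actually expect. The sketch below assumes the usual Sphinx dictionary format: one entry per line, the word followed by its phones. Alternate pronunciations are written as WORD(2), WORD(3) and so on. The function name restrict_dictionary is hypothetical. It only trims the dictionary; the language model would still need to be rebuilt over the same vocabulary.

```python
import re


def restrict_dictionary(dict_lines, vocabulary):
    """Keep only the pronunciation lines for words in `vocabulary`.

    dict_lines: lines in Sphinx dictionary format, e.g. "HELLO HH AH L OW".
    Alternate pronunciations such as "HELLO(2) ..." are kept as well.
    Matching is case-insensitive.
    """
    wanted = set(word.upper() for word in vocabulary)
    kept = []
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Strip an alternate-pronunciation suffix like "(2)" before matching
        headword = re.sub(r'\(\d+\)$', '', fields[0])
        if headword.upper() in wanted:
            kept.append(line.rstrip('\n'))
    return kept
```

Usage might look like opening dictd, passing the open file and a word list such as ['yes', 'no', 'stop'], and writing the returned lines to a new .dic file.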

And we’re done!

So hopefully, if you followed these steps even loosely, you’ll now have a working speech recognizer. Playing around with my own voice, I’ve found the accuracy to be alright, but not great; training it to your voice apparently yields better results. From what I’ve read, commercial recognizers use somewhat more advanced algorithms than Sphinx currently does. More community time is needed to bring open-source recognition up to speed with something like Siri.

Written by Andrew Robinson

February 29th, 2012 at 2:47 am

Posted in Uncategorized

Reading AAC Encoded Audio in Python


With the freely available Python Audio Tools, decoding an audio stream is pretty simple. Their site doesn’t have a solid example of using the available APIs, so I’ve written a short demonstration that decodes an AAC-encoded audio file. The file I’m decoding was generated with the voice recording application on an iPod touch. This should work for almost any audio format supported by Python Audio Tools.

import struct
import audiotools as at

print('Opening input audio stream... decoding')
# Create an AudioFile object from the input file, then a PCM reader for it
aF = at.open('inputData.m4a')
pcmAf = aF.to_pcm()

# We'll store the samples in a list, although this loop is also suitable for
# passing the data to a second stage for online processing.
rawData = []

while True:
    # This file is set up with 2 audio channels, sampled at 44.1 kHz.
    # We read 256 bytes of raw data at a time,
    # or 256 / 2 channels / 2 bytes per sample = 64 frames.
    frame = pcmAf.read(256)
    if frame.frames == 0:
        # An empty read means the end of the file has been reached.
        print('End of file found. Breaking.')
        break
    # The recording is effectively mono, so we keep only channel 0.
    channel = frame.channel(0)
    for i in range(frame.frames):
        # to_bytes(is_big_endian, is_signed): big-endian signed 16-bit, so
        # unpack with the matching '>h' format (it returns a 1-tuple)
        byteArray = channel.frame(i).to_bytes(True, True)
        pcmVal, = struct.unpack('>h', byteArray)
        rawData.append(pcmVal)

pcmAf.close()

Make sure you’ve installed Python Audio Tools first; it’s freely available from the project page. So far I’ve only had success using these libraries on Linux. Mac OS support seems doable but would take some effort to set up properly.
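The decoding loop calls struct.unpack once per sample, which gets slow on long recordings. Suppose you already have a whole chunk of interleaved bytes. This assumes, for instance, that calling to_bytes(True, True) on the full two-channel FrameList gives you frame-by-frame interleaved, big-endian, signed 16-bit samples. In that case the standard library can unpack the chunk in one call and slice out a single channel. The sketch below shows this. The helper name channel_samples is hypothetical.

```python
import struct


def channel_samples(chunk, channels=2, channel=0):
    """Return the samples of one channel from interleaved 16-bit PCM bytes.

    Assumes `chunk` holds signed, big-endian 16-bit samples interleaved
    frame by frame (L R L R ... for stereo).
    """
    count = len(chunk) // 2
    # Unpack every sample in a single call instead of one call per sample
    samples = struct.unpack('>%dh' % count, chunk[:count * 2])
    # Every `channels`-th sample, starting at index `channel`, belongs to it
    return list(samples[channel::channels])
```

Inside the loop, rawData.extend(channel_samples(chunk)) would then replace the inner for loop, under the interleaving assumption above.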

Written by Andrew Robinson

February 27th, 2012 at 1:29 am

Posted in Uncategorized

Generating SVN Statistics


Recently I became very interested in generating some statistics from an SVN repository. In our research group we have a repository for all of our in-progress papers, which are written in LaTeX. Doing some rudimentary reporting on the number of committed lines by author sounded like a fun way to gamify the writing process. You can see one of the highlights of this reporting below. As you’d expect from a graduate student research lab, a large number of commits happen late at night, with a large void during normal business hours.

I found a great tool for generating statistics from SVN repositories, appropriately called StatSVN. It’s decent out of the box, but lacks some customizability and automation.

By default you invoke it as shown below. It takes a log file exported from SVN, along with the path to a checked-out local copy of the repository, and generates a pile of HTML reports and figures tallying various commit statistics. It automatically invokes Subversion to request the diffs between commits, storing the data in a local cache file.

java -jar statsvn.jar papers/logfile.log papers -include "**/*.tex" -config-file config.txt

This works pretty well, but to really create some fun statistics we need to work a little harder. I wanted to filter out some of the larger bulk-commits that don’t accurately reflect actual work, and I wanted to customize the generated report. Naturally I fired up vim and started writing some Python…

Filtering Out Certain Revisions

The first problem was that this repository is pretty new, and many of the first commits involved setting up templates and doing other administrative tasks. I want to collect statistics on who produced the most content, not on who can push the metaphorical broom hardest cleaning up templates and moving directories around. So I needed a way to filter out certain commits. StatSVN works by first parsing an exported SVN log file containing a list of commits. I found that if you simply remove the log entry for a commit, StatSVN ignores that commit.

A Sample Log Entry from the SVN Log

<logentry revision="172">
<author>androbin</author>
<date>2012-02-15T19:06:10.225746Z</date>
<paths>
<path kind="file" action="M">/papers/mobicom12-audio/tex/design.tex</path>
</paths>
<msg>Fixed broken paper by updating design.tex</msg>
</logentry>
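Since the log is plain XML, you can also pull rough numbers out of it yourself before StatSVN gets involved. The sketch below is a hypothetical helper called commit_stats; it is not part of StatSVN. It uses only the standard library to tally commits per author and per hour of day. Unlike StatSVN it counts commits rather than lines. It also leaves the hours in UTC, which is how Subversion writes the <date> element (note the trailing Z).

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime


def commit_stats(log_xml):
    """Count commits per author and per hour of day from `svn log --xml` text."""
    root = ET.fromstring(log_xml)
    by_author = Counter()
    by_hour = Counter()
    for entry in root.findall('logentry'):
        by_author[entry.findtext('author', default='(no author)')] += 1
        stamp = entry.findtext('date')
        if stamp:
            # e.g. 2012-02-15T19:06:10.225746Z: parse up to the seconds (UTC)
            when = datetime.strptime(stamp[:19], '%Y-%m-%dT%H:%M:%S')
            by_hour[when.hour] += 1
    return by_author, by_hour
```

Sorting by_hour by key gives a crude text version of the time-of-day activity chart.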

Python Code to Perform an Update and Generate the Log

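import subprocess

print('Updating SVN repo')
subprocess.check_call(['svn', 'up'], cwd='papers')

print('Running XML export from SVN repo')
with open('papers/logfile.log', 'w') as log:
    subprocess.check_call(['svn', 'log', '-v', '--xml'], cwd='papers', stdout=log)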

Before removing any entries, we update the repository, which I’ve checked out into a directory called papers/, and generate a fresh log file. Next, using lxml, we load the log file and an exclude list, and delete the matching entries.

Removing Revisions from Statistics based on Number

import lxml.etree as le

# exclude-list.txt holds one revision number per line; blank lines are ignored
with open('exclude-list.txt', 'r') as f:
    listToExclude = [line.strip() for line in f if line.strip()]

print('Exclude list: %s' % listToExclude)

doc = le.parse('papers/logfile.log')
for pat in listToExclude:
    # Match <logentry revision="..."> elements directly under the root <log>
    for elt in doc.findall("logentry[@revision='" + pat + "']"):
        print('Removing revision ' + pat + '...')
        elt.getparent().remove(elt)

print('Writing file back to disk...')
doc.write('papers/logfile.log', xml_declaration=True, encoding='UTF-8')

exclude-list.txt simply consists of revision numbers, separated by newlines.
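If you’d rather not depend on lxml, the same filtering works with the standard library’s xml.etree.ElementTree. That module has no getparent(), so in the sketch below entries are removed from the root <log> element. The sketch assumes, as in svn log --xml output, that the root is their direct parent. The function name remove_revisions is hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_revisions(log_xml, revisions):
    """Drop <logentry> elements whose revision attribute is in `revisions`.

    Returns the filtered log as a string.
    """
    root = ET.fromstring(log_xml)
    excluded = set(str(r).strip() for r in revisions)
    # findall() returns a list, so removing entries while looping is safe
    for entry in root.findall('logentry'):
        if entry.get('revision') in excluded:
            root.remove(entry)
    return ET.tostring(root, encoding='unicode')
```

Reading papers/logfile.log, passing its text and the exclude list through remove_revisions, and writing the result back would mirror the lxml version.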

After we’ve modified the log file, we invoke the statistics generator from the same script.

Invoking StatSVN

import subprocess
print('Invoking graph generation software...')
subprocess.check_call(['java', '-jar', 'statsvn.jar', 'papers/logfile.log', 'papers',
                       '-include', '**/*.tex', '-config-file', 'config.txt'])

Of interest here is the fact that we’ve passed it a configuration file. I’ve identified three key graphs I’d like to include in my final report, and sizing them to fit the spaces I’ve allocated for them is a little challenging. So I’ve used StatSVN’s config file support to resize them and to pump up the plot lineStroke so the lines are easier to read.

StatSVN Config File

chart.loc_per_author.lineStroke=4
chart.loc_per_author.width=600
chart.loc_per_author.height=300

chart.activity_time.width=600
chart.activity_day.width=600
chart.activity_time.height=370
chart.activity_day.height=408

Making an Aggregate Report

So now I’ve filtered out all the commits I don’t care about, but I’m not that happy with the default reports. My goal is to show these stats on a display-case monitor, and none of the default reports is attractive enough, or contains the right information, to make the cut. The approach I took was to use BeautifulSoup to extract the information I wanted from each report, and then composite it into one report using a template file. This works really well in practice. The report software’s output format doesn’t change from run to run, so BeautifulSoup has no problem selecting the elements of interest.

HTML Template for the Final Report

<html>
<head>
<title>Group Dangerzone Paper Log</title>
<link rel="stylesheet" href="ocss.css" type="text/css">
</head>

<body>
<h1>Dangerzone Paper Commit Log</h1>
<table width="100%">
<tr>
    <td valign="top" width="70%">
    <table width="100%">
        <tr>
            <td valign="top">[A]</td>
            <td align="right"><img src="loc_per_author.png" /></td>
        </tr>
    </table>
    <br><br><br>
    <table width="100%">
        <tr>
            <td valign="top"><img src="activity_time.png" /></td>
            <td><img src="activity_day.png" /></td>
        </tr>
    </table>
    <h2>Commit Message Tag Cloud</h2>
    [T]
    </td>
    <td>
        [C]
    </td>
</tr>
</table>
</body>
</html>

In the template shown above we use the placeholders [C], [T], and [A] for the commit log, the tag cloud, and the table of author contributions by percentage, respectively. The Python script below extracts those elements from the generated reports and pushes them into the template, then writes the result to output.html.

Making a Pretty Report


from bs4 import BeautifulSoup

print('Generating output HTML...')

def getSoup(fileName):
    # Parse one of the StatSVN-generated HTML reports
    with open(fileName, 'r') as f:
        return BeautifulSoup(f.read(), 'html.parser')

with open('template.html', 'r') as f:
    template = f.read()

developers = getSoup('developers.html')
index = getSoup('index.html')
clog = getSoup('commitlog.html')

# [A]: the table of author contributions
authorTable = developers.html.body.table
template = template.replace('[A]', str(authorTable))

# [T]: the commit message tag cloud from the index page
tagCloud = index.html.body.find_all('div')[2].p
template = template.replace('[T]', str(tagCloud))

# [C]: the commit log, trimmed to its first 24 child nodes
commitList = clog.html.body.find_all('dl')[1]
for child in commitList.contents[24:]:
    child.extract()
template = template.replace('[C]', str(commitList))

with open('output.html', 'w') as f:
    f.write(template)
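Chained str.replace calls have two quiet failure modes. If an extraction comes back empty or a placeholder is misspelled, the page is published with a literal [A] in it. And if one inserted fragment happens to contain text like "[T]" (a commit message, say), a later replace will substitute inside it. The sketch below is a hypothetical helper called fill_template. It does all the substitutions in one re.sub pass over the original template and raises if anything is missing. It assumes the placeholders are single capital letters in brackets, as in the template above.

```python
import re


def fill_template(template, replacements):
    """Replace [X] placeholders in a single pass, failing loudly on mismatches.

    replacements maps placeholder letters (e.g. 'A') to HTML strings. Because
    substitution happens in one re.sub pass over the original template, text
    inside an inserted fragment is never substituted a second time.
    """
    missing = [k for k in replacements if '[' + k + ']' not in template]
    if missing:
        raise KeyError('placeholders not in template: %s' % ', '.join(sorted(missing)))

    def substitute(match):
        key = match.group(1)
        if key not in replacements:
            raise KeyError('no content for placeholder [%s]' % key)
        return replacements[key]

    return re.sub(r'\[([A-Z])\]', substitute, template)
```

With it, the three replace lines would collapse into template = fill_template(template, {'A': str(authorTable), 'T': str(tagCloud), 'C': str(commitList)}).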

The End Result

This whole script is saved in a file and run by a cron job every half-hour. A line added to the template makes the browser refresh the page every so often. The finished product is shown below.
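The post doesn’t say which line was added to the template. A common choice is a meta refresh tag in the head, and the sketch below assumes that approach. The function name add_auto_refresh is hypothetical. The default interval of 1800 seconds is simply chosen to match the half-hourly cron job.

```python
def add_auto_refresh(html, seconds=1800):
    """Insert a <meta http-equiv="refresh"> tag right after the first <head>.

    Browsers reload a page carrying this tag every `seconds` seconds.
    """
    if '<head>' not in html:
        raise ValueError('template has no <head> element')
    tag = '<meta http-equiv="refresh" content="%d">' % seconds
    # Only the first <head> occurrence is touched
    return html.replace('<head>', '<head>\n' + tag, 1)
```

Running it once over template.html is enough, since every generated output.html inherits the tag from the template.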

Written by Andrew Robinson

February 16th, 2012 at 6:10 am

Posted in Uncategorized