
This page summarizes frequently asked questions about pocketsphinx.js.

It does not recognize anything

You are probably hitting a fairly common Chrome bug. Try updating to the latest version on the Canary channel. You can also try this sample app to check whether any audio is being recorded: http://chromium.googlecode.com/svn/trunk/samples/audio/visualizer-live.html
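As a quick sanity check that the browser can capture audio at all, independently of the recognizer, you can request microphone access directly. The snippet below is a minimal sketch using the modern navigator.mediaDevices.getUserMedia API; Chrome builds contemporary with this page used the webkit-prefixed callback form instead.

```javascript
// Minimal microphone check, independent of pocketsphinx.js.
// Older Chrome builds expose the prefixed callback form instead:
// navigator.webkitGetUserMedia({audio: true}, onSuccess, onError)
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function (stream) {
    console.log("Audio capture works, tracks: " + stream.getAudioTracks().length);
  })
  .catch(function (err) {
    console.error("Audio capture failed:", err);
  });
```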

Why are there so many levels of wrapping?

There are many reasons:

  1. The public C API of PocketSphinx is fairly large, and actions like initialization and adding grammars require many steps that involve passing and receiving pointers. That is why we added another layer, which makes it easy to initialize the recognizer and exchange data with it with only limited use of pointers.
  2. We used C++ to implement that layer on top of the original PocketSphinx API, which lets us use convenient C++ features such as containers and strings.
  3. Emscripten does not work well with C++ interfaces (we have not had much luck with embind), so we expose a very thin C layer, which is essentially the API accessible to JavaScript.
  4. Interacting with the Emscripten-generated JavaScript API requires Module.cwrap('...') and similar calls, which we want to hide.
  5. Interacting with a C API limits the complexity of the data that can be passed. That is why, for instance, we provide convenience JavaScript functions that add a grammar in one call rather than starting a new grammar, adding individual transitions, and ending it (see the sketch after this list).
  6. The JavaScript generated from PocketSphinx is fairly large, and loading it directly in the HTML page gives a rather unpleasant experience because it blocks the UI thread. That is why we wrapped it inside a Web Worker.
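As an illustration of the outermost layer, here is a minimal sketch of how an application typically talks to the recognizer through the Web Worker, adding words and a whole grammar in single calls. The worker file name and message fields (command, data, hyp) follow the usage described in the project README; treat them as assumptions and check the README of the version you are using.

```javascript
// A minimal sketch of the worker-based API (names assumed from the README).
var recognizer = new Worker("recognizer.js");

recognizer.onmessage = function (e) {
  // Recognition hypotheses come back asynchronously from the worker
  if (e.data.hyp !== undefined) {
    console.log("Recognized: " + e.data.hyp);
  }
};

// Initialize the recognizer inside the worker
recognizer.postMessage({ command: "initialize" });

// Words must be added to the dictionary before they can appear in a grammar
recognizer.postMessage({
  command: "addWords",
  data: [["HELLO", "HH AH L OW"], ["WORLD", "W ER L D"]]
});

// The entire grammar (states and transitions) is passed in one call,
// instead of starting a grammar, adding transitions one by one, and ending it
recognizer.postMessage({
  command: "addGrammar",
  data: {
    numStates: 2,
    start: 0,
    end: 1,
    transitions: [
      { from: 0, to: 1, word: "HELLO" },
      { from: 0, to: 1, word: "WORLD" }
    ]
  }
});
```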

Recognition is not very accurate

This is a broad question, and to start with, you should have reasonable expectations based on experience with other open-source speech recognizers. A few areas to look at to improve accuracy are:

  • Make sure your grammar is able to catch what you or your users actually say (see the sketch after this list).
  • If you have audio data from your expected users, you can try training your own acoustic model or adapting an existing one. You can also use another available acoustic model; PocketSphinx ships with several of them.
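For instance, if users may say either "TURN ON THE LIGHT" or "SWITCH ON THE LIGHT", the grammar should allow both paths. Below is a sketch using the same grammar-object format as in the previous example (field names assumed from the README), assuming the words have already been added with addWords.

```javascript
// A grammar that accepts both "TURN ON THE LIGHT" and "SWITCH ON THE LIGHT".
// 'recognizer' is the Web Worker created as in the sketch above.
var lightGrammar = {
  numStates: 5,
  start: 0,
  end: 4,
  transitions: [
    { from: 0, to: 1, word: "TURN" },
    { from: 0, to: 1, word: "SWITCH" }, // alternative first word
    { from: 1, to: 2, word: "ON" },
    { from: 2, to: 3, word: "THE" },
    { from: 3, to: 4, word: "LIGHT" }
  ]
};
recognizer.postMessage({ command: "addGrammar", data: lightGrammar });
```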

I would like to recognize a language other than English

Refer to the CMU Sphinx documentation; what you will need is:

  • A pronunciation dictionary for your language
  • An acoustic model
  • Grammars that use words from your dictionary

You can find many resources on the CMU Sphinx website or on VoxForge, for instance.
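Once you have a dictionary and an acoustic model for your language packaged with the recognizer, wiring words into the recognizer from JavaScript looks the same as for English: the pronunciations simply have to use the phone set of your acoustic model. The French words and phone symbols below are purely illustrative assumptions.

```javascript
// Illustrative only: the phone symbols must match the phone set of the
// acoustic model packaged with your build of pocketsphinx.js.
// 'recognizer' is the Web Worker created as in the earlier sketch.
recognizer.postMessage({
  command: "addWords",
  data: [
    ["BONJOUR", "b on j ou r"], // hypothetical French phone set
    ["MERCI", "m e r s i"]
  ]
});
recognizer.postMessage({
  command: "addGrammar",
  data: {
    numStates: 2,
    start: 0,
    end: 1,
    transitions: [
      { from: 0, to: 1, word: "BONJOUR" },
      { from: 0, to: 1, word: "MERCI" }
    ]
  }
});
```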
