Sylvain Chevalier edited this page Jun 25, 2013
This page summarizes frequent questions about pocketsphinx.js.
### No audio seems to be recorded, what can I do?

You are probably hitting a fairly common Chrome bug. Try updating to the latest version on the Canary channel. You can also try this sample app to check whether any audio is recorded: http://chromium.googlecode.com/svn/trunk/samples/audio/visualizer-live.html
### Why is there a wrapper layer on top of PocketSphinx?

There are many reasons:
- The public C API of PocketSphinx is quite large, and some actions such as initialization and adding grammars require many steps that involve passing and receiving pointers. That is why we added another layer that makes it easy to initialize the recognizer and exchange data with it, with only limited pointer handling.
- We have used C++ to implement that layer above the original PocketSphinx API, which allows us to use convenient C++ features such as containers and strings.
- Emscripten does not work well with C++ interfaces (we have not had much luck with embind), so we added a very thin C layer which is essentially the API accessible from JavaScript.
- Interaction with the Emscripten-generated JavaScript API requires calls such as `Module.cwrap('...')`, which we want to hide.
- Interaction with a C API limits the complexity of the data that can be passed. That is why, for instance, we have convenience JavaScript functions that add grammars in one call, rather than starting a new grammar, adding individual transitions, and ending the grammar.
- The JavaScript generated from PocketSphinx is fairly large and loading it directly in the HTML gives a quite unpleasant experience as it blocks the UI thread. That's why we have wrapped it inside a Web Worker.
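To make the "one call" idea above concrete, here is a minimal sketch of how an application might describe a finite-state grammar as a plain JavaScript object and hand it to the worker in a single message. The field names (`numStates`, `start`, `end`, `transitions`), the `addGrammar` command, and the worker file name are illustrative assumptions, not a definitive description of the pocketsphinx.js API.

```javascript
// Hypothetical helper: a flat list of words becomes a two-state grammar
// where each word is one transition from the start state to the end state.
function buildGrammar(words) {
  return {
    numStates: 2,
    start: 0,
    end: 1,
    transitions: words.map(function (word) {
      return { from: 0, to: 1, word: word };
    })
  };
}

// In the application, the whole grammar can then be sent to the
// recognizer worker in one message (names are illustrative):
//
//   var recognizer = new Worker('recognizer.js');
//   recognizer.postMessage({
//     command: 'addGrammar',
//     data: buildGrammar(['YES', 'NO'])
//   });
```

This keeps all pointer-level bookkeeping (creating the grammar, adding transitions one by one, finalizing it) inside the wrapper, so application code only ever deals with plain objects and messages.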
### How can I improve recognition accuracy?

This is a broad question, and to start with, you should have reasonable expectations based on your experience with other open-source speech recognizers. A few areas you could look at to improve accuracy are:
- Make sure your grammar is able to catch what you or your users actually say.
- If you have audio data from your expected users, you can try training your own acoustic model or adapting an existing one. You can also use another available acoustic model; PocketSphinx ships with several.
### How can I use pocketsphinx.js for my own language?

Refer to the CMU Sphinx documentation. What you will need is:
- A pronunciation dictionary for your language
- An acoustic model
- Grammars that use words from your dictionary.
You can find many resources on the CMU Sphinx website or on VoxForge, for instance.
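As a sketch of how these pieces fit together: the pronunciation dictionary maps each word to a sequence of phones from the acoustic model's phone set, and every word used in a grammar must appear in the dictionary. The words, phone symbols, and helper below are illustrative only, not tied to any specific model or to the actual pocketsphinx.js API.

```javascript
// Illustrative pronunciation dictionary: word -> space-separated phones
// (phone symbols here follow the style of the US-English CMU dictionary).
var dictionary = {
  'HELLO': 'HH AH L OW',
  'WORLD': 'W ER L D'
};

// A small sanity check like this can catch grammar words that are
// missing from the dictionary before they cause recognition errors.
function missingWords(grammarWords, dict) {
  return grammarWords.filter(function (word) {
    return !(word in dict);
  });
}
```

For example, `missingWords(['HELLO', 'GOODBYE'], dictionary)` would flag `'GOODBYE'` as needing a dictionary entry.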