Sylvain Chevalier edited this page Jun 1, 2014
This page summarizes frequent questions about pocketsphinx.js.
With the new API based on embind, there are not that many levels of wrapping anymore. Here are the reasons why we have these different levels:
- The public C API of PocketSphinx is pretty large and some actions like initialization and adding grammars require many steps that involve passing and receiving pointers. That is why we have added another layer which makes it easy to initialize the recognizer and pass and get data from it with limited interaction through pointers.
- We have used C++ to implement that layer above the original PocketSphinx API, which allows us to use convenient C++ features such as containers and strings. Using embind, that layer is directly accessible from JavaScript.
- The JavaScript generated from PocketSphinx is fairly large, and loading it directly in the HTML page gives a quite unpleasant experience as it blocks the UI thread. That's why we have wrapped it inside a Web Worker.
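As a sketch of how that outermost layer is used: the application talks to the worker purely through messages. The command names below follow the project's README; the worker file path and the example words are assumptions about your setup, not prescribed by the library.

```javascript
// Hedged sketch: driving the pocketsphinx.js recognizer through its Web
// Worker wrapper. The message shapes follow the project's README; the
// path "js/recognizer.js" and the vocabulary are placeholders.
var messages = [
  { command: "initialize" },
  // Dictionary entries: [word, pronunciation] pairs.
  { command: "addWords", data: [["HELLO", "HH AH L OW"],
                                ["WORLD", "W ER L D"]] },
  // A trivial one-state grammar looping over the two words.
  { command: "addGrammar", data: { numStates: 1, start: 0, end: 0,
      transitions: [{ from: 0, to: 0, word: "HELLO" },
                    { from: 0, to: 0, word: "WORLD" }] } },
  { command: "start" }
];

if (typeof Worker !== "undefined") {
  // Browser only: the heavy generated code loads inside the worker,
  // keeping the UI thread responsive.
  var recognizer = new Worker("js/recognizer.js");
  recognizer.onmessage = function (e) {
    // Replies from the worker report status and recognition hypotheses.
    console.log(e.data);
  };
  messages.forEach(function (m) { recognizer.postMessage(m); });
}
```

The point of the design is visible here: no pointers and no Emscripten heap management ever reach the application code, only plain JavaScript objects posted to the worker.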
Improving accuracy is a broad question, and to start with, you should set reasonable expectations based on experience with other open-source speech recognizers. A few areas you could look at to improve accuracy are:
- Make sure your grammar is able to catch what you or your users actually say.
- If you have audio data from your expected users, you can try training your own acoustic model or adapting an existing one. You can also use another available acoustic model; PocketSphinx ships with several.
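On the first point, a grammar that misses common phrasings will force misrecognitions. For example, if users say both "TURN ON" and "TURN OFF", the grammar should accept both paths. The finite-state structure below follows the grammar format from the project's README; the words themselves are illustrative:

```javascript
// Illustrative grammar: a three-state graph accepting "TURN ON" or
// "TURN OFF". Each transition consumes one word; start and end name
// the initial and final states.
var switchGrammar = {
  numStates: 3,
  start: 0,
  end: 2,
  transitions: [
    { from: 0, to: 1, word: "TURN" },
    { from: 1, to: 2, word: "ON" },
    { from: 1, to: 2, word: "OFF" }
  ]
};
// It would then be registered with the worker, e.g.:
//   recognizer.postMessage({command: "addGrammar", data: switchGrammar});
```

Writing the grammar as an explicit graph makes it easy to audit which utterances it can and cannot match before blaming the acoustic model.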
Refer to the CMU Sphinx documentation; what you will need is:
- A pronunciation dictionary for your language
- An acoustic model
- Grammars that use words from your dictionary.
You can find a lot of resources on the CMU Sphinx website or on VoxForge, for instance.
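Putting the three pieces together, a hypothetical setup for another language might look like the sketch below. All model names, words, and phone symbols are placeholders, and whether a `-hmm` parameter can be passed at initialization depends on how the acoustic model files were packaged into your Emscripten build; treat this as the shape of the steps, not a ready-made recipe.

```javascript
// Hypothetical non-English setup; every name here is a placeholder,
// not a file shipped with pocketsphinx.js.
var setup = [
  // 1. Acoustic model: point the recognizer at a model directory,
  //    assuming it was embedded in the Emscripten filesystem at build time.
  { command: "initialize", data: [["-hmm", "my_french_model"]] },
  // 2. Pronunciation dictionary: [word, pronunciation] pairs using the
  //    phone set of your acoustic model (illustrative phones below).
  { command: "addWords", data: [["BONJOUR", "b o n j u r"],
                                ["MONDE", "m o n d"]] },
  // 3. A grammar built only from words present in the dictionary.
  { command: "addGrammar", data: { numStates: 1, start: 0, end: 0,
      transitions: [{ from: 0, to: 0, word: "BONJOUR" },
                    { from: 0, to: 0, word: "MONDE" }] } }
];
```

Note how the three messages mirror the three requirements above: model, dictionary, then grammar, in that order.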