Automatic Speech Recognition (ASR) for Indian languages using IndicConformer models. The default model is set to Kannada ASR.
A quick demo video of the project in action is available on YouTube.
- Supported Languages
- Live Server
- Getting Started
- Downloading ASR Models
- Running with FastAPI Server
- Evaluating Results
- Building Docker Image
- Troubleshooting
- References
- Additional Resources
22 Indian languages are supported, thanks to the AI4Bharat organisation.
Language | Code |
---|---|
Assamese | as |
Bengali | bn |
Bodo | brx |
Dogri | doi |
Gujarati | gu |
Hindi | hi |
Kannada | kn |
Kashmiri | ks |
Konkani | kok |
Maithili | mai |
Malayalam | ml |
Manipuri | mni |
Marathi | mr |
Nepali | ne |
Odia | or |
Punjabi | pa |
Sanskrit | sa |
Santali | sat |
Sindhi | sd |
Tamil | ta |
Telugu | te |
Urdu | ur |
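For scripts that accept either a language name or a code, the table above can be kept as a simple lookup. The sketch below is illustrative only: `LANGUAGE_CODES` and `to_language_code` are hypothetical names and are not part of this repository.

```python
# Hypothetical helper that mirrors the language table above.
LANGUAGE_CODES = {
    "assamese": "as", "bengali": "bn", "bodo": "brx", "dogri": "doi",
    "gujarati": "gu", "hindi": "hi", "kannada": "kn", "kashmiri": "ks",
    "konkani": "kok", "maithili": "mai", "malayalam": "ml", "manipuri": "mni",
    "marathi": "mr", "nepali": "ne", "odia": "or", "punjabi": "pa",
    "sanskrit": "sa", "santali": "sat", "sindhi": "sd", "tamil": "ta",
    "telugu": "te", "urdu": "ur",
}


def to_language_code(name_or_code: str) -> str:
    """Return the code for a language name, or the input itself if it is already a code."""
    key = name_or_code.strip().lower()
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    if key in LANGUAGE_CODES.values():
        return key
    raise ValueError(f"Unsupported language: {name_or_code!r}")
```

For example, `to_language_code("Kannada")` and `to_language_code("kn")` both resolve to `kn`, while an unlisted language raises `ValueError`.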
We have hosted an Automatic Speech Recognition (ASR) service that can be used to verify the accuracy of audio transcriptions.
- With curl
You can test the service using `curl` commands. The example below targets the hosted CPU server:
curl -X 'POST' \
'https://gaganyatri-asr-indic-server-cpu.hf.space/transcribe/?language=kannada' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_2.wav;type=audio/x-wav'
- Via Swagger UI
- Ensure that the audio file path (`samples/kannada_sample_2.wav`) is correct and accessible.
- The `language` parameter in the URL specifies the language of the audio file. In the example above, it is set to `kannada`.
- The service expects the audio file to be in WAV format.
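The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint accepts a multipart field named `file` and a `language` query parameter, as in the curl command above. `build_multipart` and `transcribe` are hypothetical helper names, and the shape of the JSON response is not assumed, so the decoded payload is returned as-is.

```python
import json
import urllib.parse
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "audio/x-wav"):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path: str, base_url: str, language: str = "kannada",
               timeout: float = 60.0):
    """POST a WAV file to <base_url>/transcribe/ and return the decoded JSON."""
    path = Path(wav_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    query = urllib.parse.urlencode({"language": language})
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/transcribe/?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `transcribe("samples/kannada_sample_2.wav", "https://gaganyatri-asr-indic-server-cpu.hf.space")` would mirror the curl example. Pointing `base_url` at `http://localhost:7860` would target a locally running server instead.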
- Prerequisites: Docker and Docker Compose
- Steps:
  - Start the server:
    - For GPU: `docker compose -f compose.yaml up -d`
    - For CPU only: `docker compose -f cpu-compose.yaml up -d`
  - Update the language: modify the `compose.yaml` file to set the desired language. Example configurations:
    - Kannada: `language: kn`
    - Hindi: `language: hi`
- Prerequisites: Python 3.6+
- Steps:
  - Create a virtual environment: `python -m venv venv`
  - Activate the virtual environment: `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
  - Install dependencies:
    - For GPU: `pip install -r requirements.txt`
    - For CPU only: `pip install -r cpu-requirements.txt`
Models can be downloaded from AI4Bharat's HuggingFace repository:
huggingface-cli download ai4bharat/indicconformer_stt_kn_hybrid_rnnt_large
huggingface-cli download ai4bharat/indicconformer_stt_ml_hybrid_rnnt_large
huggingface-cli download ai4bharat/indicconformer_stt_hi_hybrid_rnnt_large
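The three commands above share one naming pattern, so the download commands can be generated from language codes. This sketch assumes the same pattern holds for the other supported languages, which this README does not confirm, so check the repository name on HuggingFace before relying on it. `model_repo_id` and `download_commands` are hypothetical helpers.

```python
def model_repo_id(language_code: str) -> str:
    """Build a repository ID following the pattern of the three commands above.

    Assumption: other languages follow the same naming scheme.
    """
    return f"ai4bharat/indicconformer_stt_{language_code}_hybrid_rnnt_large"


def download_commands(language_codes):
    """Return one `huggingface-cli download` command per language code."""
    return [f"huggingface-cli download {model_repo_id(code)}" for code in language_codes]


# Print the commands for the three models listed above.
for command in download_commands(["kn", "ml", "hi"]):
    print(command)
```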
Run the server using FastAPI with the desired language (e.g., Kannada):
- for GPU
python src/asr_api.py --port 7860 --language kn --host 0.0.0.0 --device gpu
- for CPU only
python src/asr_api.py --port 7860 --language kn --host 0.0.0.0 --device cpu
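Because the commands above bind port 7860, it can help to confirm the port is free before starting the server. The sketch below is an optional pre-flight check and is not part of `src/asr_api.py`. `is_port_free` is a hypothetical helper that simply tries to bind the port.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True


if __name__ == "__main__":
    print("port 7860 free:", is_port_free(7860))
```

If the check reports that the port is taken, either stop the process using it or pass a different value to `--port`.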
You can evaluate the ASR transcription results using `curl` commands. Below are examples for Kannada audio samples.
Note: GitHub doesn't support audio playback in READMEs. Download the sample audio files and test them locally with the provided `curl` commands to verify transcription results.
- Audio File: samples/kannada_sample_1.wav
- Command:
curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_1.wav;type=audio/x-wav'
- Expected Output:
ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ ಯಾವುದು
Translation: "What is the capital of Karnataka"
- Audio File: samples/kannada_sample_2.wav
- Command:
curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_2.wav;type=audio/x-wav'
- Expected Output:
ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ
- YT Video- Navaduva Nudiye
- Audio File: samples/kannada_sample_3.wav
- Command:
curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_3.wav;type=audio/x-wav'
- Expected Output: kannada_sample_3_out.md
- YT Video- Aagadu Yendu
- Audio File: samples/kannada_sample_4.wav
- Command:
curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_4.wav;type=audio/x-wav'
- Expected Output: kannada_sample_4_out.md
Note: The ASR does not provide sentence breaks or punctuation (e.g., question marks). We plan to integrate an LLM parser for improved context in future updates.
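To go beyond eyeballing the output, a transcription can be scored against its expected text with word error rate (WER), a common ASR metric. The sketch below is not part of this repository. `word_error_rate` is a hypothetical helper that computes word-level edit distance divided by the number of reference words. Since the ASR output carries no punctuation, the reference text should be punctuation-free as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edits needed to turn the processed reference prefix
    # into the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


expected = "ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ ಯಾವುದು"
print(word_error_rate(expected, expected))
```

A score of 0.0 means the hypothesis matches the reference word for word. Values can exceed 1.0 when the hypothesis contains many extra words.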
The `/transcribe_batch` endpoint transcribes multiple audio files in a single request, which is useful for batch processing.
- Command:
curl -X 'POST' \
'http://localhost:7860/transcribe_batch/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'files=@samples/kannada_sample_1.wav;type=audio/x-wav' \
-F 'files=@samples/kannada_sample_2.wav;type=audio/x-wav'
- Expected Output:
{
"transcriptions": [
"ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ ಯಾವುದು",
"ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ"
]
}
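When consuming the batch response, it is convenient to map each transcription back to its file. The sketch below assumes the `transcriptions` list is returned in the same order as the uploaded files, which the expected output above suggests but this README does not state explicitly. `pair_transcriptions` is a hypothetical helper.

```python
import json


def pair_transcriptions(filenames, response_text):
    """Map uploaded file names to transcriptions, assuming matching order."""
    payload = json.loads(response_text)
    transcriptions = payload["transcriptions"]
    if len(transcriptions) != len(filenames):
        raise ValueError(
            f"expected {len(filenames)} transcriptions, got {len(transcriptions)}"
        )
    return dict(zip(filenames, transcriptions))


# Response body copied from the expected output above.
sample_response = json.dumps({
    "transcriptions": [
        "ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ ಯಾವುದು",
        "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ",
    ]
}, ensure_ascii=False)

files = ["samples/kannada_sample_1.wav", "samples/kannada_sample_2.wav"]
for name, text in pair_transcriptions(files, sample_response).items():
    print(f"{name}: {text}")
```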
Build the Docker image locally, then run it with GPU access:
docker build -t slabstech/asr_indic_server -f Dockerfile .
docker run --gpus all -it --rm -p 7860:7860 slabstech/asr_indic_server
- Docker fails to start: Ensure Docker is running and the `compose.yaml` file is correctly formatted.
- Transcription errors: Verify the audio file is in WAV format, mono, and sampled at 16 kHz. Adjust using: `ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav -y`
- Model not found: Download the required models using the `huggingface-cli download` commands above.
- Port conflicts: Ensure port 7860 is free when running the FastAPI server.
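For the transcription-errors case, the audio format can be checked before sending a file to the server. The sketch below uses Python's standard `wave` module to confirm that a file is mono and sampled at 16 kHz. `check_wav` is a hypothetical helper, not part of this repository, and it only reads PCM WAV headers without inspecting audio quality.

```python
import wave


def check_wav(path: str, expected_rate: int = 16000, expected_channels: int = 1):
    """Return a list of format problems; an empty list means mono 16 kHz WAV."""
    try:
        with wave.open(path, "rb") as wav_file:
            channels = wav_file.getnchannels()
            rate = wav_file.getframerate()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if channels != expected_channels:
        problems.append(f"expected {expected_channels} channel(s), found {channels}")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

If `check_wav("sample_audio.wav")` returns any problems, converting the file with the `ffmpeg` command above should resolve them.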
We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.
You can also join the Discord group to collaborate.
- AI4Bharat IndicConformerASR GitHub Repository
- Nemo - AI4Bharat
- IndicConformer Collection on HuggingFace
- Download the Nemo model:
wget https://objectstore.e2enetworks.net/indic-asr-public/indicConformer/ai4b_indicConformer_kn.nemo -O kannada.nemo
- Adjust the audio:
ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav -y
- Run the program:
python nemo_asr.py
python hf_asr.py
- server-setup.sh - Use for container deployment on OlaKrutrim AI Pod