
ASR Indic Server

Overview

Automatic Speech Recognition (ASR) for Indian languages using IndicConformer models. The default model is set to Kannada ASR.

Demo Video

Watch a quick demo of our project in action! Click the image below to view the video on YouTube.

Watch the video

Supported Languages

22 Indian languages are supported, thanks to the AI4Bharat organisation.

Language Code
Assamese as
Bengali bn
Bodo brx
Dogri doi
Gujarati gu
Hindi hi
Kannada kn
Kashmiri ks
Konkani kok
Maithili mai
Malayalam ml
Manipuri mni
Marathi mr
Nepali ne
Odia or
Punjabi pa
Sanskrit sa
Santali sat
Sindhi sd
Tamil ta
Telugu te
Urdu ur
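
For scripting, the table above can be kept as a small lookup. This is a minimal sketch: the names and codes are copied from the table, while `LANGUAGE_CODES` and `language_code` are hypothetical helper names that are not part of this repository.

```python
# Language names and codes copied from the table above.
LANGUAGE_CODES = {
    "assamese": "as", "bengali": "bn", "bodo": "brx", "dogri": "doi",
    "gujarati": "gu", "hindi": "hi", "kannada": "kn", "kashmiri": "ks",
    "konkani": "kok", "maithili": "mai", "malayalam": "ml", "manipuri": "mni",
    "marathi": "mr", "nepali": "ne", "odia": "or", "punjabi": "pa",
    "sanskrit": "sa", "santali": "sat", "sindhi": "sd", "tamil": "ta",
    "telugu": "te", "urdu": "ur",
}


def language_code(name: str) -> str:
    """Return the code for a language name (case-insensitive).

    Hypothetical helper; raises ValueError for names not in the table.
    """
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {name!r}")
    return LANGUAGE_CODES[key]
```

For example, `language_code("Kannada")` returns `"kn"`, the value passed to `--language` when running the server locally.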

Live Server

We have hosted an Automatic Speech Recognition (ASR) service that can be used to verify the accuracy of audio transcriptions.

High Latency, Slow System (Available 24/7)

How to Use the Service

  1. With curl

You can test the service using curl. Below is an example for the CPU-hosted service:

CPU / Available 24/7 - Free, Slow

curl -X 'POST' \
  'https://gaganyatri-asr-indic-server-cpu.hf.space/transcribe/?language=kannada' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@samples/kannada_sample_2.wav;type=audio/x-wav'
  2. Via Swagger UI

Notes

  • Ensure that the audio file path (samples/kannada_sample_2.wav) is correct and accessible.
  • The language parameter in the URL specifies the language of the audio file. In the example above, it is set to kannada.
  • The service expects the audio file to be in WAV format.
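
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl example: the URL, the `file` field name, and the `audio/x-wav` content type are copied from it. The helper names `build_multipart` and `transcribe` are hypothetical, and because the response schema of this endpoint is not documented here, the parsed JSON is returned as-is.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# URL copied from the curl example above.
SERVICE_URL = (
    "https://gaganyatri-asr-indic-server-cpu.hf.space/transcribe/?language=kannada"
)


def build_multipart(field, filename, payload, content_type="audio/x-wav"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, url=SERVICE_URL, timeout=300.0):
    """POST a WAV file to the service and return the parsed JSON response."""
    path = Path(wav_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `transcribe("samples/kannada_sample_2.wav")` sends the same sample as the curl command. The generous default timeout is an assumption based on the hosted service being described as slow.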

Getting Started - Development

For Production (Docker)

  • Prerequisites: Docker and Docker Compose
  • Steps:
    1. Start the server: For GPU
    docker compose -f compose.yaml up -d
    For CPU only
    docker compose -f cpu-compose.yaml up -d
    2. Update the language: Modify the compose.yaml file to set the desired language. Example configurations:
    • Kannada:
    language: kn
    • Hindi:
    language: hi

For Development (Local)

  • Prerequisites: Python 3.6+
  • Steps:
    1. Create a virtual environment:
    python -m venv venv
    2. Activate the virtual environment:
    source venv/bin/activate
    On Windows, use:
    venv\Scripts\activate
    3. Install dependencies:
    • For GPU
      pip install -r requirements.txt
    • For CPU only
      pip install -r cpu-requirements.txt
      

Downloading ASR Models

Models can be downloaded from AI4Bharat's HuggingFace repository:

Kannada

huggingface-cli download ai4bharat/indicconformer_stt_kn_hybrid_rnnt_large

Other Languages

Malayalam

huggingface-cli download ai4bharat/indicconformer_stt_ml_hybrid_rnnt_large

Hindi

huggingface-cli download ai4bharat/indicconformer_stt_hi_hybrid_rnnt_large
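
The three commands above differ only in the language code. The sketch below builds the command for a given code. It assumes the other languages follow the same repository naming pattern, which is not confirmed here, so check that the repository exists on HuggingFace before relying on it. `model_repo_id` and `download_command` are hypothetical helper names.

```python
import re


def model_repo_id(code: str) -> str:
    """Build a repository id following the pattern of the commands above.

    Assumption: every language uses the same naming scheme as kn, ml and hi.
    """
    if not re.fullmatch(r"[a-z]{2,3}", code):
        raise ValueError(f"Unexpected language code: {code!r}")
    return f"ai4bharat/indicconformer_stt_{code}_hybrid_rnnt_large"


def download_command(code: str) -> list:
    """Return the huggingface-cli invocation as an argument list."""
    return ["huggingface-cli", "download", model_repo_id(code)]
```

The returned list can be passed to `subprocess.run(..., check=True)` to perform the download.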

Running with FastAPI Server

Run the server using FastAPI with the desired language (e.g., Kannada):

  • for GPU
    python src/asr_api.py --port 7860 --language kn --host 0.0.0.0 --device gpu
  • for CPU only
    python src/asr_api.py --port 7860 --language kn --host 0.0.0.0 --device cpu

Evaluating Results

You can evaluate the ASR transcription results using curl commands. Below are examples for Kannada audio samples. Note: GitHub doesn’t support audio playback in READMEs. Download the sample audio files and test them locally with the provided curl commands to verify transcription results.

Kannada Transcription Examples

Sample 1: kannada_sample_1.wav

curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_1.wav;type=audio/x-wav'
  • Expected Output: ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ ಯಾವುದು Translation: "What is the capital of Karnataka"

Sample 2: kannada_sample_2.wav

curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_2.wav;type=audio/x-wav'
  • Expected Output: ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ

Sample 3 - Song - 4 minutes

curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_3.wav;type=audio/x-wav'

Sample 4 - Song - 6.4 minutes

curl -X 'POST' \
'http://localhost:7860/transcribe/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@samples/kannada_sample_4.wav;type=audio/x-wav'

Note: The ASR does not provide sentence breaks or punctuation (e.g., question marks). We plan to integrate an LLM parser for improved context in future updates.

Batch Transcription Examples

Transcribe Batch Endpoint

The /transcribe_batch endpoint allows you to transcribe multiple audio files in a single request. This is useful for batch processing of audio files.

  • Command:
curl -X 'POST' \
'http://localhost:7860/transcribe_batch/' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'files=@samples/kannada_sample_1.wav;type=audio/x-wav' \
-F 'files=@samples/kannada_sample_2.wav;type=audio/x-wav'
  • Expected Output:
{
  "transcriptions": [
    "ಕರ್ನಾಟಕದ ರಾಜಧಾನಿ ಯಾವುದು",
    "ಬೆಂಗಳೂರು ಕರ್ನಾಟಕ ರಾಜ್ಯದ ರಾಜಧಾನಿ ಆಗಿದೆ ಕರ್ನಾಟಕದಲ್ಲಿ ನಾವು ಕನ್ನಡ ಮಾತನಾಡುತ್ತೇವೆ"
  ]
}
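
When processing many files, it helps to pair each transcription with the file it came from. The sketch below parses a response shaped like the expected output above. It assumes the `transcriptions` list is returned in the same order as the uploaded files, which matches the two-sample example but is not otherwise guaranteed here. `pair_transcriptions` is a hypothetical helper name.

```python
import json


def pair_transcriptions(filenames, response_text):
    """Map each uploaded filename to its transcription.

    Assumes the server preserves upload order in "transcriptions".
    """
    transcriptions = json.loads(response_text)["transcriptions"]
    if len(transcriptions) != len(filenames):
        raise ValueError(
            f"Sent {len(filenames)} files but got {len(transcriptions)} transcriptions"
        )
    return dict(zip(filenames, transcriptions))
```

The length check guards against silently mislabelling results if the server drops or rejects one of the files.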

Building Docker Image

Build the Docker image locally:

docker build -t slabstech/asr_indic_server -f Dockerfile .

Run the Docker Image

docker run --gpus all -it --rm -p 7860:7860 slabstech/asr_indic_server

Troubleshooting

  • Docker fails to start: Ensure Docker is running and the compose.yaml file is correctly formatted.
  • Transcription errors: Verify the audio file is in WAV format, mono, and sampled at 16kHz. Adjust using:
ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav -y
  • Model not found: Download the required models using the huggingface-cli download commands above.
  • Port conflicts: Ensure port 7860 is free when running the FastAPI server.
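
Before converting with ffmpeg, you can check whether a file already meets the mono, 16 kHz requirement mentioned above. This is a minimal sketch using Python's standard `wave` module; `check_wav` is a hypothetical helper and not part of this repository.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of problems found in a WAV file (empty if none).

    wave.open raises wave.Error if the file is not an uncompressed PCM WAV.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
    if channels != expected_channels:
        problems.append(f"expected {expected_channels} channel(s), found {channels}")
    if rate != expected_rate:
        problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

If the returned list is non-empty, run the ffmpeg command above to produce an inference-ready file.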

Contributing

We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.

You can also join the Discord group to collaborate.

References

Additional Resources

Running Nemo Model

  1. Download the Nemo model:
wget https://objectstore.e2enetworks.net/indic-asr-public/indicConformer/ai4b_indicConformer_kn.nemo -O kannada.nemo
  2. Adjust the audio:
ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav -y
  3. Run the program:
python nemo_asr.py

Running with Transformers

python hf_asr.py
  • server-setup.sh - Use for container deployment on OlaKrutrim AI Pod
