This is the final project for Georgia Tech's Deep Learning course (CS 7643). We aim to enhance Japanese Automatic Speech Recognition (ASR) by fine-tuning OpenAI’s Whisper model on Japanese-specific data, assessing whether it can outperform the monolingual ReazonSpeech models in accuracy.
Follow these steps to set up the environment locally:

- Install conda and Poetry.
- Create and activate the conda environment, install the dependencies, and copy the environment template:

```bash
conda create -n "tokyo_whisperers" python=3.10
conda activate tokyo_whisperers
poetry run pip install cython ipykernel
poetry run pip install --no-use-pep517 youtokentome
poetry install
poetry run pip install sherpa-onnx==1.10.16+cuda -f https://k2-fsa.github.io/sherpa/onnx/cuda.html
cp .env.example .env
# fill in the values in .env
```
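Before running the script, it can help to confirm that `.env` is actually filled in. The function below is a minimal sketch and not part of the repository. It assumes the required keys are `WANDB_API_KEY` and `WANDB_PROJECT` (the two keys the Colab instructions write; check `.env.example` for the full list), and it treats an empty value or the `xxx` placeholder as missing.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of the repository.
# ASSUMPTION: the required keys are WANDB_API_KEY and WANDB_PROJECT;
# extend the list in the loop to match .env.example.
check_env_file() {
  local file="${1:-.env}"
  local key value status=0
  if [ ! -f "$file" ]; then
    echo "error: $file not found" >&2
    return 1
  fi
  for key in WANDB_API_KEY WANDB_PROJECT; do
    # Use the last KEY=... line, drop the "KEY=" prefix and any double quotes.
    value=$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2- | tr -d '"')
    # An empty value or the "xxx" placeholder counts as not filled in.
    if [ -z "$value" ] || [ "$value" = "xxx" ]; then
      echo "error: $key is not set in $file" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_env_file .env && sh run.sh` only starts the run when both keys have real values.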
- Run the script:

```bash
sh run.sh
```
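Alternatively, follow these steps to run the code in Docker:

- Clone the repository and navigate to its root directory:

```bash
git clone https://github.com/ryujimorita/tokyo_whisperers.git
cd tokyo_whisperers
```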
- Build the Docker image:

```bash
docker build -t tokyo_whisperers .
```
- Run the following command to start the container, bind-mounting the current directory into `/app`:

```bash
docker run -it \
  --platform linux/amd64 \
  --mount type=bind,src="$(pwd)",dst=/app \
  tokyo_whisperers bash
```
- In the container terminal, execute the script:

```bash
sh run.sh
```
Follow these steps to set up the environment and run the code in Google Colab:
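- Clone the repository's main branch and move into its directory.
- Create a new working branch, replacing `<new-branch-name>` with a name of your choice:

```
!git clone https://github.com/ryujimorita/tokyo_whisperers.git
%cd tokyo_whisperers
!git checkout -b <new-branch-name>
```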
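- Install Poetry (with the `--pre` flag).
- Add Poetry's install directory to the `PATH` environment variable.
- Install the required dependencies:

```
!pip install -q --pre poetry
# Each "!" line runs in its own subshell, so "!export PATH=..." would not persist; set PATH from Python instead
import os
os.environ["PATH"] = "/root/.local/bin:" + os.environ["PATH"]
!poetry run pip install cython setuptools wheel ipykernel
!poetry run pip install --no-use-pep517 youtokentome
!poetry install --no-root
```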
- Create the `.env` file, replacing `xxx` with your Weights & Biases API key:

```
!touch .env
!echo $'WANDB_API_KEY="xxx"\nWANDB_PROJECT="tokyo_whisperers"' >.env
!cat .env
```
Make sure to change the `--wandb_run_name` argument in `run.sh` as well.
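If you prefer to change the run name from a shell instead of editing `run.sh` by hand, the helper below is one way to do it. It is a hypothetical sketch, not part of the repository. It assumes `run.sh` passes the flag as `--wandb_run_name VALUE` or `--wandb_run_name=VALUE` with no spaces in the value, and that the new name contains no `/`, `&` or `\` (characters that are special in a sed replacement).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# ASSUMPTION: run.sh passes the flag as "--wandb_run_name VALUE" or
# "--wandb_run_name=VALUE", and neither VALUE nor NAME contains spaces.
# NAME must not contain "/", "&" or "\", which are special in sed replacements.
set_wandb_run_name() {
  local name="$1"
  local file="${2:-run.sh}"
  local tmp
  tmp=$(mktemp) || return 1
  # Keep the flag and its separator (space or "="); replace only the value.
  sed -E 's/(--wandb_run_name[ =])[^ ]+/\1'"$name"'/' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so run.sh keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_wandb_run_name my_colab_run run.sh`. In Colab, put the function definition and the call in a single `%%bash` cell, since each `!` line runs in a separate shell and would not see the function.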
Run the provided shell script:

```
!sh run.sh
```