This repository is related to automatic speech recognition (ASR).
The repository includes the following `.ipynb` files:
This notebook outlines the primary goals and objectives of the analysis. It explains how to download a video, extract the audio, and convert a text transcript to an SRT transcript, and it describes the main tools and libraries used for transcription.
Open source project: whisper.cpp (based on OpenAI Whisper)
This notebook describes the results of using whisper.cpp, which is based on OpenAI Whisper.
Open source project: SeamlessM4T
This notebook describes the results of using SeamlessM4T.
Open source project: faster-whisper (based on OpenAI Whisper)
This notebook describes the results of using faster-whisper, which is based on OpenAI Whisper.
This notebook describes how to use Llama on Groq to get a summary.
This notebook describes how to get summaries from different GPT models over the API and compares the results.
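The transcript-to-SRT conversion mentioned for the first notebook can be illustrated with a minimal sketch. It assumes the transcript is already available as timed segments (start, end, text); the `Segment` class and both function names are hypothetical and are not taken from the notebooks or the `utils` folder.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of a transcript (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text: numbered cues separated by a blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```

Calling `segments_to_srt` on a list of `Segment` objects returns a string that can be written to a `.srt` file as-is.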
Additionally, there are a few folders:
- The `data` folder contains the transcripts and summaries. MP3, WAV, and MP4 files are excluded due to their significant size, but they can be extracted as described in the `.ipynb` files.
- The `utils` folder contains several Python files that are kept out of the `.ipynb` files to avoid overloading them with code. Links to these files are included in the `.ipynb` files.
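Because the audio files are not stored in the repository, they have to be regenerated from the video. The sketch below is one possible way to do this and assumes `ffmpeg` is installed and on the `PATH`; the notebooks may use a different tool or different settings. The function names and the example file paths are hypothetical. The 16 kHz, mono, 16-bit PCM output matches the WAV input format that whisper.cpp expects.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Return the ffmpeg argument list for extracting mono 16-bit PCM audio."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ar", str(sample_rate),  # audio sample rate in Hz
        "-ac", "1",               # one audio channel (mono)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)


# Example call (hypothetical paths):
# extract_audio("data/video.mp4", "data/audio.wav")
```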