Skip to content

lexust1/av2txtsum

Repository files navigation

av2txtsum

This repository is related to automatic speech recognition (ASR).

Repository Structure

The repository includes the following .ipynb files:

This notebook outlines the primary goals and objectives of the analysis. It includes instructions on how to download a video, extract audio, convert a text transcript to an SRT transcript, and describes the main tools and libraries used for transcription.

Open source project: whisper.cpp (based on OpenAI Whisper)

This notebook describes the results of using whisper.cpp, which is based on OpenAI Whisper.

Open source project: SeamlessM4T

This notebook describes the results of using the SeamlessM4T.

Open source project: faster-whisper (based on OpenAI Whisper)

This notebook describes the results of using faster-whisper, which is based on OpenAI Whisper.

This notebook describes how to use Llama on Groq to get summary.

This notebook describes how to get summary using different GPT models over API and compare the results.

Additionally, there are a few folders:

  • The data folder contains the transcripts and summaries. MP3, WAV, and MP4 files are excluded due to their significant size, but they can be extracted as described in the .ipynb files.
  • The utils folder contains several Python files that are excluded from the .ipynb files to avoid overloading them with code. Links to these files are included in the .ipynb files.