Subtitle

Open-source subtitle generation for seamless content translation.

Key Features:

  • Open-source: Freely available for use, modification, and distribution.
  • Self-hosted: Run the tool on your own servers for enhanced control and privacy.
  • AI-powered: Leverage advanced machine learning for accurate and natural-sounding subtitles.
  • Multilingual support: Generate subtitles for videos in a wide range of languages.
  • Easy integration: Seamlessly integrates into your existing workflow.

I made this project for fun, but I think it could also be useful for other people.

Installation

FFmpeg

First, you need to install FFmpeg. Here's how you can do it:

# On Debian/Ubuntu Linux
sudo apt install ffmpeg
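FFmpeg is also available through other package managers; for example, on macOS you can use Homebrew, and afterwards a quick version check confirms the binary is on your PATH:

```shell
# On macOS (Homebrew)
brew install ffmpeg

# Confirm the installation worked
ffmpeg -version
```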

Run

You can run the script from the command line using the following command:

python subtitle.py <filepath | video_url> [--model <modelname>]

Replace <filepath | video_url> with the path to a local video file or a video URL. The --model argument is optional; if omitted, the script defaults to the base model.

For example:

python subtitle.py /path/to/your/video.mp4 --model base

This runs the script on the video at /path/to/your/video.mp4 using the base model.
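The script's internals aren't reproduced here, but the two steps it drives (extracting audio with FFmpeg, then transcribing with the whisper binary) can be sketched as command builders. The helper names and the exact flags chosen are illustrative assumptions, not subtitle.py's actual API:

```python
from pathlib import Path

# Hypothetical helpers sketching the pipeline subtitle.py drives:
# audio extraction with ffmpeg, then transcription with the whisper binary.

def build_ffmpeg_cmd(video_path: str) -> list[str]:
    """ffmpeg command extracting 16 kHz mono 16-bit PCM audio,
    the input format the whisper binary expects."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    return ["ffmpeg", "-y", "-i", video_path,
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav_path]

def build_whisper_cmd(wav_path: str, model: str = "base") -> list[str]:
    """whisper command that writes subtitles next to the audio file."""
    return ["./whisper", "-m", f"models/ggml-{model}.bin",
            "-f", wav_path, "--output-srt"]
```

Each list can be handed to subprocess.run(..., check=True) to execute the corresponding step.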

Models

Here are the models you can use. Note: use the .en models only when the video is in English.

  • tiny.en
  • tiny
  • tiny-q5_1
  • tiny.en-q5_1
  • base.en
  • base
  • base-q5_1
  • base.en-q5_1
  • small.en
  • small.en-tdrz
  • small
  • small-q5_1
  • small.en-q5_1
  • medium
  • medium.en
  • medium-q5_0
  • medium.en-q5_0
  • large-v1
  • large-v2
  • large
  • large-q5_0
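A small helper can validate a requested model name against the list above before running anything. The ggml-<name>.bin file-naming convention follows whisper.cpp's model files; the helper itself is an illustrative assumption, not part of this project:

```python
# Model names accepted by the whisper binary (from the list above).
MODELS = {
    "tiny.en", "tiny", "tiny-q5_1", "tiny.en-q5_1",
    "base.en", "base", "base-q5_1", "base.en-q5_1",
    "small.en", "small.en-tdrz", "small", "small-q5_1", "small.en-q5_1",
    "medium", "medium.en", "medium-q5_0", "medium.en-q5_0",
    "large-v1", "large-v2", "large", "large-q5_0",
}

def model_filename(name: str) -> str:
    """Map a model name to its ggml weight file (whisper.cpp convention)."""
    if name not in MODELS:
        raise ValueError(f"unknown model: {name!r}")
    return f"ggml-{name}.bin"
```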

Advanced

You can fine-tune the behaviour by passing these options to the whisper binary directly:

./whisper [options] file0.wav file1.wav ...

Options

Here are the options you can use with the whisper binary:

| Option | Default | Description |
|---|---|---|
| -h, --help | | Show help message and exit |
| -t N, --threads N | 4 | Number of threads to use during computation |
| -p N, --processors N | 1 | Number of processors to use during computation |
| -ot N, --offset-t N | 0 | Time offset in milliseconds |
| -on N, --offset-n N | 0 | Segment index offset |
| -d N, --duration N | 0 | Duration of audio to process in milliseconds |
| -mc N, --max-context N | -1 | Maximum number of text context tokens to store |
| -ml N, --max-len N | 0 | Maximum segment length in characters |
| -sow, --split-on-word | false | Split on word rather than on token |
| -bo N, --best-of N | 2 | Number of best candidates to keep |
| -bs N, --beam-size N | -1 | Beam size for beam search |
| -wt N, --word-thold N | 0.01 | Word timestamp probability threshold |
| -et N, --entropy-thold N | 2.40 | Entropy threshold for decoder fail |
| -lpt N, --logprob-thold N | -1.00 | Log probability threshold for decoder fail |
| -debug, --debug-mode | false | Enable debug mode (e.g. dump log_mel) |
| -tr, --translate | false | Translate from source language to English |
| -di, --diarize | false | Stereo audio diarization |
| -tdrz, --tinydiarize | false | Enable tinydiarize (requires a tdrz model) |
| -nf, --no-fallback | false | Do not use temperature fallback while decoding |
| -otxt, --output-txt | true | Output result in a text file |
| -ovtt, --output-vtt | false | Output result in a vtt file |
| -osrt, --output-srt | false | Output result in an srt file |
| -olrc, --output-lrc | false | Output result in an lrc file |
| -owts, --output-words | false | Output script for generating karaoke video |
| -fp, --font-path | /System/Library/Fonts/Supplemental/Courier New Bold.ttf | Path to a monospace font for karaoke video |
| -ocsv, --output-csv | false | Output result in a CSV file |
| -oj, --output-json | false | Output result in a JSON file |
| -ojf, --output-json-full | false | Include more information in the JSON file |
| -of FNAME, --output-file FNAME | | Output file path (without file extension) |
| -ps, --print-special | false | Print special tokens |
| -pc, --print-colors | false | Print colors |
| -pp, --print-progress | false | Print progress |
| -nt, --no-timestamps | false | Do not print timestamps |
| -l LANG, --language LANG | en | Spoken language ('auto' for auto-detect) |
| -dl, --detect-language | false | Exit after automatically detecting language |
| --prompt PROMPT | | Initial prompt |
| -m FNAME, --model FNAME | models/ggml-base.en.bin | Model path |
| -f FNAME, --file FNAME | | Input WAV file path |
| -oved D, --ov-e-device DNAME | CPU | The OpenVINO device used for encode inference |
| -ls, --log-score | false | Log best decoder scores of tokens |
| -ng, --no-gpu | false | Disable GPU |

Example for running Binary

Here's an example of how to use the whisper binary:

./whisper -m models/ggml-tiny.en.bin -f Rev.mp3 out.wav -nt --output-vtt
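Builds of the whisper binary typically read 16-bit WAV input, so if an mp3 file like the one in the example fails to load, a common workaround is converting it with FFmpeg first. The filenames here are placeholders:

```shell
# Resample to 16 kHz mono 16-bit PCM WAV, the format whisper.cpp expects
ffmpeg -i Rev.mp3 -ar 16000 -ac 1 -c:a pcm_s16le Rev.wav

# Then transcribe the converted file
./whisper -m models/ggml-tiny.en.bin -f Rev.wav -nt --output-vtt
```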

License

MIT

Reference & Credits

Authors

🚀 About Me

Just trying to be a Developer!

Support

For support, email vedgupta@protonmail.com
