Commit

feat: update transcription engine (#16)
* feat: update transcription engine

* fix engine values
jveldboom authored Jul 6, 2023
1 parent dead6cc commit 25d7a85
Showing 6 changed files with 37 additions and 14 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -11,7 +11,7 @@ RUN apt-get install -y ffmpeg nodejs=18.* \
     && rm -rf /var/lib/apt/lists/*

 # install python dependencies
-RUN pip install -U openai-whisper
+RUN pip install -U openai-whisper whisper-ctranslate2

 # add source files
 COPY src /app
5 changes: 3 additions & 2 deletions Makefile
@@ -32,7 +32,7 @@ test-clean:
 whisper:
 	docker run -it --rm \
 		-v ${DIR}:/app \
-		-v ${DIR}/.whisper:/root/.cache/whisper \
+		-v ${DIR}/.whisper:/app/.whisper \
 		${IMAGE_TAG} \
 		whisper ${VIDEO_FILE} \
 		--model ${MODEL} \
@@ -43,11 +43,12 @@ whisper:
 node:
 	docker run -it --rm \
 		-v ${DIR}:/app \
-		-v ${DIR}/.whisper:/root/.cache/whisper \
+		-v ${DIR}/.whisper:/app/.whisper \
 		${IMAGE_TAG} \
 		node /app/src/cut-video.js -t test -v test.mp4

 bash:
 	docker run -it --rm \
 		-v ${DIR}:/data \
+		-v ${DIR}/.whisper:/app/.whisper \
 		${IMAGE_TAG} /bin/bash
7 changes: 4 additions & 3 deletions README.md
@@ -35,6 +35,7 @@ docker run --rm -it \
 - `--input` - path to video file
 - `--model` - whisper model name - `tiny`, `tiny.en` (default), `base`, `base.en`, `small`, `small.en`, `medium`, `medium.sm`, `large`. View [official docs](https://github.com/openai/whisper#available-models-and-languages) for break down of model size and performance
 - `--language` - language code. Typically improves in transcription to set language instead of allowing Whisper to auto-detect.
+- `--engine` - transcription engine `whisper-ctranslate2` (default) or `whisper`. `whisper` is likely to be removed in the near future if/when `whisper-ctranslate2` proves to be just as good but 4x faster

 ### Known Issues
 - `Error: Command "whisper" exited with code null` - this is likely caused by the container needing more allocated memory. Allocating at least 4 GB memory for the `small.en` usually resolved the issue but your mileage may vary.
@@ -47,7 +48,7 @@ Allows you to manually create a list of timestamps to cut the video.

 Usage:
 ```shell
-docker run --rm -it -v $(pwd):/data video-swear-jar \
+docker run --rm -it -v $(pwd):/data video-swear-jar:v1 \
   cut-video --timestamp timestamps.txt --video video.mkv
 ```

@@ -60,7 +61,7 @@ This is the `whisper` CLI if you need to further customize the command. Visit ht

 Usage:
 ```shell
-docker run --rm -it -v $(pwd):/data video-swear-jar \
+docker run --rm -it -v $(pwd):/data video-swear-jar:v1 \
   whisper my-video.mp4 \
   --model tiny.en \
   --language en \
@@ -71,7 +72,7 @@ docker run --rm -it -v $(pwd):/data video-swear-jar \
 ### ffmpeg
 Usage:
 ```shell
-docker run --rm -it -v $(pwd):/data video-swear-jar \
+docker run --rm -it -v $(pwd):/data video-swear-jar:v1 \
   ffmpeg -i input.mp4 output.avi
 ```

7 changes: 6 additions & 1 deletion docs/notes.md
@@ -7,4 +7,9 @@ ffmpeg \
   -i input.mkv \
   -vcodec copy \
   -acodec copy output.mkv
-```
+```
+
+## Whisper Research
+- [Making OpenAI Whisper faster](https://nikolas.blog/making-openai-whisper-faster/)
+- https://github.com/guillaumekln/faster-whisper - Faster Whisper transcription with CTranslate2
+- https://github.com/Softcatala/whisper-ctranslate2 - Whisper command line client compatible with original OpenAI client based on CTranslate2
12 changes: 9 additions & 3 deletions src/clean.js
@@ -4,18 +4,24 @@ const log = require('./log')
 const utils = require('./utils')
 const video = require('./video')

-const argv = yargs.usage('clean')
+const argv = yargs.usage('clean-fast')
   .options({
     input: {
       description: 'Input video filename',
       demandOption: true,
       alias: 'i'
     },
+    engine: {
+      description: 'Transcription engine',
+      alias: 'e',
+      default: 'whisper-ctranslate2',
+      choices: ['whisper', 'whisper-ctranslate2']
+    },
     model: {
       description: 'Whisper model name',
       alias: 'm',
       default: 'tiny.en',
-      choices: ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small']
+      choices: ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large']
     },
     language: {
       description: 'Video file language',
@@ -41,7 +47,7 @@ const run = async () => {
   try {
     log.info('[1 of 4] Starting video transcribe...')
     const { model, language } = argv
-    await video.transcribe({ inputFile: paths.inputFile, model, language, outputDir: argv['output-dir'] })
+    await video.transcribe({ engine: argv.engine, inputFile: paths.inputFile, model, language, outputDir: argv['output-dir'] })
   } catch (err) {
     log.error(`Unable to transcribe ${paths.inputFile}`, err)
     throw err
18 changes: 14 additions & 4 deletions src/video.js
@@ -2,17 +2,27 @@ const fs = require('fs')
 const swearWords = require('./swear-words.json')
 const utils = require('./utils')

-const transcribe = async ({ inputFile, model = 'tiny.en', language = 'en', outputDir = '.' }) => {
+const transcribe = async ({ engine = 'whisper-ctranslate2', inputFile, model = 'tiny.en', language = 'en', outputDir = '.' }) => {
   const args = [
     inputFile,
     '--model', model,
     '--model_dir', '/app/.whisper',
     '--language', language,
     '--output_format', 'json',
-    '--output_dir', outputDir,
-    '--fp16', 'False' // TODO: make CLI argument to use GPU
+    '--output_dir', outputDir
   ]
-  await utils.asyncSpawn('whisper', args)
+
+  // engine specific args
+  switch (engine) {
+    case 'whisper':
+      args.push('--fp16', 'False')
+      break
+    case 'whisper-ctranslate2':
+      args.push('--compute_type', 'int8')
+      break
+  }
+
+  await utils.asyncSpawn(engine, args)
 }

 const cut = async ({ cutFile, outputFile }) => {

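Note on the `src/video.js` change: the engine-specific branching in `transcribe` could be separated from the process spawn so it can be checked in isolation. The sketch below is a hypothetical refactor, not code from this commit. `buildTranscribeArgs` is an illustrative name, and throwing on an unknown engine is an added assumption (the committed `switch` has no `default` case, so any string reaching `transcribe` directly would be handed to `utils.asyncSpawn`). The flags themselves are the ones shown in the diff.

```javascript
// Hypothetical helper (not part of the commit): builds the CLI argument list
// that transcribe() passes to the selected engine.
const buildTranscribeArgs = ({
  engine = 'whisper-ctranslate2',
  inputFile,
  model = 'tiny.en',
  language = 'en',
  outputDir = '.'
}) => {
  // arguments shared by both engines, copied from the diff
  const args = [
    inputFile,
    '--model', model,
    '--model_dir', '/app/.whisper',
    '--language', language,
    '--output_format', 'json',
    '--output_dir', outputDir
  ]

  // engine specific args
  switch (engine) {
    case 'whisper':
      args.push('--fp16', 'False')
      break
    case 'whisper-ctranslate2':
      args.push('--compute_type', 'int8')
      break
    default:
      // assumption: fail fast instead of spawning an unknown command
      throw new Error(`Unsupported engine: ${engine}`)
  }

  return args
}

// print the argument list for the default engine
console.log(buildTranscribeArgs({ inputFile: 'video.mkv' }).join(' '))
```

Under this sketch, `transcribe` would reduce to something like `await utils.asyncSpawn(engine, buildTranscribeArgs({ engine, inputFile, model, language, outputDir }))`. The yargs `choices` list in `src/clean.js` already rejects unknown engines at the CLI, so the `default` branch would only matter for callers that invoke `transcribe` directly.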