upgrade to whisper.cpp version v1.5.4 #25

Merged · 31 commits · Jan 27, 2024
Commits
4d10247
copy whisper.cpp version v1.5.4 to src, update license with the lates…
jwijffels Jan 25, 2024
d0a50d8
copy common
jwijffels Jan 25, 2024
a368df9
copy main
jwijffels Jan 25, 2024
fce6df4
makevars
jwijffels Jan 25, 2024
008ff4b
disable benchmark, change to whisper_init_from_file_with_params
jwijffels Jan 25, 2024
e8c3502
copy models folder
jwijffels Jan 25, 2024
ed53d50
loosen unit test on punctuation symbols - test only on lowercased let…
jwijffels Jan 25, 2024
1b629c4
loosen unit test on punctuation symbols - test only on lowercased let…
jwijffels Jan 25, 2024
c72fb0d
loosen unit test on punctuation symbols - test only on lowercased let…
jwijffels Jan 25, 2024
df1ba1f
put benchmark back
jwijffels Jan 25, 2024
4a4938a
makevars
jwijffels Jan 25, 2024
f1d6cc1
makevars
jwijffels Jan 25, 2024
60ab48b
put benchmark back
jwijffels Jan 25, 2024
8eb7878
makevars
jwijffels Jan 26, 2024
6f65aed
makevars
jwijffels Jan 26, 2024
3d062da
makevars
jwijffels Jan 26, 2024
c6a699b
printf > Rprintf
jwijffels Jan 26, 2024
2badbf0
callback as it was printing the transcription with timestamps
jwijffels Jan 26, 2024
a19a344
callback on progress with Rprintf
jwijffels Jan 26, 2024
ac0d68d
makevars
jwijffels Jan 27, 2024
77547d0
exit/abort/fprintf/printf/fflush/rand
jwijffels Jan 27, 2024
11eaa09
makevars
jwijffels Jan 27, 2024
17c005a
makevars
jwijffels Jan 27, 2024
203472f
Change download location to models from whisper.cpp 1.5.4 and have a …
jwijffels Jan 27, 2024
ca20977
exit/abort/fprintf/printf/fflush/rand - we are using Rprintf so that …
jwijffels Jan 27, 2024
33523be
makevars
jwijffels Jan 27, 2024
4d797f6
remove notes
jwijffels Jan 27, 2024
b404f43
include ggml-tiny.en-q5_1.bin (30MB) for testing purposes
jwijffels Jan 27, 2024
95eefa5
docs
jwijffels Jan 27, 2024
123bd66
README
jwijffels Jan 27, 2024
b84cce8
refer to models large-v1, large-v2, large-v3 instead of the large model
jwijffels Jan 27, 2024
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: audio.whisper
Type: Package
Title: Transcribe Audio Files using the "Whisper" Automatic Speech Recognition Model
-Version: 0.2.2
+Version: 0.3
Maintainer: Jan Wijffels <jwijffels@bnosac.be>
Authors@R: c(
    person('Jan', 'Wijffels', role = c('aut', 'cre', 'cph'), email = 'jwijffels@bnosac.be', comment = "R wrapper"),
2 changes: 1 addition & 1 deletion LICENSE.note
@@ -1,6 +1,6 @@
MIT License

-Copyright (c) 2022 Jan Wijffels, BNOSAC, Georgi Gerganov and David Reid
+Copyright (c) 2023 Jan Wijffels, BNOSAC, Georgi Gerganov and David Reid

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,3 +1,8 @@
+## CHANGES IN audio.whisper VERSION 0.3
+
+- Upgrade to whisper.cpp version v1.5.4
+- whisper_download_model now allows downloading 'large-v1', 'large-v2' and 'large-v3'; model 'large' should no longer be used
+
## CHANGES IN audio.whisper VERSION 0.2.2

- Add option to pass on float entropy_thold (similar to compression_ratio_threshold), logprob_thold, beam_size, best_of, split_on_word, max_context when doing the prediction
26 changes: 19 additions & 7 deletions R/whisper.R
@@ -25,6 +25,13 @@
#' trans <- predict(model, newdata = audio, language = "en")
#' trans <- predict(model, newdata = audio, language = "en", token_timestamps = TRUE)
#' }
+#'
+#' ## Predict using a quantised model
+#' audio <- system.file(package = "audio.whisper", "samples", "jfk.wav")
+#' path <- system.file(package = "audio.whisper", "repo", "ggml-tiny.en-q5_1.bin")
+#' model <- whisper(path)
+#' trans <- predict(model, newdata = audio, language = "en")
+#' trans <- predict(model, newdata = audio, language = "en", token_timestamps = TRUE)
predict.whisper <- function(object, newdata, language = "auto", trim = FALSE, ...){
  stopifnot(length(newdata) == 1)
  stopifnot(file.exists(newdata))
@@ -72,7 +79,7 @@ predict.whisper <- function(object, newdata, language = "auto", trim = FALSE, ..
#' model <- whisper("medium")
#' trans <- predict(model, newdata = system.file(package = "audio.whisper", "samples", "jfk.wav"))
#' trans
-#' model <- whisper("large")
+#' model <- whisper("large-v1")
#' trans <- predict(model, newdata = system.file(package = "audio.whisper", "samples", "jfk.wav"))
#' trans
#'
@@ -94,7 +101,7 @@ predict.whisper <- function(object, newdata, language = "auto", trim = FALSE, ..
#' language = "en", duration = 1000)
#' }
whisper <- function(x, overwrite = FALSE, model_dir = getwd(), ...){
-  if(x %in% c("tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large")){
+  if(x %in% c("tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3", "large")){
    x <- whisper_download_model(x, overwrite = overwrite, model_dir = model_dir)
  }
  if(inherits(x, "whisper_download")){
@@ -114,7 +121,7 @@ whisper <- function(x, overwrite = FALSE, model_dir = getwd(), ...){
#' \item{base & base.en: 142 MB, RAM required: ~500 MB. Multilingual and English only version.}
#' \item{small & small.en: 466 MB, RAM required: ~1.0 GB. Multilingual and English only version.}
#' \item{medium & medium.en: 1.5 GB, RAM required: ~2.6 GB. Multilingual and English only version.}
-#' \item{large-v1 & large: 2.9 GB, RAM required: ~4.7 GB. Multilingual version 1 and version 2}
+#' \item{large-v1, large-v2, large-v3: 2.9 GB, RAM required: ~4.7 GB. Multilingual}
#' }
#' @param x the name of the model
#' @param model_dir a path where the model will be downloaded to. Defaults to the current working directory
@@ -149,26 +156,31 @@ whisper <- function(x, overwrite = FALSE, model_dir = getwd(), ...){
#' whisper_download_model("medium")
#' whisper_download_model("medium.en")
#' whisper_download_model("large-v1")
-#' whisper_download_model("large")
+#' whisper_download_model("large-v2")
+#' whisper_download_model("large-v3")
#' }
#' \dontshow{
#' if(file.exists(path$file_model)) file.remove(path$file_model)
#' }
-whisper_download_model <- function(x = c("tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large"),
+whisper_download_model <- function(x = c("tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3", "large"),
                                   model_dir = getwd(),
                                   repos = c("huggingface", "ggerganov"),
-                                  version = "1.2.1",
+                                  version = c("1.5.4", "1.2.1"),
                                   overwrite = TRUE,
                                   ...){
+  version <- match.arg(version)
-  x <- match.arg(x)
+  if(!"force" %in% names(list(...))){
+    x <- match.arg(x)
+  }
  repos <- match.arg(repos)
  if(repos == "huggingface"){
    f <- sprintf("ggml-%s.bin", x)
-    url <- sprintf("https://huggingface.co/datasets/ggerganov/whisper.cpp/resolve/main/%s", f)
+    url <- sprintf("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/%s", f)
+    if(version == "1.2.1"){
+      url <- sprintf("https://huggingface.co/ggerganov/whisper.cpp/resolve/80da2d8bfee42b0e836fc3a9890373e5defc00a6/%s", f)
+    }else if(version == "1.5.4"){
+      url <- sprintf("https://huggingface.co/ggerganov/whisper.cpp/resolve/d15393806e24a74f60827e23e986f0c10750b358/%s", f)
+    }
  }else if(repos == "ggerganov"){
    .Deprecated(msg = "whisper_download_model with argument repos = 'ggerganov' is deprecated as that resource might become unavailable for certain models, please use repos = 'huggingface'")
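To illustrate the new download options concretely, here is a minimal sketch. It assumes only the updated `whisper_download_model()` signature shown in this hunk; `tempdir()` is just an example download location:

```r
library(audio.whisper)

# download the ggml file for one of the newly supported models;
# version = "1.5.4" is the new default, spelled out here for clarity
dl <- whisper_download_model("large-v3", model_dir = tempdir(), version = "1.5.4")
dl$file_model  # path to the downloaded model file

# load the model and transcribe the sample shipped with the package
model <- whisper(dl)
trans <- predict(model,
                 newdata  = system.file(package = "audio.whisper", "samples", "jfk.wav"),
                 language = "en")
```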
45 changes: 31 additions & 14 deletions README.md
@@ -9,17 +9,23 @@ This repository contains an R package which is an Rcpp wrapper around the [whisp

## Available models

-| Model | Language | Size | RAM needed |
-|:-----------------------|:---------------------------:|-------:|-----------:|
-| `tiny` & `tiny.en` | Multilingual & English only | 75 MB | 390 MB |
-| `base` & `base.en` | Multilingual & English only | 142 MB | 500 MB |
-| `small` & `small.en` | Multilingual & English only | 466 MB | 1.0 GB |
-| `medium` & `medium.en` | Multilingual & English only | 1.5 GB | 2.6 GB |
-| `large-v1` & `large` | Multilingual | 2.9 GB | 4.7 GB |
+| Model | Language | Size | RAM needed | Comment |
+|:-----------------------|:---------------------------:|-------:|-----------:|-----------------------------:|
+| `tiny` & `tiny.en` | Multilingual & English only | 75 MB | 390 MB | audio.whisper >=0.3 & 0.2.2 |
+| `base` & `base.en` | Multilingual & English only | 142 MB | 500 MB | audio.whisper >=0.3 & 0.2.2 |
+| `small` & `small.en` | Multilingual & English only | 466 MB | 1.0 GB | audio.whisper >=0.3 & 0.2.2 |
+| `medium` & `medium.en` | Multilingual & English only | 1.5 GB | 2.6 GB | audio.whisper >=0.3 & 0.2.2 |
+| `large-v1` | Multilingual | 2.9 GB | 4.7 GB | audio.whisper >=0.3 & 0.2.2 |
+| `large-v2` | Multilingual | 2.9 GB | 4.7 GB | audio.whisper >=0.3 |
+| `large-v3` | Multilingual | 2.9 GB | 4.7 GB | audio.whisper >=0.3 |

### Installation

-For the *stable* version of this package: `remotes::install_github("bnosac/audio.whisper", ref = "0.2.2")` <br>
+For the *stable* version of this package:
+
+- `remotes::install_github("bnosac/audio.whisper", ref = "0.3")` (uses whisper.cpp version 1.5.4)
+- `remotes::install_github("bnosac/audio.whisper", ref = "0.2.2")` (uses whisper.cpp version 1.2.1)

See the documentation of the functions: `help(package = "audio.whisper")`

- For the *development* version of this package: `remotes::install_github("bnosac/audio.whisper")`
@@ -29,14 +35,19 @@ See the documentation of the functions: `help(package = "audio.whisper")`

**Load the model** either by providing the full path to the model or by specifying the shorthand, which will download the model
- see the help of `whisper_download_model` for a list of available models and how to download a model
+- you can always download a model manually from https://huggingface.co/ggerganov/whisper.cpp

```{r}
library(audio.whisper)
model <- whisper("tiny")
model <- whisper("base")
model <- whisper("small")
model <- whisper("medium")
-model <- whisper("large")
+model <- whisper("large-v1")
+model <- whisper("large-v2")
+model <- whisper("large-v3")
+path <- system.file(package = "audio.whisper", "repo", "ggml-tiny.en-q5_1.bin")
+model <- whisper(path)
```

**Transcribe a `.wav` audio file**
@@ -225,22 +236,28 @@ The tensor operations contained in [ggml.h](src/whisper_cpp/ggml.h) / [ggml.c](s

- It has AVX intrinsics support for x86 architectures, VSX intrinsics support for POWER architectures, mixed F16 / F32 precision, and for Apple silicon it allows optimisation via Arm Neon and the Accelerate framework
- In order to gain from these **massive transcription speedups**, you need to set the correct C compilation flags when you install the R package, *otherwise transcription speed will be suboptimal*.
-- You can set these compilation C flags as follows right before you install the package such that [/src/Makevars](/src/Makevars) knows you want these optimisations
+- Normally, using the installation as described above, some of these compilation flags are detected and you'll see them printed during the installation
+- It is nevertheless advised to set these C compilation flags yourself right before you install the package, so that [/src/Makevars](/src/Makevars) knows for sure that you want these optimisations. This can be done by defining the environment variables `WHISPER_CFLAGS`, `WHISPER_CPPFLAGS` and `WHISPER_LIBS` as follows.

```
Sys.setenv(WHISPER_CFLAGS = "-mavx -mavx2 -mfma -mf16c")
remotes::install_github("bnosac/audio.whisper", ref = "0.2.2", force = TRUE)
remotes::install_github("bnosac/audio.whisper", ref = "0.3", force = TRUE)
Sys.unsetenv("WHISPER_CFLAGS")
```

-To find out which hardware accelleration options your hardware supports, you can go to https://github.com/bnosac/audio.whisper/issues/15 and look for the CFLAGS (and optionally CXXFLAGS) settings which make sense on your hardware
+To find out which hardware acceleration options your hardware supports, you can go to https://github.com/bnosac/audio.whisper/issues/26 and look for the CFLAGS (and optionally CXXFLAGS and LDFLAGS) settings which make sense on your hardware

-- Common settings for Mac/Linux/Windows are `-mavx -mavx2 -mfma -mf16c` and extra possible flags for Linux: `-msse3`, PowerPC `-mpower9-vector`, Mac M1 `-DGGML_USE_ACCELERATE`. E.g. on my local Windows machine I could set `-mavx -mavx2 -mfma -mf16c`, on my older local Ubuntu machine there were no optimisation possibilities. Your mileage may vary.
+- Common settings for `WHISPER_CFLAGS` on Mac/Linux/Windows are `-mavx -mavx2 -mfma -mf16c`, with `-msse3` and `-mssse3` as extra possible flags
+- E.g. on my local Windows machine I could set `-mavx -mavx2 -mfma -mf16c`
+- Mac users can set `Sys.setenv(WHISPER_ACCELERATE = "1")` if their computer has the Accelerate framework (see the sketch below)
+- On my older local Ubuntu machine there were no optimisation possibilities. Your mileage may vary.
- If you need extra settings in `PKG_CPPFLAGS` (`CXXFLAGS`), you can e.g. use `Sys.setenv(WHISPER_CPPFLAGS = "-mcpu=native")` before installing the package
-- If you need custom settings, you can update `PKG_CFLAGS` / `PKG_CPPFLAGS` in [/src/Makevars](/src/Makevars) directly.
+- If you need extra settings in `PKG_LIBS`, you can e.g. use `Sys.setenv(WHISPER_LIBS = "-framework Accelerate")` before installing the package
+- If you need custom settings, you can update `PKG_CFLAGS` / `PKG_CPPFLAGS` / `PKG_LIBS` in [/src/Makevars](/src/Makevars) directly.

Note that *if your hardware does not support these compilation flags, you'll get a crash* when transcribing audio.
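As a concrete sketch of the Mac route mentioned in the list above, assuming a machine where the Accelerate framework is available (skip these variables on other platforms):

```
# hedged sketch: only sensible on a Mac with the Accelerate framework
Sys.setenv(WHISPER_ACCELERATE = "1")
Sys.setenv(WHISPER_LIBS = "-framework Accelerate")
remotes::install_github("bnosac/audio.whisper", ref = "0.3", force = TRUE)
Sys.unsetenv(c("WHISPER_ACCELERATE", "WHISPER_LIBS"))
```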


-----

## Support in text mining
28 changes: 0 additions & 28 deletions inst/NOTES

This file was deleted.

77 changes: 57 additions & 20 deletions inst/models/README.md
@@ -1,15 +1,17 @@
## Whisper model files in custom ggml format

The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
-have been converted to custom `ggml` format in order to be able to load them in C/C++. The conversion has been performed
-using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script. You can either obtain the original models and generate
-the `ggml` files yourself using the conversion script, or you can use the [download-ggml-model.sh](download-ggml-model.sh)
-script to download the already converted models. Currently, they are hosted on the following locations:
+are converted to custom `ggml` format in order to be able to load them in C/C++.
+Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.

-- https://huggingface.co/datasets/ggerganov/whisper.cpp
+You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
+or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
+Currently, they are hosted at the following locations:

+- https://huggingface.co/ggerganov/whisper.cpp
- https://ggml.ggerganov.com

-Sample usage:
+Sample download:

```bash
$ ./download-ggml-model.sh base.en
@@ -21,24 +23,35 @@ You can now use it like this:
$ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
```

+To convert the files yourself, use the convert-pt-to-ggml.py script. Here is an example usage.
+The original PyTorch files are assumed to have been downloaded into ~/.cache/whisper.
+Change `~/path/to/repo/whisper/` to the location of your copy of the Whisper source:
+```
+mkdir models/whisper-medium
+python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
+mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
+rmdir models/whisper-medium
+```

A third option to obtain the model files is to download them from Hugging Face:

-https://huggingface.co/datasets/ggerganov/whisper.cpp/tree/main
+https://huggingface.co/ggerganov/whisper.cpp/tree/main
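From R, such a manual download could look as follows; a sketch assuming the `resolve/main` URL pattern used elsewhere in this pull request and a destination file name of your choosing:

```
# download one converted model straight from Hugging Face (base R only)
url <- "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin"
download.file(url, destfile = "ggml-base.en.bin", mode = "wb")
```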

## Available models

-| Model | Disk | Mem | SHA |
-| --- | --- | --- | --- |
-| tiny | 75 MB | ~390 MB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
-| tiny.en | 75 MB | ~390 MB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
-| base | 142 MB | ~500 MB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
-| base.en | 142 MB | ~500 MB | `137c40403d78fd54d454da0f9bd998f78703390c` |
-| small | 466 MB | ~1.0 GB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
-| small.en | 466 MB | ~1.0 GB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
-| medium | 1.5 GB | ~2.6 GB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
-| medium.en | 1.5 GB | ~2.6 GB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
-| large-v1 | 2.9 GB | ~4.7 GB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
-| large | 2.9 GB | ~4.7 GB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+| Model | Disk | SHA |
+| --- | --- | --- |
+| tiny | 75 MiB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
+| tiny.en | 75 MiB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
+| base | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
+| base.en | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
+| small | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
+| small.en | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
+| medium | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
+| medium.en | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
+| large-v1 | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
+| large-v2 | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+| large-v3 | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
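The SHA column doubles as an integrity check after downloading. A small sketch, assuming the `digest` R package and a `ggml-base.en.bin` in the working directory:

```
# compare the SHA-1 of the downloaded file against the table above
sha <- digest::digest("ggml-base.en.bin", algo = "sha1", file = TRUE)
identical(sha, "137c40403d78fd54d454da0f9bd998f78703390c")
```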

## Model files for testing purposes

@@ -58,8 +71,32 @@ git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp

# clone HF fine-tuned model (this is just an example)
-git clone https://huggingface.co/openai/whisper-base.en
+git clone https://huggingface.co/openai/whisper-medium

# convert the model to ggml
python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
```

+## Distilled models
+
+Initial support for https://huggingface.co/distil-whisper is available.
+
+Currently, the chunk-based transcription strategy is not implemented, so there can be sub-optimal quality when using the distilled models with `whisper.cpp`.
+
+```bash
+# clone OpenAI whisper and whisper.cpp
+git clone https://github.com/openai/whisper
+git clone https://github.com/ggerganov/whisper.cpp
+
+# get the models
+cd whisper.cpp/models
+git clone https://huggingface.co/distil-whisper/distil-medium.en
+git clone https://huggingface.co/distil-whisper/distil-large-v2
+
+# convert to ggml
+python3 ./convert-h5-to-ggml.py ./distil-medium.en/ ../../whisper .
+mv ggml-model.bin ggml-medium.en-distil.bin
+
+python3 ./convert-h5-to-ggml.py ./distil-large-v2/ ../../whisper .
+mv ggml-model.bin ggml-large-v2-distil.bin
+```
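Such a converted file could then, in principle, be loaded in this R package by passing its full path; a sketch assuming the conversion above produced `ggml-medium.en-distil.bin`:

```r
library(audio.whisper)
# load the converted distilled model by its full path (hypothetical file)
model <- whisper("whisper.cpp/models/ggml-medium.en-distil.bin")
trans <- predict(model,
                 newdata  = system.file(package = "audio.whisper", "samples", "jfk.wav"),
                 language = "en")
```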