Commit 28d365c

copy models folder
1 parent f7d5792 commit 28d365c

13 files changed (+915, -92 lines)

inst/models/README.md (+57, -20)

````diff
@@ -1,15 +1,17 @@
 ## Whisper model files in custom ggml format
 
 The [original Whisper PyTorch models provided by OpenAI](https://github.com/openai/whisper/blob/main/whisper/__init__.py#L17-L27)
-have been converted to custom `ggml` format in order to be able to load them in C/C++. The conversion has been performed
-using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script. You can either obtain the original models and generate
-the `ggml` files yourself using the conversion script, or you can use the [download-ggml-model.sh](download-ggml-model.sh)
-script to download the already converted models. Currently, they are hosted on the following locations:
+are converted to custom `ggml` format in order to be able to load them in C/C++.
+Conversion is performed using the [convert-pt-to-ggml.py](convert-pt-to-ggml.py) script.
 
-- https://huggingface.co/datasets/ggerganov/whisper.cpp
+You can either obtain the original models and generate the `ggml` files yourself using the conversion script,
+or you can use the [download-ggml-model.sh](download-ggml-model.sh) script to download the already converted models.
+Currently, they are hosted at the following locations:
+
+- https://huggingface.co/ggerganov/whisper.cpp
 - https://ggml.ggerganov.com
 
-Sample usage:
+Sample download:
 
 ```java
 $ ./download-ggml-model.sh base.en
@@ -21,24 +23,35 @@ You can now use it like this:
 $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav
 ```
 
+To convert the files yourself, use the convert-pt-to-ggml.py script. Here is an example usage.
+The original PyTorch files are assumed to have been downloaded into ~/.cache/whisper.
+Change `~/path/to/repo/whisper/` to the location of your copy of the Whisper source:
+```
+mkdir models/whisper-medium
+python models/convert-pt-to-ggml.py ~/.cache/whisper/medium.pt ~/path/to/repo/whisper/ ./models/whisper-medium
+mv ./models/whisper-medium/ggml-model.bin models/ggml-medium.bin
+rmdir models/whisper-medium
+```
+
 A third option to obtain the model files is to download them from Hugging Face:
 
-https://huggingface.co/datasets/ggerganov/whisper.cpp/tree/main
+https://huggingface.co/ggerganov/whisper.cpp/tree/main
 
 ## Available models
 
-| Model | Disk | Mem | SHA |
-| --- | --- | --- | --- |
-| tiny | 75 MB | ~390 MB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
-| tiny.en | 75 MB | ~390 MB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
-| base | 142 MB | ~500 MB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
-| base.en | 142 MB | ~500 MB | `137c40403d78fd54d454da0f9bd998f78703390c` |
-| small | 466 MB | ~1.0 GB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
-| small.en | 466 MB | ~1.0 GB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
-| medium | 1.5 GB | ~2.6 GB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
-| medium.en | 1.5 GB | ~2.6 GB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
-| large-v1 | 2.9 GB | ~4.7 GB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
-| large | 2.9 GB | ~4.7 GB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+| Model | Disk | SHA |
+| --- | --- | --- |
+| tiny | 75 MiB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
+| tiny.en | 75 MiB | `c78c86eb1a8faa21b369bcd33207cc90d64ae9df` |
+| base | 142 MiB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
+| base.en | 142 MiB | `137c40403d78fd54d454da0f9bd998f78703390c` |
+| small | 466 MiB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
+| small.en | 466 MiB | `db8a495a91d927739e50b3fc1cc4c6b8f6c2d022` |
+| medium | 1.5 GiB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
+| medium.en | 1.5 GiB | `8c30f0e44ce9560643ebd10bbe50cd20eafd3723` |
+| large-v1 | 2.9 GiB | `b1caaf735c4cc1429223d5a74f0f4d0b9b59a299` |
+| large-v2 | 2.9 GiB | `0f4c8e34f21cf1a914c59d8b3ce882345ad349d6` |
+| large-v3 | 2.9 GiB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
 
 ## Model files for testing purposes
 
@@ -58,8 +71,32 @@ git clone https://github.com/openai/whisper
 git clone https://github.com/ggerganov/whisper.cpp
 
 # clone HF fine-tuned model (this is just an example)
-git clone https://huggingface.co/openai/whisper-base.en
+git clone https://huggingface.co/openai/whisper-medium
 
 # convert the model to ggml
 python3 ./whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium/ ./whisper .
 ```
+
+## Distilled models
+
+Initial support for https://huggingface.co/distil-whisper is available.
+
+Currently, the chunk-based transcription strategy is not implemented, so transcription quality can be sub-optimal when using the distilled models with `whisper.cpp`.
+
+```bash
+# clone OpenAI whisper and whisper.cpp
+git clone https://github.com/openai/whisper
+git clone https://github.com/ggerganov/whisper.cpp
+
+# get the models
+cd whisper.cpp/models
+git clone https://huggingface.co/distil-whisper/distil-medium.en
+git clone https://huggingface.co/distil-whisper/distil-large-v2
+
+# convert to ggml
+python3 ./convert-h5-to-ggml.py ./distil-medium.en/ ../../whisper .
+mv ggml-model.bin ggml-medium.en-distil.bin
+
+python3 ./convert-h5-to-ggml.py ./distil-large-v2/ ../../whisper .
+mv ggml-model.bin ggml-large-v2-distil.bin
+```
````
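The SHA column in the table above gives a 40-hex-digit checksum for each converted model. A minimal verification sketch, assuming those values are SHA-1 digests of the downloaded ggml files; the path and expected hash below are just the `base.en` row used as an illustration:

```python
import hashlib

def sha1_of_file(path: str) -> str:
    # stream in 1 MiB chunks so multi-GiB models don't need to fit in RAM
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "137c40403d78fd54d454da0f9bd998f78703390c"  # base.en row above
actual = sha1_of_file("models/ggml-base.en.bin")
print("OK" if actual == expected else f"hash mismatch: {actual}")
```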

inst/models/convert-h5-to-coreml.py (new file, +117 lines)

```python
import argparse
import importlib.util

# load convert-whisper-to-coreml.py as a module (the dashes in its filename prevent a normal import)
spec = importlib.util.spec_from_file_location('whisper_to_coreml', 'models/convert-whisper-to-coreml.py')
whisper_to_coreml = importlib.util.module_from_spec(spec)
spec.loader.exec_module(whisper_to_coreml)

from whisper import load_model

from copy import deepcopy
import torch
from transformers import WhisperForConditionalGeneration
from huggingface_hub import metadata_update

# https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
WHISPER_MAPPING = {
    "layers": "blocks",
    "fc1": "mlp.0",
    "fc2": "mlp.2",
    "final_layer_norm": "mlp_ln",
    ".self_attn.q_proj": ".attn.query",
    ".self_attn.k_proj": ".attn.key",
    ".self_attn.v_proj": ".attn.value",
    ".self_attn_layer_norm": ".attn_ln",
    ".self_attn.out_proj": ".attn.out",
    ".encoder_attn.q_proj": ".cross_attn.query",
    ".encoder_attn.k_proj": ".cross_attn.key",
    ".encoder_attn.v_proj": ".cross_attn.value",
    ".encoder_attn_layer_norm": ".cross_attn_ln",
    ".encoder_attn.out_proj": ".cross_attn.out",
    "decoder.layer_norm.": "decoder.ln.",
    "encoder.layer_norm.": "encoder.ln_post.",
    "embed_tokens": "token_embedding",
    "encoder.embed_positions.weight": "encoder.positional_embedding",
    "decoder.embed_positions.weight": "decoder.positional_embedding",
    "layer_norm": "ln_post",
}

# https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
def rename_keys(s_dict):
    keys = list(s_dict.keys())
    for key in keys:
        new_key = key
        for k, v in WHISPER_MAPPING.items():
            if k in key:
                new_key = new_key.replace(k, v)

        print(f"{key} -> {new_key}")

        s_dict[new_key] = s_dict.pop(key)
    return s_dict

# https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets/blob/main/src/multiple_datasets/hub_default_utils.py
def convert_hf_whisper(hf_model_name_or_path: str, whisper_state_path: str):
    transformer_model = WhisperForConditionalGeneration.from_pretrained(hf_model_name_or_path)
    config = transformer_model.config

    # first build dims
    dims = {
        'n_mels': config.num_mel_bins,
        'n_vocab': config.vocab_size,
        'n_audio_ctx': config.max_source_positions,
        'n_audio_state': config.d_model,
        'n_audio_head': config.encoder_attention_heads,
        'n_audio_layer': config.encoder_layers,
        'n_text_ctx': config.max_target_positions,
        'n_text_state': config.d_model,
        'n_text_head': config.decoder_attention_heads,
        'n_text_layer': config.decoder_layers
    }

    state_dict = deepcopy(transformer_model.model.state_dict())
    state_dict = rename_keys(state_dict)

    torch.save({"dims": dims, "model_state_dict": state_dict}, whisper_state_path)

# Ported from models/convert-whisper-to-coreml.py
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", type=str, help="name of model to convert (e.g. tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large-v1, large-v2, large-v3)", required=True)
    parser.add_argument("--model-path", type=str, help="path to the model (e.g. if published on HuggingFace: Oblivion208/whisper-tiny-cantonese)", required=True)
    # caveat: with type=bool, argparse treats any non-empty value (including "False") as True
    parser.add_argument("--encoder-only", type=bool, help="only convert encoder", default=False)
    parser.add_argument("--quantize", type=bool, help="quantize weights to F16", default=False)
    parser.add_argument("--optimize-ane", type=bool, help="optimize for ANE execution (currently broken)", default=False)
    args = parser.parse_args()

    if args.model_name not in ["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en", "large-v1", "large-v2", "large-v3"]:
        raise ValueError("Invalid model name")

    pt_target_path = f"models/hf-{args.model_name}.pt"
    convert_hf_whisper(args.model_path, pt_target_path)

    whisper = load_model(pt_target_path).cpu()
    hparams = whisper.dims
    print(hparams)

    if args.optimize_ane:
        whisperANE = whisper_to_coreml.WhisperANE(hparams).eval()
        whisperANE.load_state_dict(whisper.state_dict())

        encoder = whisperANE.encoder
        decoder = whisperANE.decoder
    else:
        encoder = whisper.encoder
        decoder = whisper.decoder

    # Convert encoder
    encoder = whisper_to_coreml.convert_encoder(hparams, encoder, quantize=args.quantize)
    encoder.save(f"models/coreml-encoder-{args.model_name}.mlpackage")

    if args.encoder_only is False:
        # Convert decoder
        decoder = whisper_to_coreml.convert_decoder(hparams, decoder, quantize=args.quantize)
        decoder.save(f"models/coreml-decoder-{args.model_name}.mlpackage")

    print("done converting")
```
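`rename_keys` above is a plain substring substitution over the Hugging Face state-dict keys. A minimal sketch of what it does to a single decoder weight name, assuming `rename_keys` and `WHISPER_MAPPING` from the script are in scope; the key and placeholder value are illustrative:

```python
# hypothetical HF-style key for a decoder self-attention query projection
demo = {"decoder.layers.0.self_attn.q_proj.weight": "<tensor placeholder>"}

# both "layers" -> "blocks" and ".self_attn.q_proj" -> ".attn.query" match,
# yielding the OpenAI-Whisper-style key that load_model() expects
renamed = rename_keys(demo)
assert "decoder.blocks.0.attn.query.weight" in renamed
```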

inst/models/convert-h5-to-ggml.py (+18, -22)

```diff
@@ -23,6 +23,7 @@
 import code
 import torch
 import numpy as np
+from pathlib import Path
 
 from transformers import WhisperForConditionalGeneration
 
@@ -56,7 +57,7 @@ def bytes_to_unicode():
     The reversible bpe codes work on unicode strings.
     This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
     When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
-    This is a signficant percentage of your normal, say, 32K bpe vocab.
+    This is a significant percentage of your normal, say, 32K bpe vocab.
     To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
     And avoids mapping to whitespace/control characters the bpe code barfs on.
     """
@@ -75,16 +76,13 @@ def bytes_to_unicode():
     print("Usage: convert-h5-to-ggml.py dir_model path-to-whisper-repo dir-output [use-f32]\n")
     sys.exit(1)
 
-dir_model = sys.argv[1]
-dir_whisper = sys.argv[2]
-dir_out = sys.argv[3]
+dir_model = Path(sys.argv[1])
+dir_whisper = Path(sys.argv[2])
+dir_out = Path(sys.argv[3])
 
-with open(dir_model + "/vocab.json", "r") as f:
-    encoder = json.load(f)
-with open(dir_model + "/added_tokens.json", "r") as f:
-    encoder_added = json.load(f)
-with open(dir_model + "/config.json", "r") as f:
-    hparams = json.load(f)
+encoder = json.load((dir_model / "vocab.json").open("r", encoding="utf8"))
+encoder_added = json.load((dir_model / "added_tokens.json").open("r", encoding="utf8"))
+hparams = json.load((dir_model / "config.json").open("r", encoding="utf8"))
 
 model = WhisperForConditionalGeneration.from_pretrained(dir_model)
 
@@ -96,16 +94,15 @@ def bytes_to_unicode():
 
 dir_tokenizer = dir_model
 
-fname_out = dir_out + "/ggml-model.bin"
+fname_out = dir_out / "ggml-model.bin"
 
-with open(dir_tokenizer + "/vocab.json", "r", encoding="utf8") as f:
-    tokens = json.load(f)
+tokens = json.load(open(dir_tokenizer / "vocab.json", "r", encoding="utf8"))
 
 # use 16-bit or 32-bit floats
 use_f16 = True
 if len(sys.argv) > 4:
     use_f16 = False
-    fname_out = dir_out + "/ggml-model-f32.bin"
+    fname_out = dir_out / "ggml-model-f32.bin"
 
 fout = open(fname_out, "wb")
 
@@ -171,18 +168,17 @@ def bytes_to_unicode():
         data = data.astype(np.float16)
 
     # reshape conv bias from [n] to [n, 1]
-    if name == "encoder.conv1.bias" or \
-       name == "encoder.conv2.bias":
+    if name in ["encoder.conv1.bias", "encoder.conv2.bias"]:
         data = data.reshape(data.shape[0], 1)
-        print("  Reshaped variable: " + name + " to shape: ", data.shape)
+        print("  Reshaped variable: ", name, " to shape: ", data.shape)
 
     n_dims = len(data.shape)
     print(name, n_dims, data.shape)
 
     # looks like the whisper models are in f16 by default
    # so we need to convert the small tensors to f32 until we fully support f16 in ggml
     # ftype == 0 -> float32, ftype == 1 -> float16
-    ftype = 1;
+    ftype = 1
     if use_f16:
         if n_dims < 2 or \
            name == "encoder.conv1.bias" or \
@@ -197,16 +193,16 @@ def bytes_to_unicode():
             ftype = 0
 
     # header
-    str = name.encode('utf-8')
-    fout.write(struct.pack("iii", n_dims, len(str), ftype))
+    str_ = name.encode('utf-8')
+    fout.write(struct.pack("iii", n_dims, len(str_), ftype))
     for i in range(n_dims):
         fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
-    fout.write(str);
+    fout.write(str_)
 
     # data
     data.tofile(fout)
 
 fout.close()
 
-print("Done. Output file: " + fname_out)
+print("Done. Output file: ", fname_out)
 print("")
```
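Per tensor, the write loop above emits a fixed 12-byte header (`n_dims`, name length, `ftype`), the dimensions innermost-first, the UTF-8 name, and then the raw tensor data. A minimal sketch of reading one such record header back, assuming `f` is a binary file object already positioned at the start of a tensor record (the file-level magic and hyperparameters written earlier by the script are not handled here):

```python
import struct

def read_tensor_header(f):
    # mirrors fout.write(struct.pack("iii", n_dims, len(str_), ftype))
    n_dims, name_len, ftype = struct.unpack("iii", f.read(12))
    # dimensions were written as data.shape[n_dims - 1 - i], i.e. reversed
    shape = struct.unpack(f"{n_dims}i", f.read(4 * n_dims))[::-1]
    name = f.read(name_len).decode("utf-8")
    return name, shape, ftype  # ftype: 0 -> float32, 1 -> float16
```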
