
gguf: add CLI #1221

Merged (4 commits into huggingface:main, Feb 25, 2025)
Conversation

@ngxson (Member) commented Feb 24, 2025

Ref discussion: https://huggingface.slack.com/archives/C02CLHA19TL/p1740399079674399?thread_ts=1739968558.574099&cid=C02CLHA19TL

I'm trying with this command:

pnpm run build && npx . ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Output:

* Dumping 36 key/value pair(s)
  Idx | Count  | Value                                                                            
  ----|--------|----------------------------------------------------------------------------------
    1 |      1 | version = 3                                                                      
    2 |      1 | tensor_count = 292                                                               
    3 |      1 | kv_count = 33                                                                    
    4 |      1 | general.architecture = "llama"                                                   
    5 |      1 | general.type = "model"                                                           
    6 |      1 | general.name = "Meta Llama 3.1 8B Instruct"                                      
    7 |      1 | general.finetune = "Instruct"                                                    
    8 |      1 | general.basename = "Meta-Llama-3.1"                                              
    9 |      1 | general.size_label = "8B"                                                        
   10 |      1 | general.license = "llama3.1"                                                     
   11 |      6 | general.tags = ["facebook","meta","pytorch","llama","llama-3","te...             
   12 |      8 | general.languages = ["en","de","fr","it","pt","hi","es","th"]                    
   13 |      1 | llama.block_count = 32                                                           
   14 |      1 | llama.context_length = 131072                                                    
   15 |      1 | llama.embedding_length = 4096                                                    
   16 |      1 | llama.feed_forward_length = 14336                                                
   17 |      1 | llama.attention.head_count = 32                                                  
   18 |      1 | llama.attention.head_count_kv = 8                                                
   19 |      1 | llama.rope.freq_base = 500000                                                    
   20 |      1 | llama.attention.layer_norm_rms_epsilon = 0.000009999999747378752                 
   21 |      1 | general.file_type = 15                                                           
   22 |      1 | llama.vocab_size = 128256                                                        
   23 |      1 | llama.rope.dimension_count = 128                                                 
   24 |      1 | tokenizer.ggml.model = "gpt2"                                                    
   25 |      1 | tokenizer.ggml.pre = "llama-bpe"                                                 
   26 | 128256 | tokenizer.ggml.tokens = ["!","\"","#","$","%","&","'","(",")","*","+",",",...    
   27 | 128256 | tokenizer.ggml.token_type = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1...
   28 | 280147 | tokenizer.ggml.merges = ["Ġ Ġ","Ġ ĠĠĠ","ĠĠ ĠĠ","ĠĠĠ Ġ","i n","Ġ t","Ġ ĠĠĠĠ...    
   29 |      1 | tokenizer.ggml.bos_token_id = 128000                                             
   30 |      1 | tokenizer.ggml.eos_token_id = 128009                                             
   31 |      1 | tokenizer.chat_template = "{{- bos_token }}\n{%- if custom_tools is defined ...  
   32 |      1 | general.quantization_version = 2                                                 
   33 |      1 | quantize.imatrix.file = "/models_out/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-...    
   34 |      1 | quantize.imatrix.dataset = "/training_dir/calibration_datav3.txt"                
   35 |      1 | quantize.imatrix.entries_count = 224                                             
   36 |      1 | quantize.imatrix.chunks_count = 125                                              

* Dumping 292 tensor(s)
  Idx | Num Elements | Shape                          | Data Type | Name                     
  ----|--------------|--------------------------------|-----------|--------------------------
    1 |           64 |     64,      1,      1,      1 | F32       | rope_freqs.weight        
    2 |    525336576 |   4096, 128256,      1,      1 | Q4_K      | token_embd.weight        
    3 |         4096 |   4096,      1,      1,      1 | F32       | blk.0.attn_norm.weight   
    4 |     58720256 |  14336,   4096,      1,      1 | Q6_K      | blk.0.ffn_down.weight

...(truncated)

For reference, here is the output of gguf_dump.py:

$ python gguf_dump.py ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf 
INFO:gguf-dump:* Loading: /Users/ngxson/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 36 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 292
      3: UINT64     |        1 | GGUF.kv_count = 33
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.type = 'model'
      6: STRING     |        1 | general.name = 'Meta Llama 3.1 8B Instruct'
      7: STRING     |        1 | general.finetune = 'Instruct'
      8: STRING     |        1 | general.basename = 'Meta-Llama-3.1'
      9: STRING     |        1 | general.size_label = '8B'
     10: STRING     |        1 | general.license = 'llama3.1'
     11: [STRING]   |        6 | general.tags
     12: [STRING]   |        8 | general.languages
     13: UINT32     |        1 | llama.block_count = 32
     14: UINT32     |        1 | llama.context_length = 131072
     15: UINT32     |        1 | llama.embedding_length = 4096
     16: UINT32     |        1 | llama.feed_forward_length = 14336
     17: UINT32     |        1 | llama.attention.head_count = 32
     18: UINT32     |        1 | llama.attention.head_count_kv = 8
     19: FLOAT32    |        1 | llama.rope.freq_base = 500000.0
     20: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     21: UINT32     |        1 | general.file_type = 15
     22: UINT32     |        1 | llama.vocab_size = 128256
     23: UINT32     |        1 | llama.rope.dimension_count = 128
     24: STRING     |        1 | tokenizer.ggml.model = 'gpt2'
     25: STRING     |        1 | tokenizer.ggml.pre = 'llama-bpe'
     26: [STRING]   |   128256 | tokenizer.ggml.tokens
     27: [INT32]    |   128256 | tokenizer.ggml.token_type
     28: [STRING]   |   280147 | tokenizer.ggml.merges
     29: UINT32     |        1 | tokenizer.ggml.bos_token_id = 128000
     30: UINT32     |        1 | tokenizer.ggml.eos_token_id = 128009
     31: STRING     |        1 | tokenizer.chat_template = '{{- bos_token }}\n{%- if custom_tools is defined %}\n    {%- s'
     32: UINT32     |        1 | general.quantization_version = 2
     33: STRING     |        1 | quantize.imatrix.file = '/models_out/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8'
     34: STRING     |        1 | quantize.imatrix.dataset = '/training_dir/calibration_datav3.txt'
     35: INT32      |        1 | quantize.imatrix.entries_count = 224
     36: INT32      |        1 | quantize.imatrix.chunks_count = 125
* Dumping 292 tensor(s)
      1:         64 |    64,     1,     1,     1 | F32     | rope_freqs.weight
      2:  525336576 |  4096, 128256,     1,     1 | Q4_K    | token_embd.weight
      3:       4096 |  4096,     1,     1,     1 | F32     | blk.0.attn_norm.weight
      4:   58720256 | 14336,  4096,     1,     1 | Q6_K    | blk.0.ffn_down.weight
      5:   58720256 |  4096, 14336,     1,     1 | Q4_K    | blk.0.ffn_gate.weight
      6:   58720256 |  4096, 14336,     1,     1 | Q4_K    | blk.0.ffn_up.weight
      7:       4096 |  4096,     1,     1,     1 | F32     | blk.0.ffn_norm.weight
      8:    4194304 |  4096,  1024,     1,     1 | Q4_K    | blk.0.attn_k.weight

@julien-c (Member) commented Feb 25, 2025

> I'm trying with this command:
>
> pnpm run build && npx . ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

What will be the "official" command once it's released?

UPDATE: npx @huggingface/gguf my_model.gguf

@julien-c (Member) left a review:

very nice, thanks

@ngxson (Member, Author) commented Feb 25, 2025

I deployed a test version of the package in my namespace too:

npx @ngxson/gguf-test Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Users can even install the package globally via npm i -g and use the CLI via the gguf-dump command.
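
For example (using the test package named above; the final published package and binary names may differ):

npm i -g @ngxson/gguf-test
gguf-dump Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf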

@ngxson (Member, Author) commented Feb 25, 2025

I changed the binary name to gguf-view so it won't clash if the user already has gguf-dump installed from pip install gguf.

Small note @mishig25, we briefly discussed console.table this morning. While its output looks nice, it lacks control over number alignment (i.e. numbers should be right-aligned within their cells), so I'm sticking with the DIY version for now 😂
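
For illustration, a minimal sketch of the DIY right-alignment idea (all names here are hypothetical, not the actual code in this PR):

// Hypothetical sketch: right-align numbers with padStart, left-align text with padEnd.
function formatRow(cells: (string | number)[], widths: number[]): string {
  return cells
    .map((cell, i) =>
      typeof cell === "number" ? String(cell).padStart(widths[i]) : String(cell).padEnd(widths[i])
    )
    .join(" | ");
}

const widths = [3, 6, 40];
console.log(formatRow(["Idx", "Count", "Value"], widths));
console.log(formatRow([1, 1, "version = 3"], widths));
console.log(formatRow([26, 128256, "tokenizer.ggml.tokens = [...]"], widths));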

@julien-c (Member) commented:

> So I'm sticking with the DIY version for now 😂

perfectly fine IMO, and btw we might already have an implementation of this somewhere (I'm pretty sure I wrote one in the past)

@mishig25 (Collaborator) left a review:

lgtm!

(Review thread on the following diff excerpt:)

allowLocalFile: true,
});

// TODO: print info about endianness
@mishig25 (Collaborator) commented on the code above:
Do we still need this TODO?

@ngxson (Member, Author) replied:
Yes, it would be nice if we could make it output the same info as the gguf-py script.
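
For reference, a minimal sketch of how the endianness line could be produced (hypothetical helper names, not the actual @huggingface/gguf implementation). GGUF stores a uint32 version right after the 4-byte magic, and valid versions are small, so a byte-swapped read gives itself away:

import { openSync, readSync, closeSync } from "node:fs";

// Hypothetical sketch, not the actual @huggingface/gguf code.
// GGUF stores a uint32 version at byte offset 4, after the "GGUF" magic.
// Valid versions are small (e.g. 3), so if a little-endian read leaves the
// low 16 bits at zero, the file was written big-endian.
function detectFileEndianness(header: Uint8Array): "LITTLE" | "BIG" {
  const view = new DataView(header.buffer, header.byteOffset);
  const versionLE = view.getUint32(4, true); // read version field as little-endian
  return (versionLE & 0xffff) === 0 ? "BIG" : "LITTLE";
}

const fd = openSync("my_model.gguf", "r");
const header = new Uint8Array(8);
readSync(fd, header, 0, 8, 0); // magic (4 bytes) + version (4 bytes)
closeSync(fd);

// Host endianness, for a gguf_dump.py-style message:
const hostIsLE = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;
console.log(`* File is ${detectFileEndianness(header)} endian, host is ${hostIsLE ? "LITTLE" : "BIG"} endian.`);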

@ngxson merged commit f0518b3 into huggingface:main on Feb 25, 2025 (4 checks passed).