
gguf: add CLI #1221

Merged (4 commits into huggingface:main, Feb 25, 2025)
Conversation

@ngxson (Member) commented Feb 24, 2025

Ref discussion: https://huggingface.slack.com/archives/C02CLHA19TL/p1740399079674399?thread_ts=1739968558.574099&cid=C02CLHA19TL

I'm trying with this command:

pnpm run build && npx . ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Output:

* Dumping 36 key/value pair(s)
  Idx | Count  | Value                                                                            
  ----|--------|----------------------------------------------------------------------------------
    1 |      1 | version = 3                                                                      
    2 |      1 | tensor_count = 292                                                               
    3 |      1 | kv_count = 33                                                                    
    4 |      1 | general.architecture = "llama"                                                   
    5 |      1 | general.type = "model"                                                           
    6 |      1 | general.name = "Meta Llama 3.1 8B Instruct"                                      
    7 |      1 | general.finetune = "Instruct"                                                    
    8 |      1 | general.basename = "Meta-Llama-3.1"                                              
    9 |      1 | general.size_label = "8B"                                                        
   10 |      1 | general.license = "llama3.1"                                                     
   11 |      6 | general.tags = ["facebook","meta","pytorch","llama","llama-3","te...             
   12 |      8 | general.languages = ["en","de","fr","it","pt","hi","es","th"]                    
   13 |      1 | llama.block_count = 32                                                           
   14 |      1 | llama.context_length = 131072                                                    
   15 |      1 | llama.embedding_length = 4096                                                    
   16 |      1 | llama.feed_forward_length = 14336                                                
   17 |      1 | llama.attention.head_count = 32                                                  
   18 |      1 | llama.attention.head_count_kv = 8                                                
   19 |      1 | llama.rope.freq_base = 500000                                                    
   20 |      1 | llama.attention.layer_norm_rms_epsilon = 0.000009999999747378752                 
   21 |      1 | general.file_type = 15                                                           
   22 |      1 | llama.vocab_size = 128256                                                        
   23 |      1 | llama.rope.dimension_count = 128                                                 
   24 |      1 | tokenizer.ggml.model = "gpt2"                                                    
   25 |      1 | tokenizer.ggml.pre = "llama-bpe"                                                 
   26 | 128256 | tokenizer.ggml.tokens = ["!","\"","#","$","%","&","'","(",")","*","+",",",...    
   27 | 128256 | tokenizer.ggml.token_type = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1...
   28 | 280147 | tokenizer.ggml.merges = ["Ġ Ġ","Ġ ĠĠĠ","ĠĠ ĠĠ","ĠĠĠ Ġ","i n","Ġ t","Ġ ĠĠĠĠ...    
   29 |      1 | tokenizer.ggml.bos_token_id = 128000                                             
   30 |      1 | tokenizer.ggml.eos_token_id = 128009                                             
   31 |      1 | tokenizer.chat_template = "{{- bos_token }}\n{%- if custom_tools is defined ...  
   32 |      1 | general.quantization_version = 2                                                 
   33 |      1 | quantize.imatrix.file = "/models_out/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-...    
   34 |      1 | quantize.imatrix.dataset = "/training_dir/calibration_datav3.txt"                
   35 |      1 | quantize.imatrix.entries_count = 224                                             
   36 |      1 | quantize.imatrix.chunks_count = 125                                              

* Dumping 292 tensor(s)
  Idx | Num Elements | Shape                          | Data Type | Name                     
  ----|--------------|--------------------------------|-----------|--------------------------
    1 |           64 |     64,      1,      1,      1 | F32       | rope_freqs.weight        
    2 |    525336576 |   4096, 128256,      1,      1 | Q4_K      | token_embd.weight        
    3 |         4096 |   4096,      1,      1,      1 | F32       | blk.0.attn_norm.weight   
    4 |     58720256 |  14336,   4096,      1,      1 | Q6_K      | blk.0.ffn_down.weight

...(truncated)

For reference, here is the output of gguf_dump.py:

$ python gguf_dump.py ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf 
INFO:gguf-dump:* Loading: /Users/ngxson/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 36 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 292
      3: UINT64     |        1 | GGUF.kv_count = 33
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.type = 'model'
      6: STRING     |        1 | general.name = 'Meta Llama 3.1 8B Instruct'
      7: STRING     |        1 | general.finetune = 'Instruct'
      8: STRING     |        1 | general.basename = 'Meta-Llama-3.1'
      9: STRING     |        1 | general.size_label = '8B'
     10: STRING     |        1 | general.license = 'llama3.1'
     11: [STRING]   |        6 | general.tags
     12: [STRING]   |        8 | general.languages
     13: UINT32     |        1 | llama.block_count = 32
     14: UINT32     |        1 | llama.context_length = 131072
     15: UINT32     |        1 | llama.embedding_length = 4096
     16: UINT32     |        1 | llama.feed_forward_length = 14336
     17: UINT32     |        1 | llama.attention.head_count = 32
     18: UINT32     |        1 | llama.attention.head_count_kv = 8
     19: FLOAT32    |        1 | llama.rope.freq_base = 500000.0
     20: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     21: UINT32     |        1 | general.file_type = 15
     22: UINT32     |        1 | llama.vocab_size = 128256
     23: UINT32     |        1 | llama.rope.dimension_count = 128
     24: STRING     |        1 | tokenizer.ggml.model = 'gpt2'
     25: STRING     |        1 | tokenizer.ggml.pre = 'llama-bpe'
     26: [STRING]   |   128256 | tokenizer.ggml.tokens
     27: [INT32]    |   128256 | tokenizer.ggml.token_type
     28: [STRING]   |   280147 | tokenizer.ggml.merges
     29: UINT32     |        1 | tokenizer.ggml.bos_token_id = 128000
     30: UINT32     |        1 | tokenizer.ggml.eos_token_id = 128009
     31: STRING     |        1 | tokenizer.chat_template = '{{- bos_token }}\n{%- if custom_tools is defined %}\n    {%- s'
     32: UINT32     |        1 | general.quantization_version = 2
     33: STRING     |        1 | quantize.imatrix.file = '/models_out/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8'
     34: STRING     |        1 | quantize.imatrix.dataset = '/training_dir/calibration_datav3.txt'
     35: INT32      |        1 | quantize.imatrix.entries_count = 224
     36: INT32      |        1 | quantize.imatrix.chunks_count = 125
* Dumping 292 tensor(s)
      1:         64 |    64,     1,     1,     1 | F32     | rope_freqs.weight
      2:  525336576 |  4096, 128256,     1,     1 | Q4_K    | token_embd.weight
      3:       4096 |  4096,     1,     1,     1 | F32     | blk.0.attn_norm.weight
      4:   58720256 | 14336,  4096,     1,     1 | Q6_K    | blk.0.ffn_down.weight
      5:   58720256 |  4096, 14336,     1,     1 | Q4_K    | blk.0.ffn_gate.weight
      6:   58720256 |  4096, 14336,     1,     1 | Q4_K    | blk.0.ffn_up.weight
      7:       4096 |  4096,     1,     1,     1 | F32     | blk.0.ffn_norm.weight
      8:    4194304 |  4096,  1024,     1,     1 | Q4_K    | blk.0.attn_k.weight

@julien-c (Member) commented Feb 25, 2025

> I'm trying with this command:
>
> pnpm run build && npx . ~/work/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

What will be the "official" command once it's released?

UPDATE: npx @huggingface/gguf my_model.gguf

@julien-c (Member) left a review:

very nice, thanks

@ngxson (Member, Author) commented Feb 25, 2025

I deployed a test version of the package in my namespace too:

npx @ngxson/gguf-test Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Users can even install the package globally via npm i -g and use the CLI via the gguf-dump command.
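
For example (using the test package named above; the final published package and binary names may differ):

npm i -g @ngxson/gguf-test
gguf-dump Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf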

@ngxson (Member, Author) commented Feb 25, 2025

I changed the binary name to gguf-view so it won't clash if the user already has gguf-dump installed from pip install gguf.

Small note @mishig25, we briefly discussed console.table this morning. While its output looks nice, it lacks control over number alignment (i.e. numbers should be right-aligned within their cells), so I'm sticking with the DIY version for now 😂
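
For illustration, a minimal sketch of the DIY right-alignment idea (all names here are hypothetical, not the actual code in this PR):

// Hypothetical sketch: right-align numbers with padStart, left-align text with padEnd.
function formatRow(cells: (string | number)[], widths: number[]): string {
  return cells
    .map((cell, i) =>
      typeof cell === "number" ? String(cell).padStart(widths[i]) : String(cell).padEnd(widths[i])
    )
    .join(" | ");
}

const widths = [3, 6, 40];
console.log(formatRow(["Idx", "Count", "Value"], widths));
console.log(formatRow([1, 1, "version = 3"], widths));
console.log(formatRow([26, 128256, "tokenizer.ggml.tokens = [...]"], widths));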

@julien-c (Member) commented:

> So I'm sticking with the DIY version for now 😂

perfectly fine IMO, and btw we might already have an implementation of this somewhere (I'm pretty sure I wrote one in the past)

@mishig25 (Collaborator) left a review:

lgtm!

(Review thread on the following diff excerpt:)

allowLocalFile: true,
});

// TODO: print info about endianness
@mishig25 (Collaborator) commented on the code above:
Do we still need this TODO?

@ngxson (Member, Author) replied:
Yes, it would be nice if we could make it output the same info as the gguf-py script.
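
For reference, a minimal sketch of how the endianness line could be produced (hypothetical helper names, not the actual @huggingface/gguf implementation). GGUF stores a uint32 version right after the 4-byte magic, and valid versions are small, so a byte-swapped read gives itself away:

import { openSync, readSync, closeSync } from "node:fs";

// Hypothetical sketch, not the actual @huggingface/gguf code.
// GGUF stores a uint32 version at byte offset 4, after the "GGUF" magic.
// Valid versions are small (e.g. 3), so if a little-endian read leaves the
// low 16 bits at zero, the file was written big-endian.
function detectFileEndianness(header: Uint8Array): "LITTLE" | "BIG" {
  const view = new DataView(header.buffer, header.byteOffset);
  const versionLE = view.getUint32(4, true); // read version field as little-endian
  return (versionLE & 0xffff) === 0 ? "BIG" : "LITTLE";
}

const fd = openSync("my_model.gguf", "r");
const header = new Uint8Array(8);
readSync(fd, header, 0, 8, 0); // magic (4 bytes) + version (4 bytes)
closeSync(fd);

// Host endianness, for a gguf_dump.py-style message:
const hostIsLE = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;
console.log(`* File is ${detectFileEndianness(header)} endian, host is ${hostIsLE ? "LITTLE" : "BIG"} endian.`);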

@ngxson merged commit f0518b3 into huggingface:main on Feb 25, 2025 (4 checks passed).