Which script was used for 4-bit quantization? #100

alex4321 · 2023-05-11T17:34:51Z

As far as I can see, the model doesn't use GGML weights here. Is that correct? (I haven't checked yet.)

If so, which script was used for the 4-bit quantization here?

For instance, if I want to try Vicuna (which is LLaMA plus some delta weights), I can generate the full Vicuna weights, but then I will need to quantize them. Which quantization method does this library expect?
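The "generate full weights" step amounts to adding the published delta to the base LLaMA weights, parameter by parameter. The sketch below shows only that idea, assuming purely additive deltas and using a hypothetical in-memory "state dict" of plain float lists. Real delta releases can differ in details such as vocabulary size, so for actual weights use the official tooling published with Vicuna (FastChat) rather than this.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def apply_delta(base: StateDict, delta: StateDict) -> StateDict:
    """Return base + delta for every parameter (assumes additive deltas)."""
    if base.keys() != delta.keys():
        raise ValueError("base and delta must contain the same parameter names")
    merged: StateDict = {}
    for name, base_values in base.items():
        delta_values = delta[name]
        # This sketch rejects shape mismatches instead of resizing tensors.
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [b + d for b, d in zip(base_values, delta_values)]
    return merged
```

The merged full-precision weights are what you would then pass to a quantization script.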

alex4321 · 2023-05-11T18:46:37Z

Okay, I see it should be able to use GPTQ weights (in safetensors format). I'm facing other issues when trying to generate text, but those are off-topic for this issue.
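To check whether a given `.safetensors` file holds GPTQ-style 4-bit weights, you can read only its header. A safetensors file starts with an 8-byte little-endian length followed by a JSON header mapping tensor names to dtype, shape, and offsets. The sketch below assumes GPTQ checkpoints name their packed layers `<module>.qweight` (next to `.qzeros` and `.scales`); that naming convention is an assumption, not something stated in this thread.

```python
import json
import struct
from typing import Dict, List


def read_safetensors_header(path: str) -> Dict[str, dict]:
    """Parse the JSON header of a .safetensors file (8-byte LE length + JSON)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional entry that does not describe a tensor.
    header.pop("__metadata__", None)
    return header


def gptq_layers(path: str) -> List[str]:
    """Return module prefixes with packed 4-bit weights (assumed '.qweight' suffix)."""
    suffix = ".qweight"
    names = read_safetensors_header(path)
    return sorted(n[: -len(suffix)] for n in names if n.endswith(suffix))
```

An empty result suggests the file holds ordinary (unquantized) weights rather than GPTQ output.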

So my problem should be solvable (at least if the Vicuna code itself isn't different).

johnsmith0031 · 2023-05-12T02:48:08Z

Currently this repo doesn't contain any model quantization code. I think you can use the quantization scripts from the GPTQ repo or GPTQ-LLAMA.
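For intuition about what a 4-bit checkpoint stores, here is a minimal sketch of plain round-to-nearest (RTN) quantization of one weight group to 4 bits, plus packing eight 4-bit values into each 32-bit word. This is explicitly NOT GPTQ: GPTQ additionally uses approximate second-order information to compensate for quantization error as it goes. The function names and the nibble packing order are hypothetical choices for illustration.

```python
from typing import List, Tuple


def quantize_4bit(weights: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric min-max RTN quantization of one group to 4 bits (16 levels)."""
    # Widen the range to include 0 so the zero point itself fits in 0..15.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    zero = round(-lo / scale)
    q = [min(15, max(0, round(w / scale) + zero)) for w in weights]
    return q, scale, zero


def dequantize_4bit(q: List[int], scale: float, zero: int) -> List[float]:
    """Map 4-bit codes back to approximate float weights."""
    return [scale * (v - zero) for v in q]


def pack_int32(q: List[int]) -> List[int]:
    """Pack eight 4-bit values into each 32-bit word, lowest nibble first."""
    if len(q) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for i in range(0, len(q), 8):
        word = 0
        for j, v in enumerate(q[i : i + 8]):
            word |= (v & 0xF) << (4 * j)
        words.append(word)
    return words
```

Each weight is then reconstructed as `scale * (q - zero)`, so the per-weight error stays within half a quantization step.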
