Hi! I'm trying to load the model from https://huggingface.co/Neko-Institute-of-Science/LLaMA-13B-4bit-128g, but it only generates garbage. I'm only running inference so far, not training yet.
Here is an example of the output:
Loading Model ...
The safetensors archive passed at ./models/llama-13b-4bit/llama-13b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
Loaded the model in 6.03 seconds.
Fitting 4bit scales and zeros to half
Apply AMP Wrapper ...
I think the meaning of life isbahrist InitSTMo�Ãbahbah�OF MomoãbahSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTSTMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMoMo
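Garbage output like this usually means the 4-bit weights are being dequantized with the wrong parameters: for a `128g` model, one scale/zero pair applies per group of 128 weights, and loading it with a mismatched groupsize misaligns every group. A minimal NumPy sketch of the idea (this is illustrative only, not the repo's actual CUDA kernel, and all names here are made up):

```python
import numpy as np

# Illustrative per-group affine dequantization, as used by GPTQ-style
# 4-bit models ("128g" = one scale/zero pair per 128 weights).
# A sketch only -- not alpaca_lora_4bit's real kernel.

def dequantize(q, scales, zeros, groupsize):
    q = q.astype(np.float32)
    out = np.empty_like(q)
    for g in range(0, q.shape[0], groupsize):
        i = g // groupsize
        out[g:g + groupsize] = (q[g:g + groupsize] - zeros[i]) * scales[i]
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)

# Quantize with groupsize 128: one (scale, zero) pair per 128 weights.
groupsize = 128
scales, zeros = [], []
q = np.empty(256, dtype=np.uint8)
for g in range(0, 256, groupsize):
    block = w[g:g + groupsize]
    scale = (block.max() - block.min()) / 15.0
    zero = round(-block.min() / scale)
    scales.append(scale)
    zeros.append(zero)
    q[g:g + groupsize] = np.clip(np.round(block / scale) + zero, 0, 15)

# Correct groupsize: a close reconstruction of w.
good = dequantize(q, np.array(scales), np.array(zeros), 128)

# Wrong groupsize (treating the tensor as one big group): the second
# group gets the first group's scale/zero, so the weights come out
# wrong -- and a model built on them emits garbage tokens.
bad = dequantize(q, np.array(scales[:1]), np.array(zeros[:1]), 256)
```

The point is only that the stored `scales`/`zeros` must be applied with exactly the grouping the model was quantized with, which is why a loader has to be told (or detect) `groupsize=128` for this checkpoint.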
Any guidance on how to load these models?
  File "/home/guest/ai/finetune/alpaca_lora_4bit/matmul_utils_4bit.py", line 131, in matmul4bit
    output = _matmul4bit_v2(x, qweight, scales, zeros, g_idx)
  File "/home/guest/ai/finetune/alpaca_lora_4bit/matmul_utils_4bit.py", line 73, in _matmul4bit_v2
    quant_cuda.vecquant4matmul(x, qweight, y, scales, zeros, g_idx)
RuntimeError: expected scalar type Float but found Half
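The `expected scalar type Float but found Half` error means the quantized-matmul kernel was handed float16 activations when it was built to accept float32. A common workaround is to up-cast the activations just before the kernel call and cast the result back afterwards. A minimal sketch, where `strict_float32_kernel` is a stand-in for `quant_cuda.vecquant4matmul` (I can't call the real CUDA op here, and the wrapper name is made up):

```python
import numpy as np

def strict_float32_kernel(x, w):
    # Stand-in for quant_cuda.vecquant4matmul: like the real kernel in the
    # traceback, it only accepts float32 activations.
    if x.dtype != np.float32:
        raise TypeError("expected scalar type Float but found Half")
    return x @ w

def float_safe_matmul(x, w):
    # Up-cast half-precision inputs for the strict kernel, then restore
    # the original dtype so the rest of the model keeps running in half.
    orig_dtype = x.dtype
    y = strict_float32_kernel(x.astype(np.float32), w.astype(np.float32))
    return y.astype(orig_dtype)

x = np.ones((2, 4), dtype=np.float16)
w = np.ones((4, 3), dtype=np.float16)
y = float_safe_matmul(x, w)  # succeeds; calling the kernel directly on x raises
```

The same pattern applies in PyTorch (`x.float()` before the kernel, `.half()` on the result); the cost is a small conversion overhead per matmul rather than a crash.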