Merging LoRA after finetune #145

Open
gameveloster opened this issue Aug 7, 2023 · 1 comment

@gameveloster

Some questions about merging the LoRA back to the base model...

  1. Should the LoRA finetuned on the 4-bit GPTQ model be merged back to the fp16 version of the same model?
  2. When merging the LoRA into the fp16 model, is it recommended to use the PeftModel.merge_and_unload method?
  3. Do you expect the generation speed to increase when using the merged model that is GPTQ'ed after merging, compared to the base GPTQ model with LoRA applied on top of it?
@johnsmith0031
Owner

  1. Technically you can, but the merged model would likely perform worse than the 4-bit GPTQ model the LoRA was originally trained on.
  2. Not sure about it.
  3. I think you can try exllama. Its inference speed is very fast (even with a LoRA applied, it's still faster than the original fp16 model).
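
To make point 1 concrete, here is a toy sketch of what merging does to a single weight matrix, assuming the standard LoRA update W' = W + (alpha / r) * B @ A. It uses plain Python lists rather than a real model, and every name and number in it (`merge_lora`, `w_q`, `w_fp16`, the tiny 2x2 weights) is made up for illustration. It is not this repo's code and not the PEFT implementation.

```python
# Toy illustration of LoRA merging on one 2x2 weight matrix.
# All helpers and values are hypothetical; no real model or library is involved.

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, r):
    """Fold the low-rank update into the weight: W' = W + (alpha / r) * B @ A."""
    return add(w, scale(matmul(lora_b, lora_a), alpha / r))

def forward_with_adapter(w, lora_a, lora_b, alpha, r, x):
    """Unmerged path: y = W @ x + (alpha / r) * B @ (A @ x)."""
    low_rank = matmul(lora_b, matmul(lora_a, x))
    return add(matmul(w, x), scale(low_rank, alpha / r))

# Assumed stand-ins: an "fp16" weight and a coarsely rounded "quantized" copy of it.
w_fp16 = [[1.25, -0.5], [0.75, 2.0]]
w_q = [[1.0, -0.5], [1.0, 2.0]]

# Rank-1 adapter, imagined as having been trained on top of w_q.
lora_a = [[1.0, 2.0]]       # shape (r, in_features)
lora_b = [[0.5], [-1.0]]    # shape (out_features, r)
alpha, r = 2, 1

x = [[1.0], [1.0]]          # a single input column vector

y_trained = forward_with_adapter(w_q, lora_a, lora_b, alpha, r, x)
y_merged_q = matmul(merge_lora(w_q, lora_a, lora_b, alpha, r), x)
y_merged_fp16 = matmul(merge_lora(w_fp16, lora_a, lora_b, alpha, r), x)

print(y_trained)      # adapter on top of the quantized weight
print(y_merged_q)     # merged into the same quantized weight: identical output
print(y_merged_fp16)  # merged into the fp16 weight: a different output
```

Merging into the same weights the adapter saw during training reproduces the adapter's output exactly, while merging into the fp16 weights shifts the result by the quantization error. The sketch only shows that the outputs differ; how much that matters for generation quality on a real model is something you would have to measure.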
