How can I use GPTQ+ LoRA method to finetune, and directly merge GPTQ and LoRA modules to int4, and do inference? #639
Unanswered
RanchiZhao asked this question in Q&A
This is quite like QA-LoRA (https://github.com/yuhuixu1993/qa-lora), but I wonder how to run inference directly in int4 instead of fp16/bf16, because I want to accelerate the inference stage.
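To make the question concrete, here is a minimal numpy sketch of the merge arithmetic involved: dequantize a GPTQ-style per-group int4 weight, add the LoRA delta `B @ A * (alpha / r)`, then requantize back to int4 so inference can stay in int4. This is a hypothetical illustration, not the QA-LoRA or GPTQ implementation; shapes, group size, and the asymmetric quantization scheme are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4(w, group_size=32):
    # Per-group asymmetric int4 quantization (GPTQ-style grouping; simplified).
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    wmin = wg.min(axis=-1, keepdims=True)
    wmax = wg.max(axis=-1, keepdims=True)
    scale = (wmax - wmin) / 15.0           # 16 levels for 4 bits
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(wg / scale + zero), 0, 15).astype(np.int8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return ((q.astype(np.float32) - zero) * scale).reshape(q.shape[0], -1)

# Base weight, quantized to int4 as GPTQ would store it.
W = rng.standard_normal((64, 128)).astype(np.float32)
q, s, z = quantize_int4(W)

# LoRA adapter: delta = B @ A * (alpha / r), with small random init here.
r, alpha = 8, 16
A = rng.standard_normal((r, 128)).astype(np.float32) * 0.01
B = rng.standard_normal((64, r)).astype(np.float32) * 0.01
delta = (B @ A) * (alpha / r)

# Merge: dequantize, add the LoRA delta, requantize back to int4.
W_merged = dequantize(q, s, z) + delta
q2, s2, z2 = quantize_int4(W_merged)

# The merged int4 weight approximates W + delta up to quantization error.
err = np.abs(dequantize(q2, s2, z2) - (W + delta)).max()
print("max requantization error:", err)
```

The catch this sketch exposes is the requantization step: naively re-rounding after the merge introduces extra quantization error, which is exactly what QA-LoRA's group-wise design tries to avoid by constraining the adapter so the merged weight stays representable in int4 without a second lossy round.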