How can I use GPTQ+ LoRA method to finetune, and directly merge GPTQ and LoRA modules to int4, and do inference? #639
Unanswered
RanchiZhao asked this question in Q&A
This is quite like QA-LoRA (https://github.com/yuhuixu1993/qa-lora), but I wonder how to run inference directly in int4 instead of fp16/bf16, because I want to accelerate the inference stage.
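To make the question concrete, here is a minimal numpy sketch of the merge arithmetic involved: dequantize a GPTQ-style per-group int4 weight, add the LoRA delta `B @ A * (alpha / r)`, then requantize back to int4 so inference can stay in int4. This is a hypothetical illustration, not the QA-LoRA or GPTQ implementation; shapes, group size, and the asymmetric quantization scheme are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4(w, group_size=32):
    # Per-group asymmetric int4 quantization (GPTQ-style grouping; simplified).
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    wmin = wg.min(axis=-1, keepdims=True)
    wmax = wg.max(axis=-1, keepdims=True)
    scale = (wmax - wmin) / 15.0           # 16 levels for 4 bits
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(wg / scale + zero), 0, 15).astype(np.int8)
    return q, scale, zero

def dequantize(q, scale, zero):
    return ((q.astype(np.float32) - zero) * scale).reshape(q.shape[0], -1)

# Base weight, quantized to int4 as GPTQ would store it.
W = rng.standard_normal((64, 128)).astype(np.float32)
q, s, z = quantize_int4(W)

# LoRA adapter: delta = B @ A * (alpha / r), with small random init here.
r, alpha = 8, 16
A = rng.standard_normal((r, 128)).astype(np.float32) * 0.01
B = rng.standard_normal((64, r)).astype(np.float32) * 0.01
delta = (B @ A) * (alpha / r)

# Merge: dequantize, add the LoRA delta, requantize back to int4.
W_merged = dequantize(q, s, z) + delta
q2, s2, z2 = quantize_int4(W_merged)

# The merged int4 weight approximates W + delta up to quantization error.
err = np.abs(dequantize(q2, s2, z2) - (W + delta)).max()
print("max requantization error:", err)
```

The catch this sketch exposes is the requantization step: naively re-rounding after the merge introduces extra quantization error, which is exactly what QA-LoRA's group-wise design tries to avoid by constraining the adapter so the merged weight stays representable in int4 without a second lossy round.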