-
I am trying to quantize Llama-2-7b-hf with GPTQ at 4 bits on the c4 dataset, but I keep hitting an OOM error. I am using an NVIDIA RTX 4090 with 24GB of VRAM. Inference with the base model only takes about 12GB of VRAM, so I expected quantization to be no problem. Can anyone explain how much VRAM I need to quantize successfully?
-
At 4096 sequence length you will need less than 24GB VRAM to quantise 7B. Quantising 13B at 4096 needs more than 24GB unless `cache_examples_on_gpu=False` is used.
Try my quantising wrapper script: https://github.com/TheBlokeAI/AIScripts/blob/main/quant_autogptq.py
Run it with your chosen params and it should work fine. Adjust the GPTQ params if you want different settings, e.g. group_size 32.
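For reference, here is a minimal sketch of the same approach using the AutoGPTQ Python API directly. This is not the wrapper script's exact code: the c4 calibration loading is stubbed out with a placeholder, and the group_size/desc_act values are just common defaults. The `cache_examples_on_gpu` flag is a real parameter of AutoGPTQ's `quantize()` method.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Llama-2-7b-hf"

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit, as in the question
    group_size=128,  # swap in 32 for finer-grained quantisation
    desc_act=True,   # act-order: better quality, slightly slower inference
)

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# Calibration examples: in practice, tokenise a few hundred c4 samples.
# Each example is a dict with input_ids / attention_mask tensors.
examples = [
    tokenizer("Placeholder calibration text; replace with real c4 samples.",
              return_tensors="pt")
]

# cache_examples_on_gpu=False keeps the calibration activations in CPU RAM
# between layers, lowering peak VRAM usage at the cost of some speed.
model.quantize(examples, cache_examples_on_gpu=False)

model.save_quantized("Llama-2-7b-GPTQ-4bit")
```

With the default `cache_examples_on_gpu=True`, the calibration activations stay in VRAM for the whole run, which is what pushes a 13B quantisation at 4096 sequence length past 24GB.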