data type during inference? #641
RanchiZhao asked this question in Q&A
Answered by RanchiZhao
I would like to know: when using GPTQ for inference, if my model weights have been quantized to int8, are the matrix multiplications during inference performed as int8 × int8, or are the weights dequantized back to fp16 so the multiplication is fp16 × fp16? Or is it int8 (weight matrix) × fp16 (activation matrix)? I know dequantization is necessary, but I am not clear on exactly where in the pipeline the dequantization occurs.
Replies: 1 comment

Answered by RanchiZhao · Apr 15, 2024
w4a16/w8a16? (Weight-only quantization: 4- or 8-bit weights, 16-bit activations.)
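To make the w4a16/w8a16 pattern concrete, here is a minimal sketch of the general weight-only scheme (a NumPy illustration, not the actual GPTQ CUDA kernel; the `w8a16_linear` helper and its per-output-channel scale layout are assumptions for this example): the int8 weights are dequantized to fp16 just before the multiply, the activations are never quantized, and the GEMM itself runs as fp16 × fp16.

```python
import numpy as np

def w8a16_linear(x_fp16, w_int8, scales_fp16):
    """Hypothetical w8a16 linear layer: int8 stored weights, fp16 activations."""
    # Dequantize the weights: int8 -> fp16, one scale per output channel.
    w_fp16 = w_int8.astype(np.float16) * scales_fp16[:, None]
    # The multiply itself runs entirely in fp16 (fp16 activations x fp16 weights).
    return x_fp16 @ w_fp16.T

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float16)       # fp16 activations
w = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)  # int8 quantized weights
s = np.full(4, 0.02, dtype=np.float16)                   # fp16 dequant scales
y = w8a16_linear(x, w, s)
print(y.dtype, y.shape)  # float16 (2, 4)
```

In real deployments the dequantize-and-multiply is typically fused into a single GPU kernel, so the full fp16 weight matrix is never materialized in memory, but the data types involved are the same as in the sketch.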
Answer selected by RanchiZhao