data type during inference? #641
RanchiZhao asked this question in Q&A
Answered by RanchiZhao
I would like to know: when using GPTQ for inference, if my model weights have been quantized to int8, are the matrix multiplications during inference performed as int8 × int8, or are the weights dequantized back to fp16 so the multiplication is fp16 × fp16? Or is it int8 (weight matrix) × fp16 (activation matrix)? I know dequantization is necessary, but I am not clear on exactly where in the pipeline the dequantization occurs.
Replies: 1 comment

Answered by RanchiZhao · Apr 15, 2024
w4a16/w8a16? (Weight-only quantization: 4- or 8-bit weights, 16-bit activations.)
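To make the w4a16/w8a16 pattern concrete, here is a minimal sketch of the general weight-only scheme (a NumPy illustration, not the actual GPTQ CUDA kernel; the `w8a16_linear` helper and its per-output-channel scale layout are assumptions for this example): the int8 weights are dequantized to fp16 just before the multiply, the activations are never quantized, and the GEMM itself runs as fp16 × fp16.

```python
import numpy as np

def w8a16_linear(x_fp16, w_int8, scales_fp16):
    """Hypothetical w8a16 linear layer: int8 stored weights, fp16 activations."""
    # Dequantize the weights: int8 -> fp16, one scale per output channel.
    w_fp16 = w_int8.astype(np.float16) * scales_fp16[:, None]
    # The multiply itself runs entirely in fp16 (fp16 activations x fp16 weights).
    return x_fp16 @ w_fp16.T

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8)).astype(np.float16)       # fp16 activations
w = rng.integers(-128, 128, size=(4, 8), dtype=np.int8)  # int8 quantized weights
s = np.full(4, 0.02, dtype=np.float16)                   # fp16 dequant scales
y = w8a16_linear(x, w, s)
print(y.dtype, y.shape)  # float16 (2, 4)
```

In real deployments the dequantize-and-multiply is typically fused into a single GPU kernel, so the full fp16 weight matrix is never materialized in memory, but the data types involved are the same as in the sketch.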
Answer selected by RanchiZhao