
question: How can I quantize BERT to FP16 ? #104

Open
hexiaoyupku opened this issue Oct 31, 2019 · 1 comment
@hexiaoyupku

I only have P100 and V100 GPUs, which don't support INT8. What should I do to quantize BERT to FP16?
Thanks in advance!

hexiaoyupku added the question (Further information is requested) label on Oct 31, 2019
@ofirzaf (Collaborator) commented Nov 5, 2019

Our framework currently only simulates quantized inference, meaning that the quantized GEMM operations are still performed with FP32 arithmetic on integer values.
We don't have quantization-aware training for FP16 implemented.
What you can do is train BERT regularly in FP32 and then convert the model to FP16 by calling the half() method on the model.
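
For reference, a minimal sketch of that last step, assuming a Hugging Face transformers BERT checkpoint (the model name, tokenizer call, and example input below are illustrative, not this repository's API):

```python
import torch
from transformers import BertModel, BertTokenizer

# Load an FP32 model (substitute your own fine-tuned checkpoint here).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Cast all parameters and buffers to FP16 and move to GPU;
# FP16 inference is only worthwhile on CUDA hardware (e.g. P100/V100).
model = model.half().cuda().eval()

inputs = tokenizer("How can I quantize BERT to FP16?", return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}  # token ids stay int64

with torch.no_grad():
    outputs = model(**inputs)

print(outputs[0].dtype)  # torch.float16
```

Note this is plain post-training conversion, not quantization-aware training; accuracy is usually unaffected for FP16, but you should still validate on your task.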
