Loading quant_bert pretrained weights for MRPC #150

Open
amrnag01 opened this issue Mar 23, 2020 · 1 comment

amrnag01 commented Mar 23, 2020

When I run the following command to fine-tune Quantized BERT on MRPC,
nlp-train transformer_glue \
    --task_name mrpc \
    --model_name_or_path bert-base-uncased \
    --model_type quant_bert \
    --learning_rate 2e-5 \
    --output_dir /tmp/mrpc-8bit \
    --evaluate_during_training \
    --data_dir /path/to/MRPC \
    --do_lower_case

I get the following message:
INFO Weights of QuantizedBertForSequenceClassification not initialized from pretrained model: ['bert.embeddings.word_embeddings._step', 'bert.embeddings.position_embeddings._step', 'bert.embeddings.token_type_embeddings._step', 'bert.encoder.layer.0.attention.self.query._step', 'bert.encoder.layer.0.attention.self.query.input_thresh', 'bert.encoder.layer.0.attention.self.query.output_thresh', 'bert.encoder.layer.0.attention.self.key._step', 'bert.encoder.layer.0.attention.self.key.input_thresh', 'bert.encoder.layer.0.attention.self.key.output_thresh', 'bert.encoder.layer.0.attention.self.value._step', 'bert.encoder.layer.0.attention.self.value.input_thresh', 'bert.encoder.layer.0.attention.output.dense._step', 'bert.encoder.layer.0.attention.output.dense.input_thresh', 'bert.encoder.layer.0.intermediate.dense._step', 'bert.encoder.layer.0.intermediate.dense.input_thresh', 'bert.encoder.layer.0.output.dense._step', 'bert.encoder.layer.0.output.dense.input_thresh', 'bert.encoder.layer.1.attention.self.query._step',

... for all the layers. Can you please help me figure out why all these weights are not initialized from the pretrained model? It works when I set model_type to bert instead of quant_bert.
Thanks a lot.

ofirzaf (Collaborator) commented Apr 16, 2020

Note that this message only says that the input_thresh/output_thresh and _step attributes are not initialized from the pre-trained model, which is expected since the pre-trained model wasn't trained with quantization in mind. If the actual weights weren't being initialized, you would see entries such as bert.encoder.layer.1.attention.self.query.weight and bert.encoder.layer.1.attention.self.query.bias in that list.
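As a quick sanity check, you can compare the two state dicts yourself. This is just a minimal sketch; `model` and `pretrained_state` are placeholder names for the QuantizedBertForSequenceClassification instance and the bert-base-uncased state dict:

```python
def check_missing_keys(model, pretrained_state):
    """Return missing keys that are NOT quantization bookkeeping.

    Apart from a freshly initialized task head, this list should be empty;
    the encoder's .weight and .bias entries must not appear here.
    """
    quant_suffixes = ("._step", ".input_thresh", ".output_thresh")
    missing = [k for k in model.state_dict() if k not in pretrained_state]
    return [k for k in missing if not k.endswith(quant_suffixes)]
```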

The quantized fully connected (FC) layers used in the quantized BERT model require additional information, such as the input threshold and the output threshold (the thresholds are used to quantize the input and output tensors), which is not available in the pre-trained model.
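To make that concrete, here is a minimal sketch (not NLP Architect's actual implementation) of a linear layer that carries this extra quantization state. The buffer names mirror the ones in the log above, but the EMA update rule and the 8-bit symmetric grid are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x, threshold, bits=8):
    """Quantize-dequantize x on a symmetric `bits`-bit grid clipped at +/-threshold."""
    threshold = float(threshold)
    if threshold == 0.0:           # nothing observed yet; pass through unquantized
        return x
    qmax = 2 ** (bits - 1) - 1     # 127 for 8 bits
    scale = threshold / qmax
    q = torch.round(x.clamp(-threshold, threshold) / scale) * scale
    # Straight-through estimator: quantized values in the forward pass,
    # identity gradient in the backward pass.
    return x + (q - x).detach()


class QuantizedLinearSketch(nn.Linear):
    """Linear layer that fake-quantizes its input, weight and output (illustrative only)."""

    def __init__(self, in_features, out_features, ema_decay=0.9):
        super().__init__(in_features, out_features)
        self.ema_decay = ema_decay
        # Extra quantization state. A vanilla bert-base-uncased checkpoint has no
        # such keys, which is exactly why they show up as "not initialized".
        self.register_buffer("_step", torch.zeros(1))
        self.register_buffer("input_thresh", torch.zeros(1))
        self.register_buffer("output_thresh", torch.zeros(1))

    def _update(self, thresh, tensor):
        # Running estimate of the largest absolute value seen during training.
        observed = tensor.abs().max().detach()
        if float(self._step) <= 1:
            thresh.copy_(observed)
        else:
            thresh.mul_(self.ema_decay).add_((1.0 - self.ema_decay) * observed)

    def forward(self, x):
        if self.training:
            self._step += 1
            self._update(self.input_thresh, x)
        x = fake_quantize(x, self.input_thresh)
        w = fake_quantize(self.weight, self.weight.abs().max())
        out = F.linear(x, w, self.bias)
        if self.training:
            self._update(self.output_thresh, out)
        return fake_quantize(out, self.output_thresh)
```

When such a layer is initialized from a vanilla checkpoint, only weight and bias have pre-trained values; _step, input_thresh and output_thresh start from scratch and are learned/tracked during quantization-aware fine-tuning.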

In other words, everything is working correctly for you. When you later load a model you trained with quantization for inference, you will see that these attributes are loaded from the quantized checkpoint.
