Loading quant_bert pretrained weights for MRPC #150

Open
amrnag01 opened this issue Mar 23, 2020 · 1 comment

amrnag01 commented Mar 23, 2020

When I run the following command to fine-tune Quantized BERT on MRPC,
nlp-train transformer_glue \
    --task_name mrpc \
    --model_name_or_path bert-base-uncased \
    --model_type quant_bert \
    --learning_rate 2e-5 \
    --output_dir /tmp/mrpc-8bit \
    --evaluate_during_training \
    --data_dir /path/to/MRPC \
    --do_lower_case

I get the following message:
INFO Weights of QuantizedBertForSequenceClassification not initialized from pretrained model: ['bert.embeddings.word_embeddings._step', 'bert.embeddings.position_embeddings._step', 'bert.embeddings.token_type_embeddings._step', 'bert.encoder.layer.0.attention.self.query._step', 'bert.encoder.layer.0.attention.self.query.input_thresh', 'bert.encoder.layer.0.attention.self.query.output_thresh', 'bert.encoder.layer.0.attention.self.key._step', 'bert.encoder.layer.0.attention.self.key.input_thresh', 'bert.encoder.layer.0.attention.self.key.output_thresh', 'bert.encoder.layer.0.attention.self.value._step', 'bert.encoder.layer.0.attention.self.value.input_thresh', 'bert.encoder.layer.0.attention.output.dense._step', 'bert.encoder.layer.0.attention.output.dense.input_thresh', 'bert.encoder.layer.0.intermediate.dense._step', 'bert.encoder.layer.0.intermediate.dense.input_thresh', 'bert.encoder.layer.0.output.dense._step', 'bert.encoder.layer.0.output.dense.input_thresh', 'bert.encoder.layer.1.attention.self.query._step',

... for all the layers. Can you please help me figure out why all these weights are not initialized from the pretrained model? It works when I set model_type to bert instead of quant_bert.
Thanks a lot.

ofirzaf (Collaborator) commented Apr 16, 2020

Note that this message only says that the input_thresh/output_thresh and _step attributes are not initialized from the pre-trained model, which is expected since the pre-trained model wasn't trained with quantization in mind. If the actual weights weren't being initialized, you would see entries such as bert.encoder.layer.1.attention.self.query.weight and bert.encoder.layer.1.attention.self.query.bias in that list.
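As a quick sanity check, you can compare the two state dicts yourself. This is just a minimal sketch; `model` and `pretrained_state` are placeholder names for the QuantizedBertForSequenceClassification instance and the bert-base-uncased state dict:

```python
def check_missing_keys(model, pretrained_state):
    """Return missing keys that are NOT quantization bookkeeping.

    Apart from a freshly initialized task head, this list should be empty;
    the encoder's .weight and .bias entries must not appear here.
    """
    quant_suffixes = ("._step", ".input_thresh", ".output_thresh")
    missing = [k for k in model.state_dict() if k not in pretrained_state]
    return [k for k in missing if not k.endswith(quant_suffixes)]
```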

The quantized fully connected (FC) layers used in the quantized BERT model require additional information, such as the input threshold and the output threshold (the thresholds are used to quantize the input and output tensors), which is not available in the pre-trained model.
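To make that concrete, here is a minimal sketch (not NLP Architect's actual implementation) of a linear layer that carries this extra quantization state. The buffer names mirror the ones in the log above, but the EMA update rule and the 8-bit symmetric grid are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(x, threshold, bits=8):
    """Quantize-dequantize x on a symmetric `bits`-bit grid clipped at +/-threshold."""
    threshold = float(threshold)
    if threshold == 0.0:           # nothing observed yet; pass through unquantized
        return x
    qmax = 2 ** (bits - 1) - 1     # 127 for 8 bits
    scale = threshold / qmax
    q = torch.round(x.clamp(-threshold, threshold) / scale) * scale
    # Straight-through estimator: quantized values in the forward pass,
    # identity gradient in the backward pass.
    return x + (q - x).detach()


class QuantizedLinearSketch(nn.Linear):
    """Linear layer that fake-quantizes its input, weight and output (illustrative only)."""

    def __init__(self, in_features, out_features, ema_decay=0.9):
        super().__init__(in_features, out_features)
        self.ema_decay = ema_decay
        # Extra quantization state. A vanilla bert-base-uncased checkpoint has no
        # such keys, which is exactly why they show up as "not initialized".
        self.register_buffer("_step", torch.zeros(1))
        self.register_buffer("input_thresh", torch.zeros(1))
        self.register_buffer("output_thresh", torch.zeros(1))

    def _update(self, thresh, tensor):
        # Running estimate of the largest absolute value seen during training.
        observed = tensor.abs().max().detach()
        if float(self._step) <= 1:
            thresh.copy_(observed)
        else:
            thresh.mul_(self.ema_decay).add_((1.0 - self.ema_decay) * observed)

    def forward(self, x):
        if self.training:
            self._step += 1
            self._update(self.input_thresh, x)
        x = fake_quantize(x, self.input_thresh)
        w = fake_quantize(self.weight, self.weight.abs().max())
        out = F.linear(x, w, self.bias)
        if self.training:
            self._update(self.output_thresh, out)
        return fake_quantize(out, self.output_thresh)
```

When such a layer is initialized from a vanilla checkpoint, only weight and bias have pre-trained values; _step, input_thresh and output_thresh start from scratch and are learned/tracked during quantization-aware fine-tuning.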

In other words, everything is working correctly for you. When you later load a model you trained with quantization for inference, you will see that these attributes are loaded from the quantized checkpoint.
