
CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

This repo contains the code for our paper "CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing" (ACL 2022). We propose a parameter-efficient ensemble approach for large-scale language models based on consistency-regularized perturbed models with weight sharing.


Getting Started

  1. Pull and run the Docker image (example commands follow this list)
    pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel
  2. Install requirements
  2. Install requirements
    pip install -r requirements.txt
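
For reference, a minimal sketch of step 1 is shown below; the --gpus flag and the mount path are assumptions that depend on your local Docker and NVIDIA setup, so adjust them as needed.

    docker pull pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel
    # Mount the repo into the container and enable GPUs; adjust flags for your setup.
    docker run -it --gpus all -v $(pwd):/workspace pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel bash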

Data and Model

  1. Download the data and pre-trained models following download.sh. Please refer to the GLUE benchmark website for details on the GLUE tasks.
  2. Preprocess the data following experiments/glue/prepro.sh. For the most up-to-date data processing details, please refer to the mt-dnn repo.
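
Assuming you run from the repository root, the two steps above might look like the following sketch; the exact invocation may differ, so check the script headers before running.

    # Step 1: fetch data and pre-trained checkpoints.
    bash download.sh
    # Step 2: preprocess the GLUE data.
    bash experiments/glue/prepro.sh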

Training CAMERO

We provide several example scripts for fine-tuning a consistency-regularized ensemble of perturbed models with weight sharing. To fine-tune a consistency-regularized ensemble of perturbed BERT-base models on the MNLI dataset, run

./scripts/train_mnli.sh GPUID

CAMERO has several important hyper-parameters that you can tune (an example command follows this list):

  • --n_models: The number of models, e.g., 2 and 4.
  • --teaching_type: The type of consistency regularization.
    • "ensemble": the consistency loss is computed based on the average distance between the ensemble of all models' logits and individual models' logits.
    • "pairwise": the consistency loss is computed based on the average distance between every two models' logits.
  • --pert_type: The type of perturbation added to the models' hidden representations.
  • --kd_alpha: The weight of the consistency loss. Its best value is sensitive to the task type.
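
As a sketch, the flags above could be combined as in the command below; the entry-point name, the --pert_type value, and the --kd_alpha value are placeholders, so check scripts/train_mnli.sh for the actual invocation and supported values.

    # Hypothetical invocation: ensemble of 2 perturbed models with pairwise consistency.
    python train.py \
      --n_models 2 \
      --teaching_type pairwise \
      --pert_type dropout \
      --kd_alpha 1.0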

A few other notes:

  • To fine-tune a RoBERTa model, download the model checkpoint following download.sh, set --init_checkpoint to the checkpoint path, and set --encoder_type to 2 (an example command follows this list). Other supported models are listed in pretrained_models.py.
  • To fine-tune models on other tasks, set --train_datasets and --test_datasets to the corresponding task names.
  • All models share their encoder weights. The final saved checkpoint is a single encoder with n_models classification heads.
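
For illustration, a sketch combining these flags is shown below; the entry-point name, checkpoint path, and dataset names are placeholders, so adapt them from scripts/train_mnli.sh and your local paths.

    # Hypothetical invocation: fine-tune a 2-model RoBERTa ensemble on another GLUE task.
    python train.py \
      --init_checkpoint checkpoints/roberta_base.pt \
      --encoder_type 2 \
      --train_datasets rte \
      --test_datasets rte \
      --n_models 2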

Citation

Coming soon.


Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang (cliang73@gatech.edu).