Speculative Decoding with Big Little Decoder (BiLD)

This repo implements Speculative Decoding with Big Little Decoder (BiLD) on top of the HuggingFace framework.

Check out the paper for more details.

What is Big Little Decoder?

Big Little Decoder is a simple framework that enables faster generative inference. It accelerates text generation by roughly 2x without compromising performance across a variety of text generation scenarios. It is also a plug-and-play solution that requires no training or architecture redesign.

Here's the key underlying idea:

BiLD offloads the majority of simple word decisions to a smaller model and only switches control back to the larger model when needed.
The small model "falls back" to the large model when it runs into a hard-to-predict word.
If the small model makes a misstep, the larger model can "roll back" the predictions to correct the error.
This collaborative text generation combines the small model's fast autoregressive execution with the large model's accurate and efficient non-autoregressive execution!
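
The interplay of fallback and rollback can be illustrated with a toy sketch. This is not the code from this repo: the models are stand-in functions that map a token list to a dictionary of next-token probabilities, and all names (`bild_generate`, `small_step`, `large_step`, `first_unverified`) are hypothetical. The sketch assumes that fallback triggers when the small model's top probability drops below a threshold, and that rollback triggers when the negative log-probability the large model assigns to a drafted token exceeds a threshold. The large model's check is written as a loop for clarity; a real model would score all drafted positions in a single forward pass.

```python
import math


def argmax(probs):
    """Return (token, probability) for the most likely entry of a dict."""
    token = max(probs, key=probs.get)
    return token, probs[token]


def bild_generate(small_step, large_step, prefix, max_len,
                  fallback_threshold, rollback_threshold):
    tokens = list(prefix)
    first_unverified = len(tokens)  # tokens before this index were checked
    while len(tokens) < max_len:
        token, confidence = argmax(small_step(tokens))
        if confidence >= fallback_threshold:
            tokens.append(token)  # small model is confident: keep drafting
            continue
        # Fallback: the large model checks the tokens drafted so far.
        for i in range(first_unverified, len(tokens)):
            large_probs = large_step(tokens[:i])
            # Assumed distance: negative log-probability of the drafted token.
            distance = -math.log(max(large_probs.get(tokens[i], 0.0), 1e-12))
            if distance > rollback_threshold:
                tokens = tokens[:i]  # rollback: drop this token and all after
                break
        # The large model predicts the token at the current position.
        token, _ = argmax(large_step(tokens))
        tokens.append(token)
        first_unverified = len(tokens)
    return tokens


# Toy models: the large model always knows the target sequence.
TARGET = ["a", "b", "c", "d", "e"]


def large_step(tokens):
    return {TARGET[len(tokens)]: 1.0}


def small_step(tokens):
    pos = len(tokens)
    if pos == 2:  # confident but wrong: will be rolled back later
        return {"x": 0.9, "c": 0.1}
    if pos == 3:  # unsure: triggers fallback
        return {"d": 0.4, "y": 0.35, "z": 0.25}
    return {TARGET[pos]: 0.95, "y": 0.05}


result = bild_generate(small_step, large_step, [], max_len=5,
                       fallback_threshold=0.5, rollback_threshold=2.0)
print(result)
```

In this toy run the small model drafts a wrong third token with high confidence, hesitates on the fourth, and hands control to the large model, which then rolls back the wrong token and continues from there. For brevity the sketch does not verify tokens drafted after the last fallback, and it ignores end-of-sequence handling.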

Running BiLD for Machine Translation

Prerequisite

You need to prepare your own large and small models. You can either use HuggingFace's pretrained models or finetune them on your target tasks. Please refer to HuggingFace's official instructions for more details on loading and/or finetuning pretrained models.

Evaluation

We provide a script that evaluates BiLD on machine translation tasks: examples/pytorch/run_bild_translation.py.

BiLD evaluation command:

CUDA_VISIBLE_DEVICES=0 python run_bild_translation.py --model bild --small [small_model_path] --large [large_model_path] \
    --dataset_name iwslt2017 --dataset_config iwslt2017-de-en --source_lang de --target_lang en --bild_rollback [RB] --bild_fallback [FB]

This command runs BiLD on the IWSLT 2017 De-En translation task.
[small_model_path] and [large_model_path] are paths to the small and the large model, respectively (prepared as a prerequisite).
[RB] is the rollback threshold (values from 2 to 5 normally work well). [FB] is the fallback threshold, which can take a value from 0 to 1. For more details on these two hyperparameters, please refer to our paper.
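
Since both thresholds are tuned by hand, it can help to generate the evaluation commands for a small grid of values. The following is a minimal sketch that only builds and prints command strings using the flags shown above. The helper name `build_bild_command`, the placeholder model paths, and the specific fallback values are assumptions for illustration, not recommended settings.

```python
import itertools
import shlex


def build_bild_command(small_path, large_path, rollback, fallback):
    """Assemble the BiLD evaluation command for one (RB, FB) pair."""
    args = [
        "python", "run_bild_translation.py",
        "--model", "bild",
        "--small", small_path,
        "--large", large_path,
        "--dataset_name", "iwslt2017",
        "--dataset_config", "iwslt2017-de-en",
        "--source_lang", "de",
        "--target_lang", "en",
        "--bild_rollback", str(rollback),
        "--bild_fallback", str(fallback),
    ]
    return "CUDA_VISIBLE_DEVICES=0 " + shlex.join(args)


rollbacks = [2, 3, 4, 5]      # range suggested above
fallbacks = [0.3, 0.5, 0.7]   # illustrative values in [0, 1]

commands = [
    build_bild_command("path/to/small", "path/to/large", rb, fb)
    for rb, fb in itertools.product(rollbacks, fallbacks)
]
for command in commands:
    print(command)
```

Each printed line can be pasted into a shell or collected into a job script, after replacing the placeholder paths with the actual model locations.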

We also provide a command for running the baseline model:

CUDA_VISIBLE_DEVICES=0 python run_bild_translation.py --model [model_path] \
    --dataset_name iwslt2017 --dataset_config iwslt2017-de-en --source_lang de --target_lang en

[model_path] is the path to the baseline model (e.g., [small_model_path] or [large_model_path]).

Pretrained Checkpoints

We provide finetuned checkpoints that were used for the evaluations in our paper.

Dataset	Model	Link
IWSLT-2017-De-En	mT5-small	link
IWSLT-2017-De-En	mT5-small (aligned)	link
IWSLT-2017-De-En	mT5-large	link
WMT-2014-De-En	mT5-small	link
WMT-2014-De-En	mT5-small (aligned)	link
WMT-2014-De-En	mT5-large	link
XSUM	T5-small	link
XSUM	T5-small (aligned)	link
XSUM	T5-large	link
CNNDM	T5-small	link
CNNDM	T5-small (aligned)	link
CNNDM	T5-large	link

Repository: kssteven418/BigLittleDecoder
