
Convert vocabulary types and load model concurrently #832

Open · wants to merge 2 commits into main from move-vocab-type-conversion

Conversation

rlouf (Member) commented on Apr 21, 2024

Converting vocabulary types and JIT-compiling the Numba functions can account for a substantial share of the compilation time for very simple regular expressions. In particular, the type conversion has to happen every time the code is run and takes a few seconds per session. Here we perform these operations while the model weights are downloaded and loaded onto the GPU, thus removing some of the overhead associated with index compilation.

Closes #768.
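A minimal sketch of the idea, assuming the two steps are exposed as independent callables (the names below are placeholders for the PR's functions, not the actual outlines API): the vocabulary conversion runs in a worker thread while the model weights are downloaded and loaded.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple


def load_concurrently(
    load_model: Callable[[], Any],
    convert_vocabulary: Callable[[], Any],
) -> Tuple[Any, Any]:
    """Overlap model loading with vocabulary type conversion.

    `load_model` stands in for downloading the weights and moving them to
    the GPU; `convert_vocabulary` stands in for adapting the tokenizer's
    vocabulary and triggering the Numba JIT compilation. Both names are
    hypothetical.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        model_future = pool.submit(load_model)
        vocab_future = pool.submit(convert_vocabulary)
        return model_future.result(), vocab_future.result()
```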

rlouf added the structured generation label on Apr 21, 2024
rlouf (Member, Author) commented on Apr 24, 2024

A few API tweaks are necessary to implement this properly. First, the code that transforms a regex into an index should be decomposed into (sketched below):

  1. A function that takes a regex as an argument and returns a byte-level deterministic FSM;
  2. A function that adapts the vocabulary and then converts the vocabulary types;
  3. A function that takes the converted vocabulary and the tokenizer and returns an index.

We then initialize model wrappers (e.g. outlines.models.vLLM) with the model instance and the converted vocabulary. Initializing functions (e.g. outlines.models.vllm) accept a model name or a model class.
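A hedged sketch of what that decomposition could look like; the function names, signatures, and return types below are illustrative placeholders, not the actual outlines API.

```python
from typing import Dict, List

# Illustrative signatures only: names, argument types, and return types are
# placeholders for the three functions described above.


def make_byte_level_fsm(regex_string: str):
    """Step 1: turn a regex into a byte-level deterministic FSM."""
    ...


def convert_vocabulary(tokenizer) -> Dict[str, List[int]]:
    """Step 2: adapt the tokenizer's vocabulary and convert its types
    (e.g. into Numba-friendly containers) so this only happens once."""
    ...


def build_index(fsm, converted_vocabulary, tokenizer) -> Dict[int, Dict[int, int]]:
    """Step 3: combine the FSM and the converted vocabulary into the
    token-level transition index used during generation."""
    ...
```

With this split, step 2 is exactly the work that can run while the model weights are being downloaded and loaded.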

rlouf force-pushed the move-vocab-type-conversion branch from 29742a9 to 6deef74 on April 24, 2024
Successfully merging this pull request may close the following issue: Speed up index construction by converting vocabulary types while loading the model (#768).