
Convert vocabulary types and load model concurrently #832

Open · wants to merge 2 commits into main from move-vocab-type-conversion

Conversation

rlouf (Member) commented on Apr 21, 2024

Converting vocabulary types and JIT-compiling the Numba functions can account for a substantial share of the compilation time for very simple regular expressions. In particular, the type conversion has to happen every time the code is run and takes a few seconds per session. Here we perform these operations while the model weights are downloaded and loaded onto the GPU, thus removing some of the overhead associated with index compilation.

Closes #768.
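A minimal sketch of the idea, assuming the two steps are exposed as independent callables (the names below are placeholders for the PR's functions, not the actual outlines API): the vocabulary conversion runs in a worker thread while the model weights are downloaded and loaded.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple


def load_concurrently(
    load_model: Callable[[], Any],
    convert_vocabulary: Callable[[], Any],
) -> Tuple[Any, Any]:
    """Overlap model loading with vocabulary type conversion.

    `load_model` stands in for downloading the weights and moving them to
    the GPU; `convert_vocabulary` stands in for adapting the tokenizer's
    vocabulary and triggering the Numba JIT compilation. Both names are
    hypothetical.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        model_future = pool.submit(load_model)
        vocab_future = pool.submit(convert_vocabulary)
        return model_future.result(), vocab_future.result()
```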

rlouf added the structured generation label on Apr 21, 2024
rlouf (Member, Author) commented on Apr 24, 2024

A few API tweaks are necessary to implement this properly. First, the code that transforms a regex into an index should be decomposed into (sketched below):

  1. A function that takes a regex as an argument and returns a byte-level deterministic FSM;
  2. A function that adapts the vocabulary and then converts the vocabulary types;
  3. A function that takes the converted vocabulary and the tokenizer and returns an index.

We then initialize model wrappers (e.g. outlines.models.vLLM) with the model instance and the converted vocabulary. Initializing functions (e.g. outlines.models.vllm) accept a model name or a model class.
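A hedged sketch of what that decomposition could look like; the function names, signatures, and return types below are illustrative placeholders, not the actual outlines API.

```python
from typing import Dict, List

# Illustrative signatures only: names, argument types, and return types are
# placeholders for the three functions described above.


def make_byte_level_fsm(regex_string: str):
    """Step 1: turn a regex into a byte-level deterministic FSM."""
    ...


def convert_vocabulary(tokenizer) -> Dict[str, List[int]]:
    """Step 2: adapt the tokenizer's vocabulary and convert its types
    (e.g. into Numba-friendly containers) so this only happens once."""
    ...


def build_index(fsm, converted_vocabulary, tokenizer) -> Dict[int, Dict[int, int]]:
    """Step 3: combine the FSM and the converted vocabulary into the
    token-level transition index used during generation."""
    ...
```

With this split, step 2 is exactly the work that can run while the model weights are being downloaded and loaded.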

rlouf force-pushed the move-vocab-type-conversion branch from 29742a9 to 6deef74 on April 24, 2024
Successfully merging this pull request may close the following issue: Speed up index construction by converting vocabulary types while loading the model (#768).