
YAML Guide

TokenMonster supports YAML vocabularies, both for creating custom vocabularies (vocabularies not trained by TokenMonster) and for editing existing TokenMonster vocabularies. Any TokenMonster vocabulary can be exported to YAML and imported back from it, either with the exportvocab tool in the training directory or with the Python and Go libraries.

See example.yaml for a sample of the TokenMonster YAML vocabulary format.

convert_gpt2tokenizer.py converts the GPT2 Tokenizer from Hugging Face into a TokenMonster vocabulary. The converted vocabulary runs faster and tokenizes better than the original, and the script is a good example of how to import a vocabulary into TokenMonster format using YAML as an intermediary.