krozzzis/smol_llm


Smol Language Model

It's like a Large Language Model, but smol.

I don't use any dependencies in this project (except for the rand crate). All subprojects will be made from scratch and only in Rust.

Roadmap

  • BPE Tokenizer
  • Interactive tokenizer and vocab viewer in HTML/WASM
  • Simple language model (e.g. using a Markov chain)
  • Telegram bot and web app for models
  • Simple neural network implementation
  • Neural network training/inference framework
  • Word embeddings (word2vec)
  • Interactive word similarity viewer in HTML/WASM
  • Generative model for text

Usage

Tokenizer

$ cargo run --bin tokenizer_cli
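For readers new to BPE, here is a minimal std-only sketch of the byte-level BPE training loop: start from raw bytes (ids 0..=255), repeatedly find the most frequent adjacent pair, and replace it with a new id (256, 257, ...). This is not necessarily how tokenizer_cli is implemented; the functions count_pairs, merge and train_bpe are hypothetical names, and breaking ties toward the smallest pair is an assumption made only so the output is deterministic.

```rust
use std::collections::HashMap;

/// Count how often each adjacent pair of token ids occurs in `ids`.
fn count_pairs(ids: &[u32]) -> HashMap<(u32, u32), usize> {
    let mut counts = HashMap::new();
    for w in ids.windows(2) {
        *counts.entry((w[0], w[1])).or_insert(0) += 1;
    }
    counts
}

/// Replace every non-overlapping occurrence of `pair` (scanning left to right) with `new_id`.
fn merge(ids: &[u32], pair: (u32, u32), new_id: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(ids.len());
    let mut i = 0;
    while i < ids.len() {
        if i + 1 < ids.len() && (ids[i], ids[i + 1]) == pair {
            out.push(new_id);
            i += 2;
        } else {
            out.push(ids[i]);
            i += 1;
        }
    }
    out
}

/// Learn up to `num_merges` merge rules. Returns the final token ids and the
/// learned rules as ((left, right), new_id), in the order they were learned.
fn train_bpe(text: &str, num_merges: u32) -> (Vec<u32>, Vec<((u32, u32), u32)>) {
    // Byte-level start: every byte is its own token, ids 0..=255.
    let mut ids: Vec<u32> = text.bytes().map(u32::from).collect();
    let mut merges = Vec::new();
    for k in 0..num_merges {
        let counts = count_pairs(&ids);
        // Most frequent pair; on equal counts prefer the smaller pair (assumed tie-break).
        let best = counts
            .into_iter()
            .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)));
        let Some((pair, count)) = best else { break };
        // Stop when no pair repeats: merging it would not compress anything.
        if count < 2 {
            break;
        }
        let new_id = 256 + k;
        ids = merge(&ids, pair, new_id);
        merges.push((pair, new_id));
    }
    (ids, merges)
}

fn main() {
    let (ids, merges) = train_bpe("aaabdaaabac", 10);
    for ((a, b), id) in &merges {
        println!("{a} + {b} -> {id}");
    }
    println!("{ids:?}");
}
```

On "aaabdaaabac" this learns three merges (97 + 97 -> 256, 97 + 98 -> 257, 256 + 257 -> 258) and stops, shrinking the 11 input bytes to the 5 tokens [258, 100, 258, 97, 99]. Encoding new text would replay the learned merges in the same order.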

Markov chain

Interactive mode:

$ cargo run --bin markov_chain -- content/vocab.vcb content/corpus.txt
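As an illustration of the idea behind this mode, below is a minimal sketch of a word-level, first-order Markov chain: count which word follows which in the corpus, then generate text by sampling each next word in proportion to those counts. It is an assumption-laden stand-in, not the markov_chain binary itself (which takes a vocab file, so it likely works on tokenizer output rather than whitespace-split words). MarkovChain, next_word, generate and XorShift are hypothetical names, and the tiny XorShift generator replaces the rand crate only to keep the example std-only.

```rust
use std::collections::HashMap;

/// Tiny xorshift64 PRNG, a stand-in for the `rand` crate (seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    /// Uniform float in [0, 1) built from the top 53 bits of the state.
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// For each word, how many times each successor word follows it in the corpus.
struct MarkovChain {
    transitions: HashMap<String, HashMap<String, u32>>,
}

impl MarkovChain {
    fn train(corpus: &str) -> Self {
        let words: Vec<&str> = corpus.split_whitespace().collect();
        let mut transitions: HashMap<String, HashMap<String, u32>> = HashMap::new();
        for w in words.windows(2) {
            *transitions
                .entry(w[0].to_string())
                .or_default()
                .entry(w[1].to_string())
                .or_insert(0) += 1;
        }
        MarkovChain { transitions }
    }

    /// Pick a successor of `current` with probability proportional to its count,
    /// given `r` uniform in [0, 1). Returns None if `current` has no successors.
    fn next_word(&self, current: &str, r: f64) -> Option<&str> {
        let successors = self.transitions.get(current)?;
        // Sort so the choice does not depend on HashMap iteration order.
        let mut items: Vec<(&String, &u32)> = successors.iter().collect();
        items.sort();
        let total: u32 = items.iter().map(|&(_, c)| *c).sum();
        let mut threshold = r * total as f64;
        for &(word, count) in &items {
            if threshold < *count as f64 {
                return Some(word.as_str());
            }
            threshold -= *count as f64;
        }
        // Guard against floating-point rounding at the upper end.
        items.last().map(|&(w, _)| w.as_str())
    }

    /// Generate up to `max_words` words starting from `start`.
    fn generate(&self, start: &str, max_words: usize, rng: &mut XorShift) -> Vec<String> {
        let mut out = vec![start.to_string()];
        while out.len() < max_words {
            let last = out.last().unwrap().clone();
            match self.next_word(&last, rng.next_f64()) {
                Some(w) => out.push(w.to_string()),
                None => break, // dead end: this word was never followed by anything
            }
        }
        out
    }
}

fn main() {
    let chain = MarkovChain::train("the cat sat on the mat");
    let mut rng = XorShift(42);
    println!("{}", chain.generate("the", 8, &mut rng).join(" "));
}
```

Conditioning only on the previous word keeps the model tiny, but it also means the output is locally plausible and globally incoherent; conditioning on the last n words (an n-gram chain) is the usual next step.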