DistillClassifier

About

DistillClassifier is a tool built on top of LLM-VM that uses LLMs to generate synthetic data for classification tasks. The generated data can then be used to distill an LLM's knowledge of a task into much smaller, faster-to-run classification models.

This project was built for the ANARCHY October 2023 Hackathon. Check out ANARCHY on their GitHub and website.

Team Members:

Setup

Clone the project from GitHub:

git clone https://github.com/daspartho/DistillClassifier

`cd` into the project:

cd DistillClassifier

Install LLM-VM:

git clone https://github.com/anarchy-ai/LLM-VM.git
cd LLM-VM
pip3 install .
cd ..

Install the Python dependencies:

pip3 install -r requirements.txt

Create a `.env` file and set your OpenAI API key (if you want to use OpenAI models) and your Hugging Face Hub token (if you want to push the dataset to the Hugging Face Hub):

OPENAI_API_KEY=
HF_HUB_TOKEN=
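Before running the tool, it can help to confirm that the `.env` file contains the values you need. The source does not say how the project itself loads this file, so the following is only a minimal standard-library sketch. The helper names `parse_env_file` and `missing_keys` are hypothetical and are not part of DistillClassifier or LLM-VM.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; the project may load .env differently.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=("OPENAI_API_KEY", "HF_HUB_TOKEN")):
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_file(open(".env").read()))` lists the keys that still need a value. Both keys are optional depending on your use case, so an empty result is not required in every setup.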

Run

You can run the tool from the command line like this:

python3 generation.py <columns> <n_examples> [-m <model>] [-f <filename>] [-r <repo>]

Arguments:

<columns>: Column information as a dictionary that maps each column name to a description of its contents.
<n_examples>: Number of examples to generate.
-m, --model: (Optional) Model name. Defaults to "chat_gpt".
-f, --filename: (Optional) Dataset filename. Defaults to "dataset.json".
-r, --repo: (Optional) Hugging Face repo ID. Defaults to "None".
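The interface described above can be sketched with `argparse`. This is not the actual contents of `generation.py`; it is an illustration that assumes `<columns>` is parsed as JSON and that the repo default is Python's `None`, neither of which the source confirms. The function name `build_parser` is hypothetical.

```python
import argparse
import json


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="generation.py")
    # Assumption: the dictionary is passed as a JSON string and decoded here.
    parser.add_argument("columns", type=json.loads,
                        help="column information as a JSON dictionary")
    parser.add_argument("n_examples", type=int,
                        help="number of examples to generate")
    parser.add_argument("-m", "--model", default="chat_gpt",
                        help="model name")
    parser.add_argument("-f", "--filename", default="dataset.json",
                        help="dataset filename")
    # Assumption: an unset repo means the dataset is not pushed anywhere.
    parser.add_argument("-r", "--repo", default=None,
                        help="Hugging Face repo ID")
    return parser
```

With this sketch, `build_parser().parse_args(['{"text": "a", "label": "b"}', "25"])` yields a dictionary for `columns`, the integer `25` for `n_examples`, and the documented defaults for the optional flags.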

Example:

python3 generation.py '{"text": "either spoiler or not spoiler text", "label": "if text is spoiler or not"}' 25 -m 'chat_gpt' -f 'dataset.json' -r 'spoiler_or_not'
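Quoting a dictionary by hand on the command line is easy to get wrong. As a sketch, the same command can be assembled in Python so that the dictionary is serialized and shell-quoted automatically. The helper `build_command` is hypothetical, and the approach assumes that `generation.py` accepts the JSON form shown in the example above.

```python
import json
import shlex


def build_command(columns, n_examples, model=None, filename=None, repo=None):
    """Assemble the argument list for generation.py (illustrative helper)."""
    cmd = ["python3", "generation.py", json.dumps(columns), str(n_examples)]
    if model:
        cmd += ["-m", model]
    if filename:
        cmd += ["-f", filename]
    if repo:
        cmd += ["-r", repo]
    return cmd


columns = {
    "text": "either spoiler or not spoiler text",
    "label": "if text is spoiler or not",
}
cmd = build_command(columns, 25, model="chat_gpt",
                    filename="dataset.json", repo="spoiler_or_not")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form of `cmd` can also be passed directly to `subprocess.run(cmd)`, which avoids shell quoting altogether.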

Or run the `demo.py` file directly:

python3 demo.py


License
