- A unified API for testing and integrating OpenAI and Hugging Face LLM models.
- Load models from Hugging Face with just a URL.
- Uses the llama.cpp server API rather than bindings, so this project will remain usable as long as the llama.cpp server API stays stable.
- Prebuilt agents - not chatbots - to unlock the true power of LLMs.
```rust
// Use an OpenAI model
let llm_definition = LlmDefinition::OpenAiLlm(OpenAiDef::Gpt35Turbo);

// Or use a model from Hugging Face
let llm_definition: LlmDefinition = LlmDefinition::LlamaLlm(LlamaDef::new(
    MISTRAL7BCHAT_MODEL_URL,
    LlamaPromptFormat::Mistral7BChat,
    Some(9001),  // Max tokens for model AKA context size
    Some(2),     // Number of threads to use for the server
    Some(22),    // Layers to load to GPU; dependent on available VRAM
    Some(false), // Start the llama.cpp server with the embedding flag disabled
    Some(true),  // Logging enabled
));

// llm_definition is already an LlmDefinition, so pass it by reference
let response = basic_text_gen::generate(&llm_definition, Some("Howdy!")).await?;
println!("{}", response);
```
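The `LlamaPromptFormat::Mistral7BChat` parameter above tells the client how to wrap a raw prompt in the model's instruction template before sending it to the server. As an illustration of what such a format does, here is a minimal sketch of a Mistral-style instruct template; the exact template string the crate emits is an assumption and may differ in detail:

```rust
// Illustrative sketch only: wrap a user message in a Mistral-style
// instruct template. Not necessarily the crate's exact formatting.
fn format_mistral_chat(user_message: &str) -> String {
    format!("<s>[INST] {} [/INST]", user_message)
}

fn main() {
    // The model sees the wrapped prompt, not the raw string.
    println!("{}", format_mistral_chat("Howdy!"));
}
```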
```rust
if !boolean_classifier::classify(
    llm_definition,
    Some(hopefully_a_list),
    Some("Is the attached feature a list of content split into discrete entries?"),
)
.await?
{
    panic!("{} was not properly split into a list!", hopefully_a_list)
}
```
```rust
let client_openai: ProviderClient =
    ProviderClient::new(&LlmDefinition::OpenAiLlm(OpenAiDef::EmbeddingAda002), None).await;

let _: Vec<Vec<f32>> = client_openai
    .generate_embeddings(
        &vec![
            "Hello, my dog is cute".to_string(),
            "Hello, my cat is cute".to_string(),
        ],
        Some(EmbeddingExceedsMaxTokensBehavior::Panic),
    )
    .await
    .unwrap();
```
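The returned `Vec<Vec<f32>>` can be compared directly; cosine similarity is the usual metric for embeddings. A self-contained sketch using only the standard library (no crate API assumed):

```rust
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// Values near 1.0 mean the embedded texts are semantically close.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Parallel vectors score 1.0; orthogonal vectors score 0.0.
    println!("{}", cosine_similarity(&[1.0, 2.0], &[2.0, 4.0]));
}
```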
- Currently with limited support for llama.cpp
```shell
# Start the server with a model from a Hugging Face URL
cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
# Once ready, the server logs: llama server listening at http://localhost:8080

# Stop the server
cargo run -p llm_client --bin server_runner stop

# Load a model with the model loader CLI
cargo run -p llm_client --bin model_loader_cli --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
```
async-openai is used to interact with the OpenAI API. A modified version of the async-openai crate is used for the llama.cpp server. If you just need an OpenAI API interface, I suggest using the async-openai crate.
Hugging Face's Rust client is used for model downloads from the Hugging Face Hub.
- Clone repo:
```shell
git clone https://github.com/ShelbyJenkins/llm_client.git
cd llm_client
```
- Optional: build the devcontainer from `llm_client/.devcontainer/devcontainer.json`. This will build out a dev container with NVIDIA dependencies installed.
- Add llama.cpp:
```shell
git submodule init
git submodule update
```
- Build llama.cpp (this is dependent on your hardware; please see the full instructions here):
```shell
# Example build for NVIDIA GPUs
cd llm_client/src/providers/llama_cpp/llama_cpp
make LLAMA_CUDA=1
```
- Test the llama.cpp `./server`:
```shell
cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
```
This will download and load the given model, and then start the server. When you see `llama server listening at http://localhost:8080`, you can load the llama.cpp UI in your browser. Stop the server with:
```shell
cargo run -p llm_client --bin server_runner stop
```
- Using OpenAI: add a `.env` file in the llm_client dir with the variable `OPENAI_API_KEY=<key>`.
- Handle the various prompt formats of LLM models more gracefully
- Unit tests
- Add additional classifier agents:
- many from many
- one from many
- Implement all OpenAI functionality with llama.cpp
- More external APIs (Claude, etc.)
This is my first Rust crate. All contributions and feedback are more than welcome!
Distributed under the MIT License. See LICENSE.txt for more information.
Shelby Jenkins - Here or LinkedIn