A Local GenAI API Server: A drop-in replacement for OpenAI's API for Local GenAI

EdgenChat, a local chat app powered by ⚡Edgen

OpenAI Compliant API: ⚡Edgen implements an OpenAI compatible API, making it a drop-in replacement.
Multi-Endpoint Support: ⚡Edgen exposes multiple AI endpoints such as chat completions (LLMs) and speech-to-text (Whisper) for audio transcriptions.
Model Agnostic: LLMs (Llama2, Mistral, Mixtral...), Speech-to-text (whisper) and many others.
Optimized Inference: You don't need to take a PhD in AI optimization. ⚡Edgen abstracts the complexity of optimizing inference for different hardware, platforms and models.
Modular: ⚡Edgen is model and runtime agnostic. New models can be added easily and ⚡Edgen can select the best runtime for the user's hardware: you don't need to keep up about the latest models and ML runtimes - ⚡Edgen will do that for you.
Model Caching: ⚡Edgen caches foundational models locally, so 1 model can power hundreds of different apps - users don't need to download the same model multiple times.
Native: ⚡Edgen is built in 🦀Rust and is natively compiled to all popular platforms: Windows, MacOS and Linux. No docker required.
Graphical Interface: A graphical user interface to help users efficiently manage their models, endpoints and permissions.

⚡Edgen lets you use GenAI in your app, completely locally on your user's devices, for free and with data-privacy. It's a drop-in replacement for OpenAI (it uses the a compatible API), supports various functions like text generation, speech-to-text and works on Windows, Linux, and MacOS.

Features

Session Caching: ⚡Edgen maintains top performance with big contexts (big chat histories), by caching sessions. Sessions are auto-detected in function of the chat history.
GPU support: CUDA, Vulkan. Metal

Endpoints

[Chat] Completions
[Audio] Transcriptions
[Embeddings] Embeddings
[Image] Generation
[Chat] Multimodal chat completions
[Audio] Speech

Supported Models

Check in the documentation

Supported platforms

Windows
Linux
MacOS

🔥 Hot Topics

Why local GenAI?

Data Private: On-device inference means users' data never leave their devices.
Scalable: More and more users? No need to increment cloud computing infrastructure. Just let your users use their own hardware.
Reliable: No internet, no downtime, no rate limits, no API keys.
Free: It runs locally on hardware the user already owns.

Quickstart

Download and start ⚡Edgen
Chat with ⚡EdgenChat

Ready to start your own GenAI application? Checkout our guides!

⚡Edgen usage:

Usage: edgen [<command>] [<args>]

Toplevel CLI commands and options. Subcommands are optional. If no command is provided "serve" will be invoked with default options.

Options:
  --help            display usage information

Commands:
  serve             Starts the edgen server. This is the default command when no
                    command is provided.
  config            Configuration-related subcommands.
  version           Prints the edgen version to stdout.
  oasgen            Generates the Edgen OpenAPI specification.

edgen serve usage:

Usage: edgen serve [-b <uri...>] [-g]

Starts the edgen server. This is the default command when no command is provided.

Options:
  -b, --uri         if present, one or more URIs/hosts to bind the server to.
                    `unix://` (on Linux), `http://`, and `ws://` are supported.
                    For use in scripts, it is recommended to explicitly add this
                    option to make your scripts future-proof.
  -g, --nogui       if present, edgen will not start the GUI; the default
                    behavior is to start the GUI.
  --help            display usage information

GPU Support

⚡Edgen also supports compilation and execution on a GPU, when building from source, through Vulkan, CUDA and Metal. The following cargo features enable the GPU:

llama_vulkan - execute LLM models using Vulkan. Requires a Vulkan SDK to be installed.
llama_cuda - execute LLM models using CUDA. Requires a CUDA Toolkit to be installed.
llama_metal - execute LLM models using Metal.
whisper_cuda - execute Whisper models using CUDA. Requires a CUDA Toolkit to be installed.

Note that, at the moment, llama_vulkan, llama_cuda and llama_metal cannot be enabled at the same time.

Example usage (building from source, you need to first install the prerequisites):

cargo run --features llama_vulkan --release -- serve

Architecture Overview

⚡Edgen architecture overview

Contribute

If you don't know where to start, check Edgen's roadmap! Before you start working on something, see if there's an existing issue/pull-request. Pop into Discord to check with the team or see if someone's already tackling it.

Communication Channels

Edgen Discord server: Real time discussions with the ⚡Edgen team and other users.
GitHub issues: Feature requests, bugs.
GitHub discussions: Q&A.
Blog: Big announcements.

Special Thanks

llama.cpp, whisper.cpp, and ggml for being an excellent getting-on point for this space.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

A Local GenAI API Server: A drop-in replacement for OpenAI's API for Local GenAI

Features

Endpoints

Supported Models

Supported platforms

🔥 Hot Topics

Why local GenAI?

Quickstart

GPU Support

Architecture Overview

Contribute

Communication Channels

Special Thanks

Files

README.md

Latest commit

History

README.md

File metadata and controls

A Local GenAI API Server: A drop-in replacement for OpenAI's API for Local GenAI

Features

Endpoints

Supported Models

Supported platforms

🔥 Hot Topics

Why local GenAI?

Quickstart

GPU Support

Architecture Overview

Contribute

Communication Channels

Special Thanks