llm-inference
Here are 384 public repositories matching this topic...
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
Updated May 14, 2024 - Jupyter Notebook
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
Updated May 14, 2024 - C++
Reference implementation of the Mistral AI 7B v0.1 model.
Updated Mar 18, 2024 - Jupyter Notebook
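One distinctive piece of the Mistral 7B architecture is sliding-window attention, where each query position attends only to the most recent window of key positions (4,096 tokens in the released model). A minimal sketch of the attention mask, assuming a toy sequence length and window size for illustration:

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """True where query position i may attend to key position j:
    causal (j <= i) and within the last `window` positions."""
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Toy example: with window=3, position 4 attends only to positions 2, 3, 4.
mask = sliding_window_mask(seq_len=5, window=3)
```

The window bounds memory use: attention cost per token is O(window) rather than O(seq_len), which is what makes long-context inference cheaper.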
The most flexible way to serve AI/ML models in production - build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multi-modal and RAG services, and more.
Updated May 14, 2024 - Python
Pretrain, fine-tune, and deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit quantization, LoRA, and more.
Updated May 13, 2024 - Python
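LoRA, one of the fine-tuning techniques listed above, freezes the base weight matrix W and learns a low-rank update B·A scaled by alpha/r, so only the small A and B matrices are trained. A minimal sketch of the forward pass in pure Python (the matrices and rank here are toy values, not anything from the repo):

```python
def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + (alpha / r) * B (A x), with W frozen and rank r = rows of A."""
    r = len(A)
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

# Toy rank-1 example: W is 2x2 identity, A is 1x2, B is 2x1.
W = [[1, 0], [0, 1]]
A = [[1, 1]]
B = [[1], [1]]
y = lora_forward(W, A, B, x=[2, 3])  # [7.0, 8.0]
```

Because A and B are tiny relative to W, the trainable parameter count drops by orders of magnitude, which is why LoRA pairs well with 4-bit quantization of the frozen base weights.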
Run open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints in the cloud.
Updated May 14, 2024 - Python
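"OpenAI-compatible" means the server accepts the same request schema as OpenAI's `/v1/chat/completions` endpoint, so existing clients only need the base URL swapped. A sketch of the request body; the localhost URL and model id are placeholder assumptions, not values from any specific project:

```python
import json

# Hypothetical local endpoint; any OpenAI-compatible server accepts this schema.
BASE_URL = "http://localhost:3000/v1"

body = {
    "model": "mistral-7b-instruct",  # served model id (assumption)
    "messages": [
        {"role": "user", "content": "Summarize LoRA in one sentence."}
    ],
    "max_tokens": 64,
    "temperature": 0.2,
}
payload = json.dumps(body)  # POST this to f"{BASE_URL}/chat/completions"
```

Because the schema matches, the official `openai` Python client can also be pointed at such a server by passing the server's address as `base_url` when constructing the client.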
🔮 SuperDuperDB: Bring AI to your database! Build, deploy, and manage any AI application directly with your existing data infrastructure, without moving your data - including streaming inference, scalable model training, and vector search.
Updated May 14, 2024 - Python
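Vector search, mentioned above, ranks stored embeddings by similarity to a query embedding; cosine similarity is the usual metric. A minimal brute-force sketch (the index layout and function names are illustrative, not SuperDuperDB's API):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    """index: list of (doc_id, vector); returns ids of the k most similar."""
    scored = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 2-D embeddings:
index = [("a", [1, 0]), ("b", [0, 1]), ("c", [1, 1])]
nearest = top_k([1, 0.1], index, k=2)  # ["a", "c"]
```

Production systems replace the linear scan with an approximate index (e.g. HNSW) to keep query time sublinear, but the ranking criterion is the same.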
High-speed large language model serving on PCs with consumer-grade GPUs.
Updated Apr 29, 2024 - C++
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Updated May 10, 2024 - Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Updated May 13, 2024 - Python
Code examples and resources for DBRX, a large language model developed by Databricks.
Updated May 1, 2024 - Python
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local Llama 2 backend for generative agents and apps.
Updated Mar 22, 2024 - Jupyter Notebook
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
Updated May 14, 2024 - Python
Sparsity-aware deep learning inference runtime for CPUs.
Updated May 6, 2024 - Python
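The core idea behind sparsity-aware inference is that pruned (zeroed) weights need not be stored or multiplied at all. A minimal sketch of a sparse matrix-vector product over a compressed row layout; the storage format here is a simple illustration, not the runtime's actual kernel:

```python
def sparse_matvec(rows, x):
    """rows: per-output-row list of (col, value) pairs for nonzero weights.
    Multiplications happen only for stored (nonzero) entries."""
    return [sum(v * x[c] for c, v in row) for row in rows]

# Dense matrix [[0, 2, 0], [1, 0, 3]] stored sparsely (zeros omitted):
rows = [[(1, 2.0)], [(0, 1.0), (2, 3.0)]]
y = sparse_matvec(rows, [1, 1, 1])  # [2.0, 4.0]
```

At high sparsity levels (e.g. 90% of weights pruned), skipping the zeros cuts both memory traffic and arithmetic roughly in proportion, which is what lets CPUs compete with GPUs on pruned models.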
Irresponsible innovation. Try it now at https://chat.dev/
Updated Mar 1, 2024 - Python
Morpheus - a network for powering smart agents: compute + code + capital + community.
Updated Apr 24, 2024 - JavaScript
🦖 Stateful serverless framework for building geo-distributed edge AI infrastructure.
Updated May 13, 2024 - Go
Medusa: a simple framework for accelerating LLM generation with multiple decoding heads.
Updated Apr 18, 2024 - Jupyter Notebook
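Medusa's extra decoding heads draft several future tokens per step, and the base model then verifies the whole draft in a single forward pass, accepting the longest prefix that matches its own predictions. A toy sketch of that acceptance rule, assuming token ids as plain integers (the function name is illustrative):

```python
def accept_prefix(draft, verified):
    """Keep draft tokens up to the first disagreement with the base
    model's predictions; the model's own token replaces the mismatch."""
    accepted = []
    for d, v in zip(draft, verified):
        if d != v:
            accepted.append(v)  # verifier's token is correct by construction
            break
        accepted.append(d)
    return accepted

# Draft [5, 7, 9, 2] vs. the model's own [5, 7, 8, 1]:
# two tokens agree, then the verifier's 8 is taken -> [5, 7, 8].
tokens = accept_prefix([5, 7, 9, 2], [5, 7, 8, 1])
```

Since the verified output always matches what greedy decoding would have produced, the speedup comes purely from emitting several tokens per forward pass instead of one, with no change to the generated text.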