llm-inference
Here are 384 public repositories matching this topic...
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
Updated May 14, 2024 - Jupyter Notebook
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
Updated May 14, 2024 - C++
Reference implementation of the Mistral AI 7B v0.1 model.
Updated Mar 18, 2024 - Jupyter Notebook
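One distinctive piece of the Mistral 7B architecture is sliding-window attention, where each query position attends only to the most recent window of key positions (4,096 tokens in the released model). A minimal sketch of the attention mask, assuming a toy sequence length and window size for illustration:

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """True where query position i may attend to key position j:
    causal (j <= i) and within the last `window` positions."""
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Toy example: with window=3, position 4 attends only to positions 2, 3, 4.
mask = sliding_window_mask(seq_len=5, window=3)
```

The window bounds memory use: attention cost per token is O(window) rather than O(seq_len), which is what makes long-context inference cheaper.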
The most flexible way to serve AI/ML models in production - build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multi-modal and RAG services, and more.
Updated May 14, 2024 - Python
Pretrain, fine-tune, and deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit quantization, LoRA, and more.
Updated May 13, 2024 - Python
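LoRA, one of the fine-tuning techniques listed above, freezes the base weight matrix W and learns a low-rank update B·A scaled by alpha/r, so only the small A and B matrices are trained. A minimal sketch of the forward pass in pure Python (the matrices and rank here are toy values, not anything from the repo):

```python
def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + (alpha / r) * B (A x), with W frozen and rank r = rows of A."""
    r = len(A)
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]

# Toy rank-1 example: W is 2x2 identity, A is 1x2, B is 2x1.
W = [[1, 0], [0, 1]]
A = [[1, 1]]
B = [[1], [1]]
y = lora_forward(W, A, B, x=[2, 3])  # [7.0, 8.0]
```

Because A and B are tiny relative to W, the trainable parameter count drops by orders of magnitude, which is why LoRA pairs well with 4-bit quantization of the frozen base weights.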
Run open-source LLMs, such as Llama 2 and Mistral, as OpenAI-compatible API endpoints in the cloud.
Updated May 14, 2024 - Python
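"OpenAI-compatible" means the server accepts the same request schema as OpenAI's `/v1/chat/completions` endpoint, so existing clients only need the base URL swapped. A sketch of the request body; the localhost URL and model id are placeholder assumptions, not values from any specific project:

```python
import json

# Hypothetical local endpoint; any OpenAI-compatible server accepts this schema.
BASE_URL = "http://localhost:3000/v1"

body = {
    "model": "mistral-7b-instruct",  # served model id (assumption)
    "messages": [
        {"role": "user", "content": "Summarize LoRA in one sentence."}
    ],
    "max_tokens": 64,
    "temperature": 0.2,
}
payload = json.dumps(body)  # POST this to f"{BASE_URL}/chat/completions"
```

Because the schema matches, the official `openai` Python client can also be pointed at such a server by passing the server's address as `base_url` when constructing the client.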
🔮 SuperDuperDB: Bring AI to your database! Build, deploy, and manage any AI application directly with your existing data infrastructure, without moving your data - including streaming inference, scalable model training, and vector search.
Updated May 14, 2024 - Python
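Vector search, mentioned above, ranks stored embeddings by similarity to a query embedding; cosine similarity is the usual metric. A minimal brute-force sketch (the index layout and function names are illustrative, not SuperDuperDB's API):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    """index: list of (doc_id, vector); returns ids of the k most similar."""
    scored = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 2-D embeddings:
index = [("a", [1, 0]), ("b", [0, 1]), ("c", [1, 1])]
nearest = top_k([1, 0.1], index, k=2)  # ["a", "c"]
```

Production systems replace the linear scan with an approximate index (e.g. HNSW) to keep query time sublinear, but the ranking criterion is the same.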
High-speed large language model serving on PCs with consumer-grade GPUs.
Updated Apr 29, 2024 - C++
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Updated May 10, 2024 - Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Updated May 13, 2024 - Python
Code examples and resources for DBRX, a large language model developed by Databricks.
Updated May 1, 2024 - Python
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local Llama 2 backend for generative agents and apps.
Updated Mar 22, 2024 - Jupyter Notebook
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
Updated May 14, 2024 - Python
Sparsity-aware deep learning inference runtime for CPUs.
Updated May 6, 2024 - Python
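The core idea behind sparsity-aware inference is that pruned (zeroed) weights need not be stored or multiplied at all. A minimal sketch of a sparse matrix-vector product over a compressed row layout; the storage format here is a simple illustration, not the runtime's actual kernel:

```python
def sparse_matvec(rows, x):
    """rows: per-output-row list of (col, value) pairs for nonzero weights.
    Multiplications happen only for stored (nonzero) entries."""
    return [sum(v * x[c] for c, v in row) for row in rows]

# Dense matrix [[0, 2, 0], [1, 0, 3]] stored sparsely (zeros omitted):
rows = [[(1, 2.0)], [(0, 1.0), (2, 3.0)]]
y = sparse_matvec(rows, [1, 1, 1])  # [2.0, 4.0]
```

At high sparsity levels (e.g. 90% of weights pruned), skipping the zeros cuts both memory traffic and arithmetic roughly in proportion, which is what lets CPUs compete with GPUs on pruned models.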
Irresponsible innovation. Try it now at https://chat.dev/
Updated Mar 1, 2024 - Python
Morpheus - a network for powering smart agents: compute + code + capital + community.
Updated Apr 24, 2024 - JavaScript
🦖 Stateful serverless framework for building geo-distributed edge AI infrastructure.
Updated May 13, 2024 - Go
Medusa: a simple framework for accelerating LLM generation with multiple decoding heads.
Updated Apr 18, 2024 - Jupyter Notebook
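Medusa's extra decoding heads draft several future tokens per step, and the base model then verifies the whole draft in a single forward pass, accepting the longest prefix that matches its own predictions. A toy sketch of that acceptance rule, assuming token ids as plain integers (the function name is illustrative):

```python
def accept_prefix(draft, verified):
    """Keep draft tokens up to the first disagreement with the base
    model's predictions; the model's own token replaces the mismatch."""
    accepted = []
    for d, v in zip(draft, verified):
        if d != v:
            accepted.append(v)  # verifier's token is correct by construction
            break
        accepted.append(d)
    return accepted

# Draft [5, 7, 9, 2] vs. the model's own [5, 7, 8, 1]:
# two tokens agree, then the verifier's 8 is taken -> [5, 7, 8].
tokens = accept_prefix([5, 7, 9, 2], [5, 7, 8, 1])
```

Since the verified output always matches what greedy decoding would have produced, the speedup comes purely from emitting several tokens per forward pass instead of one, with no change to the generated text.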