PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Common Framework for Inference
High-efficiency floating-point neural network inference operators for mobile, server, and Web
A high-throughput and memory-efficient inference and serving engine for LLMs
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
A universal scalable machine learning model deployment solution
An open-source NLP-as-a-service project focused on providing state-of-the-art systems with ease. Training and inference via simple Docker commands.
Utilities to use the Hugging Face Hub API (see the download sketch after this list).
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference (see the inference sketch after this list).
Study materials for taking the Harvard Biostatistics PhD Qualifying Exam, Summer 2024
Large Language Model Text Generation Inference (see the HTTP client sketch after this list).
A high-performance inference system for large language models, designed for production environments.
The Triton Inference Server provides an optimized cloud and edge inferencing solution (see the client sketch after this list).
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
TypeDB: the polymorphic database powered by types
Port of OpenAI's Whisper model in C/C++
Cross-platform, customizable ML solutions for live and streaming media.
AICI: Prompts as (Wasm) Programs
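
For the Hugging Face Hub utilities entry above, a minimal sketch assuming the `huggingface_hub` Python package; the repo ID and filename are placeholders, not prescribed by the listing.

```python
from huggingface_hub import hf_hub_download

# Download a single file from a public model repo; the result is cached locally
# and the call returns the local filesystem path.
config_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(config_path)
```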
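For the OpenVINO toolkit entry, a minimal inference sketch assuming the OpenVINO 2022+ Python API and a model already converted to IR format; the `model.xml` path, device name, and input shape are placeholders.

```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")          # IR file; the .bin weights sit alongside it
compiled = core.compile_model(model, "CPU")   # target device: CPU, GPU, ...

# Compiled models are callable; feed a dummy input matching the model's expected shape.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
result = compiled([dummy])
print(list(result.values())[0].shape)
```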
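For the text generation inference server entry, a sketch of calling an already-running server over HTTP with the `requests` package; the host, port, and sampling parameters are assumptions, and a model must already be loaded by the server.

```python
import requests

# Assumed local endpoint; the server exposes a /generate route that accepts
# a prompt plus generation parameters and returns the generated text.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain KV-cache reuse in one sentence.",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```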
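For the Triton Inference Server entry, a client-side sketch assuming the `tritonclient` Python package and a hypothetical deployed model named `my_model` with one FP32 input `INPUT0` and one output `OUTPUT0`; adjust names and shapes to the model's actual configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server on its default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical tensor names and shape; these must match the model's config.pbtxt.
infer_input = httpclient.InferInput("INPUT0", [1, 3, 224, 224], "FP32")
infer_input.set_data_from_numpy(np.zeros((1, 3, 224, 224), dtype=np.float32))
requested = httpclient.InferRequestedOutput("OUTPUT0")

result = client.infer(model_name="my_model", inputs=[infer_input], outputs=[requested])
print(result.as_numpy("OUTPUT0").shape)
```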