autoupdate paper list
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular data extraction and multimodal queries.
Generative AI suite powered by state-of-the-art models, providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☁️ Build multimodal AI applications with cloud-native stack
An LLM-based multi-model framework for building AI apps.
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
Sample multimodal app developed with Android Architecture Components and convention plugins.
A simple application that generates scripts for the user to read aloud. Based on the recorded audio, it scores the user's pronunciation and suggests ways to improve it.
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
A web UI project for learning about large language models, with features such as chat, quantization, fine-tuning, prompt-engineering templates, and multimodality.
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch.
Data Infrastructure for Multimodal AI: Data, models, and orchestration in a unified declarative interface.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion-model toolbox, with high performance and flexibility.
[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.