autoupdate paper list
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular data extraction and multimodal queries.
Generative AI suite powered by state-of-the-art models, providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
☁️ Build multimodal AI applications with cloud-native stack
An LLM-based multi-model framework for building AI apps.
The most flexible way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Inference Graph/Pipelines, Compound AI systems, Multi-Modal, RAG as a Service, and more!
Sample multimodal app developed with Android Architecture Components and convention plugins.
A simple application that generates scripts for the user to read aloud. Based on the recorded audio, it scores the user's pronunciation and suggests ways to improve it.
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
A web UI project for learning about large language models, with features such as chat, quantization, fine-tuning, prompt-engineering templates, and multimodality.
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch.
Data Infrastructure for Multimodal AI: Data, models, and orchestration in a unified declarative interface.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion-model toolbox, with high performance and flexibility.
[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.