Grounded Multimodal Large Language Model with Localized Visual Tokenization
We perform functional grounding of LLMs' knowledge in BabyAI-Text
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
A personalized assistant that generates new issues by using earlier issue descriptions of Issue Tracking Systems like JIRA
A Python library for the design of earthing networks in electrical substations.
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
A biological entity grounding search service
Extracting character conversations in Arknights
Hierarchical Universal Language Conditioned Policies
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
[CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".
[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
CLIPort: What and Where Pathways for Robotic Manipulation
[ICRA2023] Grounding Language with Visual Affordances over Unstructured Data
Adapting the original Azure OpenAI sample from https://github.com/Azure-Samples/azure-search-openai-demo for the newer GPT-4-compatible "Chat Completion" syntax.
awesome grounding: A curated list of research papers in visual grounding
This is the official implementation for our paper, "LAR: Look Around and Refer".