A state-of-the-art open visual language model | multimodal pre-trained model
Commanding robots using only language-model prompts
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
Chain of Images for Intuitively Reasoning
[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
Data and code for the paper "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024.
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
CLI for converting UForm models to CoreML.