VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)
A list of research papers on knowledge-enhanced multimodal learning
VTC: Improving Video-Text Retrieval with User Comments
code for studying OpenAI's CLIP explainability
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Instruction Following Agents with Multimodal Transformers
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
LAVIS - A One-stop Library for Language-Vision Intelligence