
OpenGVLab

General Vision Team of Shanghai AI Laboratory


Welcome to OpenGVLab! 👋

We are a research group from Shanghai AI Lab focused on vision-centric AI research. The GV in our name, OpenGVLab, stands for general vision: a general understanding of vision, so that little effort is needed to adapt to new vision-based tasks.

We develop model architectures and release pre-trained foundation models to the community to motivate further research in this area. We have made promising progress in general vision AI, with 109 SOTA results 🚀. In 2022, our open-sourced foundation models achieved 65.5 mAP on the COCO object detection benchmark and 91.1% Top-1 accuracy on Kinetics-400, landmark results for AI vision 👀 tasks in image 🖼️ and video 📹 understanding.

Building on these solid vision foundations, we have expanded into multi-modality models and generative AI (in partnership with Vchitect). We aim to empower individuals and businesses by offering a higher starting point for developing vision-based AI products and lessening the burden of building an AI model from scratch.
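As a rough sketch of how the released checkpoints can be picked up, assuming the Hugging Face Transformers library and an InternVL checkpoint published under the OpenGVLab organization (the model ID below is illustrative; check the OpenGVLab Hugging Face page for the exact names and usage instructions):

```python
# Minimal sketch: loading an OpenGVLab pre-trained checkpoint from Hugging Face.
# The model ID below is an assumption for illustration only; see the OpenGVLab
# Hugging Face page for the actual checkpoint names and recommended usage.
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL-Chat-V1-5"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()
```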

Branches: Alpha (exploring the latest advances in vision+language research) and uni-medical (focusing on medical AI)

Follow us: Twitter · 🤗 Hugging Face · Medium · WeChat · Zhihu

Pinned

  1. InternVL

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source model approaching GPT-4V performance.

    Jupyter Notebook · 1.7k stars · 92 forks

  2. InternVideo

    Video Foundation Models & Data for Multimodal Understanding

    Python · 967 stars · 63 forks

  3. DCNv4

    [CVPR 2024] Deformable Convolution v4

    Python · 339 stars · 20 forks

  4. Ask-Anything

    [CVPR 2024 Highlight] [VideoChatGPT] ChatGPT with video understanding! Also supports many more LLMs, such as MiniGPT-4, StableLM, and MOSS.

    Python · 2.7k stars · 214 forks

  5. LLaMA-Adapter

    [ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

    Python · 5.5k stars · 360 forks

  6. OmniQuant

    [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

    Python · 575 stars · 45 forks

Repositories

Showing 10 of 57 repositories
  • InternVL

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source model approaching GPT-4V performance.

    Jupyter Notebook · 1,682 stars · MIT license · 92 forks · 53 issues · 0 PRs · Updated May 9, 2024
  • Python · 69 stars · MIT license · 6 forks · 1 issue · 0 PRs · Updated May 9, 2024
  • video-mamba-suite

    A suite for video modeling with Mamba

    Python · 143 stars · MIT license · 14 forks · 5 issues · 1 PR · Updated May 3, 2024
  • InternImage

    [CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

    Python · 2,327 stars · MIT license · 222 forks · 166 issues · 5 PRs · Updated Apr 30, 2024
  • InternVideo

    Video Foundation Models & Data for Multimodal Understanding

    Python · 967 stars · Apache-2.0 license · 63 forks · 45 issues · 3 PRs · Updated Apr 30, 2024
  • MMT-Bench

    [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

    Python · 27 stars · 0 forks · 1 issue · 0 PRs · Updated Apr 28, 2024
  • PonderV2

    PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

    Python · 301 stars · MIT license · 5 forks · 3 issues · 0 PRs · Updated Apr 25, 2024
  • .github

    0 stars · 0 forks · 0 issues · 0 PRs · Updated Apr 24, 2024
  • EgoExoLearn

    Data and benchmark code for the EgoExoLearn dataset

    Python · 27 stars · MIT license · 0 forks · 1 issue · 0 PRs · Updated Apr 24, 2024
  • Multi-Modality-Arena

    Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

    Python · 373 stars · 26 forks · 13 issues · 0 PRs · Updated Apr 21, 2024