
Releases: Luodian/Otter

OtterHD's release, Dataloading Process Refactored.

18 Nov 04:44
a20f40d

[2023-11]: Supporting GPT4V's evaluation on 8 benchmarks; announcing OtterHD-8B, improved from Fuyu-8B. Check out OtterHD for details.

  1. 🦦 Added OtterHD, a multimodal model fine-tuned from Fuyu-8B to facilitate fine-grained interpretation of high-resolution visual input without an explicit vision encoder module. All image patches are linearly transformed and processed together with text tokens. This is a very innovative and elegant exploration; we are fascinated by it and, following this path, we open-sourced the finetuning script for Fuyu-8B and improved training throughput by 4-5x with Flash-Attention-2. Try our finetuning script at OtterHD. (A minimal patch-embedding sketch is included after this list.)
  2. 🔍 Added MagnifierBench, an evaluation benchmark tailored to assess whether a model can identify information about tiny objects (about 1% of the image size) and their spatial relationships.
  3. Improved pipeline for Pretrain | SFT | RLHF with (some of) the current leading LMMs.
    1. Models: Otter | OpenFlamingo | Idefics | Fuyu
    2. Training Datasets Interface: (Pretrain) MMC4 | LAION2B | CC3M | CC12M, (SFT) MIMIC-IT | M3IT | LLAVAR | LRV | SVIT...
      • We tested the above datasets for both pretraining and instruction tuning with OpenFlamingo and Otter. We also tested the datasets with Idefics and Fuyu for instruction tuning. We will open-source the training scripts gradually.
    3. Benchmark Interface: MagnifierBench/MMBench/MM-VET/MathVista/POPE/MME/ScienceQA/SeedBench. They can be run in one click; see Benchmark for details.
        datasets:
          - name: magnifierbench
            split: test
            prompt: Answer with the option's letter from the given choices directly.
            api_key: [Your API Key] # GPT4 or GPT3.5 is used to evaluate the answers against the ground truth.
            debug: true # debug=true will save the model responses in the log file.
          - name: mme
            split: test
            debug: true
          - name: mmbench
            split: test
            debug: true

        models:
          - name: gpt4v
            api_key: [Your API Key] # to call the GPT4V model.
  4. Code refactorization for organizing multiple groups of datasets with an integrated YAML file; see details at managing datasets in MIMIC-IT format. For example (a minimal loading sketch follows after this list):
        IMAGE_TEXT: # Group name should be one of [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]
            LADD: # Dataset name can be any name you want
                mimicit_path: azure_storage/json/LA/LADD_instructions.json # Path of the instruction json file
                images_path: azure_storage/Parquets/LA.parquet # Path of the image parquet file
                num_samples: -1 # Number of samples to use; -1 means use all samples (default if not set)
            M3IT_CAPTIONING:
                mimicit_path: azure_storage/json/M3IT/captioning/coco/coco_instructions.json
                images_path: azure_storage/Parquets/coco.parquet
                num_samples: 20000
    This is a major change and breaks compatibility with previous code; please check the details.
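As a rough illustration of how such a grouped YAML file can be consumed, the sketch below loads the config with PyYAML and iterates over the groups and datasets. The file name training_data.yaml and the print-out are placeholders for this example, not part of the released training interface.

```python
import yaml

# "training_data.yaml" is a placeholder; point it at your own grouped config file.
with open("training_data.yaml") as f:
    config = yaml.safe_load(f)

# Groups are IMAGE_TEXT, TEXT_ONLY, or IMAGE_TEXT_IN_CONTEXT; each maps dataset names
# to their instruction JSON and image parquet paths.
for group_name, datasets in config.items():
    for dataset_name, spec in datasets.items():
        num_samples = spec.get("num_samples", -1)  # -1 (the default) means use all samples
        print(f"[{group_name}] {dataset_name}: "
              f"instructions={spec['mimicit_path']}, images={spec['images_path']}, "
              f"num_samples={num_samples}")
```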
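To make the OtterHD/Fuyu idea from item 1 concrete: there is no separate vision encoder, so image patches themselves become tokens. The PyTorch sketch below flattens each patch, applies a single linear projection, and concatenates the result with text token embeddings before the decoder. All sizes and module names here are illustrative assumptions, not the actual Fuyu-8B/OtterHD implementation.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real Fuyu-8B/OtterHD dimensions differ.
patch_size, channels, hidden_dim = 30, 3, 4096

# One linear layer maps a flattened image patch straight to the decoder's hidden size.
patch_proj = nn.Linear(patch_size * patch_size * channels, hidden_dim)

def embed_image(image: torch.Tensor) -> torch.Tensor:
    """Split an image (C, H, W) into patches and project each patch to a token embedding."""
    c, _, _ = image.shape
    patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)
    return patch_proj(patches)  # (num_patches, hidden_dim)

# Patch tokens and (already embedded) text tokens form one sequence for the decoder-only LM.
image_tokens = embed_image(torch.rand(channels, 300, 300))
text_tokens = torch.rand(16, hidden_dim)  # stand-in for embedded text tokens
decoder_input = torch.cat([image_tokens, text_tokens], dim=0).unsqueeze(0)
print(decoder_input.shape)  # torch.Size([1, 116, 4096]): 100 patch tokens + 16 text tokens
```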

MIMIC-IT, Otter-Image/Video released

24 Jun 18:11
  • 🧨 Download MIMIC-IT Dataset. For more details on navigating the dataset, please refer to MIMIC-IT Dataset README.

  • 🏎️ Run Otter Locally. You can run our model locally with at least 16GB of GPU memory for tasks such as image/video tagging, captioning, and identifying harmful content. We fixed a bug in video inference where frame tensors were mistakenly unsqueezed into an incorrect vision_x shape; you can now try running it again with the updated version. (A minimal sketch of the expected vision_x layout follows below.)

    Make sure to adjust sys.path.append("../..") correctly so that otter.modeling_otter can be imported when launching the model.
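As a minimal sketch of the expected layout (the shape convention follows the OpenFlamingo-style interface Otter builds on; the helper name and dummy preprocessing here are assumptions, not the repository's exact code), video frames are stacked into a single vision_x tensor of shape (batch, num_media, num_frames, channels, height, width) rather than being unsqueezed per frame:

```python
import torch

def build_vision_x(frames: list) -> torch.Tensor:
    """Stack preprocessed frames (each of shape (C, H, W)) into (1, 1, num_frames, C, H, W)."""
    video = torch.stack(frames, dim=0)      # (num_frames, C, H, W)
    return video.unsqueeze(0).unsqueeze(0)  # one batch element, one media item

# Example with dummy 224x224 RGB frames:
dummy_frames = [torch.rand(3, 224, 224) for _ in range(8)]
print(build_vision_x(dummy_frames).shape)  # torch.Size([1, 1, 8, 3, 224, 224])
```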

v0.1.0 - Initial Release

30 Apr 10:38

We are excited to announce the initial release of 🦦 Otter!