
Releases: Luodian/Otter

OtterHD's release, Dataloading Process Refactored.

18 Nov 04:44
a20f40d

[2023-11]: Supporting GPT4V's evaluation on 8 benchmarks; announcing OtterHD-8B, improved from Fuyu-8B. Check out OtterHD for details.

  1. 🦦 Added OtterHD, a multimodal model fine-tuned from Fuyu-8B to facilitate fine-grained interpretation of high-resolution visual input without an explicit vision encoder module. All image patches are linearly transformed and processed together with text tokens. This is a very innovative and elegant exploration; we are fascinated by it and, following this path, we open-sourced the finetuning script for Fuyu-8B and improved training throughput by 4-5x with Flash-Attention-2. Try our finetuning script at OtterHD. (A minimal patch-embedding sketch is included after this list.)
  2. 🔍 Added MagnifierBench, an evaluation benchmark tailored to assess whether a model can identify information about tiny objects (about 1% of the image size) and their spatial relationships.
  3. Improved pipeline for Pretrain | SFT | RLHF with (some of) the current leading LMMs.
    1. Models: Otter | OpenFlamingo | Idefics | Fuyu
    2. Training Datasets Interface: (Pretrain) MMC4 | LAION2B | CC3M | CC12M, (SFT) MIMIC-IT | M3IT | LLAVAR | LRV | SVIT...
      • We tested the above datasets for both pretraining and instruction tuning with OpenFlamingo and Otter. We also tested the datasets with Idefics and Fuyu for instruction tuning. We will open-source the training scripts gradually.
    3. Benchmark Interface: MagnifierBench/MMBench/MM-VET/MathVista/POPE/MME/ScienceQA/SeedBench. They can be run in one click; see Benchmark for details.
        datasets:
          - name: magnifierbench
            split: test
            prompt: Answer with the option's letter from the given choices directly.
            api_key: [Your API Key] # GPT4 or GPT3.5 is used to evaluate the answers against the ground truth.
            debug: true # debug=true will save the model responses in the log file.
          - name: mme
            split: test
            debug: true
          - name: mmbench
            split: test
            debug: true

        models:
          - name: gpt4v
            api_key: [Your API Key] # to call the GPT4V model.
  4. Code refactorization for organizing multiple groups of datasets with an integrated YAML file; see details at managing datasets in MIMIC-IT format. For example (a minimal loading sketch follows after this list):
        IMAGE_TEXT: # Group name should be one of [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]
            LADD: # Dataset name can be any name you want
                mimicit_path: azure_storage/json/LA/LADD_instructions.json # Path of the instruction json file
                images_path: azure_storage/Parquets/LA.parquet # Path of the image parquet file
                num_samples: -1 # Number of samples to use; -1 means use all samples (default if not set)
            M3IT_CAPTIONING:
                mimicit_path: azure_storage/json/M3IT/captioning/coco/coco_instructions.json
                images_path: azure_storage/Parquets/coco.parquet
                num_samples: 20000
    This is a major change and breaks compatibility with previous code; please check the details.
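As a rough illustration of how such a grouped YAML file can be consumed, the sketch below loads the config with PyYAML and iterates over the groups and datasets. The file name training_data.yaml and the print-out are placeholders for this example, not part of the released training interface.

```python
import yaml

# "training_data.yaml" is a placeholder; point it at your own grouped config file.
with open("training_data.yaml") as f:
    config = yaml.safe_load(f)

# Groups are IMAGE_TEXT, TEXT_ONLY, or IMAGE_TEXT_IN_CONTEXT; each maps dataset names
# to their instruction JSON and image parquet paths.
for group_name, datasets in config.items():
    for dataset_name, spec in datasets.items():
        num_samples = spec.get("num_samples", -1)  # -1 (the default) means use all samples
        print(f"[{group_name}] {dataset_name}: "
              f"instructions={spec['mimicit_path']}, images={spec['images_path']}, "
              f"num_samples={num_samples}")
```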
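To make the OtterHD/Fuyu idea from item 1 concrete: there is no separate vision encoder, so image patches themselves become tokens. The PyTorch sketch below flattens each patch, applies a single linear projection, and concatenates the result with text token embeddings before the decoder. All sizes and module names here are illustrative assumptions, not the actual Fuyu-8B/OtterHD implementation.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real Fuyu-8B/OtterHD dimensions differ.
patch_size, channels, hidden_dim = 30, 3, 4096

# One linear layer maps a flattened image patch straight to the decoder's hidden size.
patch_proj = nn.Linear(patch_size * patch_size * channels, hidden_dim)

def embed_image(image: torch.Tensor) -> torch.Tensor:
    """Split an image (C, H, W) into patches and project each patch to a token embedding."""
    c, _, _ = image.shape
    patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)
    return patch_proj(patches)  # (num_patches, hidden_dim)

# Patch tokens and (already embedded) text tokens form one sequence for the decoder-only LM.
image_tokens = embed_image(torch.rand(channels, 300, 300))
text_tokens = torch.rand(16, hidden_dim)  # stand-in for embedded text tokens
decoder_input = torch.cat([image_tokens, text_tokens], dim=0).unsqueeze(0)
print(decoder_input.shape)  # torch.Size([1, 116, 4096]): 100 patch tokens + 16 text tokens
```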

MIMIC-IT, Otter-Image/Video released

24 Jun 18:11
  • 🧨 Download MIMIC-IT Dataset. For more details on navigating the dataset, please refer to MIMIC-IT Dataset README.

  • 🏎️ Run Otter Locally. You can run our model locally with at least 16GB of GPU memory for tasks such as image/video tagging, captioning, and identifying harmful content. We fixed a bug in video inference where frame tensors were mistakenly unsqueezed into an incorrect vision_x shape; you can now try running it again with the updated version. (A minimal sketch of the expected vision_x layout follows below.)

    Make sure to adjust sys.path.append("../..") correctly so that otter.modeling_otter can be imported when launching the model.
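As a minimal sketch of the expected layout (the shape convention follows the OpenFlamingo-style interface Otter builds on; the helper name and dummy preprocessing here are assumptions, not the repository's exact code), video frames are stacked into a single vision_x tensor of shape (batch, num_media, num_frames, channels, height, width) rather than being unsqueezed per frame:

```python
import torch

def build_vision_x(frames: list) -> torch.Tensor:
    """Stack preprocessed frames (each of shape (C, H, W)) into (1, 1, num_frames, C, H, W)."""
    video = torch.stack(frames, dim=0)      # (num_frames, C, H, W)
    return video.unsqueeze(0).unsqueeze(0)  # one batch element, one media item

# Example with dummy 224x224 RGB frames:
dummy_frames = [torch.rand(3, 224, 224) for _ in range(8)]
print(build_vision_x(dummy_frames).shape)  # torch.Size([1, 1, 8, 3, 224, 224])
```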

v0.1.0 - Initial Release

30 Apr 10:38

We are excited to announce the initial release of 🦦 Otter!