OtterHD's release, Dataloading Process Refactored.

@Luodian Luodian released this 18 Nov 04:44
· 25 commits to main since this release
a20f40d

[2023-11]: Supporting GPT4V's Evaluation on 8 Benchmarks; Announcing OtterHD-8B, improved from Fuyu-8B. Check out OtterHD for details.

  1. 🦦 Added OtterHD, a multimodal model fine-tuned from Fuyu-8B to facilitate fine-grained interpretation of high-resolution visual input without an explicit vision encoder module. All image patches are linearly transformed and processed together with text tokens. We find this an innovative and elegant exploration. Inspired by it, we open-sourced the fine-tuning script for Fuyu-8B and improved training throughput by 4-5 times with Flash-Attention-2. Try our fine-tuning script at OtterHD.
  2. 🔍 Added MagnifierBench, an evaluation benchmark designed to assess whether a model can identify information about tiny objects (1% of the image size) and their spatial relationships.
  3. Improved pipeline for Pretrain | SFT | RLHF with (part of) current leading LMMs.
    1. Models: Otter | OpenFlamingo | Idefics | Fuyu
    2. Training Datasets Interface: (Pretrain) MMC4 | LAION2B | CC3M | CC12M, (SFT) MIMIC-IT | M3IT | LLAVAR | LRV | SVIT...
      • We tested the above datasets for both pretraining and instruction tuning with OpenFlamingo and Otter. We also tested the datasets with Idefics and Fuyu for instruction tuning. We will open-source the training scripts gradually.
    3. Benchmark Interface: MagnifierBench/MMBench/MM-VET/MathVista/POPE/MME/ScienceQA/SeedBench. They can be run with one click; please see Benchmark for details.
        datasets:
        - name: magnifierbench
          split: test
          prompt: Answer with the option's letter from the given choices directly.
          api_key: [Your API Key] # GPT4 or GPT3.5 is used to evaluate the answers against the ground truth.
          debug: true # Setting debug to true saves the model responses to a log file.
        - name: mme
          split: test
          debug: true
        - name: mmbench
          split: test
          debug: true

        models:
        - name: gpt4v
          api_key: [Your API Key] # Used to call the GPT4V model.
    4. Code refactoring to organize multiple groups of datasets with an integrated YAML file; see details at managing datasets in MIMIC-IT format. For example:
        IMAGE_TEXT: # Group name should be in [IMAGE_TEXT, TEXT_ONLY, IMAGE_TEXT_IN_CONTEXT]
            LADD: # The dataset name can be any name you choose
                mimicit_path: azure_storage/json/LA/LADD_instructions.json # Path of the instruction json file
                images_path: azure_storage/Parquets/LA.parquet # Path of the image parquet file
                num_samples: -1 # Number of samples to use; -1 means use all samples. Defaults to -1 if not set.
            M3IT_CAPTIONING:
                mimicit_path: azure_storage/json/M3IT/captioning/coco/coco_instructions.json
                images_path: azure_storage/Parquets/coco.parquet
                num_samples: 20000
    This is a major change that makes previous code no longer runnable; please check the details.
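
To make the dataset group file format more concrete, here is a minimal Python sketch of how such a config could be validated once it has been parsed into a dictionary. It is not the repository's actual loader: `DatasetSpec`, `parse_dataset_config`, and `effective_size` are hypothetical names introduced for illustration. The only details taken from the notes above are the three allowed group names and the `num_samples` default of -1. Treating `images_path` as optional and capping `num_samples` at the number of available samples are both assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Group names listed in the release notes.
ALLOWED_GROUPS = {"IMAGE_TEXT", "TEXT_ONLY", "IMAGE_TEXT_IN_CONTEXT"}


@dataclass
class DatasetSpec:
    """Hypothetical container for one dataset entry in the YAML file."""
    group: str
    name: str
    mimicit_path: str
    images_path: Optional[str]  # assumption: may be absent, e.g. for text-only data
    num_samples: int            # -1 means "use all samples"


def parse_dataset_config(config: Dict[str, Dict[str, dict]]) -> List[DatasetSpec]:
    """Validate an already-parsed config dict and flatten it into specs."""
    specs = []
    for group, datasets in config.items():
        if group not in ALLOWED_GROUPS:
            raise ValueError(f"Unknown group {group!r}; expected one of {sorted(ALLOWED_GROUPS)}")
        for name, fields in datasets.items():
            # Per the notes, num_samples defaults to -1 when it is not set.
            num_samples = fields.get("num_samples", -1)
            if num_samples != -1 and num_samples <= 0:
                raise ValueError(f"{group}/{name}: num_samples must be -1 or positive")
            specs.append(DatasetSpec(
                group=group,
                name=name,
                mimicit_path=fields["mimicit_path"],
                images_path=fields.get("images_path"),
                num_samples=num_samples,
            ))
    return specs


def effective_size(spec: DatasetSpec, available: int) -> int:
    """Number of samples to draw; capping at `available` is an assumption."""
    if spec.num_samples == -1:
        return available
    return min(spec.num_samples, available)


# The same structure as the YAML example above, written as a Python dict.
EXAMPLE_CONFIG = {
    "IMAGE_TEXT": {
        "LADD": {
            "mimicit_path": "azure_storage/json/LA/LADD_instructions.json",
            "images_path": "azure_storage/Parquets/LA.parquet",
            "num_samples": -1,
        },
        "M3IT_CAPTIONING": {
            "mimicit_path": "azure_storage/json/M3IT/captioning/coco/coco_instructions.json",
            "images_path": "azure_storage/Parquets/coco.parquet",
            "num_samples": 20000,
        },
    }
}

specs = parse_dataset_config(EXAMPLE_CONFIG)
```

In practice the dictionary would come from a YAML parser rather than being written by hand. Validating group names up front gives a clear error for configs written against the old layout instead of a failure deep inside data loading.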