Robust Instruction Tuning on MLLMs

Official Implementation of the paper: InstrAug: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning

Introduction

InstrAug is a framework for instruction augmentation: it can expand an existing small instruction set into one up to 30x larger. The whole InstrAug pipeline consists of the following steps (as illustrated in the figure below; a rough code sketch of the generation-and-filtering step follows the list):

  1. Meta-prompt Generation
  2. Augmented Instruction Generation and Rule-based Filtering
    • Multi-temperature sampling ($\rm MINS+_{MT}$)
    • Iterative rephrasing ($\rm MINS+_{Iter}$)
  3. Instruction-following Dataset Construction
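
For concreteness, here is a minimal Python sketch of step 2: an LLM is queried with a meta-prompt at several temperatures, and only candidates that pass simple rule-based filters are kept. The `generate` callable, the meta-prompt wording, and the filter rules are assumptions made purely for illustration; the actual implementation lives under `MultiInstruct/llama`.

```python
# Minimal sketch of multi-temperature sampling with rule-based filtering.
# `generate` stands in for an LLM call (e.g. a LLaMA text-generation
# pipeline); the meta-prompt and the filtering rules below are
# illustrative placeholders, not the exact ones used in the paper.
from typing import Callable, List, Sequence


def passes_rules(candidate: str, seed: str) -> bool:
    """Rule-based filter: keep a candidate only if it looks usable."""
    c = candidate.strip()
    if len(c.split()) < 3:                      # too short to be an instruction
        return False
    if c.lower() == seed.strip().lower():       # identical to the seed, no augmentation
        return False
    # Placeholders such as {question} / {options} must survive the rewrite.
    for slot in ("{question}", "{options}", "{answer}"):
        if slot in seed and slot not in c:
            return False
    return True


def augment_instruction(seed: str,
                        generate: Callable[[str, float], str],
                        temperatures: Sequence[float] = (0.3, 0.7, 1.0),
                        samples_per_temp: int = 4) -> List[str]:
    """Multi-temperature sampling: query the generator at several
    temperatures and keep the de-duplicated, rule-abiding rewrites."""
    meta_prompt = f"Rephrase the following task instruction:\n{seed}\nRephrased:"
    kept: List[str] = []
    seen = {seed.strip().lower()}
    for temp in temperatures:
        for _ in range(samples_per_temp):
            candidate = generate(meta_prompt, temp).strip()
            if passes_rules(candidate, seed) and candidate.lower() not in seen:
                seen.add(candidate.lower())
                kept.append(candidate)
    return kept
```

An iterative-rephrasing variant ($\rm MINS+_{Iter}$) would instead feed each accepted rewrite back in as the next seed for a fixed number of rounds, gradually drifting away from the original wording rather than sampling around it.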



We apply InstrAug to Multimodal Instruction Fine-tuning (MIFT) benchmarks and evaluate on 12 downstream tasks from MultiInstruct and InstructBLIP-Bench, as well as the whole MMMU benchmark. The results show that a model trained on the instruction-augmented dataset (59K) is competitive with, or even exceeds, models trained on non-augmented but larger datasets (564K).

Repo Hierarchy

The file structure of this repository is shown below; only the important folders/files are listed.

.
├── IBLIP                   # Implementation code on Instruct-BLIP
├── OFA                     # Implementation code on OFA
├── MultiInstruct           # Code to create MINS+
│   ├── llama               # Code to generate augmented instructions using LLaMA
│   ├── mminstr_dataset     # Folder to store the MINS and MINS+ datasets
│   └── instruction_data    # Folder to store the original and generated instruction sets
├── LICENSE
└── README.md

Usage

Please refer to the README.md under each individual folder for more details.

Results

1. Results on MultiInstruct



2. Results on IBLIP-Bench



3. Results on MMMU



Citation

Please cite our paper if you find this work useful for your research and applications:

@misc{han2024instraug,
      title={INSTRAUG: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning}, 
      author={Wei Han and Hui Chen and Soujanya Poria},
      year={2024},
      eprint={2402.14492},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}