[Arxiv 2024] Official Implementation of the paper: "Towards Robust Instruction Tuning on Multimodal Large Language Models"


Robust Instruction Tuning on MLLMs

Official Implementation of the paper: InstrAug: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning

Introduction

InstrAug is a framework for instruction augmentation. It can expand an existing small instruction set to one up to 30x larger. The full InstrAug pipeline consists of the following stages (as illustrated in the figure below):

  1. Meta-prompt Generation
  2. Augmented Instruction Generation and Rule-based Filtering
    • Multi-temperature sampling (MINS+MT)
    • Iterative rephrasing (MINS+Iter)
  3. Instruction-following Dataset Construction
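The augmented-instruction generation step above can be sketched as follows. This is a minimal illustration only: the `rephrase` stub stands in for the actual LLaMA call driven by a meta-prompt, and all function names, the `{image}` placeholder convention, and the specific filtering rules are assumptions for illustration, not the repository's actual implementation.

```python
import random

def rephrase(instruction, temperature, rng):
    # Placeholder for an LLM call (e.g., LLaMA conditioned on a meta-prompt).
    # Here we simulate lexical variation by swapping the leading verb.
    verbs = ["Describe", "Explain", "Answer"]
    return f"{rng.choice(verbs)} (T={temperature}): {instruction}"

def rule_based_filter(candidates, seed):
    # Illustrative rule-based filters: drop duplicates, near-empty strings,
    # and rephrasings that lost the task's placeholder tokens.
    kept, seen = [], set()
    for cand in candidates:
        if len(cand) < 10 or cand in seen:
            continue
        if "{image}" in seed and "{image}" not in cand:
            continue
        seen.add(cand)
        kept.append(cand)
    return kept

def multi_temp_sampling(seed, temps=(0.2, 0.7, 1.0), n_per_temp=2, rng=None):
    # Sample several rephrasings at each temperature, then filter.
    rng = rng or random.Random(0)
    candidates = [rephrase(seed, t, rng)
                  for t in temps for _ in range(n_per_temp)]
    return rule_based_filter(candidates, seed)
```

Iterative rephrasing (MINS+Iter) would instead feed each surviving rephrasing back through `rephrase` for several rounds rather than varying the temperature.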



We apply InstrAug to Multimodal Instruction Fine-tuning (MIFT) benchmarks and test on 12 downstream tasks from MultiInstruct and InstructBLIP-Bench, as well as the whole MMMU benchmark. The results show that a model trained on the instruction-augmented dataset (59K) is competitive with, or even exceeds, models trained on non-augmented but larger datasets (564K).

Repo Hierarchy

The file structure of this repository is shown below; only important folders/files are listed.

.
├── IBLIP                   # Implementation code on InstructBLIP
├── OFA                     # Implementation code on OFA
├── MultiInstruct           # Code to create MINS+
│   ├── llama               # Code to generate augmented instructions using LLaMA
│   ├── mminstr_dataset     # Folder to store the MINS and MINS+ datasets
│   └── instruction_data    # Folder to store the original and generated instruction sets
├── LICENSE
└── README.md

Usage

Please refer to the README.md under each individual folder for more details.

Results

1. Results on MultiInstruct



2. Results on IBLIP-Bench



3. Results on MMMU



Citation

Please cite our paper if you find this work useful for your research and applications:

@misc{han2024robust,
      title={Towards Robust Instruction Tuning on Multimodal Large Language Models}, 
      author={Wei Han and Hui Chen and Soujanya Poria},
      year={2024},
      eprint={2402.14492},
      archivePrefix={arXiv},
}