
# 🔥🔥🔥 [NAACL 2025] From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models


## Setup

```bash
conda env create -f environment.yml
conda activate redundancy
python -m pip install -e transformers-4.29.2
```
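As a quick sanity check (a minimal sketch, assuming only the bundled `transformers-4.29.2` directory from the step above), you can confirm that the editable install is the build actually being imported:

```python
# Confirm the locally patched transformers build is the one on the path.
import transformers

print(transformers.__version__)  # expected: 4.29.2
print(transformers.__file__)     # should resolve inside transformers-4.29.2/
```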

Our modifications are in `llava.py`, `llava_arch.py`, and `llava_llama.py`, where we enable gradient capture on the visual-token activations:

```python
retain_grad()
requires_grad = True
```
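For context, `retain_grad()` is needed because intermediate (non-leaf) activations, such as the projected visual tokens, discard their gradients after `backward()` by default. Below is a minimal PyTorch sketch of that pattern with hypothetical tensor names; the repo's actual hooks live in the files listed above:

```python
import torch

# Hypothetical stand-in for projected visual-token embeddings
# produced inside the model's forward pass.
visual_tokens = torch.randn(1, 576, 4096, requires_grad=True)
hidden = visual_tokens * 2.0   # any downstream computation (non-leaf tensor)

# Non-leaf tensors drop .grad after backward() unless explicitly retained.
hidden.retain_grad()

hidden.sum().backward()
print(hidden.grad.shape)       # gradient on the intermediate activation
```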

## Evaluation

The following evaluation requires the MSCOCO 2014 dataset. Please download it here and extract it to your data path.
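To confirm the extraction worked, a quick check may help (a hedged sketch; `data/coco` is a hypothetical path, substitute your own):

```python
from pathlib import Path

# Hypothetical layout; point this at wherever you extracted MSCOCO 2014.
coco_root = Path("data/coco")

for sub in ("val2014", "annotations"):
    assert (coco_root / sub).is_dir(), f"missing {coco_root / sub}"
print("found", sum(1 for _ in (coco_root / "val2014").glob("*.jpg")), "val2014 images")
```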

In addition, you need to prepare checkpoints of the following 7B base models:

## Visualization 🔥🔥🔥

```bash
python demo_smooth_grad_threshold.py
```
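The script name points to a SmoothGrad-style saliency map with a threshold: the standard recipe (Smilkov et al., 2017) averages input gradients over noisy copies of the input and masks values below a cutoff. Here is a minimal, self-contained sketch of that recipe, with hypothetical names rather than the script's actual interface:

```python
import torch

def smooth_grad_threshold(model, x, target, n_samples=25, sigma=0.15, threshold=0.5):
    """SmoothGrad: average input gradients over noisy copies of x,
    then zero out saliency values below a relative threshold."""
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        # Fresh leaf tensor per sample so backward() populates .grad.
        noisy = (x.detach() + sigma * torch.randn_like(x)).requires_grad_(True)
        model(noisy)[..., target].sum().backward()
        grads += noisy.grad
    saliency = (grads / n_samples).abs().mean(dim=1)      # average over channels
    saliency = saliency / saliency.max().clamp_min(1e-8)  # normalize to [0, 1]
    return torch.where(saliency >= threshold, saliency, torch.zeros_like(saliency))
```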


## Citation

```bibtex
@inproceedings{zhang2024redundancy,
  title={From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models},
  author={Zhang, Xiaofeng and Quan, Yihao and Shen, Chen and Yuan, Xiaosong and Yan, Shaotian and Xie, Liang and Wang, Wenxiao and Gu, Chaochen and Tang, Hao and Ye, Jieping},
  booktitle={Proceedings of the Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)},
  year={2025}
}
```

## Acknowledgement

This repo is built on LLaVA (models), OPERA (CHAIR evaluation), and FastV (image token truncation). Many thanks for their efforts. Use of our code should also follow their original licenses.