
# 🔥🔥🔥 [NAACL 2025] From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models

License: MIT

## Setup

```bash
conda env create -f environment.yml
conda activate redundancy
python -m pip install -e transformers-4.29.2
```

Our modifications are in `llava.py`, `llava_arch.py`, and `llava_llama.py`; they enable gradient tracking on intermediate activations:

```python
tensor.retain_grad()         # keep .grad on a non-leaf tensor after backward()
tensor.requires_grad_(True)  # ensure gradient tracking is enabled
```
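These two calls are what make gradient-based relevance maps possible: once an intermediate tensor retains its gradient, a single backward pass yields a per-token saliency signal. A minimal sketch of that idea, using a hypothetical stand-in for LLaVA's visual-token hidden states (the shapes and variable names are illustrative, not the repo's actual code):

```python
import torch

# Hypothetical stand-in for the visual-token hidden states inside LLaVA:
# [batch, num_image_tokens, hidden_dim].
hidden_states = torch.randn(1, 576, 4096)
hidden_states.requires_grad_(True)  # enable gradient tracking
hidden_states.retain_grad()         # keep .grad after backward()

# Toy scalar "output" standing in for the language-model logits.
output = hidden_states.mean()
output.backward()

# Per-token relevance: gradient magnitude over the feature dimension.
relevance = hidden_states.grad.norm(dim=-1)  # shape: [1, 576]
print(relevance.shape)
```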

## Evaluation

The evaluation requires the MSCOCO 2014 dataset. Please download it here and extract it to your data path.

You also need to prepare the checkpoints of the following 7B base models:
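Once the data and checkpoints are in place, the hallucination evaluation follows the CHAIR protocol from OPERA (credited in the acknowledgement below). For reference, here is a minimal sketch of what the CHAIR metrics compute (Rohrbach et al., 2018); the function name and inputs are illustrative, not this repo's evaluation interface:

```python
def chair_scores(mentioned_objects, gt_objects):
    """Minimal sketch of the CHAIR hallucination metrics.

    mentioned_objects: per-image sets of objects named in the generated caption.
    gt_objects: per-image sets of objects actually present (from MSCOCO annotations).
    """
    hallucinated = mentions = bad_captions = 0
    for mentioned, present in zip(mentioned_objects, gt_objects):
        fake = mentioned - present                 # mentioned but not in the image
        hallucinated += len(fake)
        mentions += len(mentioned)
        bad_captions += bool(fake)
    chair_i = hallucinated / max(mentions, 1)         # instance-level CHAIR_i
    chair_s = bad_captions / max(len(gt_objects), 1)  # sentence-level CHAIR_s
    return chair_i, chair_s
```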

## Visualization 🔥🔥🔥

```bash
python demo_smooth_grad_threshold.py
```
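Judging by the script name, the demo appears to use SmoothGrad-style saliency (Smilkov et al., 2017), averaging input gradients over noise-perturbed copies of the image and then thresholding out low-relevance pixels. A minimal sketch of that technique, with a hypothetical `model` and `image` rather than the repo's actual interface:

```python
import torch

def smooth_grad_threshold(model, image, n_samples=25, sigma=0.1, threshold=0.5):
    """SmoothGrad with a simple relevance threshold.

    `model` maps an image tensor to logits; `model`, `image`, and all
    parameter defaults here are illustrative assumptions.
    """
    grads = torch.zeros_like(image)
    for _ in range(n_samples):
        # Perturb the input with Gaussian noise and track its gradient.
        noisy = (image + sigma * torch.randn_like(image)).requires_grad_(True)
        score = model(noisy).max()  # scalar score to differentiate
        score.backward()
        grads += noisy.grad.abs()
    saliency = grads / n_samples          # average gradient magnitude
    saliency = saliency / saliency.max()  # normalize to [0, 1]
    return saliency * (saliency >= threshold)  # suppress low-relevance pixels
```

Averaging over noisy copies smooths out the pixel-level jitter of raw gradients, and the threshold keeps only the strongest-relevance regions in the final map.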


## Citation

```bibtex
@inproceedings{zhang2024redundancy,
  title={From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models},
  author={Zhang, Xiaofeng and Quan, Yihao and Shen, Chen and Yuan, Xiaosong and Yan, Shaotian and Xie, Liang and Wang, Wenxiao and Gu, Chaochen and Tang, Hao and Ye, Jieping},
  booktitle={Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)},
  year={2025}
}
```

## Acknowledgement

This repo is built on LLaVA (models), OPERA (CHAIR evaluation), and FastV (image token truncation). Many thanks for their efforts. Use of our code should also follow the original licenses.
