- [2025.04.04] Our paper was selected as a Highlight paper at CVPR 2025.
- [2025.02.29] Our paper was accepted to CVPR 2025.
- [2025.02.08] We released the official code of VPS, a new interpretation mechanism.
- [2024.09.30] We began investigating the potential of interpretability in object detection.
Our interpretation method relies on relatively common packages, mainly PyTorch, etc.
We provide code to explain Grounding DINO, but please install its dependencies first: https://github.com/IDEA-Research/GroundingDINO.
For explaining Florence-2, please install its dependencies: https://huggingface.co/microsoft/Florence-2-large-ft
For explaining traditional detectors, please install MMDetection v3.3: https://github.com/open-mmlab/mmdetection/
In addition, follow datasets/readme.md to organize the datasets and ckpt/readme.md to download the weights of the relevant detectors.
You can try the interpretation method on a single image directly in the Jupyter notebooks:
- Grounding DINO Interpretation (Detection): tutorial
- Florence-2 Interpretation (Detection): tutorial
- Florence-2 Interpretation (Visual Grounding): tutorial
We provide some results of our approach on interpreting object detection models.
Note: The tank picture is from the Internet.
Prepare the datasets as described in datasets/readme.md.
Download the benchmark files and put them into ./datasets from https://huggingface.co/datasets/RuoyuChen/VPS_benchmark.
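If you prefer to fetch the benchmark programmatically, the sketch below uses `snapshot_download` from the `huggingface_hub` package. It is not part of this repository: the helper names `benchmark_download_kwargs` and `download_benchmark` are hypothetical, and it assumes that the layout of the Hugging Face dataset repository can be placed into `./datasets` as-is. Check the result against datasets/readme.md.

```python
from pathlib import Path

# Dataset repository named in this README.
REPO_ID = "RuoyuChen/VPS_benchmark"


def benchmark_download_kwargs(target_dir="datasets"):
    """Build the keyword arguments passed to snapshot_download (hypothetical helper)."""
    return {
        "repo_id": REPO_ID,
        "repo_type": "dataset",  # the benchmark is hosted as a dataset, not a model
        "local_dir": str(Path(target_dir)),
    }


def download_benchmark(target_dir="datasets"):
    """Download the benchmark files into target_dir and return the local path."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    Path(target_dir).mkdir(parents=True, exist_ok=True)
    return snapshot_download(**benchmark_download_kwargs(target_dir))
```

Calling `download_benchmark()` from the repository root requires network access and `pip install huggingface_hub`.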
Run (more instructions are in the folder ./scripts):
./script/groundingdino_coco_correct.sh
Visualization:
python -m visualization.visualize_ours \
--explanation-dir submodular_results/grounding-dino-coco-correctly/slico-1.0-1.0-division-number-100 \
--Datasets datasets/coco/val2017
Evaluate faithfulness:
python -m evals.eval_AUC_faithfulness \
--explanation-dir submodular_results/grounding-dino-coco-correctly/slico-1.0-1.0-division-number-100
Evaluate localization:
python -m evals.eval_energy_pg \
--Datasets datasets/coco/val2017 \
--explanation-dir submodular_results/grounding-dino-coco-correctly/slico-1.0-1.0-division-number-100
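The visualization and the two evaluation commands above share the same explanation directory, so it can be convenient to drive them from one place. The following is a minimal sketch, not a script shipped with this repository: `build_commands` and `run_all` are hypothetical helpers, and they use only the module names and flags shown above. Run it from the repository root so that `python -m` can resolve the modules.

```python
import shlex
import subprocess
import sys


def build_commands(explanation_dir, dataset_dir):
    """Return the three post-processing commands as argument lists."""
    py = sys.executable  # use the interpreter that runs this script
    return {
        "visualize": [
            py, "-m", "visualization.visualize_ours",
            "--explanation-dir", explanation_dir,
            "--Datasets", dataset_dir,
        ],
        "faithfulness": [
            py, "-m", "evals.eval_AUC_faithfulness",
            "--explanation-dir", explanation_dir,
        ],
        "location": [
            py, "-m", "evals.eval_energy_pg",
            "--Datasets", dataset_dir,
            "--explanation-dir", explanation_dir,
        ],
    }


def run_all(explanation_dir, dataset_dir, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for name, cmd in build_commands(explanation_dir, dataset_dir).items():
        print(f"[{name}] {shlex.join(cmd)}")
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)
```

For example, `run_all("submodular_results/grounding-dino-coco-correctly/slico-1.0-1.0-division-number-100", "datasets/coco/val2017")` prints the three commands without running them; pass `dry_run=False` to execute them.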
SMDL-Attribution: SOTA attribution method based on submodular subset selection
Grounding DINO: an open-set object detector.
Florence-2: a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
MMDetection V3.3: an open source object detection toolbox based on PyTorch.
@article{chen2024interpreting,
title={Interpreting Object-level Foundation Models via Visual Precision Search},
author={Chen, Ruoyu and Liang, Siyuan and Li, Jingzhi and Liu, Shiming and Li, Maosen and Huang, Zheng and Zhang, Hua and Cao, Xiaochun},
journal={arXiv preprint arXiv:2411.16198},
year={2024}
}