Pixel-level Certified Explanations via Randomized Smoothing

Alaa Anani, Tobias Lorenz, Mario Fritz, Bernt Schiele. In International Conference on Machine Learning (ICML), 2025.

Post-hoc attribution methods aim to explain deep learning predictions by highlighting influential input pixels. However, these explanations are highly non-robust: small, imperceptible input perturbations can drastically alter the attribution map while maintaining the same prediction. This vulnerability undermines their trustworthiness and calls for rigorous robustness guarantees of pixel-level attribution scores. We introduce the first certification framework that guarantees pixel-level robustness for any black-box attribution method using randomized smoothing. By sparsifying and smoothing attribution maps, we reformulate the task as a segmentation problem and certify each pixel’s importance against ℓ2-bounded perturbations. We further propose three evaluation metrics to assess certified robustness, localization, and faithfulness. An extensive evaluation of 12 attribution methods across 5 ImageNet models shows that our certified attributions are robust, interpretable, and faithful, enabling reliable use in downstream tasks.

[Paper]  [arXiv]  [Poster]  [Code] 
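
The core idea can be illustrated with the standard randomized-smoothing recipe applied to sparsified attribution maps, treated as a two-class (important vs. unimportant) segmentation. The Python sketch below is illustrative only: the names (certify_pixels, attribution_fn) and the Monte-Carlo parameters are assumptions made here for exposition, not the paper's reference implementation (see [Code] for that).

# Illustrative sketch of pixel-wise certification via randomized smoothing.
# Assumed, hypothetical interface; not the authors' released code.
import numpy as np
from scipy.stats import beta, norm

def certify_pixels(x, attribution_fn, sigma=0.25, n_samples=100, top_k=0.2, alpha=0.001):
    # x              : input image, float array of shape (C, H, W)
    # attribution_fn : black-box callable returning an (H, W) attribution map
    # sigma          : standard deviation of the Gaussian smoothing noise
    # n_samples      : number of Monte-Carlo noise samples
    # top_k          : fraction of pixels kept as "important" when sparsifying each map
    # alpha          : failure probability of the per-pixel confidence bound
    H, W = x.shape[1], x.shape[2]
    votes = np.zeros((H, W), dtype=np.int64)

    for _ in range(n_samples):
        noisy = x + sigma * np.random.randn(*x.shape)
        attr = attribution_fn(noisy)                # (H, W) attribution map
        thresh = np.quantile(attr, 1.0 - top_k)     # sparsify: keep the top-k fraction
        votes += (attr >= thresh).astype(np.int64)  # per-pixel vote for "important"

    # Per-pixel majority class, as in certified segmentation.
    important = votes > (n_samples - votes)
    majority_count = np.maximum(votes, n_samples - votes)

    # Clopper-Pearson lower confidence bound on the majority-class probability.
    p_lower = beta.ppf(alpha, majority_count, n_samples - majority_count + 1)

    # Certified l2 radius per pixel (Cohen et al.-style bound);
    # abstain (radius 0) wherever the lower bound does not exceed 1/2.
    radius = np.where(p_lower > 0.5, sigma * norm.ppf(p_lower), 0.0)
    return important, radius

Treating the sparsified map as a binary segmentation mask lets each pixel be certified independently with the familiar sigma * Phi^-1(p) radius from smoothed classification, at the cost of abstaining on pixels without a clear majority vote.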

Citation

@inproceedings{anani2025pixel,
    title          = {Pixel-level Certified Explanations via Randomized Smoothing}, 
    author         = {Alaa Anani and Tobias Lorenz and Mario Fritz and Bernt Schiele},
    booktitle      = {International Conference on Machine Learning (ICML)},
    year           = {2025}
}