
Proceedings of ISP RAS, 2024 Volume 36, Issue 5, Pages 127–142 (Mi tisp927)

Is AI interpretability safe: the relationship between interpretability and security of machine learning models

G. V. Sazonov^{a,b}, K. S. Lukyanov^{b,c,d}, S. K. Boyarsky^{e}, I. A. Makarov^{f,d}

a Lomonosov Moscow State University
b Ivannikov Institute for System Programming of the RAS
c Moscow Institute of Physics and Technology (National Research University)
d Research Center of the Trusted Artificial Intelligence ISP RAS
e Yandex School of Data Analysis
f AIRI

Abstract: As interpretable artificial intelligence (AI) models see wider use, growing attention is being paid to questions of trust and security across all types of data. In this work, we focus on graph node classification, one of the most challenging such tasks. To the best of our knowledge, this is the first study to comprehensively explore the relationship between interpretability and robustness. Our experiments are conducted on datasets of citation and purchase graphs. We propose methodologies for constructing black-box attacks on graph models based on interpretation results, and we demonstrate how adding protection affects the interpretability of AI models.
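The abstract's core idea, using a model's interpretation to guide a black-box attack on graph node classification, can be illustrated with a minimal numpy sketch. Everything here is hypothetical and not the paper's actual method: the toy graph, the score-query black box, the occlusion-style explainer, and the greedy perturbation of the most "important" features are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy citation-style graph: adjacency matrix and node features.
# All sizes and values are illustrative, not from the paper.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 5))
w = rng.normal(size=5)          # hidden weights of the "black-box" model
A_hat = A + np.eye(len(A))      # add self-loops for mean aggregation

def score(X):
    """Black-box node classifier: one mean-aggregation step plus a
    linear score. The attacker may only query this function."""
    H = A_hat @ X / A_hat.sum(axis=1, keepdims=True)
    return H @ w

def occlusion_importance(X, node):
    """Interpretation step: rank features by how much zeroing each one
    shifts the target node's score (a simple occlusion explanation)."""
    base = score(X)[node]
    imp = np.empty(X.shape[1])
    for f in range(X.shape[1]):
        Xo = X.copy()
        Xo[:, f] = 0.0
        imp[f] = abs(base - score(Xo)[node])
    return imp

def interpretation_guided_attack(X, node, eps=1.0, k=2):
    """Greedy black-box attack: perturb only the k most important
    features of the target node, keeping each query result that moves
    the node's score toward the decision boundary."""
    X_adv = X.copy()
    target_sign = np.sign(score(X)[node])   # class we try to flip away from
    top = np.argsort(occlusion_importance(X, node))[-k:]
    for f in top:
        best = X_adv
        for delta in (-eps, 0.0, eps):
            cand = X_adv.copy()
            cand[node, f] += delta
            if target_sign * score(cand)[node] < target_sign * score(best)[node]:
                best = cand
        X_adv = best
    return X_adv

node = 2
X_adv = interpretation_guided_attack(X, node)
print(score(X)[node], score(X_adv)[node])
```

The interpretation narrows the search to a handful of features, so the attacker needs far fewer queries than perturbing every feature; this is the sense in which exposing explanations can reduce a model's security.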

Keywords: interpretability, robustness, attacks on AI models, black-box attacks, graph node classification, trusted AI.

DOI: 10.15514/ISPRAS-2024-36(5)-9



© Steklov Math. Inst. of RAS, 2025