Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Dokl. RAN. Math. Inf. Proc. Upr., 2025, Volume 527, Pages 367–377 (Mi danma694)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

RES-LT: extracting higher-order topological features from protein language models for enhanced per-residue classification

M. P. Ivanova (a), I. E. Trofimov (a), A. V. Mironenko (a), P. V. Strashnov (b), M. K. Kravchenko (a), S. A. Barannikov (a, c), E. V. Burnaev (a, b)

a Skolkovo Institute of Science and Technology
b Artificial Intelligence Research Institute, Moscow
c CNRS, IMJ, Paris Cité University, Paris, France

Abstract: We introduce RES-LT (Residual Local Topology), a novel topological data analysis (TDA) approach that extracts higher-order structural information from transformer-based protein language models. RES-LT utilizes both H0 and H1 persistent homology to characterize residue-residue interactions in proteins through attention maps, generating biologically relevant features for per-residue classification. Implemented on the ESM-2 model family, our framework integrates H0 and H1 topological features with standard embeddings to create a powerful hybrid representation. Extensive evaluation demonstrates that RES-LT achieves state-of-the-art performance in conservation prediction and significantly outperforms both traditional approaches and comparable transformer-based methods in binding site identification.

Keywords: topological data analysis, persistent homology, protein language models, attention maps, residue-level prediction, binding site identification, conservation prediction, secondary structure prediction.
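As an illustration of the H0 part of the idea described in the abstract, the sketch below computes the finite H0 persistence death times of a graph built from a single attention map. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `h0_death_times` is hypothetical, and the conversion from attention weights to distances, d(i, j) = 1 - max(a_ij, a_ji), is an assumption chosen for illustration. The sketch relies on the standard fact that the finite H0 death times of a weighted graph filtration equal the edge weights of its minimum spanning tree, found here with Kruskal's algorithm. H1 features need a full persistent homology computation and are not covered.

```python
def h0_death_times(attention):
    """Finite H0 death times for a filtration built from one attention map.

    `attention` is a square list of lists with weights in [0, 1].
    Assumption (not from the source): the pseudo-distance between
    residues i and j is 1 - max(a_ij, a_ji).
    """
    n = len(attention)

    # Build all undirected edges with their assumed distances.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - max(attention[i][j], attention[j][i])
            edges.append((d, i, j))
    edges.sort()

    # Union-find structure for Kruskal's algorithm.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every edge that merges two components kills one H0 class
    # at the filtration value equal to that edge's weight.
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


# Toy 3-residue attention map (hypothetical values).
toy_attention = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
]
toy_deaths = h0_death_times(toy_attention)
```

For a connected graph on n residues the function returns n - 1 death times in non-decreasing order. Summary statistics of these values (for example their sum or maximum) are one possible way to form features that could be concatenated with standard embeddings, although the exact feature set used by RES-LT is not specified here.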

UDC: 576.8

Received: 21.08.2025
Accepted: 22.09.2025

DOI: 10.7868/S268695432507032X



© Steklov Math. Inst. of RAS, 2025