Abstract:
We introduce RES-LT (Residual Local Topology), a topological data analysis (TDA) approach that extracts higher-order structural information from the attention maps of transformer-based protein language models. RES-LT uses both H0 and H1 persistent homology to characterize residue-residue interactions in proteins, generating biologically relevant features for per-residue classification. Implemented on the ESM-2 model family, our framework combines these H0 and H1 topological features with standard embeddings into a hybrid representation. In our evaluation, RES-LT achieves state-of-the-art performance in conservation prediction and significantly outperforms both traditional approaches and comparable transformer-based methods in binding site identification.
Keywords: topological data analysis, persistent homology, protein language models, attention maps, residue-level prediction, binding site identification, conservation prediction, secondary structure prediction.