Doklady Rossijskoj Akademii Nauk. Mathematika, Informatika, Processy Upravlenia

Dokl. RAN. Math. Inf. Proc. Upr., 2025 Volume 527, Pages 400–414 (Mi danma697)

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

Cluster validity across attribute and network spaces: empirical benchmarks for attributed networks clustering

S. Shalileh, D. A. Tsyplakova, E. A. Antonov

Sberbank, SberIndex, Moscow, Russian Federation

Abstract: We empirically study internal cluster validity indices (CVIs) for attributed networks by evaluating feature-space and network-space criteria under controlled generators that decouple attributes from topology while sharing cluster cardinalities. Using unified notation, Gaussian blobs for features, and a stochastic block model for graphs, we assess Silhouette Width (SW), Calinski-Harabasz (CH), Davies-Bouldin (DBI), S$_{\mathrm{Dbw}}$, Average Isolability (AVI), Average Unifiability (AVU), and ANUI on both ground-truth and random partitions. SW is stable and saturates once enough features are present; CH grows strongly with sample size, which suggests reporting CH/N; DBI and S$_{\mathrm{Dbw}}$ separate ground truth from random partitions but have K-dependent random baselines, motivating baseline normalization. In network space, AVI increases with assortativity and decreases roughly as 1/K, AVU drops with K toward a floor, and ANUI follows these trends; all indices approach their random baselines as overlap/mixing increases, while confidence intervals narrow with more samples or more informative features. We provide an empirical benchmark, simple scaling heuristics, and practical guidance for applying CVIs in attributed networks.
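As an illustration of the feature-space part of this comparison, the following minimal sketch scores a ground-truth partition and a random partition of Gaussian blobs with SW, CH, CH/N, and DBI. It is not the authors' benchmark code: the use of scikit-learn, the helper name `feature_space_indices`, the generator settings (600 samples, 10 features, 4 clusters), and the choice to build the random partition by shuffling the true labels (which keeps cluster cardinalities fixed) are all assumptions made here for illustration. S$_{\mathrm{Dbw}}$ and the network-space indices are omitted because their definitions are not given in the abstract.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def feature_space_indices(X, labels):
    """Hypothetical helper: feature-space CVIs for one partition."""
    n_samples = X.shape[0]
    ch = calinski_harabasz_score(X, labels)
    return {
        "SW": silhouette_score(X, labels),      # higher is better
        "CH": ch,                               # higher is better, grows with N
        "CH/N": ch / n_samples,                 # size-normalized CH
        "DBI": davies_bouldin_score(X, labels), # lower is better
    }


# Gaussian blobs as the feature generator (assumed settings).
X, y_true = make_blobs(
    n_samples=600, n_features=10, centers=4, cluster_std=1.0, random_state=0
)

# Random partition: shuffle the true labels so cluster sizes are unchanged.
rng = np.random.default_rng(0)
y_rand = rng.permutation(y_true)

truth = feature_space_indices(X, y_true)
rand = feature_space_indices(X, y_rand)

for name in truth:
    print(f"{name:>5}: ground truth = {truth[name]:.3f}, random = {rand[name]:.3f}")
```

Repeating the run over several values of the sample size and of K, and dividing each index by its random-partition value, would be one simple way to explore the scaling and baseline effects the abstract describes.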

Keywords: cluster validity index, attributed network, feature-rich networks, clustering.

UDC: 004.891.3

Received: 20.08.2025
Accepted: 29.09.2025

DOI: 10.7868/S2686954325070355



© Steklov Math. Inst. of RAS, 2025