T. A. Nalbandian, S. A. Shalileh, “An empirical scrutinization of four crisp clustering methods with four distance metrics and one straightforward interpretation rule”, Dokl. RAN. Math. Inf. Proc. Upr., 2024, Volume 520, Number 2, Pages 267

SPECIAL ISSUE: ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING TECHNOLOGIES

An empirical scrutinization of four crisp clustering methods with four distance metrics and one straightforward interpretation rule

T. A. Nalbandian^a, S. A. Shalileh^ab

^a Laboratory of Artificial Intelligence for Cognitive Sciences, HSE University, Moscow, Russia
^b Sberbank of Russia, SberIndex, Moscow, Russia

Abstract: Clustering has always been in great demand in the scientific and industrial communities. However, because ground truth is lacking, the interpretation of clustering results can be debatable. The current research provides an empirical benchmark of the efficiency of three popular and one recently proposed crisp clustering methods. To this end, we extensively analyzed these four methods by applying them to nine real-world and 420 synthetic datasets using four different values of $p$ in the Minkowski distance. Furthermore, we validated a previously proposed, yet not well-known, straightforward rule for interpreting the recovered clusters. Our computations showed that (i) Nesterov gradient descent clustering is the most effective clustering method on our real-world data, while K-Means had an edge over it on our synthetic data; (ii) the Minkowski distance with $p = 1$ is the most effective distance function; (iii) the investigated cluster interpretation rule is intuitive and valid.
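
For readers unfamiliar with the Minkowski distance, the following minimal Python sketch shows how the parameter $p$ changes the distance and how a crisp (hard) assignment gives each point to exactly one cluster. It is an illustration only, not the authors' implementation: the function names `minkowski_distance` and `nearest_center` are hypothetical, and the values of $p$ used in the usage note are assumptions chosen for illustration, since the abstract names only $p = 1$.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p (p >= 1) between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the distance to be a metric")
    # Sum of absolute coordinate differences raised to p, then the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def nearest_center(point, centers, p):
    """Crisp assignment: index of the single closest center (hypothetical helper)."""
    distances = [minkowski_distance(point, c, p) for c in centers]
    return distances.index(min(distances))
```

With $p = 1$ the function reduces to the Manhattan distance and with $p = 2$ to the Euclidean distance; for example, `minkowski_distance([0, 0], [3, 4], 1)` and `minkowski_distance([0, 0], [3, 4], 2)` differ for the same pair of points, which is why the choice of $p$ can change which center a point is assigned to.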

Keywords: clustering, Minkowski distance, algorithms.

UDC: 004.891.3

Received: 27.09.2024
Accepted: 02.10.2024

DOI: 10.31857/S2686954324700632