Sibirskii Zhurnal Vychislitel'noi Matematiki

Sib. Zh. Vychisl. Mat., 2019 Volume 22, Number 2, Pages 121–136 (Mi sjvm705)

This article is cited in 3 papers

Exact algorithms of searching for the largest size cluster in two integer 2-clustering problems

A. V. Kel'manov (a, b), A. V. Panasenko (b, a), V. I. Khandeev (a, b)

a Sobolev Institute of Mathematics, Siberian Branch, Russian Academy of Sciences, pr. Akad. Koptyuga 4, Novosibirsk, 630090 Russia
b Novosibirsk State University, ul. Pirogova 1, Novosibirsk, 630090 Russia

Abstract: We consider two related discrete optimization problems of searching for a subset of a finite set of points in Euclidean space. Both problems are induced by versions of a fundamental problem in data analysis, namely, selecting a subset of similar elements in a set of objects. In each problem, an input set and a positive real number are given, and it is required to find a cluster (i.e., a subset) of the largest size under a constraint on the value of a quadratic clustering function. The points of the input set that lie outside the sought subset are treated as the second (complementary) cluster. In the first problem, the constrained function is the sum over both clusters of the intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first (i.e., the sought) cluster is unknown and is defined as the centroid, while the center of the second one is fixed at a given point in Euclidean space (without loss of generality, at the origin). In the second problem, the constrained function is the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. As in the first problem, the center of the first cluster is unknown and is defined as the centroid, while the center of the second one is fixed at the origin. In this paper, we show that both problems are strongly NP-hard. We also present exact algorithms for the cases of these problems in which the input points have integer components. If the space dimension is bounded by some constant, the algorithms are pseudopolynomial.
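
For illustration only, the constrained function described in the abstract can be sketched in Python as follows. This is not the authors' algorithm: the search below is a naive exponential-time enumeration, included only to make the problem statement concrete. The function names, the reading of the constraint as "objective not exceeding the bound", and the weight parameters `w1` and `w2` are assumptions; the abstract does not specify the weights used in the second problem, so they default to 1, which corresponds to the first problem.

```python
from itertools import combinations


def sq_norm(p):
    """Squared Euclidean norm of a point given as a tuple."""
    return sum(x * x for x in p)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def objective(cluster, rest, w1=1.0, w2=1.0):
    """Sum of squared distances to the centroid (first cluster) plus
    sum of squared distances to the origin (second cluster).
    w1 and w2 are hypothetical weights, not taken from the paper."""
    cost1 = 0.0
    if cluster:
        c = centroid(cluster)
        cost1 = sum(sq_norm(tuple(a - b for a, b in zip(p, c))) for p in cluster)
    cost2 = sum(sq_norm(p) for p in rest)
    return w1 * cost1 + w2 * cost2


def largest_cluster_bruteforce(points, bound):
    """Try subsets from largest to smallest and return the first one whose
    objective does not exceed the bound (assumed form of the constraint).
    Returns an empty list if no non-empty subset is feasible."""
    n = len(points)
    for size in range(n, 0, -1):
        for idx in combinations(range(n), size):
            chosen = set(idx)
            cluster = [points[i] for i in idx]
            rest = [points[i] for i in range(n) if i not in chosen]
            if objective(cluster, rest) <= bound:
                return cluster
    return []
```

For example, for the integer points (4, 4), (6, 6), (4, 6), (6, 4), (0, 1) and a bound of 10, the sketch selects the first four points: their squared distances to the centroid (5, 5) sum to 8, and the remaining point contributes 1 as its squared distance to the origin.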

Key words: Euclidean space, 2-clustering, largest subset, NP-hardness, exact algorithm, pseudopolynomial-time solvability.

UDC: 519.2+621.391

Received: 15.05.2018
Revised: 26.06.2018
Accepted: 21.01.2019

DOI: 10.15372/SJNM20190201


 English version:
Numerical Analysis and Applications, 2019, 12:2, 105–115
