Abstract:
This article describes and compares several classical measures of
similarity between partitions of a given set, including the Rand
index and the Larsen and Aone coefficient, among others. We
develop a probabilistic framework for comparing these measures and a
unified representation of distances that uses a common set of
parameters. To do this, we take all possible values of a
similarity measurement between pairs of partitions and
graduate them using quantiles of a distribution function. Let
${\lambda }_{\alpha }$ be the quantile of level $\alpha $ of the
distribution function $F_{\rho }\left(t\right)=P\left(\rho
<t\right)$. If the proximity measurement $\rho $ observed for two
partitions is not less than ${\lambda }_{\alpha }$, then at least
$\alpha \cdot 100\%$ of randomly chosen pairs of partitions have a
proximity measurement less than $\rho $, so these two partitions
cannot be considered close or similar. This paper identifies
the general case of distribution functions that describe
similarity measurements, with a special focus on uniform
distributions. The comparison results are presented in tables for
quantiles of probability distributions, using computer simulations
over our selected set of similarity metrics. Refs 9. Table 1.
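
The quantile-based graduation described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the procedure used in the paper: the distance is taken to be one minus the Rand index, random partitions are generated by assigning each element an independent uniform label (which is not uniform over all partitions), and the function names, set size, number of blocks, and trial count are hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def random_partition(n, k, rng):
    """Label each of n elements with one of k blocks at random.

    Assumption: independent uniform labels; this is an illustrative
    sampling scheme, not necessarily the one used in the paper.
    """
    return [rng.randrange(k) for _ in range(n)]


def rand_distance(p, q):
    """Return 1 - Rand index: the share of element pairs that are
    grouped together in one partition but separated in the other."""
    pairs = list(combinations(range(len(p)), 2))
    disagreements = sum((p[i] == p[j]) != (q[i] == q[j]) for i, j in pairs)
    return disagreements / len(pairs)


def empirical_quantile(samples, alpha):
    """Return a simple order-statistic estimate of the alpha-quantile."""
    ordered = sorted(samples)
    index = min(int(alpha * len(ordered)), len(ordered) - 1)
    return ordered[index]


def simulate_quantiles(n=12, k=3, trials=2000,
                       alphas=(0.05, 0.5, 0.95), seed=0):
    """Estimate quantiles of the distance between random partition pairs."""
    rng = random.Random(seed)
    samples = [
        rand_distance(random_partition(n, k, rng),
                      random_partition(n, k, rng))
        for _ in range(trials)
    ]
    return {alpha: empirical_quantile(samples, alpha) for alpha in alphas}


if __name__ == "__main__":
    for alpha, value in simulate_quantiles().items():
        print(f"alpha = {alpha:.2f}: estimated quantile = {value:.3f}")
```

An observed distance between two partitions can then be compared with the estimated quantiles: a value at or above the estimate for level $\alpha $ indicates that roughly $\alpha \cdot 100\%$ or more of the simulated pairs were closer to each other.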
Keywords: distance between partitions of a set,
probabilistic approach, comparing the distances.
UDC: 519.213
Received: October 7, 2017 Accepted: January 11, 2018