M. S. Ionkin, M. V. Ogneva, “Implementation, efficiency analysis and quality evaluation of clustering algorithms for graph models of social networks”, Izv. Saratov Univ. Math. Mech. Inform., 2017, Volume 17, Issue 4,Pages <nobr>441

Scientific Part
Computer Sciences

Implementation, efficiency analysis and quality evaluation of clustering algorithms for graph models of social networks

M. S. Ionkin, M. V. Ogneva

Saratov State University, 83, Astrakhanskaya Str., Saratov, Russia, 410012

Abstract: The article deals with the community detection problem (the clustering problem) for undirected graphs. The clustering (grouping together of similar objects) is one of the fundamental tasks in the data analysis. This task is applied in a wide range of areas: image segmentation, marketing, anti-fraud, forecasting, text analysis and much more. At the moment, there is no universal and effective solution of this problem. There are several dozens of methods and there are many modifications of them which group objects that are as similar as possible to each other. The article describes algorithms for solving this task, presents their asymptotic complexity estimates, traditional metrics and quality functionals needed to evaluate the results of their work. The authors propose a solution to the problem which is the opposite of the resolution limit problem (algorithms find communities that are quite small in relation to the entire graph). The authors implemented the Smart Local Moving algorithm which is an improvement of the well-known Louvain algorithm. Performed an experimental comparison of the considered algorithms efficiency on large sparse graphs containing several hundreds of thousands of vertices and edges which corresponding to real data from YouTube, Amazon, Live Journal. The comparative analysis was performed on these three “impersonal” data sets with a previously known division into communities (ground-truth communities), as well as on a data set with all available information about the vertices (users) from the social network “Vkontakte”. The communities found by different algorithms on the same data set were also compared with each other. The authors examined such characteristics as the execution time of algorithms, values of modularity and normalized mutual information.

Key words: clustering, community detection, graph models, data analysis.

UDC: 519.17, 519.237

DOI: 10.18500/1816-9791-2017-17-4-441-451