Abstract:
The paper presents a cluster analysis of the external web sites of large universities, that is, web sites that link to university sites and web sites that university sites link to. University web sites in Russia, the USA and the UK that have the highest webometric ranking in their region were chosen as the subject of the study. The goal of the research is to identify, for each university, groups of external sites that share the same kind of activity. The clusters found were analyzed to determine how group size and the number of groups affect the webometric ranking of university sites. To achieve this goal, the authors developed a clustering algorithm based on Locality-Sensitive Hashing (LSH), a probabilistic method for reducing the dimensionality of multidimensional data. An experiment conducted on test data showed that the developed algorithm achieves good clustering quality and runs fast when mining massive datasets. The main results of the research are presented.
Keywords: webometrics, external web sites of universities, clustering, locality-sensitive hashing, min hashing, external web sites clustering, hyperlink analysis.