Abstract:
The article deals with one of the key problems of social network analysis: classifying accounts based on the media content uploaded by their users. The main difficulties are the heterogeneity of the content (in both format and subject matter) and the large volume of data, which together lead to excessive computational complexity and often make traditional analysis methods completely ineffective. In the article, we discuss an approach to clustering media content from social networks based on textual annotation, using BigData technology, a modern and efficient tool for processing large volumes of data. For the computational experiments, a large sample of heterogeneous images (photographs, paintings, postcards, etc.) was collected from real Twitter accounts. The results confirmed the high quality of the media content clustering: the average error was around 5%.
Keywords: cluster analysis, BigData technology, text annotation, social networks, media content analysis, k-means clustering, GoogLeNet.
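To make the idea of clustering images through their textual annotations concrete, here is a minimal sketch under several stated assumptions. It assumes each image has already been annotated with a list of label strings (for example, the top labels an image classifier such as GoogLeNet might produce); it turns each annotation into an L2-normalized bag-of-words vector and groups the vectors with plain k-means. The helper names (`vectorize`, `farthest_first_seeds`, `kmeans`), the deterministic farthest-first seeding, and the sample annotations are all hypothetical and chosen for illustration. This is not the authors' pipeline, and the BigData distribution layer is deliberately left out.

```python
import math
from collections import Counter

def vectorize(annotations):
    """Turn each annotation (a list of label strings) into an L2-normalized
    bag-of-words vector over the vocabulary of all labels (assumed encoding)."""
    vocab = sorted({label for labels in annotations for label in labels})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for labels in annotations:
        vec = [0.0] * len(vocab)
        for word, count in Counter(labels).items():
            vec[index[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors, vocab

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_seeds(vectors, k):
    """Hypothetical deterministic seeding: start from the first vector, then
    repeatedly add the vector farthest from all centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(vectors, k, max_iter=100):
    """Plain Lloyd's k-means; returns a cluster index for every vector."""
    centroids = farthest_first_seeds(vectors, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # no vector changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical annotations: a few labels per image, as a classifier might emit.
annotations = [
    ["cat", "kitten", "pet"],
    ["cat", "pet", "fur"],
    ["kitten", "cat"],
    ["beach", "sea", "sunset"],
    ["sea", "sunset", "sky"],
    ["beach", "sea"],
]
vectors, vocab = vectorize(annotations)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Working on short label lists rather than raw pixels keeps each image as a small sparse vector, which is what makes a simple distance-based method like k-means cheap enough to distribute over a large collection. A real system would more likely use a library implementation (for example, a distributed k-means) and a richer text representation.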