A. A. Grusho, N. A. Grusho, M. I. Zabezhailo, D. V. Smirnov, E. E. Timonina, S. Ya. Shorgin, “Statistics and clusters for detection of anomalous insertions in Big Data environment”, Inform. Primen., 2021, Volume 15, Issue 4, Pages 79

This article is cited in 4 papers

Statistics and clusters for detection of anomalous insertions in Big Data environment

A. A. Grusho^a, N. A. Grusho^a, M. I. Zabezhailo^a, D. V. Smirnov^b, E. E. Timonina^a, S. Ya. Shorgin^a

^a Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119133, Russian Federation
^b Sberbank of Russia, 19 Vavilov Str., Moscow 117999, Russian Federation

Abstract: The paper develops algorithms for reducing the level of “false alarms” when searching for anomalies in complex heterogeneous sequences of objects (Big Data). In mathematical statistics, this reduction is traditionally achieved by minimizing the probability of a “false alarm.” However, in anomaly detection problems (rare insertions of anomalous data), this approach increases the probability of missing the anomalies being sought. To avoid missing them, the paper proposes the opposite: criteria designed for minimal computational complexity are allowed a high probability of “false alarms,” relying on the fact that the number of objects selected by such criteria is still much smaller than the number of objects in the original Big Data. The selected objects can then be grouped into a single cluster, and additional information about the objects in the cluster can be used to identify the anomalies being sought. The point of this approach is that the characteristics needed to discard “false alarms,” which are more expensive to compute, do not require large computational resources when evaluated on a cluster that is much smaller than the original data. It is shown that, under certain conditions, the order in which the additional information is used does not affect the result of filtering “false alarms.” The results for the filtering algorithm on sequences of objects are generalized to filtering “false alarms” that take the form of causal schemes in the initial data. Known schemes show how “false alarms” can be filtered by identifying only fragments of schemes.
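The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names (`cheap_screen`, `filter_false_alarms`), the object fields (`score`, `size`, `known_source`), the threshold, and the data are all hypothetical. The sketch also assumes one simple condition under which the order of the checks does not matter, namely that each check is a predicate evaluated on a single object independently of the others; the conditions actually proved in the paper may differ.

```python
from typing import Callable, Dict, List, Sequence

Obj = Dict[str, float]          # hypothetical object record
Check = Callable[[Obj], bool]   # True means "still looks anomalous"


def cheap_screen(objects: Sequence[Obj], threshold: float) -> List[Obj]:
    """Stage 1: a low-cost criterion with a deliberately low threshold.

    It tolerates many "false alarms" so that true anomalies are not lost,
    yet it still selects far fewer objects than the original data contains.
    """
    return [o for o in objects if o["score"] >= threshold]


def filter_false_alarms(cluster: Sequence[Obj], checks: Sequence[Check]) -> List[Obj]:
    """Stage 2: apply costlier checks, but only to the small cluster."""
    survivors = list(cluster)
    for check in checks:
        survivors = [o for o in survivors if check(o)]
    return survivors


# Hypothetical "additional information" checks (assumed per-object predicates).
def unusual_size(o: Obj) -> bool:
    return o["size"] > 1000


def unknown_source(o: Obj) -> bool:
    return o["known_source"] == 0


# Made-up data for illustration only.
objects = [
    {"id": 1, "score": 0.1, "size": 200, "known_source": 1},
    {"id": 2, "score": 0.6, "size": 5000, "known_source": 0},
    {"id": 3, "score": 0.7, "size": 300, "known_source": 0},
    {"id": 4, "score": 0.9, "size": 4000, "known_source": 1},
    {"id": 5, "score": 0.2, "size": 150, "known_source": 1},
]

cluster = cheap_screen(objects, threshold=0.5)
result_ab = filter_false_alarms(cluster, [unusual_size, unknown_source])
result_ba = filter_false_alarms(cluster, [unknown_source, unusual_size])

print([o["id"] for o in cluster])
print([o["id"] for o in result_ab], [o["id"] for o in result_ba])
```

In this toy setting both orderings of the checks leave the same objects, because filtering with per-object predicates amounts to taking their conjunction. The costlier checks run only on the cluster rather than on the full data, which is the source of the computational saving the abstract describes.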

Keywords: information security, search for anomalies, algorithms for filtering “false alarms”.

Received: 17.09.2021

DOI: 10.14357/19922264210411