Informatics and Automation, 2024, Volume 23, Issue 1, Pages 65–100 (Mi trspy1281)

Artificial Intelligence, Knowledge and Data Engineering

Evaluation of the informativeness of features in datasets for continuous verification

S. Davydenko (a), E. Kostyuchenko (a), S. Novikov (b)

(a) Tomsk State University of Control Systems and Radioelectronics
(b) Siberian State University of Telecommunications and Informatics

Abstract: Continuous verification addresses the flaws of static authentication: identifiers can be lost or forgotten, and the user is authenticated only once, at login, which is dangerous not only in areas requiring a high level of security but also in a regular office. Checking the user dynamically throughout the whole work session can improve the security of the system, since while working with the system the user may be exposed to an attacker (assaulted, for example) or may intentionally transfer access rights to someone else. In either case, the machine is no longer operated by the user who performed the initial login. Classifying users continuously limits the access to sensitive data that an attacker can obtain. In this study, the methods and datasets used for continuous verification were reviewed, and several datasets were selected for further research: smartphone and smartwatch movement data (WISDM) and mouse activity data (Chao Shen’s, DFL, Balabit). To improve the performance of models in the classification task, a preliminary selection of features is necessary, which requires evaluating their informativeness. Reducing the number of features lowers the requirements for the devices that process them and, at the same time, makes it possible to search over more classifier parameter values, thereby potentially increasing the proportion of correct answers during classification. Informativeness was evaluated with the Shannon method and with the algorithms built into data analysis and machine learning software (WEKA: Machine Learning Software and RapidMiner). The informativeness of each feature in the selected datasets was evaluated, and users were then classified with RapidMiner. The set of features used for classification was reduced gradually in 20% steps. As a result, a table of recommended feature sets for each dataset was compiled, along with graphs showing how the accuracy and running time of various models depend on the set of features used.
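
The abstract does not give the formula behind the Shannon method, so the following Python sketch is only an illustration under stated assumptions. It assumes one common formulation, in which informativeness is 1 minus the conditional entropy of the class given a discretized feature, with the logarithm taken to the base of the number of classes. The function names, the equal-width binning, and the `bins` parameter are hypothetical and are not taken from the paper, WEKA, or RapidMiner. The second function mimics reducing the feature set in 20% steps by keeping a top fraction of the ranked features.

```python
import math
from collections import Counter, defaultdict


def shannon_informativeness(values, labels, bins=4):
    """Score in [0, 1]: 1 - H(class | binned feature), log base = number of classes.

    Assumed formulation; equal-width binning is an illustrative choice.
    """
    k = len(set(labels))
    if k < 2:
        return 0.0  # a single class carries nothing to discriminate
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    groups = defaultdict(Counter)  # bin index -> class counts
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)
        groups[b][y] += 1
    n = len(values)
    cond_entropy = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        h = -sum((c / m) * math.log(c / m, k) for c in counts.values())
        cond_entropy += (m / n) * h  # weight by the share of samples in the bin
    return 1.0 - cond_entropy


def keep_top_fraction(scores, fraction):
    """Return the names of the highest-scoring features, keeping the given fraction."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy data: one feature separates two users, the other does not.
users = ["u1", "u1", "u1", "u2", "u2", "u2"]
scores = {
    "separating": shannon_informativeness([0.1, 0.2, 0.1, 0.9, 0.8, 0.9], users),
    "constant": shannon_informativeness([0.5] * 6, users),
}
selected = keep_top_fraction(scores, 0.5)
```

Under this formulation, a feature whose bins each contain a single user scores 1, and a feature that leaves the users evenly mixed in every bin scores 0. Calling `keep_top_fraction` with 0.8, 0.6, 0.4, and 0.2 would reproduce a stepwise reduction of the kind described in the abstract.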

Keywords: informativeness, classification, continuous verification, machine learning, feature selection, information security.

UDC: 004.852

Received: 26.02.2023

DOI: 10.15622/ia.23.1.3