O. A. Kovaleva, A. V. Samokhvalov, M. A. Liashkov, S. Yu. Pchelintsev, “The quality improvement method for detecting attacks on web applications using pre-trained natural language models”, Izv. Saratov Univ. Math. Mech. Inform., 2024, Volume 24, Issue 3, Pages 442

Scientific Part
Computer Sciences

The quality improvement method for detecting attacks on web applications using pre-trained natural language models

O. A. Kovaleva, A. V. Samokhvalov, M. A. Liashkov, S. Yu. Pchelintsev

Derzhavin Tambov State University, 33 Internationalnaya St., Tambov 392036, Russia

Abstract: This paper explores the use of deep learning techniques to improve the performance of web application firewalls (WAFs), describes a specific method for doing so, and presents the results of testing it on the publicly available CSIC 2010 dataset. Most web application firewalls rely on rules compiled by experts. At run time, a firewall inspects the HTTP requests exchanged between client and server to detect attacks and block potential threats. Writing rules by hand takes up experts' time, while distributed ready-made rule sets do not take into account the specifics of a particular user application; as a result, they produce many false positives and miss many network attacks. In recent years, pre-trained language models have brought significant improvements across a diverse set of natural language processing tasks because they are able to transfer knowledge. The article describes the adaptation of this approach to the field of information security: a pre-trained language model is used as a feature extractor that maps an HTTP request to a feature vector, and these vectors are then used to train a classifier. We propose a solution that consists of two stages. In the first stage, we build a deep pre-trained language model from normal HTTP requests to the web application. In the second stage, we use this model as a feature extractor and train a one-class classifier. Both stages are performed for each application. The experimental results show that the proposed approach significantly outperforms the classical rule-based ModSecurity approach configured with the OWASP CRS and does not require a security expert to define trigger rules.
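
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the language model or the one-class classifier, so the sketch swaps in hashed character trigram counts for the pre-trained language model and a centroid-distance threshold for the one-class classifier. All names (`embed`, `fit_one_class`, `is_anomalous`, `DIM`, `slack`) and the sample requests are hypothetical and chosen only for illustration.

```python
import math
import zlib

DIM = 1024  # length of the hashed feature vector (arbitrary, illustrative)


def embed(request: str, n: int = 3) -> list[float]:
    """Map an HTTP request string to a unit-length feature vector.

    ASSUMPTION: hashed character n-gram counts stand in for the
    pre-trained language model used as a feature extractor.
    """
    vec = [0.0] * DIM
    for i in range(len(request) - n + 1):
        gram = request[i:i + n].encode("utf-8")
        vec[zlib.crc32(gram) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_one_class(normal_requests: list[str], slack: float = 1.1):
    """Fit on normal traffic only.

    ASSUMPTION: a centroid plus a distance threshold is used here as a
    deliberately simple one-class classifier.
    """
    vectors = [embed(r) for r in normal_requests]
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    threshold = slack * max(distance(v, centroid) for v in vectors)
    return centroid, threshold


def is_anomalous(model, request: str) -> bool:
    """Flag a request whose vector lies farther from the centroid than the threshold."""
    centroid, threshold = model
    return distance(embed(request), centroid) > threshold


# Hypothetical traffic for one application (the model is fitted per application).
NORMAL_REQUESTS = [
    "GET /shop/search?q=laptop",
    "GET /shop/search?q=phone",
    "GET /shop/search?q=camera",
    "GET /shop/search?q=usb cable",
    "GET /shop/search?q=monitor 27",
]
ATTACK = "GET /shop/search?q=' OR '1'='1' UNION SELECT password FROM users--"

model = fit_one_class(NORMAL_REQUESTS)
print(is_anomalous(model, NORMAL_REQUESTS[0]))
print(is_anomalous(model, ATTACK))
```

The point of the sketch is the structure rather than the components: the classifier sees only normal requests during training, so no attack samples and no expert-written rules are needed. A real deployment would replace both stand-ins with a learned language model and a stronger one-class classifier.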

Key words: firewalls, HTTP request analysis, pre-trained language models.

UDC: 004.032.2

Received: 28.01.2023
Accepted: 02.02.2023

DOI: 10.18500/1816-9791-2024-24-3-442-451