Proceedings of ISP RAS, 2021 Volume 33, Issue 2, Pages 65–76 (Mi tisp585)

Regular expressions for web advertising detection based on an automatic sliding algorithm

D. Riaño, R. Piñon, G. Molero-Castillo, E. Bárcenas, A. Velázquez-Mena

National Autonomous University of Mexico

Abstract: This paper presents the automation of a web advertising recognition algorithm based on regular expressions. Regular expressions, optical character recognition, databases, and automated testing have become critical to many software implementations. The algorithm was tested in three web browsers. As a result, it detected Spanish-language advertisements that distract users and, above all, extract information from them. The main feature of the algorithm is that its automatic and versatile execution does not require access to the source code of the page in question, and in the future it could run as a background application. In addition, its reliance on optical character recognition gives it acceptable efficiency in detecting advertising.

Keywords: digital marketing, optical character recognition, regular expressions, web advertising.

DOI: 10.15514/ISPRAS-2021-33(2)-3


