Num. Meth. Prog., 2011 Volume 12, Issue 3, Pages 58–72 (Mi vmp220)

Programming

A detection method for mass-generated unnatural texts based on the topical structure analysis

A. S. Pavlov (a), B. V. Dobrov (b)

(a) M. V. Lomonosov Moscow State University, Faculty of Computational Mathematics and Cybernetics
(b) M. V. Lomonosov Moscow State University, Research Computing Center

Abstract: Web spam is considered to be one of the greatest threats to modern search engines. Spammers use a wide range of algorithms to generate multiple unnatural texts. A new general model for texts generated from samples of natural texts is proposed. A new algorithm for detecting unnatural texts based on the topical structure analysis is also proposed. The proposed algorithm is evaluated on synthetic and real-world data.

Keywords: web spam; topical structure; modeling.

UDC: 681.513.7


