Abstract:
Web spam is considered to be one of the greatest threats to modern search engines.
Spammers use a wide range of algorithms to generate large numbers of unnatural texts.
A new general model for texts generated from samples of natural texts is proposed.
A new algorithm for detecting unnatural texts, based on analysis of their topical
structure, is also proposed. The algorithm is evaluated on synthetic and
real-world data.