Abstract:
The widespread use of the Internet as a source of information and entertainment calls for a reliable filtering mechanism. Classifying Web pages is one of the most difficult stages of filtering. The classification has to take into account a page's HTML structure, its content, and its connections to other resources through hyperlinks. Particular attention should be paid to metainformation, which is meant to reflect the main keywords and a brief summary of the Web page. Classification of Web pages based on metainformation is considered difficult because there are no clear boundaries between groups of Web documents. In this situation, it is necessary to use neural network classifiers.