Abstract:
The article presents a new approach to text classification that accounts for different types of classification features (binary, nominal, ordinal, and interval).
The distinguishing feature of the approach is a phased classification process, which makes it unnecessary to map classification features of different types onto a single range. The author describes a computational experiment using texts from the Russian National Corpus and proposes a set of classification features for classifying Russian texts by the age of their intended readers. The text documents in the sample are divided into two categories, for adults and for children, according to expert judgment.
Keywords: information extraction; text classification; natural language processing; text features.