Abstract:
The article describes linguistic and algorithmic aspects of extracting knowledge from texts on the Internet. It proposes means of improving the performance of the linguistic processor that take into account the special nature of web documents and the large volume of English-language text available there. For this reason, additional means for identifying the formal and meaning-related attributes of English words were added to the morphological analysis component. The ability of subject catalogues to identify the semantic categories of English words was enhanced. Contextual rules were developed for the syntactic-semantic analysis of standard forms of English. The authors also propose a means of tuning the morphological and syntactic-semantic analysis components to the language of the input text through subject catalogues.