Abstract:
The paper addresses topical problems in developing a multilingual linguistic resource of semantic-syntactic representations for machine translation systems and for knowledge extraction from natural language texts. The objective of the research is to create an integral linguistic model that combines grammar rules, statistical methods, and machine learning mechanisms, both to extract new structural syntactic rules from text corpora and to disambiguate syntactic structures. To represent linguistic knowledge formally, the authors use Cognitive Transfer Grammar (CTG), a semantically motivated variant of phrase structure grammar with head features and inheritance mechanisms. To build the system's machine learning components and obtain statistical data about language structures, the authors have developed the multilingual linguistic resource INTERTEXT, which comprises a Treebank and a corpus of semantically aligned parallel texts in Russian, English, French, and German.