Abstract:
This paper addresses the problem of constructing semantic-syntactic representations for machine translation systems and for systems that extract knowledge from natural language texts. The aim of our work is to build an integral linguistic model based on a synergetic approach that combines linguistic knowledge, statistical methods, and machine learning in order to extract new grammar rules from text corpora and to disambiguate linguistic structures. To formalize linguistic knowledge, we have developed a new formalism, Cognitive Transfer Grammar, which is a semantically motivated version of a generative unification grammar. To train the system's components and to obtain statistical data on linguistic structures, we are creating a multilingual resource comprising a treebank and a corpus of semantically aligned parallel texts in Russian, English, and a number of other European languages.