Abstract:
We consider the problem of recognizing complex noun phrases in Russian news texts, with application to automatic information extraction. By complex noun phrases we mean long noun phrases that contain genitive and/or prepositional constructions and named entities. We describe a plan for noun phrase recognition that begins by selecting the sentence fragments that are certain to contain noun phrases. An algorithm for selecting these fragments has been developed. The fragments are classified by the frequency of their types, the number of words in the fragment, part-of-speech structure, the presence of extracted named entities, and the presence of certain complex prepositions and set expressions. We introduce a feature system for automatic noun phrase recognition within the selected fragments. In our experiments, we selected 58032 fragments from a collection of 1000 Russian news documents. We also consider some complex cases.
(In Russian).
Key words and phrases: information extraction, named entity recognition, noun phrase chunking.