Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2019 Volume 29, Issue 2, Pages 148–160 (Mi ssi647)

This article is cited in 7 papers

Annotation methodology of supracorpora databases

A. A. Goncharov, O. Yu. Inkova, M. G. Kruzhkov

Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119133, Russian Federation

Abstract: The paper considers the methodological principles of annotating linguistic units in parallel corpora using supracorpora databases. Supracorpora databases are a novel information resource in linguistics that allows researchers to save the results of linguistic analysis of corpus data as annotations structured according to the research objectives. For parallel corpora, the annotation procedure consists of four basic stages: (1) lookup of annotation objects; (2) definition of the linguistic unit and its context in both the original and the translated text; (3) definition of the linguistic unit's attributes in both the original and the translated text; and (4) combination of the two linguistic units into a translation correspondence and definition of its attributes. The paper summarizes previously described annotation techniques, examines the functional potential of supracorpora databases, and concludes that the developed methodology can be applied to a wide variety of research objects.

Keywords: supracorpora databases, faceted classifications, linguistic annotation, annotation methodology, contrastive linguistics.
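
The four annotation stages named in the abstract can be pictured as a small data model. The Python sketch below is only an illustration under stated assumptions: the class names, field names, attribute labels, and the example sentence pair are all hypothetical and are not taken from the paper or from any actual supracorpora database schema.

```python
from dataclasses import dataclass, field


@dataclass
class LinguisticUnit:
    """A unit and its context in one text (hypothetical structure)."""
    text: str
    context: str
    language: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class TranslationCorrespondence:
    """A pair of units from the original and translated texts."""
    source: LinguisticUnit
    target: LinguisticUnit
    attributes: dict[str, str] = field(default_factory=dict)


def find_objects(sentence_pairs, query):
    """Stage 1: look up candidate annotation objects.

    Naive case-insensitive substring search on the original sentence;
    a real corpus query would be far more selective.
    """
    return [(src, tgt) for src, tgt in sentence_pairs if query in src.lower()]


def annotate(pair, src_text, tgt_text, src_attrs, tgt_attrs, tc_attrs,
             src_lang="ru", tgt_lang="en"):
    """Stages 2-4: build both units, then join them into a correspondence."""
    src_context, tgt_context = pair
    # Stage 2: define each unit and its context in original and translation.
    source = LinguisticUnit(src_text, src_context, src_lang)
    target = LinguisticUnit(tgt_text, tgt_context, tgt_lang)
    # Stage 3: assign attributes to each unit.
    source.attributes.update(src_attrs)
    target.attributes.update(tgt_attrs)
    # Stage 4: combine the units and describe the correspondence itself.
    return TranslationCorrespondence(source, target, dict(tc_attrs))


# Made-up example data, for illustration only.
pairs = [("Он устал, но продолжал работать.",
          "He was tired, but he kept working.")]
hits = find_objects(pairs, "но")
tc = annotate(hits[0], "но", "but",
              {"pos": "conjunction"}, {"pos": "conjunction"},
              {"type": "direct equivalent"})
```

Keeping unit-level attributes separate from correspondence-level attributes mirrors the distinction the abstract draws between stages 3 and 4.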

Received: 15.03.2019

DOI: 10.14357/08696527190213


