N. V. Buntman, A. A. Goncharov, I. M. Zatsman, V. A. Nuriev, “Using supracorpora databases for quantitative analysis of machine translations”, Inform. Primen., 2018, Volume 12, Issue 4, Pages 96

This article is cited in 6 papers

Using supracorpora databases for quantitative analysis of machine translations

N. V. Buntman^a, A. A. Goncharov^b, I. M. Zatsman^b, V. A. Nuriev^b

^a M. V. Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow 119991, Russian Federation
^b Institute of Informatics Problems, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The paper discusses an information technology that supports the expert evaluation of machine translations. The technology has been developed to handle the following conditions: (i) all translated contexts contain connectives; (ii) the connectives may be either one-word (khotya ‘although,’ a ‘and’) or multiword (da esche ‘and beside this,’ no zato ‘but instead’); and (iii) the words making up a given connective may be separated by a space (esli (space) tak ‘if (space) then’). With this technology, the evaluation of machine translations proceeds in three main stages: (i) linguistic annotation of machine translations in a supracorpora database; (ii) quantitative processing of the annotations; and (iii) linguistic analysis of the annotations and the quantitative data. The paper describes the technological aspects of the first two stages. All examples given involve multiword connectives. The source sentences chosen for machine translation were collected from literary texts.
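
As an illustration only, the three conditions and the quantitative stage can be sketched in a few lines of Python. This is not the authors' technology or the supracorpora database itself. The function name find_connective, the max_gap parameter, the token sequences, and the error labels are all hypothetical, and the "space" between the words of a connective is interpreted here, as an assumption, as a limited number of intervening tokens.

```python
from collections import Counter


def find_connective(tokens, parts, max_gap=0):
    """Return the token indices of every match of `parts` in `tokens`.

    `parts` is a tuple of one or more words forming a connective.
    Consecutive parts may be separated by up to `max_gap` other tokens
    (an assumed reading of the "space" allowed inside a connective).
    """
    matches = []
    for start, token in enumerate(tokens):
        if token != parts[0]:
            continue
        indices = [start]
        for part in parts[1:]:
            prev = indices[-1]
            # Search the next part within the permitted window.
            window = range(prev + 1, min(prev + 2 + max_gap, len(tokens)))
            nxt = next((i for i in window if tokens[i] == part), None)
            if nxt is None:
                break
            indices.append(nxt)
        else:
            # Every part was found (also true for one-word connectives).
            matches.append(indices)
    return matches


# Made-up transliterated token sequences, for illustration only.
contiguous = "no zato on prishel".split()
with_gap = "esli on pridet tak i budet".split()

adjacent_match = find_connective(contiguous, ("no", "zato"))
gap_match = find_connective(with_gap, ("esli", "tak"), max_gap=3)
no_match = find_connective(with_gap, ("esli", "tak"), max_gap=0)

# Quantitative processing, reduced to a tally of hypothetical annotations:
# each annotation pairs a connective with an invented evaluation label.
annotations = [
    ("no zato", "correct"),
    ("no zato", "mistranslated"),
    ("da esche", "omitted"),
    ("no zato", "mistranslated"),
]
counts = Counter(annotations)
```

Here adjacent_match and gap_match each hold one list of token positions, while no_match is empty because no gap is permitted. The counts object then gives the frequency of each (connective, label) pair, which is the kind of quantitative data a linguistic analysis could start from.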

Keywords: supracorpora database, machine translation, classification of errors, technology supporting expertise, linguistic annotation, corpus linguistics, connectives.

Received: 15.10.2018

DOI: 10.14357/19922264180414