JOURNALS // Informatika i Ee Primeneniya [Informatics and its Applications] // Archive

Inform. Primen., 2014, Volume 8, Issue 2, Pages 98–110 (Mi ia315)

This article is cited in 9 papers

Information technologies for corpus studies: underpinnings for cross-linguistic database creation

N. V. Buntman (a), Anna A. Zaliznyak (b, c), I. M. Zatsman (b), M. G. Kruzhkov (b), E. Yu. Loshchilova (b), D. V. Sitchinava (d)

a Faculty of Foreign Languages and Area Studies, M.V. Lomonosov Moscow State University, 31-a Lomonosov Str., Moscow 119192, Russian Federation
b Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
c Institute of Linguistics, Russian Academy of Sciences, 1-1 Bolshyi Kislovskiy pereulok, Moscow 125009, Russian Federation
d Institute of Russian Language, Russian Academy of Sciences, 18/2 Volkhonka Str., Moscow 119019, Russian Federation

Abstract: The paper considers an information technology for creating cross-linguistic databases of Russian texts with their French translations (also known as parallel texts). The underlying principles of the developed database provide a unique combination of three types of bilingual search: lexical, grammatical, and lexico-grammatical. A distinctive feature of the technology is the simultaneous creation of a Russian-French parallel subcorpus within the National Russian Corpus and of a cross-linguistic database of Russian verbal lexico-grammatical forms and their French functional equivalents. The subcorpus and the database are aligned at different levels: the former at the level of sentences, and the latter at the level of constructions. The database is academically relevant because it supports the development of bilingual contrastive grammar and contributes to the creation of a Russian grammar founded on the modern empirical base and information technologies of corpus linguistics. Its main practical application is improving the quality of machine translation.
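
To illustrate the two alignment levels and the three search types named in the abstract, here is a minimal sketch in Python. It is not the authors' database schema or search engine: the record layout, the field names, the tag labels ("past", "pfv" and so on), the `search` function, and the three toy sentence pairs are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    """Sentence-level alignment, as in a parallel subcorpus (hypothetical layout)."""
    sentence_id: int
    ru: str
    fr: str


@dataclass(frozen=True)
class ConstructionPair:
    """Construction-level alignment, as in the database (hypothetical layout)."""
    sentence_id: int       # link back to the sentence-aligned pair
    ru_form: str           # Russian verbal form as it occurs in the text
    ru_lemma: str          # dictionary form of the Russian verb
    ru_grammar: frozenset  # grammatical tags of the form (labels are assumed)
    fr_equivalent: str     # French functional equivalent


# Toy data invented for illustration only.
SENTENCES = [
    SentencePair(1, "Он пришёл.", "Il est venu."),
    SentencePair(2, "Она читала книгу.", "Elle lisait un livre."),
    SentencePair(3, "Он придёт завтра.", "Il viendra demain."),
]

CONSTRUCTIONS = [
    ConstructionPair(1, "пришёл", "прийти", frozenset({"past", "pfv"}), "est venu"),
    ConstructionPair(2, "читала", "читать", frozenset({"past", "ipfv"}), "lisait"),
    ConstructionPair(3, "придёт", "прийти", frozenset({"fut", "pfv"}), "viendra"),
]


def search(pairs, lemma=None, grammar=None):
    """Filter construction pairs by lemma, grammatical tags, or both.

    lemma only      -> lexical search
    grammar only    -> grammatical search
    lemma + grammar -> lexico-grammatical search
    """
    wanted = frozenset(grammar or ())
    return [
        p for p in pairs
        if (lemma is None or p.ru_lemma == lemma) and wanted <= p.ru_grammar
    ]


lexical = search(CONSTRUCTIONS, lemma="прийти")
grammatical = search(CONSTRUCTIONS, grammar={"past"})
lexico_grammatical = search(CONSTRUCTIONS, lemma="прийти", grammar={"past"})
```

In this sketch each construction-level record keeps the `sentence_id` of its sentence-level pair, so a hit at the finer level can always be shown in its full bilingual context.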

Keywords: parallel corpus; information technology; cross-linguistic databases; bilingual lexical grammar search; corpus linguistics; contrastive grammar.

Received: 29.03.2014

DOI: 10.14357/19922264140210




