Informatika i Ee Primeneniya [Informatics and its Applications]

Inform. Primen., 2021 Volume 15, Issue 2, Pages 96–103 (Mi ia734)

This article is cited in 2 papers

Extracting knowledge about means of expression of logical-semantic relations from the supracorpora database

A. A. Goncharov, O. Yu. Inkova

Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The goal of this paper is to demonstrate how parallel texts annotated with a supracorpora database (SCDB) can be used efficiently to extract knowledge about alternative means of expressing logical-semantic relations (LSRs). The authors review the most prominent discourse-annotated corpora (Penn Discourse Treebank, Prague Dependency Treebank, and Rhetorical Structure Theory Discourse Treebank) to support the observation that researchers have reached no consensus as to which linguistic means should be considered connectives (i.e., prototypical markers of LSRs) and which should be deemed “alternative.” The research shows that applying the comparative method while leveraging the capabilities of the SCDB of connectives makes it possible not only to extract new knowledge about LSR markers but also to create thesauri of the various means of LSR expression in the languages involved, including the alternative ones. In addition, the SCDB data make it possible to generate new knowledge about correlations between specific LSRs and unconventional means of expressing them, and to calculate how frequently these means are used in the languages studied.

Keywords: supracorpora database, logical-semantic relations, connectives, knowledge generation, parallel texts.

Received: 06.04.2021

DOI: 10.14357/19922264210214


