
Inform. Primen., 2022, Volume 16, Issue 2, Pages 52–59 (Mi ia786)

This article is cited in 1 paper

Principles of describing markers of logical-semantic relations and their hierarchy

A. A. Durnovo (a), O. Yu. Inkova (a, b), N. A. Popkova (a)

(a) Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
(b) University of Geneva, 22 Bd des Philosophes, CH-1205 Geneva 4, Switzerland

Abstract: The article examines annotation strategies in corpora with discourse markup. It shows that corpora based on Rhetorical Structure Theory (RST) annotate only coherence relations, also called rhetorical relations (RRs). In contrast, the Penn Discourse Treebank (PDTB) of the University of Pennsylvania annotates relation markers, as does the Supracorpora Database of Connectives. The RST Signaling Corpus (RST-SC), also based on RST, annotates RR markers but cannot combine the markup of RRs and their markers in a single annotation. This problem is solved by the GUM corpus and the Supracorpora Database of Hierarchy of Logical-Semantic Relations. The latter has several advantages: it supports search, the collection of statistics, and bilingual annotations. This makes it possible to identify both universal and language-specific phenomena in the discourse organization of text.

Keywords: supracorpora database, text corpus annotation, discourse relations, connective.

Received: 07.04.2021

DOI: 10.14357/19922264220207


