Abstract:
Kyrgyz is a less-resourced language, and creating high-quality syntactic corpora (treebanks) for it requires significant effort. In this work, we propose an approach that simplifies the development of a Kyrgyz treebank. We present a tool that transfers syntactic annotations from Turkish to Kyrgyz using the treebank translation method. We evaluate the effectiveness of our approach on the TueCL treebank. The results show that our method yields higher-quality syntactic annotation than a monolingual model trained on the Kyrgyz KTMU treebank. In addition, we propose a method for estimating how difficult the resulting syntax trees are to annotate manually, which can help to further optimize the annotation process.
Key words and phrases: dependency grammar, natural language processing, less-resourced languages, machine translation, Kyrgyz language processing.