V. Yu. Shelepov, A. V. Nitsenko, “The refined identification of beginning-end of speech; the recognition of the voiceless sounds at the beginning-end of speech. On the recognition of the extra-large vocabularies”, Eurasian Journal of Mathematical and Computer Applications, 2017, Volume 5, Issue 4, Pages 70

The refined identification of beginning-end of speech; the recognition of the voiceless sounds at the beginning-end of speech. On the recognition of the extra-large vocabularies

V. Yu. Shelepov, A. V. Nitsenko

Institute of Artificial Intelligence, 118-b, Artyom st, 83048 Donetsk, Ukraine

Abstract: The present paper is part of the diphone DTW-recognition strategy developed by the authors. Voiceless plosives, as well as the energetically weak hard and soft [f], are difficult to recognize when they occur at the beginning or end of speech, owing to their similarity to the neighboring stretches of silence. The article begins by describing refined methods for locating the beginning and the end of a spoken word or phrase. These methods form the basis for the proposed methods of recognizing the mentioned sounds when they begin or conclude a spoken word or phrase. We introduce the concept of the final quasifricative fragment, together with algorithms for selecting it and using it to classify voiceless plosives in the final position. The results obtained, together with an insignificant increase in the number of basic speech units, make it possible to advance in solving the difficult problems of recognizing short speech segments and extra-large vocabularies.

Keywords: continuous-speech recognition, speech segmentation, large vocabulary speech recognition, voiceless fragment, diphone, dynamic time warping (DTW).

MSC: 68T10, 68T50

Received: 15.09.2017
Accepted: 09.11.2017

Language: English