Abstract:
This article describes a speech recognition method based on recognizing words by their component parts. The method starts with automatic phonetic segmentation using a digital analogue of total variation, then builds a diphone base and performs DTW-based recognition in two stages: first for the variable part of the word (the quasiflexion) and then for its static part (the quasibase), with reference templates formed automatically from diphone templates. This considerably reduces running time and improves the reliability of word-form recognition. The method can be applied to recognition over large and very large vocabularies.
Keywords: segmentation of speech signal, diphone, dynamic time warping, feature vector, quasiflexion.