Abstract:
In this paper, the generalized architecture used in almost all modern systems of automatic speech
recognition is analyzed. The necessity of developing a fundamentally new approach to solving speech
recognition problems is outlined. A formal description of the structure of the speech perception act is
proposed for use as a general theoretical basis in the development of universal automatic speech
recognition systems that are highly effective in conditions of high noise and “cocktail party” situations.
A general structural dynamics of the speech recognition process is developed that takes into
account both the linguistic and extra-linguistic aspects of a speech message. The concept of an
articulation event as a minimal basic pattern of sound image recognition is proposed. The
recognition process is structured based on the functional determinants of the situation. The need to
analyze the numerous sources of information accompanying the sound message, together with the rejection
of the search for an invariant, is fundamental to this approach. Multi-agent systems were chosen as the formal
means of implementation. The multi-agent approach makes it possible to differentiate and analyze sounds of
different nature. This distinguishes the proposed model and gives it advantages in the so-called “cocktail party”
situation, as well as in tasks where the noise level is extremely high.