JOURNALS // Upravlenie Bol'shimi Sistemami // Archive

UBS, 2018, Issue 73, Pages 67–94 (Mi ubs954)

Information Technology Applications in Control

Overview of phonetic encoding algorithms

V. S. Vykhovanets (a), J. Du (b), S. Sakulin (b)

(a) V.A. Trapeznikov Institute of Control Sciences of RAS, Moscow
(b) Bauman Moscow State University, Moscow

Abstract: This paper surveys phonetic encoding algorithms, which are designed to determine how similar words are in sound (pronunciation). The algorithms are divided into two groups: algorithms for comparing words and algorithms for determining the distance between words. The word comparison algorithms described are SoundEx, NYSIIS, Daitch-Mokotoff, Metaphone, and Polyphone; the distance algorithms described are Levenshtein, Jaro, and N-grams. For each algorithm, its advantages and disadvantages are indicated and an analogue for the Russian language is given. To eliminate the shortcomings common to phonetic encoding algorithms, it is proposed to operate not on the sequence of letters in a word but on the sequence of its elementary sounds. This is expected to improve word recognition, record linkage, and the indexing of words by sound.
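
The two groups named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes the commonly documented American Soundex rules for ASCII English letters and the textbook dynamic-programming form of Levenshtein distance, and the function names `soundex` and `levenshtein` are chosen here for illustration only.

```python
# Letter-to-digit table for American Soundex (assumed standard variant).
_CODES = {}
for _digit, _letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word: str) -> str:
    """Word comparison: words with equal codes are treated as sounding alike."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")    # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate equal codes
        code = _CODES.get(ch, "")        # vowels (and unknown letters) map to ""
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets the previous code
        if len(result) == 4:
            break
    return result.ljust(4, "0")          # pad short codes with zeros


def levenshtein(a: str, b: str) -> int:
    """Word distance: minimum number of insertions, deletions, substitutions."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]
```

Under these assumptions, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so a comparison algorithm reports a match, while `levenshtein("Robert", "Rupert")` gives 2, a graded measure of how far apart the spellings are. Both functions work on letters rather than sounds, which is the limitation the abstract proposes to address.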

Keywords: phonetic encoding algorithms, phonetic distance, record linkage, indexing words by sound.

UDC: 004.93
BBK: 32.972.1

Received: September 12, 2017
Published: May 31, 2018

DOI: 10.25728/ubs.2018.73.4
