Sistemy i Sredstva Informatiki [Systems and Means of Informatics]

Sistemy i Sredstva Inform., 2014, Volume 24, Issue 4, Pages 124–134 (Mi ssi379)

This article is cited in 1 paper

Adjustable variable-length character encoding scheme — ACE

I. M. Adamovich, D. V. Zemskov

Institute of Informatics Problems, Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation

Abstract: The article describes ACE (Adjustable Character Encoding), a variable-length character encoding scheme capable of encoding the full range of UCS (Universal Coded Character Set, ISO/IEC 10646) code points as sequences of one to four octets (8-bit code units). The main motivation for creating this encoding was to increase, compared with UTF-8 (Unicode Transformation Format, 8-bit), the number of code points encoded as a one-octet code unit sequence, which allows a more compact representation of texts containing characters of a chosen national alphabet. A second goal was to preserve, as far as possible, the binary representation of the encoded characters of that alphabet, so that it matches their binary values in a single-byte code table. ACE retains the following properties of UTF-8: statelessness (the representation of an encoded character does not depend on the values of previous characters), self-synchronization (no valid code sequence can occur inside another one, or across the boundary between adjacent sequences), and the ability to locate the beginning or the end of a code sequence from any position in the encoded text.
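
The two UTF-8 characteristics the abstract refers to can be illustrated with a short Python sketch. It shows UTF-8 only and says nothing about how ACE itself lays out its octets, since the abstract does not give that detail. The choice of Windows-1251 (`cp1251`) as the single-byte code table, the sample string, and the helper name `utf8_sequence_start` are assumptions made for illustration.

```python
def utf8_sequence_start(data: bytes, index: int) -> int:
    """Return the index of the first octet of the UTF-8 sequence covering data[index].

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    # Continuation octets have the bit pattern 10xxxxxx, so step back over them
    # until a lead octet (or a one-octet ASCII character) is reached.
    while index > 0 and (data[index] & 0xC0) == 0x80:
        index -= 1
    return index


# Sample text in a national alphabet (Cyrillic), chosen as an assumption.
text = "Привет, мир"

# Size of the same text in UTF-8 and in a single-byte code table.
utf8 = text.encode("utf-8")
cp1251 = text.encode("cp1251")
utf8_len = len(utf8)      # each Cyrillic letter takes two octets in UTF-8
cp1251_len = len(cp1251)  # every character takes one octet in cp1251

# Locating the start of a sequence from an arbitrary position in the octet stream.
start = utf8_sequence_start(utf8, 3)
recovered = utf8[start:start + 2].decode("utf-8")
```

For this sample the UTF-8 form is almost twice as long as the single-byte form, which is the overhead for national-alphabet text that the abstract says ACE aims to reduce while keeping the ability to resynchronize shown by the helper.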

Keywords: character encoding scheme; UCS; program localization; UTF-8.

Received: 05.05.2014

DOI: 10.14357/08696527140408


