Modelirovanie i Analiz Informatsionnykh Sistem

Model. Anal. Inform. Sist., 2023, Volume 30, Number 2, Pages 140–159 (Mi mais795)

Algorithms

On simplifying expressions with mixed Boolean-arithmetic

Y. V. Kosolapov

Southern Federal University, 105/42 Bolshaya Sadovaya str., Rostov-on-Don, 344006, Russia

Abstract: Mixed Boolean-Arithmetic expressions (MBA-expressions) in $t$ integer $n$-bit variables are often used for program obfuscation. Obfuscation replaces short expressions with longer equivalent ones that presumably take an analyst more time to analyze. The paper shows that a technique similar to information set decoding of linear codes can be applied to simplify linear MBA-expressions, that is, to reduce their number of terms. Based on this technique, two simplification algorithms are constructed: one that finds an expression of minimum length and one that reduces the length of an expression. The length reduction algorithm is then used to build an algorithm that estimates the resistance of an MBA-expression to simplification. We experimentally estimate how the average number of terms in the linear MBA-expression returned by the simplification algorithms depends on $n$, on the number of decoding iterations, and on the cardinality of the set of Boolean functions over which a linear combination with the minimum number of nonzero coefficients is sought. For all considered $t$ and $n$, the experiments show the following: if a linear MBA-expression contained $r=1,2,3$ terms before obfuscation, then the developed algorithms recover, with probability close to one, an equivalent expression with at most $r$ terms from its obfuscated version. This is the main difference between the information set decoding technique and the well-known techniques for simplifying linear MBA-expressions, whose goal is to reduce the number of terms to at most $2^t$. We also found that for randomly generated linear MBA-expressions, the average number of terms in the returned expression tends to $2^t$ as $n$ increases. In that regime it does not differ from the average number of terms in the expression returned by known simplification algorithms.
In particular, the results make it possible to determine the values of $t$ and $n$ for which the average number of terms in a simplified linear MBA-expression is no less than a given value.
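The minimal Python sketch below makes the decoding analogy concrete. It is not the author's algorithm, and every name in it (`signature`, `solve_mod`, `simplify_isd`, `eval_terms`, and the truth-table constants) is hypothetical.

The sketch relies on the standard property of linear MBA-expressions that $\sum_i a_i e_i(x_1,\dots,x_t)$, with bitwise Boolean functions $e_i$, is determined modulo $2^n$ by its signature vector $s(v)=\sum_i a_i f_i(v)$ for $v\in\{0,1\}^t$, where $f_i$ is the truth table of $e_i$. Finding an equivalent expression with few terms then resembles syndrome decoding: one seeks a coefficient vector with few nonzero entries whose combination of truth-table columns equals $s$.

The sketch follows only the simplest, Prange-style information-set idea, under these assumptions:

- $t=2$ and $n=8$.
- The candidate set is every nonzero Boolean function of two variables.
- Each iteration draws $2^t$ random columns. It solves the square system modulo $2^n$ when those columns are invertible (odd determinant) and keeps the sparsest solution found.

```python
import itertools
import random

# Hypothetical truth tables for t = 2, indexed by v = (x_bit, y_bit) in the
# order (0,0), (0,1), (1,0), (1,1), i.e. index = 2 * x_bit + y_bit.
X, Y = (0, 0, 1, 1), (0, 1, 0, 1)
OR, XOR, AND = (0, 1, 1, 1), (0, 1, 1, 0), (0, 0, 0, 1)
X_ANDNOT_Y = (0, 0, 1, 0)  # x & ~y


def signature(terms, size, mod):
    """Vector s with s[v] = sum_i a_i * f_i(v) mod 2**n for terms (a_i, f_i)."""
    return tuple(sum(a * tb[k] for a, tb in terms) % mod for k in range(size))


def solve_mod(cols, s, mod):
    """Solve sum_j c_j * cols[j] = s (mod 2**n) by Gauss-Jordan elimination.

    Returns None when the square matrix is singular mod 2 (hence mod 2**n).
    """
    m = len(s)
    rows = [[cols[j][r] % mod for j in range(m)] + [s[r] % mod] for r in range(m)]
    for c in range(m):
        # A pivot must be odd, because only odd numbers are units mod 2**n.
        piv = next((r for r in range(c, m) if rows[r][c] % 2 == 1), None)
        if piv is None:
            return None
        rows[c], rows[piv] = rows[piv], rows[c]
        inv = pow(rows[c][c], -1, mod)
        rows[c] = [v * inv % mod for v in rows[c]]
        for r in range(m):
            if r != c and rows[r][c]:
                f = rows[r][c]
                rows[r] = [(a - f * b) % mod for a, b in zip(rows[r], rows[c])]
    return [rows[r][m] for r in range(m)]


def simplify_isd(s, t, n, iters=300, seed=0):
    """Prange-style search over random 'information sets' of 2**t functions."""
    mod, size = 2 ** n, 2 ** t
    rng = random.Random(seed)
    # Candidate set (an assumption): every nonzero Boolean function of t variables.
    funcs = [tb for tb in itertools.product((0, 1), repeat=size) if any(tb)]
    best = None
    for _ in range(iters):
        info_set = rng.sample(funcs, size)
        coeffs = solve_mod(info_set, s, mod)
        if coeffs is None:
            continue  # the drawn columns are not invertible; draw again
        terms = [(a, tb) for a, tb in zip(coeffs, info_set) if a]
        if best is None or len(terms) < len(best):
            best = terms
    return best


def eval_terms(terms, x, y, n):
    """Evaluate sum_i a_i * f_i(x, y), each f_i applied bitwise to n-bit words."""
    total = 0
    for a, tb in terms:
        word = sum(tb[2 * ((x >> j) & 1) + ((y >> j) & 1)] << j for j in range(n))
        total += a * word
    return total % 2 ** n


# x + y hidden behind two zero identities, (x|y) - (x^y) - (x&y) = 0 and
# x - (x&y) - (x&~y) = 0: x + y + 3*(OR - XOR - AND) - 5*(X - AND - X_ANDNOT_Y),
# which has six terms after collecting coefficients.
obfuscated = [(-4, X), (1, Y), (3, OR), (-3, XOR), (2, AND), (5, X_ANDNOT_Y)]
s = signature(obfuscated, 4, 2 ** 8)  # (0, 1, 1, 2), the signature of x + y
simplified = simplify_isd(s, t=2, n=8)
print(len(obfuscated), "->", len(simplified), simplified)
```

In this example the minimum is two terms, because a single term $a\cdot f$ cannot produce the two distinct nonzero values 1 and 2. Given enough iterations, the search should therefore return a two-term form such as `x + y` or `(x ^ y) + 2*(x & y)`. Which form it returns depends on the random draws.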

Keywords: code obfuscation, MBA-expressions, simplification of MBA-expressions, decoding by information sets.

UDC: 004.056.5

MSC: 93B11

Received: 03.04.2023
Revised: 17.05.2023
Accepted: 17.05.2023

DOI: 10.18255/1818-1015-2023-2-140-159
