V. A. Padaryan, “On representation used in the binary code reverse engineering”, Proceedings of ISP RAS, 2017, Volume 29, Issue 3,Pages <nobr>31

This article is cited in 2 papers

On representation used in the binary code reverse engineering

V. A. Padaryan^ab

^a Lomonosov Moscow State University
^b Institute for System Programming of the Russian Academy of Sciences

Abstract: The paper discusses the problem of representation of algorithms extracted from binary code in course of reverse engineering: both representations for automatic analysis and final representations for the user. Two key subproblems of reverse engineering are focused on: automatic search for exploitable defects and discovery of undeclared capabilities. A principal scheme of system that allows automatically finding exploitable defects is described, along with key features of an internal representation employed by such system from the viewpoint of efficient generation of equations for an SMT solver. A sequence of steps for a system that reveals undeclared capabilities is enumerated: algorithm localization, its representation in a form suitable for analysis, and recovery of its properties. In order to automate the first step a static-dynamic representation is built which includes OS-level events and calls to library functions that serve as “anchor points” for the analyst in course of algorithm localization. Further support for localization is provided by means of code slicing and navigation algorithms. Once the algorithm is localized, further work goes in two directions: dialogue-based building of an annotated representation of the algorithm as a flowchart and automated research of characteristics of the algorithm in terms of declared and undeclared data flows. Flowchart representation of an algorithm is based on building simplified function models which describe input and output buffers, and automatic analysis of data flows between buffers of calls of different functions. The general scenario of interaction between an analyst and such a flowchart in context of the undeclared capability revealing problem is described, based on annotating declared data flows and automatically revealing undeclared ones. The paper concludes with an example of such a representation and an enumeration of further work directions.

Keywords: binary code, combined analysis, intermediate representation.

DOI: 10.15514/ISPRAS-2016-29(3)-3