Abstract:
The paper describes a unified abstract syntax tree (UAST) representation suitable for static analysis of several programming languages. In the proposed analysis scheme, the compiler of each language saves an intermediate representation in the form of a unified AST, and the saved trees are analyzed afterwards. We have implemented this representation for Java, Kotlin, and Python. The UAST analyzer has 27 checkers. We present the structure and entities of our unified AST and detail the language specifics that have to be reflected in the UAST representation. We give extensive experimental results that show UAST generation and analysis speed, analysis quality, and, where applicable, a comparison with the old scheme of analyzing compiler ASTs. We observe some degradation of analysis speed; this is the price of separating AST construction from checker implementation. This separation makes it easier to support many languages in the analyzer: one only needs to generate the UAST for a new language, and each checker is implemented once within the UAST infrastructure instead of once per language.
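
To illustrate the idea of writing a checker once over a unified tree, here is a minimal sketch in Python. It is not the paper's implementation: the `UNode` class, its fields, the node kinds, and the `check_self_comparison` checker are all hypothetical names chosen for this example, and the real UAST entities may be organized differently.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class UNode:
    """Hypothetical unified AST node (assumed shape, not the paper's)."""
    kind: str                      # e.g. "BinaryOp", "Name"
    text: str = ""                 # operator symbol or identifier
    language: str = ""             # source language that produced the node
    line: int = 0                  # source line, used in warnings
    children: List["UNode"] = field(default_factory=list)

    def walk(self) -> Iterator["UNode"]:
        """Yield this node and all descendants in pre-order."""
        yield self
        for child in self.children:
            yield from child.walk()


def same_structure(a: UNode, b: UNode) -> bool:
    """Compare two subtrees by kind, text, and children (ignores position)."""
    return (
        a.kind == b.kind
        and a.text == b.text
        and len(a.children) == len(b.children)
        and all(same_structure(x, y) for x, y in zip(a.children, b.children))
    )


def check_self_comparison(root: UNode) -> List[str]:
    """Language-agnostic checker: report comparisons such as `x == x`."""
    warnings = []
    for node in root.walk():
        if node.kind == "BinaryOp" and node.text in {"==", "!="} and len(node.children) == 2:
            left, right = node.children
            if same_structure(left, right):
                warnings.append(
                    f"{node.language}:{node.line}: operands of '{node.text}' are identical"
                )
    return warnings


# Trees as two different (hypothetical) front ends might emit them.
java_tree = UNode("BinaryOp", "==", "java", 3, [UNode("Name", "x"), UNode("Name", "x")])
python_tree = UNode("BinaryOp", "!=", "python", 7, [UNode("Name", "a"), UNode("Name", "b")])

for tree in (java_tree, python_tree):
    for warning in check_self_comparison(tree):
        print(warning)
```

In this sketch the checker never inspects `language` except to label the warning, so supporting another language would only require a front end that emits `UNode` trees.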