
Proceedings of ISP RAS, 2018 Volume 30, Issue 4, Pages 7–28 (Mi tisp344)

This article is cited in 4 papers

Tolerant parsing with a special kind of «Any» symbol: the algorithm and practical application

A. V. Goloveshkin, S. S. Mikhalkovich

I.I. Vorovich Institute for Mathematics, Mechanics and Computer Science, Southern Federal University

Abstract: Tolerant parsing is a form of syntax analysis aimed at capturing the structure of certain points of interest in source code. These points must be described precisely in the language grammar, whereas other parts of the program may be omitted from the grammar or described only coarsely, so the parser remains tolerant of inconsistencies in the irrelevant areas. Island grammars are one of the basic tolerant parsing techniques: the relevant code is called an “island”, and the irrelevant code is called “water”. This paper describes a modified LL(1) parsing algorithm with built-in processing of an “Any” symbol, which matches implicitly defined token sequences. Using the algorithm with island grammars shortens the description of irrelevant code and simplifies the patterns that match relevant code. Our “Any” implementation is more accurate and less restrictive than its closest analogues in the Coco/R and LightParse parser generators. It also has potentially lower overhead than the “bounded seas” concept implemented in PetitParser. The experimental section shows that a tolerant parser generated from a C# island grammar is applicable to the analysis of large-scale software projects.
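
The island/water idea can be illustrated with a minimal Python sketch. It is not the modified LL(1) algorithm from the paper: the function names `skip_any` and `find_class_names`, the whitespace-separated token list, and the explicitly supplied stop set are all assumptions made for illustration. In the paper the sequences matched by “Any” are defined implicitly, whereas here the stop tokens are passed by hand.

```python
def skip_any(tokens, pos, stop_tokens):
    """Hypothetical 'Any': skip water until a stop token or end of input."""
    while pos < len(tokens) and tokens[pos] not in stop_tokens:
        pos += 1
    return pos


def find_class_names(tokens):
    """Collect the names of 'class NAME' islands; everything else is water."""
    names = []
    pos = 0
    while pos < len(tokens):
        # Water: consume anything that cannot start an island.
        pos = skip_any(tokens, pos, {"class"})
        # Island: the keyword 'class' followed by a name token.
        if pos + 1 < len(tokens):
            names.append(tokens[pos + 1])
        pos += 2  # move past 'class' and its name
    return names


if __name__ == "__main__":
    source = "using System ; class Foo { int x ; } class Bar { }"
    print(find_class_names(source.split()))
```

The sketch never describes statements, fields, or `using` directives, yet it still recovers the class declarations, which is the kind of saving in irrelevant code description that the abstract refers to.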

Keywords: tolerant parsing, robust parsing, lightweight parsing, partial parsing, island grammar, parser generation.

Language: English

DOI: 10.15514/ISPRAS-2018-30(4)-1




