Tuesday, September 06, 2005

Thoughts on parser generators

Recently I have been evaluating several parser generators and eventually picked antlr for my project. Of all the parser generators in the world, yacc, Boost.Spirit, antlr and Perl 6 represent the current state of the art, and each has unique characteristics worth talking about.

Since most languages do not have parsing features built in (Perl 6 will be the pioneer), almost all current parser generators rely on code generation. A common problem is that the generated code can feel very foreign to you and look totally cryptic in your favorite debugger.

Yacc suffers most in this respect, since it uses LALR parsing: the generated parser is a table-driven state machine rather than code that follows the shape of the grammar. Lots of people like the LL(k)-based antlr because it produces human-readable code, which I find extremely useful when working on a new or unfamiliar grammar. For a more detailed comparison you can read Ian Kaplan's article.
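To make the readability point concrete, here is a minimal hand-written sketch in Python of the kind of recursive-descent code an LL-style parser has: one method per grammar rule, so a debugger's call stack mirrors the grammar. It is not actual antlr output; the toy grammar and the names `tokenize`, `Parser` and `parse` are made up for illustration.

```python
import re

# Hypothetical toy grammar, written in EBNF:
#   expr : term (('+' | '-') term)* ;
#   term : NUMBER | '(' expr ')' ;

TOKEN = re.compile(r"\s*(\d+|[-+()])")


def tokenize(text):
    """Split the input into number and punctuation tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at the end."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        """Consume the next token, which must equal `expected`."""
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # expr : term (('+' | '-') term)* ;
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term : NUMBER | '(' expr ')' ;
        token = self.peek()
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        if token == "(":
            self.match("(")
            value = self.expr()
            self.match(")")
            return value
        raise SyntaxError(f"expected a number or '(', found {token!r}")


def parse(text):
    """Parse and evaluate a whole input string."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value
```

Setting a breakpoint in `term` while parsing `1 + (2 - 3)` shows a stack of `expr` and `term` frames that can be read straight off the grammar, which is roughly what makes this style of parser pleasant to step through.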

Boost.Spirit is quite unique (and amazing!). With the help of C++ templates and operator overloading, it mimics EBNF directly in C++. Although it looks like Perl 6's approach, it actually generates code through templates at compile time, and the generated code is not as readable as antlr's output. Even worse, since the C++ compiler gives Boost.Spirit no special treatment, a simple grammar error may appear as multiple pages of impenetrable garbage. Dave Handley has a nice introduction to Boost.Spirit on CodeProject.
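The operator-overloading trick can be sketched in a few lines of Python. This is only an analogy, not Boost.Spirit's API: the `Rule` class and the helpers `lit`, `star` and `matches` are hypothetical, and everything here happens at run time rather than through compile-time templates. As in Spirit, `>>` stands for sequence, `|` for alternative and unary `+` for one-or-more; Python has no prefix `*`, so zero-or-more is spelled `star(...)` here.

```python
class Rule:
    """Wraps a function (text, pos) -> end position, or None on failure."""

    def __init__(self, fn):
        self.fn = fn

    def __rshift__(self, other):
        # a >> b : match a, then b right after it
        def seq(text, pos):
            mid = self.fn(text, pos)
            return None if mid is None else other.fn(text, mid)
        return Rule(seq)

    def __or__(self, other):
        # a | b : try a first, fall back to b
        def alt(text, pos):
            end = self.fn(text, pos)
            return end if end is not None else other.fn(text, pos)
        return Rule(alt)

    def __pos__(self):
        # +a : one or more repetitions of a
        def many1(text, pos):
            end = self.fn(text, pos)
            if end is None:
                return None
            while True:
                nxt = self.fn(text, end)
                if nxt is None or nxt == end:
                    return end
                end = nxt
        return Rule(many1)


def lit(s):
    """Match the literal string s."""
    def fn(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return Rule(fn)


def star(rule):
    """Zero or more repetitions of rule (always succeeds)."""
    def many(text, pos):
        while True:
            nxt = rule.fn(text, pos)
            if nxt is None or nxt == pos:
                return pos
            pos = nxt
    return Rule(many)


def _digit(text, pos):
    # Match a single ASCII digit.
    if pos < len(text) and text[pos] in "0123456789":
        return pos + 1
    return None


digit = Rule(_digit)

# The grammar now reads almost like EBNF:
#   number : digit+ ;
#   expr   : number (('+' | '-') number)* ;
number = +digit
expr = number >> star((lit("+") | lit("-")) >> number)


def matches(rule, text):
    """True if rule consumes the entire text."""
    return rule.fn(text, 0) == len(text)
```

The grammar is an ordinary expression in the host language, which is the appeal. The flip side is the one described above: a mistake in that expression is reported by the host language in its own terms, not in terms of the grammar.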

Comega (Cω) made an interesting move by integrating popular "foreign languages" (XML, SQL) into the core of the host language (C#). Perl 6's built-in parsing support does a similar thing (it brings EBNF to Perl). This can help to provide earlier and friendlier error reporting, which is a major problem for Boost.Spirit.

So far, antlr is sufficient for me. It can generate parsers in several languages (Java, C#, C++, Python), reports errors nicely, and its generated code is readable and debugger-friendly.
