Wednesday, May 27, 2009

ANTLR: an exercise in pain.

Ok, so for my current project I need to build either a heuristic or a machine-learning-based fuzzy parser. As someone who has written numerous standard parsers before, I find this interesting. Current approaches I'm considering include cascading concrete grammars; stochastic context-free grammars; and various forms of hidden-Markov-model-based recognisers. Whatever approach ends up working best, the first stage for all of them is a scanner.
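
To make the "scanner first" point concrete, here is a rough sketch of what such a first stage might look like if written by hand in Java. It is not the JFlex lexer from this project; the class name, the token kinds, and the regular expression are all assumptions invented for illustration. The one design choice worth noting is the catch-all UNKNOWN kind: a scanner feeding a fuzzy parser probably should not fail on unexpected input, so anything unrecognised is passed along as a token instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a scanner stage; names and token kinds are illustrative only.
final class SketchScanner {
    enum Kind { WORD, NUMBER, PUNCT, UNKNOWN }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // One named group per token kind. The last group matches any other
    // single non-whitespace character, so no input is ever rejected.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<word>[A-Za-z]+)|(?<number>[0-9]+)|(?<punct>[.,;:!?])|(?<unknown>\\S)");

    static List<Token> scan(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() skips whitespace, because no alternative matches it.
        while (m.find()) {
            Kind kind;
            if (m.group("word") != null) {
                kind = Kind.WORD;
            } else if (m.group("number") != null) {
                kind = Kind.NUMBER;
            } else if (m.group("punct") != null) {
                kind = Kind.PUNCT;
            } else {
                kind = Kind.UNKNOWN;
            }
            tokens.add(new Token(kind, m.group()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line for a small sample input.
        for (Token t : scan("Call 42, ok#")) {
            System.out.println(t);
        }
    }
}
```

Whichever of the approaches above wins, it would consume a token list like this one rather than raw characters.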

So I start building a JFlex lexer, and am making reasonable progress when I find out that we are already using ANTLR for other projects, so I should probably use it as well. Having experienced Mulgara's peak of 5 different parser generators (this does eventually become ridiculous), I was more than willing to use ANTLR. Yes, it's LL, and yes, I have previously discussed my preference for LR; but it does support attaching semantic actions to productions, so my primary requirement of a parser generator is met; and anyway, it has an excellent reputation and a substantial, active community.

What I am now stunned by is just how bad the documentation can be for such a popular tool. One almost non-existent wiki, a FAQ, and a woefully incomplete doxygen dump do not substitute for a reference. ANTLR has worse documentation than SableCC had when its documentation consisted of an appendix to a master's thesis!

My conclusion: if you have any choice, don't use ANTLR. For Java: if you must use LL, I currently recommend JavaCC; if you can use an LALR parser, do so; my current preference is Beaver.

4 comments:

JamesDumay said...

Buy the ANTLR book; it contains all of the relevant docs, since ANTLR is an open source project supported by commercial documentation.

Andrae Muys said...

I'm aware there is a book, but quite frankly I shouldn't have to buy a book to get a basic reference to the tool.

The problem isn't that there's a book, and I have no objection to buying books (I have spent literally $1000's on books on interpreters, compilers, and compiler technologies); the problem is that the _only_ meaningful documentation is a book.

Compare with JavaCC.
Reference Docs: JavaCC Grammar Files
API Routines: JavaCC API Documentation
and I'm happy to consider buying the book if/when these comprehensive, but terse, descriptions prove inadequate:
Book: Generating Parsers with JavaCC
Or consider JFlex, which is particularly comprehensive.

While Beaver is spartan, at least it is there.

Seriously, playing "guess the syntax" is only fun for the first ~15 min. I don't care that it is open-source; there are numerous open-source alternatives, and I will always recommend them over ANTLR.

Jim said...

Probably late to help but I've been
experiencing the same pain.

Version 2 actually has a pretty good reference manual -
[http://antlr2.org/doc/index.html].

It's out of date but an ANTLR newbie should
find it very helpful with the v.3 docs filling
things out.

I think the cool thing about a tool like ANTLR
is that it can be a great way to start
experimenting with parsers. The lack of good
docs really hurts.

Mathias said...

One interesting alternative to ANTLR might be parboiled, an open-source Java 5+ parsing library that is much easier to use than ANTLR and comes with good documentation.
With parboiled you describe your parsing grammar right in Java, with no additional syntax to learn and full IDE support...