I have just finished a very fustrating week fighting a parser generator implementing some syntactic sugar for Kowari. The reason why it's been so fustrating is simply because it should have taken one day to implement, test, and document, except that the parser generator was sablecc.
Now sablecc has been stable, featureful, and reasonably performant. Documentation is a minor problem given that it really only consists of a tutorial and the author's masters thesis; still it's a LALR parser generator once you have an EBNF exactly how hard can it be? As a parser generator sablecc works wonderfully.
Unfortunately that's also where the trouble starts, because it's very rare for someone to only want a parser. Generally someone is using a parser generator because they are writing some sort of compiler, and so the parser generator feeds into some sort of semantic analysis stage. And this is where sablecc fails, cf. the topic of todays post.
Sablecc generates an AST. Sablecc always generates an AST! And of course how does your hard working, OOP entrenched, lemming of an author support inspection of this tree? Well as with all well behaved OOP lemmings the author parrots off in the true unquestioning drone of the acolyte, "The visitor pattern of course". Think I'm joking? The thesis was called "SABLECC, AN OBJECT-ORIENTED COMPILER FRAMEWORK"! At this point swearing would be appropriate.
Let's step back and look at the problem we are actually trying to solve. We have an context-free grammar that describes how we can generate an abstract syntax tree from a given input. Semantic Analysis is effectively a transform applied to the AST. Now the nature of this transform is remarkably consistent across compilers. Occasionally it amounts to a eulers walk over the tree, but generally it is a straight forward post-order traversal generating what are called 'Synthsised Attributes' at each production from it and it's children. The result is called an 'Annotated Syntax Tree', and generally the attribute derivations are defined to ensure that the desired output of the semantic analyser is an attribute at the root of the AST.
Attribute derivation defintions are routinely defined as code snippets that convert child attributes into production attributes. In other words we have a series of functions that are dispatched on the AST node type, are called within the context of the results of the child production's functions, and are called as a post-order traversal.
Finally we have the added advantage that each node in the AST corresponds to a reduction in the LALR bottom-up shift-reduce parser that is the ultimate goal of an LALR parser generator. Being bottom-up, this is automatically a post-order sequence of reductions. Consequently we can choose to forgo the generation of the AST altogether and associate the functions directly with the reductions if we so desire. (and why wouldn't we?)
Now generating an AST is sometimes desirable, for instance when you require the afore mentioned eulers walk based traversal, or if you are wanting to unparse the AST back into source-code again later. But regardless of if you generate an AST or not, most compilations are a simple bottom-up generation of the root anntoation in an annotated syntax tree based on associating functions with AST-nodes/parser-reductions.
So that is the problem we want to solve, and in the strictest possible sense it is fundamentally functional in nature. This of course poses a serious problem to our aforementioned object-oriented-lemming, because the natural, straight forward solution is not object-oriented. So the lemming spits out a tree and of course doesn't even need to think about the next step. After all "the object-oriented way to traverse a tree is the visitor pattern", so we'll disregard the fact that the problem is functional in nature --- we have this hammer and by george if we aren't going to make this problem fit our solution.
The visitor pattern is kludge used by programmers in a language that doesn't support multiple-dispatch to provide themselves with a static version of double-dispatch. The cannonical example that was used by the GoF when they documented the pattern is tree traversal. And ever since that fateful day no OOL has dared consider anything else (feh). Note there is no mention of trees in the definition above. Trees are a datastructure. The visitor pattern is a language semantics idiom. THEY ARE NOT RELATED CONCEPTS! Maybe if you are implementing dynamic traversals over a static tree (which does require double-dispatch to do cleanly) you might consider the visitor pattern as a solution to THIS problem; but only if you are willing to bet the life of your first born to guarentee that you will never ever want to introduce a new tree-node type into your design.
For LALR parsers using code definitions for synthesised attributes, this problem has been solved for decades. Yacc got it right. The limitations and difficulties that programmers have with yacc fall into 2 categories: Memory management - garbage collection neatly solves that problem; Conceptual vagueness - a tool can only rarely even help you understand the problem you are trying to solve.
Why is this the right approach? Because the attribute derivations are generally trivial calls to constructors or functions --- along the lines of foobar = new Foobar(token), or combined = combineFoobars(child1.foobar, child2.foobar). And these derivations are intimately associated with individual productions (as discussed above). Consequently it is extremely awkward to seperate the grammar definition from the attribute derivation definition, as they needs must have an identical structure, and hence seperation violates the DRY principle and makes debugging/enhancement a right royal pain.
So why have I suffered this week? Because OO-Lemmings can't be bothered to consider if OO is an appropriate paradigm for the problem that presents itself to us; and because of this unfortunate association of tree-traversal with the visitor pattern. Now I can't do anything about the prevalence of lemmings that won't get me arrested, but please please please stop using the visitor pattern to traverse trees.
 If an attribute derivation requires information from a parent or sibling then it is referred to as an 'Inherited Attribute', and this is when you require a dependency aware eulers walk instead of a post-order traversal.