Thursday, November 03, 2005

Linear approximate lookahead

Even though we talk antlr as a LL(k) parser generator for simplicity, it actually does linear approximate lookahead instead of full LL(k).
Most of the time this is a good thing, because it reduces the complexity of storing and testing lookahead (from exponential to linear reduction). But occasionally the weaker lookahead is insufficient and results in nondeterministic parser/lexer.
Here is an example to illustrate the problem:

RULE1
   : "ab" | "ba"
   ;

RULE2
   : "ac" | "ca"
   ;

The above is prefect LL(2) grammar, but antlr will give you a warning:

[antlr] test.g: warning:lexical nondeterminism between rules RULE1 and RULE2 upon
[antlr] test.g: k==1:'a'
[antlr] test.g: k==2:'a'

If you look at the generated code, soon you will realized this is because of linear approximate lookahead:

else if ((LA(1)=='a'||LA(1)=='b') && (LA(2)=='a'||LA(2)=='b')) {
   ...//match RULE1
}
else if ((LA(1)=='a'||LA(1)=='c') && (LA(2)=='a'||LA(2)=='c')) {
   ...//match RULE2
}

If the input is "aa", even though it does not match any of the two rules, both tests will be true, and that is what the nondeterminism warning trying to tell you.

A full LL(k) parser generator should generate the following code instead:

else if ((LA(1)=='a'&&LA(2)=='b') || (LA(1)=='b'&&LA(2)=='a')) {
   ...//match RULE1
}
else if ((LA(1)=='a'&&LA(2)=='c') || (LA(1)=='c'&&LA(2)=='a')) {
   ...//match RULE2
}

0 Comments:

Post a Comment

<< Home