Tuesday, November 01, 2005

Distinguishing left shift and heredoc

I am working on a Ruby parser with ANTLR. Last weekend I tested my Ruby lexer on lots of real-world Ruby scripts and found an interesting problem: how do you distinguish the left shift operator from a "here document" (heredoc)?

As we know, "<<" is the left shift operator in lots of computer languages. In Perl and Ruby, it is also the start of a heredoc. For those of you who do not have a Unix shell, Perl, or Ruby background, a heredoc acts like a string literal, but it can span multiple lines and ends at a terminator you choose. Here is an example in Ruby:

str = <<EOF
hello, world!
EOF

Since Perl and Ruby chose not to introduce a new symbol for the start of a heredoc, their parser developers have to deal with the ambiguity. I tried to find a simple solution, but later found that the ambiguity cannot be resolved at the syntax level.

For example, in the following code, "<<" should be parsed as the start of a heredoc (with "1" as its terminator):

def x(var)
  puts var
end

x <<1
hello
1

But in another case, "<<" should be parsed as the left shift operator:

x = 1

x <<1

The only difference is that in the first case x is a method, while in the second case x is a variable. So there is no other choice: the lexer has to look up the symbol table to decide whether "<<" starts a heredoc or is the left shift operator.
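To make the idea concrete, here is a minimal sketch of that decision in Ruby. It is not taken from my lexer or from Ruby's own source; the function name classify_double_angle and the use of a Set as the "symbol table" are made up for illustration, and it assumes the lexer already knows which identifier precedes "<<" and which character follows it.

```ruby
require 'set'

# Hypothetical helper (not from any real lexer): decide how "<<" should be
# treated when it follows the identifier `name`. `locals` stands in for the
# symbol table of local variables seen so far.
def classify_double_angle(name, next_char, locals)
  # A known local variable is a complete operand, so "<<" is a binary operator.
  return :left_shift if locals.include?(name)

  # Otherwise `name` is assumed to be a method call. "<<" starts a heredoc
  # only if a terminator begins immediately after it (identifier character,
  # quote, or the "-" of "<<-").
  return :heredoc_start if next_char.to_s.match?(/\A[A-Za-z0-9_"'-]/)

  :left_shift
end

locals = Set.new
puts classify_double_angle("x", "1", locals)  # prints "heredoc_start"

locals << "x"                                 # simulate having seen "x = 1"
puts classify_double_angle("x", "1", locals)  # prints "left_shift"
```

The same text "x <<1" gets two different answers depending only on what the symbol table contains, which is why a pure grammar rule cannot settle it.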

Today I checked the source code of Ruby 1.8.3. No surprise: it uses the same approach.

