Distinguishing left shift from heredoc
I am working on a Ruby parser with ANTLR. Last weekend I tested my Ruby lexer on lots of real-world Ruby scripts and found an interesting problem: how do you distinguish the left shift operator from a "here document" (heredoc)?
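As we know, "<<" is the left shift operator in many programming languages. In Perl and Ruby, it also marks the start of a heredoc. For those of you without a Unix shell/Perl/Ruby background: a heredoc is a string literal written across several lines. A terminator word follows "<<", and the lines after it, up to a line consisting of just that terminator, become the string's contents. Here is an example in Ruby: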
str = <<EOF
hello, world!
EOF
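For readers new to heredocs, the snippet below is a small runnable extension of the example above (the `p` calls are added for illustration). It shows the string value the heredoc produces, and that "<<" between two integers is still an ordinary shift.

```ruby
str = <<EOF
hello, world!
EOF

# The lines between "<<EOF" and the terminator line "EOF" become the
# string's value, including the final newline.
p str      # prints "hello, world!\n"

# Between two integer operands, "<<" is the left shift operator.
p(1 << 3)  # prints 8
```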
Since Perl and Ruby chose not to introduce a new symbol for the start of a heredoc, their parser developers have to deal with the ambiguity. I tried to find a simple solution, but later found that the ambiguity cannot be resolved at the syntax level.
For example, in the following code, "<<" should be parsed as the start of a heredoc:
def x(var)
  puts var
end
x <<1
test
1
But in the following case, "<<" should be parsed as the left shift operator:
x = 1
x <<1
test
1
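Ruby's own lexer makes the same split on these two snippets. A quick way to check, assuming Ruby 1.9 or later (where the `ripper` standard library is available), is to ask Ripper which token it produces for the "<<". The helper name `shift_or_heredoc` is hypothetical; `Ripper.lex` itself is the standard library call, returning one array per token whose second element is the scanner event name.

```ruby
require 'ripper'

# The two snippets from the post, verbatim apart from indentation.
METHOD_SRC = "def x(var)\n  puts var\nend\nx <<1\ntest\n1\n"
VAR_SRC    = "x = 1\nx <<1\ntest\n1\n"

# Hypothetical helper: returns the scanner event of the first token that
# starts with "<<" (:on_heredoc_beg for a heredoc, :on_op for a shift).
def shift_or_heredoc(src)
  token = Ripper.lex(src).find { |_pos, _event, text| text.start_with?('<<') }
  token && token[1]
end

p shift_or_heredoc(METHOD_SRC)  # prints :on_heredoc_beg
p shift_or_heredoc(VAR_SRC)     # prints :on_op
```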
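The only difference is that in the first case x is a method, while in the second case x is a local variable. Both snippets use identical spacing (a space before "<<" and none after it), so neither the grammar nor the whitespace can tell them apart. There is no other choice: the lexer has to look up the symbol table to decide its state.
Today I checked the source code of Ruby 1.8.3. Not surprisingly, it uses the same approach.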
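The symbol-table lookup can be sketched as a toy lexer. This is a hypothetical, heavily simplified example, not Ruby's or ANTLR's actual code: the class name `ToyLexer`, its token names, and the tiny subset of syntax it accepts (lowercase identifiers, integers, "=", "<<", heredocs and newlines) are all assumptions. Its "symbol table" is just the set of names assigned so far; any identifier not in that set is assumed to be a method call, so `x <<TERM` with a space before "<<" and none after starts a heredoc, while the same text after a known local is a left shift.

```ruby
require 'strscan'
require 'set'

# Hypothetical toy lexer illustrating symbol-table-driven lexing of "<<".
class ToyLexer
  def initialize(src)
    @s = StringScanner.new(src)
    @locals = Set.new # the "symbol table": local variables assigned so far
    @pending = []     # heredoc terminators whose bodies start on the next line
  end

  def tokenize
    toks = []
    until @s.eos?
      space_before = !@s.scan(/[ \t]+/).nil?
      break if @s.eos?
      if @s.scan(/\n/)
        toks << [:nl, "\n"]
        read_heredoc_bodies(toks)
      elsif (id = @s.scan(/[a-z_]\w*/))
        toks << [:ident, id]
      elsif (num = @s.scan(/\d+/))
        toks << [:int, num.to_i]
      elsif @s.scan(/=/)
        # "name =" defines a local variable: record it in the symbol table.
        @locals << toks.last[1] if toks.last&.first == :ident
        toks << [:assign, "="]
      elsif @s.check(/<</)
        toks << lex_shift_or_heredoc(toks.last, space_before)
      else
        raise ArgumentError, "unexpected character #{@s.peek(1).inspect}"
      end
    end
    raise ArgumentError, "unterminated heredoc" unless @pending.empty?
    toks
  end

  private

  # Decide what "<<" means from the previous token and the symbol table.
  def lex_shift_or_heredoc(prev, space_before)
    heredoc_ok =
      case prev&.first
      when :int
        false # "1 <<2" is always a shift
      when :ident
        # Known local => operand => shift. Unknown name => assumed method
        # call, and "x <<TERM" passes it a heredoc argument.
        !@locals.include?(prev[1]) && space_before
      else
        true  # start of an expression, e.g. "str = <<EOF"
      end
    if heredoc_ok && @s.scan(/<<(\w+)/)
      @pending << @s[1]
      [:heredoc_beg, @s[1]]
    else
      @s.scan(/<</)
      [:lshift, "<<"]
    end
  end

  # After a newline, read one body per pending heredoc; each body ends at a
  # line consisting solely of its terminator.
  def read_heredoc_bodies(toks)
    until @pending.empty?
      term = @pending.shift
      lines = []
      loop do
        line = @s.scan(/[^\n]*\n?/)
        raise ArgumentError, "unterminated heredoc #{term}" if line.nil? || line.empty?
        break if line.chomp == term
        lines << line
      end
      toks << [:heredoc_body, lines.join]
    end
  end
end
```

With this sketch, `ToyLexer.new("x = 1\nx <<1\ntest\n1\n").tokenize` yields an `:lshift` token after `x`, while dropping the `x = 1` line makes the same `x <<1` produce `:heredoc_beg` followed by a `:heredoc_body` of `"test\n"`. In an ANTLR grammar the same check would presumably live in a semantic predicate or lexer action that consults a symbol table shared with the parser.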