[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: OT: LALR(1)
From: |
Russell Lewis |
Subject: |
Re: OT: LALR(1) |
Date: |
Wed, 14 Aug 2002 08:14:15 -0700 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.9) Gecko/20020408 |
Magnus Ekhall wrote:
Hi!
Aslightly off topic short question:
What is the abbreviation LALR(1) really short for?
/Magnus
_______________________________________________
address@hidden http://mail.gnu.org/mailman/listinfo/help-bison
Look Ahead Left Right parser, with (1) token of look ahead.
Now, to explain what that means...somebody sent me a link a few weeks
back to an excellent online book (in PDF, IIRC) that explained it all
well, but I don't have the link accessible right at the moment.
Basically, my general understanding is that LR parser parses the tokens
from left to right. It has to parse a set of tokens, and then
immediately replace the sequence of tokens with some other, higher-level
token. Then you start over from the left side and re-parse. You
continue this process until you have reduced the entire sequence down to
a single end-state token. So you get rules like this:
a => b c
b => d e
c => f g
If you start with this token sequence:
d e f g
it gets parsed like this:
b f g ( b => d e )
b c ( c => f g )
a ( a => b c )
and so you've parsed the sequence.
The problem you get with LR parsers is that you can't make choices based
on any tokens that follow AFTER your current token. So a grammar like
this can't be parsed by an LR parser:
a => b c
a => d e
b => f g
c => h i
d => f g
e => j k
Take the example string:
f g j k
The parser reads the 'f' and 'g' tokens. At that time, it must
either reduce them down to a 'b' or down to a 'd', but it can't decide
which is right. Thus, an LR parser is stuck.
However, an LALR(1) parser is allowed to look ahead at 1 additional
token and make a decision based on that. Thus, it will be able to parse
the string above. It looks at the 'j' token and decides that it has to
interpret the 'f g' string must be interpreted as 'd', not as 'b'. So
it makes the conversion:
f g j k
d j k ( d => f g )
d e ( e => j k )
a ( a => d e )
Now, the trouble with LALR(1) parsers is that they have ONLY one token
of lookahead. So while they are a lot more flexible than LR, they still
have troubles.
P.S. Bison doesn't actually "start over from the beginning" after
reducing a string of tokens into a single token. If you look at the
mechanics of how parsing happens (a DFA, beyond the scope of this
email), you will find that reparsing will follow the exact pattern you'd
followed before, up to a certain point. Thus, bison keeps a stack of
states that's it's been in and simply backs up a bit after it does a
token conversion.