Re: What is going in this syntax
From: Tim Van Holder
Subject: Re: What is going in this syntax
Date: Fri, 08 Oct 2004 08:13:28 +0200
User-agent: Mozilla Thunderbird 0.8 (Windows/20040913)

wim delvaux wrote:
> Hi all,
>
> this excerpt.
>
>   "unsigned"{WSPC}+{IDENT} ...;
>   {IDENT} ...;
>
> Where defined are
>
>   WSPC [ \t]
>   IDENT {VAROF}|{REALIDENT}
>
> When the input presents 'unsigned long', the first rule applies (which
> is what you would expect) but ONLY with the token value "unsigned"
> (which is NOT what you would expect).
> Next the 'long' token is matched ... and no, not by the second rule;
> the first fires again.
> I have run my syntax in debug to find out what was going on.
> Why is that?

Look at what the rules expand to:

  "unsigned"[ \t]+{VAROF}|{REALIDENT} ...;
  {VAROF}|{REALIDENT} ...;

Alternation binds more loosely than concatenation, so the first rule
reads as ("unsigned"[ \t]+{VAROF}) | {REALIDENT}.
As you can see, a bare REALIDENT is a valid match for both rules, and
when two rules match the same text, the one listed first wins.
Because of this, 'unsigned' and 'long' are each matched by the first
rule's REALIDENT alternative (the "unsigned"[ \t]+{VAROF} alternative
produces no match).
You'll need to add parentheses, either around the IDENT definition
(preferred) or around each use of {IDENT}, to get the intended
behaviour.
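
A minimal sketch of the preferred fix, with the parentheses placed in the
IDENT definition itself. The VAROF and REALIDENT patterns, the whitespace
and catch-all rules, and the printf actions are assumptions made up for
illustration, since the real ones were not posted. Note also that flex's
documentation describes definitions as being wrapped in parentheses on
expansion by default and left bare in lex-compatibility mode (-l), so the
unparenthesized expansion shown above is what a traditional lex or that
mode would produce; writing the parentheses explicitly is harmless either
way.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

WSPC       [ \t]
/* VAROF and REALIDENT are hypothetical stand-ins; the real definitions
   were not posted. */
VAROF      "@"[A-Za-z_]+
REALIDENT  [A-Za-z_][A-Za-z0-9_]*
/* Parentheses keep the alternation together wherever {IDENT} is used. */
IDENT      ({VAROF}|{REALIDENT})

%%
"unsigned"{WSPC}+{IDENT}   { printf("unsigned type: %s\n", yytext); }
{IDENT}                    { printf("identifier: %s\n", yytext); }
[ \t\r\n]+                 { /* skip whitespace between tokens */ }
.                          { printf("other: %s\n", yytext); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

With this definition the first rule can only match text that starts with
the literal word unsigned, so 'unsigned long' arrives in yytext as one
match and a lone 'long' falls through to the second rule.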
As an aside, it may be better to use start conditions (states) for
multi-word matching like this; since WSPC covers only spaces and tabs,
the above example won't handle a case where unsigned is on one line and
long is on the next, for example. Setting a state upon scanning
'unsigned' would allow you to cope with all sorts of things and still
know to handle 'long' specially.
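
The state-based approach could be sketched roughly as follows. The state
name AFTER_UNSIGNED, the simplified IDENT pattern, and the printf actions
are all hypothetical; this is one possible arrangement, not a drop-in
replacement for the original scanner.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

/* Exclusive state: only the <AFTER_UNSIGNED> rules are active in it. */
%x AFTER_UNSIGNED

/* Simplified, hypothetical identifier pattern. */
IDENT  [A-Za-z_][A-Za-z0-9_]*

%%
"unsigned"                   { BEGIN(AFTER_UNSIGNED); }
<AFTER_UNSIGNED>[ \t\r\n]+   { /* spaces, tabs and newlines may separate the two words */ }
<AFTER_UNSIGNED>{IDENT}      { printf("unsigned type: %s\n", yytext); BEGIN(INITIAL); }
<AFTER_UNSIGNED>.            { /* no type name followed: push the character back */
                               yyless(0);
                               printf("unsigned with no type name\n");
                               BEGIN(INITIAL); }
<AFTER_UNSIGNED><<EOF>>      { printf("unsigned with no type name\n"); yyterminate(); }
{IDENT}                      { printf("identifier: %s\n", yytext); }
[ \t\r\n]+                   { /* skip whitespace between tokens */ }
.                            { printf("other: %s\n", yytext); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

Because the whitespace rule inside the state accepts newlines, 'unsigned'
at the end of one line and 'long' at the start of the next are still
treated as a pair. Comments or other material between the two words could
be handled by adding further <AFTER_UNSIGNED> rules.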
As a further aside, I see no real reason to do any of this in the lexer
at all - combining 'unsigned' and the following type name is really a
job for the parser.
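
For the parser-based alternative, the lexer side might look like the
sketch below: each word is returned as its own token and nothing is
combined in the scanner. The token names and numeric codes are
hypothetical (in a real project they would come from the header the
parser generator writes), and the grammar rule in the closing comment is
only an illustration of where the combining would happen.

```lex
%{
#include <stdio.h>
/* Hypothetical token codes; normally supplied by the parser's header. */
enum { TOK_UNSIGNED = 258, TOK_IDENT = 259 };
%}
%option noyywrap

%%
"unsigned"                { return TOK_UNSIGNED; }
[A-Za-z_][A-Za-z0-9_]*    { return TOK_IDENT; }
[ \t\r\n]+                { /* whitespace, including newlines, is skipped */ }
.                         { return yytext[0]; }
%%

/* A grammar rule along the lines of
       type : TOK_UNSIGNED TOK_IDENT | TOK_IDENT ;
   would then pair 'unsigned' with the type name in the parser. */

/* Stand-alone driver that prints each token code and its text. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0)
        printf("%d %s\n", tok, yytext);
    return 0;
}
```

Since whitespace never reaches the parser, line breaks between
'unsigned' and the type name need no special treatment here.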