grammatica-users

[Grammatica-users] novice question: when more than one token match?


From: malcolm macaulay
Subject: [Grammatica-users] novice question: when more than one token match?
Date: Mon, 12 Apr 2004 21:04:35 +0100

Hi there,

I hope someone can help me with this and I apologize if this is a dumb
question. 

I have a grammar where more than one token can match the same input
(i.e. the strings matched by one or more regex tokens are a subset of
those matched by another regex token). If I am reading the C# code
correctly, Grammatica returns the *longest* token that can be matched,
and if that token does not fit the production it reports an error. This
does not make sense to me, as there may be a shorter token match which
is the correct one according to the grammar.

The document I want to parse is the first part of a configuration file
for a digital power protection relay:

[DEVICE INFORMATION]
DEVICE NAME=750
COMMENT=some words
VERSION=500

My grammar:

%tokens%

WHITESPACE =                                    <<[\s\n\r]+>> %ignore%
DEVICE_INFORMATION_HEADING =    <<\[DEVICE INFORMATION\]>>
DEVICE_NAME =                                   <<DEVICE NAME>>
EQUALS =                                                "="
NUMBER =                                                <<[0-9]+>>
DOT_STAR =                                      <<.*>>

%productions%

Expression = DEVICE_INFORMATION_HEADING DEVICE_NAME EQUALS NUMBER DOT_STAR;

When I test this grammar against the document I get:

Expression(2001)
DEVICE_INFORMATION_HEADING(1002): "[DEVICE INFORMATION]". Line: 1, col: 1
Error: in test.txt line 2:
Unexpected token "DEVICE NAME=750" <DOT_STAR>, expected <DEVICE_NAME>

I have read the C# code and I can see that it is matching both
<DEVICE_NAME> and <DOT_STAR>, but returning the <DOT_STAR> match as this
has the longer string of the two matches. This does not seem right;
surely it should return all matches and then refer to the productions to
determine which to use?
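
The behaviour described above can be reproduced with a small
longest-match tokenizer. What follows is only a minimal Python sketch
under stated assumptions, not Grammatica's actual C# code: the names
TOKENS, longest_match and tokenize are hypothetical, Python's re module
stands in for Grammatica's regex engine, and the tie-breaking rule
(the first-defined token wins when two matches have equal length) is
inferred from the output above, where the heading token wins over
DOT_STAR on line 1.

```python
import re

# Token patterns copied from the grammar above, in definition order.
TOKENS = [
    ("WHITESPACE", r"[\s\n\r]+"),
    ("DEVICE_INFORMATION_HEADING", r"\[DEVICE INFORMATION\]"),
    ("DEVICE_NAME", r"DEVICE NAME"),
    ("EQUALS", r"="),
    ("NUMBER", r"[0-9]+"),
    ("DOT_STAR", r".*"),
]


def longest_match(text, pos):
    """Return (name, lexeme) for the longest non-empty match at pos, or None."""
    best = None
    for name, pattern in TOKENS:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' keeps the earlier-defined token on a tie (assumption).
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def tokenize(text):
    """Split text into tokens by longest match, dropping ignored whitespace."""
    pos, result = 0, []
    while pos < len(text):
        token = longest_match(text, pos)
        if token is None:
            raise ValueError(f"no token matches at offset {pos}")
        pos += len(token[1])
        if token[0] != "WHITESPACE":
            result.append(token)
    return result


DOC = (
    "[DEVICE INFORMATION]\n"
    "DEVICE NAME=750\n"
    "COMMENT=some words\n"
    "VERSION=500\n"
)

for name, lexeme in tokenize(DOC):
    print(f"{name}: {lexeme!r}")
```

In this sketch the tokenizer never consults the productions, so on
line 2 the 15-character DOT_STAR match "DEVICE NAME=750" beats the
11-character DEVICE_NAME match, which mirrors the error shown above.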

When I parse the document I am only interested in getting hold of
DEVICE_NAME and its value. I don't care about the remainder of the
document, hence the <DOT_STAR>.

If I change the grammar to (remove DOT_STAR):

%tokens%

WHITESPACE =                                    <<[\s\n\r]+>> %ignore%
DEVICE_INFORMATION_HEADING =    <<\[DEVICE INFORMATION\]>>
DEVICE_NAME =                                   <<DEVICE NAME>>
EQUALS =                                                "="
NUMBER =                                                <<[0-9]+>>

%productions%

Expression = DEVICE_INFORMATION_HEADING DEVICE_NAME EQUALS NUMBER;

Then it correctly reads <DEVICE_NAME> <EQUALS> <NUMBER> (then gives me
an error on the third line as you would expect). 

Any help would be greatly appreciated. 

Per, thanks for making this parser generator available.

cheers

Malcolm Macaulay







