[Grammatica-users] novice question: when more than one token match?

From:	malcolm macaulay
Subject:	[Grammatica-users] novice question: when more than one token match?
Date:	Mon, 12 Apr 2004 21:04:35 +0100

Hi there,

I hope someone can help me with this and I apologize if this is a dumb
question. 

I have a grammar where more than one token can match the same input
(i.e. one or more regex tokens match a subset of what another regex
token matches). If I am reading the C# code correctly, Grammatica
returns the *longest* token that can be matched, and if that token does
not fit the production it reports an error. This does not make sense to
me, as there may be a shorter token match which is the correct one
according to the grammar.

The document I want to parse is the first part of a configuration file
for a digital power protection relay:

[DEVICE INFORMATION]
DEVICE NAME=750
COMMENT=some words
VERSION=500

My grammar:

%tokens%

WHITESPACE                 = <<[\s\n\r]+>> %ignore%
DEVICE_INFORMATION_HEADING = <<\[DEVICE INFORMATION\]>>
DEVICE_NAME                = <<DEVICE NAME>>
EQUALS                     = "="
NUMBER                     = <<[0-9]+>>
DOT_STAR                   = <<.*>>

%productions%

Expression = DEVICE_INFORMATION_HEADING DEVICE_NAME EQUALS NUMBER DOT_STAR;

When I test this grammar against the document I get:

Expression(2001)
DEVICE_INFORMATION_HEADING(1002): "[DEVICE INFORMATION]". Line: 1, col: 1
Error: in test.txt line 2:
Unexpected token "DEVICE NAME=750" <DOT_STAR>, expected <DEVICE_NAME>

I have read the C# code and I can see that it is matching both
<DEVICE_NAME> and <DOT_STAR>, but returning the <DOT_STAR> match as this
has the longer string of the two matches. This does not seem right;
surely it should return all matches and then refer to the productions to
determine which to use?
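
To make the behaviour I think I am seeing concrete, here is a rough
Python sketch of a longest-match tokenizer step. The regexes are copied
from my grammar, but the function longest_match and its tie-breaking
rule (the earlier definition wins when lengths are equal) are my own
assumptions, not Grammatica's actual code:

```python
import re

# Token patterns copied from the grammar above. The matching loop is an
# assumed model of the tokenizer, not Grammatica's real implementation.
TOKENS = [
    ("DEVICE_INFORMATION_HEADING", re.compile(r"\[DEVICE INFORMATION\]")),
    ("DEVICE_NAME", re.compile(r"DEVICE NAME")),
    ("EQUALS", re.compile(r"=")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("DOT_STAR", re.compile(r".*")),
]


def longest_match(text, pos=0):
    """Return (name, lexeme) for the longest token matching at pos, or None."""
    best = None
    for name, pattern in TOKENS:
        m = pattern.match(text, pos)
        # Replace the candidate only when the new match is strictly longer,
        # so an earlier definition wins a tie (assumption).
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(longest_match("DEVICE NAME=750"))
print(longest_match("[DEVICE INFORMATION]"))
```

In this model DOT_STAR swallows the whole second line because its match
is longer than the DEVICE_NAME match, while on the first line the two
candidates have equal length and the heading token is kept. The
productions are never consulted when choosing the token.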

When I parse the document I am only interested in getting hold of
DEVICE_NAME and its value. I don't care about the remainder of the
document, hence the <DOT_STAR>.

If I change the grammar to (remove DOT_STAR):

%tokens%

WHITESPACE                 = <<[\s\n\r]+>> %ignore%
DEVICE_INFORMATION_HEADING = <<\[DEVICE INFORMATION\]>>
DEVICE_NAME                = <<DEVICE NAME>>
EQUALS                     = "="
NUMBER                     = <<[0-9]+>>

%productions%

Expression = DEVICE_INFORMATION_HEADING DEVICE_NAME EQUALS NUMBER;

Then it correctly reads <DEVICE_NAME> <EQUALS> <NUMBER> (then gives me
an error on the third line as you would expect). 
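
For comparison, the same kind of rough Python sketch without DOT_STAR
gives the token sequence I expect and then stops on the third line. The
tokenize function and the DOCUMENT string are hypothetical stand-ins
that I wrote for illustration; only the regexes come from the grammar:

```python
import re

# Patterns from the grammar without DOT_STAR. The loop below is a
# hypothetical stand-in for the tokenizer, not Grammatica's code.
WHITESPACE = re.compile(r"[\s\n\r]+")
TOKENS = [
    ("DEVICE_INFORMATION_HEADING", re.compile(r"\[DEVICE INFORMATION\]")),
    ("DEVICE_NAME", re.compile(r"DEVICE NAME")),
    ("EQUALS", re.compile(r"=")),
    ("NUMBER", re.compile(r"[0-9]+")),
]

DOCUMENT = (
    "[DEVICE INFORMATION]\n"
    "DEVICE NAME=750\n"
    "COMMENT=some words\n"
    "VERSION=500\n"
)


def tokenize(text):
    """Return (tokens, error_line); error_line is None if all input matched."""
    tokens, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:  # ignored token: skip it without emitting anything
            pos = ws.end()
            continue
        best = None
        for name, pattern in TOKENS:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            # No token matches here: report the 1-based line number.
            return tokens, text.count("\n", 0, pos) + 1
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens, None


tokens, error_line = tokenize(DOCUMENT)
print(tokens)
print("no token matches on line", error_line)
```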

Any help would be greatly appreciated. 

Per, thanks for making this parser generator available.

cheers

Malcolm Macaulay
