grammatica-users

Re: [Grammatica-users] Regex token order


From: Drew Vogel
Subject: Re: [Grammatica-users] Regex token order
Date: Mon, 28 Feb 2011 00:17:08 -0600

I would expect the token definition order to matter, based on my experience with similar tools like flex. I must be doing something wrong.

This is the test file I am trying to parse:
--------------------------------------------------
>email<
  Enter your email address:


This is my test grammar:
--------------------------------------------------
%header%
GRAMMARTYPE = "LL"

%tokens%
RCARET = ">"
LCARET = "<"
ITEM_NAME = <<[a-zA-Z][a-zA-Z0-9]+>>
TEXT = <<.+>>

%productions%
Item = ItemDecl TEXT;
ItemDecl = RCARET ITEM_NAME LCARET ;


This is the error I get from grammatica:
--------------------------------------------------
java -jar grammatica-1.5.jar Q.grammar --parse test.q
Parse tree from test.q:
Error: in test.q: line 1:
    unexpected token ">email<" <TEXT>, expected ">"


If I remove the TEXT token definition and its reference in the Item production, the remaining grammar correctly matches the first line and I get a parse error at the newline character (as expected). Why does introducing my TEXT token override those previously matching tokens, even though it is listed last in the %tokens% section?
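For comparison, flex does not try rules strictly in file order either. It picks the rule that matches the most text, and uses rule order only to break ties between matches of equal length. The error above is consistent with Grammatica applying the same longest-match rule. The sketch below is a hypothetical model of such a tokenizer, not Grammatica's actual source. It uses the four token patterns from the grammar above, kept in declaration order in a LinkedHashMap. At offset 0 of ">email<", RCARET matches 1 character while TEXT (<<.+>>) matches all 7, so TEXT wins even though it is declared last.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a longest-match tokenizer; not Grammatica's source.
public class LongestMatchDemo {
    // Token definitions in declaration order, mirroring the %tokens% section.
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("RCARET", Pattern.compile(">"));
        TOKENS.put("LCARET", Pattern.compile("<"));
        TOKENS.put("ITEM_NAME", Pattern.compile("[a-zA-Z][a-zA-Z0-9]+"));
        TOKENS.put("TEXT", Pattern.compile(".+")); // '.' stops at line terminators
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> def : TOKENS.entrySet()) {
                Matcher m = def.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() requires the match to start at pos. The strict '>'
                // means a later token wins only if it matches MORE characters.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestName = def.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no token matches at offset " + pos);
            }
            tokens.add(bestName + " \"" + input.substring(pos, pos + bestLen) + "\"");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // TEXT matches all 7 characters, RCARET only 1, so TEXT wins.
        System.out.println(tokenize(">email<")); // [TEXT ">email<"]
    }
}
```

Removing the TEXT entry from the map makes the same input tokenize as RCARET ">", ITEM_NAME "email", LCARET "<", which matches the behavior described above when TEXT is taken out of the grammar.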



On Sun, Feb 27, 2011 at 11:49 PM, Oliver Bock <address@hidden> wrote:
I had to do a similar thing, but putting the more specific tokens first in %tokens% worked for me.  From my grammar:

ON = "ON"
VARNAME = <<address@hidden(address@hidden@])?>>

The text "ON" could match both these tokens, but for me ON matches, not VARNAME.  I suggest you cut your example down into a very simple grammar (like the above).


  Oliver
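Oliver's case differs from Drew's in one respect: "ON" is matched by both tokens with the same length, so only the tie-break decides, and declaration order applies there. The sketch below models that, assuming Grammatica breaks equal-length ties in favor of the earlier-declared token, as Oliver's experience suggests. Because the archive has obscured Oliver's real VARNAME regex, a hypothetical identifier pattern stands in for it. The helper names firstToken and decl are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TieBreakDemo {
    // Returns the token chosen at the start of input: the longest match wins,
    // and on equal length the first-declared token wins (strict '>').
    static String firstToken(Map<String, Pattern> tokens, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> def : tokens.entrySet()) {
            Matcher m = def.getValue().matcher(input);
            if (m.lookingAt() && m.end() > bestLen) {
                best = def.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Builds an ordered token table from alternating name/regex arguments.
    static Map<String, Pattern> decl(String... namesAndRegexes) {
        Map<String, Pattern> map = new LinkedHashMap<>();
        for (int i = 0; i < namesAndRegexes.length; i += 2) {
            map.put(namesAndRegexes[i], Pattern.compile(namesAndRegexes[i + 1]));
        }
        return map;
    }

    public static void main(String[] args) {
        String varname = "[A-Za-z_][A-Za-z0-9_]*"; // hypothetical stand-in for VARNAME
        Map<String, Pattern> onFirst = decl("ON", "ON", "VARNAME", varname);
        Map<String, Pattern> onLast = decl("VARNAME", varname, "ON", "ON");
        System.out.println(firstToken(onFirst, "ON"));  // ON (tie, declared first)
        System.out.println(firstToken(onLast, "ON"));   // VARNAME (tie, declared first)
        System.out.println(firstToken(onFirst, "ONE")); // VARNAME (3 chars beat 2)
    }
}
```

Declaration order therefore helps only when the candidate matches are the same length. It cannot make a short token such as RCARET beat a longer TEXT match.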


On 28/02/2011 4:37 PM, Drew Vogel wrote:
If I have two regex tokens A and B and A is a subset of B, how do I disambiguate them such that A will always be tried before B? The order they appear in the %tokens% section does not seem to affect this and I did not see an example of this in the documentation.

The parser I am trying to construct is for a template-like language with commands embedded in text. Thus I have a "text" token regex <<.+>> to match everything not otherwise matched as a command, but I only want to match it after all other token regex patterns have been tried.

Drew Vogel
_______________________________________________
Grammatica-users mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/grammatica-users
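The thread does not propose this, but under a longest-match rule one possible workaround is to narrow the catch-all pattern so that it cannot swallow the command delimiters. For example, TEXT could be changed to <<[^<>\n]+>>, assuming Grammatica's regex syntax accepts a negated character class. The hypothetical sketch below reuses the same longest-match model with that narrower TEXT. On ">email<", TEXT can no longer start at ">". At "email", ITEM_NAME and TEXT both match 5 characters, so the earlier-declared ITEM_NAME wins the tie. An ordinary text line still comes out as a single TEXT token.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical longest-match model with a narrower TEXT token.
public class NarrowTextDemo {
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("RCARET", Pattern.compile(">"));
        TOKENS.put("LCARET", Pattern.compile("<"));
        TOKENS.put("ITEM_NAME", Pattern.compile("[a-zA-Z][a-zA-Z0-9]+"));
        // Assumed narrower TEXT: anything except '<', '>' and line breaks.
        TOKENS.put("TEXT", Pattern.compile("[^<>\\r\\n]+"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> def : TOKENS.entrySet()) {
                Matcher m = def.getValue().matcher(input);
                m.region(pos, input.length());
                // Longest match wins; equal length keeps the earlier declaration.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestName = def.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no token matches at offset " + pos);
            }
            tokens.add(bestName + " \"" + input.substring(pos, pos + bestLen) + "\"");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(">email<"));
        // [RCARET ">", ITEM_NAME "email", LCARET "<"]
        System.out.println(tokenize("  Enter your email address:"));
        // [TEXT "  Enter your email address:"]
    }
}
```

Two caveats apply to this sketch. A bare word such as "email" outside the carets would also become ITEM_NAME, because it ties with TEXT. Line breaks are excluded from TEXT, so the grammar would still need some separate token to handle them.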




