
Re: [Grammatica-users] More Grammatica Help


From: Per Cederberg
Subject: Re: [Grammatica-users] More Grammatica Help
Date: Tue, 23 Mar 2004 12:51:24 +0100

On Tue, 2004-03-23 at 08:31, Brandon Silverstein wrote:
> Per,
> 
> Thanks for your help; your explanations cleared up the main problem I 
> was having.  Now I am running into another problem:
> 
> inherent ambiguity in production 'CStyleComment' at position 2
> starting with token "*"
> 
> I have a feeling that my grammar file is not formatted correctly or 
> that you might have explained what to do in the previous e-mail.  In 
> any case, I have attached the current file I am working with to see if 
> anything sticks out as being wrong to anyone.  How do I solve these 
> inherent ambiguities?  Any help is appreciated.

An inherent ambiguity is reported by Grammatica whenever two
look-ahead sets overlap in a way that cannot be resolved. That is,
if a production has two alternatives, Grammatica must be able to
choose between them based on a limited number of look-ahead
tokens. Imagine a case like this:

A = "b"* B
  | "b"* C ;

There is no way Grammatica can calculate a limited look-ahead
set for each of the alternatives here. It is impossible to know
beforehand how many times the "b" token repeats, so no fixed
number of tokens is enough to tell the alternatives apart. The
way to resolve these ambiguities is to rewrite the production so
that the common prefix appears only once, like this:

A = "b"* BOrC ;

BOrC = B
     | C ;

In your case, what Grammatica is trying to tell you is that the
CStyleComment contains such an ambiguity at position 2:

CStyleComment = "/*" [ CStyleBody ] CStyleEnd;
                     ^
                    here

The problem is that CStyleBody can start with an unlimited number
of "*" tokens, and the CStyleEnd that follows can start the same
way. It is thus impossible for Grammatica to know if the next tokens
belong to the optional CStyleBody or to the CStyleEnd production.
The simple resolution in this case would be to redefine
CStyleEnd like this:

CStyleEnd = "*/" ;

BUT, this is not a good way to use Grammatica. Instead,
comments, identifiers and similar are MUCH BETTER represented
as tokens. This is how I'd define the comment tokens, for
example (a bit tricky to read, but anyway):

DOC_COMMENT  = <</\*\*([^*]|\*[^/])*\*/>>
C_COMMENT    = <</\*([^*]|\*[^/])*\*/>>
CPP_COMMENT  = <<//.*>>

Note that DOC_COMMENT must precede C_COMMENT, as both will
match documentation comments (and in that case order is
important). For the same reason, string tokens should ALWAYS be
placed before regular expression tokens, so that a regular
expression token does not take precedence.
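
The ordering rule can be tried outside Grammatica with a small
Python sketch. It copies the three patterns above into Python's
re module and assumes the matching rule described in this
message: the longest match wins, and on a tie the token defined
first wins. The first_token helper and the ALT_C_COMMENT pattern
are illustrative additions, not part of Grammatica or of the
grammar file:

```python
import re

# Patterns copied from the token definitions, in definition order.
TOKENS = [
    ("DOC_COMMENT", re.compile(r"/\*\*([^*]|\*[^/])*\*/")),
    ("C_COMMENT", re.compile(r"/\*([^*]|\*[^/])*\*/")),
    ("CPP_COMMENT", re.compile(r"//.*")),
]

# A commonly used alternative that also accepts a run of "*"
# directly before the closing "/" (an addition, not from the grammar).
ALT_C_COMMENT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")


def first_token(text):
    """Return (name, lexeme) for the token at the start of text, or None.

    Assumed rule: longest match wins, ties go to the earliest definition.
    """
    best = None
    for name, pattern in TOKENS:
        match = pattern.match(text)
        # Strict ">" keeps the earlier definition when lengths are equal.
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (name, match.group())
    return best


print(first_token("/** doc */ int x;"))
print(first_token("/* plain */ int x;"))
print(first_token("// line\nint x;"))
print(first_token("/* a **/"))
print(ALT_C_COMMENT.match("/* a **/").group())
```

Both comment patterns match "/** doc */" with the same length,
so only the definition order makes it a DOC_COMMENT. One edge
case may be worth checking with --tokenize as well: in this
sketch a comment such as "/* a **/" is not matched by the
patterns as written, because "**" is consumed as a pair before
the closing "/". The alternative pattern handles that input.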

Some other things I noted in the jml.grammar:

* The "nonAtPlusStar" token matches long pieces of text 
  (including newlines and whitespace) which would almost 
  always be the longest match (and thus the token found).

* The "@since" and similar tags inside the documentation
  comment are better parsed separately. It is possible to
  write a grammar to do it all, but it would not look very
  pretty.
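
The first point can be illustrated with a short Python sketch.
The actual definition of "nonAtPlusStar" is in the attached
grammar and is not shown here, so the pattern below is only a
guess based on its name (any run of characters other than "@",
"+" and "*"). The IDENTIFIER token is likewise hypothetical:

```python
import re

# Hypothetical token set; both patterns are assumptions for illustration.
TOKENS = [
    ("IDENTIFIER", re.compile(r"[a-z]+")),
    ("NON_AT_PLUS_STAR", re.compile(r"[^@+*]+")),
]


def longest_match(text):
    """Return (name, lexeme) of the longest token match at the start of text."""
    best = None
    for name, pattern in TOKENS:
        match = pattern.match(text)
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (name, match.group())
    return best


# The greedy token swallows the identifier, the spaces and the newline.
print(longest_match("foo bar\nbaz @since 1.0"))
```

Here the intended IDENTIFIER "foo" is never produced, since the
broader token always matches more text at the same position.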

If you need more examples, have a look at the various grammar
files distributed with Grammatica (in src/grammar and 
test/src/grammar). Also try the "<grammar> --tokenize <file>"
command to check that your grammar splits the input into the
expected tokens.

Cheers,

/Per





