grammatica-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Grammatica-users] Trying to create a (G) LR parser with support for amb


From: Andres Hohendahl
Subject: [Grammatica-users] Trying to create a (G) LR parser with support for ambiguities..
Date: Sun, 23 Oct 2005 22:36:05 -0300

HI 

I have been testing...grammatica parsers under C#
Have ported the main part of Gramática (the whole parser) successfully
towards C# (will send the source back asap.) 
Noted some awfulness (slowness) in its use...hmmm.

I did this:
Created a EBNF parser using grammatica (1.4, sorry!) then used THIS parser
to create the parser object to feed into the grammar. Ported the Grammar
Class, and the First and SecondPass Parser's, that's it. And It works!

Then I load the grammar file (Spanish) from disk, and create a Parser of it,
then I can create an analyzer and a tokenizer for the Spanish phrase (to be
analyzed) and voilá! it parses ok!..but

PROBLEMS
First, when the parser itself is created out of the BNF grammar file), it
builds a parse-tree calculating the lookaheads.. this is a VEEEEERY slow!
action..hmmm, and for only 18 productions (simple Spanish I've sent down in
my last mail) this has a 5-10 second delay (P4/2.4G/512K/WinXP)!! Too slow
to be useful. 
I've tried to create a Full Parser (as a library itself) and.. the time is
the same (it also creates the in-memory look-ahead structure).
The (generated) bnf loader it very fast! so the most time is spent in
preparing the parser.

SOLUTIONS??

TODO (in theory):
I will (try to) write a simple LR parser (shift-reduce) and then modify the
tokenizer to allow ambiguities to be handled, and then modify the node class
to allow "lateral" nodes to build "parse-forest's" (multiple trees) and then
may be include a (simil-context) constrain as a separate part of the syntax
(aka grammar) like %constrain% inside I want to put simple rules to allow to
disambiguate the grammar in run time.
Those rules should be applied at the reduce phase, and will (should) quickly
kill inadequate ambiguities in the production-parser forest (on the fly)
(hope so) this will be a new "strange" kind of parser I guess, hope it
works...

This new Tokenizer will parse input strings with normal parsing methods
(regex, etc.), then give them into some external procedures (using
callback/events) to "tag" them externally (using my affix/morpho-syntactical
/ dictionary libraries) and feed tem back along with the newly added
multiple tags (which will be part of the %constrain-tag% tags)

My proposal on this new "layer" is something like:
--------------------------------
&header% bla blah..
%token%
noun = <<\w+\.nn>>
article = <<\w+\.ar>>
verb = = <<\w+\.ve>>
adverb = <<\w+\.av>>
pronoun = <<\w+\.pr>>
preposition = <<\w+\.pp>>

%production%
sentence = [subject] predicate;
subject = [article] noun [adjective];
predicate = verb [adverb]* [complement]*;
complement = preposition ( noun | adverb | sentence );

%constrain-tag%
gender = M|F|N
number = singular | plural | dual
person = first | second | third | impersonal

%constrains%
subject:  article(gender,number) = noun(gender,number) =
adjective(gender,number);
sentence: subject(person) = verb(person)

<*---------------------------------------------------*>
Q: is this a new stuff, or am I reinventing the wheel?
<*---------------------------------------------------*>

Any comments are welcome ;)

Andrés Hohendahl

-----Mensaje original-----
De: address@hidden
[mailto:address@hidden En
nombre de Per Cederberg
Enviado el: Thursday, October 20, 2005 2:54 PM
Para: address@hidden
Asunto: RE: [Grammatica-users] Lookahead feature?

Hi Andres,

Before you rush ahead with your project, I suggest that you take
a look at the current development version of Grammatica:

http://grammatica.percederberg.net/download/development/index.html

The thing is, that Grammatica has already been ported to .NET
once by Adrian Moore. I have so far only included parts of his
efforts, due to my general slowness. The code generator parts
for .NET are already ported, but have been launched in a
separate project -- SourceFlow (www.sourceflow.org).

So currently only the Grammatica application remains to be
ported. If you can contribute to that, I'm all ears. Also, if
you have any other improvements I'd appreciate patches relative
to the latest development snapshot, as it is otherwise very slow
and tedious for me to merge the changes. Also, I try to maintain
the same coding style that is currently used in all C# parts of
Grammatica (regarding comments etc).

Regarding a conversion tool between different grammar formats,
that is an excellent idea! Hadn't thought about that before,
but having such a tool would definitely rock!

Cheers,

/Per

On thu, 2005-10-20 at 13:52 -0300, Andres Hohendahl wrote:
> Hi there!
> 
> I am now working on a modification of the Grammatica Parser-Generator
under
> C# to allow this:
> Get (read) a Full EBNF grammar file (on the fly) then parse it (with a
> EBNF-grammatica generated parsed) and generate a Parse Tree.
> 
> NOTE:
> I've successfully ported the Grammatica-Core Parse's towards C# (.net)
> Also I modified the Run-time library to be more C# friendly while
> maintaining full compatibility towards the grammatical generated code. I
> think I've improved readability and may be some performance.
> (Per: do you want me to send in the source's)?
> 
> ...
> Then take this Parser, and scan an input text (implementing this grammar)
> and generate an interpreted representation of this input fed from the
> grammar-parser tree into a new hierarchical tree to represent the
> "parsed/compiled/acquired" data from the input.
> 
> This idea came from the fact that many tokens/words (in natural languages)
> has multiple possible functions, and thus generate ambiguous grammar's.
> So this is the point: let's generate a tree of the input tokens, with all
> the (possibilities) as branches.
> Then generate a (recursive) method of: 
> 1 Taking all possible token-sequence's and feeding them into the parser's
> tree, to see if they succeed. After each success, the system will generate
a
> "possible representation list of token's" along with a "ponderation" in
> which each word who has several "functions" has a value associated to the
> each function (creating a most-used hierarchy). Then all "values are
added,
> and the final result will be a ordered (probabilistically) list of the
> possible input token's values (functions) thet responds to thei
> deterministic grammar.
> 
> Thoughts are welcome
> 
> Andres
> ----------------
> By the Way, Per:
> 
> Is there any grammar translation tool to-from EBNF(grammatica's) <->
> EBNF(std) / YACC / BRISON / etc. (standard) 
> Because there are a lot of tools out there (IE. The Gold-Parsed) to
> debug/test your grammars graphically... but they don’t use this EBNF
syntax,
> and the conversion must be made by hand (quite tedious)!
> 
> url-Links are welcome..too!
> 
>
---------------------------------------------------------------------------
> Here is a grammar file in which I have been wonkin in, and the "words"
will
> be fed into pre-tagged as follows, for the phrase "this word is red"
> 
> This.AR word.SN is.VB red.AD
> 
> I have some problems in representing this ambiguous (Spanish) grammar, can
> anyone help me out?
> 
> HERE IT IS:
> 
> -------------------------------
> 
> /*
>  * Gramática Sintaxis Española 
>  */
> %header%
> GRAMMARTYPE = "LL"
> DESCRIPTION = "Spanish grammar for SINTACTIC-spanish recognition"
> AUTHOR      = "AndySoft"
> VERSION     = "1.0"
> DATE        = "19 Oct 2005"
> LICENSE     = ".."
> COPYRIGHT   = "All rights.. hmmm.. reserved.?"
> 
> %tokens%
> 
> LEFT_BRACKET = "["
> RIGHT_BRACKET = "]"
> WHITESPACE = <<[ \t\n\r]+>> %ignore%
> 
> // POS (Part Of Speech )
> //    Auxiliary Verb (haber, ser, estar) ' Copulative ' no Direct/Indirect
> Object
> VX =  <<\w+\.VX\w*>>
> //    Verb Transitive (need direct object)
> VT =  <<\w+\.VT\w*>>
> //    Verb Intransitive (Reflexive) don't need direct object
> VI =  <<\w+\.VI\w*>>
> //    Verb (Transitive or Intransitive)
> VB =  <<\w+\.VB\w*>>
> //    Verb Participio
> VP =  <<\w+\.VP\w*>>
> //    Verb Gerund
> VG =  <<\w+\.VG\w*>>
> //    Verb Suplemento
> VS =  <<\w+\.VS\w*>>
> //    Adverb
> AV =  <<\w+\.AV\w*>>
> //    Adverbio de Cantidad
> AC = <<\w+\.AC\w*>>
> //    Adjective
> AJ =  <<\w+\.AJ\w*>>
> //    Noun singular or Mass
> NN =  <<\w+\.NN\w*>>
> //    Proper Noun, singular
> SN =  <<\w+\.SN\w*>>
> //    Noun, Numeral
> NU = <<\w+\.NU\w*>>
> //    Article
> AR = <<\w+\.AR\w*>>
> //    Personal Pronoun
> PP = <<\w+\.PP\w*>>
> //    Possessive Pronoun
> PO = <<\w+\.PO\w*>>
> //    Demonstrative Pronoun
> PD = <<\w+\.PD\w*>>
> //    Interrogative Pronoun
> PI = <<\w+\.PI\w*>>
> //    Preposition
> PR = <<\w+\.PR\w*>>
> //    Conjunction
> CJ = <<\w+\.CJ\w*>>
> //    Interjection
> IJ = <<\w+\.IJ\w*>>
> //    Pronominal
> PL = <<\w+\.PL\w*>>
> //    Un-Known
> UK = <<\w+\.UK\w*>>
> // Special reflexive particle "se"
> SE = <<\w+\.SE>>
> 
> // SINTAGMAS (construcciones)
> %productions%
> 
> // Frase
> // Fras       = Orac | Orel | SNom ;
> // Oración
> Orac  = [ SNom ]  SVer ;
> // Oracíon de Relativo
> Orel  =  Orac ;
> // Oración Interrogativa
> // Oint       = Orac ;
> 
> // Sintagma Nominal
> SNom  = [ Det ]  SNuc ;
> // Nucleo Nominal
> SNuc  =  ( NN | SN | NU )  [ CNuc ] ;
> //    un nombre común 
> //    un nombre propio (y entonces no aparece ningún determinante en Det) 
> //    un pronombre personal (y entonces no aparece ningún determinante en
> Det) 
> //    cualquier elemento sustantivado (infinitivos aislados, adjetivos,
> oraciones subordinadas //     sustantivas, adverbios...), que generalmente
> irá precedido por un artículo.
> // Determinante
> Det   =   AR ;
> //    artículo 
> //    demostrativo 
> //    indefinido 
> //    numeral 
> //    posesivo 
> //    distributivo 
> //    interrogativo 
> //    exclamativo
> //  Complemento del Núcleo
> CNuc  =  (  SAdj | SPre |  SNom |  Orel ) [ CNuc ] ;
> //    un sintagma adjetival 
> //    un sintagma preposicional 
> //    un SN en aposición 
> //    una oración de relativo
> 
> 
> // Sintagma Adjetival
> SAdj  = [ AC ] NAdj [ CAdj ] ;
> // Núcleo Adjetival
> NAdj  =   AJ  ;
> // Complemento del Adjetivo
> CAdj  =  SPre ;
> 
> 
> // Sintagma Verbal
> SVer  =  ( VTr CDir ) | ( VI CInd ) | ( VS CCir ) ;
> VTr   = VT | VT VI ;
> // Si el verbo es transitivo: un SNom, un SPre o una oración en función de
> Complemento Directo. 
> CDir  = SNom | SPre | Orac ;
> // Si el verbo es copulativo: cualquier sintagma (salvo SVer) o una
oración
> en función de Atributo.
> CInd  = ( SNom | SAdv | SPre | Orac ) [ CInd ] ;
> // Si el verbo es de suplemento: un SP (que contenga un SNom o una
oración)
> en función de Suplemento 
> CCir  = [ PR ]  ( SNom | Orac ) ;
> // Verbo Compuesto
> VComp = [ VI ] VT [VI] ;
> 
> // Sintagma Adverbial
> SAdv  = [ AC ] NAdv ;
> 
> // Núcleo (Adverbio/forma Adverbial)
> NAdv  =  AV  [ CAdv ] ;
> // Complemento del Adverbio
> CAdv  = SPre ;
> 
> // Sintagma Preposicional  
> SPre  = [ PR ]  ( SNom | SAdv |  Orac | SPre ) ;
> 
> 
> 
> _______________________________________________
> Grammatica-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/grammatica-users
> 



_______________________________________________
Grammatica-users mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/grammatica-users





reply via email to

[Prev in Thread] Current Thread [Next in Thread]