[Grammatica-devel] Specialized node types update


From: Connor Robert Prussin
Subject: [Grammatica-devel] Specialized node types update
Date: Tue, 14 Jul 2009 15:10:55 +0200

Per (and other users),

I am a student using the Grammatica project.  I have written modifications that implement "specialized" node types (see http://code.google.com/p/grammatica/wiki/FeatureProductionSubclasses).  The attached patch implements these nodes in Java, and I'm working on the C# version.

You will notice that the patch adds a "--specialize" CLI switch.  Without this switch, Grammatica behaves exactly as it did before the patch.  With the switch, it generates a directory of classes that extend "SpecializedProduction" (which in turn extends "Production").  Note also that the new classes won't actually be used unless the user passes the generated analyzer to the parser, along the lines of:

        Node head = new GeneratedParser(new FileReader(fileToBeParsed), new GeneratedAnalyzer() {}).parse();

or

        Node head = new GeneratedAnalyzer() {}.analyze(new GeneratedParser(new FileReader(fileToBeParsed)).parse());

This means that even when the classes were generated with the "--specialize" switch, the old behavior can still be obtained by leaving the analyzer out:

        Node head = new GeneratedParser(new FileReader(fileToBeParsed)).parse();

However, generating with the "--specialize" flag and then parsing without the analyzer in this way is untested.  The important point is that the correct analyzer has to be used to get the specialized nodes.

Also, you will notice that the "synthetic" productions are not "spliced" out of the parse tree when using "--specialize".  This is due to logistical issues with the well-named accessors: we found no efficient way to enumerate all the possible accessors of a production and still remove the synthetic productions.  We kept them on the assumption that most people using the "--specialize" flag will write grammars in a style that does not produce many synthetics.  For an example of a production rule that is troublesome when generating classes and accessors, consider:

        production1 = production2 (production3 | production4) ;

In this case, should accessors be written for Production3, Production4, or both?  The short answer is that there is no way to know at code generation time which alternative a given parse will contain.  Thus, the generated node types are simply production1, production2, and an interface "production1_S1" that is implemented by both production3 and production4.
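
As a rough illustration of that shape, here is a minimal, self-contained sketch.  It is not output from the patch: the class names, constructor, accessor names, and the describe() helper are all assumptions made up for the example, and the real generated classes extend "SpecializedProduction" rather than standing alone.

```java
// Hypothetical sketch only; names and shapes are illustrative,
// not copied from the code the patch generates.
class SyntheticInterfaceSketch {

    // Stands in for the synthetic production (production3 | production4).
    interface Production1_S1 {}

    static class Production2 {}
    static class Production3 implements Production1_S1 {}
    static class Production4 implements Production1_S1 {}

    // production1 = production2 (production3 | production4) ;
    static class Production1 {
        private final Production2 first;
        private final Production1_S1 second;

        Production1(Production2 first, Production1_S1 second) {
            this.first = first;
            this.second = second;
        }

        // Accessor names are assumed for this example.
        Production2 getProduction2() { return first; }

        // Only the interface type is known when the class is generated.
        Production1_S1 getProduction1_S1() { return second; }
    }

    // The concrete alternative is discovered at run time with instanceof.
    static String describe(Production1 node) {
        Production1_S1 alt = node.getProduction1_S1();
        if (alt instanceof Production3) {
            return "production3";
        }
        if (alt instanceof Production4) {
            return "production4";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        Production1 node = new Production1(new Production2(), new Production4());
        System.out.println(describe(node)); // prints "production4"
    }
}
```

The accessor can only promise the interface type, so the caller decides which alternative was matched.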

Also note that the generated classes have a prefix of "i_", "a_", or "t_".  The "t_" classes are tokens; they are generated only to enable language features like "instanceof".  The "i_" classes are interfaces: node types that have no children themselves.  In the last example, "production1_S1" would get an "i_" prefix, as would production5 in the following example:

        production5 = production6 | production7 | production8 ;

Finally, the "a_" classes are "alternatives": node classes that correspond to rules that have children.  These have well-named generated accessors that are much easier to use than indexing into the children[] object directly.
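
To make the difference concrete, here is a small sketch that contrasts a named accessor with positional child access.  Everything in it is assumed for illustration: the Node stand-in is not Grammatica's Node class, and the rule "assignment = IDENTIFIER NUMBER ;" and the accessor names are invented, so the real generated code will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; this is not the code the patch generates.
class AccessorSketch {

    // Minimal stand-in for a generic parse tree node.
    static class Node {
        private final String name;
        private final List<Node> children = new ArrayList<>();

        Node(String name) { this.name = name; }

        void addChild(Node child) { children.add(child); }
        Node getChildAt(int index) { return children.get(index); }
        String getName() { return name; }
    }

    // "t_" style: token classes exist mainly so that instanceof works.
    static class t_IDENTIFIER extends Node {
        final String image;
        t_IDENTIFIER(String image) { super("IDENTIFIER"); this.image = image; }
    }

    static class t_NUMBER extends Node {
        final int value;
        t_NUMBER(int value) { super("NUMBER"); this.value = value; }
    }

    // "a_" style, for the invented rule: assignment = IDENTIFIER NUMBER ;
    static class a_assignment extends Node {
        a_assignment(t_IDENTIFIER id, t_NUMBER number) {
            super("assignment");
            addChild(id);
            addChild(number);
        }

        // Named, typed accessors hide the child positions and the casts.
        t_IDENTIFIER getIdentifier() { return (t_IDENTIFIER) getChildAt(0); }
        t_NUMBER getNumber() { return (t_NUMBER) getChildAt(1); }
    }

    public static void main(String[] args) {
        a_assignment node = new a_assignment(new t_IDENTIFIER("x"), new t_NUMBER(42));

        // Without accessors: the caller must know the index and cast.
        int viaIndex = ((t_NUMBER) node.getChildAt(1)).value;

        // With accessors: no index, no cast at the call site.
        int viaAccessor = node.getNumber().value;

        System.out.println(node.getIdentifier().image + " = " + viaAccessor); // prints "x = 42"
        System.out.println(viaIndex == viaAccessor); // prints "true"
    }
}
```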

Again, I'm currently working on the C# implementation.  As it is basically a copy-paste port of what has already been written for Java, it should not take much time.  I hope the new modifications are useful!

Sincerely,


Connor Prussin

Attachment: specialized_classes_java.patch
Description: Binary data

