[ff-cvs] fenfire/docs/pegboard/swamp_easier--benja peg.rst


From: Benja Fallenstein
Subject: [ff-cvs] fenfire/docs/pegboard/swamp_easier--benja peg.rst
Date: Sat, 27 Sep 2003 13:02:57 -0400

CVSROOT:        /cvsroot/fenfire
Module name:    fenfire
Branch:         
Changes by:     Benja Fallenstein <address@hidden>      03/09/27 13:02:57

Modified files:
        docs/pegboard/swamp_easier--benja: peg.rst 

Log message:
        address issues

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/fenfire/fenfire/docs/pegboard/swamp_easier--benja/peg.rst.diff?tr1=1.2&tr2=1.3&r1=text&r2=text

Patches:
Index: fenfire/docs/pegboard/swamp_easier--benja/peg.rst
diff -u fenfire/docs/pegboard/swamp_easier--benja/peg.rst:1.2 fenfire/docs/pegboard/swamp_easier--benja/peg.rst:1.3
--- fenfire/docs/pegboard/swamp_easier--benja/peg.rst:1.2       Mon Sep 22 02:04:43 2003
+++ fenfire/docs/pegboard/swamp_easier--benja/peg.rst   Sat Sep 27 13:02:57 2003
@@ -26,8 +26,54 @@
 requested it.
 
 
-.. Issues
-   ======
+Issues
+======
+
+- Should we keep the current methods, and just add those
+  proposed in this PEG? There is a lot of code using the
+  current methods; we could just deprecate them for now.
+
+  RESOLVED: No. The point is to *simplify* the API;
+  adding more variants doesn't do that. 
+
+  Deprecating the current methods but not changing the code
+  that uses them adds to the confusion, rather than making
+  that code simpler.
+
+  (I have volunteered to change the existing code
+  if this PEG is accepted.)
+
+- What should happen in ``getObject()`` etc.
+  if there is more than one triple of the requested form?
+
+  RESOLVED: Do the same as currently: throw
+  ``NotUniqueException``. There are some problems
+  associated with that (see mailing list discussions),
+  but they are out of scope for this PEG.
+
+- What should be the name of the method returning
+  a ``TripleIter``? ``get()``, for symmetry with
+  the Collections API and the other functions;
+  ``find()``, similar to what we have now; or
+  ``query()`` for similarity with e.g. Aaron Swartz'
+  Python API for RDF?
+
+  RESOLVED: ``find()``. Tuomas explains:
+
+      I feel better about ``find()``, since it 
+
+      1. feels lighter than query
+      2. feels heavier than get, as it should - we don't *necessarily*
+         have all indices ready.
+
+- Should you be able to query just subjects, i.e. ignoring objects,
+  having them ``null`` in ``TripleIter`` and not getting duplicates?
+
+  RESOLVED: No. This is what ``getSubjects()`` etc. are for;
+  working with a ``Set`` is more useful and consistent in these cases
+  than working with a ``TripleIter`` (and having one of its fields
+  ``null``, i.e. not really iterating through *triples*, etc.).
+
 
 A flavor of the API
 ===================
@@ -36,8 +82,8 @@
 through a set of triples. I propose the following
 interface::
 
-    for(Triples t = graph.get(_, RDF.type, _); t.loop();) {
-        System.out.println(t.sub+" is instance of "+t.ob);
+    for(TripleIter i = graph.find(_, RDF.type, _); i.loop();) {
+        System.out.println(i.subj+" is instance of "+i.obj);
     }
 
 I.e., have our own iterator-like thing, which iterates
@@ -59,9 +105,9 @@
 when efficiency is at a premium. (Then again, when I print
 to the console inside the loop, efficiency isn't at a
 premium anyway... but whatever...) The *fast* version
-would look like this::
+would look like this [#speed]_::
 
-    for(Triples t = graph.get_A1A(RDF.type); t.loop()) {
+    for(TripleIter t = graph.find_X1X(RDF.type); t.loop();) {
         System.out.println(t.sub+" is instance of "+t.ob);     
     }
 
@@ -70,7 +116,7 @@
 
 In Jython, the loop would look like this::
 
-    t = graph.get(_, RDF.type, _)
+    t = graph.find(_, RDF.type, _)
 
     while t.loop():
         print "<%s> is instance of <%s>" % (t.sub, t.ob)
@@ -84,7 +130,7 @@
 We'll make it a convention that classes using the API
 have this at the top::
 
-    static final _ = null;
+    static final Object _ = null;
 
 You don't have to have this, but it makes things easier to read.
 
@@ -92,8 +138,8 @@
 ``ConstGraph``
 --------------
 
-``ConstGraph`` shall have the following API
-for getting triples::
+The current methods for finding triples shall be removed
+from ``ConstGraph`` and be replaced by the following API::
 
     /** Get an iterator through all triples in the graph
      *  matching a certain pattern.
@@ -102,11 +148,11 @@
      *  If any of the parameters is <code>null</code>,
      *  any node will match it.
      */
-    Triples get(Object subject, Object predicate, Object object);
+    TripleIter find(Object subject, Object predicate, Object object);
 
     // Versions that don't allow wildcards (``null``)
-    Triples get_AA1(Object predicate, Object object);
-    Triples get_1A1(Object subject, Object object);
+    TripleIter find_XX1(Object predicate, Object object);
+    TripleIter find_1X1(Object subject, Object object);
     ...
 
     /** Get the subject of the triple matching a certain pattern.
@@ -116,10 +162,13 @@
      *  any node will match it.
      *  @returns The subject of the triple, if there is one,
      *           or <code>null</code> if there is no such triple.
+     *  @throws  NotUniqueException if there is more than one
+     *           matching triple in the graph.
      */
-    Object getSubject(Object subject, Object predicate, Object object);
+    Object getSubject(Object subject, Object predicate, Object object)
+        throws NotUniqueException;
 
-    Object getSubject_A1A(Object predicate);
+    Object getSubject_X1X(Object predicate) throws NotUniqueException;
     ...
 
 Note: The reason for having ``subject`` as a parameter
@@ -135,45 +184,57 @@
      *  If any of the parameters is <code>null</code>,
      *  any node will match it.
      *  <p>
-     *  The set is immutable; it is <em>not</em> backed
-     *  by the graph (i.e., changing the graph does not
-     *  change the set.)
+     *  The set is backed by the graph (i.e., changing the graph
+     *  changes the set, e.g. if the last triple with a given
+     *  subject is removed from the graph, that subject
+     *  disappears from the set). The set is <em>not</em> modifiable
+     *  (e.g. the <code>add()</code> and <code>remove()</code> methods 
+     *  throw <code>UnsupportedOperationException</code>).
      */
     Set getSubjects(Object subject, Object predicate, Object object);
 
-(Backing is harder to program and I don't see the pay-off,
-since the ``getXXXs`` functions won't be used that often.)
+Backing is generally used in the Collections API, and allows
+for lighter implementations of the method. For example,
+when using ``new TreeSet(graph.getSubjects(_, _, _))`` to get
+a *sorted* set of all subjects in a graph, it would be quite
+wasteful if ``getSubjects()`` created a ``HashSet`` only to have
+it discarded after being used in the constructor of ``TreeSet``.
 
-    Set getSubjects_AA1(Object object);
+    Set getSubjects_XX1(Object object);
     ...
 
     // getObject(), getObjects() similarly
-    // getPredicate(), getPredicates() similarly
+    // getPredicates() similarly
 
-``getPredicate()`` is essentially useless, but we'll have it
-for symmetry. ``getPredicates()`` is useful, mostly for
+``getPredicate()`` is essentially useless, so we don't
+have it. This is symmetric with not having ``setPredicate()``,
+below. (If you need something to the same effect,
+you can use ``find()`` manually.)
+
+``getPredicates()`` is useful, mostly for
 getting *all* predicates used in a graph.
 
-Note that we don't have ``X`` in the function variants
-any more, just ``1`` and ``A``, with ``A`` being equivalent
+Note that we don't have ``A`` in the function variants
+any more, just ``1`` and ``X``, with ``X`` being equivalent
 to passing ``null`` in that position to the generic method.
 
-(E.g., ``getSubjects_AAA()`` is equivalent to
+(E.g., ``getSubjects_XXX()`` is equivalent to
 ``getSubjects(_, _, _)``, returning the set of all subjects
 in the graph.)
 
 
-``Triples``
------------
+``TripleIter``
+--------------
 
-For the API of the iterator-like object, ``Triples``,
+For the API of the iterator-like object, ``TripleIter``,
 see ``swamp_easier_iteration--benja``.
 
 
 ``Graph``
 ---------
 
-For changing graphs, the following API shall be used::
+The current methods for adding, changing and removing triples
+shall be removed from ``Graph`` and replaced by::
 
     /** Add a triple to this graph. */
     void add(Object subject, Object predicate, Object object);
@@ -186,8 +247,8 @@
      */
     void remove(Object subject, Object predicate, Object object);
 
-    void remove_A1A(Object predicate);
-    void remove_1AA(Object subject);
+    void remove_X1X(Object predicate);
+    void remove_1XX(Object subject);
     ...
 
     /** Replace all triples with the given predicate and object
@@ -219,6 +280,33 @@
 I believe this API will be substantially simpler to use 
 than the one we have at the moment, and not lose
 anything w.r.t. speed. In fact, it may speed things up
-in the future, because we can cache the ``Triples`` objects.
+in the future, because we can cache the ``TripleIter`` objects.
+
+\- Benja
+
 
-\- Benja
\ No newline at end of file
+.. [#speed] The speed difference between ``find(_, RDF.type, _)``
+   and ``find_X1X(RDF.type)`` is that ``find()`` has to check
+   for ``null`` in each of the arguments (that's three ``jnz``
+   instructions) and do one method call. (If we can get the compiler
+   to inline the ``find_XXX()`` variants, the method call goes away.)
+   This may actually be fine even in an inner loop. (The
+   hashtable lookups inside the loop will probably not be as cheap!)
+
+   One might think that all fields of ``TripleIter``
+   (``subj``, ``pred``, ``obj``) need to be fetched for each
+   iteration, but that's actually not true: Only those that are
+   different from the previous iteration need to be fetched.
+   (The implementation of the iterator can easily know
+   which those are.)
+
+   The only situation where this makes a speed difference
+   is something like::
+
+       for(TripleIter i = graph.find(_, RDF.type, _); i.loop();) {
+           System.out.println("Has an rdf:type: "+i.subj);
+       }
+
+   where fetching the ``obj`` each time is superfluous.
+   This situation is not expected to be frequent enough
+   to be a problem.
\ No newline at end of file
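
The iteration pattern proposed in the patch can be sketched as follows. This is a toy, list-backed illustration under stated assumptions, not the Fenfire implementation: ``ToyGraph``, ``FindSketch`` and the constant ``ANY`` are hypothetical names, and the ``TripleIter`` shown here only mimics the ``loop()`` method and the ``subj``/``pred``/``obj`` fields mentioned in the PEG. ``ANY`` stands in for ``_``, because ``_`` is no longer allowed as an identifier since Java 9.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical cursor over matching triples; field names follow the PEG. */
class TripleIter {
    Object subj, pred, obj;
    private final List<Object[]> matches;
    private int next = 0;

    TripleIter(List<Object[]> matches) { this.matches = matches; }

    /** Advance to the next triple; returns false once all matches are consumed. */
    boolean loop() {
        if (next >= matches.size()) return false;
        Object[] t = matches.get(next++);
        subj = t[0];
        pred = t[1];
        obj = t[2];
        return true;
    }
}

/** Toy graph that scans a list; a real implementation would use indices. */
class ToyGraph {
    static final Object ANY = null; // wildcard, plays the role of `_` in the PEG

    private final List<Object[]> triples = new ArrayList<>();

    void add(Object s, Object p, Object o) {
        triples.add(new Object[] {s, p, o});
    }

    /** Generic lookup: a null argument matches any node in that position. */
    TripleIter find(Object s, Object p, Object o) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] t : triples) {
            if ((s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2]))) {
                out.add(t);
            }
        }
        return new TripleIter(out);
    }

    /** Variant with only the predicate fixed; here it simply delegates. */
    TripleIter find_X1X(Object p) {
        return find(null, p, null);
    }
}

class FindSketch {
    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph();
        graph.add("ex:rex", "rdf:type", "ex:Dog");
        graph.add("ex:rex", "ex:name", "Rex");
        graph.add("ex:tom", "rdf:type", "ex:Cat");

        for (TripleIter i = graph.find(ToyGraph.ANY, "rdf:type", ToyGraph.ANY); i.loop();) {
            System.out.println(i.subj + " is instance of " + i.obj);
        }
    }
}
```

In this sketch ``find_X1X()`` delegates to ``find()``, so it shows only the calling convention, not the speed benefit discussed in the footnote.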

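
The resolved behaviour of ``getObject()`` (return ``null`` when nothing matches, throw ``NotUniqueException`` when more than one triple matches) might look like the sketch below. ``UniqueLookupGraph`` and ``UniqueLookupSketch`` are hypothetical, the exception class is a stand-in declared only so the example compiles, and the sketch assumes the same triple is never added twice.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the exception named in the PEG; declared here only for the sketch. */
class NotUniqueException extends Exception {
    NotUniqueException(String message) { super(message); }
}

/** Toy graph; assumes add() is never called twice with the same triple. */
class UniqueLookupGraph {
    private final List<Object[]> triples = new ArrayList<>();

    void add(Object s, Object p, Object o) {
        triples.add(new Object[] {s, p, o});
    }

    /** A null argument is a wildcard; returns null if no triple matches. */
    Object getObject(Object s, Object p, Object o) throws NotUniqueException {
        Object found = null;
        boolean seen = false;
        for (Object[] t : triples) {
            boolean match = (s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2]));
            if (!match) continue;
            if (seen) {
                throw new NotUniqueException("more than one triple matches the pattern");
            }
            found = t[2];
            seen = true;
        }
        return found;
    }
}

class UniqueLookupSketch {
    public static void main(String[] args) {
        UniqueLookupGraph graph = new UniqueLookupGraph();
        graph.add("ex:a", "ex:name", "Alice");
        graph.add("ex:b", "ex:name", "Bob");
        graph.add("ex:b", "ex:name", "Robert");

        try {
            System.out.println(graph.getObject("ex:a", "ex:name", null));
            System.out.println(graph.getObject("ex:b", "ex:name", null));
        } catch (NotUniqueException e) {
            System.out.println("not unique: " + e.getMessage());
        }
    }
}
```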

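
The "backed but not modifiable" contract that the patch specifies for ``getSubjects()`` can be illustrated with a read-only view over an index. This is a sketch under assumptions: ``SubjectIndexGraph`` and ``BackedSetSketch`` are hypothetical, only the all-wildcard case ``getSubjects_XXX()`` is covered, and keeping a subject-keyed map is just one possible way to implement it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy graph indexed by subject, so the subject set is simply the key set. */
class SubjectIndexGraph {
    // subject -> list of {predicate, object} pairs
    private final Map<Object, List<Object[]>> bySubject = new HashMap<>();

    void add(Object s, Object p, Object o) {
        bySubject.computeIfAbsent(s, k -> new ArrayList<>()).add(new Object[] {p, o});
    }

    void remove(Object s, Object p, Object o) {
        List<Object[]> pairs = bySubject.get(s);
        if (pairs == null) return;
        pairs.removeIf(t -> t[0].equals(p) && t[1].equals(o));
        // Once the last triple of a subject is gone, the subject leaves the view.
        if (pairs.isEmpty()) bySubject.remove(s);
    }

    /** Live, read-only view: no copy is made, and add()/remove() on it throw. */
    Set<Object> getSubjects_XXX() {
        return Collections.unmodifiableSet(bySubject.keySet());
    }
}

class BackedSetSketch {
    public static void main(String[] args) {
        SubjectIndexGraph graph = new SubjectIndexGraph();
        graph.add("ex:b", "ex:p", "x");
        graph.add("ex:a", "ex:p", "y");

        Set<Object> subjects = graph.getSubjects_XXX();
        // A sorted copy is built straight from the view, with no intermediate HashSet.
        System.out.println(new TreeSet<>(subjects));

        graph.remove("ex:a", "ex:p", "y");
        // The view reflects the removal without being fetched again.
        System.out.println(subjects.contains("ex:a"));
    }
}
```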

