fenfire-dev

Re: [Fenfire-dev] PEG swamp_easier--benja: An easier API for Swamp


From: Benja Fallenstein
Subject: Re: [Fenfire-dev] PEG swamp_easier--benja: An easier API for Swamp
Date: Mon, 22 Sep 2003 12:54:32 +0300
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4

Tuomas Lukka wrote:
On Mon, Sep 22, 2003 at 05:09:44AM +0300, Benja Fallenstein wrote:
   for(Triples t = graph.get(_, RDF.type, _); t.loop();) {
       System.out.println(t.sub+" is instance of "+t.ob);
   }

This is nice.

ISSUE: Name for that call: get(...)? We have find() so far.

Hm. I've always advocated get ;-) ;-)

I've done some googling-- e.g. Aaron Swartz' Python API uses query(...) (with similar semantics). The thing I don't like about find() and query() is mostly psychological: they seem to indicate a little effort, whereas get(...) sounds like something that's essentially free. But that's only a mild objection to find(), not a strong one.

What do you think?

ISSUE: Name for the iterator-like thing that goes through triples.
"Triples" says it contains several triples while it has only one at a time. "TripleIterator", "TripleIter", ...?

I wanted it to be short, of course, but I guess you have a point. ``TripleIter`` should be fine... ::

    for(TripleIter i = graph.get(_, RDF.type, _); i.loop();) {
        System.out.println(i.subj+" instance of "+i.obj);
    }

I still prefer ``Triples``, but I'm willing to settle for ``TripleIter``.

However, to be fair, my code isn't how it would look
when efficiency is at a premium. (Then again, when I print
to the console inside the loop, efficiency isn't at a
premium anyway... but whatever...) The *fast* version
would look like this::

Umm, you should note here that the efficiency difference is in the call,
not in the actual code, as get() can be just a set of if clauses

True.

and actually I think that hotspot might be able to handle it.

I earlier suggested that and you were suspicious of it ;-) I do agree-- it's essentially three ``jnz``s per ``get()``, very cheap. I can say this in the PEG.
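To make the "set of if clauses" concrete, here is a rough sketch of how the general call could dispatch to the specialized ones. Everything in it is an assumption for illustration: the class ``DispatchGraph``, the ``find_...`` names (one of the spellings floated in this thread), the ``lastRoute`` field, and the linear ``scan()`` that stands in for whatever indexes a real implementation would use. The wildcard is called ``ANY`` rather than ``_`` because current Java versions reserve ``_``.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory triple store; not the actual Swamp classes. */
class DispatchGraph {
    /** Wildcard marker. The thread calls it `_`, which current Java reserves. */
    static final Object ANY = null;

    private final List<Object[]> triples = new ArrayList<>();

    /** Records which specialized method ran last, for illustration only. */
    String lastRoute = "";

    void add(Object sub, Object pred, Object ob) {
        triples.add(new Object[] {sub, pred, ob});
    }

    /** General form: three null checks select a specialized method. */
    List<Object[]> get(Object sub, Object pred, Object ob) {
        if (sub == ANY) {
            if (pred == ANY) return ob == ANY ? find_XXX() : find_XX1(ob);
            return ob == ANY ? find_X1X(pred) : find_X11(pred, ob);
        }
        if (pred == ANY) return ob == ANY ? find_1XX(sub) : find_1X1(sub, ob);
        return ob == ANY ? find_11X(sub, pred) : find_111(sub, pred, ob);
    }

    // Specialized entry points; a caller who knows the pattern can skip get().
    List<Object[]> find_XXX() { return scan("XXX", ANY, ANY, ANY); }
    List<Object[]> find_XX1(Object ob) { return scan("XX1", ANY, ANY, ob); }
    List<Object[]> find_X1X(Object pred) { return scan("X1X", ANY, pred, ANY); }
    List<Object[]> find_X11(Object pred, Object ob) { return scan("X11", ANY, pred, ob); }
    List<Object[]> find_1XX(Object sub) { return scan("1XX", sub, ANY, ANY); }
    List<Object[]> find_1X1(Object sub, Object ob) { return scan("1X1", sub, ANY, ob); }
    List<Object[]> find_11X(Object sub, Object pred) { return scan("11X", sub, pred, ANY); }
    List<Object[]> find_111(Object sub, Object pred, Object ob) { return scan("111", sub, pred, ob); }

    /** Placeholder lookup: a linear scan instead of per-pattern indexes. */
    private List<Object[]> scan(String route, Object sub, Object pred, Object ob) {
        lastRoute = route;
        List<Object[]> out = new ArrayList<>();
        for (Object[] t : triples) {
            if ((sub == ANY || sub.equals(t[0]))
                    && (pred == ANY || pred.equals(t[1]))
                    && (ob == ANY || ob.equals(t[2]))) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Every path through ``get()`` makes the same three comparisons against the wildcard, so the extra cost over calling ``find_X1X(pred)`` directly is small and constant.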

However, there's another performance difference with the Triples objects
which you haven't mentioned: *all* members need to be fetched each
time.

Not exactly true: only the members that change need to be fetched. E.g., if you have ::

    get(_, RDF.type, _)

only the subject and object need to be loaded each time.

And most of the time if you do such a query you would want to use both of them. So it would only cost extra if you do such a query, but do *not* use both subject and object.

Still, can note it in the PEG. -- Or maybe we *should* have::

    Object subj(), pred(), obj();

These are also nicer because they can give error messages when ``next()`` hasn't been called yet. Opinions?
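A minimal sketch of what the accessor variant might look like, to show where the error check would live. The class name ``CheckedTripleIter``, the backing list of ``Object[]`` rows, and the choice of ``IllegalStateException`` are all assumptions for illustration; only ``subj()``, ``pred()``, ``obj()``, ``hasNext()`` and ``next()`` come from this thread.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical iterator-like cursor with accessor methods instead of fields. */
class CheckedTripleIter {
    private final Iterator<Object[]> source;
    private Object[] current; // stays null until next() has been called

    CheckedTripleIter(List<Object[]> matches) {
        this.source = matches.iterator();
    }

    boolean hasNext() {
        return source.hasNext();
    }

    /** Moves to the next matching triple. */
    void next() {
        current = source.next();
    }

    // Each component is read only when the caller asks for it.
    Object subj() { return component(0); }
    Object pred() { return component(1); }
    Object obj()  { return component(2); }

    private Object component(int index) {
        if (current == null) {
            throw new IllegalStateException("next() has not been called yet");
        }
        return current[index];
    }
}
```

With public fields there is no place to put such a check: reading ``subj`` before the first ``next()`` would silently yield ``null``.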

   for(Triples t = graph.get_A1A(RDF.type); t.loop()) {
       System.out.println(t.sub+" is instance of "+t.ob);     
   }

Note: missing a semicolon.

ISSUE: Naming. I'd think find_X1X_Triples would make more sense here.

find...Triples: Any particular reason?

A vs X: Fine, we can use X instead of A everywhere if you prefer.

   static final _ = null;

static final **Object** _ = null; ?

Yes, of course.

   Object getSubject(Object subject, Object predicate, Object object);

   Object getSubject_A1A(Object predicate);
   ...

ISSUE: If there is more than one?

Clarified on IRC: The issue is what happens if there is more than one matching triple.

The current way is to throw NotUniqueException.

There's a problem with that: Basically always when a property has cardinality one, there can still be two nodes in the graph, e.g.::

    x:foo   ex:homeCountry   y:bar .
    x:foo   ex:homeCountry   z:baz .

because ``y:bar`` and ``z:baz`` may represent the same resource. (You cannot require global agreement on the one URI to be used for every particular thing in the world.)

So signalling an error isn't necessarily correct.

Jena returns just an arbitrary one of the matching triples in a similar situation; I'm leaning towards that.
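The two policies can be put side by side in a small sketch. ``NotUniqueException`` is the name used above; making it unchecked, the class ``SingleValueGraph``, and the method names ``getObjectStrict`` and ``getObjectAny`` are assumptions for illustration only. "Arbitrary" is implemented here as "first stored", which is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Name taken from the thread; making it unchecked is an assumption. */
class NotUniqueException extends RuntimeException {
    NotUniqueException(String message) {
        super(message);
    }
}

/** Hypothetical store used only to contrast the two policies. */
class SingleValueGraph {
    private final List<Object[]> triples = new ArrayList<>();

    void add(Object sub, Object pred, Object ob) {
        triples.add(new Object[] {sub, pred, ob});
    }

    /** Collects every object of triples matching (sub, pred, *). */
    private List<Object> objectsOf(Object sub, Object pred) {
        List<Object> out = new ArrayList<>();
        for (Object[] t : triples) {
            if (t[0].equals(sub) && t[1].equals(pred)) out.add(t[2]);
        }
        return out;
    }

    /** Strict policy: more than one match is reported as an error. */
    Object getObjectStrict(Object sub, Object pred) {
        List<Object> found = objectsOf(sub, pred);
        if (found.size() > 1) {
            throw new NotUniqueException(found.size() + " objects for " + sub + " " + pred);
        }
        return found.isEmpty() ? null : found.get(0);
    }

    /** Lenient policy: any one match will do (here, the first stored). */
    Object getObjectAny(Object sub, Object pred) {
        List<Object> found = objectsOf(sub, pred);
        return found.isEmpty() ? null : found.get(0);
    }
}
```

With the two ``ex:homeCountry`` triples from the example, the strict call fails even though the data may be perfectly consistent, while the lenient call returns one of the two nodes.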

The iterator-like object, ``Triples``, shall have
the following API::

   Object sub, pred, ob;

Issue: Names. subj, pred, obj would be more consistent, i.e.
up to the *end* of the second consonant group.

Yes, but these are also impossible to pronounce... "SUB-djjjj"

Sub, pred, ob are the shortest abbreviations that have a chance to get understood, so they're consistent in a sense, too. ;-) I.e., "su" or "pre" would be misleading/not understood.

The purpose of ``loop()`` is to enable the common loop
pattern, ::

   for(Triples t = graph.get(...); t.loop();) {
       // ...
   }

which would otherwise have to be written as::

   Triples t;
   for(t = graph.get(...); t.hasNext(); t.next()) {
       // ...
   }
   t.free();

This should go into the javadoc.

Sure, but for the PEG I found it easier to read in the body, and the javadoc is in the PEG for clarification of the PEG, no?

The examples should go into the *class*'s javadoc actually, I think.
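For whichever javadoc this ends up in, a sketch of how ``loop()`` could fold the advance, the end-of-iteration test and the ``free()`` call into one method. The class name ``LoopingTriples``, the list-backed source, the ``isFreed()`` helper and the body of ``free()`` are assumptions; in particular, what ``free()`` really releases is not spelled out in this thread.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical iterator-like object with public fields and loop(). */
class LoopingTriples {
    Object sub, pred, ob;

    private final Iterator<Object[]> source;
    private boolean freed = false;

    LoopingTriples(List<Object[]> matches) {
        this.source = matches.iterator();
    }

    /**
     * Advances to the next triple and returns true, or frees this
     * object and returns false when no triples are left.
     */
    boolean loop() {
        if (!source.hasNext()) {
            free();
            return false;
        }
        Object[] t = source.next();
        sub = t[0];
        pred = t[1];
        ob = t[2];
        return true;
    }

    /** Assumed meaning: mark the object as released and drop references. */
    void free() {
        freed = true;
        sub = pred = ob = null;
    }

    boolean isFreed() {
        return freed;
    }

    /** The short loop pattern: no separate next() or free() call needed. */
    static List<String> describeAll(List<Object[]> matches) {
        List<String> lines = new ArrayList<>();
        for (LoopingTriples t = new LoopingTriples(matches); t.loop();) {
            lines.add(t.sub + " is instance of " + t.ob);
        }
        return lines;
    }
}
```

Because ``free()`` runs inside the final ``loop()`` call, the variable can stay scoped to the ``for`` statement. A loop that exits early with ``break`` would still have to call ``free()`` itself.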

We don't have ``setPredicate()`` because it is essentially useless
and potentially harmful-- someone using it almost certainly
intended to do something else.

You're not marking exactly what the **diff** to current practice
is here, and why.

Ok, will do.

- Benja




