gzz-dev

[Gzz] PEG: swamp rdf api, now current


From: Tuomas Lukka
Subject: [Gzz] PEG: swamp rdf api, now current
Date: Mon, 7 Apr 2003 13:54:17 +0300
User-agent: Mutt/1.4.1i

Please comment ASAP.

=============================================================
PEG swamp_rdf_api--tjl: 
=============================================================

:Author:   Tuomas J. Lukka
:Last-Modified: $Date: 2003/04/07 10:53:54 $
:Revision: $Revision: 1.4 $
:Status:   Current

This document outlines the main issues with the Jena API
currently in use and proposes a lightweight API of our own
to replace it.

Issues
======

- Are there any APIs out there that already support our needs?

- Do we want implicit or explicit observers?
  Gzz used explicit observing because it had an object per cell.
  However, the tradeoffs are different here.

  The benefits of implicit observing are ease of use and the purity of the
  functional approach: in the explicit approach, forgetting a single obs
  somewhere will make the code buggy in a potentially dangerous way.
  However, implicit observing requires wrapper objects for all parts of
  the API.

    RESOLVED: Implicit observing, since we *can* wrap all parts
    of the API without too much cost. If the API is non-object-oriented,
    in the sense that the individual nodes and statements are not tied 
    to any model/space, we only need to wrap O(1) objects.

    Additionally, we can cheaply make "derived" models, 
    e.g. when implementing the model_versions--tjl stuff.

- How should resources and properties be represented?

    RESOLVED: As hashable Objects with '==' comparison semantics.
    The resources are mapped to and from Strings through a 
    global resource name mapper / compressor.

    We need to save memory: the URN-5 names, for example, are too many
    and too long. The compressor would return either
    strings with a prefix markup (i.e. a funny character and an index
    to a prefix table) or an object with an integer (for the number part)
    and an interned string for the shared part.

    For properties and other such resources, interned strings should be
    sufficient.

- How should literals be represented?

    RESOLVED: As immutable Literal objects, with several types of accessors.

- What about literal typing? How do we support enfilades?

    RESOLVED: For now, just get the raw string.

- Do we want explicit Statement objects a la Jena?
    
    RESOLVED: No, they force a certain style of implementation which may
    not be the most efficient. We need to minimize the number of Java
    objects created.
    While an object for every resource and literal is just about unavoidable,
    an object for each statement is not.

- What would be the right characters for the search methods?    

    RESOLVED: 1 for a given object, X for an unknown object. 
    They are visually clearly separate, and X for the unknown is mnemonic.

- Should bags, alts &c be supported explicitly in the API?

    RESOLVED: Not yet. Many issues related e.g. to versioning.

- That's a LOT of methods for all combinations. Couldn't we use wildcards
  or something?

    RESOLVED: No. Checking for wildcards at run time would be needless
    overhead. Remember, this code is *the* inner loop.

    Quite likely code generation will be used.
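
Since every combination of given and unknown positions gets its own
method, the names can be enumerated mechanically. The following is a
minimal sketch of such an enumeration, not the generator actually planned;
the class name ``MethodNameGen`` and its ``names`` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: enumerates method names such as find1_11X. */
class MethodNameGen {
    /**
     * Returns all names for the given prefix whose three-position pattern
     * contains exactly xCount unknowns (X); the other positions are
     * givens (1).
     */
    static List<String> names(String prefix, int xCount) {
        List<String> out = new ArrayList<>();
        // Each bit of mask says whether one position is an unknown.
        for (int mask = 0; mask < 8; mask++) {
            if (Integer.bitCount(mask) != xCount) continue;
            StringBuilder sb = new StringBuilder(prefix).append('_');
            for (int pos = 0; pos < 3; pos++) {
                // Bit 2 is the subject, bit 1 the predicate, bit 0 the object.
                boolean unknown = (mask & (4 >> pos)) != 0;
                sb.append(unknown ? 'X' : '1');
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

For example, ``MethodNameGen.names("find1", 1)`` yields find1_11X,
find1_1X1 and find1_X11, in that order.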


Problems with Jena
==================

The most important problem with Jena appears to be that it does not
support observation.

With Gzz, we were moving towards a functional style of programming
where we could easily cache the object given by f(node) since
the node could be observed.

Jena makes this impossible because there are no change listeners.
Wrapping or extending Jena to something that would have them would
be a major task which would result in a more complicated API.
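
To illustrate the functional style described above, here is a minimal
sketch of a cache for f(node) that stays valid only as long as the node
is unchanged. Everything in it (``ObservableNodes``, ``CachedFunction``,
the ``changed`` call) is hypothetical and merely stands in for whatever
observation mechanism a model provides; it is neither Gzz nor Jena code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical change source: listeners are told which node changed. */
class ObservableNodes {
    interface Listener { void nodeChanged(Object node); }

    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener l) { listeners.add(l); }

    /** Called by the (hypothetical) model whenever a node's triples change. */
    void changed(Object node) {
        for (Listener l : listeners) l.nodeChanged(node);
    }
}

/** Caches f(node) and drops the cached value when the node changes. */
class CachedFunction<R> {
    private final Map<Object, R> cache = new HashMap<>();
    private final Function<Object, R> f;
    int evaluations = 0; // exposed only so the sketch can be checked

    CachedFunction(ObservableNodes source, Function<Object, R> f) {
        this.f = f;
        // Invalidate the cached value of any node reported as changed.
        source.addListener(node -> cache.remove(node));
    }

    R get(Object node) {
        if (!cache.containsKey(node)) {
            evaluations++;
            cache.put(node, f.apply(node));
        }
        return cache.get(node);
    }
}
```

Without change notification, the only safe options are to recompute
f(node) on every call or to risk serving stale values.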

Another issue I (personally) have with Jena is that it tries
to be too object-oriented: I first thought (and liked that thought!)
that Statements and nodes were independent of the model. However,
this was not the case.

Efficiency is also important: in order for Fenfire to work properly,
*ALL* searches within memory must be O(1). Jena makes no guarantees,
since its goal is to support different implementations of Model.
For us, the different implementations do not matter so much as raw
efficiency of the memory-based implementation. This is quite different
from most RDF uses, since the usual scenario is that there is not too much
RDF (at least so far).

Design
======

All classes in this API shall be in org.fenfire.swamp.

The resource mapper
-------------------

The global resource mapper (it has to be global, since resources are
model-agnostic) is simple. Its name is kept short because it is so widely used.

    public class RMap {
        public static Object toModel(String res);
        public static Object toModel(String res, int offs, int len);
        public static Object toModel(char[] res, int offs, int len);

        public static String toString(Object res);

        /** Append the string version of the resource to the given buffer.
         * This avoids creating too many String objects
         * when serializing a space.
         */
        public static void appendToString(Object res, StringBuffer buf);
    }

The appendToString method solves one problem we had in Gzz:
when saving, too many Strings were created for object names. Similarly,
overloading the toModel method with different parameter types allows the
most efficient creation of resources without conversions.

We *may* want to make RMap internally redirectable in the future to allow
alternate implementations; the static interface will not change.
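
A minimal sketch of how the static interface could behave, assuming the
simplest possible strategy: one canonical object per name, so that
resources can be compared with '=='. The class is called ``RMapSketch``
to make clear that it is only an illustration; the prefix/number
compression discussed under Issues is not attempted, and the table is
never pruned.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in for RMap: one canonical String per resource name. */
class RMapSketch {
    private static final Map<String, String> canon = new HashMap<>();

    static synchronized Object toModel(String res) {
        String c = canon.get(res);
        if (c == null) {
            canon.put(res, res);
            c = res;
        }
        return c;
    }

    static Object toModel(String res, int offs, int len) {
        return toModel(res.substring(offs, offs + len));
    }

    static Object toModel(char[] res, int offs, int len) {
        return toModel(new String(res, offs, len));
    }

    static String toString(Object res) {
        return (String) res;
    }

    /** Appends the name to buf; no new String is created here. */
    static void appendToString(Object res, StringBuffer buf) {
        buf.append((String) res);
    }
}
```

In this naive version the overloads still build a temporary String for
the lookup; a real implementation would hash the character range
directly to get the benefit described above.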

The model object
----------------

The ShortRDF class shows what a mess the query functions can easily become.
To avoid this, we'll drop the semantics (subject,predicate,object) for now
and name all methods according to a general scheme.

    public interface ConstFirstOrderModel {
        public Object find1_11X(Object subject, Object predicate);
        public Object find1_X11(Object predicate, Object object);
        ...
        public Iterator findN_11X(Object subject, Object predicate);
        ...
    }

    public interface FirstOrderModel extends ConstFirstOrderModel {
        public void set1_11X(Object subject, Object predicate, Object object);
        public void set1_X11(Object subject, Object predicate, Object object);
        ...

        public void rm_1XX(Object subject);
        public void rm_11X(Object subject, Object predicate);
        public void rm_X11(Object predicate, Object object);
        ...

        /** Add the given triple to the model.
         */
        public void add(Object subject, Object predicate, Object object);
    }

The method names are built as follows.
First comes the function type:

    find1
        Find a *single* triple fitting the given parts and return the part
        marked X. If there is none, null is returned. If there is more than
        one, an exception is thrown.

        Only a single X may be used.
      
    findN
        Return an iterator over the triples fitting the given parts,
        yielding the part marked X for each. Even if there are none, the
        iterator is created.
        Only a single X may be used.

    set1
        Remove all other triples matching the given parts and replace them
        with the given new one. For example, if triples (a,b,c), (a,b,d)
        and (a,e,d) are in the model,
        then after ::

            set1_11X(a, b, g)

        the model will have the triples (a,b,g) and (a,e,d).
        Only a single X may be used (restriction may be lifted in the future).
    
    rm
        Remove the matching triples from the model. Any number of Xs may
        be used.

and, after an underscore, the parameter scheme:

    1
        Given
    X
        Requested / set
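
The semantics above can be sketched for the 11X family with a nested
hash index, which also keeps each lookup at expected O(1). This is a
minimal illustration under stated assumptions, not the proposed
implementation: ``ModelSketch`` and its nested ``NotUnique`` error are
hypothetical names, only the subject-then-predicate index is built, and
hashing uses equals() rather than '==' (equivalent as long as resources
are canonical objects).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Illustrative in-memory model indexed by subject, then predicate. */
class ModelSketch {
    /** Stand-in for the uniqueness error described in this document. */
    static class NotUnique extends Error {}

    private final Map<Object, Map<Object, Set<Object>>> index = new HashMap<>();

    /** Objects stored for (subject, predicate); an empty set if none. */
    private Set<Object> objects(Object subject, Object predicate) {
        Map<Object, Set<Object>> byPred = index.get(subject);
        if (byPred == null) return Collections.<Object>emptySet();
        Set<Object> objs = byPred.get(predicate);
        return objs == null ? Collections.<Object>emptySet() : objs;
    }

    public void add(Object subject, Object predicate, Object object) {
        index.computeIfAbsent(subject, k -> new HashMap<>())
             .computeIfAbsent(predicate, k -> new HashSet<>())
             .add(object);
    }

    /** Null if nothing matches; throws if more than one triple matches. */
    public Object find1_11X(Object subject, Object predicate) {
        Set<Object> objs = objects(subject, predicate);
        if (objs.size() > 1) throw new NotUnique();
        return objs.isEmpty() ? null : objs.iterator().next();
    }

    /** Always returns an iterator, possibly an empty one. */
    public Iterator<Object> findN_11X(Object subject, Object predicate) {
        return objects(subject, predicate).iterator();
    }

    public void rm_11X(Object subject, Object predicate) {
        Map<Object, Set<Object>> byPred = index.get(subject);
        if (byPred != null) byPred.remove(predicate);
    }

    /** Replaces every (subject, predicate, *) triple with the given one. */
    public void set1_11X(Object subject, Object predicate, Object object) {
        rm_11X(subject, predicate);
        add(subject, predicate, object);
    }
}
```

Running the set1 example from above against this sketch: after adding
(a,b,c), (a,b,d) and (a,e,d), a call to find1_11X(a, b) throws, and
after set1_11X(a, b, g) it returns g while find1_11X(a, e) still
returns d.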

The uniqueness exception
------------------------

For debugging and possibly cool code hacks, the following error gives
enough information to understand what was not unique.

    public class NotUniqueError extends Error {
        public final Object subject;
        public final Object predicate;
        public final Object object;
    }
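
The final fields have to be set somewhere. A minimal sketch, assuming a
three-argument constructor (not specified above) and assuming that the
position being searched for is left null; ``NotUniqueDemo`` and its
``describe`` method are hypothetical and only show how a handler might
use the fields.

```java
/** Sketch of the error with an assumed constructor. */
class NotUniqueError extends Error {
    public final Object subject;
    public final Object predicate;
    public final Object object;

    NotUniqueError(Object subject, Object predicate, Object object) {
        super("not unique: (" + subject + ", " + predicate + ", " + object + ")");
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

class NotUniqueDemo {
    /** Renders the failed query, showing a null field as X (an assumption). */
    static String describe(NotUniqueError e) {
        return "multiple matches for ("
            + (e.subject == null ? "X" : e.subject) + ", "
            + (e.predicate == null ? "X" : e.predicate) + ", "
            + (e.object == null ? "X" : e.object) + ")";
    }
}
```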






