dotgnu-general
Re: [DotGNU]Implement RDF in a Universal Data Structure


From: Seth Johnson
Subject: Re: [DotGNU]Implement RDF in a Universal Data Structure
Date: Sat, 29 Mar 2003 11:28:06 -0500

Peter Minten wrote:
> 
> Seth Johnson wrote:
> >
> > RDF benefits from the generalization that knowledge can be represented by
> > assertions of a common, universal form.  All assertions have a subject, a
> > predicate and an object.  This generalization enables the creation of tools
> > that draw inferences from knowledge represented in this common form.
> >
> > I want to take this to a higher level.  Instead of stopping at the assertion
> > structure of RDF, I propose that we implement a similarly universal
> > generalization about relations (among data entities).  This generalization
> > reflects the purpose of representing what the most fundamental universal
> > data structure is, which supports all the information access and retrieval
> > functions necessary for any application (and any web service).
> >
> > RDF represents relations as a series of statements.  It's really a form of
> > many-to-many key table.  We can put RDF into this universal data structure,
> > along with anything else we please, because putting things into the
> > universal data structure makes interoperability implicit and automatic.
> >
> > The universal data structure represents all relations as Use Types that are
> > related to Link Types, each of which may be particularized into specific
> > Uses and Links.  A particular Use of a certain Use Type represents the
> > parent record of a relation, and the particular Links of a certain Link Type
> > represent the children records related to that parent record:
> >
> >    Use Type: Shopping Cart
> >    Link Type: Products Selected
> >    Use: Seth's Cart
> >    Links: Soap, Shampoo, Milk, Butter, etc.
> >
> > (This is only a piece of the structure, but it represents the core
> > generalization that all the rest stems from)
> >
> > Once you have a universal data structure, you can define a fundamental
> > protocol that says everything you need to know about any application, and
> > you can store everything for all applications in one universal structure
> > that inherently lets all elements in any particular such type of relation,
> > be used freely in any other such type of relation.
> >
> > I refer to this fundamental relation as a "context."  It can also be
> > referred to as an atomic application.  A context is an extended version of
> > the traditional idea of relations among data entities, turning that concept
> > into the core of the idea of what an atomic universal application is
> > necessarily made up of and must be able to do.  More complex applications
> > are simply made by combining such atomic contexts.
> >
> > RDF can be stored in this data structure as follows:
> >
> >    Use Type: Subject
> >    Link Type: (Various predicates, like "has" "contains," etc.)
> >    Use: Whatever particular subject
> >    Links: Whatever particular "objects" asserted to relate in the link type
> > way to the particular subject.
> >
> >    Use Type: Subject
> >    Link Type: Has
> >    Use: Seth Johnson
> >    Links: arms, legs, a receding forehead
> >
> > What you can do with this is generalize about the universal functions that
> > must be built into such a representation of a universal, atomic
> > application.  This includes the query functions that the RDF area focuses
> > on.
> >
> > Build this into DotGNU.  Make a language that speaks in terms of these
> > abstractions.  I call the language CCL, or Context Control Language, and I
> > call the basic structure of a context "packet" or "message" CTP, or Context
> > Transfer Protocol.  CTP can either be defined as something immediately above
> > TCP and immediately below the application layer, in a binary way, or we
> > could define it as something correlative with HTTP, in a more textual way.
> >
> > There's more to it, but maybe this ramble will interest some of you . . .
> 
> (a good idea deserves a long answer)
> 
> Hmm, it's an interesting idea. The interesting thing here is that instead of RDF
> properties that are often translated to predicates you use links. On the
> technical level there is no real difference though. RDF properties can easily
> express all kinds of links in the world.


Sure; you can do RDF in CTP.  You can do anything at all in CTP.

CTP is based on a universal data structure that is geared toward giving you
all the functionality you need for any application automatically, simply by
declaring a context (a use type related to a link type).
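
As a rough illustration only, a context of this kind might be sketched in
Python as below.  The class name Context and its methods link() and triples()
are hypothetical names invented for this sketch; nothing here is a defined
part of CTP or CCL.  It assumes a context is one use type related to one link
type, with each particular use holding a list of particular links, and it
shows how the same records can be read back as RDF-style statements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Context:
    """Hypothetical sketch: one Use Type related to one Link Type."""
    use_type: str
    link_type: str
    # Each particular Use (parent record) maps to its Links (child records).
    uses: Dict[str, List[str]] = field(default_factory=dict)

    def link(self, use: str, item: str) -> None:
        """Attach one particular Link to one particular Use."""
        self.uses.setdefault(use, []).append(item)

    def triples(self) -> List[Tuple[str, str, str]]:
        """Read the context as (subject, predicate, object) statements."""
        return [(use, self.link_type, item)
                for use, items in self.uses.items()
                for item in items]


# The shopping cart example from the quoted message.
cart = Context("Shopping Cart", "Products Selected")
for product in ("Soap", "Shampoo", "Milk", "Butter"):
    cart.link("Seth's Cart", product)

# The RDF example: Use Type "Subject", Link Type "Has".
has = Context("Subject", "Has")
for part in ("arms", "legs", "a receding forehead"):
    has.link("Seth Johnson", part)
```

Calling has.triples() yields one statement per link, each with "Seth Johnson"
as subject and "Has" as predicate, which is the sense in which RDF can be
stored in the structure.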

A CTP context can be outlined and queried; it tracks dependencies, handles
distribution, receives values for attributes, and prompts you for categories.
It's an automatic application, on the data structure side.  The whole task
of developing becomes basically creating front ends for different contexts,
plus importing data into the system, where you'll leave it because it
handles everything for you.


> In the infosphere (the world of info, cyberspace is a subset of it) everything
> can be expressed using subject-link-object terminology. So everything can be
> expressed as RDF. Or more exactly everything can be expressed as a web. I think
> this is a very cool idea, especially since it allows us to store all possible
> information in a web.


RDF is knowledge-focused.  It is the product of a very fruitful insight.
But the idea of generalizing about universal functionality has been hampered
by the assumption that state and data models are arbitrarily complex.  RDF
proposes to model knowledge as universal assertions, so others can make
diverse applications out of that.  In CTP, there's really one universal
application, or collections of such universal applications.  You can write
special front ends to look at the information in any way you please, but
since everything is in the same data structure, you really can browse any
application with any other application's front end.


> 
> RDF is not sufficient for all purposes, however, due to problems with its
> representation: it's usually written down as XML or triples that lack
> flexibility.


Yes, it makes RDF assertions the universal entity.  The best notion so far,
short of CTP.


> A more serious problem is the need to name everything that you want
> to refer to with a unique URI. IMHO it should also be possible to refer to
> something by indirection, I mean to refer to the value of a property of a
> resource instead of directly to a resource. This will also serve to store more
> meaningful info than in RDF. It's often more interesting to know how to get to
> something (using a link path) than where it is.


I think that giving unique keys to pieces of information is a Good Thing.  I
think those who express misgivings about this aspect of RDF are only
expressing certain limitations in current thinking.

For instance, the "indirection" idea you mention here, which by structure I
take it you see as expressing a problem with the RDF generalization, really
expresses an inadequacy in the idea of resource.  If you don't think in
terms of getting and reading resources for subsequent parsing and processing
(in a way that shows a sort of document-centricity reflected by the design
of the web), but in terms of slicing and dicing information in various
contexts, then you appreciate the addressability.  But RDF doesn't
completely make the case the way CTP does.


> 
> With link path I mean a path starting at a certain resource that goes to the
> target resource. Link paths will often start at 'this' resource. Link paths
> allow flexibility, if you have something 10 resources on the path away from you
> and something changes at step 4 then you have a good chance that the resource
> you're hardlinked to is not the right one anymore, but that the resource the
> path refers to is.


This comes from the limitations of RDF.  You're having to describe the route
to information through a series of intermediary queries.  And you're doing
this just to work with information to support an esoteric application.  In
CTP, you link directly to the source of the information, and the
intermediary queries aren't really necessary.  You'd just store them insofar
as you need to refer to the inference chains.


> 
> Problem with link paths is of course that they tend to split at collections
> where multiple routes are possible, but I think that's a solvable problem.


Well, if the element has a universal key value, and you don't depend on
intermediate queries, that problem goes away.
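
To make the two positions concrete, here is a small Python sketch.  The store
layout, the key names (doc1, person1, box1 and so on) and the function
follow_link_path() are all hypothetical, made up for illustration; they are
not part of RDF, GNU.RDF or CTP.  It assumes every element has a universal
key and that a link path is resolved with one lookup per step.

```python
# Hypothetical store: every element is addressed by a universal key.
store = {
    "doc1": {"author": "person1"},
    "person1": {"lastname": "Minten", "mailbox": "box1"},
    "person2": {"lastname": "DuPont", "mailbox": "box2"},
    "box1": {},
    "box2": {},
}


def follow_link_path(store, start, properties):
    """Resolve a link path with one lookup per step (the intermediary queries)."""
    key = start
    for prop in properties:
        key = store[key][prop]
    return key


# A direct link stores the key of the target itself.
hard_link = "box1"
# A link path stores the route to the target instead.
path = ("doc1", ["author", "mailbox"])

before = follow_link_path(store, *path)

# Something changes at an intermediate step: doc1 gets a different author.
store["doc1"]["author"] = "person2"

after = follow_link_path(store, *path)
```

After the change the path resolves to a different mailbox, while hard_link
still names the original element by its key.  Which of the two is "the right
one" depends on whether you meant the element itself or whatever currently
sits at the end of the route.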


> 
> The moral: we need to develop a good link path protocol. Here is a start:
> 
> A link path is a dot path like in OO languages: it starts with a resource, and
> all following parts are properties. The resource name 'this' refers to the
> subject of the property whose object is the link path. An example:
> 'this.contact:author.foaf:lastname = "Minten"'


CTP does something like this, except while you can name the elements with
things like "contact" and "lastname," you can also refer to them by the type
of element you're talking about: use attributes or link attributes in this
case.
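
For the simple dot-path form quoted above, a parser might look like the
following Python sketch.  The function name parse_link_path() and the
(prefix, name) tuple layout are assumptions made for this example; the sketch
handles only the plain resource.prefix:property form, not URIs in pointy
brackets, conditions in accolades, or comparisons such as = "Minten".

```python
def parse_link_path(path):
    """Split a simple dot path into its starting resource and property steps.

    Each step is returned as (prefix, name); prefix is None when the step
    has no "prefix:" part.  Hypothetical helper, for illustration only.
    """
    resource, *steps = path.split(".")
    properties = []
    for step in steps:
        prefix, sep, name = step.partition(":")
        if sep:
            properties.append((prefix, name))
        else:
            # No colon: the whole step is the property name.
            properties.append((None, prefix))
    return resource, properties


resource, properties = parse_link_path("this.contact:author.foaf:lastname")
```

Here resource is 'this' and properties holds the two steps contact:author and
foaf:lastname, in order.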


> 
> Link paths can only contain URIs in pointy brackets (< >).
> 
> Link paths can contain requirements for properties by enclosing them in
> accolades (curly braces); link paths can contain assumptions about the value of
> a property by putting it behind a double @ in the accolades. For example:
> 
> '<http://dotgnu.org/people>{lastname="Minten"@@http://dotgnu.org/people/mdupont}.mailbox'
> 
> The example is slightly misleading. It recommends trying the condition (property
> lastname of a resource in the collection is "Minten") on mdupont first, but
> since that will fail ("DuPont" != "Minten") it will try out the condition on
> every resource in the collection. I could have added extra conditions by putting
> 'lastname="Minten"' between parentheses, putting & behind it and adding
> another condition between parentheses after it.
> 
> Link paths can become quite large and full of conditions, however they're the
> only way I see to safely keep links between things far away from each other.
> 
> Link paths should not be treated as URIs by the RDF server, but as link paths
> (the type designator 'p' in the GNU.RDF store design is hereby reserved for link
> paths).
> 
> --
> 
> About the CTP. I'm thinking of a fast binary protocol here. To ensure the speed
> of the semantic web GNU.RDF.QL and its older brothers are out of the question
> as the main protocol between semantic web servers. The protocol should however
> support link paths as they will be one of the principal ways to travel around
> the semantic web. The protocol should also be as small as reasonably possible,
> but it should not need to be decompressed.
> 
> If we do things right it should be possible to store things in RDF files that
> use the fast binary notation instead of the rather cumbersome XML one. Or yet
> even better a fast system RDF database in the kernel. Anyone for kernel module
> hacking? :-)


Well, I'm not qualified, but I highly encourage the binary (and kernel)
level approach, for certain political reasons beyond the speed advantage.


Seth Johnson



-- 

DRM is Theft!  We are the Stakeholders!

New Yorkers for Fair Use
http://www.nyfairuse.org

[CC] Counter-copyright: http://cyber.law.harvard.edu/cc/cc.html

I reserve no rights restricting copying, modification or distribution of
this incidentally recorded communication.  Original authorship should be
attributed reasonably, but only so far as such an expectation might hold for
usual practice in ordinary social discourse to which one holds no claim of
exclusive rights.


