[Fenfire-dev] PEG "Reference URIs: Storm blocks with metadata"


From: Benja Fallenstein
Subject: [Fenfire-dev] PEG "Reference URIs: Storm blocks with metadata"
Date: Mon, 18 Aug 2003 23:52:12 +0200

Someone please commit this to Storm as ``ref_uris--benja``. -b

==========================================
Reference URIs: Storm blocks with metadata
==========================================

:Author:  Benja Fallenstein
:Created: 2003-08-18
:Changed: $Date$
:Status:  Current
:Scope:   Major
:Type:    Architecture


Storm blocks are very simple: each is just a sequence of octets paired
with a MIME media type. There are good reasons for this design, explained
elsewhere.

However, sometimes you want to include more information with a document,
for example the author; a title; copyright information; the creation date;
the natural language; licensing information (e.g. copyleft or micropayment)
that can be automatically processed by your computer; other files that
should be downloaded if you download this one, e.g. images used on a
Web page; for audio data, the artist and album; for an image,
where it was taken, what is shown on it, and an alternative description
for the blind. The list goes on.

Also, you may want to create something more complex than a document
represented by a single octet stream. For example, a Web page may be
available in different languages, and an image may be available as both
``image/png`` and ``image/svg+xml``.

This PEG proposes an extensible architecture that allows for all this.


Issues
======

.. None yet.


Reference URIs
==============

So far, we have used one type of URI in Storm: Block URIs,
preliminarily of the form::

    urn:x-storm:block:<media-type>,<hash>

This PEG introduces a new kind of URI, for now of the form::

    urn:x-storm:ref:<hash>

where ``<hash>`` is the hash of a Storm block in Sub-RDF/XML format
(see `sub_rdf_xml--benja`__). This block contains *metadata about*
the resource identified by the ``ref`` URI.

__ ../sub_rdf_xml--benja/peg.gen.html

It follows that the metadata graph itself has a URI in Storm, namely::

    urn:x-storm:block:application/rdf+xml,<hash>

This graph defines authoritative metadata for the ``ref`` URI.
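
As a rough illustration of how the two URI forms relate, the following
Python sketch derives both from the bytes of one metadata block. This PEG
does not say which hash function or textual encoding Storm uses; SHA-1 in
lowercase hex is assumed here purely for illustration, and the function
names are hypothetical.

```python
import hashlib


def block_hash(data: bytes) -> str:
    # Assumption: SHA-1 in lowercase hex. The hash function and encoding
    # actually used by Storm may differ.
    return hashlib.sha1(data).hexdigest()


def block_uri(media_type: str, data: bytes) -> str:
    # urn:x-storm:block:<media-type>,<hash>
    return f"urn:x-storm:block:{media_type},{block_hash(data)}"


def ref_uri(metadata_block: bytes) -> str:
    # urn:x-storm:ref:<hash>, where <hash> is the hash of the metadata block
    return f"urn:x-storm:ref:{block_hash(metadata_block)}"


# Placeholder bytes standing in for a real Sub-RDF/XML serialization.
metadata = b"<rdf:RDF>...</rdf:RDF>"
print(ref_uri(metadata))
print(block_uri("application/rdf+xml", metadata))
```

Both printed URIs end in the same hash, which is what lets a resolver go
from a ``ref`` URI to the block holding its metadata.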


How to make statements about the ``ref`` URI
============================================

The metadata RDF graph contains a triple of the following form::

    <>  storm:refDefines  _:foo

i.e., with the empty URI as the subject; with ``storm:refDefines`` as
the property (the ``storm`` namespace is defined in this PEG, below),
and with a blank (aka anonymous) node as the object.

The empty URI is a *relative* URI which identifies "this document."
Strictly speaking, RDF graphs do not contain relative URIs, only
serializations of RDF graphs do; the actual triple in the graph is::

    <urn:x-storm:block:application/rdf+xml,<hash>> storm:refDefines _:foo

What this triple *means* is: the subject (an RDF graph) is a metadata
graph that defines, through the mechanism outlined in this PEG,
a resource, called ``_:foo`` in the graph. That is, ``_:foo`` is the same
resource as ``urn:x-storm:ref:<hash>``. We cannot write the latter inside
the Sub-RDF/XML graph, because the URI contains the hash of that very
graph, and a block cannot include its own hash (without breaking
the hash function...).

So whenever we want to make a statement about ``urn:x-storm:ref:<hash>``,
instead we make a statement about ``_:foo``. Storm knows that the two
nodes represent the same resource.

For example, we can state something like::

    _:foo  dc:creator  <http://example.org/~alice>
    _:foo  cc:license  <http://www.gnu.org/licenses/gpl.html>

(``cc`` is Creative Commons, and ``dc`` is Dublin Core.)
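
The substitution described above can be sketched in Python with triples
as plain tuples of strings. This is only an illustration: the function
names are hypothetical, and treating a graph with zero or several
``refDefines`` triples as defining nothing is an assumption, since the
PEG does not cover that case. Property names other than ``refDefines``
are kept in abbreviated form and treated as opaque strings.

```python
# Full URI of storm:refDefines, as given in the vocabulary section.
STORM_REF_DEFINES = "http://purl.oclc.org/NET/storm/vocab/ref-uri/refDefines"


def defined_node(triples, block_uri):
    """Return the node that the metadata graph at block_uri defines."""
    objects = [o for (s, p, o) in triples
               if s == block_uri and p == STORM_REF_DEFINES]
    # Assumption: anything other than exactly one match defines nothing.
    return objects[0] if len(objects) == 1 else None


def statements_about_ref(triples, block_uri, ref_uri):
    """Rewrite statements about the defined node as statements about ref_uri."""
    node = defined_node(triples, block_uri)
    if node is None:
        return []
    return [(ref_uri, p, o) for (s, p, o) in triples if s == node]


# "HASH" stands in for the real hash of the metadata block.
block = "urn:x-storm:block:application/rdf+xml,HASH"
ref = "urn:x-storm:ref:HASH"
triples = [
    (block, STORM_REF_DEFINES, "_:foo"),
    ("_:foo", "cc:license", "http://www.gnu.org/licenses/gpl.html"),
]
print(statements_about_ref(triples, block, ref))
```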


Documents
=========

Now we have a way to store arbitrary metadata *about* our document--
but how do we tell Storm what the *content* of our document is?

For this, we use a special RDF property, ``repr:instance``::

    _:foo  repr:instance  <urn:x-storm:block:<type2>,<hash2>>

This triple tells Storm that when the user requests ``_:foo``
(i.e., the resource denoted by the ``ref`` URI), then
Storm can serve ``urn:x-storm:block:<type2>,<hash2>``.

The ``repr`` is for "representation."

In the Web architecture, there are *resources*, denoted by URIs;
for example, "The home page of Amazon, Inc.," or "An image of
Sandro Hawke's dog, Taiko."

These resources can have multiple *representations*, octet streams
with media types and other metadata. For example, the home page
can have versions in English and French; the image can be available
in JPEG or PNG.

A triple with property ``repr:instance`` says that the subject
is some sort of "document"-- both the home page and the image are
documents, but the city of Hameln or the Fenfire project are not--
and that the object is one representation of this document.

Or, maybe more precisely, as the object is also a resource, not a
representation: The subject is some sort of document, and
all representations of the object are also representations of the subject.

The object may be a Storm URI or any other kind of URI; a Storm
implementation is not obligated to support anything else but
Storm URIs, though. (In fact, it might warn the user when a ``ref`` URI
is used to refer to e.g. an HTTP page.)


Alternative representations
===========================

A document may have multiple, alternative representations::

    _:foo   repr:instance   <urn:x-storm:block:<type1>,<hash1>>
    _:foo   repr:instance   <urn:x-storm:block:<type2>,<hash2>>

A Storm implementation can then serve either of these as the document.

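Additional triples can be used to describe these representations further.
The following example assumes that a third representation,
``<urn:x-storm:block:<type3>,<hash3>>``, is also linked to ``_:foo``
via ``repr:instance``::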

    <urn:x-storm:block:<type1>,<hash1>>  mime:mimeType  "image/png"
    <urn:x-storm:block:<type1>,<hash1>>  img:height "100"
    <urn:x-storm:block:<type1>,<hash1>>  img:width  "200"

    <urn:x-storm:block:<type2>,<hash2>>  mime:mimeType  "image/png"
    <urn:x-storm:block:<type2>,<hash2>>  img:height "500"
    <urn:x-storm:block:<type2>,<hash2>>  img:width  "1000"

    <urn:x-storm:block:<type3>,<hash3>>  mime:mimeType  "image/svg+xml"

Given this, a Storm implementation which understands the ``img``
and ``mime`` properties could pick either the low or the high resolution
version of the image, or the scalable SVG version, if supported
by the client.

An HTTP gateway can use this kind of information to perform
content negotiation, selecting one of the alternative versions
depending on the client's ``Accept-Language`` and ``Accept`` headers.
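
A minimal sketch of how such a gateway might choose among alternatives
by media type is given below, in Python. It is not a complete
implementation of HTTP content negotiation: it only honors ``q`` values
and simple wildcards in the ``Accept`` header, and it ignores the rule
that more specific media ranges take precedence. The candidates are
assumed to have been extracted already from the ``repr:instance`` and
``mime:mimeType`` triples; all function names are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((fields[0].lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(prefs, key=lambda pref: -pref[1])


def matches(media_range, media_type):
    """Match a media type against a range such as image/* or */*."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def choose(candidates, accept_header):
    """Pick a block URI from (block_uri, media_type) pairs, or None."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for uri, media_type in candidates:
            if matches(media_range, media_type):
                return uri
    return None


# Placeholder block URIs; h1 and h3 stand in for real hashes.
candidates = [
    ("urn:x-storm:block:image/png,h1", "image/png"),
    ("urn:x-storm:block:image/svg+xml,h3", "image/svg+xml"),
]
print(choose(candidates, "image/svg+xml, image/png;q=0.5"))
```

Selecting between the two PNG sizes would need a further rule based on
the ``img`` properties, which this sketch leaves out.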


Abstract concepts
=================

While ``block`` URIs always identify an octet stream with a media type,
a ``ref`` URI can be used to identify dogs, cars, houses, an RDF class
or the theory of relativity: *Anything*.

Of course you can also use ``urn-5`` for that, but sometimes it is useful
to be able to get some authoritative information about a resource--
the ability for a human to put a URI into a browser and get documentation
about what it identifies, and the ability for a machine to resolve a URI
and get some machine-readable information about it. For example, the
``ref`` block for an RDF class could include a human-readable label
for the class as well as its superclasses, and refer to some human-readable
documentation.

(Fenfire could then, when the class is used in some graph, download
its authoritative description and use the human-readable label from that
description to show the class.)

In order to be able to put an abstract concept ``ref`` URI in a browser
and have it resolve to some documentation about the concept, we have
to associate it with a representation. For this, we do not use
``repr:instance``, because a description of a concept is not an
*instance*, a *version* of that concept. Instead, we use ::

    _:foo   repr:description   <urn:x-storm:block:<type>,<hash>>

In general, there should only be one ``repr:description`` associated
with a resource, although the implementation should treat
``repr:description`` the same as ``repr:instance``. If the description
needs to be available in several variants, for example in different
languages, it should have a ``ref`` URI itself.

This is because on the Web, important resources should have their own
URIs so that you can link to them and make statements about them--
you want to be able to make statements about both the theory of relativity
and the Web page that describes this concept.


Vocabulary defined in this PEG
==============================

This PEG defines the following URIs:

http://purl.oclc.org/NET/storm/vocab/ref-uri/refDefines
    A property. The subject of triples with this property is a
    resource that has as (one of) its representation(s) an RDF graph
    serialized in RDF/XML. The object of the triple is the resource
    identified by the ``ref`` URI that has as its ``<hash>`` part
    the hash of the RDF/XML serialization of the subject.

    In practice, this simply means that the subject is a
    Storm block with media type ``application/rdf+xml``, and the
    object is the ``ref`` URI with the same hash.

http://purl.oclc.org/NET/storm/vocab/representations/representation
    A property. The subject is any resource, and the object is a
    representation of that resource; or more precisely, all representations
    of the object are also representations of the subject.

    If included in the authoritative metadata about the subject, a
    URI resolver that understands this property shall consider
    the object of this property as one possible document that can be
    served as a representation of the subject.

    In particular, when a ``ref`` URI is e.g. entered into a browser,
    a URI resolver shall look at the ``ref`` block for triples
    of the form::

        _:foo  <...representation>  _:bar

    where ``_:foo`` is the resource identified by the ``ref`` URI.

    The objects in these triples (``_:bar``) are the possible
    representations of the resource (``_:foo``).


http://purl.oclc.org/NET/storm/vocab/representations/instance
    A property. Both the subject and the object are some kind of
    "document," something which can be serialized to bits and bytes.
    The object is some kind of specialization of the subject.

    For example, the subject might be "The Bible," and the object
    might be "The Bible, King James Version," which is more specific.
    Or, the subject may be "An image of Sandro Hawke's dog Taiko,"
    and the object may be a PNG or JPEG version of that image.

    This is a sub-property of ``...representation``. A URI resolver shall
    treat a triple with this property like a triple with property
    ``...representation``.

http://purl.oclc.org/NET/storm/vocab/representations/description
    A property. The subject is any resource; the object is
    some kind of "document" which describes the subject.

    For example, the subject may be an RDF class, and the object
    may be a Web page describing how this class is used.

    This is a sub-property of ``...representation``. A URI resolver shall
    treat a triple with this property like a triple with property
    ``...representation``.

No other properties besides the three above shall be treated
the same as ``...representation``, even if some graph states that
they are a subproperty of ``...representation``. This is to make
resolution of ``ref`` URIs easier.
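
The restriction above keeps a resolver simple: it can check each
property against a fixed set instead of doing subproperty inference.
A sketch in Python, with triples as plain tuples of strings and a
hypothetical function name, might look like this:

```python
REPR_NS = "http://purl.oclc.org/NET/storm/vocab/representations/"

# Exactly the three properties named in this PEG; never extended at run time.
REPRESENTATION_PROPERTIES = frozenset({
    REPR_NS + "representation",
    REPR_NS + "instance",
    REPR_NS + "description",
})


def representations(triples, node):
    """Return the possible representations of node, in graph order."""
    return [o for (s, p, o) in triples
            if s == node and p in REPRESENTATION_PROPERTIES]


# A graph that declares its own subproperty of ...representation.
# The property URI http://example.org/myRepr is made up for this example.
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
triples = [
    ("_:foo", REPR_NS + "instance", "urn:x-storm:block:image/png,h1"),
    ("_:foo", REPR_NS + "description", "urn:x-storm:block:text/html,h2"),
    ("http://example.org/myRepr", SUB_PROPERTY_OF, REPR_NS + "representation"),
    ("_:foo", "http://example.org/myRepr", "urn:x-storm:block:text/plain,h3"),
]
print(representations(triples, "_:foo"))
```

The ``text/plain`` block is not returned, because the declared
subproperty is deliberately ignored.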


What this PEG does not define
=============================

This PEG doesn't define any "standard" properties for use inside a
``ref`` graph, besides the four used above. Other PEGs may define
properties to specify e.g. the languages or media types of representations,
and dictate resolver behavior in the presence of these properties,
for example honoring the ``Accept-Language`` header in HTTP requests.
However, this is left for future specifications.

\- Benja




