
[Gzz] Simple Storm again


From: Benja Fallenstein
Subject: [Gzz] Simple Storm again
Date: Wed, 02 Apr 2003 00:56:20 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030327 Debian/1.3-4


Hi all,

Please have a look at the 'simple storm' PEG again. If nobody's opposed, I'd like to start using it soon, so that we can get on with Storm.

Thanks,
- Benja

===========================================================
``simple_storm--benja``: Simplify Storm by dropping headers
===========================================================

:Author:        Benja Fallenstein
:Date:          2003-02-16
:Revision:      $Revision: 1.1 $
:Last-Modified: $Date: 2003/03/31 09:25:01 $
:Type:          Architecture
:Scope:         Major
:Status:        Current


Storm is quite complex with its MIME headers, and prone to become
more complex if we choose to separate hashing of headers and bodies
(``raw_blocks--benja``). If we break backward compatibility
a single time, as Tuomas suggests, we should take the opportunity to
get rid of our mistakes from the past, in order to make
the future simpler.

By analogy with the ``data`` URL scheme [RFC2397], this PEG
proposes registering a URN namespace whose URIs
contain a MIME type and the content hash of a block of data.
"data" URLs contain a MIME type and a sequence of bytes,
either literally or encoded as base64. The analogy runs deep;
"data" URLs are a MIME type plus an immutable byte sequence,
and so are URIs in this URN namespace. The MIME type is included
with "data" URLs because it is considered the one absolutely
essential piece of metadata necessary to interpret
the byte sequence; for this URN namespace, the same thing holds.


Issues
======

- Won't dropping headers make it harder to include metadata?

   RESOLVED: MIME headers are a non-extensible form of metadata
   anyway; if we allow ``X-`` headers, we have problems with
   permanence. We can still put metadata into another block
   referring to this one; alternatively, many file formats
   allow inclusion of metadata in the file itself (e.g. PNG).

   Content types are now included in the block id (different
   content type -> different block).

   The benefits outweigh the problems by far.

- How about metadata that would be included in an HTTP
  response, such as alternative representations of a
  resource (different languages etc.)? How about Creative
  Commons licenses? Wouldn't it be better to have an
  RDF "header" block containing this data?

   RESOLVED: The idea about alternative representations
   is that a single "header" block would refer to
   different "body" blocks, each of which could be used.
   However, it is also necessary to be able to refer
   to each of these representations by itself; if we
   don't want to have an *additional* header block
   for each of these representations, we still need
   something like this proposal to refer to the
   individual alternatives.

   While it would be nice if a CC or other license would
   travel with every block in a computer-readable format,
   this is not by itself enough reason to require
   header blocks, making for a much more complex system
   and separating namespaces in the Storm world.

   I suggest that we look at the header issue again
   in connection with pointers. Instead of having pointer URIs,
   we might have 'reference' URIs which give the hash
   of a metadata block used to retrieve the actual data.
   This metadata block could be used to implement the HTTP
   features as well as pointers and CC licenses.

   I think that the best route for now is to have two
   layers: first, the simple but in itself useful method
   of identification by MIME type plus content hash;
   and second, still to be built, a more complex and
   extensible system of referring to metadata that can
   point to the actual data in more complex ways
   (or simply include additional metadata like a CC license).

- What about the hash tree vulnerabilities mentioned in
  <http://zgp.org/pipermail/p2p-hackers/2002-November/000993.html> /
  <http://zgp.org/pipermail/p2p-hackers/2002-November/000998.html>?

   RESOLVED: They've settled on a new convention: prepend a
   zero byte to tree leaves and a one byte to tree branches
   (concatenated hashes of the nodes below them) before hashing.
   Their software is being updated; there's a Java implementation.
   We'll be using that (and we'll fully specify it when
   writing the informal URN namespace registration).

- Why bitzi bitprint? What is it? Why not SHA-1?

   RESOLVED: Bitprints are a combination of a SHA-1 hash with a
   Merkle hash tree based on the Tiger hash algorithm.
   Hash algorithms get broken; when one of the two is
   broken, there is a transitional period before the other
   is broken as well. During that period you can rely on the
   unbroken hash and can, e.g., sign blocks, ensuring you
   can still use them once the other algorithm is broken too.

   Having a hash tree allows you to download pieces
   of a block from different sources, verifying each
   piece individually. This can be of great help
   in speeding up download times.

- Are bitprints too long for short blocks like ours?
  (How long are the IDs going to be, and will
  that be a problem?)

   RESOLVED: Here's an example URI, 102 characters long:

     urn:urn-?:application/rdf+xml,QLFYWY2RI5WZCTEP6MJKR
     5CAFGP7FQ5X.VEKXTRSJPTZJLY2IKG5FQ2TCXK26SECFPP4DX7I

   This is long, but IMO not 'too long.'

- Why this syntax? Why not another?

   RESOLVED: For similarity to ``data`` URLs.
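
The leaf/branch prefix convention and the two-part bitprint layout
discussed in the issues above can be sketched as follows. This is only
an illustration under stated assumptions, not the Bitzi or jbitprint
implementation: Python's ``hashlib`` has no Tiger, so SHA-1 stands in
as the tree hash; the 1024-byte segment size and the rule that an
unpaired node moves up a level unchanged are assumptions; and
``tree_root``, ``base32`` and ``bitprint_like`` are hypothetical names.

```python
import base64
import hashlib

LEAF_PREFIX = b"\x00"   # prepended to every data segment before hashing
NODE_PREFIX = b"\x01"   # prepended to every pair of child hashes
SEGMENT_SIZE = 1024     # assumption: leaf segment size in bytes


def tree_root(data, hash_fn=hashlib.sha1):
    """Root of a Merkle tree over `data` using the 0x00/0x01 prefixes.

    `hash_fn` is a stand-in: real bitprints use Tiger here.
    """
    segments = [data[i:i + SEGMENT_SIZE]
                for i in range(0, len(data), SEGMENT_SIZE)] or [b""]
    level = [hash_fn(LEAF_PREFIX + seg).digest() for seg in segments]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                next_level.append(
                    hash_fn(NODE_PREFIX + pair[0] + pair[1]).digest())
            else:
                # Assumption: an unpaired node is promoted unchanged.
                next_level.append(pair[0])
        level = next_level
    return level[0]


def base32(digest):
    """Base32 text form of a digest, without '=' padding."""
    return base64.b32encode(digest).decode("ascii").rstrip("=")


def bitprint_like(data, tree_hash_fn=hashlib.sha1):
    """SHA-1 of the whole block, a dot, then the tree root (both base32)."""
    return (base32(hashlib.sha1(data).digest()) + "."
            + base32(tree_root(data, tree_hash_fn)))
```

Because of the prefixes, the hash of a one-segment block differs from
the plain hash of its bytes, so a leaf can't be passed off as a branch.
A 20-byte SHA-1 digest encodes to 32 base32 characters and a 24-byte
Tiger digest to 39, which matches the 32 + 1 + 39 split of the
bitprint in the example URI above.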


Changes
=======

Storm blocks do not have headers any more; the hash in their URN
is only of the body. Storm URNs have the following form:

    <namespace>:block:[<mediatype>],<bitprint>

``<namespace>`` is an informal URN namespace to be registered,
like ``urn:urn-5``. ``<bitprint>`` is a Bitzi bitprint as defined
by <http://bitzi.com/developer/bitprint>. ``<mediatype>`` is
the token defined in [RFC2397]:

    mediatype  := [ type "/" subtype ] *( ";" parameter )
    parameter  := attribute "=" value

"where [...] 'type', 'subtype', 'attribute' and 'value' are
the corresponding tokens from [RFC2045], represented using
URL escaped encoding of [RFC2396] as necessary" [RFC2397].
(Escaping is necessary when a character isn't in the set
of allowed URN characters.)
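
As a rough illustration of the grammar above, the following sketch
splits an escaped ``<mediatype>`` into its tokens and undoes the URL
escaping. ``parse_mediatype`` is a hypothetical helper, not part of
Storm; lower-casing the type, subtype and attribute names is an
assumption, and the tokens are not validated against [RFC2045].

```python
from urllib.parse import unquote


def parse_mediatype(escaped):
    """Split `[ type "/" subtype ] *( ";" parameter )` into its parts.

    Returns (type, subtype, params); type and subtype are None when
    the optional `type "/" subtype` part is absent.
    """
    essence, *raw_params = escaped.split(";")
    media_type = subtype = None
    if essence:
        media_type, slash, subtype = essence.partition("/")
        if not (slash and media_type and subtype):
            raise ValueError("expected type/subtype, got %r" % essence)
        # Assumption: type and subtype are compared case-insensitively.
        media_type = unquote(media_type).lower()
        subtype = unquote(subtype).lower()
    params = {}
    for raw in raw_params:
        attribute, equals, value = raw.partition("=")
        if not (equals and attribute):
            raise ValueError("expected attribute=value, got %r" % raw)
        params[unquote(attribute).lower()] = unquote(value)
    return media_type, subtype, params
```

For instance, ``parse_mediatype("application/rdf%2Bxml")`` yields the
subtype ``rdf+xml``, and an empty string yields no type at all.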

"X-" types aren't allowed, as they work against the persistence
of Storm blocks; ``application/octet-stream`` or similar
must be used instead.

Unlike in [RFC2397], if no ``<mediatype>`` is given,
``application/octet-stream`` is assumed (not ``text/plain``).
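
The URN form, the ban on "X-" types and the default media type could
fit together roughly as below. This is a sketch under assumptions, not
a specification: ``make_storm_urn`` and ``parse_storm_urn`` are
hypothetical names, the set of characters left unescaped is a guess
(the exact set depends on the URN syntax rules), and treating a
subtype such as ``x-foo`` as an "X-" type is one reading of the rule
above.

```python
from urllib.parse import quote, unquote

DEFAULT_MEDIATYPE = "application/octet-stream"
# Assumption: characters kept literal in the media type part of the URN.
SAFE_CHARS = "/+;=-._"


def is_x_type(mediatype):
    """True if the type or subtype starts with an "X-" prefix."""
    essence = mediatype.split(";", 1)[0].strip().lower()
    return any(part.startswith("x-") for part in essence.split("/"))


def make_storm_urn(namespace, bitprint, mediatype=""):
    """Build `<namespace>:block:[<mediatype>],<bitprint>`."""
    if is_x_type(mediatype):
        raise ValueError("X- types are not allowed: %r" % mediatype)
    return "%s:block:%s,%s" % (
        namespace, quote(mediatype, safe=SAFE_CHARS), bitprint)


def parse_storm_urn(urn):
    """Return (namespace, mediatype, bitprint) for a block URN."""
    # The bitprint contains no comma, so split at the last one.
    head, comma, bitprint = urn.rpartition(",")
    namespace, marker, escaped = head.partition(":block:")
    if not (comma and marker):
        raise ValueError("not a block URN: %r" % urn)
    # An omitted media type means application/octet-stream.
    mediatype = unquote(escaped) or DEFAULT_MEDIATYPE
    if is_x_type(mediatype):
        raise ValueError("X- types are not allowed: %r" % mediatype)
    return namespace, mediatype, bitprint
```

With the placeholder namespace ``urn:urn-5``, a URN whose media type
part is empty parses to ``application/octet-stream``, while a media
type such as ``application/x-foo`` is rejected.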

There is a public domain Java implementation of bitprints at
<http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/bitcollider/jbitprint/>.
Bitprints may be registered as a URN namespace in the future,
according to Bitzi. However, they will not include a
content type.

- Benja




