[Gzz-commits] gzz/Documentation/misc/benja-diff-fa thesis.rst


From: Benja Fallenstein
Subject: [Gzz-commits] gzz/Documentation/misc/benja-diff-fa thesis.rst
Date: Sun, 09 Feb 2003 17:34:23 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Benja Fallenstein <address@hidden>      03/02/09 17:34:23

Modified files:
        Documentation/misc/benja-diff-fa: thesis.rst 

Log message:
        add refs

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/benja-diff-fa/thesis.rst.diff?tr1=1.4&tr2=1.5&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/benja-diff-fa/thesis.rst
diff -u gzz/Documentation/misc/benja-diff-fa/thesis.rst:1.4 gzz/Documentation/misc/benja-diff-fa/thesis.rst:1.5
--- gzz/Documentation/misc/benja-diff-fa/thesis.rst:1.4 Sun Feb  9 05:32:33 2003
+++ gzz/Documentation/misc/benja-diff-fa/thesis.rst     Sun Feb  9 17:34:23 2003
@@ -1,20 +1,42 @@
 ========================================================================
 Storm: Supporting data mobility through location-independent identifiers
 ========================================================================
+------------------------------------------------------------------------
+   Facharbeit in Computer Science at the Oberstufen-Kolleg, Bielefeld
+------------------------------------------------------------------------
 
 :Author: Benja Fallenstein
+:Date: February 2003
+:Thesis advisor: Christian Oldiges
 
 
-1. Introduction
-===============
+Introduction
+============
+
+.. 'jyvaskyla' should be with umlauts :-(
+
+This Facharbeit is the product of my work in a research group,
+the Hyperstructure Group at University of Jyvaskyla, Finland,
+I have been engaged in for the past 2 1/2 years. I originally
+came into contact with the Hyperstructure Group through the
+Internet, because I was interested in Gzz [gzz]_, their Free Software
+implementation of Ted Nelson's zzstructure [zigzag-welcome]_. Zzstructure
+is a new paradigm for structured data, allowing for rich
+interconnections between all data stored on a computer, unlike
+XML [xml]_. For example, if a user independently stores
+an address book 
+
+
+Storm
+-----
 
 The Web and many other hypermedia systems assume that identifiers
 either have to include location information (as in URLs, which break 
 when documents are moved), or can only be resolved locally (as in
 link services that can only find links stored on a select set
-of link servers [ref Microcosm, DLS, ...]). Berners-Lee [ref NameMyth - that
-was '96, what does he say now?] argues that 
-unique random identifiers are not globally feasible for this reason.
+of link servers [ref Microcosm, DLS, ...]). Berners-Lee [name-myth]_
+argues that unique random identifiers are not globally feasible 
+for this reason.
 
 However, recent developments in peer-to-peer systems have
 rendered this assumption obsolete. Structured overlay networks
@@ -48,7 +70,7 @@
 
 .. [#] It might be more appropriate to speak about *resources*
    and *references* instead of *documents* and *links*, but
-   in the spirit of [ref kappe95scalable], we stick with
+   in the spirit of [kappe95scalable]_, we stick with
    the simpler terms for explanation purposes.
 
 *Dangling links* are an issue when documents are moved
@@ -56,13 +78,13 @@
 but there is a local copy (e.g. on a laptop or dialup system);
 or when the publisher removes a document permanently,
 but there are still copies (e.g. in a public archive such as
-[ref web.archive.org]). Dangling links are also an issue
+[waybackmachine]_). Dangling links are also an issue
 when a document and a link to it are received independently,
 for example as attachments to independent emails,
 or when a link is sent by mail and the document is available
 from the local intranet. When two people meet e.g. on the train,
 they should be able to form an ad-hoc network and follow links
-to documents stored on either one's computer [ref Thompson et al].
+to documents stored on either one's computer [thompson01coincidence]_.
 
 Advanced hypermedia systems such as Microcosm and Hyper-G
 address dangling links through a notification system [ref]:
@@ -99,7 +121,7 @@
 dealing with versioning and dangling links. Storm is a library
 for storing and retrieving data as *blocks*, immutable
 byte sequences identified by cryptographic content hashes
-[ref ht'02 paper]. Additionally, Storm provides services
+[lukka02guids]_. Additionally, Storm provides services
 for versioned data and Xanalogical storage [ref].
 We address the mobility of documents by block storage
 and versioning, while we use Xanalogical storage
@@ -113,8 +135,8 @@
 hypermedia systems [ref ht01, ht02].
 
 Currently, Storm is partially implemented as a part of the Gzz 
-project [ref], which uses Storm exclusively for all disk storage.
-Gzz is an implementation of Ted Nelson's zzstructure [ref],
+project [gzz]_, which uses Storm exclusively for all disk storage.
+Gzz is an implementation of Ted Nelson's zzstructure [zigzag-welcome]_,
 providing a platform for hypermedia-aware applications.
 The peer-to-peer functionality is in a very early stage and not 
 usable yet.
@@ -158,11 +180,11 @@
     Diffs.c = z1 - (15, 15);
 
 
-2. Related Work
-===============
+Related Work
+============
 
-2.1. Dangling links and alternative versions
---------------------------------------------
+Dangling links and alternative versions
+---------------------------------------
 
 The dangling link problem has received a lot of attention
 in hypermedia research [refs]. As examples, we examine the ways
@@ -188,7 +210,7 @@
 
 In Hyper-G, documents are bound to servers, and a link
 between documents on different servers is stored by both servers
-[kappe95scalable]. This ensures that all links from and to a document
+[kappe95scalable]_. This ensures that all links from and to a document
 can always be found, but requires the cooperation 
 of both parties. Hyper-G employs a scalable protocol
 for notifying servers when a document has been moved or removed.
@@ -219,8 +241,8 @@
 [is there a specific ref for this?].
 
 
-2.3. Peer-to-peer systems
--------------------------
+Peer-to-peer systems
+--------------------
 
 .. thesis-benja: check what needs to be rewritten below
 
@@ -274,22 +296,22 @@
 approaches to the size of values. Consider a file-sharing application:
 If the keys are keywords from the titles of shared files, are the values
 the files-- or the addresses of peers from which the files may be
-downloaded? Iyer et al [ref Squirrel] call the former approach
+downloaded? Iyer et al [iyer02squirrel]_ call the former approach
 a *home-store* and the latter a *directory* scheme (they call the peer
 responsible for a hashtable item its 'home node,' thus 'home-store').
 
 Recently there has been some interest in peer-to-peer hypermedia.
-Thompson and de Roure [ref ht01] examine the discovery
+Thompson and de Roure [thompson01coincidence]_ examine the discovery
 of documents and links available at and relating to
 a user's physical location. An example would be
 a linkbase constructed from links made available by different
-participants of a meeting [thompson00weaving]. 
-Bouvin [ref 02] focuses on the scalability and ease of publishing
+participants of a meeting [thompson00weaving]_. 
+Bouvin [bouvin02open]_ focuses on the scalability and ease of publishing
 in peer-to-peer systems, examining ways in which p2p can serve
 as a basis for Open Hypermedia. Our own work has been 
-in implementing Xanalogical storage [ref 02].
+in implementing Xanalogical storage [lukka02guids]_.
 
-At the Hypertext'02 panel on peer-to-peer hypertext [ref],
+At the Hypertext'02 panel on peer-to-peer hypertext [p2p-hypertext-panel]_,
 there was a lively discussion on whether the probabilistic access
 to documents offered by peers joining and leaving the network
 would be tolerable for hypermedia publishing. For many documents,
@@ -304,12 +326,12 @@
 in the indexing overlay network.
    
 
-3. Block storage
-================
+Block storage
+=============
 
 In Storm, all data is stored
 as *blocks*, byte sequences identified by a SHA-1 
-cryptographic content hash [ref SHA-1]. 
+cryptographic content hash [fips-sha-1]_. 
 Being purely a function of a block's content, block ids
 are completely independent of network location.
 Blocks have a similar granularity
@@ -351,7 +373,7 @@
 the flash crowd problem could be alleviated: The more users
 request a block, the more locations there are to download it from.
 This resembles e.g. the Squirrel
-web cache [ref] [more refs? -Hermanni]; however, downloads can be
+web cache [iyer02squirrel]_; however, downloads can be
 from *any* peer since the source does not need to be trusted.
 On the other hand, there are privacy 
 concerns with exposing one's browser cache to the outside world.
@@ -365,12 +387,6 @@
 when a user wants to incorporate a set of changes, but not
 required at replication time.)
 
-.. On the other hand for instance, several popular 
-   database management systems (e.g. Lotus Notes [ref]) have complex 
-   replication schemes, which may led awkward replication conflicts, 
-   because of they lack the immutable properties of data. 
-   [Or does this belong to diff section ? -Hermanni]
-
 The same namespace is used for local data and data
 retrieved from the network. When an online document has been
 permanently downloaded to the local harddisk, it can be found
@@ -401,7 +417,7 @@
 Even after failure of all of the publisher's mirrors,
 a document may still be available from peers that have
 downloaded it. An archive of published blocks, in the spirit
-of the Web archive [ref], would only be yet another backup;
+of the Web archive [waybackmachine]_, would only be yet another backup;
 normal links to a block would work as long as the archive
 holds a copy. It would also be hard to purposefully remove
 a published document from the network; whether this is
@@ -440,10 +456,10 @@
 to overcome the limitations of traditional file-based applications.
 
 
-3.1. Implementation
--------------------
+Implementation
+--------------
 
-Storm blocks are MIME messages [ref MIME], i.e., objects with
+Storm blocks are MIME messages [borenstein92mime]_, i.e., objects with
 a header and body as used in Internet mail or HTTP.
 This allows them to carry any metadata that can be carried
 in a MIME header, most importantly a content type.
@@ -464,16 +480,16 @@
 
 Many existing peer-to-peer systems could be used to
 find blocks on the network.
-For example, Freenet [ref], recent Gnutella-based clients 
-(e.g. Shareaza [ref]), and Overnet/eDonkey2000 [ref] 
+For example, Freenet [freenet-ieee]_, recent Gnutella-based clients 
+(e.g. Shareaza [shareazaurl]_), and Overnet/eDonkey2000 [ref] 
 also use SHA-1-based identifiers [e.g. ref: magnet uri]. 
 Implementations on top of a DHT could use both the
-directory and the home store approach as defined by [ref Squirrel].
+directory and the home store approach as defined by [iyer02squirrel]_.
 
 Unfortunately, we have not put a p2p-based implementation
 into use yet and can therefore only report on our design.
 Currently, we are working on a prototype implementation
-based on UDP, the GISP distributed hashtable [ref],
+based on UDP, the GISP distributed hashtable [kato02gisp]_,
 and the directory approach (using the DHT to find a peer
 with a copy of the block, then using HTTP to download the block).
 Many practical problems have to be overcome before this
@@ -510,11 +526,11 @@
 the software (mutable documents are described in section 6.1).
 
 
-4. Xanalogical storage
-======================
+Xanalogical storage
+===================
 
-In the xanalogical storage model [ref], 
-pioneered by the unfinished Project Xanadu [ref],
+In the xanalogical storage model [ted-xanalogical-structure-needed]_, 
+pioneered by the unfinished Project Xanadu [ted-xu-tech]_,
 links are not between documents, but individual characters.
 When a character is first typed in, it acquires a permanent id
 ("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
@@ -557,10 +573,10 @@
 being ``text/plain``). To designate a span of characters
 from that session, we use the block's id, the offset of the first
 character, and the number of characters in the span.
-This technique was first introduced in [ref ht02 paper].
+This technique was first introduced in [lukka02guids]_.
 
-In Xanadu, characters are stored to append-only *scrolls*
-when they are typed [ref]. Because of this, in Storm, we call the 
+In Xanadu, characters are written to append-only *scrolls*
+when they are typed. Because of this, in Storm, we call the 
 blocks containing the actual characters *scroll blocks*. The documents
 do not actually contain the characters; instead, they are
 *virtual files* containing span references as described above.
@@ -590,8 +606,8 @@
 will be relatively small (limited by the amount of text
 the user enters between two saves of a document), we hope
 that this will not be a major scalability problem. Otherwise,
-systems that allow range queries, such as skip graphs [ref] 
-and skipnet [ref], may prove useful.
+systems that allow range queries, such as skip graphs [AspnesS2003]_ 
+may prove useful.
 
 One question raised by xanalogical storage is which links to show
 for a popular document that has been linked to by many users.
@@ -605,8 +621,8 @@
 comments of articles etc.
 
 
-5. Application-specific reverse indexing
-========================================
+Application-specific reverse indexing
+=====================================
 
 Finding links to and transclusions of a piece of content in
 Xanalogical storage is but one example of *reverse indexing*
@@ -680,8 +696,8 @@
 occuring in a document.
 
 
-6. Versioning
-=============
+Versioning
+==========
 
 Clearly, for block storage to be useful, there has to be a way to
 efficiently update documents/maintain different versions of documents. 
@@ -703,8 +719,8 @@
 the cryptographic hash in the full version's block id.
 
 
-6.1. Pointers: implementing mutable resources
----------------------------------------------
+Pointers: implementing mutable resources
+----------------------------------------
 
 In Storm, *pointers* are used to implement mutable resources.
 A pointer is a globally unique identifier (usually created randomly)
@@ -766,7 +782,7 @@
 one user or group of users to be able to produce new
 official versions of a given document (an exception may
 be wikis, which are collaboratively edited by anyone
-interested [ref]). It is not yet clear how to do this.
+interested [leuf01wiki]_). It is not yet clear how to do this.
 Signing pointer blocks digitally may be sensible, but
 digital signatures require a public key infrastructure
 and a trusted timestamping mechanism [#]_, which
@@ -805,14 +821,8 @@
 for Web-like publishing. More research is needed in this area.
 
 
-6.2. Diffs: storing alternative versions efficiently
-----------------------------------------------------
-
-[benja says: Please do not touch this section, but tell me
-how to improve it instead. Reason: this is the meat for my thesis,
-due Feb 9th, so I want all possible improvements on it
-to go there, too ;-) [and I'm of course allowed to solicite feedback,
-but not allowed to use stuff written by someone else...]]
+Diffs: storing alternative versions efficiently
+-----------------------------------------------
 
 The pointer system suggests that for each version of a document,
 we store an independent block containing this version. This
@@ -830,12 +840,12 @@
 leading up to the current one would be broken if any
 previous version were deleted. 
 
-Additionally, many versioning systems (e.g. CVS [ref])
+Additionally, many versioning systems (e.g. CVS [cvs]_)
 store the current version as well as the differences,
 enabling them to retrieve the current version quickly and compute
 recent versions by applying the differences 'backwards,'
 starting from the current version. This technique would also be
-made harder by the simplistic scheme above because XXX.
+made harder by the simplistic scheme above.
 
 The root of these difficulties is that we refer to a version
 by the hash of a data record referencing the previous version,
@@ -874,9 +884,6 @@
 to be altered, because it refers to *version* ``C`` and not
 the difference to ``C`` from ``B`` (as in the simplistic scheme).
 
-.. [XXX Figure: diffs from a->b, b->c, c->d; we replace
-   the diffs a->b and b->c by a single diff a->c.]
-
 We can also store the block containing version ``D``
 in addition to storing the versions above. Then, we can reconstruct
 version ``C`` in two ways: By using the diffs from ``A`` to ``B``
@@ -887,8 +894,6 @@
    that can be 'skipped' will have to be much higher
    for this mechanism to be useful.
 
-.. [XXX fig?]
-
 Our current implementation is a layer above Storm block storage
 and indexing. This layer implements a ``load(version-id) -> version``
 interface through the following simplified algorithm:
@@ -959,8 +964,8 @@
 would have to be sent through the network.
 
 
-7. Conclusions
-==============
+Conclusions
+===========
 
 We have presented the Storm design, which makes use of recent advances
 in peer-to-peer technology to support many different forms
@@ -982,7 +987,8 @@
 No work on integrating Storm with current programs (in the spirit of Open
 Hypermedia) has been done so far. It is not clear how far this is possible
 without changing applications substantially, if advantage of our
-implementation of Xanalogical storage is to be taken.  (Vitali [ref] notes
+implementation of Xanalogical storage is to be taken.  
+(Vitali [vitali99versioning]_ notes
 that Xanalogical storage necessiates strong discipline in version tracking,
 which current systems lack.)
 
@@ -998,7 +1004,4 @@
 on structured overlay networks like DHTs.
 
 
-8. References
-=============
-
-XXX
\ No newline at end of file
+.. bibliography:: gzigzag p2p
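
Note on the patched text: the "Block storage" and "Xanalogical storage" sections describe blocks as immutable byte sequences identified by a SHA-1 content hash, and spans as a block id plus an offset and a character count. A minimal Python sketch of that idea follows. The names ``block_id``, ``BlockStore`` and ``resolve_span`` are hypothetical, and using the bare hex digest as the id is an assumption; the patch does not show Storm's actual id format or API.

```python
import hashlib


def block_id(data: bytes) -> str:
    """Return the SHA-1 hex digest of a block's bytes (assumed id format)."""
    return hashlib.sha1(data).hexdigest()


class BlockStore:
    """Hypothetical in-memory store of immutable blocks keyed by content hash."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The id depends only on the content, never on where the block lives.
        bid = block_id(data)
        self._blocks[bid] = data
        return bid

    def get(self, bid: str) -> bytes:
        data = self._blocks[bid]
        # Re-hashing lets a reader verify a block received from an untrusted peer.
        if block_id(data) != bid:
            raise ValueError("block content does not match its id")
        return data


def resolve_span(store: BlockStore, bid: str, offset: int, length: int) -> bytes:
    """Resolve a span reference: (block id, offset of first character, length)."""
    return store.get(bid)[offset:offset + length]


# Usage: store a scroll block, then refer to part of it by span.
store = BlockStore()
scroll = store.put(b"hello world")
span_bytes = resolve_span(store, scroll, 6, 5)
```

Because the id is recomputed on every ``get``, a copy fetched from any peer can be checked without trusting the source, which is the property the thesis text relies on when it says downloads can come from *any* peer.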



