gzz-commits

From: Benja Fallenstein
Subject: [Gzz-commits] manuscripts/storm SCRATCH article.rst
Date: Fri, 24 Jan 2003 09:23:51 -0500

CVSROOT:        /cvsroot/gzz
Module name:    manuscripts
Changes by:     Benja Fallenstein <address@hidden>      03/01/24 09:23:50

Modified files:
        storm          : SCRATCH article.rst 

Log message:
        Move all text to SCRATCH for now, chapter structure

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/SCRATCH.diff?tr1=1.2&tr2=1.3&r1=text&r2=text
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.32&tr2=1.33&r1=text&r2=text

Patches:
Index: manuscripts/storm/SCRATCH
diff -u manuscripts/storm/SCRATCH:1.2 manuscripts/storm/SCRATCH:1.3
--- manuscripts/storm/SCRATCH:1.2       Tue Jan 21 05:09:43 2003
+++ manuscripts/storm/SCRATCH   Fri Jan 24 09:23:50 2003
@@ -1,3 +1,475 @@
+
+Many hypermedia systems place each document and link in the custody
+of a single server, rendering them unusable when connectivity fails.
+For example, on the Web, when the connection to a server fails,
+links to documents on that server generally cannot be followed.
+Open Hypermedia Systems such as Microcosm [ref] store links
+in *linkbases* on specific servers. In the Xanadu 88.1 design [ref],
+documents and links can be cached, but still have a 'home' server
+on which a 'master' copy is located. [XXX look more into it, correct xu terms]
+
+Such a system does not lend itself well to a world
+where servers fail, where clients are not 'always on,' and
+where documents regularly move from computer to computer in the form
+of email attachments. In a system where documents and links
+belong to one server, documents cannot be retrieved and links
+cannot be followed while disconnected.
+[XXX this isn't precise-- it's not true in this form]
+When documents are forwarded by email or downloaded from the Web today,
+links generally break.
+
+We propose a system implementing Xanalogical storage [ref Ted]
+based on global, location-independent identifiers, where
+document contents are named by cryptographic hashes [ref GUID paper].
+We assume an index of all documents and links
+on the local computer and on all systems that can be reached through
+the currently available network connections. Since no centralized
+scheme can scale to such an index of "everything," keeping one
+for the public Internet requires a decentralized,
+self-organizing distributed system (the IPTPS'03 definition
+of "peer-to-peer" [ref]).
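The naming scheme can be sketched in a few lines. This is an illustrative sketch only: the `urn:sha1:` prefix and the in-memory index are our assumptions for the example, not the system's actual format.

```python
import hashlib

def block_id(content: bytes) -> str:
    # The identifier is derived from the content itself, so the
    # same bytes yield the same ID on any machine, with no central
    # naming authority. ("urn:sha1:" is a made-up prefix.)
    return "urn:sha1:" + hashlib.sha1(content).hexdigest()

# Stand-in for the index of everything reachable: here just a dict,
# in reality the local disk plus all currently connected peers.
index = {}

def publish(content: bytes) -> str:
    bid = block_id(content)
    index[bid] = content
    return bid

def fetch(bid: str) -> bytes:
    # Lookup is by identity, not location: it does not matter
    # which machine the content was copied from.
    return index[bid]

doc = publish(b"Don't fence me in")
assert fetch(doc) == b"Don't fence me in"
```

Because the identifier never changes when a block is copied or moved, a reference to it cannot break as long as some reachable store still holds the content.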
+
+We propose to use this same system both for publication on the 'net
+and for storage on a local computer. This means that
+when a local document is published or a public document
+is saved locally, it retains its identifier; as long as a
+document is also accessible through the network, keeping
+a local copy is merely an efficiency measure (a
+permanent cache). Obviously, in such a scheme, as long as a link
+remains accessible, it never breaks merely because the documents
+it refers to have moved to a different location.
+
+The Squirrel system [ref] is a peer-to-peer network of shared
+browser caches, where a web page can be retrieved from
+any computer in a local network that has a copy in its cache.
+Our distributed index for document and link retrieval would
+have this as a natural side effect, except
+that it would also extend to any data that has been permanently
+downloaded onto one of the network's computers.
+
+A system such as the one we propose would necessitate deep changes in applications.
+As Vitali [ref Versioning hypermedia] notes, any implementation
+of Xanalogical storage necessitates this, as "[n]o approximate,
+good-enough solutions" for the management of global identifiers
+"can conceivably be considered acceptable in this case." [#]_
+As such, we cannot meet the minimum definition of an
+open hypermedia system, as given by Davis et al. [ref]:
+we do impose storage of markup on applications, and we cannot
+generally use data created by tools that are not aware of our system.
+
+...
+
+
+
+.. [#] We have built a system for editing Xanalogical text
+   in a non-aware editor such as ``emacs``, attempting to
+   determine the user's changes through structure matching [ref].
+   This seems to work reasonably well for insertions,
+   removals and rearrangements, but is hopeless if the user
+   uses copy&paste between arbitrary documents.
+
+
+
+
+
+
+
+Stuff not yet located in the article :-)
+========================================
+
+(remove-at-will: There are two kinds of mobility related to hypermedia use:)
+
+In today's networked world, data moves freely between computers:
+Text is copied from one document to another, documents
+are moved between folders,
+copied from one computer to another, sent by email,
+independently modified on two computers simultaneously,
+published on a server, moved to a different server, downloaded
+by a client. Yet, every time content is moved, links and references to it break
+in popular hypermedia systems.
+
+Also, computers are used by people who are increasingly mobile
+[psych mobility research -paper], at global range.
+There are even prospects of interplanetary use of digital communications
+[cerf:internetplantary_internet]. This poses challenges for the "freedom"
+(referring to 1.) of data movement, as there are limits to the reach and
+performance of networks (incl-from:antont->ohs-talk/disconnected?).
+Problems include disconnections (due to e.g. link breakages), ...
+
+Thirdly, the number of internetworked computers will increase rapidly, as
+new kinds of devices are connected via different channels to the
+unifying Internet, creating heavy masses of usage. When pieces of content
+are tied to one location, much of the traffic concentrates near a single
+point of failure. Therefore the common URL addressing on the Web may fail ...
+On the other hand: server downtimes etc.
+
+Research has been done in a number of areas to alleviate this problem.
+There are a number of proposals for sharing a web browser's cache
+between users [squirrel].
+... 
+Thompton and De Roure [ref ht'01] propose a peer-to-peer system
+for discovering cached web resources in a mobile, disconnected setting,
+available on the client or other systems connected to it through
+ad-hoc local networks (e.g., wireless).
+
+...
+
+We aim for a solution that will support mobility for all of a user's data
+(everything they'd store in their personal directory). Any pieces of text,
+any document and any collection of documents should be easy to move
+to a different computer, and after modifying the data on both systems,
+it should be easy to bring the two copies back in sync.
+
+Another motivation (right?) for data sharing / data mobility is
+collaboration. Within an organization or a project group, there is often a
+shared file system, e.g. a file server on the local network, so that
+different people do not need personal copies of the data but can work
+(synchronously) on the same items. Often, however, especially when
+crossing organizational boundaries, there is no access to each other's
+file systems even though data is shared. In actual collaboration, where
+several individuals work on the same items -- possibly at the same time --
+the data effectively forks. So, similarly(?) to the situation where a
+particular user has data on different computers, data needs to be kept in
+sync when there is collaboration. [cvs, (perforce, ..)]
+
+Our system allows documents and document content to be freely copied
+without breaking links. As long as a link and the documents
+it links are currently accessible, the link can be shown.
+We achieve this by assigning documents and contents permanent,
+location-independent identifiers, and keeping an efficient (hemppah: should
+we emphasize a *distributed* index ?) index of all data by its identity.
+
+----
+
+This type of system does not lend itself well to a world 
+where servers fail and clients are not permanently 'on.'
+Hypermedia functionality ought to be a service at the 
+operating system level, usable for organizing all data
+a user stores on their system [ref]. It is of course possible
+for a user to run their own, personal linkbase on their client system, ...
+
+In an ideal world, when users move documents between computers,
+links would not break, ...,
+different versions of documents could easily be reconciled,
+(file structure would not be lost). We envision a global identifier space,
+where links are created between global identifiers, and whenever
+any two endpoints of a link are known, this link can be shown.
+
+
+
+
+
+
+Xanalogical storage
+===================
+
+In the xanalogical storage model [cite], pioneered by Project Xanadu [cite],
+links are not between documents but between individual characters.
+When a character is first typed in, it acquires a permanent ID
+("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
+which it retains when copied to a different document, distinguishing
+it from all similar characters typed in independently.
+A link is shown between any two documents containing the characters
+that the link connects. Xanalogical links are external and bidirectional.
+
+In addition to content links, xanalogical storage keeps an index of
+transclusions: identical characters copied into different documents.
+Through this mechanism, the system can show the user all documents
+that share text with the current document.
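As a toy illustration of the transclusion index described above (our own minimal sketch; the character-ID format is invented for the example):

```python
from collections import defaultdict

# Maps each permanent character ID to the set of documents
# that contain a copy of that character.
transclusion_index = defaultdict(set)

def add_document(doc_name, char_ids):
    # Register every character of the document under its
    # permanent ID, which the character keeps when copied.
    for cid in char_ids:
        transclusion_index[cid].add(doc_name)

def documents_sharing_text(char_ids, current_doc):
    # All other documents containing at least one of the
    # current document's characters.
    shared = set()
    for cid in char_ids:
        shared |= transclusion_index[cid]
    shared.discard(current_doc)
    return shared

# "janne/1997-10-08/0" stands in for an ID like
# "the character 'D' typed by Janne Kujala on 10/8/97 8:37:18".
original = ["janne/1997-10-08/%d" % i for i in range(10)]
add_document("draft.txt", original)
add_document("quote.txt", original[3:6])   # a transcluded span
assert documents_sharing_text(original, "draft.txt") == {"quote.txt"}
```

Copying text into a new document carries the character IDs along, so the index immediately relates the copy back to every other document sharing those characters.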
+
+
+
+
+
+Idea/Plan
+=========
+
+[Notes for the authors, not part of the final document
+though text may be moved from here to there.]
+
+Whenever a document moves on the current web, links to it break, 
+be it from an author's computer to a public server,
+from one server to another, from the server to a client,
+or from one personal computer to another. We subsume
+these forms of movement under the term 'data mobility.'
+
+
+Storm goals/benefits:
+
+- Reliability
+  - Append-and-delete-only
+  - The same data can be stored in many locations,
+    allowing it to be easily reconstructed after failure
+  - Versioning: Old versions remain accessible
+- Xanalogical storage
+- If a document is accessible, references to it work
+- Links do not break
+- Easy syncing:
+  - Just copy a bunch of blocks
+  - Documents can be synced & merged
+  - Inter-document structures can be synced & merged
+  - Syncing can be done without merging immediately,
+    leaving two alternative versions current
+    (so e.g. an automated process is entirely possible,
+    even when there are conflicts)
+- Versioning
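The "just copy a bunch of blocks" point can be made concrete with a sketch (ours, under the assumption that a store is simply a mapping from block ID to immutable content):

```python
def sync(store_a, store_b):
    # Copy the blocks each side is missing. Blocks are immutable
    # and content-named, so the same ID always denotes the same
    # bytes and there is nothing to overwrite; divergent versions
    # simply coexist as distinct blocks until merged later.
    for bid in store_a.keys() - store_b.keys():
        store_b[bid] = store_a[bid]
    for bid in store_b.keys() - store_a.keys():
        store_a[bid] = store_b[bid]

laptop  = {"id1": b"v1", "id2": b"v2-edited-on-laptop"}
desktop = {"id1": b"v1", "id3": b"v2-edited-on-desktop"}
sync(laptop, desktop)
assert laptop == desktop  # both divergent versions now on both machines
```

Because no block is ever modified in place, an automated sync can run even when there are conflicts, leaving two alternative current versions side by side.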
+
+
+Grouped differently,
+
+- Reliability (as above)
+- Usability in the face of intermittent connectivity
+  (includes syncing, finding a document if available...)
+- Xanalogical structure 
+  (includes versioning, non-breaking links etc.)
+
+Storm limitations/weaknesses:
+
+- what, actually?
+
+antont ponders: for files storm is ok, but how about:
+- irc? (latency?)
+- video? (throughput)
+
+and:
+.. multipoint live video? (both latency and throughput demands)
+
+* does it make sense to think of irc messages, and/or video frames, as
+datablocks .. or what?
+
+  
+hemppah's comment on the syncing term:
+I'd prefer the term 'replication' instead of 'syncing' when
+updating data to 'the most recent state.' E.g., Lotus Notes uses the
+term replication when one performs locally made updates against
+a centralized server --> 'used within the same system.' The term syncing,
+however, is used when importing/exporting e.g. Nokia Communicator calendar
+data into/from the Lotus Notes calendar --> 'used between different systems.'
+
+
+hemppah: worth mentioning is that Ray Ozzie is the man behind both Lotus
+Notes and Groove; Lotus Notes is based on the client-server model and Groove
+is based on the p2p model --> possible direction etc.?
+
+hemppah: I think we should mention that in Gzz one refers to data in a
+non-hierarchical way, whereas in Notes (and other systems also, references!!)
+we must use a hierarchical way. In Notes the most important IDs are:
+1) every document has a unique identifier, which is unique among all replicas
+of the database
+2) every document/design element has an identifier, called a noteID, which is
+unique in the database, but not among all replicas of the database
+3) every view has a unique identifier, which is unique among all replicas of
+the database
+4) every database has a replica ID, which identifies the database's replicas
+among all databases
+
+So, if we want to refer to a document, we use the format:
+
+replicaID/viewID/documentID
+
+Also, we can refer to the same document through *many different* views
+(analogous to Gzz's dimensions?):
+notes://<server>/replicaID/viewID1/documentID
+notes://<server>/replicaID/viewID2/documentID
+
+Here's a real example:
+Notes://server/D235632D00313587/38D22BF5E8F088348525JK7500129B2C/REWB3FDE0D53807B67C2256CB50026FCVV
+
+For information about IDs in Notes:
+http://www-12.lotus.com/ldd/doc/tools/c/4.5/api45ug.nsf/85255d56004d2bfd85255b1800631684/00d000c1005800c985255e0e00726863?OpenDocument
+
+In Gzz, however, we don't know the location; we know only the *identity* of
+the data we are looking for, as follows:
+
+urn-5:FAB3FDE0hgfD5kkjj3807B67C2256CfsdB50026FC51 
+
+The above is not a *correct* urn-5, but it is very similar to the last part of
+Notes' syntax.
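The difference between the two reference styles can be sketched as follows (a toy comparison of ours; all the IDs are invented):

```python
# Hierarchical reference (Notes-style): each component names a
# container, so the reference breaks if the document is moved
# to another view, replica, or server.
def resolve_hierarchical(servers, server, replica, view, doc):
    return servers[server][replica][view][doc]

# Identity-based reference (Gzz-style): one flat index keyed by
# the document's permanent ID; where the data currently lives
# is irrelevant to the reference itself.
def resolve_by_identity(index, doc_id):
    return index[doc_id]

servers = {"srv": {"rep1": {"view1": {"doc1": "contents"}}}}
index = {"urn-5:made-up-id": "contents"}
assert resolve_hierarchical(servers, "srv", "rep1", "view1", "doc1") == "contents"
assert resolve_by_identity(index, "urn-5:made-up-id") == "contents"
```

In the hierarchical case, every lookup step depends on the data's current location; in the identity-based case, moving the data only changes what the index points at, never the reference.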
+
+benja's reply:
+Hm. Replication to me means that the same data is kept on multiple
+machines. That is not what we are talking about here: we're talking
+about *different versions* of the same data being kept
+on multiple machines and occasionally being 'brought into sync'
+with each other. If I send you a draft article and you comment on it,
+and I make changes too, and later I merge the two divergent
+versions back together, 'syncing' seems approximately right,
+but 'replication' seems completely wrong to me.
+
+(Of course, this is very similar to 'normal' URLs, but our purpose here is to
+give an example of how one refers to a particular data item in a
+collaboration tool like Notes)
+
+In Notes, there are servers which maintain replication of data, in contrast
+to Gzz. What is interesting in Notes' replication is that replicating a
+database replicates not only the *data* but also the design of the data,
+which represents the data.
+Also worth mentioning: even though the data and the design of the data (logic
+etc.) live in the *same* (physical) structure, they are only very loosely
+coupled with each other.
+
+Additionally, we should emphasize how things are moving towards
+non-hierarchical reference models; for instance, Notes (hierarchical) and
+Gzz (non-hierarchical) are both based on the same xanalogical model.
+
+"Usability in the face of intermittent connectivity" is
+more than just mobile applications: It is also copying data
+from one computer to another, where the two computers'
+file systems are not kept in sync through a permanent
+network connection. Hmm, maybe "Usability in the face
+of irregular synchronization" or some such would
+make it clearer?
+
+Ok, let's split that in two:
+
+- Usability in the face of intermittent connectivity
+  (we cannot access data stored on the internet)
+- Usability in the face of non-synchronization
+  (we can have two independent versions of something
+  on two unconnected computers and we can easily
+  synchronize the two versions when desired)
+  
+
+Thus we have four goals which we must express in the article.
+
+...
+
+so: if there is a permanent network connection, are there reasons for
+using this? (i.e. being out of sync in the first place)
+
+one argument is that there is no such thing as "permanent connection" (in
+a way e.g. a hard disk failure can be thought of as a connection breakdown
+to the data on the drive ?)
+
+but of course synchronization might be the way to approach it, just that
+the irregularity is/may_be caused by the, well, intermittent connectivity?
+
+should the goals be derived from the use cases? or perhaps better looked at
+in their light -- take the case of e-mail attachments:
+
+Think of a use case here: email attachments. Between two computers that
+are permanently on the 'net, you *could* replace email attachments by
+"simply" setting up a shared file system between the sender and the
+receiver of the attachment. Then, if the receiver made a modification,
+the sender could even see it immediately. Yet nobody seems to be
+proposing this. The overhead is one thing: why
+set up and maintain a shared file system for every file you send to
+somebody? Privacy is another: you don't want the sender to be able to
+keep track of what the receiver does with the document.
+
+... but don't some senders want to be able to track what the receivers do?
+(and thinking of the web: how do you get a "weblog" of p2p-published data?)
+
+And if you don't set up such a file system, but just send an email 
+attachment, you've got intermittent synchronization between two 
+permanently connected computers.
+
+Another question is whether e-mail attachments should be used to share data
+at all.
+
+It may be natural when you want one specific person (or a small group of
+people) to look at a document.
+...
+> rsync is an intermittent synchronization solution :-) (it's not about
+> shared file systems, but synchronizing two copies of a file system   
+> intermittently).
+
+from:erno
+"firewalls" should be in end systems. ipsec was originally meant to
+be run in transport mode and provide end to end security, to
+work nicely with the ip philosophy.
+| >  > i prefer URI:s.
+| i.e. with today's technology, i prefer to put documents on the web or some
+| (other) filesystem within the recipients' reach.
+aren't these two orthogonal? in one case you get a copy of the
+document, in another case you get a (weak) reference to the document.
+i like the "disconnected operation" and loose caching semantics.
+another thing about filesystems, it's not a very precise concept...
+there are several components/aspects.
+(replace "is" with "can be" according to taste:)
+ * it's a namespace
+ * it's a wire protocol for conducting operations on an object
+ * it's an access control system
+ * it's a way for people to collaborate
+ * it's a locking protocol
+ * it's a tracking facility (access_log)
+  ....
+> but i still want to have the option of really making sure
+> i have a local copy of a document sitting on my disk, instead
+> of in a local cache that will be flushed before long.
+sure. and i want to have the option that, if any of the four or five machines
+at the studio suffer data loss, work would not be lost, but there are
+backups (also for situations when a machine is off-line etc.), within the
+limited resources of course.
+>   -- erno
+~Toni
+> what will people do when they have 3200GB disks and realise
+> they only have useful data to fill 10% of that but would
+> like more reliability?
+OTOH: already in the "grey economy," shares of increasing size are required
+to participate in the network, i.e. you get the movies you want if you
+provide lots of what others need. and if you don't have any, you're out.
+but this is something gzz wants to avoid, too.
+
+
+
+References:
+
+- CFS (The Chord project's Cooperative File System)
+- Coda ("an advanced network filesystem", http://www.coda.cs.cmu.edu/)
+- The Internet Backplane Protocol
+- Issue about nomadicity, Communications of the ACM (Sep 2001)
+  - Note: Support for nomadicity includes scaling of resource
+    usage (e.g. bandwidth availability), which we're
+    less concerned with here.
+  - Future work needed: Currently, the Storm interfaces
+    do not provide information about underlying network conditions,
+    thus we can't e.g. show to the user what blocks
+    are available w/o network lookup and which aren't.
+- Delay-tolerant networks (http://dtnsig.org/)
+- OceanStore
+- Lifestreams, because they are a project believing
+  that the network [probably, a server] should hold
+  the users' data, not the terminal they access it through
+  - Possible reference here: IMAP?
+- The IPTPS'03 call for papers' definition of peer-to-peer
+- P2P Working group: definition of peer-to-peer
+- Hypermedia by coincidence, Thompton et al (HT'01)
+- Freenet, Free Haven et al
+- The pointer problem: CFS, OceanStore
+- A commercial p2p-based collaboration tool: Groove
+  (http://erwin.dstc.edu.au/Herring/GrooveAnalysis-SCI2001.pdf)
+- Hypermedia implementations with non-breaking links: Microcosm, Hyper-G
+- Open Hypermedia, data and link servers? (links not in documents!)
+       * term in the hypermedia engineering -book: link management
+- (but not structural computing. hypertext functionality? not really.)
+- Persistent storage: the first 13 years (dig bookmark from tolp42:galeon)
+- primitive sync stuff: rsync (ssync?), SyncML?
+- about ibm corporate p2p over http:
+ http://www.almaden.ibm.com/cs/people/bayardo/userv/
+ http://www.almaden.ibm.com/cs/people/bayardo/userv/userv.html
+
+P. Druschel and A. Rowstron. PAST: A large-scale, persistent peer-to-peer
+storage utility. In Eighth IEEE Workshop on Hot Topics in Operating
+Systems (HotOS-VIII), Schloss Elmau, Germany, May 20-23, 2001, pages 75-80.
+IEEE Computer Society Press, 2001.
+
+- Squirrel: a decentralized peer-to-peer web cache
+- Feasibility of a Serverless Distributed File System Deployed on an
+  Existing Set of Desktop PCs
+- Distributed File Systems: Concepts and Examples ('de facto article on DFS')
+- Lotus Notes:
+  ftp://ftp.lotus.com/pub/lotusweb/product/notes/G325-2061-00_j.pdf
+- Open Hypermedia in a Peer-to-Peer Context
+- Peer-to-Peer Hypertext
+- Publius and Tangler publishing systems
+- Mnet (ancestor: Mojo Nation), mnet.sourceforge.net
+- Open problems in Data-Sharing Peer-to-Peer Systems
+- Semantic Overlay Networks for P2P systems (a p2p 'version' of xanalogical
+  transclusions)
+- Farsite Project (similar to CFS, PAST and Oceanstore)
+
+There was a good paper and demo about synchronous collaboration in ht01, but
+is that out of scope here? (see pondering on limitations)
+
+
+Pasted from IRC (Finnish)
+=========================
+
 <hemppah:#gzz> antont: by the way, we need to give concrete examples of
 collaboration in our article
 >#gzz> hemppah: sure. I was thinking of use cases
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.32 manuscripts/storm/article.rst:1.33
--- manuscripts/storm/article.rst:1.32  Wed Jan 22 16:26:49 2003
+++ manuscripts/storm/article.rst       Fri Jan 24 09:23:50 2003
@@ -2,472 +2,33 @@
 Don't fence me in: Supporting data mobility through the Gzz/Storm design
 ========================================================================
 
-Introduction
-============
+1. Introduction
+===============
 
-Many hypermedia systems place each document and link in the custody
-of one server, rendering it unusable when connectivity fails.
-For example, on the web, when connection to a server fails,
-links to documents on this server can generally not be followed.
-Open Hypermedia Systems such as Microcosm [ref] store links
-in *linkbases* on specific servers. In the Xanadu 88.1 design [ref],
-documents and links can be cached, but still have a 'home' server
-on which a 'master' copy is located. [XXX look more into it, correct xu terms]
+2. Block storage
+================
 
-Such a system does not lend itself well to a world
-where servers fail, where clients are not 'always on,' and
-where documents regularly move from computer to computer in the form
-of email attachments. In a system where documents and links
-belong to one server, in a disconnected state documents
-cannot be retrieved and links cannot be followed.
-[XXX this isn't precise-- it's not true in this form]
-When documents are forwarded per email or downloaded from the Web today,
-links generally break.
+3. Xanalogical storage
+======================
 
-We propose a system implementing Xanalogical storage [ref Ted]
-based on global, location-independent identifiers, where
-document contents are named by cryptographical hashes [ref GUID paper].
-Assuming an index of all documents and links
-on the local computer and all systems that can be reached through
-the currently available network connections. Since no centralized
-scheme can scale to such an index of "everything," keeping one
-for the public Internet requires a decentralized,
-self-organizing distributed system (the IPTPS'03 definition
-of "peer-to-peer" [ref]).
+4. Indexing
+===========
 
-We propose to use this same system for publication on the 'net
-and storage on a local computer. This means that
-when a local document is published or a public document
-is saved locally, it retains its identifier; as long as a
-document is also accessible through the network, keeping
-a local copy is merely an efficiency measure (keeping
-a permanent cache). Obviously, in such scheme, as long as link
-remains accessible, it never breaks just because the documents
-it refers to are moved to a different location.
+5. Versioning
+=============
 
-The Squirrel system [ref] is a peer-to-peer network of shared
-browser caches, where a web page can be retrieved from
-any computer in a local network that has a copy in its cache.
-Our distributed index for document and link retrieval would
-have this as a natural side-effect, except
-that it would also extend any data that has been permanently
-downloaded on one of the network's computers.
+5.1. Pointers
+-------------
 
-A system as we propose would necessiate deep changes in applications.
-As Vitali [ref Versioning hypermedia] notes, any implementation
-of Xanalogical storage necessiates this, as "[n]o approximate, 
-good-enough solutions" for the management of global identifiers
-"can conceivably be considered acceptable in this case." [#]_
-As such, we are cannot meet the minimum definition of an
-open hypermedia system, as given by Davis et al [ref]:
-We do impose storage of markup on applications and we cannot
-generally use data created by tools that are not aware of our system.
+5.2. Diffs
+----------
 
-...
+6. Peer-to-peer implementations
+===============================
 
+7. Experience and future directions
+===================================
 
+8. Conclusions
+==============
 
-.. [#] We have built a system for editing Xanalogical text
-   in a non-aware editor such as ``emacs``, attempting to
-   determine the user's changes through structure matching [ref].
-   This seems to work reasonably well for insertions,
-   removals and rearrangements, but is hopeless if the user
-   uses copy&paste between arbitrary documents.
-
-
-
-
-
-
-
-Stuff not yet located in the article :-)
-========================================
-
-(remove-at-will: There are two kinds of mobility related to hypermedia use:)
-
-In today's networked world, data moves freely between computers:
-Text is copied from one document to another, documents
-are moved between folders,
-copied from one computer to another, sent by email,
-independently modified on two computers simultaneously,
-published on a server, moved to a different server, downloaded
-by a client. Yet, every time content is moved, links and references to it break
-in popular hypermedia systems.
-
-Also, computers are used by people who are increasingly mobile 
-[psych mobility research -paper], on global range.
-There are even prospects of interplanetary use of digital communications
-[cerf:internetplantary_internet]. This sets challenges for the "freedom"
-(referring to 1.) of data movement, as there are limits to the reach and
-performance of the networks (incl-from:antont->ohs-talk/disconnected?).
-Problems include disconnections (due to e.g. breakages), ...
-
-Thirdly, the amount of internetworked computers will increase rapidly, as
-new kinds of devices are being connected via different channels to the
-unifying Internet, creating heavy masses of usage. When pieces of content
-are tied to a location, a lot of traffick (tiivistyy) near a single point
-of
-failure. Therefore the common URL addressing on the Web may fail ...
-OTOH, yet: server downtimes etc.
-
-Research has been done in a number of areas to alleviate this problem.
-There are a number of proposals for sharing a web browser's cache
-between users [squirrel].
-... 
-Thompton and De Roure [ref ht'01] propose a peer-to-peer system
-for discovering cached web resources in a mobile, disconnected setting,
-available on the client or other systems connected to it through
-ad-hoc local networks (e.g., wireless).
-
-...
-
-We aim for a solution that will support mobility for all of a user's data
-(everything they'd store in their personal directory). Any pieces of text,
-any document and any collection of documents should be easy to move
-to a different computer, and after modifying the data on both systems,
-it should be easy to bring the two copies back in sync.
-
-Another motivation (right?) for the data sharing / data mobility is
-collaboration. Within an organization or a project group, there often is a
-shared file system, e.g. a file server in the local network, so that
-different people do not need their personal copies of the data but can work
-(synchronously) on the same items. Often, however, especially when
-crossing organizationary boundaries there is no access each-others'
-filesystems even though data is shared. In actual collaboration, where
-several individuals work on the same items -- possibly at the sametime --
-the data effectually forks. So similarly(?) to the the situation where a
-particular user has data on different computers, data needs to be kept in
-sync when there is collaboration. [cvs, (perforce, ..)]
-
-Our system allows documents and document content to be freely copied
-without breaking links. As long as a link and the documents
-it links are currently accessible, the link can be shown. 
-We archieve this by assigning documents and contents permanent,
-location-independent identifiers, and keeping an efficient (hemppah: should 
-we emphasize a *distributed* index ?) index of all data by its identity.
-
-----
-
-This type of system does not lend itself well to a world 
-where servers fail and clients are not permanently 'on.'
-Hypermedia functionality ought to be a service at the 
-operating system level, usable for organizing all data
-a user stores on their system [ref]. It is of course possible
-for a user to run an own, personal linkbase on their client system,
-
-In an ideal world, when users move documents between computers,
-links would not break, ...,
-different versions of documents could easily be reconciled,
-(file structure would not be lost). We envision a global identifier space,
-where links are created between global identifiers, and whenever
-any two endpoints of a link are known, this link can be shown.
-
-
-
-
-
-
-Xanalogical storage
-===================
-
-In the xanalogical storage model [cite], pioneered by Project Xanadu [cite],
-links are not between documents, but individual characters.
-When a character is first typed in, it acquires a permanent ID
-("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
-which it retains when copied to a different document, distinguishing
-it from all similar characters typed in independently.
-A link is shown between any two documents containing the characters
-that the link connects. Xanalogical links are external and bidirectional.
-
-In addition to content links, xanalogical storage keeps an index of
-transclusions: identical characters copied into different documents.
-Through this mechanism, the system can show to the user all documents
-that share text with the current document.
-
-
-
-
-
-Idea/Plan
-=========
-
-[Notes for the authors, not part of the final document
-though text may be moved from here to there.]
-
-Whenever a document moves on the current web, links to it break, 
-be it from an author's computer to a public server,
-from one server to another, from the server to a client,
-or from one personal computer to another. We subsume
-these forms of movement under the term 'data mobility.'
-
-
-Storm goals/benefits:
-
-- Reliability
-  - Append-and-delete-only
-  - The same data can be stored in many locations,
-    allowing it to be easily reconstructed after failure
-  - Versioning: Old versions remain accessible
-- Xanalogical storage
-- If a document is accessible, references to it work
-- Links do not break
-- Easy syncing:
-  - Just copy a bunch of blocks
-  - Documents can be synced & merged
-  - Inter-document structures can be synced & merged
-  - Syncing can be done without merging immediately,
-    leaving two alternative versions current
-    (so e.g. an automated process is entirely possible,
-    even when there are conflicts)
-- Versioning
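The "just copy a bunch of blocks" point can be sketched with a toy content-addressed, append-only store (the class and method names are ours, not Storm's actual interface):

```python
import hashlib

class BlockStore:
    """Toy append-only block store keyed by content hash."""
    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        """Store data under its content hash; returns the block ID."""
        block_id = hashlib.sha1(data).hexdigest()
        self.blocks[block_id] = data  # append-only: IDs never reassigned
        return block_id

    def get(self, block_id: str) -> bytes:
        return self.blocks[block_id]

    def sync_from(self, other):
        """Syncing two stores is just copying the blocks we are missing."""
        for block_id, data in other.blocks.items():
            self.blocks.setdefault(block_id, data)
```

Because a block's ID is a pure function of its content, two divergent versions simply coexist as different blocks after a sync; merging can happen later, or not at all, and old versions remain accessible.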
-
-
-Grouped differently,
-
-- Reliability (as above)
-- Usability in the face of intermittent connectivity 
-  (includes syncing, finding a document if available...)
-- Xanalogical structure 
-  (includes versioning, non-breaking links etc.)
-
-Storm limitations/weaknesses:
-
-- what, actually?
-
-antont ponders: for files storm is ok, but how about:
-- irc? (latency?)
-- video? (throughput)
-
-and:
-.. multipoint live video? (both latency and throughput demands)
-
-* does it make sense to think of irc messages, and/or video frames, as
-datablocks .. or what?
-
-  
-hemppah's comment on the syncing term:
-I'd prefer the term 'replication' instead of 'syncing' when
-updating data to 'the most recent state'. E.g. Lotus Notes uses the term
-replication when one performs locally made updates into
-a centralized server --> 'used within the same system'. The term syncing,
-however, is used when importing/exporting e.g. Nokia Communicator calendar
-data into/from Lotus Notes calendar --> 'used between different systems'.
-
-
-hemppah: worth mentioning is that Ray Ozzie is the man behind both Lotus
-Notes and Groove; Lotus Notes is based on the client-server model and Groove
-on the p2p model --> possible direction etc.?
-
-hemppah: I think we should mention that in Gzz one refers to data in a
-non-hierarchical way, whereas in Notes (and other systems too, references!!)
-we must use a hierarchical way. In Notes the most important IDs are:
-1) every document has a unique identifier, which is unique among all replicas
-of the database
-2) every document/design element has an identifier, named noteID, which is
-unique in the database, but not among all replicas of the database
-3) every view has a unique identifier, which is unique among all replicas of
-the database
-4) every database has a replica ID, which identifies the database's replicas
-among all databases
-
-So, if we want to refer to a document, we use the format:
-
-replicaID/viewID/documentID
-
-Also, we can refer to the same document through *many different* views
-(analogous to Gzz's dimensions?):
-notes://<server>/replicaID/viewID1/documentID
-notes://<server>/replicaID/viewID2/documentID
-
-Here's a real example:
-Notes://server/D235632D00313587/38D22BF5E8F088348525JK7500129B2C/REWB3FDE0D53807B67C2256CB50026FCVV
-
-For information about IDs in Notes:
-http://www-12.lotus.com/ldd/doc/tools/c/4.5/api45ug.nsf/85255d56004d2bfd85255b1800631684/00d000c1005800c985255e0e00726863?OpenDocument
-
-In Gzz, however, we don't know the location; we know only the *identity* of
-the data we are looking for, as follows:
-
-urn-5:FAB3FDE0hgfD5kkjj3807B67C2256CfsdB50026FC51 
-
-The above is not a *correct* urn-5, but very similar to the last part of
-Notes' syntax.
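A toy resolver makes the contrast concrete (the function names and identifiers below are hypothetical, for illustration only):

```python
def resolve_by_location(ref: str, servers: dict):
    """Notes-style: the reference names a server path; if that
    server is unreachable, resolution fails."""
    # ref looks like 'notes://server/replicaID/viewID/documentID'
    host = ref.split("/")[2]
    db = servers.get(host)
    return db.get(ref) if db is not None else None

def resolve_by_identity(ref: str, peers: list):
    """Gzz/urn-5-style: the reference names only an identity;
    *any* reachable peer holding the data can answer."""
    for peer in peers:
        if ref in peer:
            return peer[ref]
    return None
```

The identity-based lookup succeeds as long as some reachable peer holds the data, whereas the location-based one is tied to one named server.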
-
-benja's reply:
-Hm. Replication to me means, the same data is kept on multiple
-machines. This is not what we are talking about here: We're talking
-about *different versions* of the same data being kept
-on multiple machines, and occasionally being 'brought into sync'
-with each other. If I send you a draft article and you comment on it,
-and I make changes too, and later I merge the two divergent
-versions back together, 'syncing' seems approximately right,
-but 'replication' seems completely wrong to me.
-
-(Of course, this is very similar to 'normal' URLs, but our purpose here is
-to give an example of how one refers to a particular data item in a
-collaboration tool like Notes)
->>>>>>> 1.28
-
-In Notes, there are servers which maintain replication of data, opposite to
-Gzz. What is interesting in Notes' replication is that replicating a
-database replicates not only the *data* but also the design of the data,
-which represents the data. Worth mentioning is also that even though the
-data and the design of the data (logic etc.) are in the *same* (physical)
-structure, they are very loosely coupled with each other.
-
-Additionally, we should emphasize how things are moving towards
-non-hierarchical reference models, for instance Notes (hierarchical) and
-Gzz (non-hierarchical), which are both based on the same xanalogical model.
-
-"Usability in the face of intermittent connectivity" is
-more than just mobile applications: It is also copying data
-from one computer to another, where the two computers'
-file systems are not kept in sync through a permanent
-network connection. Hmm, maybe "Usability in the face
-of irregular synchronization" or some such would
-make it clearer?
-
-Ok, let's split that in two:
-
-- Usability in the face of intermittent connectivity
-  (we cannot access data stored on the internet)
-- Usability in the face of non-synchronization
-  (we can have two independent versions of something
-  on two unconnected computers and we can easily
-  synchronize the two versions when desired)
-  
-
-Thus we have four goals which we must express in the article.
-
-...
-
-so: if there is a permanent network connection, are there reasons for
-using this? (i.e. being out of sync in the first place)
-
-one argument is that there is no such thing as "permanent connection" (in
-a way e.g. a hard disk failure can be thought of as a connection breakdown
-to the data on the drive ?)
-
-but of course synchronization might be the way to approach it, just that
-the irregularity is/may_be caused by the, well, intermittent connectivity?
-
-should the goals be derived from the use cases? or perhaps better looked in
-their light ("niiden valossa") -- take the case of e-mail attachments:
-
-Think of a use case here: email attachments. Between two computers that 
-are permanently on the 'net, you *could* replace email attachments by 
-"simply" setting up a shared file system between the sender and the 
-receiver of the attachment. Then, if the receiver made a modification, 
-the sender could even see it immediately. Yet nobody seems to be 
-proposing this. The overhead is one thing. Why 
-set up and maintain a shared file system for every file you send to 
-somebody? Privacy is another. You don't want the sender being able to
-keep track of what the receiver does with the document.
-
-... but don't some senders want to be able to track what the receivers do?
-(and thinking of the web: how do you get a "weblog" of p2p-published data?)
-
-And if you don't set up such a file system, but just send an email 
-attachment, you've got intermittent synchronization between two 
-permanently connected computers.
-
-Another question is whether e-mail attachments should be used to share data
-at all.
-
-It may be natural when you want one specific person (or a small group of
-people) to look at a document.
-...
-> rsync is an intermittent synchronization solution :-) (it's not about
-> shared file systems, but synchronizing two copies of a file system   
-> intermittently).
-
-from:erno
-"firewalls" should be in end systems. ipsec was originally meant to
-be run in transport mode and provide end to end security, to
-work nicely with the ip philosophy.
-| >  > i prefer URI:s.
-| i.e. with today's technology, i prefer to put documents on the web or some
-| (other) filesystem within the recipients' reach.
-aren't these two orthogonal? in one case you get a copy of the
-document, in another case you get a (weak) reference to the document.
-i like the "disconnected operation" and loose caching semantics.
-another thing about filesystems, it's not a very precise concept...
-there are several components/aspects.
-(replace "is" with "can be" according to taste:)
- * it's a namespace
- * it's a wire protocol for conducting operations on an object
- * it's an access control system
- * it's a way for people to collaborate
- * it's a locking protocol
- * it's a tracking facility (access_log)
-  ....
-> but i still want to have the option of really making sure
-> i have a local copy of a document sitting on my disk, instead
-> of in a local cache that will be flushed before long.
-sure. and i want to have an option, that if any of the four-five machines
-at the studio suffer data loss, work would not be lost, but there's
-backups (also for situations when a machine is off-line etc.), within the
-limited resources of course.
->   -- erno
-~Toni
-> what will people do when they have 3200GB disks and realise
-> they only have useful data to fill 10% of that but would
-> like more reliability?
-OTOH: already in the "grey economy", shares of increasing size are required
-to participate in the network, i.e. you get the movies you want if you
-provide lots of what others need. and if you don't have any, you're out.
-but this is something gzz wants to avoid, too.
-
-
-
-References:
-
-- CFS (The Chord project's Cooperative File System)
-- Coda ("an advanced network filesystem", http://www.coda.cs.cmu.edu/)
-- The Internet Backplane Protocol
-- Issue about nomadicity, Communications of the ACM (Sep 2001)
-  - Note: Support for nomadicity includes scaling of resource
-    usage (e.g. bandwidth availability), which we're
-    less concerned with here.
-  - Future work needed: Currently, the Storm interfaces
-    do not provide information about underlying network conditions,
-    thus we can't e.g. show to the user what blocks
-    are available w/o network lookup and which aren't.
-- Delay-tolerant networks (http://dtnsig.org/)
-- OceanStore
-- Lifestreams, because they are a project believing
-  that the network [probably, a server] should hold
-  the users' data, not the terminal they access it through
-  - Possible reference here: IMAP?
-- The IPTPS'03 call for papers' definition of peer-to-peer
-- P2P Working group: definition of peer-to-peer
-- Hypermedia by coincidence, Thompton et al (HT'01)
-- Freenet, Free Haven et al
-- The pointer problem: CFS, OceanStore
-- A commercial p2p-based collaboration tool: Groove
-  (http://erwin.dstc.edu.au/Herring/GrooveAnalysis-SCI2001.pdf)
-- Hypermedia implementations with non-breaking links: Microcosm, Hyper-G
-- Open Hypermedia, data and link servers? (links not in documents!)
-       * term in the Hypermedia Engineering book: link management
-- (but not structural computing. hypertext functionality? not really.)
-- Persistent storage: the first 13 years (dig bookmark from tolp42:galeon)
-- primitive sync stuff: rsync (ssync?), SyncML?
-- about ibm corporate p2p over http:
- http://www.almaden.ibm.com/cs/people/bayardo/userv/
- http://www.almaden.ibm.com/cs/people/bayardo/userv/userv.html
-
-- P. Druschel and A. Rowstron. PAST: A large-scale, persistent peer-to-peer
-  storage utility. In Eighth IEEE Workshop on Hot Topics in Operating
-  Systems (HotOS-VIII), May 20-23, 2001, Schloss Elmau, Germany, pages
-  75-80. IEEE Computer Society Press, 2001.
-
-- Squirrel: a decentralized peer-to-peer web cache
-- Feasibility of a Serverless Distributed File System Deployed on an
-  Existing Set of Desktop PCs
-- Distributed File Systems: Concepts and Example ('de facto article on DFS')
-- Lotus Notes:
-  ftp://ftp.lotus.com/pub/lotusweb/product/notes/G325-2061-00_j.pdf
-- Open Hypermedia in a Peer-to-Peer Context
-- Peer-to-Peer Hypertext
-- Publius and Tangler publishing systems
-- Mnet (ancestor: Mojo Nation), mnet.sourceforge.net
-- Open problems in Data-Sharing Peer-to-Peer Systems
-- Semantic Overlay Networks for P2P systems (a p2p 'version' of xanalogical
-  transclusions)
-- Farsite Project (similar to CFS, PAST and Oceanstore)
-
-There was a good paper and demo about synchronous collaboration in ht01, but
-is that out of scope here? (see pondering on limitations)



