gzz-dev

[Gzz] PEG about simplifying xanalogical text


From: Benja Fallenstein
Subject: [Gzz] PEG about simplifying xanalogical text
Date: Mon, 17 Feb 2003 19:10:41 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9

=======================================================
``xu_text--benja``: Use random ids for Xanalogical text
=======================================================

:Author:        Benja Fallenstein
:Date:          2003-02-17
:Revision:      $Revision: 1.1 $
:Last-Modified: $Date: 2003/02/16 20:08:57 $
:Type:          Architecture
:Scope:         Major
:Status:        Current


This PEG proposes two things: First, use random ids
instead of Storm blocks for xu text; second, make
text enfilades a data type disjoint from "media"
(images, PDF, etc.).

Storm is good for saving space when transcluding "media"
in multiple places by referring to a single block
(this is what it was originally conceived for, after all).
For text, it doesn't work as well: it gains little
(a text span often contains fewer characters than the
block's id), it costs a lot (many text blocks must be loaded
to assemble all the text), and it makes things complex
(bowdlerization when publishing, and having to save the text
block before saving a space in order to get the text block's id).

Let's drop text blocks.

Second, XML-based formats (as well as e.g. YAML or RDF)
support plain-text strings, but not images and the like embedded
in the strings; they refer to images through URIs
(but not to text, which they include directly).
We would like to enhance XML-based formats with Xanalogical
text and images. For text, it makes sense to have special tags;
for images, it makes sense to refer to them by Storm URI,
using the current URI-based transclusion mechanisms.

Giving Xanalogical identities to the character content in an XML file
seems like a much less intrusive change than using an enfilade format
which can include e.g. audio as well as characters, so that you could
say ``<tag>`` *10000 audio units* ``</tag>``.

The two proposals are in one PEG, because they make more sense
together; if we don't use the same enfilade class for text
and media, it doesn't seem so strange to handle text through
a very different mechanism (random ids rather than Storm blocks).


Changes
=======


Xu text model
-------------

Text spans are (uri, offset, string) triples.

For example, ("urn:urn-5:...", 17, "foo") would be a span
with offset 17 and length 3 in the 'block' ``urn:urn-5:...``,
containing the string "foo". A xanalogical plaintext would simply
be a list of such triples. (Benefit: Looking at the serialization
of one, you could actually see the text it contains.)
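
As a rough illustration of this model (a sketch only, not part of the
proposal's interface), a span could be represented as below. The names
``TextSpan`` and ``plain_text`` and the ``urn:example:...`` ids are
made up for the example.

```python
from typing import List, NamedTuple


class TextSpan(NamedTuple):
    """One xanalogical text span: a (uri, offset, string) triple."""
    uri: str     # id of the 'block' the characters belong to
    offset: int  # position of the first character within that block
    string: str  # the characters themselves (part of the span's identity)

    @property
    def end(self) -> int:
        # Exclusive end position of the span within its block.
        return self.offset + len(self.string)


def plain_text(spans: List[TextSpan]) -> str:
    """Read the visible text straight off a list of spans."""
    return "".join(span.string for span in spans)


# Placeholder ids; real ids would be random URIs.
doc = [
    TextSpan("urn:example:a", 17, "foo"),
    TextSpan("urn:example:b", 0, " bar"),
]
```

Because the string is one of the tuple's fields, two spans with the same
uri and offset but different strings compare as unequal, which matches
the identity rule described next.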

The 'string' component is part of the identity. That is,
("urn:urn-5:...", 17, "bar") is a *different* span from the above.
Finding transclusions thus becomes a three-step process:

1. Find spans with the same URI (character-for-character).
2. Among these, find spans whose range overlaps with the searched range.
3. Check that the strings match in the overlapping range.
   (Discard spans that don't match.)

Step 3 is new. Checking strings for equality should be reasonably
fast, much faster than, say, rendering the transclusions found
through this mechanism.
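
The three steps could be sketched as follows. This is only an
illustration under assumptions: ``TextSpan`` and ``find_transclusions``
are hypothetical names, and the linear scan over all candidates stands
in for whatever index by URI an actual implementation would use.

```python
from typing import Iterable, List, NamedTuple


class TextSpan(NamedTuple):
    uri: str
    offset: int
    string: str


def find_transclusions(query: TextSpan,
                       candidates: Iterable[TextSpan]) -> List[TextSpan]:
    """Return the candidate spans that transclude part of `query`."""
    q_start = query.offset
    q_end = query.offset + len(query.string)
    found = []
    for span in candidates:
        # Step 1: the URIs must be equal, character for character.
        if span.uri != query.uri:
            continue
        # Step 2: the ranges must overlap (end positions are exclusive).
        start = max(q_start, span.offset)
        end = min(q_end, span.offset + len(span.string))
        if start >= end:
            continue
        # Step 3: the strings must agree inside the overlapping range.
        q_part = query.string[start - q_start:end - q_start]
        s_part = span.string[start - span.offset:end - span.offset]
        if q_part == s_part:
            found.append(span)
    return found


query = TextSpan("urn:example:a", 17, "foo")
candidates = [
    TextSpan("urn:example:a", 18, "oob"),  # overlaps 18..20 and agrees
    TextSpan("urn:example:a", 17, "bar"),  # overlaps but strings differ
    TextSpan("urn:example:b", 17, "foo"),  # different URI
    TextSpan("urn:example:a", 20, "xyz"),  # same URI, no overlap
]
matches = find_transclusions(query, candidates)
```

Here only the first candidate survives all three steps.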

Finding links is similar, since it is defined in terms of
finding transclusions.


Interface
---------

I'd say, let's have enfilades only for text, for now. (We may
want them for video and audio later, or maybe we won't, but we
only need them for text right now.) To refer to images or PDFs,
refer to the Storm URI (once it's registered).

If we move to RDF, we could transclude a PDF as follows:
Create a node to represent the transcluded PDF; refer to the block
through a 'load-from' property (the block is an RDF node, by virtue
of having a URI); if you don't want to transclude the whole block,
also give the page number(s) and coordinates you want to transclude
as other properties.

Then we don't need a special index for finding the transclusions
of the PDF inside our space. We can find all transclusions
of a Storm block by going to the block's node and following
the 'load-from' property backwards.
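
A minimal sketch of that backwards lookup, assuming the space is held
as a plain set of (subject, property, object) triples. The property
URIs, the node and block ids, and the function name are all
placeholders for the example, not names defined by this PEG.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Hypothetical property URIs, for illustration only.
LOAD_FROM = "urn:example:load-from"
PAGE = "urn:example:page"


def transclusions_of(block_uri: str, triples: Set[Triple]) -> List[str]:
    """Follow 'load-from' backwards: which nodes load from this block?"""
    return sorted(
        subject
        for (subject, prop, obj) in triples
        if prop == LOAD_FROM and obj == block_uri
    )


space = {
    ("urn:example:node1", LOAD_FROM, "urn:example:pdf-block"),
    ("urn:example:node1", PAGE, "3"),
    ("urn:example:node2", LOAD_FROM, "urn:example:pdf-block"),
    ("urn:example:node3", LOAD_FROM, "urn:example:other-block"),
}
nodes = transclusions_of("urn:example:pdf-block", space)
```

The lookup uses nothing beyond the ordinary triples of the space, which
is the point made above: no separate transclusion index is involved.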

\- Benja




