lynx-dev

Re: LYNX-DEV Lynx vs XML?


From: Christopher R. Maden
Subject: Re: LYNX-DEV Lynx vs XML?
Date: Wed, 12 Mar 1997 14:58:54 GMT

[Larry Virden]
> I am just starting to find serious articles discussing the next
> generation markup language, XML, around in various public journals.
> I was wondering if anyone on this mailing list has begun thinking
> about the potential impacts this may have on lynx.

I've been spending a good portion of my waking hours thinking about
it.  I've been heavily involved in the development of XML, and Lynx
has been near the top of my thoughts with it.

Unfortunately, I'm not sure how Lynx will be able to cope.  The HTML
parser is buried so deep in error recovery routines and treeless
parsing that I don't think it's adaptable.  I don't think I understand
the Lynx code thoroughly enough to make a suggestion with complete
confidence, but here goes.  I believe that right now, Lynx turns the
text/html MIME type into an internal signal to process HTML using its
various routines for doing so.  I think what should be done is to add
another internal signal for text/xml or application/xml.  The parser
behind that signal should be a straight-ahead tree-based parser.
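
As a rough illustration of the dispatch idea above, here is a minimal
sketch in Python rather than in Lynx's own C.  Every name in it
(HANDLERS, dispatch, parse_html, parse_xml) is hypothetical and does
not correspond to anything in the Lynx source; the two parsers are
stand-ins that only report which path was chosen.

```python
from typing import Callable, Dict


def parse_html(body: str) -> str:
    # Stand-in for the existing error-tolerant, treeless HTML path.
    return "html-handler"


def parse_xml(body: str) -> str:
    # Stand-in for a new, strict, tree-building XML parser.
    return "xml-handler"


# Hypothetical table mapping a MIME type to its "internal signal".
HANDLERS: Dict[str, Callable[[str], str]] = {
    "text/html": parse_html,
    "text/xml": parse_xml,
    "application/xml": parse_xml,
}


def dispatch(content_type: str, body: str) -> str:
    # Drop parameters such as "; charset=..." and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(mime)
    if handler is None:
        raise ValueError(f"no handler for {mime!r}")
    return handler(body)
```

The point of the table is that the XML path is added alongside the
HTML path instead of being threaded through its error recovery.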

No one is going to be using XML for human-readable documents until the
stylesheet specification is done around the end of this year, so we
have some time.  Eventually, here's what Lynx will need to do:

a)  Parse the XML declaration:
    <?XML version="1.0" encoding="iso8859-1" rmd="INTERNAL"?>

b1) If rmd="none", recognize and skip the doctype declaration, if
    present.
b2) If rmd="internal", parse the doctype declaration for entity
    declarations.
b3) If rmd="all", parse the doctype declaration for internal entity
    declarations, then for the external subset, which must be fetched
    and parsed for entity declarations as well.
bx) The linking specification is still in draft.  However, whatever
    part of the doctype declaration is required by the XML declaration
    must also be parsed for hypertext link identification.  This
    information will assert that certain elements map to certain XML
    link constructs.  That mapping will be loaded into a table.

c)  Identify and fetch the stylesheet(s) associated with the document.
    There is currently no mechanism specified for doing this - I'm
    making a proposal in about five minutes, though.  Lynx will have
    to learn to parse DSSSL[*], most likely.  A simplified subset,
    called dsssl-o, is likely to be the stylesheet language for XML.

d)  Parse and render the document, using the linking table and the
    stylesheet.  External entities can probably be left as links, only
    downloaded when the user requests them.  Can the HT structure be
    rebuilt on the fly when that happens?  I don't think it would be
    unreasonable to always treat external entities as an integer
    number of lines - i.e., only whole lines would ever be inserted
    into the rendered view.
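
Steps (a) and (b1) through (b3) can be sketched as follows, again in
Python and only as an illustration.  The sketch assumes the draft-era
declaration syntax quoted in step (a); the function names are
hypothetical, and comparing the rmd value case-insensitively is an
assumption made because the example uses "INTERNAL" while the steps
use lower case.

```python
import re

# Hypothetical patterns for the draft-style declaration in step (a).
_DECL = re.compile(r'^<\?XML\s+(.*?)\s*\?>', re.DOTALL)
_ATTR = re.compile(r'([A-Za-z][\w.-]*)\s*=\s*(?:"([^"]*)"|\'([^\']*)\')')


def parse_xml_declaration(text):
    """Return the declaration's pseudo-attributes as a dict.

    Returns an empty dict when the text does not start with a
    declaration.
    """
    m = _DECL.match(text)
    if m is None:
        return {}
    attrs = {}
    for name, double_quoted, single_quoted in _ATTR.findall(m.group(1)):
        attrs[name.lower()] = double_quoted or single_quoted
    return attrs


def doctype_plan(rmd):
    """Map an rmd value to the doctype subsets that must be parsed."""
    rmd = rmd.lower()  # assumption: treat the value case-insensitively
    if rmd == "none":
        return []                        # b1: skip the doctype declaration
    if rmd == "internal":
        return ["internal"]              # b2: internal subset only
    if rmd == "all":
        return ["internal", "external"]  # b3: internal, then external
    raise ValueError(f"unknown rmd value: {rmd!r}")


decl = parse_xml_declaration(
    '<?XML version="1.0" encoding="iso8859-1" rmd="INTERNAL"?>'
)
plan = doctype_plan(decl["rmd"])
```

The link-identification pass in (bx) would run over the same subsets
that the plan names, which is why the plan is returned as data rather
than acted on immediately.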

I am more than happy to help with design, and with writing any new
code, but I don't feel comfortable playing with the existing code,
since I really haven't had the time to invest in understanding what's
already there.

More information about XML is available in the XML FAQ, at
<URL:http://www.ucc.ie/xml>.

-Chris

[*] DSSSL is the Document Style Semantics and Specification Language,
ISO/IEC 10179:1996.  DSSSL introduces the concept of a tree of flow
objects for rendering.  A lot of these, like font information, will be
irrelevant to a character-mode browser.  Lynx will only need to pay
attention to larger-scale objects, like paragraphs.  While rewriting
the rendering routines, we might be able to add table support.
-- 
Christopher R. Maden                  One Richmond Square
DynaText SIT Technical Support        Providence, RI 02906 USA
Inso Corporation                      +1.401.421.9550 (voice)
Electronic Publishing Solutions       +1.401.521.2030 (facsimile)
