lynx-dev
Re: LYNX-DEV Re: new Lynx SGML.c parser


From: Foteos Macrides
Subject: Re: LYNX-DEV Re: new Lynx SGML.c parser
Date: Fri, 25 Apr 1997 20:44:19 -0500 (EST)

"Christopher R. Maden" <address@hidden> wrote:
>[Klaus Weide]
>> That is basically what Fote's changes to the parsing since 2.7.1 do:
>> in terms of your description, there is a class for highlighting, a
>> class for FORMs, a class for the rest.
>
>Yes, and it's a good solution for rendering the current state of the
>Web.
>[...]
>> I still don't get where the "MIME type" part comes in.  (But maybe
>> that's because I don't know anything about DOM?  You may want to
>> spread some more knowledge.)  But if this is about building a
>> structure in memory, which would just be used by one application
>> (Lynx), what's wrong with pointers and linked lists etc. in C, what
>> does it have to do with MIME (internal or not).
>
>By internal MIME type, I'm referring to Lynx's "www/source" and
>"www/present" objects.  I'm thinking of a new one, "xml/present".

        I've been reading this thread, and am not sure I understand
everything that's being said, or in turn, whether the participants
know what's already there, and what the present limitations are on
using any expanded SGML handling.  My puzzlement stems from an earlier
statement that the HTML.c stuff wouldn't need to be changed as well.
It makes no difference how well you parse SGML declarations and marked
sections if the *display engine* doesn't know what to do with markup
other than that for the HTML it's been designed to handle.

        Logically, the input stream should be converted from its
transmitted encoding to a standard encoding, if it's not that already,
then the declarations and marked sections handled, then the SGML
entities converted, and the markup parsed and converted into a tree,
and that acted upon by the display engine, if it can.  The current
Lynx API is trying to deal with all of these things in one
octet-by-octet pass through the input stream, using a "state machine" in
SGML_character() of SGML.c, which at the conclusions of the various
states (or in the course of state S_text) does various things, either
in conjunction with the SGML stack and other functions in SGML.c, or
via direct calls to functions in HTML.c.
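
        To make the one-pass approach concrete, here is a minimal sketch
in C of an octet-by-octet state machine.  It is not the SGML.c code:
only the state name S_text comes from the description above, and every
other name (sketch_parser, sketch_character(), S_open, S_tag, S_decl) is
a hypothetical stand-in.  It only counts what it sees, where the real
thing would act on the SGML stack or call into HTML.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; only S_text is a name taken from the message. */
typedef enum { S_text, S_open, S_tag, S_decl } sketch_state;

typedef struct {
    sketch_state state;
    size_t text_octets; /* octets seen as character data        */
    size_t tags;        /* completed <...> constructs           */
    size_t decls;       /* completed <!...> declarations        */
} sketch_parser;

static void sketch_init(sketch_parser *p)
{
    assert(p != NULL);
    p->state = S_text;
    p->text_octets = p->tags = p->decls = 0;
}

/* Feed one octet.  Work is done either in the course of S_text or at
 * the conclusion of one of the other states, then we fall back to
 * S_text. */
static void sketch_character(sketch_parser *p, char c)
{
    assert(p != NULL);
    switch (p->state) {
    case S_text:
        if (c == '<')
            p->state = S_open;      /* tag or declaration, not known yet */
        else
            p->text_octets++;
        break;
    case S_open:
        if (c == '!') {
            p->state = S_decl;      /* <!DOCTYPE ...>, <![ ... ]> etc. */
        } else if (c == '>') {
            p->tags++;              /* empty "<>" */
            p->state = S_text;
        } else {
            p->state = S_tag;
        }
        break;
    case S_tag:
        if (c == '>') {
            p->tags++;
            p->state = S_text;
        }
        break;
    case S_decl:
        if (c == '>') {
            p->decls++;             /* a real parser would hand the
                                       collected content to a handler */
            p->state = S_text;
        }
        break;
    }
}

/* Convenience wrapper: push a NUL-terminated string through the machine. */
static void sketch_feed(sketch_parser *p, const char *s)
{
    while (*s != '\0')
        sketch_character(p, *s++);
}
```

        Note that this toy ends a declaration at the first '>', so it
does not cope with nested marked sections or quoted literals; it is
meant only to show the shape of the single pass.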

        Back when it looked like Panorama might catch on, I added
recognition of the MIME types text/sgml and text/x-sgml, and handling
of SGML declarations and marked sections in SGML_character() via
states which load their content into HTChunks, and then pass them
to handle_foo() functions I added (e.g., handle_doctype(),
handle_marked(), handle_entity(), etc.).  Those presently just
send the chunk content to stderr if TRACE mode is on, free the
chunk, and return to SGML_character(), but the intent was to add
code in them for doing something worthwhile once the appropriate
supporting code was also added, e.g., for expanding or modifying the
entity tables in conjunction with Kurt's (err, I mean Santa Klaus')
chartrans mods, or stuff that could interact with Rob's color/styles
mods, or display engine enhancements in HTML.c, GridText.c and the
LYfoo.c modules.  If you don't have such code, beyond what's there at
present for dealing with the basic HTML via the display engine, what
difference does it make if you've actually analyzed the additional
SGML stuff?  At least for now, we're making sure any additional
SGML stuff received when the MIME type is text/sgml or text/x-sgml
doesn't bleed through and garbage up the display. :) :) :)
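
        The collect-then-hand-off pattern described above might look
roughly like the following sketch in C.  All names here are assumptions
made for illustration: sketch_chunk is a simplified stand-in and not
the HTChunk API, sketch_trace stands in for TRACE mode, and
sketch_handle_doctype() only mimics what the message says
handle_doctype() presently does (trace the content, free the chunk,
return).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified growable buffer; a stand-in, not the real HTChunk. */
typedef struct {
    char  *data;
    size_t size;   /* octets stored, excluding the terminator */
    size_t alloc;  /* octets allocated                         */
} sketch_chunk;

static int sketch_trace = 0;   /* stand-in for TRACE mode */

/* Append one octet, keeping the buffer NUL-terminated.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int sketch_chunk_putc(sketch_chunk *ch, char c)
{
    assert(ch != NULL);
    if (ch->size + 1 >= ch->alloc) {
        size_t n = ch->alloc ? ch->alloc * 2 : 64;
        char *p = (char *) realloc(ch->data, n);
        if (p == NULL)
            return -1;
        ch->data = p;
        ch->alloc = n;
    }
    ch->data[ch->size++] = c;
    ch->data[ch->size] = '\0';
    return 0;
}

static void sketch_chunk_clear(sketch_chunk *ch)
{
    assert(ch != NULL);
    free(ch->data);
    ch->data = NULL;
    ch->size = ch->alloc = 0;
}

/* Placeholder handler: trace the declaration if tracing is on, then
 * discard it so it cannot bleed through to the display.
 * Returns the number of octets discarded. */
static size_t sketch_handle_doctype(sketch_chunk *ch, FILE *trace_fp)
{
    size_t n;

    assert(ch != NULL);
    n = ch->size;
    if (sketch_trace && trace_fp != NULL && ch->data != NULL)
        fprintf(trace_fp, "doctype: <!%s>\n", ch->data);
    /* Code that did something worthwhile with the content would go here. */
    sketch_chunk_clear(ch);
    return n;
}
```

        A caller would append octets with sketch_chunk_putc() while in
the declaration state, and call the handler with stderr when the state
concludes.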

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================