lynx-dev
Re: LYNX-DEV Lynx32 blows up when reading >64k


From: Foteos Macrides
Subject: Re: LYNX-DEV Lynx32 blows up when reading >64k
Date: Mon, 14 Apr 1997 19:58:22 -0500 (EST)

Wayne Buttles <address@hidden> wrote:
>On Mon, 14 Apr 1997, Foteos Macrides wrote:
>
>>      If you can't induce the document provider to fix that HTML, then
>> conceptualizing it as a need to "fix the problem in lynx" won't get you
>> any further.  Arbitrarily increasing the size of the stack or making it
>> dynamic won't get the anchors handled properly.  
>
>I was just wondering how much would get us how far.  You must have come up
>with 800 some how.

        As far as I know, it was picked out of a hat by TimBL for
the linemode browser with the original wwwlib, and Lou therefore used
it as well in the adaptation of Lynx to "WWW HTML" (The original Lynx
had a HYPERERZ-based HTML, back when gopher was a big deal, and the
WWW was just a good idea in TimBL's imagination).  With decent HTML
it's not likely to exceed even 100, let alone 800.  It's only
stacking elements declared SGML_MIXED or SGML_LITTERAL (sic).  Note
also that SGML_LITTERAL was a hack in the original libwww by TimBL
for use with PLAINTEXT and XMP.  Real SGML parsers can't cope with
such a construct, which is why PLAINTEXT and XMP have been
deprecated as of HTML 2.0.
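
To make the stack limit concrete, here is a minimal sketch of a
fixed-depth element stack that only stacks container elements and
reports overflow instead of writing past the array.  It is not the
libwww or Lynx code: the names ElementStack, ContentModel,
push_element and pop_element are hypothetical, and only the depth of
800 comes from the discussion above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NESTING 800   /* the depth discussed above */

/* Hypothetical content models, loosely modeled on the SGML_* names. */
typedef enum { CONTENT_EMPTY, CONTENT_MIXED, CONTENT_LITERAL } ContentModel;

typedef struct {
    int    elements[MAX_NESTING];  /* element numbers of open containers */
    size_t depth;                  /* number of entries currently stacked */
} ElementStack;

/* Push an element; returns false on overflow rather than overrunning. */
static bool push_element(ElementStack *s, int element, ContentModel model)
{
    if (model == CONTENT_EMPTY)
        return true;               /* empty elements are never stacked */
    if (s->depth >= MAX_NESTING)
        return false;              /* overflow: the caller must recover */
    s->elements[s->depth++] = element;
    return true;
}

/* Pop the innermost open element; returns false if nothing is open. */
static bool pop_element(ElementStack *s, int *element)
{
    if (s->depth == 0)
        return false;
    *element = s->elements[--s->depth];
    return true;
}
```

Raising MAX_NESTING only postpones the failure for markup that never
closes its containers, which is why the fix belongs in the anchor
handling rather than in the stack size.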


>> You can check for whether me->inA is TRUE under case HTML_A: in
>> HTML_start_element(), and if so call HTML_end_element() to close it
>> before starting the next anchor, and add precautionary checks in
>> HTML_end_element() to not pop the stack and not invoke the close anchor
>> code if it's called to close an anchor when me->inA is FALSE.  That will
>> get the document rendered without a stack overflow, and get the anchors
>> with HREFs registered as links which can be activated, but their link
>> names will not all be what's intended, particularly in conjunction with
>> other bad HTML in that document. 
>
>This is what I was thinking, but I have tried it before and it doesn't
>translate real well to Lynx's method of parsing.  As you say, it still
>doesn't give the full desired results.

        If you do the above, plus blow off the configuration option
to use bold for HREF-less NAME-ed Anchors and artificially close them
immediately at the bottom of case HTML_A: in HTML_start_element()
(i.e., treat them as SPOT, which must be what Netscape and MSIE are
actually doing, apparently without having read the HTML 3.0 specs and
realizing that they implemented SPOT), then it will work even better,
with little risk of a serious problem, but you still could have problems
with substitutions for interdigitated tags, and link names shorter than
intended in markup as awful as that at businesswire.  You'd probably
have to take the hacks all the way to what I did in FOTEMODS for FORMs.
By the way, businesswire appears to be offering garbage markup for
MSIE, not Netscape.  Writing a parser for Lynx which emulates BOTH
Netscape's and MSIE's sins, without access to either's source code,
would be even more challenging than anyone, in fact, would ever do
in his/her "spare time". :) :)
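
The anchor handling described above can be sketched roughly as
follows.  This is a simplified illustration, not the actual
HTML_start_element()/HTML_end_element() code: the Parser struct and
the start_anchor/end_anchor functions are hypothetical stand-ins, with
only the inA flag taken from the real me->inA.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool inA;          /* true while an anchor is open (mirrors me->inA) */
    int  stack_depth;  /* stand-in for the real element stack */
    int  links;        /* anchors with an HREF registered as links */
} Parser;

/* Close an anchor, ignoring a stray </A> when no anchor is open. */
static void end_anchor(Parser *me)
{
    if (!me->inA)
        return;            /* do not pop, do not run the close code */
    me->stack_depth--;
    me->inA = false;
}

/* Open an anchor; href is NULL for a NAME-only anchor. */
static void start_anchor(Parser *me, const char *href)
{
    if (me->inA)
        end_anchor(me);    /* implicitly close the unclosed anchor */
    me->stack_depth++;
    me->inA = true;
    if (href != NULL)
        me->links++;       /* register an activatable link */
    else
        end_anchor(me);    /* HREF-less anchor: close at once, like SPOT */
}
```

With this, any run of unclosed anchors leaves at most one anchor on
the stack, so the depth stays bounded, but the link text for each
anchor ends wherever the next anchor begins, which is the "shorter
than intended" caveat mentioned above.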


                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
