Re: LYNX-DEV monster files & GridText tuning


From: Foteos Macrides
Subject: Re: LYNX-DEV monster files & GridText tuning
Date: Tue, 29 Jul 1997 18:19:17 -0500 (EST)

Klaus Weide <address@hidden> wrote:
>[...]
>They are simply the first (approximately) 1M, 2M, and 4M bytes of
>an uncompressed copy of <URL:
>http://SUNSITE.UNC.EDU/pub/Linux/ls-lR.html.gz>. (NOTE: don't click
>that link just yet...)  That file in full has an uncompressed size of
>nearly 10M. It is also invalid HTML, but that's beside the point here.
>
>When the files were fully loaded, lynx needed significantly more memory
>to hold the rendered version and its structures, about a factor of 3.
>The 4M file made the lynx process grow to ~ 14 MB, from 2 MB at startup
>(these were compiled with debugging and not stripped).  This, as well as
>the time needed for loading, of course depends heavily on the contents
>of the files.  In this case, they are dense with links, probably more than
>half of the bytes are long HREF URLs.  The 4M version has more than
>20000 links! 
>
>I am referring to recent Lynx 2.7.1ac-0.* here, but there shouldn't be
>any relevant difference between this and the fotemods code.

        None.  I had checked that out back when he posted the message.
The uncompression worked fine in all cases.  For the text/plain version,
it was fetched, uncompressed, rendered, and displayed in about 10 secs.
The text/html version has pitifully Bad HTML, but nothing problematic
for Lynx.  The problem is that it has over FIFTY-EIGHT THOUSAND Anchors,
which takes a while for Lynx to deal with. :)


>The simple change given in Appendix A does not alter functionality and
>gives a noticeable improvement:
>[...]
>There is, for both versions, an approximately quadratic growth of
>loading time with the file size.  As the size doubles,  it takes 4 times
>as much time to load the file.
>
>Another change, see B below, improves things more, especially for the
>biggest files:
>[...]
>The behavior is not dominated by a quadratic law.  (It seems rather linear,
>but there's not really enough data to say; maybe someone else wants to run
>a test on the full 10M :) )
>
>So what was Lynx doing?  It was spending a lot of time going over the list
>of links already handled, for each new link (or line) it was adding to
>the text.  The two instances of this which I found were both added since
>2.7.1.  Change A is a simple optimization which avoids this looping for
>the most common case (documents which do not have elements with NAME or ID
>attributes within A elements).  Change B removes code which is now
>necessary, so I won't remove it except for testing.  Maybe Fote can come
>up with a more efficient implementation.
>
>Or maybe we should just leave things as they are, 'cause optimizing for
>multi-megabyte HTML files may be just wrong priorities.  Actually, the
>fact that lynx needs more and more time to render a long document (with
>many anchors) can be seen as a kind of protection, it slows down the rate
>at which lynx can drive a machine into thrashing...  (but it comes at the
>price of wasted cpu cycles).

        This will be a rare case, but if the handling of such things
can be improved, why not?
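
The quadratic behavior Klaus describes can be illustrated with a small
model. This is only a sketch, not Lynx source: the class, its fields, and
the `load` helper are hypothetical names, and the per-anchor scan stands in
for whatever GridText does with the list of links already handled. The
`fast_path` flag mimics the idea behind Change A, skipping the scan while
no anchor carrying a NAME or ID has been seen.

```python
# Hypothetical model of the anchor-list scan; not taken from Lynx.

class AnchorList:
    def __init__(self, fast_path):
        self.anchors = []        # (href, name_or_id) pairs handled so far
        self.has_named = False   # True once any anchor carries NAME or ID
        self.fast_path = fast_path
        self.comparisons = 0     # list entries visited by all scans

    def add(self, href, name_or_id=None):
        # Without the fast path, every new anchor walks the whole list.
        skip_scan = self.fast_path and not self.has_named
        if not skip_scan:
            for _, existing in self.anchors:
                self.comparisons += 1
                if existing is not None and existing == name_or_id:
                    break
        if name_or_id is not None:
            self.has_named = True
        self.anchors.append((href, name_or_id))


def load(n_links, fast_path):
    """Add n_links plain anchors and return the number of entries visited."""
    doc = AnchorList(fast_path)
    for i in range(n_links):
        doc.add("http://example.invalid/%d" % i)
    return doc.comparisons


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        print(n, load(n, fast_path=False), load(n, fast_path=True))
```

In this model the scan visits n(n-1)/2 entries for n plain anchors, so
doubling the number of links roughly quadruples the work, while the fast
path does no scanning at all until a named anchor appears.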


>I noticed an annoying thing with these big files.  Lynx doesn't check
>for a 'z' key interrupt when loading a local file, and it also doesn't
>give any progress indication.  So a user who has been misguided into
>loading a 10M file will see nothing happening, and probably think that
>lynx is "broken".  So I am adding a check for 'z' and a progress
>indication for local files.  I let the display kick in only after a few
>hundred k have been read, since in the more normal case they would
>probably just be distracting, and fly by too fast to read.  Also there
>probably is no point in making loading of short files interruptible.
>(The fread() itself cannot be interrupted by 'z', I simply assume that
>the read itself doesn't hang for local files.  If that happens, ^C and
>^Z may be still possible.)

        That's a longstanding problem, particularly for the UMN's
"All the Gopher Servers in the World" link back in the gopher days.
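
The approach Klaus outlines for local files (read in chunks, and only once
a few hundred k have been read start showing progress and checking for an
interrupt) might look roughly like the sketch below. It is not the Lynx
implementation: `load_local`, the `interrupted` and `report` callbacks, the
chunk size, and the 256 KB threshold are all assumptions made for
illustration.

```python
import io

CHUNK = 4096                      # assumed read size
PROGRESS_THRESHOLD = 256 * 1024   # assumed value for "a few hundred k"


def load_local(stream, interrupted, report):
    """Read stream to the end; return (data, was_interrupted).

    interrupted() stands in for the 'z' key check and report(n) for the
    progress display. Both are hypothetical callbacks.
    """
    total = 0
    chunks = []
    while True:
        data = stream.read(CHUNK)   # the read itself is not interruptible
        if not data:
            return b"".join(chunks), False
        chunks.append(data)
        total += len(data)
        # Short files never reach the threshold, so they load silently.
        if total >= PROGRESS_THRESHOLD:
            report(total)
            if interrupted():
                return b"".join(chunks), True


if __name__ == "__main__":
    big = io.BytesIO(b"x" * (300 * 1024))
    body, stopped = load_local(big, lambda: False,
                               lambda n: print("read", n, "bytes"))
    print(len(body), stopped)
```

Checking for the interrupt only between reads matches the caveat in the
quoted text: a read that hangs cannot be stopped this way.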

        Your A looks OK, but I haven't checked it.  Your B will
break too much that's needed, now that ID applies to all BODY
elements in the so-called HTML 4.0 and we need to deal adequately
with technically embedded links (which is why those mods were made).

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================