Re: lynx-dev Why doesn't lynx cache HTML source?


From: Leonid Pauzner
Subject: Re: lynx-dev Why doesn't lynx cache HTML source?
Date: Wed, 18 Nov 1998 23:48:04 +0300 (MSK)

18-Nov-98 08:14 Klaus Weide wrote:

That is exactly what I meant, in other words (and much clearer here).
But see my comments near the bottom.

> Let's imagine a scale of possible behavior (although that cannot really all
> be expressed in one dimension):

>      complete                                                 use all
>      semantic   <----------------------------------|-------> cached data
>    transparency                                    |          forever
>                                             Lynx in general

> Complete transparency means: whenever a link is followed, a new request
> is made.  The other end of the scale would be: ignore all
> "no-cache", "expires" etc. directives, keep documents around as long as
> they fit in the cache, and never re-request what we already have.
> No client implements either extreme (by default).

> Lynx is currently closer to the right; it is not completely there
> because it honors no-cache in responses and (some forms of) Expires,
> and resubmits POST and HEAD requests.

> There are several "modes" of going to a page:
>  1) explicit reload: ^R, 'x'
>  2) '*', '\', '[', '"', ^V, etc.
>  3) form submissions (POST)
>  4) following a normal link, entering an address with 'g' etc.; default
>  5) going back in history: either left arrow, or link from History Page

> I have ordered them according to the scale above, 1) corresponds to
> the left end, 5) to the right end.

> I would argue that any change or addition of caching mechanisms should
> not move Lynx much to the left or to the right, for any of the modes,
> _by default_ -- except for 2), see below.

> I don't want a lynx session to act much more semantically transparent
> by default (I have a slow link, too), especially for 4), although it would
> be more correct to do so (it should follow the rules HTTP sets for caches
> more closely).  But it would be nice to be able to configure Lynx to
> act more semantically transparent.

> I also don't want Lynx to act (much) more relaxed by default.  But it
> would be nice to be able to configure Lynx to do even less checking.
> (For example, never honor Expires, or ignore it in META tags.)

> Mode 2 is different, because it is close to the left by accident and
> not by design: we would like it to behave like 5 but cannot since we
> throw away the raw bytes.  So now someone wants to implement a cache
> for raw bytes of HTML documents to achieve that.  Apart from the
> implementation, the major question is: how should this change the behavior
> in other modes.

> If the answer is: It shouldn't, by default -- then the minimal solution
> is simple: just use the new rawdata cache for what it was intended,
> that is only mode 2 requests.  It is very tempting to reuse the rawdata
> for mode 5 requests -- it seems such a waste not to do it -- but we don't
> have to do it.
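
The minimal solution above can be sketched in a few lines of C. This is only an illustration, assuming the five modes are represented as an enum; the names NavMode and may_use_rawdata_cache are hypothetical and do not exist in the Lynx source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum for the five "modes" listed above; not a Lynx type. */
typedef enum {
    MODE_EXPLICIT_RELOAD = 1, /* 1) ^R, 'x' */
    MODE_REPARSE,             /* 2) '*', '\', '[', '"', ^V, etc. */
    MODE_FORM_SUBMIT,         /* 3) form submissions (POST) */
    MODE_FOLLOW_LINK,         /* 4) normal link, 'g' etc.; the default */
    MODE_HISTORY_BACK         /* 5) left arrow, or link from History Page */
} NavMode;

/*
 * Minimal policy: the rawdata cache is consulted only for mode 2 requests,
 * and only when raw bytes are actually held. Every other mode keeps
 * whatever behavior it has today.
 */
static bool may_use_rawdata_cache(NavMode mode, bool have_raw)
{
    assert(mode >= MODE_EXPLICIT_RELOAD && mode <= MODE_HISTORY_BACK);
    return mode == MODE_REPARSE && have_raw;
}
```

With this rule, mode 5 never touches the raw bytes even when they are available, which is the "we don't have to do it" choice.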

> If the new rawbyte cache never gets used for requests other than mode 2,
> then no change is needed in the rules for when to make a new request,
> and no If-Modified-Since/Etag implementation is needed to preserve the
> current behavior.  IMS/Etag/304 could still be implemented later, but
> that is then a separate problem.  (It could also be implemented
> already now, for the existing rendered-doc cache, [except for the
> "language confusion" problem,] which shows that it is separate.)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^??
> But it DOES seem wasteful to keep raw data if in most cases we won't
> use them.  But:
>   A. The implementation doesn't have to be complicated if
>      - We never keep the cached raw data around for longer than we
>        keep the rendered text, with one exception: during a mode 2
>        request when the data is reparsed.
>      - We keep cached raw data in memory (not files).  We could
>        simply put each bufferful of data into a HTChunk while it is being
>        received.
>      - There is no new expiration, validation, or timestamp comparison
>        logic, so no new metadata needs to be stored.
>   B. It can be greatly restricted what documents get entered into the
>      cache in the first place.  We have a choice of
>      - Caching everything received.
>      - Caching all text/html, maybe with further restrictions based
>        on URL, method, etc.
 also restrictions on file://localhost, which is always available
>      - Require explicit user action.  Maybe a special "Enter cache" key,
>        meaning "I am going to want this text reparsed, so start caching
>        it".
 ???
>      - When '\' is pressed the first time for the current text, we mark
>        it for rawbyte caching.  There could be a confirmation question.
>        That means at least two network requests are needed, but after that
>        cached data is used.
 ???
>      When we go to view a document in other than mode 2, the cached
>      data can be thrown away.  Or alternatively, whenever there is a new
>      network request.

> Note that this could be done without significant changes in mainloop().
> It just would have to set a "this is a mode 2 request" flag;
> HTuncache_current_document() might have to take care to preserve
> existing raw data in this case (and maybe get rid of it otherwise).
> Then other lower-level functions could handle the storing to cache
> and reading from it. The mainloop() function wouldn't have to know
> where the new data comes from.
exactly.
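
A rough sketch of that flow, to make the division of labour concrete. Only the names mainloop() and HTuncache_current_document() come from the discussion; DocEntry, uncache_document() and choose_source() are hypothetical, and the real functions have different signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry: rendered text plus optional raw bytes. */
typedef struct {
    char *rendered;  /* rendered document, or NULL */
    char *raw;       /* raw bytes as received, or NULL */
} DocEntry;

typedef enum { SRC_RAW_CACHE, SRC_NETWORK } LoadSource;

/* Small helper so the example does not depend on strdup(). */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/*
 * Stand-in for the role HTuncache_current_document() would play:
 * always drop the rendered text, but keep the raw bytes when the
 * caller has flagged a mode 2 request.
 */
static void uncache_document(DocEntry *doc, bool mode2_request)
{
    assert(doc != NULL);
    free(doc->rendered);
    doc->rendered = NULL;
    if (!mode2_request) {
        free(doc->raw);
        doc->raw = NULL;
    }
}

/*
 * Lower-level decision hidden from the main loop: reparse from the
 * raw cache if possible, otherwise make a network request.
 */
static LoadSource choose_source(const DocEntry *doc, bool mode2_request)
{
    assert(doc != NULL);
    return (mode2_request && doc->raw != NULL) ? SRC_RAW_CACHE : SRC_NETWORK;
}
```

The caller only passes the flag; it never inspects doc->raw itself.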

But I think "mode 2" needs no confirmation in any case:
every text/plain _rendered_ document should be cached as rawdata
if it is currently cached in rendered form; otherwise fall back
to the present behaviour (no rawdata caching at all).
Of course, the rawdata cache may be larger than the rendered cache.

For modes other than "mode 2" we can easily get more relaxed
behaviour if we want it (configurable, of course), but getting more strict
requires changing the present code (LYoverride_no_cache etc.) -
that is a separate problem.



