lynx-dev

Re: lynx-dev HTDoRead() HTTCP.c possible bug - retry limit set too high?


From: Vlad Harchev
Subject: Re: lynx-dev HTDoRead() HTTCP.c possible bug - retry limit set too high?
Date: Mon, 10 Jul 2000 15:56:37 +0500 (SAMST)

On Sun, 9 Jul 2000, Klaus Weide wrote:

> On Sat, 8 Jul 2000, Vlad Harchev wrote:
> 
> > On Fri, 7 Jul 2000, Klaus Weide wrote:
> > 
> > > On Fri, 7 Jul 2000, Vlad Harchev wrote:
> > > > On Fri, 7 Jul 2000, Klaus Weide wrote:
> > > > > On Fri, 7 Jul 2000, Vlad Harchev wrote:
> > > > > >   It seems we should add a new lynx.cfg setting READ_TIMEOUT to control
> > > > > > this (there already exists CONNECT_TIMEOUT). Does anybody object to it?
> > > > > 
> > > > > As long as you keep the current behavior by default...
> > > > 
> > > >   I assume that by "current behaviour" you mean the current value of the
> > > > timeout.
> > > 
> > > I meant current behavior in the same situation.  What you call "current
> > > value of timeout" isn't all there is to it.
> > 
> >  I don't get you - I assume you understand that adding 'READ_TIMEOUT' will
> > just replace '180000' with some expression
> 
> That wasn't clearly stated, although I could assume that.
> 
> But anyway, that's not all that your change will do.  You will at least
> also add a lynx.cfg and/or command line option, with some documentation
> that implies a promise that READ_TIMEOUT will act as a read timeout.

  Yes, of course, but this is not relevant to how READ_TIMEOUT will behave
once implemented.

> Can you keep that promise, in all situations?  Or do you have to qualify
> it?

  Yes, I won't go beyond replacing 180000 with some expression, so I will have
to qualify it.
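A minimal sketch of what "replacing 180000 with some expression" could look like, assuming each try of the select() loop waits about 100 ms (the figure used elsewhere in this thread, not re-checked against HTTCP.c). READ_TIMEOUT_DEFAULT, SELECT_INTERVAL_MS, read_timeout_secs, max_read_tries() and give_up() are hypothetical names, not existing Lynx identifiers. With a default of 18000 seconds the limit works out to exactly 180000, so the default behaviour would stay as it is.

```c
/* Hypothetical names; 100 ms per select() try is an assumption. */
#define SELECT_INTERVAL_MS   100L
#define READ_TIMEOUT_DEFAULT 18000L  /* seconds: 18000 s * 10 tries/s = 180000 */

/* Would be set from lynx.cfg or the command line (hypothetical). */
static long read_timeout_secs = READ_TIMEOUT_DEFAULT;

/* The expression that would replace the literal 180000. */
static long max_read_tries(void)
{
    return read_timeout_secs * 1000L / SELECT_INTERVAL_MS;
}

/* Same shape as the existing guard "if (tries++ >= 180000)". */
static int give_up(long *tries)
{
    return (*tries)++ >= max_read_tries();
}
```

Keeping the post-increment comparison means the loop gives up after the same number of tries as before when the setting is left at its default.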

> > - so all behaviour will remain the same. 
> >   
> > > First of all, the 180000 wasn't originally meant as a timeout.  Rather
> > > as a protection against an infinite loop, which is subtly different:
> > 
> >  Yes, this value defines the timeout (18000 seconds by default).
> 
> No, 180000 * (approx. 100 ms + extra processing time), which is not the same
> as 180000 * (exactly 100 ms).  Small errors accumulate.  The clock will
> run faster if the process for some reason receives many interrupts that
> result in EINTR. The clock will not run while the process is stopped (^Z).
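The drift described above can be put into numbers. The sketch assumes each try nominally waits 100 ms; the 1 ms of per-try overhead and the 50% EINTR rate used in the checks are made-up figures for illustration, and an interrupted try is modelled as taking roughly 0 ms. The function names are hypothetical.

```c
/* Nominal duration: every try waits exactly interval_ms. */
static double nominal_secs(long tries, double interval_ms)
{
    return tries * interval_ms / 1000.0;
}

/* Each try also costs overhead_ms of extra processing
 * (loop bookkeeping, scheduling delay, ...): the limit stretches. */
static double effective_secs(long tries, double interval_ms, double overhead_ms)
{
    return tries * (interval_ms + overhead_ms) / 1000.0;
}

/* A fraction of tries end early with EINTR (modelled as ~0 ms each):
 * the counter is used up faster and the limit shrinks. */
static double with_eintr_secs(long tries, double interval_ms, double eintr_fraction)
{
    return tries * (1.0 - eintr_fraction) * interval_ms / 1000.0;
}
```

Under these assumptions 180,000 tries nominally last 18,000 s (5 hours), 1 ms of overhead per try stretches that to 18,180 s, and if half the tries are cut short by EINTR the limit is hit after about 9,000 s. Time spent stopped with ^Z consumes no tries at all.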
> 
> All this is to say that the 180000 wasn't originally meant as a timeout.
> If it had been, it probably would have been implemented differently, to
> work more reliably as a timeout.
> Some historic CHANGES entries:
> 
> 07-04-95 (Enjoy the fireworks!!!  8-)
> * Increased the connect() and select() while()-looping limit in HTTCP.c
>   to 30,000 tries. - FM
> 05-03-95
> * Increased limits in select() loops to 5000 tries. - FM
> 03-09-95
> * Increased the while() loop limit for select() tries in HTTCP.c to 500. - FM
> 03-05-95
> * Limited the while() loop for select()'s in HTTCP.c to 50 tries, to help
>   reduce likelihood of a runaway CPU on undetected terminal disconnects. - FM
> 
> Note especially the last one, it sheds some light on the original motivation.
> Note also the CHANGES entries from the same timeframe that mention fixes to
> BSDselect; they should give you an idea why protection against an infinite
> loop was needed.
> Reading old lynx-dev messages would probably also be illuminating.
> 
> > >     while (!ready) {
> > >         /*
> > >         **  Protect against an infinite loop.
> 
> Note that it doesn't say "Time out after too many tries" or something
> similar.
> 
> > >         */
> > >         if (tries++ >= 180000) {
> > >             HTAlert(gettext("Socket read failed for 180,000 tries."));
> 
> Note that it doesn't say "timed out" or something similar.

 But it will work roughly as a timeout, of course.
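For comparison, a loop meant to be a real timeout would usually compute an absolute deadline once and, after every select() (including one interrupted by EINTR), wait only for the time that is left, so errors cannot accumulate. This is a sketch, not Lynx code: wait_readable(), read_with_timeout() and the "timeout_ms <= 0 means no limit" convention are hypothetical. It uses CLOCK_MONOTONIC, which, unlike the try counter, keeps running while the process is stopped with ^Z; whether stopped time should count is a policy choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds on a clock unaffected by changes to the system time. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait until fd is readable or timeout_ms has elapsed.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set).
 * timeout_ms <= 0 means "no limit" (hypothetical convention).
 */
static int wait_readable(int fd, long timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        fd_set rfds;
        struct timeval tv;
        struct timeval *tvp = NULL;      /* NULL: block indefinitely */
        int rc;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        if (timeout_ms > 0) {
            long long left = deadline - now_ms();
            if (left <= 0)
                return 0;                /* deadline reached */
            tv.tv_sec = (time_t)(left / 1000);
            tv.tv_usec = (suseconds_t)((left % 1000) * 1000);
            tvp = &tv;
        }

        rc = select(fd + 1, &rfds, NULL, NULL, tvp);
        if (rc > 0)
            return 1;
        if (rc < 0 && errno != EINTR)
            return -1;
        /* rc == 0 or EINTR: loop; the remaining time is recomputed
         * from the fixed deadline on the next pass. */
    }
}

/* NETREAD-style wrapper (hypothetical): read only once data is there. */
static ssize_t read_with_timeout(int fd, void *buf, size_t n, long timeout_ms)
{
    int rc = wait_readable(fd, timeout_ms);

    if (rc == 0)
        errno = ETIMEDOUT;
    if (rc <= 0)
        return -1;
    return read(fd, buf, n);
}
```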

> > > Secondly, note that not all systems will make use of your new READ_TIMEOUT
> > > anyway.  Only those for which the
> > > 
> > >     #define NETREAD  HTDoRead
> > > 
> > > is not overridden in www_tcp.h will.
> > 
> >   Yes, and it looks like cygwin and OS/2 will use HTDoRead.
> 
> And at least some instances on VMS won't.  Or so it seems - I don't know
> if those combinations of Lynx with specific networking libraries are still
> useful.
> 
> Anyway, you may end up promising a READ_TIMEOUT that doesn't actually have
> any effect for some users.
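Why an override makes the setting ineffective can be illustrated with a self-contained mock. Only the names NETREAD and HTDoRead come from the thread; mock_HTDoRead(), mock_vendor_read(), timeout_checks and the HYPOTHETICAL_VENDOR_TCP switch are invented for illustration, and the #ifndef default is an assumed shape, not a copy of www_tcp.h.

```c
#include <stddef.h>

static int timeout_checks = 0;   /* how often the READ_TIMEOUT logic ran */

/* Stand-in for HTDoRead(): the only path where a READ_TIMEOUT
 * (the select() loop with its limit) would be honoured. */
static int mock_HTDoRead(int fd, char *buf, size_t n)
{
    (void)fd; (void)buf;
    timeout_checks++;
    return (int)n;
}

/* Stand-in for a read routine from a platform networking library. */
static int mock_vendor_read(int fd, char *buf, size_t n)
{
    (void)fd; (void)buf;
    return (int)n;               /* no READ_TIMEOUT logic at all */
}

#ifdef HYPOTHETICAL_VENDOR_TCP
#define NETREAD mock_vendor_read /* platform-specific override */
#endif

#ifndef NETREAD
#define NETREAD mock_HTDoRead    /* generic default */
#endif
```

Built normally, every NETREAD call passes through the timeout logic; built with -DHYPOTHETICAL_VENDOR_TCP, the same call sites never reach it and timeout_checks stays at 0.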

  Yes, this is what I plan.

> Also, you seem to be thinking about *decreasing* the timeout with the new
> hypothetical option.  But can one use it to increase the timeout?  What is
> the absolute maximum?  Can one specify infinity?

  If you think it will be useful, it will be possible :) (say, with the value
zero). But of course the OS won't let the timeout be infinite.
 
> The promise of using a specified (long) timeout also will not work if there
> already is a shorter timeout, outside of lynx's control, imposed by OS or
> the TCP {protocol,implementation} or a proxy server in the middle.
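One possible answer to "what is the absolute maximum, and can one specify infinity?" in code: treat 0 as "no limit", reject negative values, and clamp very large values so the seconds-to-tries conversion cannot overflow. normalize_read_timeout(), the sentinel constants and the 100 ms per-try figure are hypothetical. As pointed out above, even "unlimited" remains subject to any shorter timeout imposed by the OS, the TCP implementation or a proxy.

```c
#include <limits.h>

#define SELECT_INTERVAL_MS      100L   /* assumed per-try wait */
#define READ_TIMEOUT_UNLIMITED  (-1L)  /* sentinel: never give up */
#define READ_TIMEOUT_INVALID    (-2L)  /* sentinel: keep the default */

/*
 * Convert a READ_TIMEOUT value in seconds to a try limit.
 * 0 means unlimited; negative values are rejected; large values are
 * clamped so that secs * 1000 cannot overflow a long.
 */
static long normalize_read_timeout(long secs)
{
    const long max_secs = LONG_MAX / 1000L;

    if (secs < 0)
        return READ_TIMEOUT_INVALID;
    if (secs == 0)
        return READ_TIMEOUT_UNLIMITED;
    if (secs > max_secs)
        secs = max_secs;               /* the "absolute maximum" */
    return secs * 1000L / SELECT_INTERVAL_MS;
}
```

With sentinels like these, the loop guard could become `limit >= 0 && tries++ >= limit`, leaving the unlimited case to whatever timeouts exist outside lynx.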
> 
> > > >  As for the first part ("better use the script below") - we've discussed this
> > > > before. This won't work for crawling
> > > 
> > > That is not the situation here.  The original poster mentioned -dump, no
> > > traversal.
> > > 
> > > Lynx's "traversal" code is quasi-interactive anyway.  You have a tty.
> > > A normal 'z' should work just fine to interrupt a hanging connect or
> > > read.  It did when I last checked.
> > 
> >   I didn't know about 'z'.
> 
> And to expand on that, a specific READ_TIMEOUT option isn't needed for any
> interactive lynx session, since one can always 'z'ap.  Specifying READ_TIMEOUT
> for an interactive session just means you deny yourself the possibility to
> decide to "give the connection a chance for a while longer".

  It will probably be useful for semi-automatic sessions (when lynx reads
commands from a log file (option --cmd_script) or from standard input).
 
> So, just to make clear what we are talking about: adding READ_TIMEOUT
> is meant to be useful only with -dump or -source.

  Yes, this is the most useful for these things.

> > > > Also, if we use the script, we can only limit the total time of the
> > > > crawling session, not the timeout for each individual document.
> > > 
> > > True.
> > > 
> > > It depends on what the problem to be solved is (which nobody has clearly
> > > stated).  As I wrote explicitly, I assumed that
> > 
> >   Let's think about the entire spectrum of problems with respect to timeouts
> > on reading, not just the one described in the original post.
> 
> Fine if you have the time to think of every conceivable situation. :)
> But it seems to me that you are mostly concerned with parts of the spectrum
> that you have no experience with or personal use for (crawling) - and that
> could normally be better done with not-lynx, anyway.

  I'm not so concerned with them. It's just a reason for a new small patch that
will do some rather sensible things. But yes, the last time I used crawling was
about 2 years ago.

> > > > > the problem
> > > > > is really: 'Non-interactive lynx processes hang around for too long
> > > > > under some conditions'.
> > > 
> > > If you are talking about a "crawling session", you are talking about
> > > something else, apparently.  At least you're not talking about lynx with
> > > -dump.
> > 
> >   Yes, I was talking about recursively storing rendered versions of
> > documents.
> 
> The canonical recommendation for this kind of thing has been "use wget or
> similar", for quite some time.

  Yes, other means could be used, but there are cases where only lynx will
work (though lynx's crawling is not very useful).

> 
> > > > > Better learn how to kill a process so that it *never* can run longer
> > >     ====================================================================
> > > > > than a max time.  Take the shell script below as a starting point.
> > >     ===============
> > > 
> > > I stand by that.  Better learn how to do that, if that's what you need.
> > > 
> > > I didn't mean that a -read_timeout option would be useless.  Just that
> > > in the situation at hand, as well as others (but not all), it is not
> > > the most straightforward or reliable way to fulfill the requirement /
> > > solve the problem.
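The shell script referred to here is not included in this archived message. As a rough C rendering of the same idea, the sketch below starts a command, polls it with waitpid(WNOHANG) and sends SIGKILL once a wall-clock limit is exceeded, so the child cannot run much longer than the limit wherever it happens to be stuck. run_with_limit() is a hypothetical helper; for Lynx one might pass an argument vector such as {"lynx", "-dump", url, NULL}.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Whole seconds on a monotonic clock (1 s resolution is enough here). */
static time_t mono_secs(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

/*
 * Run argv[0] with arguments argv; kill it once limit_secs have passed.
 * Returns the child's wait status, or -1 if fork/waitpid failed.
 */
static int run_with_limit(char *const argv[], unsigned limit_secs)
{
    struct timespec tick = { 0, 50 * 1000000L };   /* poll every 50 ms */
    time_t deadline;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                        /* exec failed */
    }

    deadline = mono_secs() + (time_t)limit_secs;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;                 /* finished on its own */
        if (r < 0)
            return -1;
        if (mono_secs() >= deadline) {
            kill(pid, SIGKILL);            /* cannot be caught or ignored */
            if (waitpid(pid, &status, 0) < 0)  /* reap the child */
                return -1;
            return status;
        }
        nanosleep(&tick, NULL);
    }
}
```

The returned status can be decoded with WIFEXITED()/WEXITSTATUS() or WIFSIGNALED()/WTERMSIG(). Because the limit applies to the whole process, running one such invocation per document would give a per-document bound as well.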
> > 
> >   Yes, that's what I mean - it won't be useless. But why do you think that it
> > will not be the most reliable way to fulfill the requirement / solve the
> > problem?
> 
> As I've already said, because your READ_TIMEOUT won't always work as one
> might expect.

  Yes, this will be stated in the docs.

>    Klaus
> 
> 
> ; To UNSUBSCRIBE: Send "unsubscribe lynx-dev" to address@hidden
> 

 Best regards,
  -Vlad


