Re: lynx-dev HTDoRead() HTTCP.c possible bug - retry limit set too high?
From: Vlad Harchev
Subject: Re: lynx-dev HTDoRead() HTTCP.c possible bug - retry limit set too high?
Date: Mon, 10 Jul 2000 15:56:37 +0500 (SAMST)

On Sun, 9 Jul 2000, Klaus Weide wrote:
> On Sat, 8 Jul 2000, Vlad Harchev wrote:
>
> > On Fri, 7 Jul 2000, Klaus Weide wrote:
> >
> > > On Fri, 7 Jul 2000, Vlad Harchev wrote:
> > > > On Fri, 7 Jul 2000, Klaus Weide wrote:
> > > > > On Fri, 7 Jul 2000, Vlad Harchev wrote:
> > > > > > > Seems we should add new lynx.cfg setting READ_TIMEOUT to control
> > > > > > > this (there already exists CONNECT_TIMEOUT). Does anybody object against it?
> > > > >
> > > > > As long as you keep the current behavior by default...
> > > >
> > > > I assume that by "current behaviour" you mean current value of
> > > > timeout.
> > >
> > > I meant current behavior in the same situation. What you call "current
> > > value of timeout" isn't all there is to it.
> >
> > I don't get you - I assume you understand that 'READ_TIMEOUT' addition will
> > just substitute '180000' with some expression
>
> That wasn't clearly stated, although I could assume that.
>
> But anyway, that's not all that your change will do. You will at least
> also add a lynx.cfg and/or command line option, with some documentation
> that implies a promise that READ_TIMEOUT will act as a read timeout.
Yes, of course, but that has no bearing on how READ_TIMEOUT itself will
behave once it is implemented.
> Can you keep that promise, in all situations? Or do you have to qualify
> it?
I won't go beyond replacing 180000 with some expression, so yes, I will have
to qualify it.
> > - so all behaviour will remain the same.
> >
> > > First of all, the 180000 wasn't originally meant as a timeout. Rather
> > > as a protection against an infinite loop, which is subtly different:
> >
> > Yes, this value defines the value of timeout (18000 seconds by default).
>
> No, 180000 * (approx. 100 ms + extra processing time), which is not the same
> as 180000 * (exactly 100 ms). Small errors accumulate. The clock will
> run faster if the process gets for some reason many interrupts that result
> in EINTR. The clock will not run while the process is stopped (^Z).
>
> All this is to say that the 180000 wasn't originally meant as a timeout.
> If it had been, it probably would have been implemented differently, to
> work more reliably as a timeout.
> Some historic CHANGES entries:
>
> 07-04-95 (Enjoy the fireworks!!! 8-)
> * Increased the connect() and select() while()-looping limit in HTTCP.c
> to 30,000 tries. - FM
> 05-03-95
> * Increased limits in select() loops to 5000 tries. - FM
> 03-09-95
> * Increased the while() loop limit for select() tries in HTTCP.c to 500. - FM
> 03-05-95
> * Limited the while() loop for select()'s in HTTCP.c to 50 tries, to help
> reduce likelyhood of a runaway CPU on undetected terminal disconnects. - FM
>
> Note especially the last one, it sheds some light on the original motivation.
> Note CHANGES entries in the same timeframe that mention fixes to BSDselect,
> that should give you an idea why a protection against infinite loop was
> needed.
> Reading old lynx-dev messages would probably also be illuminating.
>
> > > while (!ready) {
> > > /*
> > > ** Protect against an infinite loop.
>
> Note that it doesn't say "Time out after too many tries" or something
> similar.
>
> > > */
> > > if (tries++ >= 180000) {
> > > HTAlert(gettext("Socket read failed for 180,000 tries."));
>
> Note that it doesn't say "timed out" or something similar.
But it will work roughly as a timeout, of course.
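
To make the difference concrete, here is a minimal sketch of a read that gives
up after a fixed amount of real time, measured on a monotonic clock, rather
than after a fixed number of loop passes. It is not the HTDoRead() code:
read_with_deadline, seconds_since and READ_TIMED_OUT are names made up for
this example, and the 100 ms slice is only assumed to match the existing
select() wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define READ_TIMED_OUT (-2)     /* hypothetical return code for this sketch */

/* Seconds of real time elapsed since 'start' on the monotonic clock. */
static double seconds_since(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (double)(now.tv_sec - start->tv_sec)
         + (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical helper: read from fd, giving up once timeout_sec seconds
 * of real time have passed.  Returns bytes read, 0 on EOF, -1 on error,
 * or READ_TIMED_OUT. */
static ssize_t read_with_deadline(int fd, void *buf, size_t len,
                                  double timeout_sec)
{
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        fd_set readfds;
        struct timeval tv;
        int ready;

        if (seconds_since(&start) >= timeout_sec)
            return READ_TIMED_OUT;

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = 0;
        tv.tv_usec = 100000;            /* wait in 100 ms slices */

        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready < 0 && errno != EINTR)
            return -1;                  /* real select() error */
        if (ready > 0) {
            ssize_t n = read(fd, buf, len);

            if (n >= 0 || (errno != EINTR && errno != EAGAIN))
                return n;               /* data, EOF, or a real read() error */
        }
        /* EINTR or an empty slice: loop and re-check the clock */
    }
}
```

Because the clock is checked on every pass, extra EINTR wakeups do not shorten
the wait, and time spent stopped with ^Z counts toward the deadline.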
> > > Secondly, note that not all systems will make use of your new READ_TIMEOUT
> > > anyway. Only those for which the
> > >
> > > #define NETREAD HTDoRead
> > >
> > > is not overridden in www_tcp.h will.
> >
> > Yes, and it looks like cygwin and OS/2 will use HTDoRead.
>
> And at least some instances on VMS won't. Or so it seems - I don't know
> if those combinations of Lynx with specific networking libraries are still
> useful.
>
> Anyway, you may end up promising a READ_TIMEOUT that doesn't actually have
> any effect for some users.
Yes, this is what I plan.
> Also, you seem to be thinking about *decreasing* the timeout with the new
> hypothetical option. But can one use it to increase the timeout? What is
> the absolute maximum? Can one specify infinity?
If you think it would be useful, it can be made possible :) (say, with the
value zero). But of course the OS won't let the timeout be infinite.
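
As a rough sketch of what "some expression" could look like, assuming
READ_TIMEOUT is given in seconds and each pass of the loop waits about 100 ms:
the helper names and the treatment of negative values below are my
assumptions, not existing Lynx code.

```c
#include <limits.h>

#define TRIES_PER_SECOND     10L     /* assumes ~100 ms per select() pass */
#define DEFAULT_READ_TIMEOUT 18000L  /* seconds; 18000 * 10 = 180000 tries */

/* Hypothetical helper: map READ_TIMEOUT (seconds) to a retry limit.
 * Returns 0 to mean "no limit imposed by lynx". */
static long read_timeout_to_tries(long seconds)
{
    if (seconds < 0)
        seconds = DEFAULT_READ_TIMEOUT; /* invalid value: keep current behaviour */
    if (seconds == 0)
        return 0;                       /* zero: never give up in this loop */
    if (seconds > LONG_MAX / TRIES_PER_SECOND)
        return LONG_MAX;                /* clamp rather than overflow */
    return seconds * TRIES_PER_SECOND;
}

/* Loop guard: nonzero once the retry limit has been reached. */
static int tries_exhausted(long tries, long limit)
{
    return limit != 0 && tries >= limit;
}
```

With the default of 18000 seconds this yields the same 180000 tries as now, so
the existing behaviour is unchanged unless the user sets the option.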
> The promise of using a specified (long) timeout also will not work if there
> already is a shorter timeout, outside of lynx's control, imposed by OS or
> the TCP {protocol,implementation} or a proxy server in the middle.
>
> > > > As for first part ("better use the script below") - we've discussed
> > > > this before. This won't work for crawling
> > >
> > > That is not the situation here. The original poster mentioned -dump, no
> > > traversal.
> > >
> > > Lynx's "traversal" code is quasi-interactive anyway. You have a tty.
> > > A normal 'z' should work just fine to interrupt a hanging connect or
> > > read. It did when I last checked.
> >
> > I didn't know about 'z'.
>
> And to expand on that, a specific READ_TIMEOUT option isn't needed for any
> interactive lynx session, since one can always 'z'ap. Specifying READ_TIMEOUT
> for an interactive session just means you deny yourself the possibility to
> decide to "give the connection a chance for a while longer".
It will probably be useful for semi-automatic sessions (when lynx reads
commands from a log file (option --cmd_script) or from standard input).
> So, just to make clear what we are talking about: adding READ_TIMEOUT
> is meant to be useful only with -dump or -source.
Yes, that is where it is most useful.
> > > > Also, if we use the script, we can only limit
> > > > the total time of the crawling session, not the timeout for each
> > > > individual document.
> > >
> > > True.
> > >
> > > It depends on what the problem to be solved is (which nobody has clearly
> > > stated). As I wrote explicitly, I assumed that
> >
> > Let's think about entire spectrum of problems with respect to timeout on
> > reading, not just described in original post.
>
> Fine if you have the time to think of every conceivable situation. :)
> But it seems to me that you are mostly concerned with parts of the spectrum
> that you have no experience with or personal use for (crawling) - and that
> could normally be better done with not-lynx, anyway.
I'm not that concerned with them. They are just a reason for a small new patch
that does something fairly sensible. But yes, the last time I used crawling
was about 2 years ago.
> > > > > the problem
> > > > > is really: 'Non-interactive lynx processes hang around for too long
> > > > > under some conditions'.
> > >
> > > If you are talking about "crawling session", you are talking about
> > > something else, apparently. At least you're not talking about lynx with -dump.
> >
> > Yes, I was talking about recursively storing rendered versions of
> > documents.
>
> The canonical recommendation for this kind of thing has been "use wget or
> similar", for quite some time.
Yes, other means could be used, but there are cases where only lynx will
work (though lynx's crawling is not very useful).
>
> > > > > Better learn how to kill a process so that it *never* can run longer
> > > ====================================================================
> > > > > than a max time. Take the shell script below as a starting point.
> > > ===============
> > >
> > > I stand by that. Better learn how to do that, if that's what you need.
> > >
> > > I didn't mean that a -read_timeout option would be useless. Just that
> > > in the situation at hand, as well as others (but not all), it is not
> > > the most straightforward or reliable way to fulfill the requirement /
> > > solve the problem.
> >
> > Yes, that's what I mean - it won't be useless. But why do you think that
> > it will not be the most reliable way to fulfill the requirement / solve the
> > problem?
>
> As I've already said, because your READ_TIMEOUT won't always work as one
> might expect.
Yes, this will be stated in the docs.
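
For comparison, the "kill the process after a maximum time" approach can be
sketched in C as well as in a shell script. This is only an illustration
under my own assumptions: run_with_time_limit and KILLED_BY_LIMIT are
made-up names, and the sketch ignores details such as signals interrupting
waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define KILLED_BY_LIMIT (-2)    /* hypothetical return code for this sketch */

/* Hypothetical wrapper: run argv[0] with its arguments and SIGKILL it if it
 * is still running after max_seconds of real time.  Returns the child's exit
 * status, KILLED_BY_LIMIT if the limit fired, or -1 on other failures. */
static int run_with_time_limit(char *const argv[], double max_seconds)
{
    const struct timespec slice = { 0, 100000000L };    /* 100 ms */
    struct timespec start, now;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* only reached if exec failed */
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        pid_t done = waitpid(pid, &status, WNOHANG);
        double elapsed;

        if (done == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (done < 0)
            return -1;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (double)(now.tv_sec - start.tv_sec)
                + (double)(now.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed >= max_seconds) {
            kill(pid, SIGKILL);         /* cannot be caught or ignored */
            waitpid(pid, &status, 0);   /* reap the child */
            return KILLED_BY_LIMIT;
        }
        nanosleep(&slice, NULL);        /* poll again in 100 ms */
    }
}
```

Called with an argv such as { "lynx", "-dump", url, NULL }, this bounds the
total run time of one lynx process, whatever the reason for the hang; it does
not give a per-read timeout.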
> Klaus
>
>
> ; To UNSUBSCRIBE: Send "unsubscribe lynx-dev" to address@hidden
>
Best regards,
-Vlad