
Re: lynx-dev Changing timeout length


From: Klaus Weide
Subject: Re: lynx-dev Changing timeout length
Date: Sat, 15 Jan 2000 03:11:33 -0600 (CST)

On Fri, 14 Jan 2000, Kevin A. Jett wrote:

> We are using Lynx in a web-spider script to download a large number of
> HTML pages from various websites on a list by first doing a "lynx -dump"
> followed by a "lynx -source" on each page.  However, I noticed that if
> Lynx attempts to do either of these on a page that cannot be accessed
> (for example, if its server is down), it waits a ridiculously long time
> before it finally "gives up" and moves on to the next file.  It seemed
> to try to access the page for about 13 minutes, so I was wondering if
> this was a value specified in the source code or if it just happened to
> take that long.  

It's your responsibility to follow robot rules etc. if that applies.
Lynx doesn't do it for you.  If that's what you need, you should use
wget or similar.

Your method looks very wasteful anyway, especially when used for
"a large number of pages".  One "lynx -source" request should be
enough, if you need a formatted copy you can generate that locally
from the downloaded source.

Leaving aside the question whether it's a good idea to use Lynx for
this...  the timeout you see on connect() would normally be your
system's default timeout for TCP connections.  It's part of your
OS's / TCP/IP stack's behavior.  You can probably change it, but the
way to do that would be system dependent (e.g., setsockopt(), sysctl(),
...).

(For me the default timeout seems to be around 3 min 13 s.  I don't
usually get a timeout from lynx anyway, since I'm going through a proxy
which is configured to provide a (shorter) timeout.)

> I didn't see a way to set the timeout length in the
> lynx.conf file, so I was wondering if there's some way to change this in
> the source code.  If you could provide any assistance on this I'd really
> appreciate it.

There is no configuration option for lynx to change this.
You may be able to achieve a shorter timeout (if you don't use
one of the mechanisms mentioned above) by changing the number
180000 in HTTCP.c to something much smaller.  Probably to
(desired timeout in seconds) * 10.

 
            /*
            **  Protect against an infinite loop.
            */
            if (tries++ >= 180000) {
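
The "* 10" rule of thumb can be written out as a small calculation.
This is only a sketch under the assumption that each pass through the
loop waits about 0.1 s, which is what the factor of 10 implies; the
helper names and the constant are hypothetical and do not appear in
HTTCP.c.

```c
/*
 * Assumption: each pass through the retry loop waits about 0.1 s,
 * i.e. roughly 10 tries per second.  Not a value taken from Lynx.
 */
#define ASSUMED_TRIES_PER_SECOND 10L

/* Loop limit to use in place of 180000 for a desired timeout. */
long tries_limit_for_timeout(long timeout_seconds)
{
    return timeout_seconds * ASSUMED_TRIES_PER_SECOND;
}

/* Approximate timeout in seconds that a given loop limit allows. */
long timeout_for_tries_limit(long tries_limit)
{
    return tries_limit / ASSUMED_TRIES_PER_SECOND;
}
```

Under that assumption, a 30 s limit means replacing 180000 with 300.
The stock value of 180000 would correspond to about 18000 s (5 hours),
which would fit the observation that the system's own timeout normally
triggers first.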


   Klaus

