bug-wget

Re: [Bug-wget] Timeouts need 'page' and 'byte' options


From: Darshit Shah
Subject: Re: [Bug-wget] Timeouts need 'page' and 'byte' options
Date: Sun, 28 Sep 2014 00:45:33 +0530
User-agent: Mutt/1.5.23 (2014-03-12)

Hi grarpamp,

Responding inline...

On 09/27, grarpamp wrote:
> So I set '-T 10' briefly thinking that, no matter what, wget
> would exit, with an appropriate status, after 10 sec.

No, the --timeout switch only sets the DNS, connect and read timeouts. It is not meant to force Wget to exit after X seconds. This behaviour is clearly documented in the Wget man page.

> Yet on a slow loading page it keeps on slowly
> receiving data up to around 210 sec, ie: how
> long it takes to deliver the sample page I'm on.

This is one of the core tenets of GNU Wget. It will try and try and try to download the page, no matter what.

> wget 1.15 with connect/read/dns timeouts
> being the only ones I see available.
>
> I don't need the whole page, just status
> off it after some while of the page coming
> in and it is recorded to disk.
>
> Can you add a --page-timeout which will
> exit wget after n secs loading a single page?

Honestly, I'm unable to see any major use-case for such functionality. Why would you want to download a page for only a fixed number of seconds?

Anyhow, if you do have a valid use case for this, I'd suggest scripting a solution around Wget. You could invoke Wget in the background (with -b, or as a shell job with &), sleep() for the amount of time you want and then send a SIGKILL signal to the Wget process.

The exit status of the kill command tells you which case you hit: it is 0 if Wget was still downloading a file at the end of the timeout, and 1 if Wget had already completed its job and exited. If Wget runs as a shell job and you wait on it, a process killed with SIGKILL is reported with an exit status of 137 (128 + 9); 130 is what you would see after SIGINT.
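The wrapper idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something Wget provides: the helper name run_with_deadline is made up here, and the URL and output file in the comment are placeholders.

```python
import subprocess


def run_with_deadline(cmd, seconds):
    """Run cmd and kill it if it is still running after `seconds`.

    Returns (returncode, timed_out). On POSIX a negative returncode -N
    means the child was terminated by signal N (for example -9 for SIGKILL).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the command finishes before the deadline.
        return proc.wait(timeout=seconds), False
    except subprocess.TimeoutExpired:
        # Deadline reached: kill the child (SIGKILL on POSIX) and reap it.
        proc.kill()
        return proc.wait(), True


# Hypothetical usage (placeholder URL and file name):
#   code, timed_out = run_with_deadline(
#       ["wget", "-O", "page.html", "http://example.com/slow-page"], 10)
# Whatever Wget had written to page.html before the deadline stays on disk.
```

The timed_out flag separates "killed at the deadline" from "finished on its own", which is the distinction the exit statuses above are meant to give you. On systems with GNU coreutils, `timeout 10 wget ...` does much the same thing and exits with status 124 when the deadline is hit.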
> I'd suggest a new exit code for the case
> where we started receiving page bytes,
> since on exit it would not be known if the page
> would have completed, say according to
> the size header.
>
> And presumably if recursive or -i input mode,
> do not exit, but move on to the next page.

I'm not sure what you're exactly looking for here, but I am starting to feel like you want to use the --spider option.

> I originally called this 'exec' timeout, but 'page'
> timeout seemed more specific to this case.
> You may want to call it 'slow' timeout, or
> 'load' timeout.
>
> Note 'read' timeout never triggers because
> page bytes just keep coming in slowly. And I
> can't use 'read' because I want more of the page
> written than just the 'read' interpacket time
> would allow.
>
> Also a 'byte' timeout is needed to keep on receiving
> a page until a specific byte count. 'quota' does not
> do this.

Again, I see no use-case for this. And even if one were to exist, I'm highly sceptical of such a feature being added to Wget.
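
For what it's worth, a byte cap can also be handled outside Wget. The sketch below is an assumption-laden illustration rather than a Wget feature: read_at_most is a hypothetical helper that stops reading a binary stream once a given number of bytes has arrived.

```python
import io


def read_at_most(stream, limit, chunk_size=8192):
    """Read from a binary stream until `limit` bytes are collected or EOF."""
    buf = bytearray()
    while len(buf) < limit:
        # Never ask for more than the remaining budget.
        chunk = stream.read(min(chunk_size, limit - len(buf)))
        if not chunk:
            break  # EOF before the cap was reached
        buf.extend(chunk)
    return bytes(buf)


# Demonstration on an in-memory stream; an HTTP response object from
# urllib.request.urlopen() could be passed in the same way.
sample = read_at_most(io.BytesIO(b"x" * 100), 10)
```

From a shell, piping `wget -qO- URL` into `head -c N` gives a similar effect.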
Thanks.


--
Thanking You,
Darshit Shah


