Re: [Bug-wget] Timeouts need 'page' and 'byte' options
From: Darshit Shah
Subject: Re: [Bug-wget] Timeouts need 'page' and 'byte' options
Date: Sun, 28 Sep 2014 00:45:33 +0530
User-agent: Mutt/1.5.23 (2014-03-12)
Hi grarpamp,
Responding inline...
On 09/27, grarpamp wrote:
> So I set '-T 10' briefly thinking that, no matter what, wget
> would exit, with an appropriate status, after 10 sec.
No, the --timeout switch only sets the DNS, connect and read timeouts. It is not
meant to force Wget to exit after X seconds. This behaviour is clearly documented
in the Wget man page.
> Yet on a slow-loading page it keeps on slowly
> receiving data for up to around 210 sec, i.e. how
> long it takes to deliver the sample page I'm on.
This is one of the core tenets of GNU Wget. It will try and try and try to
download the page, no matter what.
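To see why '-T 10' does not stop a page that trickles in over 210 seconds: the man page describes the read timeout as idle time, meaning it only fails when no data is received for that long, so every new chunk restarts it. Here is a minimal simulation, assuming the chunk arrival times are known in advance. The helper first_timeout and the total "deadline" it models are hypothetical; Wget 1.15 has no such total limit.

```python
from typing import Optional, Sequence


def first_timeout(arrivals: Sequence[float], idle_limit: float,
                  deadline: Optional[float] = None) -> Optional[str]:
    """Report which limit would fire first for a given arrival pattern.

    arrivals:   sorted times (seconds since the request started) at which a
                chunk of data arrived; the last entry is when the page finished.
    idle_limit: an idle ("read") timeout that restarts on every chunk.
    deadline:   an optional total limit for the whole page (hypothetical).
    Returns "idle", "deadline", or None if the page completes.
    """
    last = 0.0
    for t in arrivals:
        # The idle timeout would fire this long after the previous chunk.
        idle_fire = last + idle_limit
        if t > idle_fire and (deadline is None or idle_fire < deadline):
            return "idle"
        # The total deadline fires as soon as wall-clock time passes it.
        if deadline is not None and t > deadline:
            return "deadline"
        last = t
    return None
```

With a chunk every 2 seconds for 210 seconds (`arrivals = [2.0 * i for i in range(1, 106)]`), `first_timeout(arrivals, 10)` returns None because the idle limit never fires, while `first_timeout(arrivals, 10, deadline=10)` returns "deadline".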
> wget 1.15, with connect/read/dns timeouts
> being the only ones I see available.
> I don't need the whole page, just its status
> once some of the page has come in
> and been recorded to disk.
> Can you add a --page-timeout which will
> exit wget after n secs loading a single page?
Honestly, I'm unable to see any major use case for such functionality. Why
would you want to download a page for only a fixed number of seconds?
Anyhow, if you do have a valid use case for this, I'd suggest scripting a
solution around Wget. You could invoke Wget in background mode with -b, sleep
for the amount of time you want and then send a SIGKILL signal to the Wget
process. The exit status of the kill command tells you whether Wget was still
downloading a file at the end of the timeout: kill succeeds if it was, and
fails (with status 1 in bash) if Wget had already completed its job and exited.
Wget's own exit status is less useful here: -b detaches it from the shell, and
a process killed by a signal is reported by a shell as 128 plus the signal
number (137 for SIGKILL).
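The same idea can be sketched in Python, assuming a script rather than an interactive shell. Instead of -b, sleep and kill, it swaps in the timeout argument of subprocess.run, which kills the child (SIGKILL on POSIX) once the deadline passes and then raises TimeoutExpired. The function name run_with_deadline and its return labels are hypothetical.

```python
import subprocess
from typing import Sequence


def run_with_deadline(cmd: Sequence[str], seconds: float) -> str:
    """Run cmd, killing it if it is still running after `seconds`.

    Returns "completed" if the command exited on its own with status 0,
    "failed" if it exited on its own with a non-zero status, and
    "killed" if the deadline expired while it was still running.
    """
    try:
        # On timeout, subprocess.run kills the child, waits for it,
        # and then re-raises TimeoutExpired.
        result = subprocess.run(list(cmd), timeout=seconds,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return "killed"
    return "completed" if result.returncode == 0 else "failed"
```

For example, `run_with_deadline(["wget", "-O", "page.html", "https://example.com/slow-page"], 10)` (placeholder URL) returns "killed" if the page is still arriving after 10 seconds; whatever Wget had already written should remain in page.html.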
> I'd suggest a new exit code for the case
> where we started receiving page bytes,
> since on exit it would not be known whether the page
> would have completed, say according to
> the size (Content-Length) header.
> And presumably, in recursive or -i input mode,
> it should not exit, but move on to the next page.
I'm not sure exactly what you're looking for here, but I am starting to feel
that you want the --spider option.
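If a wrapper script does kill Wget early, whether the page had completed can still be checked after the fact, which addresses the ambiguity raised above. A minimal sketch, assuming the script already knows the Content-Length the server announced (for example from Wget's -S/--server-response output; parsing it is not shown). The helper download_state and its labels are hypothetical.

```python
import os
from typing import Optional


def download_state(path: str, content_length: Optional[int]) -> str:
    """Classify a possibly interrupted download saved at `path`.

    content_length is the server's Content-Length value, or None if the
    server did not send one (e.g. with chunked transfer encoding).
    Returns "missing", "unknown", "partial", "complete" or "mismatch".
    """
    if not os.path.exists(path):
        return "missing"
    size = os.path.getsize(path)
    if content_length is None:
        # Without a size header there is nothing to compare against.
        return "unknown"
    if size < content_length:
        return "partial"
    if size == content_length:
        return "complete"
    # More bytes on disk than announced: something else is going on.
    return "mismatch"
```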
> I originally called this an 'exec' timeout, but 'page'
> timeout seemed more specific to this case.
> You may want to call it a 'slow' timeout, or
> a 'load' timeout.
> Note that the 'read' timeout never triggers because
> page bytes just keep coming in slowly. And I
> can't use 'read' because I want more of the page
> written than just the 'read' inter-packet time
> would allow.
> Also a 'byte' timeout is needed to keep on receiving
> a page until a specific byte count. 'quota' does not
> do this.
Again, I see no use case for this. And even if one existed, I'm highly
sceptical that such a feature would be added to Wget.
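A byte limit on a single page can also be scripted outside Wget: have Wget write the page to standard output with -O - and stop reading after N bytes. In a shell, piping into head -c N does this. The Python sketch below assumes the page is available as a binary stream; copy_at_most is a hypothetical helper, not a Wget feature.

```python
from typing import BinaryIO


def copy_at_most(src: BinaryIO, dst: BinaryIO, limit: int,
                 chunk_size: int = 8192) -> int:
    """Copy bytes from src to dst, stopping after `limit` bytes.

    Returns the number of bytes copied, which is less than `limit`
    only if src reached end of stream first.
    """
    copied = 0
    while copied < limit:
        # Never request more than what is left under the limit.
        chunk = src.read(min(chunk_size, limit - copied))
        if not chunk:  # end of stream before the limit was reached
            break
        dst.write(chunk)
        copied += len(chunk)
    return copied
```

For instance, src could be `proc.stdout` of `subprocess.Popen(["wget", "-q", "-O", "-", url], stdout=subprocess.PIPE)` (url is a placeholder), with `proc.kill()` called once the limit has been reached.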
Thanks.
--
Thanking You,
Darshit Shah