
Re: [Bug-wget] Timeouts need 'page' and 'byte' options


From: grarpamp
Subject: Re: [Bug-wget] Timeouts need 'page' and 'byte' options
Date: Sun, 28 Sep 2014 15:10:04 -0400

On Sat, Sep 27, 2014 at 3:15 PM, Darshit Shah <address@hidden> wrote:
> Honestly, I'm unable to see any major use-case for such a functionality. Why
> would you want to download a page for only a constant number of seconds?

And/or for a constant byte count.

I validate and find tarpits, slow data links, heavily loaded servers,
extremely large pages/downloads, and other such mechanisms and
phenomena. Often I do not need the whole page, nor do I need to wait
for the whole page to confirm that it works.

>> I'd suggest a new exit code for the case
>> where we started receiving page bytes,
>> since on exit it would not be known if the page
>> would have completed, say according to
>> the size header.

So we need to know that the exit was reached not through some other
existing wget exit mechanism or error, but via --slow-timeout or
--byte-timeout. Since the two can be combined, two new exit codes
would be useful to distinguish which one was hit.
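A minimal sketch of how the two proposed codes might be assigned, in
Python. Codes 0 and 4 follow wget's documented exit statuses (success,
network failure); the values 9 and 10, the names, and the rule that a
slow timeout with no bytes received falls back to the existing
network-failure code are all hypothetical, not anything wget implements.

```python
from enum import IntEnum
from typing import Optional


class ExitStatus(IntEnum):
    # Subset of the exit statuses documented in the wget manual.
    SUCCESS = 0
    NETWORK_FAILURE = 4
    # HYPOTHETICAL codes for the proposed options; wget does not define these.
    SLOW_TIMEOUT = 9
    BYTE_LIMIT = 10


def classify(limit_hit: Optional[str], bytes_received: int) -> ExitStatus:
    """Map how a transfer ended to an exit status (hypothetical scheme).

    limit_hit is None if the transfer finished on its own, "slow" if the
    proposed --slow-timeout fired, or "byte" if the proposed --byte-timeout
    cap was reached.
    """
    if limit_hit is None:
        return ExitStatus.SUCCESS
    if limit_hit == "byte":
        return ExitStatus.BYTE_LIMIT
    if limit_hit == "slow":
        # Only report "page bytes were flowing" if some actually arrived;
        # otherwise fall back to an existing code (an assumption).
        if bytes_received > 0:
            return ExitStatus.SLOW_TIMEOUT
        return ExitStatus.NETWORK_FAILURE
    raise ValueError(f"unknown limit: {limit_hit!r}")
```

Keeping the two codes separate is what lets a caller tell which of the
combined limits ended the transfer.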

>> And presumably if in recursive or -i input mode,
>> do not exit, but move on to the next page.

Though with -r or -i we would not exit, so we probably need a new,
extended, %-parameter-based '--extlog' log format. The rest of wget
actually needs one anyway, to log various things all on one line.
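To make the '--extlog' idea concrete, here is a rough sketch of
%-parameter expansion, loosely in the spirit of curl's --write-out. The
option name comes from this proposal; the field letters (%u URL, %c HTTP
status, %b bytes received, %B expected bytes from Content-Length, %x exit
status) are invented for illustration only.

```python
import re
from typing import Mapping


def format_extlog(template: str, fields: Mapping[str, object]) -> str:
    """Expand single-letter %X placeholders from `fields` into one log line.

    '%%' yields a literal '%'; a letter with no value yields '-'.
    The letters themselves are hypothetical, not a wget feature.
    """
    def expand(match: re.Match) -> str:
        key = match.group(1)
        if key == "%":
            return "%"
        return str(fields.get(key, "-"))

    return re.sub(r"%(.)", expand, template)


# One line per URL, so -r / -i runs stay greppable without relying on exit codes.
line = format_extlog(
    "%u %c %b/%B %x",
    {"u": "http://example.com/", "c": 200, "b": 4096, "B": 10240, "x": 9},
)
```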

Another operational note: under either of these two new options, the
-o log should still show the usual bracketed '[bytes received/expected
size]' figure as normal, the expected size coming from the size header
when there is one.

But realistically, --slow and --byte could be added first, before any
nicer logging.

> I'd suggest scripting a
> solution around Wget. You could invoke Wget in background mode with -b,
> sleep() for the amount of time you want and then send a SIGKILL signal to
> the Wget process. Killing Wget with SIGKILL will result in an exit status of
> 130, which you can use to check if Wget was indeed downloading a file at the
> end of the timeout or it had completed its job and exited, in which case the
> kill command will exit with a status of 1.

No. Here, an actual 'KILL' (9) results in 137, and 'TERM' (15) in 143:
the shell reports 128 plus the signal number, so 130 would be 'INT' (2).
You can't capture the exit status with '-b', because it daemonizes the
process away from all parent job control, process group, and terminal
attachment. Also, process IDs can and do get recycled after exit, so
you may kill the wrong thing, or have nothing to kill, either way
getting the wrong status.
And it still does not solve the specified case: a new exit code
indicating 'we received some page bytes, but the transfer was
terminated mid-stream by --slow-timeout or --byte-timeout'.
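The 128 + N arithmetic is easy to check. This sketch, assuming a POSIX
system with /bin/sh and sleep, starts a background child under sh,
signals it, and reads back the status from 'wait'; Python's own Popen
reports the same death as a negative return code. (SIGINT is left out on
purpose: a non-interactive sh starts background jobs with SIGINT ignored.)

```python
import signal
import subprocess


def shell_status_after_signal(signum: int) -> int:
    """Kill a background child of /bin/sh with `signum` and return the
    status the shell reports for it via `wait` (128 + signal number)."""
    script = f"sleep 30 & pid=$!; kill -{int(signum)} $pid; wait $pid; echo $?"
    result = subprocess.run(["sh", "-c", script],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def popen_returncode_after_signal(signum: int) -> int:
    """Same experiment without a shell: Popen reports -signum."""
    proc = subprocess.Popen(["sleep", "30"])
    proc.send_signal(signum)
    return proc.wait()
```

With these, shell_status_after_signal(signal.SIGKILL) gives 137 and
shell_status_after_signal(signal.SIGTERM) gives 143.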

You could wrap a non-daemon background job ('&') to emulate
--slow-timeout, but that gets very complex with '-O', many wgets in
parallel, or other such intricacies. And due to output buffering (in
wget, in the shell/tool pipeline, in the filesystem) and other issues,
it's basically impossible to emulate --byte-timeout with a wrapper.
Best for all users is simply to implement the --slow and --byte
timeouts.
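For the simple single-download case, a wrapper that keeps wget as a
direct child (not '-b') at least avoids the PID-recycling problem: the
child's PID stays reserved, as a zombie, until the parent reaps it. A
minimal Python sketch, assuming POSIX. It only emulates the wall-clock
part of --slow-timeout and cannot report "some page bytes arrived"; the
function name is hypothetical.

```python
import subprocess
from typing import Sequence, Tuple


def run_with_wall_clock_limit(cmd: Sequence[str], seconds: float) -> Tuple[int, bool]:
    """Run `cmd` as a direct child and SIGKILL it after `seconds`.

    Returns (returncode, timed_out). Because we hold the Popen object and
    reap the child ourselves, the PID cannot be recycled before we kill it.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds), False
    except subprocess.TimeoutExpired:
        proc.kill()                 # SIGKILL on POSIX
        return proc.wait(), True    # negative returncode, e.g. -9
```

For example, run_with_wall_clock_limit(["wget", "-q", "-O", "page.html",
"http://example.com/"], 10). Checking the size of page.html afterwards
hints whether bytes arrived, but only subject to the buffering caveats
above.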

Note: '-b' should be renamed from 'background' to 'daemon' to reflect
what the '-b' option actually does.


> like you want to use the --spider option.

No, --spider just sends a HEAD request. HEAD requests and replies are
too short to properly time, characterize, or validate, and they do not
contain the body of the page, so there is nothing to store and diff.
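A small self-contained illustration of the HEAD point, using only
Python's standard library. The local test server and its page are
hypothetical stand-ins for a real site: it answers HEAD with the same
headers as GET, including Content-Length, but with no body, so there is
nothing to time, store, or diff.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical page content served by the local test server.
BODY = b"<html>" + b"x" * 1000 + b"</html>"


class PageHandler(BaseHTTPRequestHandler):
    def _send_headers(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self) -> None:
        self._send_headers()          # headers only; HEAD has no body

    def do_GET(self) -> None:
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, *args) -> None:
        pass                          # keep the demo quiet


def fetch(method: str) -> Tuple[Optional[str], bytes]:
    """Serve one request locally and return (Content-Length, body)."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    server.timeout = 5                # don't block forever if no request comes
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request(method, "/")
        resp = conn.getresponse()
        body = resp.read()            # b"" for HEAD
        length = resp.getheader("Content-Length")
        conn.close()
    finally:
        worker.join()
        server.server_close()
    return length, body
```

fetch("HEAD") returns the same Content-Length as fetch("GET"), but an
empty body.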


>> I originally called this 'exec' timeout, but 'page'
>> timeout seemed more specific to this case.
>> You may want to call it 'slow' timeout, or
>> 'load' timeout.

Or --pagetime-timeout, or some such.

>> Note 'read' timeout never triggers because
>> page bytes just keep coming in slowly. And I
>> can't use 'read' because I want more of the page
>> written than just the 'read' interpacket time
>> would allow.
>>
>> Also a 'byte' timeout is needed to keep on receiving
>> a page until a specific byte count. '--quota' does not
>> do this.
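A minimal sketch of the semantics proposed here, assuming a generic
blocking stream with a read(n) method; this is not wget's code, and the
names are hypothetical. --read-timeout bounds the gap between packets,
so a trickling server never trips it. The loop below instead checks a
total wall-clock deadline (the proposed 'slow'/'page' timeout) and a
byte cap (the proposed 'byte' option), and reports which one ended the
transfer. TrickleStream is a stand-in for a tarpit.

```python
import io
import time
from typing import Optional, Tuple


def read_with_limits(stream, total_seconds: Optional[float] = None,
                     max_bytes: Optional[int] = None,
                     chunk: int = 4096) -> Tuple[bytes, Optional[str]]:
    """Read from `stream` until EOF, a total wall-clock deadline, or a byte cap.

    Returns (data, reason): reason is None on EOF, "slow" if the total-time
    limit was hit, or "byte" if max_bytes was reached. Reaching the cap
    counts as "byte" even if the body happened to end exactly there;
    comparing with Content-Length would be needed to tell those apart.
    """
    deadline = None if total_seconds is None else time.monotonic() + total_seconds
    buf = bytearray()
    while True:
        if max_bytes is not None and len(buf) >= max_bytes:
            return bytes(buf), "byte"
        if deadline is not None and time.monotonic() >= deadline:
            return bytes(buf), "slow"
        # Never ask for more than the remaining byte budget.
        want = chunk if max_bytes is None else min(chunk, max_bytes - len(buf))
        # A single blocking read can still overrun the deadline; a real
        # implementation would also bound each read, as --read-timeout does.
        data = stream.read(want)
        if not data:
            return bytes(buf), None
        buf += data


class TrickleStream:
    """Hypothetical tarpit: endless data, 10 bytes every 50 ms, so a
    per-read idle timeout of, say, 1 s would never fire."""

    def read(self, n: int) -> bytes:
        time.sleep(0.05)
        return b"x" * min(10, n)


sample, reason = read_with_limits(io.BytesIO(b"a" * 100), max_bytes=30)
```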

> sceptical of such a feature being added to Wget.

Someone had a use for --read-timeout, and it appeared.
Someone had their own idea for --dns-timeout, and it appeared.
Someone requested --connect-timeout, and it appeared.

That is the precedent for new user-facing features :)

I think these two emails cover most of the specification I can
think of so far for people to use.

Thanks.


