Re: [Bug-wget] Unexpected wget -N behaviour for 1.17 onwards?

From: Tim Rühsen
Subject: Re: [Bug-wget] Unexpected wget -N behaviour for 1.17 onwards?
Date: Tue, 12 Feb 2019 10:06:47 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0

On 2/12/19 12:21 AM, Darshit Shah wrote:
> * Tim Rühsen <address@hidden> [190211 13:45]:
>> You are right, --if-modified-since changes -N behavior in case a file is
>> incomplete. --if-modified-since can't easily be fixed since the 304
>> response does not include file size information.
>> As you suggest, we should disable this option by default or at least
>> discuss the options we have.
> That's correct. While the lack of a Content-Length header on a 304 response
> causes problems, we can't rely on it to exist even for a normal 200 / 206
> response.
> Let me try to aggregate some of the possible options (I'm not saying any of
> these are particularly a good idea):
> 1. Write file to a tmpfile and on successful download, move it to the real
>    location.
>    This option has multiple problems. Firstly, people don't expect Wget to
>    write to a tmp file. This can be problematic, especially when people try to
>    play streaming data without a -O. But for the purposes of dealing with -N
>    and --if-modified-since, this is the best option.
> 2. Issue a utime() call after every write() in order to set the mtime again to
>    something older than the one reported by the server.
>    In this, we would need to issue a utime() after each call to write() in
>    order to reset its mtime to an earlier time. After the file is fully
>    downloaded, set the mtime to the actual one as provided by the server.
>    This introduces an issue where Wget is issuing too many system calls. And
>    with Wget2, it might get really bad due to downloading ~30+ files in
>    parallel. I'm also unsure of how the kernel handles races between write()
>    and utime() calls. We don't want to set the mtime of the file and have it
>    overwritten by the previous write() call. This might be a valid option,
>    especially since it is cross-platform. However, the performance impact
>    would need to be evaluated.
> 3. Only enable If-Modified-Since when xattr is available.
>    The idea here is simple: on systems where xattr is possible, store either
>    an old timestamp or a completion flag in the attributes. Use this metadata
>    to issue an If-Modified-Since header. If xattr is not available or the
>    attributes are not found, use the HEAD+GET approach.
> Are there any other options that I've missed?
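
As an illustration of option 1, here is a minimal POSIX sketch of the tmpfile-then-rename idea. It is not taken from Wget; the helper name save_via_tmpfile() and the ".XXXXXX" suffix are assumptions made for this example. Because the final name only appears once the data is completely written, an interrupted run never leaves a partial file for a later -N run to find.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not a Wget function): write `len` bytes to a
 * temporary file next to `final_path`, then rename it into place.
 * Returns 0 on success, -1 on failure. */
static int save_via_tmpfile(const char *final_path, const void *data, size_t len)
{
    assert(final_path != NULL);

    /* The temp file lives in the same directory, so rename() stays on
     * one filesystem and replaces the target atomically. */
    char tmp_path[4096];
    int n = snprintf(tmp_path, sizeof tmp_path, "%s.XXXXXX", final_path);
    if (n < 0 || (size_t) n >= sizeof tmp_path)
        return -1;

    int fd = mkstemp(tmp_path); /* creates the file with mode 0600 */
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t written = write(fd, p, len);
        if (written < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal, retry */
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += written;
        len -= (size_t) written;
    }

    /* Only a fully written file ever gets the final name. */
    if (close(fd) != 0 || rename(tmp_path, final_path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A real downloader would stream data in chunks rather than take one buffer, and would have to decide what to do with leftover temp files after a crash; the sketch ignores both.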

4. Do not use --if-modified-since by default with -N - let the user
control it.
We only have an issue if the -N download gets interrupted and should be
continued later. This is often not the case - like in my personal
interactive '-r -N' scenarios. Of course it's error-prone for users who
are not aware of it. But you asked for other options.

Didn't I solve that issue for Wget2 already?
From src/wget.c (http_receive_response):
if (resp->last_modified) {
   /* If program was aborted, we store file times one second less than the server time.
    * So a later download with -N would start over instead of leaving incomplete data.
    * Or a later download with -c -N would continue with a [...]
    */

   if (config.xattr && !terminate)
      write_xattr_last_modified(resp->last_modified, context->outfd);

   set_file_mtime(context->outfd, resp->last_modified - terminate);
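
The "one second less" trick can be tried in isolation with plain POSIX calls. The sketch below is not the Wget2 implementation: stamp_download() is a hypothetical stand-in for set_file_mtime(), and the use of futimens() is an assumption of this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for set_file_mtime(): stamp an open file with the
 * server's Last-Modified time, or one second earlier if the transfer was
 * aborted. Returns 0 on success, -1 on failure (as futimens() does). */
static int stamp_download(int fd, time_t server_mtime, bool aborted)
{
    assert(fd >= 0);

    struct timespec times[2];
    times[0].tv_sec = 0;
    times[0].tv_nsec = UTIME_OMIT; /* leave the access time untouched */
    times[1].tv_sec = server_mtime - (aborted ? 1 : 0);
    times[1].tv_nsec = 0;

    return futimens(fd, times);
}
```

An incomplete file stamped this way is strictly older than the server copy, so an If-Modified-Since request built from its mtime should not be answered with 304 Not Modified.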

Regards, Tim

>> On 2/10/19 2:42 PM, Lawrence Wade wrote:
>>> Hi Tim,
>>> Okay. Using the OpenSUSE-packaged wget (1.19.5) that comes with Leap 15.0:
>>> $ wget -r -N
>>> ...
>>> Reusing existing connection to
>>> HTTP request sent, awaiting response... 304 Not Modified
>>> File ‘
>>> 4’ not modified on server. Omitting download.
>>> This file is incomplete in my local copy.
>>> Trying again as you suggest,
>>> $ wget -r -N --no-if-modified-since
>>> ...
>>> --2019-02-10 08:35:14--
>>> Reusing existing connection to
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 38044195 (36M) [application/octet-stream]
>>> The sizes do not match (local 8643456) -- retrieving.
>>> --2019-02-10 08:35:14--
>>> Reusing existing connection to
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 38044195 (36M) [application/octet-stream]
>>> Saving to: ‘
>>> z6Y.mp4
>>> ...
>>> And it appears to work as expected. Won't this change to the behaviour
>>> of -N option subtly break a lot of scripts which rely on wget?
>>> Thanks so much, Tim. I do have an answer and a workaround though my
>>> concerns remain.
>>> Lawrence Wade
>>> Ottawa, Canada
>>> On Sun, Feb 10, 2019 at 2:11 AM Lawrence Wade <address@hidden> wrote:
>>>> Hi Everyone,
>>>> This might be a corroboration of this
>>>> http://lists.gnu.org/archive/html/bug-wget/2018-10/msg00049.html
>>>> and this
>>>> https://bugs.launchpad.net/ubuntu/+source/wget/+bug/1715481
>>>> I use wget to backup my cellphone running Palapa Web Server, and it
>>>> has worked well for me for years. Since upgrading to OpenSUSE Leap 15,
>>>> I have been having corrupted files.
>>>> My method is
>>>> $ wget -r -N
>>>> and if the connection is interrupted for any reason, the next time I
>>>> call wget it would complete any incomplete files. And since Leap 15, I
>>>> have been getting gradually corrupted backups. I was tearing my hair
>>>> out looking at wgetrc and other things.
>>>> With one long file that I knew was incomplete, I got a Not Modified -
>>>> omitting download, even though I knew the file sizes were different
>>>> between the server and wget's copy - though the wget man page
>>>> explicitly states that if the file sizes do not match, -N will trigger
>>>> a download.
>>>> I tried on OpenSUSE 42.3 (wget 1.14) and the incomplete file triggered
>>>> a download, even though wgetrc was identical.
>>>> Again, on Leap 15, I compiled 1.20.1 (latest), 1.17.1, and then
>>>> finally with 1.16.3 the behaviour went back to what I expected (and I
>>>> got my corrupted phone backups fixed).
>>>> Was a bug possibly introduced in 1.17 with the support for --if-modified-since?
>>>> Version shipping with OpenSUSE Leap 15:
>>>> GNU Wget 1.19.5 built on linux-gnu.
>>>> +cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
>>>> +ntlm +opie +psl +ssl/openssl
>>>> Last version I tried where "wget -r -N" works as expected:
>>>> GNU Wget 1.16.3 built on linux-gnu.
>>>> +digest +https +ipv6 -iri +large-file +nls +ntlm +opie +psl +ssl/gnutls
>>>> I'm open to the possibility that there may be something else causing
>>>> this bug. I have not found many mentions of it, but then again it is
>>>> subtle. You get pretty confident when you just let wget do its thing,
>>>> so there may be a lot of incomplete files out there... :)
>>>> Thanks so much for your help. I can provide any other info that would
>>>> be helpful.
>>>> Lawrence Wade
>>>> Ottawa, Canada
