bug-wget

Re: [Bug-wget] Unexpected wget -N behaviour for 1.17 onwards?


From: Darshit Shah
Subject: Re: [Bug-wget] Unexpected wget -N behaviour for 1.17 onwards?
Date: Tue, 12 Feb 2019 00:21:58 +0100
User-agent: NeoMutt/20180716

* Tim Rühsen <address@hidden> [190211 13:45]:
> You are right, --if-modified-since changes -N behavior in case a file is
> incomplete. --if-modified-since can't easily be fixed since the 304
> response does not include file size information.
> 
> As you suggest, we should disable this option by default or at least
> discuss the options we have.

That's correct. While the lack of a Content-Length header on a 304 response
causes problems, we can't rely on it existing even for normal 200 / 206
responses.
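For illustration, here is a simplified Python model of the -N decision. It is
not Wget's actual code, and LocalFile, Response and needs_download are
hypothetical names. It shows why a 304 makes an incomplete file look up to
date, and why the size check only helps when Content-Length happens to be sent:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalFile:
    size: int     # bytes on disk (may be a truncated download)
    mtime: float  # seconds since the epoch


@dataclass
class Response:
    status: int                     # e.g. 200, 206 or 304
    content_length: Optional[int]   # may be absent even on 200/206
    last_modified: Optional[float]  # parsed Last-Modified, if any


def needs_download(local: Optional[LocalFile], resp: Response) -> bool:
    """Simplified -N logic (hypothetical model, not Wget's source)."""
    if local is None:
        return True
    if resp.status == 304:
        # A 304 carries no size information, so an incomplete local
        # file is indistinguishable from a complete one here.
        return False
    if resp.content_length is not None and resp.content_length != local.size:
        return True  # "The sizes do not match ... -- retrieving."
    if resp.last_modified is None:
        return True  # assumption: no Last-Modified means always fetch
    return resp.last_modified > local.mtime
```

With the numbers from the log below, a 200 with Length 38044195 against a
local size of 8643456 triggers a download, while a 304 for the same
truncated file does not.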

Let me try to summarize some of the possible options (I'm not saying any of
these is a particularly good idea):

1. Write the file to a temporary file and, on a successful download, move it
   to the real location.
   This option has multiple problems. Firstly, people don't expect Wget to
   write to a temporary file. This can be problematic, especially when people
   try to play streaming data without -O. But for the purposes of dealing with
   -N and --if-modified-since, this is the best option.
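Option 1 could look roughly like this minimal Python sketch. The name
download_atomically is hypothetical, and chunks stands in for the response
body. The temporary file is created in the destination directory so that the
final os.replace() is a same-filesystem rename, which is atomic on POSIX:

```python
import os
import tempfile
from typing import Iterable


def download_atomically(dest: str, chunks: Iterable[bytes]) -> None:
    """Write chunks to a temp file, then rename it over dest (sketch)."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Same directory as dest, so the rename never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".wget-tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # data on disk before it becomes visible
        os.replace(tmp_path, dest)  # dest is either old or complete
    except BaseException:
        # Interrupted: remove the partial temp file, leave dest untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

An interrupted transfer never leaves a truncated file under the real name, so
the next -N run does not find a stale partial copy. The flip side is exactly
the streaming problem above: nothing appears at the destination until the
transfer ends.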

2. Issue a utime() call after every write() to keep the mtime older than the
   one reported by the server.
   After the file is fully downloaded, set the mtime to the actual value
   provided by the server. The downside is that Wget would issue many more
   system calls, and with Wget2 this might get really bad, since it downloads
   ~30+ files in parallel. I'm also unsure how the kernel handles races
   between write() and utime() calls: we don't want to set the mtime of the
   file only to have it overwritten by a preceding write() call. This might
   still be a valid option, especially since it is cross-platform, but the
   performance impact would need to be evaluated.
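A sketch of option 2, again in Python with a hypothetical function name. The
placeholder mtime of 0 (the epoch) is an assumption: any value older than the
server's Last-Modified would do. The function returns the number of utime()
calls so the per-chunk syscall overhead is visible:

```python
import os
from typing import Iterable


def download_with_pinned_mtime(
    dest: str,
    chunks: Iterable[bytes],
    server_mtime: float,
    placeholder_mtime: float = 0.0,  # assumption: anything older works
) -> int:
    """Keep dest's mtime old until the download completes (sketch)."""
    utime_calls = 0
    with open(dest, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            # Flush so the data reaches write() before we reset the mtime.
            f.flush()
            os.utime(dest, (placeholder_mtime, placeholder_mtime))
            utime_calls += 1
    # Only a complete download gets the server's timestamp.
    os.utime(dest, (server_mtime, server_mtime))
    return utime_calls + 1
```

If the transfer dies midway, the file keeps the epoch mtime, so a later
If-Modified-Since request should get a 200 rather than a 304. Within one
thread on a local filesystem the calls are applied in order; network
filesystems with delayed writeback may behave differently, which is part of
what would need evaluating.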

3. Only enable If-Modified-Since when xattr is available.
   The idea here is simple: on systems where xattr is available, store either
   an old timestamp or a completion flag in the extended attributes, and use
   this metadata to decide whether to issue an If-Modified-Since header. If
   xattr is not available or the attributes are not found, use the HEAD+GET
   approach.
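Option 3 with the completion-flag variant, as a Linux-only Python sketch:
os.setxattr, os.getxattr and os.removexattr only exist on Linux, and the
attribute name user.wget.complete is made up for illustration. Any failure
(no xattr support, attribute missing) yields None, meaning "fall back to
HEAD+GET":

```python
import os
from email.utils import formatdate
from typing import Optional

COMPLETE_ATTR = "user.wget.complete"  # hypothetical attribute name


def clear_complete(path: str) -> None:
    """Call before (re)downloading, so a stale flag can't survive."""
    try:
        os.removexattr(path, COMPLETE_ATTR)
    except (AttributeError, OSError):
        pass  # no xattr support, or the flag was never set


def mark_complete(path: str) -> bool:
    """Call after a successful download; False if xattr is unusable."""
    try:
        os.setxattr(path, COMPLETE_ATTR, b"1")
        return True
    except (AttributeError, OSError):
        return False


def if_modified_since_header(path: str) -> Optional[str]:
    """Header value for a conditional GET, or None to use HEAD+GET."""
    try:
        flag = os.getxattr(path, COMPLETE_ATTR)
    except (AttributeError, OSError):
        return None
    if flag != b"1":
        return None
    # HTTP-date in GMT, e.g. "Sun, 10 Feb 2019 12:00:00 GMT"
    return formatdate(os.path.getmtime(path), usegmt=True)
```

clear_complete() has to run before a download starts; otherwise a file that
was once complete would keep its flag after a later, interrupted re-download.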


Are there any other options that I've missed?

> On 2/10/19 2:42 PM, Lawrence Wade wrote:
> > Hi Tim,
> > 
> > Okay. Using the OpenSUSE-packaged wget (1.19.5) that comes with Leap 15.0:
> > 
> > $ wget -r -N 192.168.2.100:8080
> > ...
> > Reusing existing connection to 192.168.2.100:8080.
> > HTTP request sent, awaiting response... 304 Not Modified
> > File ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4’ not modified on server. Omitting download.
> > 
> > This file is incomplete in my local copy.
> > 
> > Trying again as you suggest,
> > 
> > $ wget -r -N --no-if-modified-since 192.168.2.100:8080
> > ...
> > --2019-02-10 08:35:14--  http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
> > Reusing existing connection to 192.168.2.100:8080.
> > HTTP request sent, awaiting response... 200 OK
> > Length: 38044195 (36M) [application/octet-stream]
> > The sizes do not match (local 8643456) -- retrieving.
> > --2019-02-10 08:35:14--  http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
> > Reusing existing connection to 192.168.2.100:8080.
> > HTTP request sent, awaiting response... 200 OK
> > Length: 38044195 (36M) [application/octet-stream]
> > Saving to: ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4’
> > ...
> > 
> > And it appears to work as expected. Won't this change to the behaviour
> > of -N option subtly break a lot of scripts which rely on wget?
> > 
> > Thanks so much, Tim. I do have an answer and a workaround, though my
> > concerns remain.
> > 
> > Lawrence Wade
> > Ottawa, Canada
> > 
> > On Sun, Feb 10, 2019 at 2:11 AM Lawrence Wade <address@hidden> wrote:
> >>
> >> Hi Everyone,
> >>
> >> This might be a corroboration of this
> >> http://lists.gnu.org/archive/html/bug-wget/2018-10/msg00049.html
> >> and this
> >> https://bugs.launchpad.net/ubuntu/+source/wget/+bug/1715481
> >>
> >> I use wget to backup my cellphone running Palapa Web Server, and it
> >> has worked well for me for years. Since upgrading to OpenSUSE Leap 15,
> >> I have been having corrupted files.
> >>
> >> My method is
> >> $ wget -r -N 192.168.2.100:8080
> >> and if the connection was interrupted for any reason, the next time I
> >> called wget it would complete any incomplete files. And since Leap 15, I
> >> have been getting gradually corrupted backups. I was tearing my hair
> >> out looking at wgetrc and other things.
> >>
> >> With one long file that I knew was incomplete, I got "Not Modified,
> >> omitting download", even though I knew the file sizes differed
> >> between the server and wget's copy, and the wget man page
> >> explicitly states that if the file sizes do not match, -N will trigger
> >> a download.
> >>
> >> I tried on OpenSUSE 42.3 (wget 1.14) and the incomplete file triggered
> >> a download, even though wgetrc was identical.
> >>
> >> Again, on Leap 15, I compiled 1.20.1 (latest), 1.17.1, and then
> >> finally with 1.16.3 the behaviour went back to what I expected (and I
> >> got my corrupted phone backups fixed).
> >>
> >> Was a bug possibly introduced in 1.17 with the support for 
> >> --if-modified-since?
> >>
> >> Version shipping with OpenSUSE Leap 15:
> >> GNU Wget 1.19.5 built on linux-gnu.
> >> +cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
> >> +ntlm +opie +psl +ssl/openssl
> >>
> >> Last version I tried where "wget -r -N" works as expected:
> >> GNU Wget 1.16.3 built on linux-gnu.
> >> +digest +https +ipv6 -iri +large-file +nls +ntlm +opie +psl +ssl/gnutls
> >>
> >> I'm open to the possibility that something else is causing this bug.
> >> I have not found many mentions of it, but then again it is subtle.
> >> You get pretty confident when you just let wget do its thing, so
> >> there may be a lot of incomplete files out there... :)
> >>
> >> Thanks so much for your help. I can provide any other info that would
> >> be helpful.
> >>
> >> Lawrence Wade
> >> Ottawa, Canada
> > 
> 



-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6


