[Bug-wget] Timestamp behaviour with modified local files
From: Ian Wienand
Subject: [Bug-wget] Timestamp behaviour with modified local files
Date: Tue, 28 Jul 2015 15:32:36 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1
Hi,
The manual says:
"If the local file does not exist, or the sizes of the files do not
match, Wget will download the remote file no matter what the
time-stamps say."
I'm seeing two cases where this does not happen:
1) With If-Modified-Since, I don't believe the Content-Length is
checked at all.
2) Without If-Modified-Since, if the remote end returns a 416, we don't
re-download even when the file on disk is larger than the remote file.
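The rule quoted from the manual can be written as a small predicate. This is a sketch, not wget's code: the struct and function names (`file_info`, `manual_says_download`) are hypothetical, and the final "remote is newer" comparison is assumed to be the usual --timestamping check. Both cases above are situations where this predicate returns true (the sizes differ) yet wget skips the download.

```c
#include <stdbool.h>

/* Hypothetical description of one copy of the file. */
struct file_info {
    bool exists;
    long long size;   /* bytes, or -1 if unknown */
    long long mtime;  /* seconds since the epoch */
};

/* Sketch of the manual's rule: download if there is no local copy,
   or if the sizes do not match, regardless of timestamps; otherwise
   (assumed) download only when the remote copy is newer. */
bool manual_says_download(const struct file_info *local,
                          const struct file_info *remote)
{
    if (!local->exists)
        return true;                      /* no local copy */
    if (remote->size >= 0 && local->size != remote->size)
        return true;                      /* sizes do not match */
    return remote->mtime > local->mtime;  /* remote is newer */
}
```

For example, a local file that was grown to 10 MiB but is newer than the server copy still yields `true` here, because the size check comes before any timestamp comparison.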
Here's a quick example in which we increase the size of the local file:
$ ./wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
$ truncate -s 10M cirros-0.3.4-x86_64-uec.tar.gz # modify the file size
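After the truncate, the local file is exactly 10 MiB (GNU truncate's `M` suffix means 1024 * 1024), i.e. 10485760 bytes. With -c, the requested range starts at the current local size, which is why the logs that follow ask for `bytes=10485760-`. A minimal sketch of building that header value; `format_continue_range` is a hypothetical helper, not wget's own code.

```c
#include <stdio.h>

/* Hypothetical helper: with -c, ask for everything from the current
   local size onwards, e.g. "bytes=10485760-". Returns the number of
   characters snprintf wanted to write (excluding the NUL). */
int format_continue_range(char *buf, size_t buflen, long long local_size)
{
    return snprintf(buf, buflen, "bytes=%lld-", local_size);
}
```

Calling it with `10LL * 1024 * 1024` writes `bytes=10485760-`, matching the `Range:` line in the requests below.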
First, with current git we see the If-Modified-Since request sent, but
I guess the server does not look at "Range": it simply returns 304,
even though we are asking for bytes the file doesn't have. Wget
doesn't notice that the local file is a different size.
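The 304 is in fact what the HTTP specification asks for: in the precedence order of RFC 7232 section 6, the If-Modified-Since precondition is evaluated before Range is processed, so a false precondition ends in 304 and the Range header is never consulted. Below is a simplified model of that ordering for a GET; it is not Apache's or wget's code, `model_get_status` is a hypothetical name, and If-None-Match, If-Range and the other preconditions are deliberately left out.

```c
#include <stdbool.h>

enum {
    HTTP_OK = 200,
    HTTP_PARTIAL = 206,
    HTTP_NOT_MODIFIED = 304,
    HTTP_RANGE_NOT_SATISFIABLE = 416
};

/* Simplified server-side model for a GET (times in seconds). */
int model_get_status(bool has_ims, long long ims_time,
                     long long resource_mtime,
                     bool has_range, long long range_start,
                     long long resource_length)
{
    /* If-Modified-Since is false when the resource was last modified
       at or before the given date: answer 304, Range is never read. */
    if (has_ims && resource_mtime <= ims_time)
        return HTTP_NOT_MODIFIED;

    /* Range (RFC 7233) is only considered after the preconditions. */
    if (has_range) {
        if (range_start >= resource_length)
            return HTTP_RANGE_NOT_SATISFIABLE;  /* starts past the end */
        return HTTP_PARTIAL;
    }
    return HTTP_OK;
}
```

With an unchanged resource and `Range: bytes=10485760-`, the model returns 304 when If-Modified-Since is sent and 416 when it is not, which matches the two logs in this message.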
---
$ ./wget --debug --timestamping -c http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Setting --timestamping (timestamping) to 1
Setting --continue (continue) to 1
DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.
URI encoding = ‘UTF-8’
--2015-07-28 13:00:28--  http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
Caching download.cirros-cloud.net => 69.163.241.114
Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
Created socket 4.
Releasing 0x00000000014dc720 (new refcount 1).
---request begin---
GET /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
If-Modified-Since: Tue, 28 Jul 2015 03:00:24 GMT
Range: bytes=10485760-
User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: download.cirros-cloud.net
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... ---response begin---
HTTP/1.1 304 Not Modified
Date: Tue, 28 Jul 2015 03:00:30 GMT
Server: Apache
Connection: Keep-Alive
Keep-Alive: timeout=2, max=100
ETag: "848176-51580ae5ed140"
---response end---
304 Not Modified
Registered socket 4 for persistent reuse.
File ‘cirros-0.3.4-x86_64-uec.tar.gz’ not modified on server. Omitting download.
---
With --no-if-modified-since, the server does honour the Range header
and returns 416 (Requested Range Not Satisfiable).
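RFC 7233 (section 4.4) says a server generating a 416 for a byte-range request SHOULD include a `Content-Range` header of the form `bytes */<complete-length>`, which would tell the client the real remote size. Note that the 416 in the log below carries no `Content-Range` at all, so a client cannot count on having it. A sketch of parsing that form, assuming nothing beyond the RFC grammar; `parse_unsatisfied_range` is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Parse the "unsatisfied-range" form of a Content-Range value
// (RFC 7233 section 4.2), e.g. "bytes */150". On success, store
// the complete length in *length and return true.
bool parse_unsatisfied_range(const char *value, long long *length)
{
    static const char prefix[] = "bytes */";
    const size_t n = sizeof prefix - 1;
    char *end;
    long long len;

    if (value == NULL || strncmp(value, prefix, n) != 0)
        return false;
    value += n;
    if (*value < '0' || *value > '9')
        return false;              // reject empty, signs and spaces
    errno = 0;
    len = strtoll(value, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;              // overflow or trailing junk
    *length = len;
    return true;
}
```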
---
$ ./wget --debug --no-if-modified-since --timestamping -c http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Setting --timestamping (timestamping) to 1
Setting --continue (continue) to 1
DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.
URI encoding = ‘UTF-8’
--2015-07-28 13:00:41--  http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
Caching download.cirros-cloud.net => 69.163.241.114
Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
Created socket 4.
Releasing 0x0000000000fbc6c0 (new refcount 1).
---request begin---
HEAD /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
Range: bytes=10485760-
User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: download.cirros-cloud.net
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response... ---response begin---
HTTP/1.1 416 Requested Range Not Satisfiable
Date: Tue, 28 Jul 2015 03:00:41 GMT
Server: Apache
Vary: Accept-Encoding
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
---response end---
416 Requested Range Not Satisfiable
Registered socket 4 for persistent reuse.
URI content encoding = ‘iso-8859-1’
The file is already fully retrieved; nothing to do.
---
This comes from the code at [1], where the comment says:
/* If `-c' is in use and the file has been fully downloaded (or
the remote file has shrunk), Wget effectively requests bytes
after the end of file and the server response with 416
(or 200 with a <= Content-Length. */
That is, if the file is 150 bytes both on disk and on the server, "-c"
requests bytes from offset 150 onwards; the server returns 416 and we
correctly assume the file is fully downloaded. However, if the local
file is 200 bytes, we follow the same path, but now the assumption is
invalid.
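The difference between the 150-byte and 200-byte cases could be captured by a stricter check on a 416. This is a hypothetical sketch, not a patch: `stricter_check` and the `continue_verdict` values are made-up names, and it assumes the remote length is known from somewhere (for instance a `Content-Range` on the 416, which the server in the log above did not send). For a server following RFC 7233, a 416 in reply to a -c request implies the local size is at least the remote length, so only equality means "complete".

```c
#include <stdbool.h>

// Possible outcomes when a "-c" request comes back 416.
enum continue_verdict { ALREADY_COMPLETE, REDOWNLOAD, UNKNOWN };

// Hypothetical stricter check: only treat the file as complete when
// the remote length is known and equals the local size.
enum continue_verdict stricter_check(long long local_size,
                                     bool remote_length_known,
                                     long long remote_length)
{
    if (!remote_length_known)
        return UNKNOWN;           // e.g. a 416 without Content-Range
    if (local_size == remote_length)
        return ALREADY_COMPLETE;  // 150 bytes local, 150 remote
    return REDOWNLOAD;            // 200 bytes local, 150 remote
}
```

How `UNKNOWN` should be handled (trust the 416, or fall back to a full download) is a policy choice this sketch leaves open.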
I think the first case is more important: with If-Modified-Since, the
size on disk does not seem to be taken into account at all.
Thanks,
-i
[1] http://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n3610