bug-wget

Re: [Bug-wget] Timestamp behaviour with modified local files


From: Ian Wienand
Subject: Re: [Bug-wget] Timestamp behaviour with modified local files
Date: Wed, 29 Jul 2015 10:38:27 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 07/29/2015 05:35 AM, Ander Juaristi wrote:
> Thus, if the content hasn't been changed, the server just acts as if
> no Range header was sent.

> To me, the only sensible solution seems to be not to send
> If-Modified-Since when resuming downloads. Because if you send a
> conditional GET and the condition is met, the server will go no
> further.
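For reference, here is a simplified sketch of the precondition ordering in RFC 7232, section 6. It shows why the Range header goes unused once the If-Modified-Since check comes out as "not modified". It models only If-Modified-Since and Range (no ETags, no If-Range). `evaluate_get` and its parameters are hypothetical names, not code from any real server:

```c
#include <stdbool.h>
#include <time.h>

/* Simplified sketch of RFC 7232, section 6, for a GET carrying only
 * If-Modified-Since and Range (no ETag preconditions, no If-Range).
 * The If-Modified-Since check (step 4) runs before any Range
 * processing (step 5), so a 304 ends the request and Range is never
 * consulted.  Hypothetical function, not taken from any server. */
int evaluate_get(bool has_ims, time_t ims, time_t last_modified,
                 bool has_range, long long range_start, long long length)
{
    /* Step 4: not modified since the client's date -> 304, stop here. */
    if (has_ims && last_modified <= ims)
        return 304;

    /* Step 5 onward: only now is the Range header considered. */
    if (has_range) {
        if (range_start >= length)
            return 416;  /* Range Not Satisfiable */
        return 206;      /* Partial Content */
    }
    return 200;          /* full representation */
}
```

Whenever last_modified <= ims, the result is 304, regardless of the requested range.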

I think the issue is not limited to continuation, although I did have
that flag on, since it is used in the script where the issue was noticed.

More generally, the size of the file on disk is not checked when the
If-Modified-Since header is sent.  Truncating a fully downloaded file
still gets a 304 and no re-download:

===
$ git describe
v1.16.3-90-g4e56a91

$ ./wget --debug --timestamping http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
...
$ truncate -s 1M ./cirros-0.3.4-x86_64-uec.tar.gz

$ ./wget --debug --timestamping http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Setting --timestamping (timestamping) to 1
DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.

URI encoding = ‘UTF-8’
--2015-07-29 09:34:25--  http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
Caching download.cirros-cloud.net => 69.163.241.114
Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
Created socket 4.
Releasing 0x00000000007db700 (new refcount 1).

---request begin---
GET /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
If-Modified-Since: Tue, 28 Jul 2015 23:34:12 GMT
User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: download.cirros-cloud.net
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 304 Not Modified
Date: Tue, 28 Jul 2015 23:34:25 GMT
Server: Apache
Connection: Keep-Alive
Keep-Alive: timeout=2, max=100
ETag: "848176-51580ae5ed140"

---response end---
304 Not Modified
Registered socket 4 for persistent reuse.
File ‘cirros-0.3.4-x86_64-uec.tar.gz’ not modified on server. Omitting download.
===
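Judging by the log, the If-Modified-Since value (23:34:12) is just the modification time of the local file, which `truncate` had bumped 13 seconds before the request.  Below is a minimal sketch of a client building the header that way. It assumes the value comes from stat(2)'s st_mtime, and `ims_header_for` is a hypothetical helper, not wget's code:

```c
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: build an If-Modified-Since value for a local file.
 * Assumption (consistent with the log, not checked against wget's
 * source): the value is the file's mtime.  Only st_mtime is read;
 * st_size never influences the request.
 * Returns 0 on success, -1 if the file is missing or buf is too small. */
int ims_header_for(const char *path, char *buf, size_t buflen)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;                    /* no local file: send no header */

    const struct tm *tm = gmtime(&st.st_mtime);
    if (tm == NULL)
        return -1;

    /* HTTP date, e.g. "Tue, 28 Jul 2015 23:34:12 GMT" (C locale names). */
    if (strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S GMT", tm) == 0)
        return -1;
    return 0;
}
```

Nothing here looks at the file size, so the request carries no hint that the file was truncated.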

There's probably a strong argument that HTTP isn't the right way to
check the consistency of a local file against a remote one.  Even with
the old behaviour, just checking the Content-Length doesn't catch
corruption that leaves the size unchanged.  But it's good enough to
catch interrupted downloads, etc.
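The Content-Length comparison that catches interrupted downloads is cheap.  Here is a minimal sketch. It assumes the remote length is already known from an earlier response (a 304 need not carry one), and `local_size_matches` is a hypothetical name:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical check: treat the local copy as complete only if its size
 * equals the remote Content-Length.  `remote_length` is assumed to come
 * from an earlier response (e.g. a HEAD request or the original GET). */
bool local_size_matches(const char *path, long long remote_length)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;                 /* a missing file never matches */
    return (long long) st.st_size == remote_length;
}
```

As noted, a file corrupted in place with its length unchanged still passes this check.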

>> 2) Without if-modified-since, if the remote end returns a 416 we don't
>>     re-download if the file-on-disk is larger than the remote end.
>>
> Just thinking loudly... Maybe If-Range would be a solution here?

I don't think so, because from my reading If-Range is used together
with the Range header, which will still be set to an invalid range due
to the code at [1]:

  /* If `-c' is in use and the file has been fully downloaded (or
     the remote file has shrunk), Wget effectively requests bytes
     after the end of file and the server response with 416
     (or 200 with a <= Content-Length.  */

That is, if the file is 150 bytes both on disk and on the server, "-c"
requests bytes from offset 150 onwards; the server returns 416 and we
assume the file is fully downloaded.  However, if the local file is 200
bytes, we follow the same path, but now that assumption is wrong.

Admittedly, the case where the local file is *larger* than the remote
file is probably pretty obscure.
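The "-c" reasoning above can be sketched with the same byte counts.  This assumes a server that honours Range and answers 416 when the first requested byte is at or past the end of the file (some servers ignore Range and send 200 instead). The function names are hypothetical, not taken from http.c:

```c
#include <stdbool.h>

/* What the server answers to "Range: bytes=<local_size>-" for a file of
 * remote_size bytes: unsatisfiable (416) once the first byte is at or
 * past the end, otherwise 206 with the remaining bytes. */
int range_response(long long local_size, long long remote_size)
{
    return local_size >= remote_size ? 416 : 206;
}

/* The client-side shortcut: read a 416 as "already fully downloaded". */
bool assume_complete_on_416(long long local_size, long long remote_size)
{
    return range_response(local_size, remote_size) == 416;
}

/* What is actually true: the copy is complete only if sizes match. */
bool actually_complete(long long local_size, long long remote_size)
{
    return local_size == remote_size;
}
```

range_response(150, 150) and range_response(200, 150) both return 416, but only the first corresponds to a complete local copy.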

Thanks,

-i

[1] http://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n3610


