[Bug-wget] Timestamp behaviour with modified local files


From: Ian Wienand
Subject: [Bug-wget] Timestamp behaviour with modified local files
Date: Tue, 28 Jul 2015 15:32:36 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1

Hi,

The manual says

  "If the local file does not exist, or the sizes of the files do not
   match, Wget will download the remote file no matter what the
   time-stamps say."

In two cases I'm not seeing this:

1) With If-Modified-Since, I don't believe the local file size is
   compared against the remote Content-Length at all.

2) Without If-Modified-Since, if the server returns a 416 we don't
   re-download, even when the file on disk is larger than the remote
   file.
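
For reference, the rule quoted from the manual can be written as a small decision function. This is only a Python sketch of the documented behaviour, not Wget's actual code; the name `should_download` and its parameters are hypothetical.

```python
from typing import Optional


def should_download(local_size: Optional[int], local_mtime: float,
                    remote_size: int, remote_mtime: float) -> bool:
    """Model of the documented --timestamping rule (hypothetical helper).

    local_size is None when the local file does not exist.
    """
    if local_size is None:
        return True   # no local file: always download
    if local_size != remote_size:
        return True   # sizes differ: download regardless of time-stamps
    # Sizes match, so the time-stamps decide.
    return remote_mtime > local_mtime
```

Under this model, a local file of 200 bytes against a remote file of 150 bytes is downloaded again whatever the time-stamps say.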

Here's a quick example where we increase the size of the local file:

$ ./wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
$ truncate -s 10M cirros-0.3.4-x86_64-uec.tar.gz # modify the file size

Firstly, with current git, we see the "If-Modified-Since" header sent
along with "Range". I guess the server does not look at "Range",
because it just returns 304 despite us asking for bytes the file
doesn't have.  Wget doesn't notice that the local file is a different
size.

---
$ ./wget --debug --timestamping -c http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
  Setting --timestamping (timestamping) to 1
  Setting --continue (continue) to 1
  DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.

  URI encoding = ‘UTF-8’
  --2015-07-28 13:00:28--  http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
  Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
  Caching download.cirros-cloud.net => 69.163.241.114
  Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
  Created socket 4.
  Releasing 0x00000000014dc720 (new refcount 1).

  ---request begin---
  GET /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
  If-Modified-Since: Tue, 28 Jul 2015 03:00:24 GMT
  Range: bytes=10485760-
  User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
  Accept: */*
  Accept-Encoding: identity
  Host: download.cirros-cloud.net
  Connection: Keep-Alive

  ---request end---
  HTTP request sent, awaiting response...   ---response begin---
  HTTP/1.1 304 Not Modified
  Date: Tue, 28 Jul 2015 03:00:30 GMT
  Server: Apache
  Connection: Keep-Alive
  Keep-Alive: timeout=2, max=100
  ETag: "848176-51580ae5ed140"

  ---response end---
  304 Not Modified
  Registered socket 4 for persistent reuse.
  File ‘cirros-0.3.4-x86_64-uec.tar.gz’ not modified on server. Omitting download.
---
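
To make the request above easier to follow: with -c and --timestamping together, both conditional headers appear to be derived from the local file alone, the Range offset from its size and If-Modified-Since from its modification time. The Python sketch below reconstructs the two headers seen in the log from those values; `conditional_headers` is a hypothetical helper that models the observed request, not part of Wget.

```python
from email.utils import formatdate


def conditional_headers(local_size: int, local_mtime: float) -> dict:
    """Build the headers seen in the log from local file metadata.

    Hypothetical helper: models the observed request, not Wget's source.
    """
    return {
        # -c resumes from the current end of the local file.
        "Range": f"bytes={local_size}-",
        # --timestamping sends the local modification time (in GMT).
        "If-Modified-Since": formatdate(local_mtime, usegmt=True),
    }
```

For the 10M file from the example, `conditional_headers(10 * 1024 * 1024, mtime)` gives the same Range value as the log. Note that the 304 response shown carries no Content-Length, so the size mismatch cannot be detected from that response alone.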

Using --no-if-modified-since, we see the server does notice the range
and returns a 416 (Range Not Satisfiable).

---
$ ./wget --debug --no-if-modified-since --timestamping -c http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
  Setting --timestamping (timestamping) to 1
  Setting --continue (continue) to 1
  DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.

  URI encoding = ‘UTF-8’
  --2015-07-28 13:00:41--  http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
  Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
  Caching download.cirros-cloud.net => 69.163.241.114
  Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
  Created socket 4.
  Releasing 0x0000000000fbc6c0 (new refcount 1).

  ---request begin---
  HEAD /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
  Range: bytes=10485760-
  User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
  Accept: */*
  Accept-Encoding: identity
  Host: download.cirros-cloud.net
  Connection: Keep-Alive

  ---request end---
  HTTP request sent, awaiting response...   ---response begin---
  HTTP/1.1 416 Requested Range Not Satisfiable
  Date: Tue, 28 Jul 2015 03:00:41 GMT
  Server: Apache
  Vary: Accept-Encoding
  Keep-Alive: timeout=2, max=100
  Connection: Keep-Alive
  Content-Type: text/html; charset=iso-8859-1

  ---response end---
  416 Requested Range Not Satisfiable
  Registered socket 4 for persistent reuse.
  URI content encoding = ‘iso-8859-1’

      The file is already fully retrieved; nothing to do.
---

This is due to the code at [1] where, as the comment says:

  /* If `-c' is in use and the file has been fully downloaded (or
     the remote file has shrunk), Wget effectively requests bytes
     after the end of file and the server response with 416
     (or 200 with a <= Content-Length.  */

That is, if the file on disk and the file on the server are both 150
bytes, then "-c" requests from byte 150 onwards, the server returns
416, and we assume the file is fully downloaded.  However, if the
local file is 200 bytes, we follow the same path even though that
assumption no longer holds.
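
The size cases can be laid out side by side. The Python sketch below puts the assumption described above (a 416 under -c means "fully retrieved") next to what the manual's size rule would suggest. Both function names are hypothetical, and the second one assumes the remote size is known, which the 416 response in the log does not provide.

```python
def assumed_on_416() -> str:
    """Outcome of the current assumption: 416 under -c means done."""
    return "nothing to do"


def expected_by_size(local_size: int, remote_size: int) -> str:
    """Outcome suggested by the manual's size rule (hypothetical)."""
    if local_size == remote_size:
        return "nothing to do"       # file really is complete
    if local_size > remote_size:
        return "re-download"         # sizes do not match
    return "continue download"       # a range inside the file is satisfiable


# Compare both outcomes for the 150-byte and 200-byte local files.
for local in (150, 200):
    print(local, assumed_on_416(), "|", expected_by_size(local, 150))
```

The two functions agree for the 150-byte local file and disagree for the 200-byte one, which is the case reported here.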

I think the first case is more important: with If-Modified-Since, the
size on disk does not seem to be taken into account at all.

Thanks,

-i

[1] http://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n3610


