bug-wget
Fwd: Wget -c option possible errors ...


From: address@hidden
Subject: Fwd: Wget -c option possible errors ...
Date: Fri, 21 Jan 2022 15:51:28 +0200 (SAST)


----- Forwarded Message -----
From: gerdd@mweb.co.za
To: "Tim Ruehsen" <tim.ruehsen@gmx.de>
Sent: Friday, January 21, 2022 1:59:41 PM
Subject: Re: Wget

I happen to have an old XP machine with wget 1.9.1 installed (and used a lot 
in its day), which, I presume, predates 1.11.4 (9 < 11 if the version parts 
are compared numerically). That machine will be tasked with running my 
garden's irrigation, once I can convince it to stay up long enough. 

In the meantime I could try to run a few tests. But let me share a few 
experiences with the -c feature, which may (or may not) help explain what 
you see: 

I used wget extensively over the years and a lot of the work was downloading 
large-ish media files and/or packed archives. Both types would be very 
unforgiving with corrupt files. 

As far as I can see, the -c function just tells the server that you want the 
download to start at byte x (which is the next byte after the last byte 
received in your previous download). 

These are the possible outcomes that I have seen: 

1) the server does as instructed and all fits together and all is good. 
2) the server looks at the file to download and returns a verdict that what you 
have is as long or longer than the file it has. Nothing more happens. Either 
your file is already complete or the server copy was replaced by a shorter one, 
in which case you might want to download the whole new version - or you will be 
stuck with an incomplete copy of the previous version. 
3) the server is stupid, doesn't know the "from" function and gives you the 
whole file instead without comment and wget stitches it on to the existing 
file, resulting in a corrupt file. 
4) a new version of the file has been put on the server, which is longer, but 
different. wget will receive the tail end of this new file and stitch it on to 
the incomplete file it already has, quite likely resulting in a corrupt file. 
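
For HTTP, the first three outcomes can usually be told apart by the status 
code of the reply to the range request. The following is only a sketch under 
that assumption, not wget's actual logic; the names ResumeAction, range_header 
and classify_response are made up for illustration. Outcome 4 cannot be 
detected this way, because the server answers exactly as in outcome 1. 

```python
from enum import Enum
from typing import Optional


class ResumeAction(Enum):
    APPEND = "append"          # outcome 1: server honoured the range
    NOTHING_TO_DO = "nothing"  # outcome 2: local file is as long or longer
    RESTART = "restart"        # outcome 3: server sent the whole file


def range_header(local_size: int) -> dict:
    """Header asking the server to start at the first missing byte."""
    return {"Range": f"bytes={local_size}-"}


def classify_response(status: int,
                      content_range: Optional[str],
                      local_size: int) -> ResumeAction:
    """Decide what to do with the body, based on the HTTP status code."""
    if status == 206:
        # Partial content. Content-Range looks like "bytes 1000-4999/5000";
        # only append if the range starts exactly where the local file ends.
        if content_range and content_range.startswith("bytes "):
            start = content_range[len("bytes "):].split("-", 1)[0]
            if start.isdigit() and int(start) == local_size:
                return ResumeAction.APPEND
        return ResumeAction.RESTART
    if status == 416:
        # Range not satisfiable: the requested start is at or past the end.
        return ResumeAction.NOTHING_TO_DO
    if status == 200:
        # The range was ignored and the body is the whole file, so
        # appending it to the fragment would corrupt the result.
        return ResumeAction.RESTART
    raise ValueError(f"unexpected status {status}")
```

A caller would send range_header(size_of_fragment) with the request and 
truncate the local file before writing whenever the result is RESTART. 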

To prevent the error conditions you may need to compare file timestamps (not 
necessarily 100% reliable, but a good test in any event). 

Alternatively, wget could get a new feature that re-downloads a chunk of 
configurable size from the end of the already downloaded part, compares it, 
and continues the download only if the chunk matches its counterpart in the 
existing fragment. Otherwise a configurable option could decide whether to 
download the whole file or give an error message; the fresh download could 
either overwrite the existing fragment or start with a new name (as with the 
--no-clobber option, for instance). 
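
To make the idea concrete, here is a rough sketch of the comparison step. 
This feature does not exist in wget as far as I know; overlap_range and 
tail_matches are hypothetical names, and fetching the chunk from the server 
(a range request for bytes start to end) is left out. 

```python
from typing import Tuple


def overlap_range(local_size: int, chunk_size: int) -> Tuple[int, int]:
    """Inclusive byte range covering the last chunk_size bytes of the fragment."""
    if local_size <= 0 or chunk_size <= 0:
        raise ValueError("need a non-empty fragment and a positive chunk size")
    start = max(0, local_size - chunk_size)
    return start, local_size - 1


def tail_matches(path: str, start: int, server_chunk: bytes) -> bool:
    """True if the bytes the server sent for the overlap equal the local bytes."""
    if not server_chunk:
        return False  # nothing to compare, so do not trust the fragment
    with open(path, "rb") as f:
        f.seek(start)
        local_chunk = f.read(len(server_chunk))
    return local_chunk == server_chunk
```

If tail_matches returns False, the fragment and the server copy have 
diverged, and the configurable fallback (full download or error) would 
apply. A match does not prove the earlier bytes are identical; it only 
makes a mismatch much less likely to go unnoticed. 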

You might consider some of these changes for wget2 only. (Incidentally, I'll 
have to scour around one of these days for a Windows executable of wget2 ...)

One thing I have never done is to reboot during a download - but I have often 
had the power drop on me in the middle of one. Static files were regularly 
completed correctly when the download was resumed (provided the server in 
question was up to it, which most of them seem to be nowadays ...)

In hopes that this is useful ...

Gerd Diederichs





----- Original Message -----
From: "Tim Ruehsen" <tim.ruehsen@gmx.de>
To: "Дмитрий Дмитрий" <kmb697@yandex.ua>, "bug-wget" <bug-wget@gnu.org>
Sent: Friday, January 21, 2022 11:56:00 AM
Subject: Re: Wget

Hi,

I guess nobody even tries to reproduce the issue as nobody uses XP or 
the old wget 1.11.4. For example, I don't even have a Windows license 
and thus no Windows installed.

To get better feedback from other users, I would suggest

- update to the latest wget (hundreds of bugs have been fixed 
meanwhile). Static binaries for 32/64 bit Windows can be found at 
https://eternallybored.org/misc/wget/.

- try to reproduce the problem with a minimal set of command line 
options (else others have to do that, and that will cost other people's 
time)

- provide exact steps to reproduce

Without the above, others and I can only guess what is happening.

E.g. pressing reboot may result in unwanted bytes in a file and an 
inconsistent file system. Download continuation is based on the file 
size, not the contents. Wget has no way to tell whether the existing 
file contents are correct - it can only see that bytes are missing 
and download+append the missing bytes. Wget also doesn't see whether the 
file on the server has been changed.

In short: continuation is not reliable.
If you need a byte-exact download, make sure the provider (server) also 
provides a checksum so that you can verify your downloaded file. Without 
it, it is better not to use -c.
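
As an illustration, verifying a download against a published checksum could 
look like the sketch below. It assumes the provider publishes a hex digest; 
file_digest and verify are made-up helper names, and SHA-256 is just an 
example default (use whatever algorithm the provider actually publishes).

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256",
                block_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in blocks so large files fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the file's digest with the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

If verify returns False after a resumed download, delete the file and 
download it again without -c.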

Also think of possible MITM attacks: try to avoid plain text HTTP - use 
HTTPS instead.

Regards, Tim

On 21.01.22 04:03, Дмитрий Дмитрий wrote:
> I am Russian.
> Please excuse my English.
> 
> I used the old version wget-1.11.4 several years ago.
> I noticed that download errors sometimes happened.
> Wget incorrectly continued a partially downloaded file (option -c).
> 
> I saw two variants of this error.
> In one case several bytes were incorrect.
> In the other case the size of the file changed: the file ended up larger or 
> smaller than the original.
> This happened when wget was terminated by a reboot of the computer (reboot button).
> Wget could not correctly continue downloading the file.
> The errors did not always happen.
> Only sometimes.
> 
> Because of this, if I did not have a checksum (md5, for example), I had to 
> download files twice
> and compare them.
> 
> I wrote about it here:
> https://lists.gnu.org/archive/html/bug-wget/2021-03/msg00025.html
> I think I was not understood.
> If you don't understand me, please ask.


