Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip


From: Tim Rühsen
Subject: Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip-compressed tarballs (HTTP-header dependent)
Date: Wed, 01 Nov 2017 19:57:16 +0100
User-agent: KMail/5.2.3 (Linux/4.13.0-1-amd64; KDE/5.37.0; x86_64; ; )

Hi Jens,

On Wednesday, 1 November 2017 17:27:58 CET Jens Schleusener wrote:
> Hi,
> 
> the new "wget" release 1.19.2 has got a new feature:
> 
>   "gzip Content-Encoding decompression"
> 
> But that feature - at least for my self-compiled binary - leads to a
> problem if one downloads gzip-compressed tarballs from sites that send,
> e.g., an HTTP response header containing lines like
> 
>   Content-Type: application/x-tar
>   Content-Encoding: gzip

You are clearly describing broken server behavior.

> 
> In such cases wget now saves a downloaded gzip-compressed tarball
> decompressed (!), which probably breaks a lot of scripts.

Not sure why anyone would rely on broken behavior. What if the broken server
configuration gets fixed? Then your script breaks as well.

> Additionally the
> tarball is saved nevertheless under a filename with the "tar.gz" extension
> and not with the "tar" extension.

At least on *nix, the file extension says nothing about the content. That is
why we have the MIME type stated in Content-Type. 'x-tar' clearly denotes an
uncompressed tar file. Content-Encoding: gzip means that the data has been
compressed for transport only.
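
To illustrate that reading of the headers, here is a minimal Python sketch (not wget's actual code; the function name decode_body and the plain-dict header lookup are assumptions made for this example). It undoes the gzip coding when Content-Encoding says so, leaving the entity that Content-Type describes:

```python
import gzip

def decode_body(headers, body):
    """Return the entity after undoing the coding named in Content-Encoding.

    `headers` is assumed to be a plain dict with canonically cased names;
    a real client would have to match header names case-insensitively.
    """
    encoding = headers.get("Content-Encoding", "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # The gzip layer was only applied for transport: remove it.
        return gzip.decompress(body)
    # "identity" (or no header at all): the body is already the entity.
    return body

tar_bytes = b"pretend this is an uncompressed tar archive"
wire_bytes = gzip.compress(tar_bytes)

# The header combination from the report: the client ends up with plain tar data.
reported = {"Content-Type": "application/x-tar", "Content-Encoding": "gzip"}
saved = decode_body(reported, wire_bytes)
```

With these headers, `saved` equals `tar_bytes`, which is the behavior described in the report: the file on disk is no longer gzip-compressed, whatever its name says.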

Anyway, whatever we do, it will break with some servers and work with others.

> Possible solutions/workarounds would be for affected servers to deliver an
> alternative HTTP header like
> 
>   Content-Type: application/x-gzip
>   (or Content-Type: application/octet-stream)
> 
> or on the client side the use of the new "wget" option
> 
>   --compression=none
> 
> But maybe it would be better if for such cases wget would revert its
> default behaviour to the old one. Or is the described behaviour the
> expected one?

Correct server behavior here would be:
  Content-Type: application/gzip
together with Content-Encoding: identity, which may also be omitted since it's
the default.

A good explanation is here:
https://superuser.com/questions/901962/what-is-the-correct-mime-type-for-a-tar-gz-file


We can discuss a proposal for a workaround that handles both cases, such as:
if Content-Encoding == gzip and the filename ends with .gz, then don't
decompress.
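
As a rough sketch of that rule in Python (only an illustration of the proposal, not an implementation in wget; the function name should_decompress is hypothetical):

```python
def should_decompress(content_encoding, filename):
    """Decide whether a client should undo Content-Encoding: gzip.

    Proposed heuristic: a gzip-encoded response whose target filename
    already ends in .gz is assumed to come from a misconfigured server,
    so the body is stored as received.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding not in ("gzip", "x-gzip"):
        # Nothing to undo for identity or unrelated codings.
        return False
    # Keep the compressed bytes when the name promises a .gz file.
    return not filename.lower().endswith(".gz")

# Misconfigured server from the report: keep the tarball compressed.
keep_compressed = not should_decompress("gzip", "foo-1.0.tar.gz")

# Ordinary compressed page transfer: decompress as before.
decode_page = should_decompress("gzip", "index.html")
```

Note that this checks only the .gz suffix as stated above; names such as .tgz would still be decompressed unless the rule is extended.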

Caveat: this may break our --xattr feature, which saves the MIME type with the
file. We would then have to adjust the MIME type as well, and that could be
really tedious.

Regards, Tim
