On Wednesday, 1 November 2017 17:27:58 CET Jens Schleusener wrote:
Hi,
the new "wget" release 1.19.2 has got a new feature:
"gzip Content-Encoding decompression"
But that feature - at least for my self-compiled binary - leads to a
problem if one downloads gzip-compressed tarballs from sites that send,
e.g., an HTTP response header containing lines like
Content-Type: application/x-tar
Content-Encoding: gzip
You are clearly describing broken server behavior.
In such cases wget now saves a downloaded gzip-compressed tarball
decompressed (!), which probably breaks a lot of scripts.
Not sure why anyone relies on broken behavior. What if the broken server
configuration gets fixed? Then your script breaks as well.
Additionally, the
tarball is nevertheless saved under a filename with the "tar.gz" extension
and not with the "tar" extension.
At least on *nix, the file extension says nothing about the content. That is
why we have the MIME type stated in Content-Type. 'x-tar' clearly denotes an
uncompressed tar file. Content-Encoding: gzip means that the data has been
compressed for transport only.
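
To illustrate the distinction between Content-Type and Content-Encoding, here is a minimal Python sketch. It is not wget's code; the helper names `make_tar` and `decode_body` and the file name `hello.txt` are made up for this example. It shows that a client which honors the two headers quoted above ends up with plain tar data after undoing the transport compression:

```python
import gzip
import io
import tarfile


def make_tar(name, data):
    """Build a small uncompressed tar archive in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def decode_body(headers, body):
    """Undo the Content-Encoding, as a client honoring the header would."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    return body


# The entity the server claims to deliver: an uncompressed tar file.
tar_bytes = make_tar("hello.txt", b"hello\n")
headers = {"Content-Type": "application/x-tar", "Content-Encoding": "gzip"}

# What travels over the wire: the entity, gzip-compressed for transport.
wire_body = gzip.compress(tar_bytes)

# What a header-honoring client stores: the tar data, no longer gzip-compressed.
saved = decode_body(headers, wire_body)
```

Under these headers, `saved` is identical to `tar_bytes`, so a file named "foo.tar.gz" would contain a plain tar archive.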
Solutions/workarounds may be, on affected servers, the delivery of an
alternative HTTP header like
Content-Type: application/x-gzip
(or Content-Type: application/octet-stream)
or on the client side the use of the new "wget" option
--compression=none
But maybe it would be better if wget reverted its default behaviour to the
old one for such cases. Or is the described behaviour the
expected one?
Correct server behavior here would be:
Content-Type: application/gzip
together with Content-Encoding: identity, which also may be omitted since it's
the default.
A good explanation is here:
https://superuser.com/questions/901962/what-is-the-correct-mime-type-for-a-tar-gz-file
We can discuss a proposal for a work-around that handles both cases, like
"if Content-Encoding == gzip and filename ends with .gz then don't decompress".
Caveat: this may break our --xattr feature, which saves the mime type with the
file. And then we have to adjust the mime type as well - and that could be
really tedious.
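
The work-around proposed above could be sketched roughly as follows. This is only an illustration in Python, not a patch for wget: the function name `should_decompress` and the `suffixes` parameter are assumptions made for this example, and treating "x-gzip" like "gzip" is an extra assumption that is not part of the proposal.

```python
def should_decompress(content_encoding, filename, suffixes=(".gz",)):
    """Decide whether to undo a gzip Content-Encoding before saving.

    Hypothetical helper: returns False when there is nothing to decompress,
    or when the filename already ends in a gzip suffix, which suggests a
    server that mislabels a .gz file as transport-compressed.
    """
    encoding = (content_encoding or "identity").strip().lower()
    if encoding not in ("gzip", "x-gzip"):  # "x-gzip" is an assumption here
        return False
    return not filename.lower().endswith(tuple(suffixes))


# Misconfigured server: keep the bytes as sent, the file stays a real .tar.gz.
keep_compressed = not should_decompress("gzip", "foo.tar.gz")

# Ordinary transport compression of an HTML page: decompress before saving.
decompress_page = should_decompress("gzip", "index.html")
```

Note that this sketch does not address the caveat: when decompression is skipped, the stored MIME type would still say application/x-tar, so it would need to be adjusted separately.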