Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip
From: Tim Rühsen
Subject: Re: [Bug-wget] New wget (1.19.2): Unexpected download behaviour for gzip-compressed tarballs (HTTP-header dependent)
Date: Fri, 3 Nov 2017 10:00:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

On 11/03/2017 06:37 AM, James Cloos wrote:
>>>>>> "TR" == Tim Rühsen <address@hidden> writes:
>
> TR> I downloaded/tested thousands of web pages and they behave as if
> TR> 'Content-Encoding: gzip' is a compression for the transport.
> TR> Uncompressing it 'on-the-fly' and saving that uncompressed data was
> TR> the correct behavior.
>
> Lots of servers have that misconfiguration; it was recommended in the
> past and apache defaulted to doing that when grabbing things like tar.gz.
>
> The gui browsers had to learn to work around that misconfig. wget also
> has to.
>
> In short, do not uncompress if the destination name has a compression
> suffix.
>
> Or, in that case, test whether the uncompressed data starts with gzip
> magic and complete one decompression if so, none if not.
>
> And the same for the other compression formats.

Thanks for this insight!

Looking at the Mozilla/Gecko sources shows that the gzip Content-Encoding is
simply cleared for the Content-Types application/x-gzip, application/gzip and
application/x-gunzip. That makes it straightforward to go the same way.
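
To make the rule concrete, here is a rough sketch of that check in Python.
It is neither the Gecko code nor a wget patch; the function name
should_decompress and the constant GZIP_CONTENT_TYPES are made up for
illustration, and only the three Content-Types named above are taken from
what I saw in the sources.

```python
# Hypothetical helper for illustration only; not taken from Gecko or wget.

# Content-Types for which a 'Content-Encoding: gzip' header is ignored,
# because the payload itself is already meant to be a gzip file.
GZIP_CONTENT_TYPES = {
    "application/x-gzip",
    "application/gzip",
    "application/x-gunzip",
}


def should_decompress(content_encoding, content_type):
    """Return True if the response body should be gunzipped on the fly."""
    encoding = (content_encoding or "").strip().lower()
    if encoding != "gzip":
        # No gzip transport compression announced, nothing to undo.
        return False
    # Drop parameters such as '; charset=...' before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    # Clear the encoding (i.e. do not decompress) for gzip Content-Types.
    return media_type not in GZIP_CONTENT_TYPES
```

With this, a text/html response sent with 'Content-Encoding: gzip' would
still be uncompressed, while a tar.gz served as application/x-gzip with the
same header would be saved as received.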
With Best Regards, Tim