bug-wget

Re: [Bug-wget] --header="Accept-encoding: gzip"


From: andreas wpv
Subject: Re: [Bug-wget] --header="Accept-encoding: gzip"
Date: Wed, 23 Sep 2015 21:09:37 -0500

Thanks for the insights, and for working on the next version.
andreas

On Wed, Sep 23, 2015 at 3:10 AM, Tim Ruehsen <address@hidden> wrote:

> > wget --user-agent "Mozilla/5.0 (Windows NT x.y; WOW64; rv:10.0)
> > Gecko/20100101 Firefox/10.0" -e robots=off --header="accept-encoding:
> > gzip" -p -H "www.google.com"
> >
> > Still only gives me 52 kb! and one file: index.html
> >
> > So, accept encoding seems to work, but only for the main file?
>
> As Ángel said, the main file is gzipped, but wget can't parse gzipped
> content. That's why you get just one file (index.html). (This file could
> be named index.html.gz to reflect its content.)
> You could decompress it with gzip -d and feed the resulting HTML file to
> wget manually, like wget -r --force-html --input-file index.html --base
> www.google.com
>
> There have been patches to support gzip encoding, but either they were
> half-baked or the authors did not sign the FSF copyright assignment.
>
> *Note*
> [Meanwhile, we are working on wget2. Content encodings like gzip and
> deflate are already built in there, as are lzma and bzip2 for even better
> compression (but servers don't support those out of the box yet).]
>
> Regards, Tim
>
>
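
The manual workaround Tim describes could be sketched roughly as below. This is only an illustration, not anything shipped with wget: the function name gunzip_if_needed and the output file name are hypothetical, and it assumes a gzip that accepts -t and -dc on standard input (GNU gzip does). Reading from standard input sidesteps the fact that the saved file is called index.html rather than index.html.gz.

```shell
# Hypothetical helper (not part of wget): write a decompressed copy of a
# saved page to "$2" if "$1" holds gzip data, otherwise copy it unchanged.
gunzip_if_needed() {
    src=$1
    dst=$2
    # gzip -t checks integrity; reading from stdin avoids any complaint
    # about the missing .gz suffix on the file name.
    if gzip -t < "$src" 2>/dev/null; then
        gzip -dc < "$src" > "$dst"
    else
        cp "$src" "$dst"
    fi
}
```

After something like gunzip_if_needed index.html index.plain.html, the decompressed copy could be handed to wget in the way suggested above, for example wget -r --force-html --input-file index.plain.html --base www.google.com (same options as in Tim's mail, only the file name differs).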

