bug-wget

Re: [Bug-wget] --header="Accept-encoding: gzip"


From: Tim Ruehsen
Subject: Re: [Bug-wget] --header="Accept-encoding: gzip"
Date: Wed, 23 Sep 2015 10:10:12 +0200
User-agent: KMail/4.14.10 (Linux/4.1.0-2-amd64; KDE/4.14.12; x86_64; ; )

> wget --user-agent "Mozilla/5.0 (Windows NT x.y; WOW64; rv:10.0)
> Gecko/20100101 Firefox/10.0" -e robots=off --header="accept-encoding: gzip
> " -p -H "www.google.com"
> 
> Still only gives me 52 kb! and one file: index.html
> 
> So, accept encoding seems to work, but only for the main file?

As Ángel said, the main file is gzipped, but wget can't decompress and parse
it. That's why you get just one file (index.html): wget finds no links to
follow. (The file could be named index.html.gz to reflect its content.)
You could decompress it by hand with gzip -d and feed the resulting HTML file
to wget, like wget -r --force-html --input-file index.html --base
www.google.com
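
A rough sketch of that workaround as a script, in case it helps. The helper
name decompress_page is made up for illustration (it is not part of wget), and
the script assumes the page was saved as index.html in the current directory.
It reads the file via stdin because gzip -d refuses files without a .gz
suffix.

```shell
#!/bin/sh
# Sketch: decompress a page that wget saved gzip-compressed under a plain
# .html name. decompress_page is a hypothetical helper, not a wget feature.
decompress_page() {
    page=$1
    # gzip -t succeeds only if the input is a valid gzip stream, so an
    # already-plain HTML file is left untouched.
    if gzip -t < "$page" 2>/dev/null; then
        # Decompress via stdin/stdout, then replace the original file.
        gzip -dc < "$page" > "$page.tmp" && mv "$page.tmp" "$page"
    fi
}

# Usage (needs network access, so it is left commented out here):
#   decompress_page index.html
#   wget -r --force-html --input-file index.html --base www.google.com
```

Only the first page is handled this way; any further gzipped HTML that the
second wget run downloads would have the same problem.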

There have been patches to support gzip encoding, but either they were
half-baked or the authors did not sign the FSF copyright assignment.

*Note*
[Meanwhile, we are working on wget2. Content encodings like gzip and deflate
are already built in there, as are lzma and bzip2 for even better compression
(though servers don't support those out of the box yet).]

Regards, Tim



