Re: [Bug-wget] Fwd: Trying to download HTML from Google's Cache. Pls hel


From: Micah Cowan
Subject: Re: [Bug-wget] Fwd: Trying to download HTML from Google's Cache. Pls help
Date: Tue, 11 Nov 2008 12:27:05 -0800
User-agent: Thunderbird 2.0.0.17 (X11/20080914)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ben Smith wrote:

> Subject: Re: [Bug-wget] Re: Bug-wget Digest, Vol 1, Issue 10

>> When replying, please edit your Subject line so it is more specific
>>  than "Re: Contents of Bug-wget digest..."

It's helpful if you adhere to this guideline; otherwise it's hard to
follow threads. (I've fixed the subject in my reply.)

> It would be theoretically possible by using grep and sed to strip out
> the links to the cached files and piping that to wget.  However,
> Google appears to block access to results pages and cached pages via
> wget.  I tried to download several using wget and got a 403 Forbidden
> response.

http://wget.addictivecode.org/FrequentlyAskedQuestions#not-downloading
should be helpful for such problems (using -U is the most applicable
suggestion, but you may also run into the others). Please also consider
adding --limit-rate or --wait.
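
As a rough sketch of how those options might be combined (the user-agent
string, delay, rate cap, and the polite_fetch helper and its DRY_RUN switch
are all illustrative assumptions, not values from this thread or the FAQ):

```shell
# Sketch only: every value below is a placeholder assumption.
UA='Mozilla/5.0 (X11; Linux x86_64)'   # sent via -U instead of the default "Wget/<version>"
WAIT_SECS=2                            # --wait: seconds to pause between successive retrievals
RATE='50k'                             # --limit-rate: cap on download speed (bytes/s, k/m suffix allowed)

# polite_fetch URL...
# Hypothetical helper: with DRY_RUN=1 it only prints the command it would run;
# otherwise it runs wget on all given URLs in a single invocation, so that
# --wait actually applies between them.
polite_fetch() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "wget -U '$UA' --wait=$WAIT_SECS --limit-rate=$RATE $*"
    else
        wget -U "$UA" --wait="$WAIT_SECS" --limit-rate="$RATE" "$@"
    fi
}
```

Setting DRY_RUN=1 and calling, say, `polite_fetch http://example.com/a.html
http://example.com/b.html` shows the command line without touching the
network. Note that --wait has no effect when only a single URL is fetched
per invocation.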

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJGeqZ7M8hyUobTrERAnb3AJ9QExH/DgExUu+9TMVLMzyEcXGLQgCeIwYf
//x+tvr1nFsS978kVWX75cg=
=tZzE
-----END PGP SIGNATURE-----



