[Bug-wget] Re: Bug-wget Digest, Vol 1, Issue 10

From:	Ben Smith
Subject:	[Bug-wget] Re: Bug-wget Digest, Vol 1, Issue 10
Date:	Tue, 11 Nov 2008 11:25:13 -0800 (PST)

It would theoretically be possible to use grep and sed to extract the links to
the cached pages from a results page and pipe them to wget. However, Google
appears to block access to its results pages and cached pages via wget: when I
tried to download several of them with wget, I got a 403 Forbidden response.



----- Original Message ----
> From: "address@hidden" <address@hidden>
> To: address@hidden
> Sent: Tuesday, November 11, 2008 12:05:02 PM
> Subject: Bug-wget Digest, Vol 1, Issue 10
> 
> Send Bug-wget mailing list submissions to
>     address@hidden
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>     http://lists.gnu.org/mailman/listinfo/bug-wget
> or, via email, send a message with subject or body 'help' to
>     address@hidden
> 
> You can reach the person managing the list at
>     address@hidden
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bug-wget digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: Fwd: Trying to download HTML from Google's Cache. Pls
>       help (Micah Cowan)
>    2. Re: Fwd: Trying to download HTML from Google's Cache. Pls
>       help (Yan Grossman)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 10 Nov 2008 15:22:45 -0800
> From: Micah Cowan 
> Subject: Re: [Bug-wget] Fwd: Trying to download HTML from Google's
>     Cache. Pls help
> To: Yan Grossman 
> Cc: address@hidden
> Message-ID: <address@hidden>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Yan Grossman wrote:
> > Hi, sorry, I didn't get it myself and didn't find it in the archives, so I
> > sent it again. Sorry, I will just wait next time.
> > 
> > Anyway, I did read the manual and got a pretty good understanding of how
> > it works. I could probably run it now pointed to a regular domain and
> > download the files.
> > 
> > But in this case, I am trying to download from Google's cache, so I can't
> > use my domain. I think I need to go through Google domain to mine. Do
> > you know what I mean?
> > 
> > Here is how I see all my pages in Google's cache:
> > 
> > site:www.snowbrasil.com/fotos 
> > 
> > So you see there are about 500 pages. But I can't run wget on
> > www.snowbrasil.com/fotos, because those are the exact pages I lost. They
> > are not on my server anymore, so I am trying to get them from Google's cache.
> 
> That sounds more like a Google question than a Wget question, then.
> 
> Just find the cache pages you want, and see what Google's URL for them
> is, and feed that to Wget.
> 
> > I would appreciate it if you could suggest what command options to use.
> 
> That depends greatly on what you want Wget to do with your pages; I
> can't really help you there without more information.
> 
> - --
> Micah J. Cowan
> Programmer, musician, typesetting enthusiast, gamer.
> GNU Maintainer: wget, screen, teseq
> http://micah.cowan.name/
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFJGMJE7M8hyUobTrERAspMAJoD9XzdbteHavQDD+2C2vxCF7DT2ACdHgsA
> 1M4iypEUaLMwUBNEMFT/G0w=
> =dd0x
> -----END PGP SIGNATURE-----
> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 10 Nov 2008 20:22:20 -0800
> From: "Yan Grossman" 
> Subject: Re: [Bug-wget] Fwd: Trying to download HTML from Google's
>     Cache. Pls help
> To: address@hidden
> Message-ID:
>     
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hi, is there anybody that can help me with that?
> 
> Anyway, I did read the manual and got a pretty good understanding of how it
> works. I could probably run it now pointed to a regular domain and download
> the files.
> 
> But in this case, I am trying to download from Google's cache, so I can't use
> my domain. I think I need to go through Google domain to mine. Do you know
> what I mean?
> 
> Here is how I see all my pages in Google's cache:
> 
> site:www.snowbrasil.com/fotos
> 
> So you see there are about 500 pages. But I can't run wget on
> www.snowbrasil.com/fotos, because those are the exact pages I lost. They are
> not on my server anymore, so I am trying to get them from Google's cache.
> 
> I would like to download and save those files, the HTML files only, as if I
> were going into each cached page and saving the HTML, but instead of doing
> one at a time I would use wget to do it in batch.
>
> Thanks
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: http://lists.gnu.org/pipermail/bug-wget/attachments/20081110/c2bf36a4/attachment.html
> 
> ------------------------------
> 
> End of Bug-wget Digest, Vol 1, Issue 10
> ***************************************
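
Micah's suggestion in the quoted digest (find each cached page's URL and feed it to Wget) and Yan's requirement (HTML only, fetched in batch) combine into a simple loop over a list of cache URLs, such as a list produced with the grep/sed approach described at the top of this message. This is a sketch under stated assumptions: it assumes cache URLs of the form `cache:ID:ORIGINAL_URL` followed by `+` or `&` parameters, which is not a documented Google format, and `cache_name` and `fetch_all` are hypothetical helper names. Given the 403 responses reported above, the requests may well be refused.

```sh
#!/bin/sh
# Sketch: fetch each cached page in a URL list, saving only the HTML.
# ASSUMPTION: cache URLs look like
#   http://HOST/search?q=cache:ID:ORIGINAL_URL+extra&more=params
# (HOST and ID are placeholders); real Google URLs may differ.

# Derive a local file name from the original URL embedded after "cache:ID:".
# Prints nothing (and returns 1) if the URL has no "cache:" part.
cache_name() {
    case "$1" in
        *cache:*) ;;
        *) return 1 ;;
    esac
    printf '%s\n' "$1" |
        sed -e 's/.*cache:[^:]*://' \
            -e 's/[+&].*//' \
            -e 's|/|_|g'
}

# Download every URL listed (one per line) in the file named by $1.
fetch_all() {
    n=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue             # skip blank lines
        n=$((n + 1))
        name=$(cache_name "$url")
        [ -n "$name" ] || name="page-$n.html" # fallback for non-cache URLs
        # Without -r or -p, wget fetches just this one document, so only
        # the HTML is saved (no images or other page requisites).
        wget -q -O "$name" "$url" || printf 'failed: %s\n' "$url" >&2
        sleep 2                               # pause between requests
    done < "$1"
}
```

Typical use is `fetch_all urls.txt`. Calling wget once per URL with -O gives each page a readable name based on the original address; a single `wget -i urls.txt` would also download the pages, but would derive file names from the search URLs themselves.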
