Re: [Bug-wget] Difficulty downloading a site from archive.org

bug-wget

From:	Micah Cowan
Subject:	Re: [Bug-wget] Difficulty downloading a site from archive.org
Date:	Sat, 13 Aug 2011 09:39:01 -0700
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11

On 08/12/2011 11:56 AM, phil curb wrote:

> I've been looking at downloading a site that's on archive.org

Archive.org's TOS on their website expressly forbids the use of "downloading agents", and names wget explicitly.

All URLs on archive.org always point at the _original_ (either modern, or nonexistent) locations they pointed to when they were archived. These links are pretty much never the ones you want. Then they embed some JavaScript that goes through and rewrites all these URLs to point at archive.org. This means that in a browser, you'll see the "correct" URLs when you hover, and when you click to follow.
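To make the rewriting concrete, here is a minimal Python sketch of the kind of mapping the embedded script performs. It assumes the Wayback Machine's usual URL layout, https://web.archive.org/web/<timestamp>/<original URL>, with a YYYYMMDDhhmmss capture timestamp; the helpers `to_archive_url` and `rewrite_links` are hypothetical, and this is not archive.org's actual script.

```python
from urllib.parse import urljoin

# Assumed Wayback URL layout: https://web.archive.org/web/<timestamp>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web/"


def to_archive_url(original_url: str, timestamp: str) -> str:
    """Map an original link target to its (assumed) archived location.

    Hypothetical helper; `timestamp` is the capture time as YYYYMMDDhhmmss.
    """
    if original_url.startswith(WAYBACK_PREFIX):
        return original_url  # already points into the archive
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def rewrite_links(links: list[str], page_url: str, timestamp: str) -> list[str]:
    # Resolve relative hrefs against the page's ORIGINAL URL first,
    # since that is what they were relative to when the page was captured.
    return [to_archive_url(urljoin(page_url, href), timestamp) for href in links]
```

For example, `rewrite_links(["/about.html"], "http://example.com/index.html", "20050101000000")` returns `["https://web.archive.org/web/20050101000000/http://example.com/about.html"]`.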

The problem of course is that tools like wget won't run the script, so the original (useless) URLs remain, and it tries to follow these. Not really a lot you can do about it without rolling up your sleeves and hacking around the problem. But as I say, their TOS forbids you from accessing their site with wget anyway... they want you to always use their site directly.
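A client that does not execute JavaScript only sees the links as they appear in the served HTML. The Python sketch below illustrates that with the standard library's `html.parser`: it collects `href` values without running any script, so it gets back the original, unrewritten URLs. `LinkCollector` and `static_links` are hypothetical names, and this is not wget's own parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect <a href> values as a non-JavaScript client would see them."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str) -> list[str]:
    # Script bodies are treated as opaque text here; nothing is executed,
    # so any client-side link rewriting never happens.
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `static_links('<a href="http://example.com/contact">Contact</a><script>/* rewrites links */</script>')` returns `["http://example.com/contact"]`: the original URL, not an archive.org one.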

(I'd be interested in knowing whether folks actually have legal obligations to respect the TOS of an unrestricted-access site like that... I imagine it might even vary by location)


--
Micah J. Cowan
http://micah.cowan.name/

Current Thread

- [Bug-wget] Difficulty downloading a site from archive.org, phil curb, 2011/08/13
  - Re: [Bug-wget] Difficulty downloading a site from archive.org, Micah Cowan <=
    - Re: [Bug-wget] Difficulty downloading a site from archive.org, Tony Lewis, 2011/08/14
    - Re: [Bug-wget] Difficulty downloading a site from archive.org, Micah Cowan, 2011/08/14
