
Re: [Bug-wget] --page-requisites and robot exclusion issue


From: Paul Wratt
Subject: Re: [Bug-wget] --page-requisites and robot exclusion issue
Date: Tue, 6 Dec 2011 01:46:56 +1300

If wget does not obey robots exclusion, server admins will ban it.

The workaround (a rough shell sketch follows the list):
1) Get the single HTML file first, edit out the robots meta tag, then re-get
with --no-clobber (the tag is usually only in landing pages).
2) Use an empty robots.txt, or one that allows everything (examples are easy
to find online).
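
A sketch of workaround 1, assuming the landing page lives at
http://example.com/index.html (hypothetical URL; adjust to the real site):

  # Fetch just the landing page, no recursion yet
  wget http://example.com/index.html
  # Edit the saved index.html by hand and remove the <meta name="robots" ...> tag.
  # Then re-run with --no-clobber: wget keeps the edited local copy, parses it,
  # and fetches the page requisites it references.
  wget --no-clobber --page-requisites http://example.com/index.html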

Possible solutions:
A) a command-line option
B) a ./configure --disable-robots-check build switch
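
For comparison, the override that already exists is the wgetrc setting
mentioned in the quoted message below; it disables both robots.txt handling
and the robots meta tag for a single run:

  # Existing behaviour, set via -e rather than a dedicated flag
  wget -r --page-requisites -e robots=off http://example.com/

The --disable-robots-check flag in (B) is only a proposal here, not an
existing configure option.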

Paul

On Mon, Dec 5, 2011 at 10:33 AM, Giuseppe Scrivano <address@hidden> wrote:
> address@hidden writes:
>
>> But in cases where you *are* recursively downloading and using
>> --page-requisites, it would be polite to otherwise obey the robots
>> exclusion standard by default. Which you can't do if you have to use -e
>> robots=off to ensure all requisites are downloaded.
>
> it seems a good idea to handle -r together with --page-requisites specially;
> in this case, wget shouldn't obey the robots exclusion directives.
>
> Thanks,
> Giuseppe
>


