bug-wget

Re: [Bug-wget] Bug in <meta name="robots" content="nofollow" />


From: Micah Cowan
Date: Thu, 04 Mar 2010 14:52:55 -0800
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Augustin, Stefan wrote:
> Hello,
> 
> I want to crawl a web site which uses
>  <meta name="robots" content="nofollow" />
> in the HTML HEAD,
> which should be XHTML instead of plain HTML.
> But wget seems to ignore this control information.
> 
> Unfortunately, I can't change the code in the HTML pages of this web server.

If I understand you correctly, I think you meant that "wget seems to
obey this control information"; otherwise, what would be preventing you
from crawling the web site?

Have a look at
http://wget.addictivecode.org/FrequentlyAskedQuestions#robots for the
solution.
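
For illustration, here is a minimal Python sketch of the kind of check a
recursive crawler might make when it honours the robots meta tag. It is
not wget's actual implementation; the names RobotsMetaParser and
may_follow_links are hypothetical, and treating "none" as implying
"nofollow" is an assumption based on the usual robots meta convention.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <meta ... /> here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def may_follow_links(html_text):
    """Return False when the page asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return not ({"nofollow", "none"} & parser.directives)


# The tag from the original report, in its XHTML self-closing form.
page = '<html><head><meta name="robots" content="nofollow" /></head></html>'
print(may_follow_links(page))
```

A crawler that obeys the tag would stop following links from such a
page. In wget, the documented wgetrc command "robots = off" (usable on
the command line as -e robots=off) turns this behaviour off.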

-- 
Micah J. Cowan
http://micah.cowan.name/



