[Bug-wget] Bug in <meta name="robots" content="nofollow" />

From: Augustin, Stefan
Subject: [Bug-wget] Bug in <meta name="robots" content="nofollow" />
Date: Thu, 4 Mar 2010 15:54:19 +0100


I want to crawl a web site that uses
 <meta name="robots" content="nofollow" />
in the HTML HEAD
(the self-closing tag is XHTML syntax rather than plain HTML).
But wget seems to ignore this control information.
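
To make the directive in question concrete, here is a minimal sketch of how a crawler-side check for it might look. It is independent of wget and says nothing about how wget itself parses pages; the names `RobotsMetaParser` and `robots_directives` are hypothetical and chosen only for illustration. It uses Python's standard `html.parser`, which reports the XHTML-style self-closing form the same way as a plain start tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />, because the
        # default handle_startendtag() delegates to handle_starttag().
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html_text):
    """Return the set of robots meta directives declared in html_text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


if __name__ == "__main__":
    page = (
        '<html><head><meta name="robots" content="nofollow" /></head>'
        "<body></body></html>"
    )
    print("nofollow" in robots_directives(page))
```

A crawler that honors the directive would skip the links on any page for which this check reports `nofollow`.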

Unfortunately, I can't change the code in the HTML pages of this web server.
Can somebody help me?
- Is this a bug (or an unimplemented feature) in wget?
- If so, is there a fix available?

Best regards
Stefan Augustin

Siemens AG
Corporate Technology
Otto-Hahn-Ring 6
81739 München, Deutschland
Tel.: +49 (89) 636-47061 
Fax: +49 (89) 636-49438 
Mobil: +49 (172) 8455616
