
Re: [Bug-wget] can't get wget to not download


From: Paul Wratt
Subject: Re: [Bug-wget] can't get wget to not download
Date: Sun, 18 Mar 2012 23:54:38 +1300

All but the last sentence is still true (as of 1.13).

In fact, wget can now only "convert links" OR "use characters appropriate
to the platform in filenames", not both at once (as was possible with 1.12).

Yes, wget can record filenames that are unbrowsable on some platforms; no,
you can no longer request broadly platform-safe filenames (e.g. for
Windows) AND convert links.

The only case that still falls through cleanly is when the web page
explicitly uses relative links, and/or hrefs without a query string
('?' is an illegal character on a Windows filesystem).
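
For concreteness, these are the two option combinations in question; a
minimal sketch, with example.com standing in for a real site:

  # convert links in the downloaded pages for offline viewing:
  wget --recursive --convert-links http://example.com/

  # rewrite filenames with Windows-safe characters; per the above, combining
  # this with --convert-links no longer works as it did in 1.12:
  wget --recursive --restrict-file-names=windows --convert-links http://example.com/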

Paul

On Sun, Mar 18, 2012 at 2:35 PM, Henrik Holst
<address@hidden> wrote:
> Wget only obeys robots.txt when doing a recursive download of a site
> (illustrated after the quoted exchange below):
>
> "
>       Wget can follow links in HTML, XHTML, and CSS pages, to create local
>       versions of remote web sites, fully recreating the directory
> structure
>       of the original site.  This is sometimes referred to as "recursive
>       downloading."  While doing that, Wget respects the Robot Exclusion
>       Standard (/robots.txt).  Wget can be instructed to convert the links
> in
>       downloaded files to point at the local files, for offline viewing.
> "
>
> /HH
>
> 2012/3/16 phil curb <address@hidden>
>
>> I've made a robots.txt file, but wget doesn't seem to be respecting it;
>> it always downloads.
>> http://pastebin.com/raw.php?i=kt1mV2af
>>
>>
>> C:\r>wget 127.0.0.1:56
>> --2012-03-16 19:45:32--  http://127.0.0.1:56/
>> Connecting to 127.0.0.1:56... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: 3 [text/html]
>> Saving to: `index.html'
>>
>> 100%[======================================>] 3           --.-K/s   in 0s
>>
>> 2012-03-16 19:45:32 (20.0 KB/s) - `index.html' saved [3/3]
>>
>> C:\r>wget 127.0.0.1:56/robots.txt
>> --2012-03-16 19:45:43--  http://127.0.0.1:56/robots.txt
>> Connecting to 127.0.0.1:56... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: 26 [text/plain]
>> Saving to: `robots.txt'
>>
>> 100%[======================================>] 26          --.-K/s   in 0s
>>
>> 2012-03-16 19:45:43 (175 KB/s) - `robots.txt' saved [26/26]
>>
>> C:\r>type robots.txt
>> User-agent: *
>> Disallow: /
>>
>> C:\r>
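
The difference is easy to see against that same local server (a sketch,
reusing phil's 127.0.0.1:56 from the transcript above; output elided):

  C:\r>wget 127.0.0.1:56
  (single-page fetch: robots.txt is never requested)

  C:\r>wget -r 127.0.0.1:56
  (recursive fetch: /robots.txt is retrieved first, and with "Disallow: /"
  no links from the start page are followed)

  C:\r>wget -r -e robots=off 127.0.0.1:56
  (recursive fetch that ignores robots.txt entirely)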


