Re: [Bug-wget] Potential Wget bug
From: Ángel González
Subject: Re: [Bug-wget] Potential Wget bug
Date: Sun, 20 Apr 2014 22:08:00 +0200
User-agent: Thunderbird
On 20/04/14 17:19, A Kelly wrote:
> Links to those pages are available from a link starting on the site's main
> index page, e.g.:
> The page: www.secrethotels.eu/fabulous-rates-in-trendy-chelsea-4-star-london
> has a link to it on: www.secrethotels.eu/london/page/9/
> which was manually arrived at by starting at www.secrethotels.eu and
> clicking the 'London' anchor text which links to
> www.secrethotels.eu/london/ then traversing the 'Older posts' anchor at
> the bottom of that page.
> Also httrack is able to fetch those pages.
> If this does turn out to be a Wget bug or even if it doesn't and you find
> out what's wrong, I would be incredibly grateful if I could be updated on
> this matter.
> Many thanks,
I confirm the behavior. Yes, I do think it is a wget bug.
I think wget gets confused because the bar link goes to
http://www.secrethotels.eu/london (no trailing slash). After the
download this leaves a london *file* on disk, and no contents from its
subfolders.
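For illustration, here is a minimal sketch of how the two URLs end up competing for the same name on disk. It assumes the usual mirroring convention that a URL ending in a slash is saved as index.html inside a directory, while any other URL is saved as a plain file. local_path is a hypothetical helper written for this example, not wget's actual code.

```python
from urllib.parse import urlsplit


def local_path(url, default_page="index.html"):
    """Map a URL to a relative file name (assumed convention, not wget's code)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # A trailing slash means "directory listing": store it as <dir>/index.html.
    if path.endswith("/"):
        path += default_page
    return parts.netloc + path


if __name__ == "__main__":
    for url in ("http://www.secrethotels.eu/london",
                "http://www.secrethotels.eu/london/page/2/"):
        print(url, "->", local_path(url))
```

Under that assumption the first URL wants london to be a regular file, while the second needs london to be a directory, and both cannot exist at once.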
However, when I tried to build a reduced test case for this, I got the
opposite behavior, which is also wrong: london is converted into a
subfolder and the contents of london are lost (Removing wget/london
because of directory danger!).
Extract of download log:
www.secrethotels.eu/index.html: merge('http://www.secrethotels.eu/', 'http://www.secrethotels.eu/london') -> http://www.secrethotels.eu/london
appending 'http://www.secrethotels.eu/london' to urlpos.
Enqueuing http://www.secrethotels.eu/london at depth 1
Queue count 30, maxcount 30.
www.secrethotels.eu/london: merge('http://www.secrethotels.eu/london/', 'http://www.secrethotels.eu/london/page/2/') -> http://www.secrethotels.eu/london/page/2/
appending 'http://www.secrethotels.eu/london/page/2/' to urlpos.
Queue count 1043, maxcount 1068.
-- http://www.secrethotels.eu/london/page/2/
Found www.secrethotels.eu in host_name_addresses_map (0x7ddbe0)
Connecting to www.secrethotels.eu (www.secrethotels.eu)|178.79.171.89|:80... connected.
Created socket 4.
Releasing 0x00000000007ddbe0 (new refcount 1).
Registered socket 4 for persistent reuse.
URI content encoding = 'UTF-8'
Length: unspecified [text/html]
www.secrethotels.eu/london/page/2: Not a directory
www.secrethotels.eu/london/page/2/index.html: Not a directory
Disabling further reuse of socket 4.
Closed fd 4
Cannot write to 'www.secrethotels.eu/london/page/2/index.html' (Not a directory).
Dequeuing http://www.secrethotels.eu/london/ at depth 2
Queue count 1042, maxcount 1068.
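
The "Not a directory" failure in the log can be reproduced outside wget. The following is only a sketch: try_save is a hypothetical helper, not part of wget, and the exact error type is an assumption about POSIX-like filesystems (other platforms may report a different OSError).

```python
import os
import tempfile


def try_save(root, rel_path, data=b"<html></html>"):
    """Save data at root/rel_path, creating parent directories as needed.

    Returns None on success, or the OSError that prevented the write.
    """
    target = os.path.join(root, *rel_path.split("/"))
    try:
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as fh:
            fh.write(data)
    except OSError as exc:
        return exc
    return None


def demo():
    """Save 'london' as a file first, then try to save a page below it."""
    with tempfile.TemporaryDirectory() as root:
        first = try_save(root, "london")
        second = try_save(root, "london/page/2/index.html")
        return first, second


if __name__ == "__main__":
    first, second = demo()
    print("london:", first)
    print("london/page/2/index.html:", second)
```

Once london exists as a regular file, the second save cannot create london/page/2/, which matches the "Cannot write to" message in the log.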