Re: [Bug-wget] Potential Wget bug


From: Ángel González
Subject: Re: [Bug-wget] Potential Wget bug
Date: Sun, 20 Apr 2014 22:08:00 +0200
User-agent: Thunderbird

On 20/04/14 17:19, A Kelly wrote:
Those pages can be reached by following links that start on the site's main
index page, e.g.:

The page: www.secrethotels.eu/fabulous-rates-in-trendy-chelsea-4-star-london

has a link to it on: www.secrethotels.eu/london/page/9/

which I reached manually by starting at www.secrethotels.eu and
clicking the 'London' anchor text, which links to
www.secrethotels.eu/london/, then following the 'Older posts' anchor at
the bottom of that page.

Also httrack is able to fetch those pages.

If this does turn out to be a Wget bug, or even if it doesn't and you find
out what's wrong, I would be incredibly grateful if I could be updated on
this matter.

Many thanks,

I confirm the behavior. Yes, I do think it is a wget bug.

I think wget gets confused because the bar link goes to http://www.secrethotels.eu/london (no trailing slash). After the download, that URL ends up as a london *file*, so none of the contents from its subfolders can be saved. However, when I tried to create a reduced test case for this, I got the opposite (and also wrong) behavior: london is converted into a subfolder and the contents of london are lost (Removing wget/london because of directory danger!).
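
To illustrate the conflict at the filesystem level, here is a minimal Python sketch. It is not wget's code. It only assumes that the URL without a trailing slash is written as a plain file named london and that a later page needs london to be a directory. The function name reproduce_conflict and the file contents are made up for the example.

```python
import os
import tempfile


def reproduce_conflict():
    """Save 'london' as a file, then try to save a page beneath it.

    Returns the OSError raised by the second step, or None if it succeeded.
    """
    with tempfile.TemporaryDirectory() as root:
        # Step 1: the URL without a trailing slash is stored as a plain file.
        london = os.path.join(root, "www.secrethotels.eu", "london")
        os.makedirs(os.path.dirname(london))
        with open(london, "w") as fh:
            fh.write("<html>listing page</html>")

        # Step 2: a page below london/ needs 'london' to be a directory.
        target = os.path.join(london, "page", "2", "index.html")
        try:
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "w") as fh:
                fh.write("<html>page 2</html>")
        except OSError as exc:
            return exc
        return None


if __name__ == "__main__":
    print(repr(reproduce_conflict()))
```

On a POSIX system the second step should fail because a path component is a regular file, which is the same kind of error wget reports in the log.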


Extract of download log:
www.secrethotels.eu/index.html: merge('http://www.secrethotels.eu/', 'http://www.secrethotels.eu/london') -> http://www.secrethotels.eu/london
appending 'http://www.secrethotels.eu/london' to urlpos.
Enqueuing http://www.secrethotels.eu/london at depth 1
Queue count 30, maxcount 30.
www.secrethotels.eu/london: merge('http://www.secrethotels.eu/london/', 'http://www.secrethotels.eu/london/page/2/') -> http://www.secrethotels.eu/london/page/2/
appending 'http://www.secrethotels.eu/london/page/2/' to urlpos.
Queue count 1043, maxcount 1068.
-- http://www.secrethotels.eu/london/page/2/
Found www.secrethotels.eu in host_name_addresses_map (0x7ddbe0)
Connecting to www.secrethotels.eu (www.secrethotels.eu)|178.79.171.89|:80... connected.
Created socket 4.
Releasing 0x00000000007ddbe0 (new refcount 1).

Registered socket 4 for persistent reuse.
URI content encoding = 'UTF-8'
Length: unspecified [text/html]
www.secrethotels.eu/london/page/2: Not a directory
www.secrethotels.eu/london/page/2/index.html: Not a directory
Disabling further reuse of socket 4.
Closed fd 4

Cannot write to 'www.secrethotels.eu/london/page/2/index.html' (Not a directory).
Dequeuing http://www.secrethotels.eu/london/ at depth 2
Queue count 1042, maxcount 1068.
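
The clash seen in the log can also be detected from the URL list alone, before anything is written. The following Python sketch shows one way to do that. The mapping in local_path (host name plus path, with index.html appended to directory-style URLs) is a simplified assumption for illustration and not wget's actual naming logic; both function names are hypothetical.

```python
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a save path (simplified assumption, not wget's code)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory-style URL: store an index file
    return parts.netloc + path


def find_conflicts(urls):
    """Return (file, blocked_path) pairs where a saved file occupies a
    name that another saved path needs as a directory."""
    paths = sorted({local_path(u) for u in urls})
    files = set(paths)
    conflicts = []
    for path in paths:
        # Walk up through every ancestor directory of this path.
        parent = path.rpartition("/")[0]
        while parent:
            if parent in files:
                conflicts.append((parent, path))
            parent = parent.rpartition("/")[0]
    return conflicts


if __name__ == "__main__":
    urls = [
        "http://www.secrethotels.eu/london",
        "http://www.secrethotels.eu/london/page/2/",
    ]
    for blocker, blocked in find_conflicts(urls):
        print(f"{blocker} (file) blocks {blocked}")
```

With the two URLs from the log, the sketch reports that the london file blocks london/page/2/index.html.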






