[Bug-wget] --convert-links and filenames with colons


From: Joachim Breitner
Subject: [Bug-wget] --convert-links and filenames with colons
Date: Mon, 26 Oct 2015 13:42:41 +0100

Dear wget developers,

it seems that "wget -r -k" is a bit careless when it creates relative
URLs that start with “something:”, since that prefix is then misinterpreted
as the scheme (protocol) of a URL.

For example, downloading these two files:

/tmp/wget/input $ head *
==> file:with:colon.html <==
<html>
<body>
<a href="./file:with:colon.html">Foo</a>
<a href="./file_without_colon.html">Bar</a>
</body>
</html>

==> file_without_colon.html <==
<html>
<body>
<a href="./file:with:colon.html">Foo</a>
<a href="./file_without_colon.html">Bar</a>
</body>
</html>

with "wget -k -r" produces this output:

==> localhost:8000/file:with:colon.html <==
<html>
<body>
<a href="file:with:colon.html">Foo</a>
<a href="file_without_colon.html">Bar</a>
</body>
</html>

==> localhost:8000/file_without_colon.html <==
<html>
<body>
<a href="file:with:colon.html">Foo</a>
<a href="file_without_colon.html">Bar</a>
</body>
</html>

and the browser will not be able to follow the link to Foo.
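
To illustrate the misparse, here is a minimal sketch using Python's standard urllib.parse as a stand-in for a browser's URL parser (an assumption; browsers use their own parsers, but the scheme detection behaves the same way for this input):

```python
from urllib.parse import urlsplit

# The link as rewritten by "wget -k -r": the text before the first colon
# is taken as the URL scheme.
converted = urlsplit("file:with:colon.html")

# The same link with a leading "./": the first segment contains "/",
# so it cannot be a scheme and the whole string stays a relative path.
prefixed = urlsplit("./file:with:colon.html")

print(repr(converted.scheme), repr(converted.path))
print(repr(prefixed.scheme), repr(prefixed.path))
```

The first link comes out with scheme "file" and only "with:colon.html" left as the path, while the prefixed form keeps an empty scheme.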

This is a practical problem when trying to mirror a MediaWiki
installation.
I suggest avoiding the issue by prefixing relative links with "./",
either always (why not?), or whenever the relative file name starts with
something that looks like “foo:”.
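
A rough sketch of the second variant, in Python rather than wget's C and not taken from the wget sources: the function name protect_relative_link is hypothetical, and it assumes its input is already known to be a relative link produced by the link converter (an absolute URL such as "http://..." would be prefixed too, which is not wanted). RFC 3986, section 4.2, asks for exactly this dot-segment when the first path segment of a relative reference contains a colon.

```python
import re


def protect_relative_link(link: str) -> str:
    """Prefix "./" when the first path segment of a relative link has a colon.

    Hypothetical helper for illustration; assumes `link` is a relative
    reference, not an absolute URL.
    """
    # Links that start with "/" or with a dot-segment cannot be mistaken
    # for "scheme:rest", so leave them alone.
    if link.startswith(("/", "./", "../")):
        return link
    # Only the first path segment matters: cut at the first "/", "?" or "#".
    first_segment = re.split(r"[/?#]", link, maxsplit=1)[0]
    if ":" in first_segment:
        return "./" + link
    return link


for name in ("file:with:colon.html", "file_without_colon.html", "sub/file:x.html"):
    print(name, "->", protect_relative_link(name))
```

Only the first of the three sample names gets the "./" prefix; a colon in a later path segment is harmless.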


Thanks,
Joachim
-- 
Joachim “nomeata” Breitner
  address@hidden • http://www.joachim-breitner.de/
  Jabber: address@hidden  • GPG-Key: 0xF0FBF51F
  Debian Developer: address@hidden
