Re: [Bug-wget] wget properly downloads pages and files, but does not link some files while correctly linking others
Mon, 17 Aug 2009 13:05:08 -0700
David Mcconnell wrote:
> In testing wget against a web site that I created using googlepages, some
> pages do not have some of the appropriate files linked in the download even
> though the files themselves are downloaded.
> My web site is:
> One of the pages in which all but one of the pictures fail to link is:
> Only the last picture on the page actually links while all of the pictures
> download. This happens on other pages as well. However, solving the problem
> for this page will no doubt solve the problem on the other pages. It may
> very well be that there is some sort of nuance in the web page code that is
> the source of the problem, but I've not been able to locate it.
> I should add that all pictures correctly display on the web site.
> Any help would be most appreciated.
In the future, please be specific about the version of Wget you're
using, and the options you specified to it. However, in this specific
case I was able to reproduce the problem.
The trouble appears to be that Wget downloads the relative link,
"cartoon21.jpg/cartoon21-medium;init:.jpg", but doesn't percent-encode
the ";" or ":" in the converted link (it seems to be the semicolon
specifically that causes the issue for me). While the semicolon is
allowed to appear unescaped in that location by the generic URI syntax
defined by RFC 3986, RFC 1738 seems to have left it out of the specific
syntax for "file://" URIs. So Wget probably ought to escape it (when -k
has been specified).
It doesn't, though, and I don't know of an easy workaround, apart from
using sed or similar to find these semicolons and replace them with %3b.
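
For anyone who would rather not hand-craft the sed expression, here is a
rough sketch of the same workaround in Python. It is only an
illustration, not something Wget provides: the function names are made
up, and it assumes the converted pages use quoted href/src attributes.
It leaves absolute URLs alone and skips the semicolons that end
character references such as "&amp;".

```python
import re
import sys
from pathlib import Path

# href="..." or src="..." with matching single or double quotes.
ATTR_RE = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''',
                     re.IGNORECASE | re.DOTALL)
# A leading URI scheme such as "http:" means the link is not a local relative one.
SCHEME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*:')
# Either a character reference (kept as-is) or a bare semicolon (escaped).
SEMI_RE = re.compile(r'&#?\w+;|;')


def escape_semicolons(url):
    """Percent-encode bare semicolons in a relative link (hypothetical helper)."""
    if SCHEME_RE.match(url) or url.startswith('//'):
        return url
    return SEMI_RE.sub(
        lambda m: '%3B' if m.group(0) == ';' else m.group(0), url)


def fix_links(html):
    """Apply escape_semicolons to every quoted href/src value in the text."""
    return ATTR_RE.sub(
        lambda m: m.group(1) + m.group(2)
        + escape_semicolons(m.group(3)) + m.group(2),
        html)


if __name__ == '__main__':
    # Rewrite each file named on the command line in place.
    # latin-1 round-trips every byte, so other content is left untouched.
    for name in sys.argv[1:]:
        path = Path(name)
        text = path.read_bytes().decode('latin-1')
        path.write_bytes(fix_links(text).encode('latin-1'))
```

Run it on copies of the downloaded .html files first, since it rewrites
them in place.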
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq