bug-wget


From: Micah Cowan
Subject: Re: [Bug-wget] wget properly downloads pages and files, but does not link some files while correctly linking others
Date: Mon, 17 Aug 2009 13:05:08 -0700
User-agent: Thunderbird 2.0.0.22 (X11/20090608)

David Mcconnell wrote:
> In testing wget against a web site that I created using googlepages, some 
> pages do not have some of the appropriate files linked in the download even 
> though the files themselves are downloaded.
> 
> My web site is:
> 
> http://handleyhigh.googlepages.com
> 
> One of the pages on which all but one of the pictures fail to link is:
> 
> http://handleyhigh.googlepages.com/biographyideas
> 
> Only the last picture on the page actually links while all of the pictures 
> download.  This happens on other pages as well.  However, solving the problem 
> for this page will no doubt solve the problem on the other pages.  It may 
> very well be that there is some sort of nuance in the web page code that is 
> the source of the problem, but I've not been able to locate it.
> 
> I should add that all pictures correctly display on the web site.
> 
> Any help would be most appreciated.

In the future, please be specific about the version of Wget you're
using, and the options you specified to it. However, in this specific
case I was able to reproduce the problem.

The trouble appears to be that Wget downloads the relative link
"cartoon21.jpg/cartoon21-medium;init:.jpg" but, when rewriting it,
doesn't percent-encode the ; or : (it seems to be the semicolon
specifically that causes the
issue for me). While the semicolon is allowed to appear unescaped in
that location by the generic URI syntax defined by RFC 3986, RFC 1738
seems to have left it out of the specific syntax for "file://" URIs. So
Wget probably ought to escape it (when -k has been specified).
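For illustration only, here is a minimal sketch in Python (not Wget's C code) of the kind of escaping described above. It assumes the converted link is built from the downloaded file's local relative path. The helper name `file_path_to_link` is hypothetical. It relies on the standard library's `urllib.parse.quote` with only `/` marked safe, so `;` becomes `%3B`. It also escapes `:` to `%3A`, which is more than RFC 1738 strictly requires but is still a valid encoding.

```python
from urllib.parse import quote


def file_path_to_link(local_path: str) -> str:
    """Turn a local relative path into a percent-encoded relative link.

    Hypothetical helper, not part of Wget: only '/' is kept as-is. Every
    other character outside RFC 3986's unreserved set (letters, digits,
    '-', '.', '_', '~'), including ';' and ':', is percent-encoded from
    its UTF-8 bytes, using uppercase hex digits.
    """
    return quote(local_path, safe="/")


if __name__ == "__main__":
    # The link from biographyideas mentioned above.
    print(file_path_to_link("cartoon21.jpg/cartoon21-medium;init:.jpg"))
```

Because a browser decodes `%3B` back to `;` when it resolves a file link, the escaped link should still point at the file as Wget saved it on disk. When the result is written back into an HTML attribute, it would also need ordinary HTML escaping. This example link does not need any.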

It doesn't, though, and I don't know of an easy workaround, apart from
using sed or similar to find these semicolons in the converted links and
replace them with %3B.
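Here is a rough sketch of that workaround. It is written in Python rather than sed so it can be a little more careful than a blanket substitution:

* It only touches `src`/`href` attribute values.
* It skips values that start with a URL scheme, meaning absolute links that Wget left unconverted.
* It leaves character references such as `&amp;` intact.

The function names and the regex-based approach are assumptions made for illustration. A regex is not an HTML parser, so treat this as a one-off fix-up and keep a copy of the mirror before running it.

```python
import re
import sys

# Matches src="..." or href='...'; group 2 captures the quote character.
ATTR_RE = re.compile(r"""(\b(?:src|href)\s*=\s*)(["'])(.*?)\2""",
                     re.IGNORECASE | re.DOTALL)
# A value beginning with "scheme:" is treated as absolute and left alone.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")
# Either a character reference like &amp; or &#59; (kept) or a bare ';'.
SEMI_RE = re.compile(r"&#?\w+;|;")


def escape_link_value(value: str) -> str:
    """Replace bare semicolons in a relative link with %3B (hypothetical helper)."""
    if SCHEME_RE.match(value):
        return value
    return SEMI_RE.sub(lambda m: m.group(0) if len(m.group(0)) > 1 else "%3B",
                       value)


def escape_semicolons_in_links(html: str) -> str:
    """Apply escape_link_value to every src/href attribute value in html."""
    def fix(m: re.Match) -> str:
        quote_char = m.group(2)
        return m.group(1) + quote_char + escape_link_value(m.group(3)) + quote_char
    return ATTR_RE.sub(fix, html)


if __name__ == "__main__":
    # Rewrite each HTML file named on the command line in place.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="surrogateescape", newline="") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8", errors="surrogateescape", newline="") as f:
            f.write(escape_semicolons_in_links(text))
```

To use it, pass the downloaded HTML files as arguments. It leaves colons alone, since it was the semicolon that caused the problem here.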

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/