bug-wget

Re: [Bug-wget] Problems recursively downloading a directory tree


From: Micah Cowan
Subject: Re: [Bug-wget] Problems recursively downloading a directory tree
Date: Tue, 10 Nov 2009 20:58:06 -0800
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

KARR, DAVID (ATTCINW) wrote:
>> -----Original Message-----
>> From: Tony Lewis [mailto:address@hidden]
>> Sent: Tuesday, November 10, 2009 1:58 PM
>> To: address@hidden
>> Cc: KARR, DAVID (ATTCINW)
>> Subject: RE: [Bug-wget] Problems recursively downloading a directory
>> tree
>>
>> KARR, DAVID (ATTCINW) wrote:
>>
>>> I must be missing something simple.
>> wget translates file names based on the valid file name characters for
>> your local file system.
> 
> I don't think it's as simple as that.  There are no "special" characters
> used in file names in this tree.  All the base file names represent the
> names of Java classes.  It seems like it's making an unexpected decision
> based on the links it's finding in the HTML files.

Yes, there is a special character: the "?" character, which gets
replaced by a "@".
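To make the substitution concrete, here is a minimal sketch in Python. It is not wget's actual code: the helper name sketch_local_name and the example.com URLs are made up for illustration, and it models only the "?" to "@" replacement described above, ignoring any other characters wget may escape.

```python
from urllib.parse import urlsplit
import posixpath


def sketch_local_name(url: str) -> str:
    """Hypothetical helper: last path component plus query, with "?" written as "@".

    Only the single substitution discussed in this thread is modelled;
    other escaping that wget may apply is deliberately left out.
    """
    parts = urlsplit(url)
    name = posixpath.basename(parts.path)
    if parts.query:
        # The query string is part of the link, so it ends up in the name.
        name += "?" + parts.query
    return name.replace("?", "@")


# A link with a query string keeps the query in its local name.
print(sketch_local_name("http://example.com/docs/index.html?page=Foo.html"))
# A link without a query string is unaffected.
print(sketch_local_name("http://example.com/docs/Foo.html"))
```

With these made-up URLs the first call prints index.html@page=Foo.html and the second prints Foo.html.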

> For instance, one of the produced file names
> "address@hidden" appears to be
> produced by this request:
> 
> -----------
> --2009-11-10 15:09:03--  http://ecom.cingular.net/wiki_downloads/documentation/ATGPlatform20071docs/apidoc/index.html?atg/adapter/gsa/BcpDBCopier.html
> Reusing existing connection to ecom.cingular.net:80.
> HTTP request sent, awaiting response... 200 OK
> Length: 1319 (1.3K) [text/html]
> Saving to: `ecom.cingular.net/wiki_downloads/documentation/ATGPlatform20071docs/apidoc/address@hidden'
> -----------
> 
> I don't know why the request has a query parameter.

Because the link it comes from had a query parameter. And the resulting
file name is exactly what I'd expect.

> What's even more curious is that earlier in the wget output, I note the
> request for a similar file:
> 
> -------------
> --2009-11-10 15:05:16--  http://ecom.cingular.net/wiki_downloads/documentation/ATGPlatform20071docs/apidoc/atg/adapter/gsa/BcpDBCopier.html
> Reusing existing connection to ecom.cingular.net:80.
> HTTP request sent, awaiting response... 200 OK
> Length: 28327 (28K) [text/html]
> Saving to: `ecom.cingular.net/wiki_downloads/documentation/ATGPlatform20071docs/apidoc/atg/adapter/gsa/BcpDBCopier.html'
> -------------
> 
> Note that both of these requests produce a file name ending with
> "BcpDBCopier.html", but the latter is in the directory I expect, and the
> other is with the weird name in the root directory.  These two files are
> different.  The funny-named one is pretty short, and uses framesets.
> The normal looking one is longer, and looks like the expected javadoc
> for the class.

The two files are different because the links they came from are
different. I'm not sure what you expect, but this matches what I expect,
except for your claim that the "weird one" is in the root directory,
which strikes me as quite odd (and very improbable, unless Cygwin's wget
packagers made custom modifications).

The message from wget says it was written to
ecom.cingular.net/wiki_downloads/documentation/ATGPlatform20071docs/apidoc/.
Perhaps there's a file there, and the file you're finding in the root
directory is actually from a different link?
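One way to see why the two requests are distinct resources that land in different directories is to split each URL into its path and query. The sketch below uses only Python's standard urllib.parse on the two URLs from the log; the describe helper and the BASE constant are names introduced here for illustration, and nothing in it reflects wget internals.

```python
from urllib.parse import urlsplit
import posixpath

# Common prefix of both URLs quoted in the wget output above.
BASE = ("http://ecom.cingular.net/wiki_downloads/documentation/"
        "ATGPlatform20071docs/apidoc/")


def describe(url: str) -> tuple[str, str, str]:
    """Hypothetical helper: return (directory, file name, query) of a URL."""
    parts = urlsplit(url)
    return (posixpath.dirname(parts.path),
            posixpath.basename(parts.path),
            parts.query)


framed = describe(BASE + "index.html?atg/adapter/gsa/BcpDBCopier.html")
plain = describe(BASE + "atg/adapter/gsa/BcpDBCopier.html")

for label, (directory, name, query) in (("framed", framed), ("plain", plain)):
    print(f"{label}: dir={directory} file={name} query={query!r}")
```

The first URL is a request for index.html in apidoc/ with the class path carried in the query string, while the second is a request for BcpDBCopier.html in apidoc/atg/adapter/gsa/ with no query at all, which is consistent with the two "Saving to:" directories in the log.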

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/



