bug-wget

Re: [Bug-wget] Problem with ÅÄÖ and wget


From: Tim Ruehsen
Subject: Re: [Bug-wget] Problem with ÅÄÖ and wget
Date: Mon, 16 Sep 2013 12:50:07 +0200
User-agent: KMail/4.10.5 (Linux/3.10-3-amd64; KDE/4.10.5; x86_64; ; )

> > I switched my environment to UTF-8 now and it seems to work:
> On my main machine too; I didn't have access to that one yesterday evening.

Just for the record:
Your download (wget -r http://bmit.se/wget) succeeds, but it shouldn't!
IMHO, Wget has a bug here, and your test case succeeds only because of this
bug.

Why?
Your wget/index.html holds the UTF-8 encoded URL 'teståäöÅÄÖ', but neither the
server header (Content-Type: text/html) nor the document itself (META
http-equiv ...) defines the charset. That means the charset encoding of
index.html should be assumed to be ISO-8859-1. See [1].
Wget should have taken the URL 'teståäöÅÄÖ' as ISO-8859-1 and converted it into
UTF-8, which would produce a different URL, so the download would fail.
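
To illustrate the point, here is a small Python sketch of what happens when
the UTF-8 bytes of that link are read as ISO-8859-1 and then converted to
UTF-8. This is not Wget's code; the helper name href_as_latin1_then_utf8 is
made up for this example, and percent-encoding is used only to make the
resulting bytes visible.

```python
from urllib.parse import quote


def href_as_latin1_then_utf8(raw: bytes) -> str:
    """Hypothetical helper: treat the raw href bytes as ISO-8859-1,
    convert the text to UTF-8, and percent-encode the result."""
    text = raw.decode("iso-8859-1")      # every byte becomes one character
    return quote(text.encode("utf-8"))   # re-encode and percent-escape


# The bytes as they are stored in index.html (UTF-8 encoded).
raw_href = "teståäöÅÄÖ".encode("utf-8")

# URL requested when the bytes are passed through unchanged.
correct = quote(raw_href)

# URL requested when the page is assumed to be ISO-8859-1.
misread = href_as_latin1_then_utf8(raw_href)

print(correct)  # test%C3%A5%C3%A4%C3%B6%C3%85%C3%84%C3%96
print(misread)  # test%C3%83%C2%A5%C3%83%C2%A4%C3%83%C2%B6%C3%83%C2%85%C3%83%C2%84%C3%83%C2%96
```

The second URL names a different resource, so a server that only has the
first one would be expected to answer with an error.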

Conclusion
1. Be prepared that Wget will change its behaviour sooner or later (make
sure you specify / deliver the charset encoding of your documents).
2. Wget will/does have problems with ISO-8859-1 text/html pages if the charset
is not specified AND special chars are used.
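
As a rough sketch of the lookup order described above (server header first,
then a META declaration, then the ISO-8859-1 fallback), something like the
following could be used to check what charset a page actually declares. The
function name declared_charset and the simplified META regex are assumptions
for illustration only; a real HTML parser handles many more cases.

```python
import re
from email.message import Message


def declared_charset(content_type: str, body: bytes) -> str:
    """Hypothetical helper: return the charset a text/html page declares,
    falling back to ISO-8859-1 when nothing is declared."""
    # 1. charset parameter of the Content-Type response header
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset:
        return charset

    # 2. charset inside a META element (simplified pattern, not a parser)
    match = re.search(
        rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', body, re.I
    )
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. nothing declared: the default assumed in this mail
    return "iso-8859-1"


page = b"<html><body><a href='test'>link</a></body></html>"
print(declared_charset("text/html", page))                 # iso-8859-1
print(declared_charset("text/html; charset=UTF-8", page))  # utf-8
```

If either the header or the META element names the charset, the fallback is
never reached, which is the situation recommended in point 1.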

Can anyone prove me wrong?

[1] http://nikitathespider.com/articles/EncodingDivination.html



