Re: [Bug-wget] bad filenames (again)
From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filenames (again)
Date: Fri, 21 Aug 2015 14:22:22 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Aug 21, 2015 at 01:31:45PM +0200, Tim Ruehsen wrote:
> > There is a remote site.
> > Nothing is known about this remote site.
>
> Wrong. Regarding HTTP(S), we exactly know the encoding
> of each downloaded HTML and CSS document
> (that's what I call 'remote encoding').
You are an optimist. In my experience Firefox rarely gets it right.
Let me find some random site. Say
http://web2go.board19.com/gopro/go_view.php?id=12345
If I go there with Firefox, I get a go board with a lot of mojibake
around it. Firefox took the encoding to be Unicode. Trying the
choices in the "Text encoding" menu, the correct one turns out to be
"Chinese, Traditional".
> Leaving these misconfigured servers away as a special case
But most of the East Asian servers I meet are misconfigured in this way.
They announce text/html with charset utf-8 but deliver the content
in some other charset.
So trusting this announced charset should be done cautiously.
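
A minimal sketch of what "cautiously" could mean in practice: before
believing the announced charset, check that the body actually decodes
under it. The function name and the sample bytes are illustrative
assumptions, not anything taken from wget.

```python
def decodes_as(body: bytes, announced: str) -> bool:
    """Return True if `body` is valid under the announced charset.

    Hypothetical helper for illustration only. An unknown charset
    name (LookupError) is treated the same as a failed decode.
    """
    try:
        body.decode(announced, errors="strict")
    except (UnicodeDecodeError, LookupError):
        return False
    return True


# Sample page body: Traditional Chinese text encoded as Big5,
# standing in for a server that announces "charset=utf-8" anyway.
body = "中文".encode("big5")

announced_ok = decodes_as(body, "utf-8")   # the announced charset
actual_ok = decodes_as(body, "big5")       # the charset really used
```

A clean decode does not prove the announcement is right (many byte
strings are valid in several charsets), but a failed strict decode does
show it is wrong.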
And you say "misconfigured servers", but often one gets a
Unix or Windows file hierarchy, and several character sets occur.
The server doesn't know. The sysadmin doesn't know. A university
machine will have many users with files in several languages
and character sets.
Moreover, the character set of a filename is in general unrelated
to the character set of the contents of the file. That is most clear
when the file is not a text file. What character set is the filename
http://www.win.tue.nl/~aeb/linux/lk/kn%e4ckebr%f6d.jpg
in? You recognize ISO 8859-1 or similar. My local machine is on UTF-8.
The HTTP headers say "Content-Type: image/jpeg".
How can wget guess?
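
To make the difficulty concrete, here is a sketch of the only kind of
guess available from the bytes alone: percent-decode the name, try
strict UTF-8, and otherwise fall back to a caller-supplied list of
legacy charsets. The function name and the fallback list are
assumptions for illustration; this is not wget's behaviour.

```python
from urllib.parse import unquote_to_bytes


def guess_filename(escaped, fallbacks=("iso-8859-1",)):
    """Return (decoded_name, charset_guess) for a percent-escaped name.

    Hypothetical helper. Strict UTF-8 is tried first because invalid
    UTF-8 is detectable; the fallbacks are pure guesses, since
    ISO 8859-1 accepts every possible byte sequence.
    """
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for enc in fallbacks:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing fit: keep the name readable and mark the guess as unknown.
    return raw.decode("utf-8", errors="replace"), "unknown"


name, charset = guess_filename("kn%e4ckebr%f6d.jpg")
```

The fallback step is where the guessing happens: the same bytes would
decode without error as ISO 8859-2, ISO 8859-15 or Windows-1252 as well,
and nothing in the HTTP response says which one was meant.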
Andries