Re: Filename Encoding
From: John Darrington
Subject: Re: Filename Encoding
Date: Sun, 22 Dec 2013 17:51:47 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Dec 10, 2013 at 12:38:04PM -0800, Ben Pfaff wrote:
> in syntax and the output engine, we tend to convert everything we
> receive externally into UTF-8 for internal processing, and then convert
> back to other encodings as necessary.
I'm not sure this was the best decision. How do we know which encoding we
should convert back to?
Consider this scenario:
On a GNU/Linux system (where the filesystem is encoding agnostic) there exist
two files, which I shall call fileA and fileB.
Let us assume that the bytes which comprise the name of fileA happen to be
valid UTF-8. Let us also assume that the bytes which comprise the name of fileB
happen to be valid ISO-8859-1. Further, let us assume that when the name of
fileB is converted from ISO-8859-1 to UTF-8, the result happens to be identical
to the name of fileA.
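To make the scenario concrete, here is a minimal sketch in Python. The filename is borrowed purely for illustration, and the variable names are my own assumptions, not anything taken from the PSPP sources.

```python
# Hypothetical names, chosen only to illustrate the scenario.
name_a = "Äpfelfaß.sav".encode("utf-8")       # bytes of fileA: valid UTF-8
name_b = "Äpfelfaß.sav".encode("iso-8859-1")  # bytes of fileB: valid ISO-8859-1

# On disk the two names are different byte strings, so both files can coexist.
distinct_on_disk = name_a != name_b

# After conversion to UTF-8 for internal processing they become identical.
internal_a = name_a.decode("utf-8")
internal_b = name_b.decode("iso-8859-1")
collide_internally = internal_a == internal_b

print(name_a, name_b, distinct_on_disk, collide_internally)
```

Both flags come out true: the names differ as bytes but are indistinguishable once converted.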
In "normal" applications the question is not relevant. Filenames are simply
byte strings. However, because we convert everything to UTF-8 in syntax (for
example: GET FILE="Äpfelfaß.sav".), we no longer know the encoding of that
filename.
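The reverse direction can be sketched the same way. This is only an illustration under assumptions: the set `directory` is a hypothetical in-memory stand-in for a directory listing of raw byte names, and `candidate_names` is a made-up helper, not a PSPP function.

```python
# Hypothetical stand-in for a directory that holds both files from the scenario.
directory = {
    "Äpfelfaß.sav".encode("utf-8"),
    "Äpfelfaß.sav".encode("iso-8859-1"),
}

def candidate_names(internal_name, encodings=("utf-8", "iso-8859-1")):
    """Return the on-disk byte names that internal_name could refer to."""
    matches = []
    for enc in encodings:
        try:
            raw = internal_name.encode(enc)
        except UnicodeEncodeError:
            continue  # the name is not representable in this encoding
        if raw in directory and raw not in matches:
            matches.append(raw)
    return matches

# The UTF-8 string taken from syntax maps back to two different files.
matches = candidate_names("Äpfelfaß.sav")
print(len(matches))
```

With both files present, the lookup yields two candidates and nothing in the UTF-8 string says which one was meant.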
I don't know how to solve this problem.
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.