Re: [Gnumed-devel] Encoding (viewing) on Mac OS
From: Karsten Hilbert
Subject: Re: [Gnumed-devel] Encoding (viewing) on Mac OS
Date: Tue, 15 Nov 2011 13:39:28 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Nov 15, 2011 at 04:50:15AM +0000, Jim Busser wrote:
> Judging from my favourite Mac text processor TextWrangler
> -- a free version of BBedit -- I think I figured out a Mac
> vulnerability when processing a file encoded as
>
> Latin1
>
> because TextWrangler (perhaps with a dependency on the OS)
> has trouble appropriately auto-detecting which form of Latin 1
> encoding…
Latin1 is Latin1, there's no two ways about it that I can see
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
regardless of what a Mac may think.
The problem is more likely that reliably "auto-detecting" Latin1
is impossible because it overlaps with many other encodings. If
a file only contains characters from the overlap, no
auto-disambiguation is possible even in principle.
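To illustrate why detection has to be a guess, here is a minimal Python sketch. It assumes Python's standard codec names (`latin-1`, `cp1252`, `mac_roman`) correspond to the editor's "ISO Latin 1", "Windows Latin 1" and "Mac OS Roman"; the helper `decode_all` is made up for this example. The same bytes decode without error under all three, so nothing in the file itself tells the editor which one was meant.

```python
# Bytes of 'Québec' as they would be stored in a Latin1 (ISO 8859-1) file.
raw = "Québec".encode("latin-1")

def decode_all(data, encodings=("latin-1", "cp1252", "mac_roman")):
    """Decode the same bytes under several single-byte encodings."""
    return {name: data.decode(name) for name in encodings}

# None of these raises an error; they just disagree on the accented letter.
decodings = decode_all(raw)
for name, text in decodings.items():
    print(f"{name:10} {text}")

# With ASCII-only content all three give the same text,
# so there is nothing to disambiguate in the first place.
ascii_only = decode_all(b"Quebec")
print(ascii_only)
```

An editor can only apply heuristics here (for example, which decoding yields more plausible text), and heuristics can pick wrongly.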
> it tends to select
>
> Western (Mac OS Roman)
>
> even when this results in incorrect characters
That's worse yet.
> e.g. in the server sql country-specific file
>
> gmDemographics-Data.ca.sql
>
> it yields
>
> <snip>
> select i18n.upd_tx('fr_CA', 'Nova Scotia', 'Nouvelle-…cosse');
> select i18n.upd_tx('fr_CA', 'Prince Edward Island',
> 'Œle-du-Prince-…douard');
> select i18n.upd_tx('fr_CA', 'Quebec', 'QuÈbec');
> <snip>
>
> whereas
>
> Western (Windows Latin 1)
> Western (ISO Latin 1)
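For the characters in question they are identical (Windows
Latin 1 differs from ISO Latin 1 only in the 0x80-0x9F range).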
> yield
>
> <snip>
> select i18n.upd_tx('fr_CA', 'Nova Scotia', 'Nouvelle-Écosse');
> select i18n.upd_tx('fr_CA', 'Prince Edward Island',
> 'Île-du-Prince-Édouard');
> select i18n.upd_tx('fr_CA', 'Quebec', 'Québec');
> <snip>
>
> If necessary, I can open such files manually, choosing one of the other Latin1
> encodings, then change the selection to UTF8 and save them.
Yes, that would (IMO) be the correct way of going about it.
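The same conversion can be scripted instead of done by hand in the editor. The following is only a sketch: the helper `latin1_to_utf8` and the file names are made up for illustration, and it assumes the input really is Latin1.

```python
import tempfile
from pathlib import Path

def latin1_to_utf8(src, dst):
    """Re-encode a Latin1 file as UTF-8 (hypothetical helper)."""
    data = Path(src).read_bytes()
    # Latin1 maps every byte to a character, so decoding never fails;
    # it silently produces wrong text if the file is not really Latin1.
    Path(dst).write_bytes(data.decode("latin-1").encode("utf-8"))

# Demo on a throwaway file; the file names are invented for this example.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example.latin1.sql"
    dst = Path(tmp) / "example.utf8.sql"
    src.write_bytes("-- Québec\n".encode("latin-1"))
    latin1_to_utf8(src, dst)
    converted = dst.read_text(encoding="utf-8")
    print(converted)
```

Working on raw bytes rather than text mode leaves the line endings of the original file untouched.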
> I wonder however whether in future -- in spite of the Canadian source having
> been Latin 1 -- there is any reason why the sql files cannot be saved in UTF8?
They are only UTF8 as long as they are pure ASCII -- utf8 and
latin1 overlap in the ASCII range but not for accented
characters such as É, so files containing those do need
converting.
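Whether a given string is byte-identical in both encodings is easy to check. A minimal sketch (the helper name `same_bytes` is just for illustration):

```python
def same_bytes(text):
    """True if text encodes to identical bytes in latin1 and utf8."""
    return text.encode("latin-1") == text.encode("utf-8")

ascii_result = same_bytes("Quebec")     # ASCII only
accented_result = same_bytes("Québec")  # contains a non-ASCII letter
print(ascii_result, accented_result)

# The raw bytes for one accented character in each encoding.
latin1_bytes = "É".encode("latin-1")
utf8_bytes = "É".encode("utf-8")
print(latin1_bytes, utf8_bytes)
```

Only the ASCII string comes out identical; the accented letter is a single byte in latin1 and two bytes in utf8.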
Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346