Re: [Gnumed-devel] Encoding (viewing) on Mac OS
From: Karsten Hilbert
Subject: Re: [Gnumed-devel] Encoding (viewing) on Mac OS
Date: Wed, 16 Nov 2011 10:25:44 +0100
> Nicolas Barbier <address@hidden> wrote:
>
> > They already *are* UTF8 -- because for all relevant
> > characters utf8 and latin1 overlap (unless I am mistaken).
>
> Latin 1 (= ISO 8859-1) and Unicode overlap in such a way (see the
> table in [1], the description of the block that starts at 00C0).
> However, when using UTF-8 as the Unicode encoding, the bytes used to
> represent those codes are not the same.
>
> [1] <URL:http://en.wikipedia.org/wiki/Latin_characters_in_Unicode>
>
> For example: “é” (small e with acute) has code E9 in both Latin 1 and
> Unicode. UTF-8 encodes that number as C3 A9 (i.e., two bytes), whereas
> Latin 1 just encodes it as the single byte E9. A UTF-8 file containing
> that symbol, interpreted as Latin 1, would yield “Ã©” (capital A with
> tilde + copyright sign).
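
The byte values quoted above can be checked with a short sketch. It assumes Python 3 and its standard "latin-1" and "utf-8" codecs; the variable names are only illustrative.

```python
# U+00E9 ("é") has the same code point in Latin 1 and Unicode,
# but the two encodings serialize it to different bytes.
ch = "\u00e9"

latin1_bytes = ch.encode("latin-1")  # single byte E9
utf8_bytes = ch.encode("utf-8")      # two bytes C3 A9

print(latin1_bytes.hex())
print(utf8_bytes.hex())

# Reading the UTF-8 bytes as if they were Latin 1 gives two characters:
# U+00C3 (capital A with tilde) and U+00A9 (copyright sign).
misread = utf8_bytes.decode("latin-1")
print([hex(ord(c)) for c in misread])
```

The last line prints code points rather than the characters themselves, so the output does not depend on the terminal's own encoding.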
Thanks for the clarification. I was under the impression that
there IS a way to use the very same byte sequence for both latin1
and utf8 as long as there are only overlapping characters in the file.
After all, it IS possible to say either of "coding latin1" or
"coding utf8" at the top of a Python file and have the Python
interpreter properly read, say, German umlauts within said file (the
byte sequence does not change, just the declaration)?
Karsten
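
One way to test the question above is to compare the encoded bytes directly. This is a minimal sketch assuming Python 3; the sample strings are made up for illustration. It suggests that the byte sequences coincide only for pure ASCII text, not for umlauts.

```python
ascii_text = "Gruss"
umlaut_text = "Gr\u00fc\u00dfe"  # "Grüße", written with escapes

# ASCII-only text produces identical bytes under both encodings.
same_for_ascii = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# Text with umlauts does not: Latin 1 uses one byte per character,
# UTF-8 uses two bytes for each character above U+007F.
same_for_umlauts = umlaut_text.encode("latin-1") == umlaut_text.encode("utf-8")

# Latin 1 bytes for the umlaut text are not even valid UTF-8.
latin1_bytes = umlaut_text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(same_for_ascii, same_for_umlauts, decodes_as_utf8)
```

Under these assumptions, a file with umlauts has to be stored in the encoding its declaration names; changing only the declaration changes how the same bytes are interpreted.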