help-gnu-emacs

Re: desktop and encodings


From: Peter Dyballa
Subject: Re: desktop and encodings
Date: Mon, 23 May 2005 20:01:54 +0200


On 23.05.2005 at 16:13, Mads Jensen wrote:

æøå gets turned into something like Â¥...


What you see is text in some ISO Latin encoding being 'translated' into UTF-8, with the resulting two-byte sequences then displayed as individual single-byte characters!

This excerpt from the ISO Latin-4 table could explain a bit:

;   oct   dec   hex    UCS2    UTF-8
;=====================================
  = 240 = 160 = A0 = U+00A0 = C2 A0 : NO-BREAK SPACE
Ą = 241 = 161 = A1 = U+0104 = C4 84 : LATIN CAPITAL LETTER A WITH OGONEK
ĸ = 242 = 162 = A2 = U+0138 = C4 B8 : LATIN SMALL LETTER KRA
Ŗ = 243 = 163 = A3 = U+0156 = C5 96 : LATIN CAPITAL LETTER R WITH CEDILLA
¤ = 244 = 164 = A4 = U+00A4 = C2 A4 : CURRENCY SIGN
Ĩ = 245 = 165 = A5 = U+0128 = C4 A8 : LATIN CAPITAL LETTER I WITH TILDE
Ļ = 246 = 166 = A6 = U+013B = C4 BB : LATIN CAPITAL LETTER L WITH CEDILLA
§ = 247 = 167 = A7 = U+00A7 = C2 A7 : SECTION SIGN
¨ = 250 = 168 = A8 = U+00A8 = C2 A8 : DIAERESIS
Š = 251 = 169 = A9 = U+0160 = C5 A0 : LATIN CAPITAL LETTER S WITH CARON
Ē = 252 = 170 = AA = U+0112 = C4 92 : LATIN CAPITAL LETTER E WITH MACRON
Ģ = 253 = 171 = AB = U+0122 = C4 A2 : LATIN CAPITAL LETTER G WITH CEDILLA
Ŧ = 254 = 172 = AC = U+0166 = C5 A6 : LATIN CAPITAL LETTER T WITH STROKE
­ = 255 = 173 = AD = U+00AD = C2 AD : SOFT HYPHEN
Ž = 256 = 174 = AE = U+017D = C5 BD : LATIN CAPITAL LETTER Z WITH CARON

Á = 301 = 193 = C1 = U+00C1 = C3 81 : LATIN CAPITAL LETTER A WITH ACUTE
Â = 302 = 194 = C2 = U+00C2 = C3 82 : LATIN CAPITAL LETTER A WITH CIRCUMFLEX
Ã = 303 = 195 = C3 = U+00C3 = C3 83 : LATIN CAPITAL LETTER A WITH TILDE
Ä = 304 = 196 = C4 = U+00C4 = C3 84 : LATIN CAPITAL LETTER A WITH DIAERESIS
Å = 305 = 197 = C5 = U+00C5 = C3 85 : LATIN CAPITAL LETTER A WITH RING ABOVE
Æ = 306 = 198 = C6 = U+00C6 = C3 86 : LATIN CAPITAL LETTER AE

æ = 346 = 230 = E6 = U+00E6 = C3 A6 : LATIN SMALL LETTER AE


The first column contains the glyphs as they are; the next columns give each glyph's byte value as octal, decimal, and hexadecimal numerals. The next column, UCS2, shows the code point of that glyph in Unicode (which, I think, is also the internal representation in GNU Emacs). The last column shows the bytes into which the glyph from column 1 is translated in UTF-8. As you can see, the UTF-8 bytes can themselves be 'seen' as 'normal' characters: a UTF-8 encoded æ (bytes C3 A6) is just 'ÃĻ' if displayed in ISO Latin-4, 'Ã¦' in ISO Latin-1 ...
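
To try this outside Emacs, here is a minimal Python sketch. It is not part of the original advice, and the helper names `table_row` and `misread` are made up for illustration. It rebuilds one row of the table from a byte value, assuming the byte is read as ISO Latin-4, and shows what a UTF-8 encoded æ turns into when its bytes are decoded with a single-byte charset instead.

```python
# Illustrative sketch; `table_row` and `misread` are hypothetical helpers.

def table_row(byte_value: int, charset: str = "iso8859_4") -> str:
    """Format one table row for a single byte interpreted in `charset`."""
    glyph = bytes([byte_value]).decode(charset)   # byte -> character
    utf8 = glyph.encode("utf-8")                  # character -> UTF-8 bytes
    utf8_hex = " ".join(f"{b:02X}" for b in utf8)
    return (f"{glyph} = {byte_value:o} = {byte_value:d} = {byte_value:02X}"
            f" = U+{ord(glyph):04X} = {utf8_hex}")


def misread(text: str, wrong_charset: str) -> str:
    """Encode `text` as UTF-8, then decode those bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset)


if __name__ == "__main__":
    print(table_row(0xE6))            # æ = 346 = 230 = E6 = U+00E6 = C3 A6
    print(misread("æ", "iso8859_1"))  # Ã¦
    print(misread("æ", "iso8859_4"))  # ÃĻ
```

Each byte of the UTF-8 pair C3 A6 is a valid character on its own in the Latin charsets, which is why nothing complains and you simply get two wrong glyphs per letter.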

So, to conclude: your Emacs obviously saves your input as UTF-8, and you have to make the buffer display in UTF-8 too! The correct header line at the top of the file would look like

        ;;; -*- mode: Text; coding: utf-8; -*-

Once you have the file opened in the wrong encoding you can change that with revert-buffer-with-coding-system, C-x RET r utf-8 RET.

Have you thought of putting this into your init file?

        (prefer-coding-system 'utf-8-unix)

It could cure a lot. There is also (set-language-environment 'Danish) ...


--
With peaceful regards

  Pete

In a world without walls and fences, who needs gates and windows?




