Re: searching for non ascii characters
From: Peter Dyballa
Subject: Re: searching for non ascii characters
Date: Wed, 3 Aug 2005 16:09:13 +0200
On 03.08.2005 at 15:28, rahed@cwazy.co.uk wrote:
character: š (01210241, 331937, 0x510a1)
charset: mule-unicode-0100-24ff (Unicode characters of the range
U+0100..U+24FF.)
code point: 33 33
syntax: word
category: l:Latin
buffer code: 0x9C 0xF4 0xA1 0xA1
file code: B9 (encoded by coding system iso-latin-2-unix)
font: -outline-Courier New-normal-r-normal-normal-13-97-96-96-c-80-iso10646-1
My own test file, encoded in ISO 8859-2, shows this in GNU Emacs 23:
character: š (0541, 353, 0x161)
preferred charset: iso-8859-2 (ISO/IEC 8859/2)
code point: 0xB9
syntax: w which means: word
category: j:Japanese l:Latin
buffer code: 0xC5 0xA1
file code: 0xB9 (encoded by coding system iso-latin-2-unix)
display: by this font (glyph code)
-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO10646-1
(0x161)
and this in GNU Emacs 22 and 21.3:
character: š (04471, 2361, 0x939, U+0161)
charset: [latin-iso8859-2]
(Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2):
ISO-IR-101.)
code point: [57]
syntax: w which means: word
category: l:Latin
buffer code: 0x82 0xB9
file code: 0xB9 (encoded by coding system iso-latin-2-unix)
display: by this font (glyph code)
-B&H-LucidaTypewriter-Medium-R-Normal-Sans-10-100-75-75-M-60-ISO8859-2
(0xB9)
Both use the right charset and encoding. If you close and reopen that
file, and it has '-*- coding: iso-8859-2; -*-' in its header, near the
top of the file, Emacs should switch to that coding, unless a block of
local (file) variables at the end of the file says something different,
or Emacs is pinned to a specific coding system. Did you restart Emacs
after changing .emacs? Can you check the variable's state (C-h v on the
variable you set in .emacs, in a freshly launched Emacs)? If its value
differs from what you set, then either that statement is never
executed, or the variable is set more than once and gets reset some
time after this line. What does the tail of your file look like?
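To check a file for conflicting declarations outside Emacs, something like the following Python sketch might help. It is only a rough approximation and not Emacs's actual detection logic: the helper name declared_codings, the number of header lines scanned (head_lines) and the size of the tail that is searched (tail_bytes) are all assumptions made for illustration.

```python
import re

# Hypothetical helper, NOT Emacs's real algorithm: it only looks for a
# "-*- ... coding: NAME ... -*-" cookie near the top of the file and for a
# "coding: NAME" entry after the last "Local Variables:" marker near the end.
COOKIE = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9_.-]+).*?-\*-")
LOCAL = re.compile(rb"coding:\s*([A-Za-z0-9_.-]+)")


def declared_codings(data: bytes, head_lines: int = 2, tail_bytes: int = 3000):
    """Return (header_coding, trailer_coding); each is a str or None."""
    # head_lines and tail_bytes are assumed limits, chosen for illustration.
    head = b"\n".join(data.split(b"\n")[:head_lines])
    match = COOKIE.search(head)
    header = match.group(1).decode("ascii") if match else None

    trailer = None
    tail = data[-tail_bytes:]
    start = tail.rfind(b"Local Variables:")
    if start != -1:
        match = LOCAL.search(tail[start:])
        if match:
            trailer = match.group(1).decode("ascii")
    return header, trailer


sample = (b"% -*- coding: iso-8859-2; -*-\n"
          b"text \xb9\n"
          b"% Local Variables:\n"
          b"% coding: utf-8\n"
          b"% End:\n")
header, trailer = declared_codings(sample)
if header and trailer and header != trailer:
    print("conflict:", header, "vs", trailer)
```

If the two values differ, the header and the trailing block contradict each other, which is the situation described above.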
The last thing I can think of is the use of fontsets instead of fonts.
What is your setup there?
Your file has the correct byte, 0xB9, at the position of LATIN SMALL
LETTER S WITH CARON, so it is presumably still correctly encoded. To
view it as ISO/IEC 8859-2 you can use revert-buffer-with-coding-system,
C-x RET r CODING-SYSTEM. Use M-x list-coding-systems to see which
coding systems your Emacs offers.
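The byte values quoted above can be cross-checked outside Emacs. A small Python sketch, assuming only the standard codecs (the variable names are mine):

```python
import unicodedata

raw = b"\xb9"  # the "file code" reported for the character

# Decoded as ISO 8859-2, the byte is the intended character.
as_latin2 = raw.decode("iso-8859-2")
print(hex(ord(as_latin2)), unicodedata.name(as_latin2))

# Decoded as ISO 8859-1, the very same byte is a different character,
# which is what a wrongly chosen coding system would display.
as_latin1 = raw.decode("iso-8859-1")
print(hex(ord(as_latin1)), unicodedata.name(as_latin1))

# The UTF-8 form of U+0161, for comparison with the "buffer code"
# reported by GNU Emacs 23 (0xC5 0xA1).
utf8 = as_latin2.encode("utf-8")
print(utf8.hex())
```

So the file itself is fine as long as 0xB9 is read as ISO 8859-2; only the interpretation chosen when the file is loaded changes what you see.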
--
Greetings
Pete
Windows is a bit like Beaujolais nouveau: with every new vintage you
know it will be disgusting, but you have some anyway, out of
masochism.