help-gnu-emacs

Re: File Encoding Issue on Windows


From: Peter Dyballa
Subject: Re: File Encoding Issue on Windows
Date: Tue, 12 Mar 2013 11:50:40 +0100

On 12.03.2013 at 04:08, Tech Stuff wrote:

> Â¿En quÃ© fecha llegaron
> 
> when I should see:
> 
> ¿En qué fecha llegaron

The first line is the UTF-8 encoding of the text in the last line, but it is 
displayed to you in a different, 8-bit encoding. UTF-8 uses more than one 
byte, i.e. more than 8 bits, to encode most characters. Only the characters 
of the US-ASCII range (U+0000 - U+007F), i.e. the digits, unaccented 
letters, and basic punctuation, are encoded in a single byte.

The character ¿, INVERTED QUESTION MARK, U+00BF, is encoded in UTF-8 as two 
bytes: C2 BF. Notepad interprets these two bytes in some Latin or MS Windows 
encoding, i.e. as two different characters, Â and ¿, which are then displayed 
as such.

The character é, LATIN SMALL LETTER E WITH ACUTE, U+00E9, is encoded in UTF-8 
as two bytes: C3 A9. Notepad interprets these two bytes in some Latin or MS 
Windows encoding, i.e. as two different characters, which are then displayed 
as Ã and as ©.
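
To check the byte sequences yourself, here is a minimal sketch, assuming 
Python 3 is at hand. The helper name utf8_hex is made up for this 
illustration and is not part of any library.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


if __name__ == "__main__":
    # One byte for US-ASCII, two bytes for the accented characters above.
    for ch in "E¿é":
        print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
    # U+0045 -> 45
    # U+00BF -> C2 BF
    # U+00E9 -> C3 A9
```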

The MS Windows code page CP1252 assigns these bytes as follows:

        A9 = ©, COPYRIGHT SIGN
        BF = ¿, INVERTED QUESTION MARK
        C2 = Â, LATIN CAPITAL LETTER A WITH CIRCUMFLEX
        C3 = Ã, LATIN CAPITAL LETTER A WITH TILDE

So Notepad is using this code page, CP1252, to display the UTF-8 encoded file. 
What you need to do is to tell Notepad to use UTF-8.
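
The whole effect can be reproduced outside Notepad. The following is only a 
sketch, assuming Python 3 and its standard "utf-8" and "cp1252" codecs; the 
function names misread_as_cp1252 and undo_misread are hypothetical. It 
decodes the UTF-8 bytes with the wrong code page, then reverses the mistake.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were CP1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_misread(garbled: str) -> str:
    """Reverse misread_as_cp1252: recover the bytes, decode them as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    original = "¿En qué fecha llegaron"
    garbled = misread_as_cp1252(original)
    print(garbled)                # Â¿En quÃ© fecha llegaron
    print(undo_misread(garbled))  # ¿En qué fecha llegaron
```

The reversal works here because every byte involved (C2, BF, C3, A9) has a 
character assigned in CP1252. CP1252 leaves a few byte values unassigned, so 
other texts may raise a decoding error instead of producing garbled output.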

--
Greetings

  Pete

Give a man a fish, and you've fed him for a day. Teach him to fish, and you've 
depleted the lake.



