help-gnu-emacs

Re: File Encoding Issue on Windows


From: Tech Stuff
Subject: Re: File Encoding Issue on Windows
Date: Tue, 12 Mar 2013 07:57:39 -0700 (PDT)

Hi Peter,

Thanks for taking the time to reply.  It was useful, but I'm still confused about how to resolve this issue.  To be clear, when I posted yesterday, I was seeing the extraneous characters in Emacs, not in Notepad.  However, I just opened the file again in Notepad to check the encoding, and now I'm seeing the extra characters there as well.  So something must have changed when I saved the file in Emacs while trying to figure out what was going on.  Emacs seems to be the culprit.  Is there something I can put in my .emacs to tell it to save in UTF-8 automatically?  Or am I still misunderstanding something?

Thanks again.

-ts1971 



From: Peter Dyballa <Peter_Dyballa@Web.DE>
To: Tech Stuff <techstuff1971@yahoo.com>
Cc: "help-gnu-emacs@gnu.org" <help-gnu-emacs@gnu.org>
Sent: Tuesday, March 12, 2013 3:50 AM
Subject: Re: File Encoding Issue on Windows


Am 12.03.2013 um 04:08 schrieb Tech Stuff:

> Â¿En quÃ© fecha llegaron
>
> when I should see:
>
> ¿En qué fecha llegaron

The first line is the text of the last line encoded in UTF-8, but displayed to you in a different, 8-bit encoding. UTF-8 uses more than one byte, i.e. more than 8 bits, for every character outside the US-ASCII range. Only the characters of the US-ASCII range (U+0000 - U+007F), i.e. the digits, non-accented letters, and punctuation, are encoded in a single byte.
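As a rough illustration of the byte counts, here is a minimal Python sketch. The helper name utf8_length and the sample characters are chosen only for this example; they are not part of the original exchange.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes the UTF-8 encoding of one character occupies."""
    return len(ch.encode("utf-8"))

# Sample characters, written as escapes so the script itself stays pure ASCII.
samples = {
    "LATIN CAPITAL LETTER E": "E",
    "INVERTED QUESTION MARK": "\u00bf",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {name}: {utf8_length(ch)} byte(s)")
```

The ASCII letter takes one byte, the two accented Latin-1 characters take two bytes each, and the euro sign takes three.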

The character ¿, INVERTED QUESTION MARK, U+00BF, is encoded in UTF-8 as two bytes: C2 BF. Notepad interprets these two bytes in some Latin or MS Windows encoding, i.e. as two different characters, Â and ¿, and displays them as such.

The character é, LATIN SMALL LETTER E WITH ACUTE, U+00E9, is encoded in UTF-8 as two bytes: C3 A9. Notepad interprets these two bytes in some Latin or MS Windows encoding, i.e. as two different characters, and displays them as Ã and ©.
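The two cases above can be reproduced with a short Python sketch. It assumes the 8-bit encoding in question is CP1252 (Python's "cp1252" codec); the function name misread_as_cp1252 is made up for this illustration.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were CP1252."""
    return text.encode("utf-8").decode("cp1252")

# INVERTED QUESTION MARK and LATIN SMALL LETTER E WITH ACUTE
for ch in ("\u00bf", "\u00e9"):
    raw = ch.encode("utf-8")
    wrong = misread_as_cp1252(ch)
    shown_as = " ".join(f"U+{ord(c):04X}" for c in wrong)
    print(f"U+{ord(ch):04X} -> UTF-8 bytes {raw.hex().upper()} -> read as {shown_as}")
```

Each single character turns into two: U+00BF becomes U+00C2 U+00BF, and U+00E9 becomes U+00C3 U+00A9.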

The MS Windows code page CP1252 assigns these bytes as follows:

    A9 = ©, COPYRIGHT SIGN
    BF = ¿, INVERTED QUESTION MARK
    C2 = Â, LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    C3 = Ã, LATIN CAPITAL LETTER A WITH TILDE

So Notepad is using this code page, CP1252, to display the UTF-8 encoded file. What you need to do is to tell Notepad to use UTF-8.
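To see the effect on the whole line, the following Python sketch decodes the same bytes once as CP1252 and once as UTF-8. It only illustrates the two interpretations and is not a description of what Notepad or Emacs do internally; the names GARBLED, INTENDED, and decode_bytes are hypothetical.

```python
# The line as it should read, and as it was displayed (escapes keep this file ASCII).
INTENDED = "\u00bfEn qu\u00e9 fecha llegaron"
GARBLED = "\u00c2\u00bfEn qu\u00c3\u00a9 fecha llegaron"

def decode_bytes(raw: bytes, encoding: str) -> str:
    """Interpret the same raw bytes under the given encoding."""
    return raw.decode(encoding)

raw = INTENDED.encode("utf-8")          # the bytes stored in the file

as_cp1252 = decode_bytes(raw, "cp1252")  # wrong interpretation
as_utf8 = decode_bytes(raw, "utf-8")     # correct interpretation

print(as_cp1252 == GARBLED)   # True
print(as_utf8 == INTENDED)    # True
```

The bytes in the file are the same in both cases; only the encoding used to read them differs.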

--
Greetings

  Pete

Give a man a fish, and you've fed him for a day. Teach him to fish, and you've depleted the lake.



