help-gnu-emacs

Re: Convert UTF-8


From: Giorgos Keramidas
Subject: Re: Convert UTF-8
Date: Wed, 17 Dec 2008 13:17:04 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (berkeley-unix)

On Wed, 17 Dec 2008 00:41:47 -0800 (PST), YOUNG <breadncup@gmail.com> wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with M-x
> set-buffer-file-coding-system for writing utf-8 encoding. And, write
> (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.

ASCII contains only 7-bit characters.  All the characters of the 7-bit
ASCII character set map to themselves in the UTF-8 coding system.

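This means that when a file contains only characters from the ASCII
character set, no conversion at all is needed from UTF-8 to ASCII or
vice versa.  Byte for byte, such a file is both valid ASCII and valid
UTF-8, which is why other editors still report it as ASCII.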

If you set the buffer-file-coding-system to UTF-8 *and* type some text
that contains non-ASCII characters (which UTF-8 encodes as multi-byte
sequences), then the saved file will contain those sequences and other
tools can detect it as UTF-8.
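
You can check this outside Emacs too.  Here is a minimal sketch in
Python (not something from your setup; the sample strings and variable
names are made up for illustration) that compares the bytes produced
by the two encodings:

```python
# Pure-ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "plain text"
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Once a non-ASCII character is present, UTF-8 needs a multi-byte
# sequence for it, so the file contents are no longer plain ASCII.
accented_text = "caf\u00e9"  # "cafe" with an acute accent on the e
accented_utf8 = accented_text.encode("utf-8")
has_high_bytes = any(b >= 0x80 for b in accented_utf8)

print(same_bytes, accented_utf8, has_high_bytes)
```

The first comparison is true, so a tool that only looks at the bytes
cannot tell the two encodings apart for such a file.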

> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do
> "set-buffer-file-coding-system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.

CP437 is very different from plain ASCII.  It contains 8-bit characters
and there are other differences in the 0x00 - 0x1F code range.  If you
ignore the 0x00-0x1F character differences you might be able to say that
CP437 is a 'superset' of ASCII, but they are not the same thing.
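
To illustrate the difference, here is a rough sketch in Python (again
an assumption on my part, with a made-up sample byte string) showing
that a CP437 byte above 0x7F really does change when it is converted
to UTF-8, while the ASCII-range bytes stay as they are:

```python
# "cafe" with an accented e, as stored in CP437 (0x82 is e-acute there).
cp437_bytes = b"caf\x82"

# Decode with the CP437 codec, then re-encode as UTF-8.
text = cp437_bytes.decode("cp437")
utf8_bytes = text.encode("utf-8")

# The ASCII-range prefix is unchanged; only the 8-bit byte is rewritten.
prefix_unchanged = utf8_bytes[:3] == cp437_bytes[:3]
converted_differs = utf8_bytes != cp437_bytes

print(utf8_bytes, prefix_unchanged, converted_differs)
```

A file like this one would be detected differently after a real
conversion, unlike a file that holds only 7-bit ASCII characters.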


