Re: Convert UTF-8

help-gnu-emacs

From:	YOUNG
Subject:	Re: Convert UTF-8
Date:	Thu, 18 Dec 2008 00:35:18 -0800 (PST)
User-agent:	G2/1.0

On Dec 17, 4:04 am, Xah Lee <xah...@gmail.com> wrote:
> On Dec 17, 12:41 am, YOUNG <breadn...@gmail.com> wrote:
>
>
>
> > Well, I have no problem to load UTF-8 file with emacs at all.
>
> > The problem is that emacs is not able to write UTF-8 at all.
>
> > For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> > Latin 1 to 9; there are various aliases to indicating of it, but you
> > already know what it means.), I set it up with M-x set-buffer-file-
> > coding-system for writing utf-8 encoding. And, write (or save) it.
> > After that, exit the emacs and re-run it again, and try to read the
> > saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> > It does not mean emacs can't read utf-8, but the file itself is not
> > encoded UTF-8. I check the file's encoding system with other
> > application like NotePAD++ or other editors, and all say the file is
> > still ASCII mode even though I write it as utf-8 in emacs.
>
> > Again, there is no problem in reading utf-8. When a file is encoded
> > utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> > emacs is not able to write utf-8 if the file is encoded in ASCII. It
> > only writes in ASCII encode no matter how I do
> > "set-buffer-file-coding-system"
>
> > So, if somebody knows this issue and how to write utf-8 correctly when
> > a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> > information, it would be appreciated.
>
> > Thanks,
>
> as other have mentioned, utf-8 is just a super set of ascii, so files
> encoded in either are identical.
>
> You mentioned ISO8859, which is not ascii. I read your 2 posts, but
> don't quite understand what you wanted.
>
> For some unicode with emacs tips, you might checkout:
>
> • Emacs and Unicode Tips
>  http://xahlee.org/emacs/emacs_n_unicode.html
>
> You might also beefup understanding of char encoding:
>
> http://en.wikipedia.org/wiki/ISO8859
> http://en.wikipedia.org/wiki/ASCII
> http://en.wikipedia.org/wiki/UTF-8
>
>   Xah
> ∑http://xahlee.org/
>
> ☄

Hi,

I finally know what the problem is. Thank you all for helping with
this issue.

I am not an expert on encoding systems, but I am grateful for the
opportunity to learn about them.

The problem is the BOM (Byte Order Mark). In utf-8 the BOM is usually
left out, because a BOM at the very start of a file can conflict with
special characters that are expected there, such as the '#!' of a Unix
shell script. Therefore, if the text contains no character that needs
more than 7 bits (that is, nothing outside ASCII), nothing in the saved
file marks it as utf-8, and emacs and other programs treat the
encoding as undefined or ASCII (I am not sure that is the right term,
but let's call it ASCII here for convenience).
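
To illustrate this, here is a small Python sketch (Python is used only
for illustration and has nothing to do with emacs itself; the sample
strings are made up). It shows that pure-ASCII text gives exactly the
same bytes whether it is encoded as ASCII or as utf-8 without BOM, so
no program can tell the two apart, while a single non-ASCII character
makes the difference visible:

```python
# Sample strings are hypothetical, chosen only for illustration.
ascii_only = "#!/bin/sh\necho hello\n"
with_accent = "caf\u00e9\n"  # one non-ASCII character (e with acute accent)

# Pure-ASCII text: the ASCII and UTF-8 encodings are byte-for-byte identical.
same_bytes = ascii_only.encode("ascii") == ascii_only.encode("utf-8")

# Non-ASCII text: UTF-8 uses bytes >= 0x80, so the data is visibly not ASCII.
utf8_bytes = with_accent.encode("utf-8")
has_high_byte = any(b >= 0x80 for b in utf8_bytes)

print(same_bytes)     # True
print(has_high_byte)  # True
print(utf8_bytes)     # b'caf\xc3\xa9\n'
```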

So I conclude that emacs, at least the version I am using, cannot
write a BOM in utf-8; it only supports utf-8 without BOM. That explains
why the file did not appear to be written in utf-8 when the text
contained no actual non-ASCII characters. If the text does contain such
a character and I save it as utf-8, it is written as utf-8 without BOM
with no problem.
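
As a rough sketch of what a BOM changes at the byte level (again in
Python, for illustration only; the helper name has_utf8_bom is my own
invention): Python's "utf-8-sig" codec writes the three-byte utf-8 BOM
(EF BB BF) in front of the text, while plain "utf-8" does not.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


text = "hello\n"  # pure-ASCII sample text, hypothetical

without_bom = text.encode("utf-8")   # same bytes as the ASCII encoding
with_bom = text.encode("utf-8-sig")  # BOM prepended to the same bytes

print(without_bom)  # b'hello\n'
print(with_bom)     # b'\xef\xbb\xbfhello\n'
print(has_utf8_bom(without_bom), has_utf8_bom(with_bom))  # False True
```

With the BOM, a program that looks at the first three bytes can report
the file as utf-8 even when everything after them is plain ASCII.
Without it, the two cases look the same.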

Detailed information about Unicode and the BOM can be found at
http://unicode.org/faq/utf_bom.html

Thank you,
