help-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: problem with editing/decoding utf-8 text


From: Kai Großjohann
Subject: Re: problem with editing/decoding utf-8 text
Date: Fri, 23 May 2003 18:50:08 +0200
User-agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3.50 (gnu/linux)

Fery <engard.ferenc@innomed.hu> writes:

> I have a UTF-8 text file, containing latin-1 text. When I try to edit it
> with emacs, it does not detect that it is utf-8; the
> describe-coding-system gives back 'iso-latin-1-unix'. (And I see the
> two-byte representation of latin1 chars, which is not bad to me.)

Released versions of Emacs put UTF-8 at a rather low priority for
automatic encoding detection.  So you need to help Emacs by
explicitly specifying the encoding.  Do C-x RET c utf-8 RET before
using C-x C-f to open the file.

You can also put utf-8 somewhat earlier in the list for automatic
encoding detection.  I think this can be achieved in the following
way, but I'm not sure.  I'm not a Mule expert.  If anyone knows
better, please help out.

(setq coding-category-list
      (cons 'coding-category-utf-8
            (delq 'coding-cateogcoding-utf-8
                  coding-category-list)))

> When I save the buffer, it displays an error message:
>
> These default coding systems were tried:
>   iso-latin-1-unix
> However, none of them safely encodes the target text.
>
> Now, no matter what I choose (raw-text, no-conversion, utf-8), it
> modifies all of the utf8 chars which are not fit into the ascii charset.
> It seems, that it inserts a \201 before every char which is not in the
> ascii charset. I.e. if I just load and save a file, emacs does not
> behaves transparently.

You should make sure that UTF-8 is properly recognized when opening
the file, then saving will Just Work.

> I have found one solution: opening the file with
> universal-coding-system-argument, using even UTF-8 (then I see correctly
> the chars, although it is not always important) or e.g. no-conversion.

Do not use no-conversion.  The file is UTF-8, so UTF-8 is the right
encoding to specify.

> My questions:
>
> 0. What is this \201 byte?

Emacs encodes Latin-1 characters internally by a two-byte sequence.
The first byte is \201 (indicating the Latin-1 character set), and
the second byte is the actual character.  \202 stands for Latin-2, as
you might guess.

> 1. Cannot I tell to a buffer (after the load of a file) that interpet it
> as binary, and save exactly the same bytes what it did read into the
> buffer (i.e. transparent buffer)?

It's not a good idea.  The buffer contents might already be munged at
that point.

> 2. What is the difference between raw-text, no-conversion, binary? On
> some places, I can choose any of them, on other places not... This whole
> coding system is a nightmare... :(((

The differences are rather subtle, I'm afraid.  I think binary is an
alias for no-conversion.  raw-text does EOL conversion, whereas
no-conversion doesn't.

> 3. Cannot I tell to emacs that interpret the keyboard input as
> "raw"? I have set input-meta to On, convert-meta to Off in .inputrc,
> and if I could tell emacs that "just interpret the bytes from the
> terminal input what they are", then I could copy/paste utf-8 data
> (in raw format) from another application. (I run emacs on linux,
> with the 'putty' terminal on windows).

It does not make sense to do that, IMHO.  For example, M-f would
cease to work because Emacs wouldn't know what characters are
represented by the bytes, and so it wouldn't know which characters
are parts of words.

But it seems your terminal uses utf-8, so you can just teach Emacs
about this: C-x RET k utf-8 RET.
-- 
This line is not blank.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]