help-gnu-emacs

Re: Single unrecognized character wrecks entire display


From: Peter Dyballa
Subject: Re: Single unrecognized character wrecks entire display
Date: Wed, 22 Aug 2012 17:18:36 +0200

On 22.08.2012 at 11:36, Alexandre Oberlin wrote:

> The problem is that it names a full list as bad characters, when only one.
>   utf-8-mac cannot encode these: \351 \350 \351 \342 \351 \234 \350 \350
> \311 \240
> How can I spot the true non utf-8 without trying them all?

With read-quoted-char-radix set to 8 (its default value) you can search for 
these "characters" in the text by typing:

        C-s C-q 3 5 1 RET

Hopefully! I think the problem is that your converter makes mistakes (can't you 
use something reliable like iconv or recode?). \240, or A0 in hex, is valid only 
as the partner of another byte: after C2 it forms NO-BREAK SPACE, after C3 it 
forms LATIN SMALL LETTER A WITH GRAVE, … Likewise \234, or 9C, forms LATIN 
CAPITAL LETTER U WITH DIAERESIS after C3, and so on. What GNU Emacs wants to 
tell you, and what I did not understand the first time, is that some characters 
are evidently not encoded correctly, so these "isolated" *bytes* are left over. 
They don't fit into the regular 2-, 3- or 4-byte sequences of the UTF-8 
encoding, and of course none of them is an ASCII character, which is encoded by 
a single byte (i.e., itself).
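
To make the byte pairs above concrete, here is a minimal sketch in Python (not 
something from the original report, only an illustration). It decodes the three 
two-byte sequences mentioned above and shows that \240 on its own is rejected. 
The helper name is_valid_utf8 is made up for this example.

```python
import unicodedata

# The two-byte sequences discussed above: C2 A0, C3 A0 and C3 9C.
pairs = [b"\xc2\xa0", b"\xc3\xa0", b"\xc3\x9c"]

# Decode each pair as UTF-8 and look up the Unicode name of the result.
names = [unicodedata.name(p.decode("utf-8")) for p in pairs]
for p, n in zip(pairs, names):
    print(p.hex(), "->", n)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# \240 (hex A0) without a preceding lead byte is only a stray continuation byte.
lone_ok = is_valid_utf8(b"\xa0")
print(r"lone \240 valid:", lone_ok)
```

The same check fails for a lone \351 (hex E9), which is how "é" looks in 
Latin-1 but not in UTF-8.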

The utf-8-mac encoding in GNU Emacs is UTF-8 that uses ^M or CR as end of line 
character (UNIX uses ^J or Line Feed).

Can you give us some more details about the original source and the converter, 
and how it works (command line options)? How do you open the file in GNU Emacs? 
How does it behave when you launch GNU Emacs with -Q, i.e., without any of 
your possibly faulty customisations? For example, on the command line:

        env LC_CTYPE=UTF-8 LANG=fr_FR.UTF-8 emacs -Q &

or

        env LC_CTYPE=UTF-8 LANG=fr_FR.UTF-8 /Applications/Emacs.app/Contents/MacOS/Emacs -Q &

GNU Emacs should then automatically switch to some UTF-8 encoding; whether the 
line endings are Apple, UNIX or MS should not matter much. If the input is 
faulty, you should see searchable octal codes.
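
If searching inside Emacs turns out to be tedious, the stray bytes can also be 
located outside of it. The following is only a rough sketch in Python, assuming 
the file is meant to be UTF-8 throughout. The function name find_invalid_utf8 
and the sample bytes are invented for illustration. It reports each offending 
byte with its offset and the same octal notation Emacs uses.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, octal) for every byte that breaks UTF-8 decoding."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means no further bad bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add pos to get the offset.
            offset = pos + err.start
            bad.append((offset, "\\%03o" % data[offset]))
            # Skip the offending byte and keep scanning after it.
            pos = offset + 1
    return bad


# Invented sample: a Latin-1 "é" (\351) and a lone \240 mixed into valid UTF-8.
sample = b"caf\xe9 d\xc3\xa9j\xa0"
for offset, octal in find_invalid_utf8(sample):
    print(offset, octal)
# prints:
# 3 \351
# 9 \240
```

For a real file, read it in binary mode (open(path, "rb").read(), with path 
standing for your file) and pass the bytes to the function. Each decoding error 
rescans the rest of the data, so this is fine for a document but slow for very 
large, badly damaged input.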

--
Greetings

  Pete

A lot of us are working harder than we want, at things we don't like to do. 
Why? ...In order to afford the sort of existence we don't care to live.
                                – Bradford Angier



