help-gnu-emacs

replacing characters and whacky trans-buffer conversion


From: ken
Subject: replacing characters and whacky trans-buffer conversion
Date: Tue, 06 Mar 2007 10:15:00 -0500
User-agent: Thunderbird 2.0pre (X11/20070214)

An email comes in with this (emdash) character in it: –

It looks like an em-dash until the text containing it is pasted into an
emacs buffer; then it appears as a series of "garbage characters".
(Copy and paste the emdash into an emacs buffer yourself, and perhaps
you'll see what I mean.)

To me, and possibly to you, this emdash appears in emacs as nine (9)
"garbage" characters.

Because I want to programmatically replace these 9 garbage characters
with something latin1-friendly, I copy-and-paste these nine characters
into an *.el file containing a line like this:

  (replace-string "–" "--" nil (point-min) (point-max))

The sought string (i.e., the first argument above) isn't found, however,
because, for some whacky reason, the emdash pasted into the *.el file
differs-- by one character-- from exactly the same emdash pasted into
the other emacs buffer (the one I'm saving the email in).

In the emacs buffer containing the email, the fourth garbage character
(as shown by C-u C-x=) is:

  character: β (05542, 2914, 0xb62)
    charset: greek-iso8859-7
             (Right-Hand Part of Latin/Greek Alphabet (ISO/IEC 8859-7): ISO-IR-126)
 code point: 98
     syntax: word
   category: g:Greek
buffer code: 0x86 0xE2
  file code: not encodable by coding system undecided-unix
       font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-7

In the *.el buffer, the fourth garbage character (which should be
exactly the same character) is:

  character: â (0342, 226, 0xe2)
    charset: eight-bit-graphic (8-bit graphic char (0xA0..0xFF))
 code point: 226
     syntax: whitespace
   category:
buffer code: 0xE2
  file code: 0xE2 (encoded by coding system raw-text-unix)
       font: -ETL-Fixed-Medium-R-Normal--16-160-72-72-C-80-ISO8859-1
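
To compare the two pastes as a whole rather than one character at a
time, a small helper along these lines might be useful.  It is only a
sketch: `my-region-char-codes' is a made-up name, not an existing emacs
command, and the numbers it reports are emacs's internal character
codes, which may differ between emacs versions.

```elisp
;; Hypothetical helper (not part of emacs): list the character codes
;; of the text between BEG and END.
(defun my-region-char-codes (beg end)
  "Return the list of character codes between BEG and END.
When called interactively, also show the list in the echo area."
  (interactive "r")
  ;; `append' with a trailing nil converts a string into a list of
  ;; its characters, which in Emacs Lisp are integers.
  (let ((codes (append (buffer-substring-no-properties beg end) nil)))
    (when (called-interactively-p 'interactive)
      (message "%S" codes))
    codes))
```

Marking the nine characters in each buffer and running M-x
my-region-char-codes should show which positions differ.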

I tried entering "C-q 5542 RETURN" into the *.el file, but emacs
immediately makes it into the second (â, or 0342) character.  Doing the
same into the other emacs buffer (containing my copy of the email)
*does* enter the good (β, or 05542) character.

All I really want is for the above replace-string function to work as
expected.  But emacs consistently converts that fourth character in the
emdash string into a different character, subsequently causing the
search to fail.  So how do I get the correct "garbage" characters into
the first argument of the replace-string function-- i.e., into the *.el
file?
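
One possible way around this, offered as a sketch that has not been
tried against this particular email: take the sought string from the
email buffer itself, so that it never passes through the *.el file and
its coding system.  `my-replace-region-text' is a hypothetical name.  It
uses the search-forward / replace-match loop that the replace-string
documentation recommends for Lisp code.

```elisp
;; Hypothetical command (not part of emacs): use the marked text as
;; the search string and replace every occurrence in the buffer.
(defun my-replace-region-text (beg end replacement)
  "Replace each occurrence of the text between BEG and END with REPLACEMENT.
The whole buffer is searched, starting from its beginning."
  (interactive "r\nsReplace with: ")
  ;; Capture the sought string before the buffer is modified.
  (let ((needle (buffer-substring-no-properties beg end)))
    (save-excursion
      (goto-char (point-min))
      (while (search-forward needle nil t)
        ;; FIXEDCASE and LITERAL are both t, so REPLACEMENT is
        ;; inserted exactly as given.
        (replace-match replacement t t)))))
```

To use it, mark the nine garbage characters in the email buffer, then
type M-x my-replace-region-text RET -- RET.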


tnx,
ken


-- 
"Genius might be described as a supreme capacity for getting its
possessors into trouble of all kinds."
        -- Samuel Butler



