help-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: those funny non-ASCII characters


From: Buchs, Kevin
Subject: Re: those funny non-ASCII characters
Date: Fri, 25 May 2012 08:40:25 -0500

Thanks, Xah and Eli, for contributing to my further understanding. I
went to a specific website where I got the content I copied and pasted
and I can see from the HTML that it has a charset=UTF-8, so I understand
that is Unicode 8-bit. Using the C-u C-x =, I see that the particular
character I pasted has a code point of 0x2013 (U+2013). I didn't see,
however, what the UTF-8 encoding of that code point was. Should I be
able to read that somewhere on the buffer of information I get with C-u
C-x = ? I was poking around the www.unicode.org website, trying to
understand how this U+2013 code point is encoded into UTF-8, but I
haven't determined that yet.

A fresh buffer in emacs for me on my Win-7 box has an encoding system of
iso-latin-1-dos. The coding system used to open and save files is the
same.

So, help me piece together what happens as I paste the UTF-8 text into a
buffer. First, the paste buffer must define that it is in UTF-8. Emacs
reads this information and inserts it into the byte string that defines
the buffer. Now, how does emacs record that it was a UTF-8 encoded
character? Does it translate it into a different internal encoding
instead of just recording the 8 bits transferred? Is this encoding used
as a superset of all possible encoding systems that emacs supports?

Now,  Xah, you suggest I embrace Unicode. What does that mean? Would it
involve marking all my lisp library files and my org-mode files with the
file variable -*- coding: utf-8 -*- ? Or is there another way to go
Unicode automatically? 

I assume that if my lisp library files are encoded utf-8, then I can
paste that character from the web page into my call to replace-string in
order to substitute the longer dash of Unicode U+2013 with an ascii
hyphen or double hyphen. But, how does that really work? If the lisp
file is encoded utf-8, then how can I put an ascii character in the
replacement string?

I would appreciate it if someone could help me open this new door in my
brain a bit further.

Kevin Buchs | Senior Engineer | SPPDG | 507-538-5459 |
buchs.kevin@mayo.edu
Mayo Clinic | 200 First Street SW | Rochester, MN 55905 |
http://www.mayo.edu/sppdg 

-----Original Message-----
With cursor on that character, type "C-u C-x =", and Emacs will show
everything it knows about that character, including its canonical
name.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]