help-gnu-emacs

character encoding question


From: Eric Abrahamsen
Subject: character encoding question
Date: Wed, 20 Feb 2013 14:34:55 +0800
User-agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.2 (gnu/linux)

I'm trying to get a better understanding of character encodings, as I
often have to deal with mis-encoded or mystery-encoded files. I've read
the Non-ASCII Characters section of the Elisp manual and have a fair
sense of what's going on, but a couple of questions remain.

So the character 中 has the code point #o47055 in octal notation
(U+4E2D in hex). Meanwhile:

(string-as-unibyte "中") --> "\344\270\255"

I understand that each of these three escape sequences is one byte, also
written in octal. What's the correspondence between these bytes and the
multibyte character's octal code point? Are there any functions that
convert from one to the other?
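For what it's worth, the correspondence appears to be UTF-8's bit packing. For ordinary Unicode characters like 中, the bytes shown by string-as-unibyte coincide with UTF-8, because Emacs's internal multibyte representation is a superset of UTF-8. A three-byte UTF-8 sequence has the shape 1110xxxx 10xxxxxx 10xxxxxx, so the code point is cut into 4 + 6 + 6 bits. Six bits are exactly two octal digits, so #o47055 splits as 4 | 70 | 55. The three bytes are then #o340 + #o4, #o200 + #o70 and #o200 + #o55, that is \344 \270 \255.

Within Emacs, encode-coding-string and decode-coding-string with the utf-8 coding system convert between the two forms. The sketch below does the same arithmetic by hand. It is written in Python rather than Elisp and skips all validation (overlong forms, surrogates, bad lead bytes), so treat it as an illustration rather than a real decoder.

```python
def utf8_encode(cp: int) -> list[int]:
    """Pack a code point into UTF-8 bytes (illustrative: no validation)."""
    if cp < 0x80:                      # 0xxxxxxx
        return [cp]
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18),         # 11110xxx + three continuation bytes
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


def utf8_decode(bs: list[int]) -> int:
    """Reverse: strip the marker bits and concatenate the payload bits."""
    first = bs[0]
    if first < 0x80:
        return first
    if first >> 5 == 0b110:            # two-byte lead: 5 payload bits
        cp, n = first & 0x1F, 1
    elif first >> 4 == 0b1110:         # three-byte lead: 4 payload bits
        cp, n = first & 0x0F, 2
    else:                              # assume a four-byte lead: 3 payload bits
        cp, n = first & 0x07, 3
    for b in bs[1:1 + n]:              # each continuation byte adds 6 bits
        cp = (cp << 6) | (b & 0x3F)
    return cp


cp = 0o47055                           # 中, U+4E2D
bs = utf8_encode(cp)
print([oct(b) for b in bs])            # ['0o344', '0o270', '0o255']
print(oct(utf8_decode(bs)))            # 0o47055
assert bytes(bs) == "中".encode("utf-8")
```

Reading the continuation bytes in octal makes the packing visible. Each is #o2xx, where the last two octal digits are the six payload bits.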

Second question: If Emacs can't guess the encoding of a file, it gives
you an error message showing the bytes it can't decode, plus the
charsets it tried to use. How do I replicate that process manually?
Given a series of mystery bytes, can I test them against different
charsets and see what gibberish Emacs comes up with? I guess I'm
imagining something like decode-char, except that I could feed it raw
bytes instead of a character code...
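One way to do this by hand is to decode the same bytes with several candidate encodings and compare the results. In Emacs the function for this is decode-coding-string, called with a coding system such as utf-8 or big5. The sketch below shows the idea in Python, where bytes.decode in its default strict mode raises an error on invalid input instead of substituting characters. The candidate list is an arbitrary assumption, and Python codec names are not necessarily the same as Emacs coding-system names.

```python
MYSTERY = b"\xe4\xb8\xad"  # the bytes \344\270\255 from the first question

# Candidate encodings to try; this list is an illustrative assumption.
CANDIDATES = ["utf-8", "gbk", "big5", "shift_jis", "euc-jp", "latin-1"]


def try_decodings(data: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode data with each encoding, recording the text or the error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)  # strict: raise on invalid bytes
        except UnicodeDecodeError as err:
            results[enc] = f"<error: {err.reason} at byte {err.start}>"
    return results


for enc, text in try_decodings(MYSTERY, CANDIDATES).items():
    print(f"{enc:10} {text!r}")
```

Encodings that raise an error can be ruled out. Among those that succeed, the one that produces plausible text is the likely candidate. Single-byte encodings such as latin-1 map every byte to some character and never fail, so a successful decode with them proves nothing.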

Thanks!
Eric



