character encoding question

help-gnu-emacs

From:	Eric Abrahamsen
Subject:	character encoding question
Date:	Wed, 20 Feb 2013 14:34:55 +0800
User-agent:	Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.2 (gnu/linux)

I'm trying to get a better understanding of character encodings, as I
often have to deal with mis-encoded or mystery-encoded files. I've read
the Non-ASCII Characters section of the elisp manual, and have a fair
sense of what's going on, with a couple of remaining questions.

So the character 中 has a codepoint of #o47055 in octal notation.
Meanwhile:

(string-as-unibyte "中") --> \344\270\255

I understand that each of these three sections is a byte, also in octal.
What's the correspondence between these bytes and the multibyte
character's octal codepoint? Are there any functions that will get from
one to the other?
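
A minimal sketch of the correspondence, assuming the three bytes are the
UTF-8 encoding of the character (the usual case when the internal
representation is dumped as bytes). A 3-byte UTF-8 sequence has the bit
layout 1110xxxx 10xxxxxx 10xxxxxx, and the x bits, concatenated, are the
codepoint. The helper name `my-utf8-3-bytes-to-codepoint` is hypothetical;
`encode-coding-string`, `decode-coding-string` and `string-to-char` are
standard Emacs functions.

```elisp
;; Hypothetical helper: rebuild a codepoint from a 3-byte UTF-8 sequence.
;; Assumes B1 is a 1110xxxx lead byte and B2, B3 are 10xxxxxx
;; continuation bytes; no validation is done.
(defun my-utf8-3-bytes-to-codepoint (b1 b2 b3)
  "Return the codepoint encoded by the 3-byte UTF-8 sequence B1 B2 B3."
  (logior (ash (logand b1 #x0F) 12)   ; low 4 bits of the lead byte
          (ash (logand b2 #x3F) 6)    ; low 6 bits of the 2nd byte
          (logand b3 #x3F)))          ; low 6 bits of the 3rd byte

(format "#o%o" (my-utf8-3-bytes-to-codepoint #o344 #o270 #o255))
;; => "#o47055"

;; The coding-system functions go in both directions without manual
;; bit twiddling.
(encode-coding-string "中" 'utf-8)            ; => "\344\270\255"
(decode-coding-string "\344\270\255" 'utf-8)  ; => "中"
(format "#o%o" (string-to-char "中"))          ; => "#o47055"
```

In other words: #o344 contributes binary 0100, #o270 contributes 111000,
and #o255 contributes 101101, which together give 0100111000101101, that is
#o47055.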

Second question: If Emacs can't guess the encoding of a file, it gives
you an error message showing the bytes it can't decode, plus the
charsets it tried to use. How do I replicate that process manually?
Given a series of mystery bytes, can I test them against different
charsets, and see what gibberish Emacs comes up with? I guess I'm
imagining something like "decode-char", except being able to feed it
bytes instead of a character...
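
One possible way to experiment, sketched under the assumption that the
mystery bytes are available as a unibyte string: run them through
`decode-coding-string` once per candidate coding system and compare the
results. The helper name `my-try-decodings` and the list of coding systems
are made up for illustration; `decode-coding-string` and
`detect-coding-string` are standard Emacs functions.

```elisp
;; Hypothetical helper: decode the same bytes under several coding
;; systems so the results can be compared side by side.
(defun my-try-decodings (bytes coding-systems)
  "Decode unibyte string BYTES with each of CODING-SYSTEMS.
Return an alist of (CODING-SYSTEM . DECODED-STRING)."
  (mapcar (lambda (cs)
            (cons cs (decode-coding-string bytes cs)))
          coding-systems))

;; Try a few guesses on the bytes from the first question.
(my-try-decodings "\344\270\255" '(utf-8 latin-1 gbk big5))

;; Ask Emacs which coding systems it considers plausible for the bytes.
(detect-coding-string "\344\270\255")
```

Evaluating the `my-try-decodings` form with C-x C-e in a scratch buffer
shows each guess in the echo area. Bytes that a coding system cannot
decode generally show up as raw octal escapes in the result.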

Thanks!
Eric
