Re: Turning HTML character references into something readable?
From: Karl Eichwalder
Subject: Re: Turning HTML character references into something readable?
Date: Sun, 27 Apr 2003 21:09:39 +0200
User-agent: Gnus/5.09002 (Oort Gnus v0.20) Emacs/21.3.50 (gnu/linux)
Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:
> Actually that literal seems to be in some JIS encoding on my side,
> while &#1071; indicates Unicode.
Gnus decided to turn it into JIS; initially it was Unicode/UTF-8.
> (char-to-string (decode-char 'ucs 1071))
Yes, this is a good hint!
> If you want to get this into an interactive command, you'd need some
> more coding. Or maybe PSGML or some other SGML/HTML/XML mode may have
> that functionality already.
In this case I cannot use PSGML because de.wikipedia.org is based on a
free style markup language...
>> On the command line recode can do the trick:
>>
>> echo "&#1071;" | recode html..utf-8
>
> You can use shell-command-on-region (M-|) to use "recode html..utf-8"
> directly.
I had completely forgotten about this possibility. But it now turns out
that "recode html..utf-8" is too ambitious: if the file already contains
umlaut characters, they will be encoded twice:
echo "Danke schön &#1070;&#1071;" | recode html..utf-8
Danke schÃ¶n ЮЯ
I must find a way to tell recode to leave "Danke schön" untouched.
--
| ,__o
http://www.gnu.franken.de/ke/ | _-\_<,
ke@suse.de (work) / keichwa@gmx.net (home) | (*)/'(*)