help-gnu-emacs
Re: Turning HTML character references into something readable?


From: Karl Eichwalder
Subject: Re: Turning HTML character references into something readable?
Date: Sun, 27 Apr 2003 21:09:39 +0200
User-agent: Gnus/5.09002 (Oort Gnus v0.20) Emacs/21.3.50 (gnu/linux)

Benjamin Riefenstahl <Benjamin.Riefenstahl@epost.de> writes:

> Actually that literal seems to be in some JIS encoding on my side,
> while &#1071; indicates Unicode.

Gnus decided to turn it into JIS; initially it was Unicode/UTF-8.

>   (char-to-string (decode-char 'ucs 1071))

Yes, this is a good hint!
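
The hint can be wrapped into a small helper that handles a whole string.
This is only a sketch: the function name is made up for illustration, and
it assumes that only decimal references of the form &#NNNN; need to be
handled (hexadecimal references such as &#x42F; are left alone).

```elisp
;; Hypothetical helper; the name is not part of Emacs or Gnus.
(defun my-ncr-to-string (text)
  "Return TEXT with decimal references such as &#1071; turned into characters."
  (replace-regexp-in-string
   "&#[0-9]+;"
   (lambda (match)
     ;; MATCH is the whole reference, e.g. "&#1071;".
     ;; Strip the leading "&#" and the trailing ";" to get the number.
     (let ((ch (decode-char 'ucs (string-to-number (substring match 2 -1)))))
       ;; Keep the reference unchanged if the number is not a valid character.
       (if ch (char-to-string ch) match)))
   text t t))
```

Evaluating (my-ncr-to-string "&#1071;") should then give the same one-character
string as the char-to-string form quoted above.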

> If you want to get this into an interactive command, you'd need some
> more coding.  Or maybe PSGML or some other SGML/HTML/XML mode may have
> that functionality already.

In this case I cannot use PSGML because de.wikipedia.org is based on a
free-style markup language...

>> On the command line recode can do the trick:
>> 
>>    echo "&#1071;" | recode html..utf-8
>
> You can use shell-command-on-region (M-|) to use "recode html..utf-8"
> directly.

I completely forgot about this possibility.  But it now turns out that
"recode html..utf-8" is too ambitious: if the file already contains
umlaut characters, they will be encoded twice:

    echo "Danke schön &#1070;&#1071;" | recode html..utf-8
    Danke schön ��

I must find a way to tell recode to leave "Danke schön" untouched.
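
One way to sidestep recode entirely might be to do the replacement inside
Emacs, so that text which is already decoded in the buffer is never touched.
The following is a minimal sketch, not a tested solution for every case: the
command name is hypothetical, and it assumes that only decimal references
(&#NNNN;) occur in the region.

```elisp
;; Hypothetical command; the name is made up for this sketch.
(defun my-decode-ncr-region (start end)
  "Replace decimal character references between START and END by characters.
All other text in the region, such as existing umlauts, is left as it is."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (while (re-search-forward "&#\\([0-9]+\\);" nil t)
        (let ((ch (decode-char 'ucs (string-to-number (match-string 1)))))
          ;; Skip references whose number is not a valid character.
          (when ch
            (replace-match (char-to-string ch) t t)))))))
```

After marking the region, M-x my-decode-ncr-region would rewrite only the
references; "Danke schön" is never matched by the regexp and so stays
untouched.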

-- 
                                                         |      ,__o
http://www.gnu.franken.de/ke/                            |    _-\_<,
ke@suse.de (work) / keichwa@gmx.net (home)               |   (*)/'(*)

