help-gnu-emacs

Re: making "玄奘" say "Xuanzang" in chinese


From: Joe Corneli
Subject: Re: making "玄奘" say "Xuanzang" in chinese
Date: Thu, 24 Mar 2005 19:07:40 -0600

   Joe Corneli <jcorneli@math.utexas.edu> writes:
   > Adapted from w3m-filter.el:
   >
   > (while (re-search-forward "&#\\([0-9]+\\);" nil t)
   >   (setq ucs (string-to-number (match-string 1)))
   >   (delete-region (match-beginning 0) (match-end 0))
   >   (insert-char ucs 1))
   >
   > This would appear to work if the characters themselves were recognized...
   >
   > But when I run this expression on a buffer containing the string
   > "&#29572;&#22872;" what I get is an error, like this:

   Is that really what w3m does?

Hm... well I did doctor it up a bit.  In particular, I took out some
code that wrapped `ucs' in the last line with the function defined by:

 (defun w3m-ucs-to-char (codepoint)
   (or (decode-char 'ucs codepoint) ?~))

But keeping the function around wasn't helping either.  Except, when I
tried it again, it worked, so I must have gotten something wrong.
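
For reference, here is a minimal sketch of the loop with the fallback
wrapper put back in.  The names `my-ucs-to-char' and
`my-decode-decimal-refs' are made up for illustration (they are not
the w3m names), and this only handles decimal references of the form
"&#29572;".

```elisp
;; Hypothetical helper, modeled on the w3m-ucs-to-char wrapper quoted above.
(defun my-ucs-to-char (codepoint)
  "Return the character for CODEPOINT, or ?~ if it cannot be decoded."
  (or (decode-char 'ucs codepoint) ?~))

;; Hypothetical command: decode decimal references in the current buffer.
(defun my-decode-decimal-refs ()
  "Replace decimal references like &#29572; with the characters they name."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "&#\\([0-9]+\\);" nil t)
      ;; Bind the code point locally instead of using a global `setq'.
      (let ((ucs (string-to-number (match-string 1))))
        ;; Replace the whole match literally, without case conversion.
        (replace-match (string (my-ucs-to-char ucs)) t t)))))
```

Binding `ucs' with `let' keeps the variable from leaking into the
global namespace, and `replace-match' does the delete-and-insert in
one step.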

This code seems a little more readable than the code you
supplied...  but they seem to have the same effect.

Anyway, your advice got me past whatever I was stumbling over.

Can you suggest something that will work on this content from the
gnu.org homepage?  Neither the w3m code nor your code seems to produce
human-readable output on this stuff (maybe I'm missing some fonts or
something?).  I get a bunch of control-at characters (oh yeah, that is
after modifying the "[0-9]" in the regexp to be ".....").

  [ Az@rbaycanca | Bahasa Indonesia | Bosanski | Catal`
  | &#x7b80;&#x4f53;&#x4e2d;&#x6587; |
  &#x7e41;&#x9ad4;&#x4e2d;&#x6587; | Cesky | Dansk |
  Deutsch | English | Ellynika' | Espaqol | Frangais
  | Hrvatski | Italiano | E+B+R+J+T+ |
  &#x65e5;&#x672c;&#x8a9e; | &#xd55c;&#xad6d;&#xc5b4; |
  Magyar | Nederlands | Norsk | Polski | Portugujs |
  Rombna | Russkij | Srpski | Shqip | Suomi |
  Svenska | Tagalog |
  &#x0e20;&#x0e32;&#x0e29;&#x0e32;&#x0e44;&#x0e17;&#x0e22; |
  T|rkge | Tie>'ng Vie>-.t | Ukrayins'ka ]
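
A possible explanation for the control-at characters: the references
above are hexadecimal ("&#x7b80;"), and `string-to-number' read in
base 10 returns 0 for a string like "x7b80", so character 0 gets
inserted and shows up as ^@.  Below is a rough sketch, not the w3m
code, that accepts both decimal and hex references.  The name
`my-decode-char-refs' is made up for illustration, and it works on a
string rather than on the buffer.

```elisp
;; Hypothetical helper: decode both &#NNN; and &#xHHH; references.
(defun my-decode-char-refs (string)
  "Return STRING with numeric character references replaced by characters."
  (replace-regexp-in-string
   "&#\\([xX][0-9a-fA-F]+\\|[0-9]+\\);"
   (lambda (ref)
     ;; REF is the whole match, e.g. "&#x7b80;"; strip "&#" and ";".
     (let* ((body (substring ref 2 -1))
            (hex (memq (aref body 0) '(?x ?X)))
            (code (if hex
                      (string-to-number (substring body 1) 16)
                    (string-to-number body 10))))
       ;; Fall back to ?~ when the code point cannot be decoded.
       (string (or (decode-char 'ucs code) ?~))))
   string t t))
```

Something like (my-decode-char-refs "&#x65e5;&#x672c;&#x8a9e;") should
then return the three characters U+65E5, U+672C and U+8A9E.  Whether
they display properly still depends on having a font that covers them.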
 
 



