help-gnu-emacs

Re: "Unidecode" functionality in Emacs


From: John Mastro
Subject: Re: "Unidecode" functionality in Emacs
Date: Tue, 20 Mar 2018 10:23:03 -0700

Eli Zaretskii <eliz@gnu.org> wrote:
>> There are "Unidecode" packages for Perl[1], Python[2], and Emacs[3]
>> (derived from one another in that order). They each transliterate
>> Unicode text to ASCII, e.g.:
>>
>>     (unidecode "Déjà vu")
>>     ;=> "Deja vu"
>>     (unidecode "北亰")
>>     ;=> "Bei Jing "
>>
>> Does Emacs have equivalent functionality built-in?
>
> It's possible to remove accents (the first example) using the
> functionality in ucs-normalize.el.  Some transliteration is possible
> for scripts for which there exists a "transliteration" input method,
> using the code by Michael Welsh Duggan posted here:
>
>   http://lists.gnu.org/archive/html/emacs-devel/2018-02/msg00387.html
>
> For example, you can transliterate Cyrillic text using the
> cyrillic-translit input method that comes with Emacs.  But there are
> no general-purpose transliteration capabilities in Emacs, AFAIK.

Thanks, I'll take a look at those.
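
For the accent-removal case, here is a minimal sketch of what I understand
the ucs-normalize.el approach to be: decompose to NFD, then drop the
nonspacing combining marks. The name `my-strip-accents' is made up for
illustration, and this only handles characters that decompose into a base
letter plus marks; it does nothing for text like "北亰".

```elisp
(require 'ucs-normalize)
(require 'seq)

(defun my-strip-accents (string)
  "Return STRING with combining marks removed.
Hypothetical helper: decomposes STRING to NFD, then drops every
character whose Unicode general category is Mn (nonspacing mark)."
  (concat
   (seq-remove (lambda (ch)
                 (eq (get-char-code-property ch 'general-category) 'Mn))
               (ucs-normalize-NFD-string string))))

(my-strip-accents "Déjà vu")
;=> "Deja vu"
```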

> However, it looks like the Perl package is just a huge database of
> precomputed transliterations, in which case doing the same in Emacs
> Lisp should be almost trivial.

Yep, that's how the Emacs package works too. It boils down to 25 lines
of Lisp[1] plus the database[2].
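
To give an idea of the table-lookup technique, here is a rough sketch. It is
not the code from the package linked in [1]; the names `my-translit-table'
and `my-translit' are hypothetical, and the four-entry hash table is a toy
stand-in for the real database, seeded from the examples earlier in this
thread.

```elisp
(defvar my-translit-table
  (let ((table (make-hash-table :test 'eql)))
    ;; Toy entries only; a real database covers far more characters.
    (puthash ?é "e" table)
    (puthash ?à "a" table)
    (puthash ?北 "Bei " table)
    (puthash ?亰 "Jing " table)
    table)
  "Hypothetical map from a non-ASCII character to its ASCII replacement.")

(defun my-translit (string)
  "Return STRING with each non-ASCII character replaced via lookup.
ASCII characters pass through unchanged; characters missing from
`my-translit-table' are dropped."
  (mapconcat (lambda (ch)
               (if (< ch 128)
                   (char-to-string ch)
                 (gethash ch my-translit-table "")))
             string ""))

(my-translit "Déjà vu")
;=> "Deja vu"
(my-translit "北亰")
;=> "Bei Jing "
```

Dropping unknown characters is just one possible choice for the sketch;
substituting a placeholder such as "?" would work equally well.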

Thanks

        John

[1]: https://github.com/sindikat/unidecode/blob/5502ada9287b4012eabb879f12f5b0a9df52c5b7/unidecode.el#L56-L82
[2]: https://github.com/sindikat/unidecode/tree/5502ada9287b4012eabb879f12f5b0a9df52c5b7/data


