help-gnu-emacs

RE: Automatic recognition of some specific coding systems


From: Jürgen Hartmann
Subject: RE: Automatic recognition of some specific coding systems
Date: Fri, 27 Feb 2015 13:12:46 +0100

Thank you, Yuri Khan, for widening the perspective:

> The general problem you’re solving is that of encoding detection.
> There exist ready-made solutions for that, e.g. by computing byte
> frequencies and matching them against known character frequencies in
> your language. One of these is called enca.
>
> Googling for “emacs enca” yields a post by Dmitriyi Paduchikh in
> gnu.emacs.sources, dated 2007.
>
> https://lists.gnu.org/archive/html/gnu-emacs-sources/2007-06/msg00037.html

Using Google is always good advice, which I will gratefully follow
once more with respect to this broader background.

Actually, I didn't know Enca at all until now. A language-based attempt
to recognize the encoding is an interesting idea.

Unfortunately, Enca cannot be used in my special case, because--I
didn't mention this before, sorry--the text files to handle are mostly
in English and German. For the former, encoding is not an issue, and
for the latter, Enca does not support German.

Enca 1.14, for example, supports only

   Belarusian
   Bulgarian
   Czech
   Estonian
   Croatian
   Hungarian
   Lithuanian
   Latvian
   Polish
   Russian
   Slovak
   Slovene
   Ukrainian
   Chinese

But for people who use any of these languages, this might be a
promising option.
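
For German, the byte-frequency idea can at least be sketched by hand.
The following Python snippet is only an illustration under my own
assumptions and is not something Enca provides: the function name
guess_german_encoding and the candidate list (cp850 and latin-1) are
made up for this example. It counts how many high bytes coincide with
the umlauts and the sharp s in each candidate code page.

```python
# Hypothetical helper: guess the encoding of mostly-German text by
# counting bytes that would be umlauts or sharp s in each candidate.
GERMAN_LETTERS = "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df"  # a/o/u umlauts, sharp s
CANDIDATES = ("cp850", "latin-1")  # assumption: only these two are expected


def guess_german_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', one of CANDIDATES, or 'unknown-8bit'."""
    # Pure ASCII and valid UTF-8 can be recognized without any statistics.
    for strict in ("ascii", "utf-8"):
        try:
            data.decode(strict)
            return strict
        except UnicodeDecodeError:
            pass

    # Score each candidate by the number of high bytes that map to
    # German special letters in that code page.
    high_bytes = [b for b in data if b >= 0x80]
    scores = {}
    for enc in CANDIDATES:
        expected = set(GERMAN_LETTERS.encode(enc))
        scores[enc] = sum(1 for b in high_bytes if b in expected)

    best = max(scores, key=scores.get)
    ties = sum(1 for s in scores.values() if s == scores[best])
    if scores[best] == 0 or ties > 1:
        return "unknown-8bit"  # no evidence, or no clear winner
    return best
```

Such a count says nothing about texts that contain no German special
letters, and it would have to be extended for any further code pages.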

Apart from that--and this might also be helpful in my case--the idea
of using external software to detect the encoding is very appealing,
and maybe the Lisp snippets in your link can be adapted to other
programs. E.g.

   file -bi ...

is able to identify file encodings, although it reports cp850 only
non-specifically as "unknown-8bit".
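
How such an external program could be queried is sketched below in
Python. This is only a rough illustration, assuming that the tool
prints a MIME line such as "text/plain; charset=iso-8859-1", as file(1)
does on GNU/Linux when called with -b and -i. The function names
parse_charset and detect_charset are hypothetical.

```python
import re
import subprocess
from typing import Optional


def parse_charset(mime_line: str) -> Optional[str]:
    """Extract the charset value from a MIME line, or None if absent."""
    match = re.search(r"charset=([^\s;]+)", mime_line)
    return match.group(1) if match else None


def detect_charset(path: str) -> Optional[str]:
    """Ask the external 'file' program for the charset of PATH.

    Assumes a 'file' binary that accepts -b (brief) and -i (MIME output).
    """
    result = subprocess.run(
        ["file", "-bi", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_charset(result.stdout)
```

A caller would still have to decide what to do with an answer like
"unknown-8bit", for instance by falling back to a fixed default.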

So thank you very much for your suggestions.

Juergen

                                          
