[Bug-ocrad] Request about adding more characters


From: Donald Rogers
Subject: [Bug-ocrad] Request about adding more characters
Date: Mon, 10 Jan 2005 08:32:37 +1300
User-agent: Mozilla Thunderbird 0.8 (X11/20041020)

I have recently started using ocrad for OCR of English texts. I am impressed with it, partly because it handles UTF-8 text. IMHO any OCR program that does not handle Unicode characters is useless. I would like to use ocrad for OCR of Esperanto texts. What is involved in adding recognition of extra characters to ocrad? I have looked up the Unicode values of all the accented Esperanto letters, and here they are in the format used in the file ucs.h:

Unicode characters for Esperanto:
CCCIRCU = 0x0108, // latin capital letter c with circumflex
SCCIRCU = 0x0109, // latin small letter c with circumflex
CGCIRCU = 0x011C, // latin capital letter g with circumflex
SGCIRCU = 0x011D, // latin small letter g with circumflex
CHCIRCU = 0x0124, // latin capital letter h with circumflex
SHCIRCU = 0x0125, // latin small letter h with circumflex
CJCIRCU = 0x0134, // latin capital letter j with circumflex
SJCIRCU = 0x0135, // latin small letter j with circumflex
CSCIRCU = 0x015C, // latin capital letter s with circumflex
SSCIRCU = 0x015D, // latin small letter s with circumflex
CUBREVE = 0x016C, // latin capital letter u with breve
SUBREVE = 0x016D, // latin small letter u with breve

I noticed in the ocrad source code that there are already some characters with breves and some with circumflexes. Would it be a big job for you to add the extra 12 characters? The Esperanto letters are also in ISO-8859-3, and I can send you a list of their codes in that set too if you wish. I could also send a file or two of scanned Esperanto text in, say, PBM format, with the 12 letters: ĈĜĤĴŜŬ ĉĝĥĵŝŭ.
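For anyone who wants to double-check the values, here is a small sketch (Python, not part of ocrad) that derives each letter's Unicode code point, official Unicode name, and ISO-8859-3 byte directly from the 12 letters. The names `ESPERANTO_LETTERS`, `describe` and `table` are made up for this illustration, and the printed layout only imitates the list style above; it is not claimed to match the real contents of ucs.h.

```python
import unicodedata

# The 12 accented Esperanto letters, copied from the message.
# NFC normalization guards against the letters having been pasted
# as base letter + combining accent instead of single code points.
ESPERANTO_LETTERS = unicodedata.normalize("NFC", "ĈĜĤĴŜŬĉĝĥĵŝŭ")


def describe(letter):
    """Return (code point, Unicode name, ISO-8859-3 byte) for one letter."""
    code_point = ord(letter)
    name = unicodedata.name(letter)
    # Python's standard "iso8859_3" codec; raises UnicodeEncodeError
    # if the letter is not present in that character set.
    latin3_byte = letter.encode("iso8859_3")[0]
    return code_point, name, latin3_byte


def table(letters=ESPERANTO_LETTERS):
    """Build one row per letter, in the order given."""
    return [describe(ch) for ch in letters]


if __name__ == "__main__":
    for code_point, name, latin3_byte in table():
        # Columns: Unicode value, ISO-8859-3 value, description.
        print(f"0x{code_point:04X}  0x{latin3_byte:02X}  // {name.lower()}")
```

Running it prints one line per letter, so any hand-copied value can be compared against what the Unicode database reports.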

Donald Rogers
New Zealand




