[Bug-ocrad] Re: Request about adding more characters
From: Antonio Diaz Diaz
Subject: [Bug-ocrad] Re: Request about adding more characters
Date: Mon, 10 Jan 2005 12:40:57 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20040913
Hello Donald. Thanks for your interest in Ocrad.
Donald Rogers wrote:
> I have recently started using ocrad for OCR of English texts. I am
> impressed with it - partly because it handles UTF-8 text. IMHO any
> OCR program that does not handle Unicode characters is useless.
Well, my life would be a lot easier with an 8-bit charset, but people
can't stop inventing letters. ;-)
> I would like to use ocrad for OCR of Esperanto texts. What is
> involved with adding the recognition of extra characters to ocrad?
A lot of work. I have already hacked ocrad too much while adding new
characters, and I have to rewrite some things to add support for a new
charset (iso-8859-3 in this case). (Off-topic note: given that Zamenhof
invented Esperanto, why did he choose accented letters instead of, say,
the plain Latin alphabet?)
> I noticed in the ocrad source code that there are already some
> characters with breves and some with circumflexes.
Yes, some from iso-8859-15 and some from iso-8859-9.
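To illustrate why a further charset is needed, here is a minimal sketch in Python (not ocrad code; the names ESPERANTO_LETTERS, encodable and byte_table are hypothetical and exist only for this example). It assumes Python's standard codecs for iso-8859-3, iso-8859-9 and iso-8859-15, and checks which of them can represent the accented Esperanto letters.

```python
# The twelve accented Esperanto letters, written as Unicode escapes:
# C, G, H, J and S with circumflex, and U with breve, in both cases.
ESPERANTO_LETTERS = (
    "\u0108\u0109"  # C circumflex
    "\u011c\u011d"  # G circumflex
    "\u0124\u0125"  # H circumflex
    "\u0134\u0135"  # J circumflex
    "\u015c\u015d"  # S circumflex
    "\u016c\u016d"  # U breve
)


def encodable(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def byte_table(text, charset):
    """Map each character of text to its single byte value in charset."""
    return {ch: ch.encode(charset)[0] for ch in text}


if __name__ == "__main__":
    for charset in ("iso8859_3", "iso8859_9", "iso8859_15"):
        print(charset, encodable(ESPERANTO_LETTERS, charset))
```

Under these assumptions only iso-8859-3 covers all twelve letters, which is consistent with the need for a new charset rather than reuse of the existing ones.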
> I could also send a file or two of scanned Esperanto text in say PBM
> format, with the 12 letters:
Please send them. I will try to add Esperanto support in version 0.12
of ocrad.
Regards,
Antonio Diaz.