Re: [Aspell-user] Problems with Arabic.
From: Mohammed Sameer
Subject: Re: [Aspell-user] Problems with Arabic.
Date: Sun, 5 Mar 2006 11:03:32 +0200
User-agent: Mutt/1.5.11+cvs20060126
On Sat, Mar 04, 2006 at 09:59:18PM -0700, Kevin Atkinson wrote:
>
>
> On Sun, 5 Mar 2006, Mohammed Sameer wrote:
>
> >Hi,
> >
> >I've created a simple wordlist for Arabic.
> >It contains 40,000+ words, just to test aspell with Arabic.
> >
> >Everything looks fine with aspell from the command line, but
> >when using abiword or any other graphical tool to generate the
> >suggestions, I find that the suggestions are Latin letters, not Arabic
> >words.
> >
> >I had to encode the files in ISO 8859-6 as aspell didn't accept UTF-8
> >for the data files. I think this might be the source of the problem but
> >I can't be sure.
> >
> >Now my question is: how can I force the output from libaspell to be
> >UTF-8? I tried setting "data-encoding utf-8" in the ar.dat file but it
> >didn't work.
>
> You can't really "force" Aspell to output UTF-8. Aspell will output
> whatever encoding the application asks it to. If it asks for "utf-8" it
> will get it.
>
AbiWord uses enchant, so I had a look at the enchant source code.
enchant asks libaspell to output UTF-8, but it's not working.
It works with other languages (otherwise people would have complained)
but not with Arabic, which is why I'm a bit lost.
> What encoding is the word list you used to generate the dictionary in?
ISO 8859-6
> It needs to be in the same encoding the "data-encoding" is in. It can be
> ISO 8859-6 and Aspell will still output UTF-8 when an application asks for
> it as Aspell will convert the output to UTF-8.
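To illustrate the symptom: if ISO 8859-6 bytes reached the application unconverted and were read as Latin-1 (an assumption about what might be happening, not a confirmed diagnosis), Arabic letters would show up as accented Latin letters. A minimal Python sketch, using a sample word that is hypothetical and not taken from the word list:

```python
# Sample Arabic word ("salam"); chosen for illustration only.
word = "سلام"

# Bytes as they would be stored in an ISO 8859-6 encoded word list.
raw = word.encode("iso8859_6")

# Misreading those bytes as Latin-1 yields accented Latin letters.
misread = raw.decode("latin-1")

# Correct conversion: decode as ISO 8859-6, then re-encode as UTF-8.
utf8 = raw.decode("iso8859_6").encode("utf-8")

print("misread as Latin-1:", misread)
print("converted to UTF-8:", utf8.decode("utf-8"))
```

Each Arabic letter is one byte in ISO 8859-6 and two bytes in UTF-8, so a missing conversion step is enough to turn Arabic suggestions into Latin-looking ones.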
I've uploaded all the files here: http://www.foolab.org/aspell.tgz
Could you please have a look?
Here's how I generated the ar.rws file:
aspell --lang ar create master ./ar.rws < wordlist
Many thanks,
--
GNU/Linux registered user #224950
Proud Egyptian GNU/Linux User Group <www.eglug.org> Admin.
Life powered by Debian, Homepage: www.foolab.org
--
Don't send me any attachment in Micro$oft (.DOC, .PPT) format please
Read http://www.gnu.org/philosophy/no-word-attachments.html
Preferable attachments: .PDF, .HTML, .TXT
Thanx for adding this text to Your signature