aspell-user

Re: [Aspell-user] aspell_speller_add_to_personal() doesn't accept hyphen


From: Michael Bunk
Subject: Re: [Aspell-user] aspell_speller_add_to_personal() doesn't accept hyphens
Date: Mon, 15 Aug 2005 08:36:05 +0530
User-agent: KMail/1.5.4

Hi Gary,

Thanks for your reply, though you didn't make it too easy for me :)

> Why are you using such an early version? I had issues with that
> aspect of aspell some time ago. I thought they were resolved.

I have tried 0.60.3 now, but the behaviour is the same.

> You can see what I did in this project:
> http://sourceforge.net/projects/descdatadiary/
> It is a windows project. It does not try to do a Unix install,
> but you may find what I did in speller_impl.cpp interesting.

I have seen that you modified
aspell-0.60.1-win32/modules/speller/default/speller_impl.cpp
by implementing 3 new functions:

int aspell_speller_word_seperator_length(speller, char *)
It returns the number of bytes up to the next word character, i.e. it skips
all characters for which the aspell-internal function lang_->is_alpha() is
false.

int aspell_speller_word_length(speller, char *)
It returns the number of bytes up to the next non-word character, also using
lang_->is_alpha().

aspell_speller_add_lower_to_personal()
This adds a lowercased version of the given string to the personal word list. 
I guess you implemented this for capitalized words at sentence starts?
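
To check my understanding of the two scanning functions, here is a minimal
sketch in plain C. It is not the code from speller_impl.cpp: the names
is_word_char(), word_length() and separator_length() are made up for
illustration, and the ASCII-only isalpha() merely stands in for aspell's
language-aware lang_->is_alpha().

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for aspell's internal lang_->is_alpha().
   The real function is language-aware; this one is ASCII-only. */
static int is_word_char(unsigned char c)
{
    return isalpha(c) != 0;
}

/* Number of bytes up to the next non-word character (or end of string). */
static size_t word_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && is_word_char((unsigned char)s[n]))
        n++;
    return n;
}

/* Number of bytes up to the next word character (or end of string). */
static size_t separator_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && !is_word_char((unsigned char)s[n]))
        n++;
    return n;
}
```

With this stand-in, word_length("well-known") is 4, so a tokenizer built this
way cuts the word at the hyphen.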

The problem I see with this approach is that you modified aspell's internals.
Since I want to use aspell as an unmodified library, such modifications are
ruled out for me.

While looking through the code I found that aspell implements a Tokenizer
class, which seems to be designed for the same purpose. It is not exported,
but it is used by the DocumentChecker class. Maybe I should try to use
DocumentChecker instead?

But its documentation in aspell.h is confusing (besides being misspelled :):

/* process a string
 * The string passed in should only be split on white space
 * characters.  Furthermore, between calles to reset, each string
 * should be passed in exactly once and in the order they appeared
 * in the document.  Passing in stings out of order, skipping
 * strings or passing them in more than once may lead to undefined
 * results. */
void aspell_document_checker_process(struct AspellDocumentChecker * ths, const char * str, int size)

Does it mean I have to split my string to be checked at white space before 
passing in the pieces to this function? Or does it mean that this function 
usually only splits at white space?
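
To make the first reading concrete, here is a rough sketch of what "splitting
at white space before passing in the pieces" would look like on my side. The
names piece_fn, for_each_piece() and count_bytes() are hypothetical helpers,
not part of aspell; in a real program the callback would forward each piece to
aspell_document_checker_process(). Whether this splitting is needed at all is
exactly what I am asking.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback type: receives one whitespace-delimited piece.
   The piece is NOT NUL-terminated; size gives its length in bytes. */
typedef void (*piece_fn)(const char *piece, int size, void *user);

/* Splits text at white space only and hands every piece to fn exactly
   once, in document order. Returns the number of pieces found. */
static int for_each_piece(const char *text, piece_fn fn, void *user)
{
    int count = 0;
    const char *p = text;

    while (*p != '\0') {
        /* Skip the white space in front of the next piece. */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        const char *start = p;
        /* Advance to the end of the piece. */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (p > start) {
            fn(start, (int)(p - start), user);
            count++;
        }
    }
    return count;
}

/* Example callback: adds up the sizes of all pieces it receives.
   A real one would pass (piece, size) on to the document checker. */
static void count_bytes(const char *piece, int size, void *user)
{
    (void)piece;
    *(int *)user += size;
}
```

Note that a hyphenated word such as "well-known" stays in one piece here,
because only white space separates pieces.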

Kindest regards,
 Michael




