
[aspell-devel] Re: ASpell


From: Barry Cavanaugh
Subject: [aspell-devel] Re: ASpell
Date: Wed, 10 Jan 2007 20:43:02 -0500

On 1/10/07, Kevin Atkinson <address@hidden> wrote:

Please post this to address@hidden.  And I will respond there so
others can benefit from my response.

If you are not subscribed I will approve your mail within a day.


I have edited and refined this for a public audience and for clarity; it was in no way an attempt to show superiority. I made the suggestion in the hope that it might help you break out of a mold, if you happened to be trapped in one. Since you feel it is worth repeating, I will elaborate and make my thoughts clearer so that there is no misunderstanding between us, or with anyone else for that matter.

I was downloading ASpell and looking at your project page. I have done some coding in higher-level languages and am good at figuring things out. I'm not bragging; I'm just saying that, though I can be a bit of a doofus in many ways, I have found that I can solve problems others can't, or can sometimes offer a fresh perspective. I am certain I do not have the experience to speak as an authority.

I was thinking about the desire to scan multiple languages somewhat concurrently. Your statements showed that you felt the code was getting very complicated for the task at hand. As I thought about it, what I have seen in such cases is a need to simplify and modularize the code. Doing so makes the desired results more fathomable, and the seemingly impossible becomes possible.

I'm shooting in the dark here, but please bear with me. Your processes can be grouped as follows: identifying misspelled words, finding suggestions, and presenting the results. The presentation code needs to be handled completely separately from your two main processes, checking for misspellings and offering suggestions. Yes, I know I am oversimplifying, but this is necessary to rethink the processes.

The text to be checked goes through the process with a "header" identifying the language, which makes it possible for the sending process to reclaim it and correctly handle its position, as well as the results, with respect to the submitted language. So the presentation layer is holding additional information and needs to be restructured accordingly. Although this presentation structure necessarily becomes more complicated, handling multiple languages at the same time in fact becomes somewhat more trivial.

In other words, the spell check and suggestion processes, even if combined, get the data stream and the language concerned at the same time, and then expect to find the optimized, processed data already in place. That is, the data storage structure is also tagged with identifying information, and the optimized data tables are accessed or queried with that tagging information taken into account.
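
To make the tagging idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `TaggedRequest`, `TaggedResult`, `WORD_TABLES` and `check` are hypothetical, the word tables are toy sets, and none of it reflects ASpell's actual data structures or API. It only shows a chunk of text travelling with its language and position "header", and a lookup that selects the per-language table from that tag.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the per-language optimized data.
# These toy sets are NOT ASpell's real tables.
WORD_TABLES = {
    "en": {"the", "cat", "sat", "on", "mat"},
    "ru": {"кот", "сидел", "на"},
}


@dataclass(frozen=True)
class TaggedRequest:
    lang: str    # the "header": language the text should be checked in
    offset: int  # where this chunk starts in the caller's document
    text: str


@dataclass(frozen=True)
class TaggedResult:
    lang: str          # echoed back so the caller can file the result
    offset: int        # echoed back so the caller can reclaim the chunk
    misspelled: tuple  # (start, end) positions in the caller's document


def check(request):
    """Mark words that are missing from the table selected by the tag."""
    table = WORD_TABLES[request.lang]
    spans = []
    pos = 0
    for word in request.text.split():
        start = request.text.index(word, pos)
        pos = start + len(word)
        if word.lower() not in table:
            spans.append((request.offset + start, request.offset + pos))
    return TaggedResult(request.lang, request.offset, tuple(spans))
```

A caller would build one `TaggedRequest` per chunk and use the echoed `lang` and `offset` in the `TaggedResult` to put the marks back in the right place; a caller that does not understand the tags could simply ignore them.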

The idea is to streamline and, in a sense, dumb down the lookup procedure. I am guessing by the tone of your reply that you feel these processes can't be separated, and come to think of it, the extra information would not be understood by the calling application. That of course does not include your back-end tagging, and it really then calls for additional tags that the calling application may not understand.

Hence this is an advanced mode that can be dropped, and the smart application would call the process once per language. The language tag is included in the response or dropped as necessary. This way the receiving application can then table the data appropriately. Your data optimization is language specific, and hence your working data for the most part needs to be split per language to be efficient and full featured. Your accomplishing this in the back end makes it reasonable and possible for the calling application to call ASpell on a per-language basis and hence edit a multi-language document. Of course, if the ASpell end gave consideration to a primary language and a secondary language, then the application could look much smarter and approach the feature set of a single-language document.

The spell check process should be fed the language, file and other preference flags, and it then marks misspellings. The spell check process is then ready for the next file. Now here is why I want to break it apart from the suggestion listings. The page could then be submitted for the secondary language. You now have two masks for the same file. Remove all misspellings that are correct in either one of the two languages. In the presentation you highlight the language by color, so that if the writer accidentally slips into Russian in an English sentence, or a misspelled English word mimics a Russian one, he can identify it because the highlighting color changes. This also leaves open the possibility of selecting the word and choosing "Check Selection in English | Russian".
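
The two-mask step could look roughly like the following Python sketch. It is only an illustration under stated assumptions: `DICTIONARIES` is a toy word list rather than real ASpell data, and `misspelling_mask`, `combine_masks` and `word_languages` are hypothetical helper names, not ASpell functions. A word stays flagged only if both passes flagged it, and the per-word language label is what a presentation layer could map to a highlight color.

```python
import re

# Toy word lists standing in for real dictionaries (an assumption).
DICTIONARIES = {
    "en": {"the", "cat", "sat", "on", "mat"},
    "ru": {"кот", "на"},
}

WORD = re.compile(r"\w+")  # in Python 3 this also matches Cyrillic letters


def misspelling_mask(text, lang):
    """Return the (start, end) spans of words not found in one language."""
    words = DICTIONARIES[lang]
    return {m.span() for m in WORD.finditer(text)
            if m.group().lower() not in words}


def combine_masks(primary_mask, secondary_mask):
    """Keep only the spans that are misspelled in BOTH languages."""
    return primary_mask & secondary_mask


def word_languages(text, primary, secondary):
    """Label each word span for highlighting by the language it is valid in."""
    p_mask = misspelling_mask(text, primary)
    s_mask = misspelling_mask(text, secondary)
    labels = {}
    for m in WORD.finditer(text):
        span = m.span()
        if span in p_mask and span in s_mask:
            labels[span] = "misspelled"
        elif span in p_mask:
            labels[span] = secondary  # valid only in the secondary language
        elif span in s_mask:
            labels[span] = primary    # valid only in the primary language
        else:
            labels[span] = "both"
    return labels
```

For the text "the кот sat on teh mat" with English as primary and Russian as secondary, only "teh" survives in the combined mask, while "кот" is labeled as Russian so the color change can show the writer where the language switched.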

Now take all your misspelled words and get the suggested replacements in the primary language, do the same with the secondary language, and combine them without care as to which is which, as that was never determined. You could, though, in your presentation data highlight the language of the surrounding, preceding or following text. In other words, stage one is to leave the misspelled words without the language-of-origin highlighting. Next you could make a few rules as a best guess at the language, and someday write a new algorithm that syntactically considers the context.
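
As a rough sketch of that merge step in Python: the suggestion engine here is the standard library's `difflib.get_close_matches`, swapped in purely as a placeholder and not the algorithm ASpell uses, and `WORD_LISTS`, `suggest` and `combined_suggestions` are hypothetical names with toy data. The point is only that suggestions from the primary and secondary language end up in one list with no record of which language each came from.

```python
import difflib

# Toy word lists standing in for real dictionaries (an assumption).
WORD_LISTS = {
    "en": ["cat", "cart", "mat", "the"],
    "ru": ["кот", "кит", "рот"],
}


def suggest(word, lang, limit=3):
    """Placeholder suggestion step: closest words by difflib similarity."""
    return difflib.get_close_matches(
        word.lower(), WORD_LISTS[lang], n=limit, cutoff=0.6
    )


def combined_suggestions(word, primary, secondary):
    """Merge both languages' suggestions, primary first, without tagging."""
    merged = []
    for lang in (primary, secondary):
        for candidate in suggest(word, lang):
            if candidate not in merged:
                merged.append(candidate)
    return merged
```

Listing the primary language first is a design choice in this sketch; the best-guess rules mentioned above could later reorder the merged list based on the language of the neighboring words.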

Anyway it was neat to consider and I hope you found it entertaining,
Barry