Re: Spellcheck against multiple dictionaries?
From: Sergei
Subject: Re: Spellcheck against multiple dictionaries?
Date: Thu, 19 Mar 2009 02:30:35 -0700 (PDT)
User-agent: G2/1.0
---- martin:
>> I've downloaded the speck.el file, but I'm not sure how to use it.
>> I've created a test file containing a mix of correct and incorrect
>> words in Russian and English (тест = "test", верно = "correct",
>> очепятка = a misspelling of опечатка, "typo"):
>> Test тест correct очепятка incorect верно
>> Then I ran M-x speck-mode. Emacs said that speck-mode had been
>> activated and was using the ru_RU dictionary, but nothing changed in
>> the test buffer. From your description I expected the incorrect
>> words to be highlighted somehow. Am I missing something?
I do not know about speck-mode, but ispell.el, at least, picks up
only what looks like a word in the currently enabled language; only
such words are re-encoded as the current ispell dictionary requires
and passed to the ispell process.
This means that "Test" is skipped in Russian mode (just like
=%==!!... etc.); conversely, очепятка and верно are skipped in a
Latin-alphabet context. And this is really convenient. (Users of
Latin-alphabet languages, by contrast, stumble over every foreign word.)
> I don't have a Russian spell-checking engine installed, so I can't
> comment on your example directly. Suppose I have a file with the line
> Test Test correct Duckfehler incorect richtig
> Doing M-x speck-mode here starts an Aspell process checking with my
> default language which is English, flagging the last three words as
> incorrect. I can now set the region around the word "Duckfehler"
> and type C-2 C-? to set the speck language text property of that
> word to German. Speck still flags the word as incorrect (it is a
> misspelling of Druckfehler, German for "misprint"), but now offers
> the appropriate German suggestions for correcting it.
Some formal text formats (like HTML or XML) allow language markup.
Something like
,----
| correct <i lang="de">Duckfehler</i> incorect <i lang="de">richtig</i>
`----
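With such markup the language of every word is explicit, so nothing has to be guessed. Below is a minimal sketch using Python's standard `html.parser` module. The class name `LangSegments` and the default-language convention are illustrative assumptions. It splits the sample line into per-language pieces, and each piece could then be sent to the matching dictionary:

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not touch the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}

class LangSegments(HTMLParser):
    """Split markup into (language, text) pieces using lang="..." attributes.
    Text outside any lang-tagged element gets the default language."""

    def __init__(self, default_lang):
        super().__init__()
        self.stack = [default_lang]  # language of the innermost open element
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # An element without lang= inherits its parent's language.
        self.stack.append(dict(attrs).get("lang") or self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.segments.append((self.stack[-1], data.strip()))

parser = LangSegments("en")
parser.feed('correct <i lang="de">Duckfehler</i> incorect <i lang="de">richtig</i>')
parser.close()
print(parser.segments)
# [('en', 'correct'), ('de', 'Duckfehler'), ('en', 'incorect'), ('de', 'richtig')]
```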
>> I think that the ispell-ish behavior would indeed be nice. I've
>> looked through the ispell code, and it looks like Emacs raises some
>> kind of exception if the ispell process returns "invalid"
>> status. Do you think it is possible to fall back to another
>> dictionary in that case?
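> With my Aspell engine I can write (and bind) a trivial command like
> (defun ispell-check-word (arg)
>   "Check the word at point in German with C-2, else in English."
>   (interactive "p")
>   ;; ARG is the numeric prefix argument; it is 1 when none is given.
>   (if (= arg 2)
>       (ispell-change-dictionary "de_DE")
>     (ispell-change-dictionary "en_US"))
>   (ispell-word))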
> here and probably get what you want. Note, however, that each time you
> change the language with this command, Emacs kills the old Aspell
> process and spawns a new one.
Yes, because everything has to change: the filtering rules, the
affix grammar, the word list.
> Changing `ispell-word' as you suggest seems hardly possible because in
> general there's no way to distinguish a word misspelled in
> language A from a word spelled correctly in language B. For the
> special English/Russian case you could probably examine the
> character properties at `point' and start the appropriate
> word-checking process.
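For the English/Russian case, the check at point only has to classify one character. The minimal Python sketch below assumes the script can be read off the first word of the character's Unicode name. Inside Emacs, `(aref char-script-table (char-after))` returns a script symbol such as `latin` or `cyrillic` for the same purpose. The dictionary names in the hypothetical `DICTIONARY_FOR_SCRIPT` table are placeholders for whatever your Aspell installation provides:

```python
import unicodedata

# Hypothetical script -> dictionary table; use the dictionary names
# your Aspell (or Hunspell) installation actually provides.
DICTIONARY_FOR_SCRIPT = {"LATIN": "en_US", "CYRILLIC": "ru_RU"}

def dictionary_at(text, point):
    """Choose a dictionary from the script of the character at `point`.
    Returns None for non-letters and for scripts missing from the table."""
    ch = text[point]
    if not ch.isalpha():
        return None
    # Letter names typically start with the script, e.g. "CYRILLIC SMALL LETTER TE".
    script = unicodedata.name(ch, "").split(" ")[0]
    return DICTIONARY_FOR_SCRIPT.get(script)

print(dictionary_at("Test тест", 0))  # en_US
print(dictionary_at("Test тест", 5))  # ru_RU
```

Once the dictionary name is known, the Emacs side would still call `ispell-change-dictionary` before `ispell-word`, as in the command quoted above. The cost of restarting the process on every switch therefore remains.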
In principle one could create a combined grammar for Russian and
English; it would actually be a "direct sum" of the two grammars,
since the word spaces are completely disjoint because the alphabets
are disjoint. TeX already has such a combined processor for
English-Russian hyphenation. It would be more efficient, too, because
there would be no need to spawn a new process at every switch between
Russian and English.
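Every word belongs to exactly one alphabet, so a combined checker never has to guess. It dispatches each word to the sub-dictionary for its alphabet, and the two word sets together behave as a single dictionary. Below is a toy Python sketch of that "direct sum". The three-word sets are hypothetical stand-ins for real dictionaries, and only basic Latin and the Cyrillic block are recognised:

```python
import re

# Toy word lists standing in for real dictionaries (hypothetical data).
ENGLISH = {"test", "correct", "typo"}
RUSSIAN = {"тест", "верно", "опечатка"}

LATIN = re.compile(r"[A-Za-z]+")
CYRILLIC = re.compile(r"[\u0400-\u04FF]+")

def combined_check(word):
    """'Direct sum' of two checkers: the alphabets are disjoint, so each word
    belongs to exactly one sub-dictionary, chosen by its script."""
    if CYRILLIC.fullmatch(word):
        return word.lower() in RUSSIAN
    if LATIN.fullmatch(word):
        return word.lower() in ENGLISH
    return True  # not a word in either alphabet: skip it, as ispell does

def misspelled(text):
    """Flag misspelled words of both languages in a single pass."""
    tokens = re.findall(r"[A-Za-z]+|[\u0400-\u04FF]+", text)
    return [w for w in tokens if not combined_check(w)]

print(misspelled("Test тест correct очепятка incorect верно"))
# ['очепятка', 'incorect']
```

A real combined engine would do this dispatch inside one process. That single process is where the efficiency gain over switching dictionaries would come from.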
For now, though, it would be easier to use a two-pass approach:
1. check the Russian spelling (ignoring all Latin characters);
2. check the English spelling (ignoring all Cyrillic characters).
Both passes are faster than checking in a switching mode, and no extra
work is required. Besides, you could spellcheck Russian+French or
Russian+German combinations the same way (but not Russian+French+English,
of course, since French and English share the Latin alphabet; whereas
Russian+German+Armenian still works, because each of these languages
uses its own alphabet).
--
Sergei