aspell-devel
Re: [aspell-devel] Thoughts on using aspell for Indian language ing


From: gora
Subject: Re: [aspell-devel] Thoughts on using aspell for Indian language ing
Date: Mon, 13 Nov 2006 20:16:19 +0100

On 6:37:15 pm 11/13/06 "Ethan Bradford" <address@hidden> wrote:
> Kevin, one small fact on Indic graphology that may help: consonants
> have an intrinsic vowel (an "a"), so "ka" is one Unicode character.
> There are then combining vowels, so to write "ko", you use "ka" plus
> the combining "o".  To get pairs of consonants, you need to suppress
> the inherent vowel, which is what the halant does.  Thus, "kra" is
> "ka + drop-the-a + ra".

That is indeed correct. I should have included that, but I am so used
to this that I assumed everyone knew what I was talking about.
Thus, the "syllable" that I think we need to operate on is a
consonant (or a conjunct formed by joining consonants with a halant),
along with any vowel modifiers. Also, while most Indian languages have
a set of commonly used conjuncts, it is possible to create new conjuncts
arbitrarily by joining consonants with a halant. Such new conjuncts
are often needed to spell words borrowed from English and other
languages. Thus, the set of possible conjuncts is not bounded.
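
To make the unit concrete, here is a minimal Python sketch of the
segmentation described above. It is an illustration only, under
simplifying assumptions: it covers just the basic Devanagari block, the
character classes are my own reading of the Unicode chart, and the
function name syllables() is hypothetical (it is not part of aspell).

```python
import re

# Character classes for the basic Devanagari block (a simplification).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # ka..ha, optional nukta
HALANT = "\u094D"                                  # virama: drops the inherent "a"
VOWEL_SIGN = "[\u093E-\u094C]"                     # combining (dependent) vowels
MODIFIER = "[\u0901-\u0903]"                       # candrabindu, anusvara, visarga
INDEPENDENT_VOWEL = "[\u0904-\u0914]"

# A syllable is either:
#   zero or more "consonant + halant" pairs, then a consonant, then an
#   optional trailing halant or vowel sign, then optional modifiers; or
#   an independent vowel with optional modifiers; or
#   any other single character (fallback, so that no input is lost).
SYLLABLE = re.compile(
    f"(?:{CONSONANT}{HALANT})*{CONSONANT}(?:{HALANT}|{VOWEL_SIGN})?{MODIFIER}*"
    f"|{INDEPENDENT_VOWEL}{MODIFIER}*"
    "|.",
    re.DOTALL,
)


def syllables(text):
    """Split text into the syllable units described above."""
    return SYLLABLE.findall(text)


if __name__ == "__main__":
    kra = "\u0915\u094D\u0930"            # ka + halant + ra
    chakra = "\u091A\u0915\u094D\u0930"   # cha, then the conjunct k+ra
    for word in (kra, chakra):
        units = syllables(word)
        print([[f"U+{ord(ch):04X}" for ch in unit] for unit in units])
```

Because the conjunct rule allows any number of "consonant + halant"
pairs, the sketch also accepts conjuncts that are not in any fixed list,
which is what the unbounded set of conjuncts requires.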

> Gora, how do Hindi keyboards support the entry of halant?  If
> entering the halant is just another keystroke (so the codes are
> entered as they are stored in Unicode), then why wouldn't the
> transposition of a halant with another keystroke be just as likely as
> any other transposition?  "Teh" makes no sense whatever in English,
> but I type it often enough.  Or are there separate keystrokes for the
> half-width (i.e. vowel-less) consonants, which automatically add a
> halant?  If that's the case, then we want to treat "ka+halant"
> special, not "kra".

That depends on the keyboard input method, of which there are a variety
to choose from. There are two main classes of input methods, (a) phonetic
(well, actually pseudo-phonetic, as they use some kind of transliteration
scheme from English), and (b) non-phonetic mappings that aim for
efficiency in typing, and where the keyboard layout has nothing to do
with the sound of the character. For the first type, e.g., with the
ITRANS scheme, a single keystroke can produce a half-consonant, e.g.,
the English letter 'k' produces "ka + halant". The non-phonetic mappings
have a separate key for the halant.
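
As a rough illustration of the phonetic class, the following Python
sketch mimics how a consonant key can emit "consonant + halant" and a
following vowel key can replace that halant. It is a toy in the spirit
of ITRANS, not an implementation of it: the four-key table and the
function name phonetic_to_devanagari() are assumptions made up for this
example.

```python
HALANT = "\u094D"

# Toy key tables (assumed for illustration; real schemes are far larger).
CONSONANT_KEYS = {"k": "\u0915", "r": "\u0930"}   # ka, ra
VOWEL_KEYS = {"a": "", "o": "\u094B"}             # inherent "a", vowel sign "o"


def phonetic_to_devanagari(keys):
    """Convert a string of toy phonetic keystrokes to Devanagari."""
    out = []
    for key in keys:
        if key in CONSONANT_KEYS:
            # A bare consonant key yields the half-consonant.
            out.append(CONSONANT_KEYS[key] + HALANT)
        elif key in VOWEL_KEYS and out and out[-1].endswith(HALANT):
            # A vowel key replaces the pending halant on the last consonant.
            out[-1] = out[-1][:-1] + VOWEL_KEYS[key]
        else:
            raise ValueError(f"unsupported keystroke: {key!r}")
    return "".join(out)


if __name__ == "__main__":
    for typed in ("k", "ka", "ko", "kra"):
        result = phonetic_to_devanagari(typed)
        print(typed, [f"U+{ord(ch):04X}" for ch in result])
```

With such a method the typist never presses a halant key, so a
transposed halant cannot arise from swapping two keystrokes in the same
way it can with a non-phonetic layout.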

  I believe that there are at least two levels at which spelling errors
are made: At the mental level while composing sentences in your mind
(such as "occasion" misspelled as "ocassion"), and at the typing level
where the wrong character is typed ("teh" for "the", as you note). While
typing errors certainly do need to be accounted for, I would argue that
in this case, where an Indian language conjunct is involved, a typing
error leading to a transposition of a halant is less likely, as the
glyph would change, giving more visual feedback (at least if
one hunts and pecks, like me). E.g., "the" does not look too different
from "teh", but "च्कर" and "चक्र" in my earlier example do. For errors at
the mental level, I believe that it is well-established that such
confusion occurs between similar-sounding words. Therefore, my guess
would be that, in general, it is possible to confuse a conjunct with
another, or with the two (or more) consonants making up the conjunct,
but it is unlikely that a conjunct would be confused with a single
consonant, as they would sound quite different, and hence not be
remembered as similar.

  I was driven to think about this, because we have been trying out
aspell with new rules for Hindi, and the results have been
counter-intuitive. I also think it is possible that syllables,
instead of characters, need to be used when the scores are refined with
try_split(), try_ngram(), etc., but I would need to understand the
working of these functions better before I can make any kind of a
definitive statement. Admittedly, all this is anecdotal at this point,
and we need to do some quantitative measurements with a decently-sized
test kernel of misspelled words like Kevin has done for English. I
am also enlisting Hindi linguists, and those from other languages to
help design an Indian language spellchecker. I will make the design
work available on a public Wiki.
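
To show why the choice of unit matters for scoring, here is a small
Python sketch that compares an edit distance computed over code points
with the same distance computed over syllables. It is not how aspell
scores suggestions, and it does not model try_split() or try_ngram();
the segmentation regex and the names syllables() and osa_distance() are
assumptions for this example, restricted to the basic Devanagari block.

```python
import re

# Simplified syllable pattern: consonant cluster joined by halants, plus an
# optional trailing halant or vowel sign and modifiers; else one character.
_C = "[\u0915-\u0939\u0958-\u095F]\u093C?"
_H = "\u094D"
SYLLABLE = re.compile(
    f"(?:{_C}{_H})*{_C}(?:{_H}|[\u093E-\u094C])?[\u0901-\u0903]*|.",
    re.DOTALL,
)


def syllables(text):
    """Split text into syllable units (simplified)."""
    return SYLLABLE.findall(text)


def osa_distance(a, b):
    """Optimal string alignment distance between two sequences.

    Insertions, deletions, substitutions and swaps of two adjacent
    units each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


if __name__ == "__main__":
    wrong = "\u091A\u094D\u0915\u0930"   # halant before ka
    right = "\u091A\u0915\u094D\u0930"   # halant after ka
    print("code points:", osa_distance(list(wrong), list(right)))
    print("syllables:  ", osa_distance(syllables(wrong), syllables(right)))
```

Over code points the misplaced halant is a single adjacent swap, as
cheap as "teh" for "the". Over syllables both units of the two-syllable
word differ, so the same pair is scored as far apart, which is closer to
the intuition argued above.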

Regards,
Gora
