From: Deepayan Sarkar
Subject: Re: [Freebangfont-devel] Re: [Issue N22662] Bengali rendering bugs in Qt 3.2 beta
Date: Wed, 21 May 2003 13:16:19 -0500
User-agent: KMail/1.5.1
Hi,
Wow, that was fast! I'll report on the rest when I finish compiling (which
may take a while), but let me answer you on the a + ya-phala question first.
[Others on the FBF list, please correct me if I have said something wrong]
On Wednesday 21 May 2003 05:01, address@hidden wrote:
> > 2. a + ya-phala
> > ===============
> >
> > The sequence 0985 09CD 09AF 09BE is not rendered correctly.
> > (Microsoft's engine doesn't render this correctly either yet, but it
> > will.)
> >
> > I quote from http://www.unicode.org/faq/indic.html#13
> >
> > ------------
> >
> > Q: What are the Bengali characters used to transcribe the sound "a"
> > (as in English "bat") in Unicode?
> >
> >
> > A: In Bengali, the sequence "zophola" (U+09CD U+09AF) + the "aa" matra
> > (U+09BE) is used for transcribing the English "a" in "bat". This
> > zophola_aa can be seen as a special "composite" matra to write a
> > new Bengali sound, imported from English. Represent these sequences
> > using a halant (virama):
> >
> > Vowel_A_zophola_AA = 0985 09CD 09AF 09BE ( a- halant ya -aa )
> > Vowel_E_zophola_AA = 098F 09CD 09AF 09BE ( e- halant ya -aa )
> >
> > If you need to add a candrabindu or other combining mark in the
> > sequence, represent the sequence as:
> >
> > Vowel_A_zophola_AA + candrabindu = 0985 09CD 09AF 09BE 0981
> > ( a- halant ya -aa candrabindu )
> >
> > --------------
>
> Ok, that's something new for me. I always thought combinations of
> independent vowels+halant were forbidden, and that's how I handled it
> in Qt. Looks like I need an exception for bengali.
You are correct, from a linguistic point of view.
The problem was that there was no official way to write English words like
'at' (because there was no vowel with the correct sound -- I believe
Devanagari doesn't have one even now). I guess someone decided that Bengali
should have the ability to do this, and essentially added a new vowel to the
language. Unfortunately, no new character was created to represent this
vowel; instead, two completely arbitrary combinations of existing symbols
were assigned to represent this sound, namely what are referred to above as
Vowel_A_zophola_AA and Vowel_E_zophola_AA.
Both are completely illegal constructs in classical Bengali.
> The faq entry you quote is not 100% clear to me. Does this mean any
> combination of
>
> independent vowel + halant + ya + -aa
>
> forms a valid syllable in bengali?
No, as far as I know, no other vowel should have this construct (but such
combinations would be illegal anyway, so personally I wouldn't care how they
are rendered).
> What about the general
> vowel + halant + consonant + matra
> case?
Nope, these should be illegal as well. Basically (as far as I know)
Vowel_A_zophola_AA and Vowel_E_zophola_AA are two very explicit exceptions to
the otherwise correct general rule you already have. (These 2 can be followed
by combining marks like candrabindu, bisarga, etc, but I don't think that's
an issue here.)
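To make the rule concrete, here is a rough Python sketch of how I understand it. This is only an illustration and not how Qt (or any other engine) implements it; the function name `classify` and the constant names are my own inventions, and the list of independent vowels is the basic Bengali block only.

```python
# Hypothetical sketch: classify a cluster that starts with
# "independent vowel + hasanta". All names here are illustrative.

HASANTA = "\u09CD"    # Bengali sign virama (halant / hasanta)
YA = "\u09AF"         # Bengali letter ya
AA_MATRA = "\u09BE"   # Bengali vowel sign aa
VOWEL_A = "\u0985"    # Bengali letter a
VOWEL_E = "\u098F"    # Bengali letter e

# hasanta + ya + aa-matra, i.e. the "zophola_aa" composite from the FAQ
ZOPHOLA_AA = HASANTA + YA + AA_MATRA

# Only these two vowels may carry zophola_aa (per the FAQ entry quoted above).
ALLOWED_VOWELS = {VOWEL_A, VOWEL_E}

# Independent vowels in the basic Bengali block (assumption: U+09E0 and
# U+09E1 are ignored to keep the sketch short).
INDEPENDENT_VOWELS = set(
    map(chr, list(range(0x0985, 0x098D)) + [0x098F, 0x0990, 0x0993, 0x0994])
)


def classify(cluster):
    """Return 'exception', 'illegal', or 'other' for a code point sequence."""
    starts_with_vowel_hasanta = (
        len(cluster) >= 2
        and cluster[0] in INDEPENDENT_VOWELS
        and cluster[1] == HASANTA
    )
    if not starts_with_vowel_hasanta:
        return "other"  # not the construct under discussion
    if cluster[0] in ALLOWED_VOWELS and cluster.startswith(ZOPHOLA_AA, 1):
        return "exception"  # Vowel_A_zophola_AA or Vowel_E_zophola_AA
    return "illegal"  # any other independent vowel + hasanta sequence
```

Trailing combining marks such as candrabindu (0981) are simply ignored by this check, so `classify("\u0985\u09CD\u09AF\u09BE\u0981")` is still treated as one of the two exceptions.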
> I've worked around this for now by treating an Independent Vowel at the
> start of a syllable identical to a consonant for syllable breaking
> rules in Bengali, so your example renders correctly. I do however not
> know if this breaks anything else.
It shouldn't. The independent vowel + hasanta construct is illegal except for
these 2 exceptions, so it should never occur otherwise in valid Bengali
text. I don't know what the Unicode/OpenType rules are when it comes to
displaying something invalid, but I don't foresee any practical problems.
Deepayan
P.S.: There's another related issue (which I don't think has been completely
resolved yet), which is how to render the combination
ra + hasanta + ya (09B0 + 09CD + 09AF)
Should it be "reph + ya" or "ra + zophola"? (The second is rare, but is
needed, again, for writing English words like 'rat'.) This is an ambiguity in
the language, and at some point Unicode should come up with a recommended way
to represent this. I'll let you know when I hear of anything concrete.