Re: [Aramorph-users] XML tables


From: Pierrick Brihaye
Subject: Re: [Aramorph-users] XML tables
Date: Wed, 15 Jun 2005 10:14:16 +0200

Hi,

Ahmed El-dawy wrote:

  I've attached a supposed format for dictionary and compatibility tables.
You will find two .dtd files, one for the dictionary and the other for
compatibility tables. You will also find some .xml files as examples.
Once we agree on some xml structure I can write a small class to
transform dictionaries to the new format.

Fine. Here are my comments:

I'll start with prefix.xml:

<entry>w</entry>: why not "unvocalized"?
<voc>wa</voc>: why not "vocalized"?
<morpy>Pref-Wa</morpy>: why not "morphology"? Or "morphological-category", to be in sync with the English docs ;-)...
<gloss>and</gloss>
<pos>wa/CONJ+</pos>: maybe "grammatical-category"...

Well, as you can see, I like verbose XML :-)

Also, we could make better use of XML's structure, with one element per value instead of "+"-separated strings:

<gloss>and + by/with</gloss>

could be changed to:
<glosses>
  <gloss>and</gloss>
  <gloss>by/with</gloss>
</glosses>

Similarly:

<pos>wa/CONJ+bi/PREP+</pos>

could be changed to:
<grammatical-categories>
 <grammatical-category>wa/CONJ</grammatical-category>
 <grammatical-category>bi/PREP</grammatical-category>
</grammatical-categories>
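
To make the two splits above concrete, here is a minimal sketch in Python. It assumes that "+" is only ever used as a separator in the gloss and POS fields; the helper names (split_field, build_prefix_entry) are made up for the example and are not part of AraMorph.

```python
import xml.etree.ElementTree as ET


def split_field(raw, separator):
    """Split a compound field and drop empty or whitespace-only parts."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


def build_prefix_entry(gloss, pos):
    """Build an <entry> with one child element per gloss and per category.

    Assumption: "+" never appears inside a single gloss or category.
    """
    entry = ET.Element("entry")
    glosses = ET.SubElement(entry, "glosses")
    for item in split_field(gloss, "+"):
        ET.SubElement(glosses, "gloss").text = item
    categories = ET.SubElement(entry, "grammatical-categories")
    for item in split_field(pos, "+"):
        ET.SubElement(categories, "grammatical-category").text = item
    return entry


entry = build_prefix_entry("and + by/with", "wa/CONJ+bi/PREP+")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The trailing "+" of the POS string is dropped because empty parts are filtered out; whether that marker carries information that should be kept elsewhere is an open question.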

And, of course, the Arabic words should be encoded... in Arabic.
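
As a rough illustration of what encoding in Arabic could mean for a conversion tool, here is a sketch that maps transliterated strings to Unicode. It assumes the entries use Buckwalter transliteration, and the table is deliberately partial: it covers only the characters appearing in the examples of this message. The name to_arabic is hypothetical.

```python
# Partial Buckwalter-to-Unicode table (assumption: entries are in Buckwalter
# transliteration). Only the characters used in this message are listed.
BUCKWALTER_TO_ARABIC = {
    "b": "\u0628",  # ARABIC LETTER BEH
    "t": "\u062A",  # ARABIC LETTER TEH
    "k": "\u0643",  # ARABIC LETTER KAF
    "w": "\u0648",  # ARABIC LETTER WAW
    "a": "\u064E",  # ARABIC FATHA
    "u": "\u064F",  # ARABIC DAMMA
    "i": "\u0650",  # ARABIC KASRA
}


def to_arabic(translit):
    """Convert a transliterated string, failing loudly on unmapped characters."""
    converted = []
    for ch in translit:
        if ch not in BUCKWALTER_TO_ARABIC:
            raise ValueError(f"no mapping for {ch!r} in {translit!r}")
        converted.append(BUCKWALTER_TO_ARABIC[ch])
    return "".join(converted)


print(to_arabic("w"))    # unvocalized prefix
print(to_arabic("wa"))   # vocalized prefix
print(to_arabic("ktb"))  # root
```

Raising on an unmapped character, rather than passing it through, keeps stray Latin letters out of the converted dictionaries.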

Regarding the stems dictionary, the format has to be slightly different because we have additional information (see http://www.nongnu.org/aramorph/english/dictionaries.html):

<root>ktb</root>
<lemmaID>katab-u_1</lemmaID>

and, maybe, a "normalised" lemma:
<lemma>katab</lemma>
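
One possible way to derive such a normalised lemma from the lemma ID, as a sketch only: it assumes that everything from the first "-" or "_" onwards is a suffix to drop, which matches the katab-u_1 example but has not been checked against the whole dictionary. The function name normalise_lemma is hypothetical.

```python
import re


def normalise_lemma(lemma_id):
    """Return the part of a lemma ID before the first '-' or '_'.

    Assumption: the suffix after the first '-' or '_' only carries the
    vowel marker and the homonym number, as in "katab-u_1".
    """
    return re.split(r"[-_]", lemma_id, maxsplit=1)[0]


print(normalise_lemma("katab-u_1"))
```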

Regarding the compatibility tables, something like this would be nice :

<compatibility-table>

  <compatibility>
    <prefix>Pref-0</prefix>
    <stem>FW</stem>
  </compatibility>

  <compatibility>
    <prefix>Pref-0</prefix>
    <stem>FW-Wa</stem>
  </compatibility>

  <compatibility>
    <prefix>Pref-0</prefix>
    <stem>FW-WaBi</stem>
  </compatibility>

...

</compatibility-table>

And, of course, we could merge the three tables into one.
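
For what it is worth, a table in the prefix/stem format above is easy to load into a lookup structure. The sketch below, in Python, reads the pairs into a set; the function names (load_pairs, is_compatible) are invented for the example, and the sample data is just the three entries shown above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<compatibility-table>
  <compatibility><prefix>Pref-0</prefix><stem>FW</stem></compatibility>
  <compatibility><prefix>Pref-0</prefix><stem>FW-Wa</stem></compatibility>
  <compatibility><prefix>Pref-0</prefix><stem>FW-WaBi</stem></compatibility>
</compatibility-table>"""


def load_pairs(xml_text):
    """Read every <compatibility> element into a set of (prefix, stem) pairs."""
    root = ET.fromstring(xml_text)
    pairs = set()
    for node in root.findall("compatibility"):
        # strip() guards against stray whitespace inside the elements
        prefix = node.findtext("prefix", default="").strip()
        stem = node.findtext("stem", default="").strip()
        pairs.add((prefix, stem))
    return pairs


def is_compatible(pairs, prefix, stem):
    """A prefix and stem category are compatible if the pair is listed."""
    return (prefix, stem) in pairs


pairs = load_pairs(SAMPLE)
print(is_compatible(pairs, "Pref-0", "FW-Wa"))
print(is_compatible(pairs, "Pref-Wa", "FW"))
```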

What do you think?

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:address@hidden
+33 (0)2 99 29 67 78



