
Re: [Aramorph-users] XML tables


From: Ahmed El-dawy
Subject: Re: [Aramorph-users] XML tables
Date: Thu, 16 Jun 2005 15:31:03 +0300

Hello,
> We may also go further, however, by introducing <prefix-pos/gloss>,
> <stem-pos/gloss> and <suffix-pos/gloss>, which are *very* accurate in the
> stems dictionary. What do you think about this point?
> 
I don't get this point :(
I really don't know what POS is used for.
Could you please explain this point more clearly?

Please excuse me over the next few days, as I am completing my graduation
project: a client/server messenger written in Java. I will have
finished it by 5/7/2005. After that, I will continue working with you
on AraMorph.
Thanks very much
_____________________________________________________________
On 6/15/05, Pierrick Brihaye <address@hidden> wrote:
> Hi again,
> 
> Ahmed El-dawy wrote:
> 
> > Actually, I made this part like this because it is very similar to that
> > of the LDC sample.
> 
> The problem is that people from the LDC know what they are talking about :-)
> 
> > It also keeps the XML files small, if that
> > matters.
> 
> You're right, but since we will provide the files in a jar file, the
> redundancy (and thus the compression rate) will be roughly the same.
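A quick way to test the claim that a jar's compression evens out the size difference between a verbose and a compact layout is to deflate both and compare. This is only an illustrative sketch: the tag names and the generated glosses are made up, and `zlib.compress` stands in for the DEFLATE compression a jar uses (zlib adds a small header and checksum on top).

```python
import zlib


def sizes(n=2000):
    """Return {layout: (raw_bytes, deflated_bytes)} for two serializations
    of the same hypothetical gloss list."""
    glosses = [f"gloss{i}" for i in range(n)]
    # Verbose layout: a wrapper element around every gloss.
    verbose = "".join(
        f"<glosses><gloss>{g}</gloss></glosses>\n" for g in glosses
    ).encode("utf-8")
    # Compact layout: short tag names, no wrapper.
    compact = "".join(f"<g>{g}</g>\n" for g in glosses).encode("utf-8")
    return {
        "verbose": (len(verbose), len(zlib.compress(verbose, 9))),
        "compact": (len(compact), len(zlib.compress(compact, 9))),
    }


if __name__ == "__main__":
    for name, (raw, packed) in sizes().items():
        print(f"{name:8} raw={raw:7} deflated={packed:6}")
```

On repetitive data like this the raw verbose file is much larger than the compact one, while the deflated sizes end up much closer, because repeated tag names are encoded as back-references.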
> 
> > Anyway, I have changed it to be more readable. Actually,
> > I still don't know the meaning of the word (pos) :)
> 
> POS = "part of speech".
> 
> >>May be changed to:
> >><glosses>
> >>  <gloss>and</gloss>
> >>  <gloss>by/with</gloss>
> >></glosses>
> >
> > Yes, you are right about this. I've changed it in the new version.
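A minimal sketch of producing that `<glosses>` structure from a compound gloss field, assuming (hypothetically) that the field joins the per-morpheme glosses with `+`, e.g. `and + by/with`. The function name `glosses_element` is illustrative, not part of AraMorph.

```python
import xml.etree.ElementTree as ET


def glosses_element(raw_gloss, sep="+"):
    """Build <glosses><gloss>...</gloss>...</glosses> from a gloss field.

    Assumption: `sep` separates the glosses of the individual morphemes;
    a '/' inside one gloss, as in "by/with", is kept as-is.
    """
    glosses = ET.Element("glosses")
    for part in raw_gloss.split(sep):
        part = part.strip()
        if part:  # skip empty pieces left by stray separators
            ET.SubElement(glosses, "gloss").text = part
    return glosses


if __name__ == "__main__":
    print(ET.tostring(glosses_element("and + by/with"), encoding="unicode"))
```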
> 
> Thank you. On this point, my implementation differs from the original
> one because I try to "shift" prefixes and suffixes from the stem
> definition (and may even generate words with a NO_STEM type, e.g. bihi,
> fyha...). Splitting the values in the XML files makes this process more
> obvious IMHO.
> 
> We may also go further, however, by introducing <prefix-pos/gloss>,
> <stem-pos/gloss> and <suffix-pos/gloss>, which are *very* accurate in the
> stems dictionary. What do you think about this point?
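To make the proposal concrete: if a dictionary gloss field embeds its POS tags inline (assumed here to look like `and <pos>wa/CONJ+</pos>`), splitting it yields one element for the gloss and one for the POS. The `<prefix-gloss>`/`<prefix-pos>` names are a hypothetical reading of `<prefix-pos/gloss>`, not a settled format.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the POS tags sit inline in the gloss field as <pos>...</pos>.
POS_RE = re.compile(r"<pos>(.*?)</pos>")


def split_gloss_and_pos(field):
    """Return (gloss, pos); pos is None when the field has no <pos> part."""
    m = POS_RE.search(field)
    pos = m.group(1) if m else None
    gloss = POS_RE.sub("", field).strip()
    return gloss, pos


def morpheme_element(kind, field):
    """Build e.g. <prefix><prefix-gloss/><prefix-pos/></prefix>.

    `kind` is "prefix", "stem" or "suffix" (hypothetical element names).
    """
    gloss, pos = split_gloss_and_pos(field)
    el = ET.Element(kind)
    ET.SubElement(el, f"{kind}-gloss").text = gloss
    if pos is not None:
        ET.SubElement(el, f"{kind}-pos").text = pos
    return el


if __name__ == "__main__":
    el = morpheme_element("prefix", "and <pos>wa/CONJ+</pos>")
    print(ET.tostring(el, encoding="unicode"))
```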
> 
> >>And, of course, the Arabic words should be encoded... in Arabic.
> >
> > I will do it after making the XML files, maybe in the same program that
> > translates the current dictionary to XML files.
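One way to do that conversion inside the translation program is a character table from the Buckwalter transliteration to Unicode Arabic. This sketch covers the standard Buckwalter letters and diacritics and passes any other character through unchanged. It should be applied to word forms only: lemma IDs such as `katab-u_1` contain `_`, which the table maps to tatweel.

```python
# Buckwalter transliteration -> Unicode Arabic letters and diacritics.
BUCKWALTER_TO_ARABIC = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062a", "v": "\u062b", "j": "\u062c",
    "H": "\u062d", "x": "\u062e", "d": "\u062f", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063a", "_": "\u0640", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u0647", "w": "\u0648", "Y": "\u0649",
    "y": "\u064a", "F": "\u064b", "N": "\u064c", "K": "\u064d",
    "a": "\u064e", "u": "\u064f", "i": "\u0650", "~": "\u0651",
    "o": "\u0652", "`": "\u0670", "{": "\u0671",
}

# Precomputed translation table; unmapped characters are left untouched.
_TABLE = str.maketrans(BUCKWALTER_TO_ARABIC)


def to_arabic(translit):
    """Convert a Buckwalter-transliterated word form to Arabic script."""
    return translit.translate(_TABLE)
```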
> 
> It would be a good idea.
> 
> > By the way, there's a
> > problem if we convert to XML using the transliteration. One of its
> > symbols is (>), which XML already uses to close tags. We
> > will have to transform it into &gt;.
> 
> Correct. This shouldn't be a problem if the files are generated through
> a SAX-based writer, which will handle this escaping automatically.
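A sketch with Python's SAX writer, `xml.sax.saxutils.XMLGenerator`, which escapes character data itself. Besides `>` (alif with hamza above), the transliteration also uses `<` (alif with hamza below) and `&` (waw with hamza); strictly, `&` and `<` are the characters XML forbids unescaped in text, and the writer escapes all three. The `<entry>` and `<voc>` element names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def write_entry(fields):
    """Serialize an <entry> with one child element per (name, text) pair.

    XMLGenerator is a SAX ContentHandler that writes XML; its characters()
    method escapes &, < and > in the text it receives.
    """
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("entry", AttributesImpl({}))
    for name, text in fields:
        gen.startElement(name, AttributesImpl({}))
        gen.characters(text)
        gen.endElement(name)
    gen.endElement("entry")
    gen.endDocument()
    return out.getvalue()


if __name__ == "__main__":
    xml_text = write_entry([("lemmaID", ">akal-u_1"), ("voc", "su&Al")])
    print(xml_text)
    # Parsing it back restores the original transliteration.
    print([c.text for c in ET.fromstring(xml_text.encode("utf-8"))])
```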
> 
> >>Regarding the stems dictionary, the format has to be slightly different
> >>because we have additional information (see
> >>http://www.nongnu.org/aramorph/english/dictionaries.html):
> >>
> >><root>ktb</root>
> >><lemmaID>katab-u_1</lemmaID>
> >>
> >>and, maybe, a "normalised" lemma
> >><lemma>katab</lemma>
> >>
> > I know that the lemma is the one on a line starting with two
> > semicolons (;;), but what is this root?
> 
> ;--- ktb
> 
> We first have to check if this formalism is consistent throughout the
> stems dictionary...
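A first pass at that consistency check could walk the stems file, track the current root (`;--- ` lines) and lemma (`;; ` lines), and report entries that lack either. This assumes that other `;` lines are comments and that entries have four tab-separated fields (lookup form, vocalized form, category, gloss); `StemEntry` and `parse_stems` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StemEntry:
    root: Optional[str]
    lemma_id: Optional[str]
    lookup: str     # unvocalized lookup form
    vocalized: str  # vocalized form
    category: str   # morphological category, e.g. PV
    gloss: str


def parse_stems(lines) -> Tuple[List[StemEntry], List[Tuple[int, str]]]:
    """Return (entries, problems); problems are (line number, message)."""
    root: Optional[str] = None
    lemma_id: Optional[str] = None
    entries: List[StemEntry] = []
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith(";--- "):
            root = line[len(";--- "):].strip()
            lemma_id = None  # a new root starts a new group of lemmas
        elif line.startswith(";; "):
            lemma_id = line[len(";; "):].strip()
        elif line.startswith(";"):
            continue  # plain comment
        else:
            fields = line.split("\t")
            if len(fields) != 4:
                problems.append((lineno, "expected 4 tab-separated fields"))
                continue
            if root is None or lemma_id is None:
                problems.append((lineno, "entry without a root or lemma"))
            entries.append(StemEntry(root, lemma_id, *fields))
    return entries, problems
```

Run over the whole file, an empty `problems` list would suggest the `;---`/`;;` formalism is used consistently.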
> 
> >>Regarding the compatibility tables, something like this would be nice:
> >>
> >
> > See the current version (attached) and tell me what you think.
> 
> Since the DTDs are very short, you should embed them in the XML files.
> See http://www.thescarms.com/XML/DTDTutorial.asp.
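A sketch of a prefixes file with its DTD embedded in the internal subset of the DOCTYPE declaration, so the file and its schema travel together. The element names follow the snippets in this thread, but the content models and values are assumptions. Python's bundled expat parser reads the internal subset without validating against it; validation needs a validating tool such as `xmllint --valid` or lxml with `dtd_validation=True`.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefixes file with an embedded (internal-subset) DTD.
PREFIXES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE prefixes [
  <!ELEMENT prefixes (prefix*)>
  <!ELEMENT prefix (form, category, glosses, grammatical-categories)>
  <!ELEMENT form (#PCDATA)>
  <!ELEMENT category (#PCDATA)>
  <!ELEMENT glosses (gloss+)>
  <!ELEMENT gloss (#PCDATA)>
  <!ELEMENT grammatical-categories (grammatical-category+)>
  <!ELEMENT grammatical-category (#PCDATA)>
]>
<prefixes>
  <prefix>
    <form>wa</form>
    <category>Pref-Wa</category>
    <glosses><gloss>and</gloss></glosses>
    <grammatical-categories>
      <grammatical-category>wa/CONJ</grammatical-category>
    </grammatical-categories>
  </prefix>
</prefixes>
"""

# Well-formedness check only: expat parses the DTD but does not validate.
root = ET.fromstring(PREFIXES_XML.encode("utf-8"))
```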
> 
> Regarding :
> 
> <grammatical-categories>
>   <grammatical-category>wa/CONJ</grammatical-category>
> </grammatical-categories>
> 
> we may consider a
> 
> grammatical-categories|grammatical-category
> 
> content model. I don't know whether it's appropriate, though, since such a
> less verbose format may introduce unnecessary processing difficulties.
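The processing cost of a `(grammatical-categories | grammatical-category)` choice shows up in every reader, which has to handle both shapes. A minimal sketch, with a hypothetical helper name and a hypothetical DTD declaration in the comment:

```python
import xml.etree.ElementTree as ET

# Hypothetical content model allowing either shape:
# <!ELEMENT prefix (form, (grammatical-categories | grammatical-category))>


def grammatical_categories(prefix):
    """Return the category strings of a <prefix>, whichever shape it uses."""
    wrapper = prefix.find("grammatical-categories")
    if wrapper is not None:
        # Verbose shape: a wrapper with one or more children.
        items = wrapper.findall("grammatical-category")
    else:
        # Compact shape: a bare element directly under <prefix>.
        items = prefix.findall("grammatical-category")
    return [(item.text or "").strip() for item in items]
```

Keeping the wrapper mandatory would remove the second branch from every consumer, which is the trade-off raised above.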
> 
> Well, using an XML file format would greatly help us provide a Web
> interface for adding new words to the dictionaries.
> 
> Cheers,
> 
> --
> Pierrick Brihaye, informaticien
> Service régional de l'Inventaire
> DRAC Bretagne
> mailto:address@hidden
> +33 (0)2 99 29 67 78
> 
> 
> _______________________________________________
> Aramorph-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/aramorph-users
> 


-- 
Regards,
Ahmed Saad



