Re: [Aramorph-users] XML tables
From: Ahmed El-dawy
Subject: Re: [Aramorph-users] XML tables
Date: Thu, 16 Jun 2005 15:31:03 +0300
Hello,
> However, we could go further by introducing <prefix-pos/gloss>,
> <stem-pos/gloss> and <suffix-pos/gloss>, which are *very* accurate in the
> stems dictionary. What do you think?
>
I don't get this point :(
Honestly, I don't know what POS is used for.
Could you please explain this point more clearly?
Please bear with me over the next few days, as I am finishing my graduation
project, a client/server messenger written in Java. I expect to have
completed it by 5/7/2005. After that, I will get back to working with you
on AraMorph.
Thanks very much
_____________________________________________________________
On 6/15/05, Pierrick Brihaye <address@hidden> wrote:
> Hi again,
>
> Ahmed El-dawy wrote:
>
> > Actually, I structured this part this way because it closely follows
> > the LDC sample.
>
> The problem is that people from the LDC know what they are talking about :-)
>
> > It also keeps the XML files small, if size is a
> > concern.
>
> You're right, but since we will ship the files in a JAR file (which is
> ZIP-compressed), the extra redundancy will largely compress away and the
> packaged size will be roughly the same.
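The compression argument can be checked with a few lines of Java. This is a rough sketch, assuming two hypothetical gloss layouts (a terse `<g>` element versus the verbose `<glosses>`/`<gloss>` nesting) filled with synthetic entries rather than real dictionary data. It compresses both with `java.util.zip.Deflater`, i.e. DEFLATE, the method ZIP/JAR files use for compressed entries, and prints raw and compressed sizes; the verbose layout's extra tags should cost far less once compressed, though real figures depend on the actual files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CompressionCheck {

    // Builds n synthetic gloss entries in one of two hypothetical layouts.
    static String entries(boolean verbose, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            String gloss = "gloss" + i;
            if (verbose) {
                sb.append("<glosses><gloss>").append(gloss).append("</gloss></glosses>\n");
            } else {
                sb.append("<g>").append(gloss).append("</g>\n");
            }
        }
        return sb.toString();
    }

    // Size in bytes of the DEFLATE-compressed UTF-8 form of text.
    static int deflatedSize(String text) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.size();
    }

    public static void main(String[] args) {
        String compact = entries(false, 2000);
        String verbose = entries(true, 2000);
        System.out.printf("raw:        compact=%d verbose=%d%n", compact.length(), verbose.length());
        System.out.printf("compressed: compact=%d verbose=%d%n",
                deflatedSize(compact), deflatedSize(verbose));
    }
}
```

The exact numbers vary with the data, so none are quoted here; the point to look for is that the verbose/compact ratio is much smaller after compression than before.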
>
> > Anyway, I've changed it to be more readable. Actually, I still
> > don't know what the word "pos" means :)
>
> POS = "part of speech".
>
> >>Could be changed to:
> >><glosses>
> >> <gloss>and</gloss>
> >> <gloss>by/with</gloss>
> >></glosses>
> >
> > Yes, you are right about this. I've changed it in the new version.
>
> Thank you. On this point, my implementation differs from the original
> one because I try to "shift" prefixes and suffixes from the stem
> definition (and may even generate words with a NO_STEM type, e.g. bihi,
> fyha...). Splitting the values in the XML files makes this process more
> obvious IMHO.
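One concrete benefit of separate `<gloss>` elements: a single slash-delimited string cannot tell "and" + "by/with" apart from three glosses "and", "by" and "with", whereas one element per gloss keeps each meaning intact for code that moves glosses between prefix, stem and suffix. A minimal Java sketch using the JDK's DOM API, assuming the `<glosses>`/`<gloss>` layout quoted above; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class GlossReader {

    // Returns the text of every <gloss> element, in document order.
    static List<String> readGlosses(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        // Exact name match: "gloss" does not match the "glosses" wrapper.
        NodeList nodes = doc.getElementsByTagName("gloss");
        List<String> glosses = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            glosses.add(nodes.item(i).getTextContent().trim());
        }
        return glosses;
    }

    public static void main(String[] args) {
        String xml = "<glosses><gloss>and</gloss><gloss>by/with</gloss></glosses>";
        System.out.println(readGlosses(xml)); // [and, by/with]
    }
}
```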
>
> However, we could go further by introducing <prefix-pos/gloss>,
> <stem-pos/gloss> and <suffix-pos/gloss>, which are *very* accurate in the
> stems dictionary. What do you think?
>
> >>And, of course, the Arabic words should be encoded... in Arabic.
> >
> > I will do it after generating the XML files, maybe in the same program that
> > converts the current dictionary to XML files.
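Converting the transliterated entries to Arabic script is a character-by-character mapping. A minimal Java sketch, assuming the standard Buckwalter transliteration table as commonly published (hamza forms, letters, tatweel, short vowels, tanween, shadda, sukun, superscript alef and alef wasla); the class name is hypothetical, and the table should be checked against the one AraMorph actually uses before relying on it. Characters outside the table, such as spaces or digits, pass through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

final class BuckwalterToArabic {

    // Buckwalter symbols and the Arabic code points they stand for, position by position.
    private static final String LATIN = "'|>&<}AbptvjHxd*rzs$SDTZEg_fqklmnhwYyFNKaui~o`{";
    private static final String ARABIC =
            "\u0621\u0622\u0623\u0624\u0625\u0626\u0627\u0628\u0629\u062A\u062B\u062C\u062D\u062E\u062F"
            + "\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637\u0638\u0639\u063A\u0640\u0641\u0642\u0643"
            + "\u0644\u0645\u0646\u0647\u0648\u0649\u064A\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652"
            + "\u0670\u0671";

    private static final Map<Character, Character> TABLE = new HashMap<>();

    static {
        if (LATIN.length() != ARABIC.length()) {
            throw new IllegalStateException("transliteration table is misaligned");
        }
        for (int i = 0; i < LATIN.length(); i++) {
            TABLE.put(LATIN.charAt(i), ARABIC.charAt(i));
        }
    }

    // Converts Buckwalter text to Arabic script; unmapped characters are kept as they are.
    static String convert(String translit) {
        StringBuilder sb = new StringBuilder(translit.length());
        for (char c : translit.toCharArray()) {
            sb.append(TABLE.getOrDefault(c, c));
        }
        return sb.toString();
    }
}
```

For example, `BuckwalterToArabic.convert("ktb")` should yield كتب (U+0643 U+062A U+0628).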
>
> It would be a good idea.
>
> > By the way, there's a
> > problem if we convert to XML using the transliteration. One of its symbols
> > is (>), which XML already uses to close tags. We
> > will have to transform it into &gt;.
>
> Correct. This shouldn't be a problem if the files are generated through
> a SAX parser that will handle this escaping automatically.
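Strictly speaking, in XML character data only `<` and `&` must be escaped (`>` needs escaping only inside the sequence `]]>`), and the Buckwalter scheme also uses `<` (alef with hamza below) and `&` (waw with hamza), so those two matter even more than `>`. The escaping is done by the serializer that consumes the SAX events, as in this minimal Java sketch built on the JDK's `SAXTransformerFactory`/`TransformerHandler`; the element name `stem` and the class name are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

final class SaxStemWriter {

    // Emits <name>text</name> through SAX events; the identity serializer
    // behind TransformerHandler escapes markup characters in the text itself.
    static String writeElement(String name, String text) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            handler.startDocument();
            handler.startElement("", name, name, new AttributesImpl());
            char[] chars = text.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement("", name, name);
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // "<ilY" (ila, "to") and "mu&min" (mu'min, "believer") in Buckwalter transliteration.
        System.out.println(writeElement("stem", "<ilY mu&min"));
    }
}
```

The printed element should contain `&lt;ilY` and `mu&amp;min`, with no manual escaping in the calling code.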
>
> >>Regarding the stems dictionary, the format has to be slightly different
> >>because we have additional information (see
> >>http://www.nongnu.org/aramorph/english/dictionaries.html) :
> >>
> >><root>ktb</root>
> >><lemmaID>katab-u_1</lemmaID>
> >>
> >>and, maybe, a "normalised" lemma
> >><lemma>katab</lemma>
> >>
> > I know that the lemma is given on the line starting with two
> > semicolons (;;), but what is this root?
>
> It's the line starting with ";---", e.g.:
>
> ;--- ktb
>
> We first have to check if this formalism is consistent throughout the
> stems dictionary...
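Checking whether the `;---` (root) and `;;` (lemma ID) comment convention is used consistently could start with a small reader that carries the most recent root and lemma ID down to each entry. A minimal Java sketch, assuming those two prefixes are each followed by a space, that any other line starting with `;` is a plain comment, and that a new root resets the current lemma ID; entry lines are kept raw. The normalisation from `katab-u_1` to `katab` (dropping everything from the first `-` or `_`) is a hypothetical rule for the proposed `<lemma>` element, not an established one. Entries that come out with a `null` root or lemma ID point at places where the convention breaks down.

```java
import java.util.ArrayList;
import java.util.List;

final class StemsDictionaryReader {

    // One dictionary entry plus the root and lemma ID in force when it was read.
    static final class Entry {
        final String root;
        final String lemmaId;
        final String lemma; // hypothetical "normalised" lemma
        final String line;  // the raw, unparsed entry line

        Entry(String root, String lemmaId, String lemma, String line) {
            this.root = root;
            this.lemmaId = lemmaId;
            this.lemma = lemma;
            this.line = line;
        }
    }

    // Hypothetical normalisation: "katab-u_1" -> "katab".
    static String normaliseLemma(String lemmaId) {
        int cut = lemmaId.length();
        for (char stop : new char[] {'-', '_'}) {
            int pos = lemmaId.indexOf(stop);
            if (pos >= 0) cut = Math.min(cut, pos);
        }
        return lemmaId.substring(0, cut);
    }

    static List<Entry> read(List<String> lines) {
        String root = null;
        String lemmaId = null;
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(";--- ")) {
                root = line.substring(5).trim();
                lemmaId = null; // a new root starts a new group of lemmas
            } else if (line.startsWith(";; ")) {
                lemmaId = line.substring(3).trim();
            } else if (!line.startsWith(";") && !line.trim().isEmpty()) {
                String lemma = lemmaId == null ? null : normaliseLemma(lemmaId);
                entries.add(new Entry(root, lemmaId, lemma, line));
            }
            // any other ';' comment or blank line is ignored
        }
        return entries;
    }
}
```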
>
> >>Regarding the compatibility tables, something like this would be nice:
> >>
> >
> > See the current version (attached) and tell me what you think.
>
> Since the DTDs are very short, you should embed them in the XML files.
> See http://www.thescarms.com/XML/DTDTutorial.asp.
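Embedding a DTD means putting its declarations in the internal subset of the `<!DOCTYPE>`, so a validating parser needs no external file. A minimal Java sketch using the JDK's DOM parser with DTD validation switched on; the sample document and its two-element DTD are hypothetical, modelled on the `<glosses>` layout discussed earlier in the thread. Validity errors are turned into exceptions, since by default a validating DOM parser only reports them to its error handler.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class EmbeddedDtd {

    // Hypothetical glosses file carrying its own DTD in the internal subset.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE glosses [\n"
            + "  <!ELEMENT glosses (gloss+)>\n"
            + "  <!ELEMENT gloss (#PCDATA)>\n"
            + "]>\n"
            + "<glosses><gloss>and</gloss><gloss>by/with</gloss></glosses>\n";

    // Parses xml with DTD validation; validity and well-formedness errors
    // are rethrown as IllegalArgumentException.
    static Document parseValidated(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid document: " + e.getMessage(), e);
        }
    }
}
```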
>
> Regarding :
>
> <grammatical-categories>
> <grammatical-category>wa/CONJ</grammatical-category>
> </grammatical-categories>
>
> we might consider a
>
> grammatical-categories|grammatical-category
>
> content model. I'm not sure it's a good fit, though, since such a less
> verbose format may introduce unnecessary processing difficulties.
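Reading the proposal as "an entry may carry either the `<grammatical-categories>` wrapper or a bare `<grammatical-category>`" (an interpretation; the message leaves the exact model open), a DOM-based reader can absorb both shapes because `getElementsByTagName` searches all descendants, while a streaming SAX reader would have to track both cases explicitly, which is the kind of extra processing the message warns about. A minimal Java sketch that also splits values such as `wa/CONJ` into form and tag, assuming the first `/` separates them; the `<entry>` root and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CategoryReader {

    // Returns {form, tag} pairs, e.g. {"wa", "CONJ"}, for every
    // <grammatical-category>, whether or not it sits inside a wrapper.
    static List<String[]> read(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("grammatical-category");
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String value = nodes.item(i).getTextContent().trim();
            int slash = value.indexOf('/');
            if (slash < 0) {
                throw new IllegalArgumentException("expected form/TAG, got: " + value);
            }
            pairs.add(new String[] {value.substring(0, slash), value.substring(slash + 1)});
        }
        return pairs;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```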
>
> Well, using an XML file format would greatly help us provide a Web
> interface for adding new words to the dictionaries.
>
> Cheers,
>
> --
> Pierrick Brihaye, informaticien
> Service régional de l'Inventaire
> DRAC Bretagne
> mailto:address@hidden
> +33 (0)2 99 29 67 78
>
>
> _______________________________________________
> Aramorph-users mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/aramorph-users
>
--
Regards,
Ahmed Saad