silpa-discuss

Re: [silpa-discuss] Kannada -English Dictionary bot launched


From: Santhosh Thottingal
Subject: Re: [silpa-discuss] Kannada -English Dictionary bot launched
Date: Mon, 22 Nov 2010 22:39:29 -0800
User-agent: Zoho Mail

---- On Mon, 22 Nov 2010 19:55:18 -0800 Praveen A  wrote ---- 
>Can we think of a plugin to mediawiki that will respond in dict 
>protocol? Mediawiki knows well about the content and instead of trying 
>to make sense at our end it would be better to send better response in 
>the first place. What do you think? 

It cannot be a plugin. Per the DICT protocol (RFC 2229), a dict server is a 
standalone server that listens on TCP port 2628 (configurable) and responds to 
requests.
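
To illustrate why this is a separate service rather than something living 
inside a wiki page request, here is a minimal sketch of a DICT-style listener 
in Python. It is not a complete RFC 2229 implementation: the ENTRIES table, 
the database name "wikt" and the banner text are assumptions made up for the 
example, and only DEFINE and QUIT are handled.

```python
import socketserver

DICT_PORT = 2628  # default DICT port assigned by RFC 2229

# Hypothetical in-memory data standing in for real dictionary content.
ENTRIES = {"gold": "A yellow precious metal."}


def handle_command(line, entries=ENTRIES):
    """Return the reply text for one client command line."""
    parts = line.strip().split()
    if len(parts) == 3 and parts[0].upper() == "DEFINE":
        word = parts[2].strip('"').lower()
        meaning = entries.get(word)
        if meaning is None:
            return "552 no match\r\n"
        return (
            "150 1 definitions retrieved\r\n"
            f'151 "{word}" wikt "Wiktionary"\r\n'
            f"{meaning}\r\n.\r\n"
            "250 ok\r\n"
        )
    if parts and parts[0].upper() == "QUIT":
        return "221 bye\r\n"
    return "500 unknown command\r\n"


class DictHandler(socketserver.StreamRequestHandler):
    """Sends a greeting, then answers one command per line until QUIT."""

    def handle(self):
        self.wfile.write(b"220 sketch dict server ready\r\n")
        for raw in self.rfile:
            reply = handle_command(raw.decode("utf-8", "replace"))
            self.wfile.write(reply.encode("utf-8"))
            if reply.startswith("221"):
                break


def serve(host="127.0.0.1", port=DICT_PORT):
    """Run the listener until interrupted."""
    with socketserver.TCPServer((host, port), DictHandler) as server:
        server.serve_forever()
```

Calling serve() starts the listener; a dict client pointed at that host and 
port could then send "DEFINE wikt gold".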

Wiktionary already provides an API for accessing the data.
Try this URL in your browser:
http://ml.wiktionary.org/w/api.php?action=parse&format=xml&prop=text|revid|displaytitle&callback=?&page=gold

This is an API call that returns the meaning of the word 'gold' in XML format. 
Look at the XML data and see how bad it is.

Also look at the same call against the English Wiktionary XML API:
http://en.wiktionary.org/w/api.php?action=parse&format=xml&prop=text|revid|displaytitle&callback=?&page=gold


The same request with format=json returns JSON:
http://ml.wiktionary.org/w/api.php?action=parse&format=json&prop=text|revid|displaytitle&callback=?&page=gold
But again the data is in bad shape and difficult to parse.

What is required is fixing this API. Instead of HTML markup in the API 
result, we need structured data. But that is not easy until Wiktionary defines 
meanings in a structured manner instead of allowing users to add data in any 
format. At present an entry more resembles a wiki page that talks about the 
meaning of the word. This needs to be redesigned if we want the large amount of 
valuable data present in the wiktionaries to be suitable for machine consumption.
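
For comparison, a sketch of the kind of structured record a client would 
rather receive. The Entry and Sense classes and their field names are 
hypothetical, chosen only for illustration; they are not an existing 
Wiktionary or MediaWiki schema, and the sample values are illustrative too.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Sense:
    pos: str                 # part-of-speech tag, e.g. "noun"
    gloss: str               # short definition text
    translations: dict = field(default_factory=dict)  # language code -> word


@dataclass
class Entry:
    word: str
    language: str
    senses: list = field(default_factory=list)


def to_json(entry):
    """Serialize an entry to JSON that a client can consume directly."""
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)


# Illustrative record, written by hand for this example.
entry = Entry(
    word="gold",
    language="en",
    senses=[Sense("noun", "A yellow precious metal.", {"ml": "സ്വർണ്ണം"})],
)

print(to_json(entry))
```

With a shape like this, a client reads the part of speech or a translation by 
key instead of guessing from HTML layout.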


Since there are wiktionaries in many languages, a cross-language lexicon is a 
great thing we could derive from this large repository of words. It could use 
an open lexicon standard such as http://www.olif.net/ . If the data is POS 
tagged, it will be a valuable resource for anybody working in machine 
translation or other NLP-related areas.


-Santhosh








