
Re: [DotGNU]GNU.RDF update 29-3-2003


From: Peter Minten
Subject: Re: [DotGNU]GNU.RDF update 29-3-2003
Date: Wed, 02 Apr 2003 19:49:58 +0200

Chris Smith wrote:
> 
> On Tuesday 01 Apr 2003 09:29, Peter Minten wrote:
> 
> > Good idea. Would the plugin system make it do the VRS thing?
> >
> > Now here's a cool idea: run the server as a DGEE plugin and the agents as
> > local webservices (note that this does not mean the agents have to be
> > programmed in C#). The agents could communicate with the DGEE through the
> > GNU.RDF API; an advantage of this system would be good integration of the
> > agents with the rest of DotGNU.
> 
> I think that is what I was getting at.  I hadn't necessarily considered that
> you'd write the 'agents' as webservices that run in one of the VMs.  The DGEE
> itself has (or will have soon) _internal_ webservices used for administrative
> purposes; these internal webservices are actually part of the DGEE
> infrastructure and are written in C (99% probability).

Internal webservices, cool :-). That would also be a good (and fast) option for
the agents.
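
To make that concrete, here's a very rough sketch of what an agent running as a
local webservice might look like. Every name in it (Triple, IRdfStore,
DiscoveryAgent) is made up for illustration; GNU.RDF doesn't define this API yet,
and the real thing would go through whatever interface the DGEE actually exposes.

  // Hypothetical sketch only: none of these types exist in GNU.RDF or the
  // DGEE today; they just illustrate the agent-as-local-webservice idea.
  using System.Collections.Generic;

  public class Triple
  {
      public string Subject;
      public string Predicate;
      public string Obj;

      public Triple(string subject, string predicate, string obj)
      {
          Subject = subject;
          Predicate = predicate;
          Obj = obj;
      }
  }

  // The store interface the DGEE would expose to local agents.
  public interface IRdfStore
  {
      IList<Triple> Query(string subject, string predicate, string obj);
  }

  // An agent running as an internal webservice: it takes a request,
  // queries the local store, and hands the matching triples back.
  public class DiscoveryAgent
  {
      private readonly IRdfStore store;

      public DiscoveryAgent(IRdfStore store)
      {
          this.store = store;
      }

      // "Tell me everything known about this subject."
      public IList<Triple> Discover(string subject)
      {
          return store.Query(subject, null, null);
      }
  }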

> You could write a DGEE module or two that make up your discovery server.
> They're just like the ServiceManager or ResourceManager of the DGEE.  They
> have a role to perform and are called when required - like when a request comes
> in for a 'Discovery Webservice', it is routed to your DGEE component instead of
> the VM pool.

Yeah.

> This next bit assumes that when you call the DGEE 'discovery' webservices it
> searches its LOCAL copy of the world's available 'stuff'... This assumption
> may be wrong, so read the following appropriately pls. :o)

90% of the queries sent to specialised servers are focussed; the other 10% could
simply be ignored by those servers. It's more of a hack than a real solution, but
it should keep the general traffic away from the discovery servers. The trick is
determining when a query is unfocussed (see the sketch below).

Thus for now it's better to have a specialised RDF server than a general one.
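
By "focussed" I mean something like this rough heuristic (just an illustration of
the idea, not a worked-out classifier): a query that names at least one concrete
URI can be routed to the specialised server that owns that resource; anything
else is treated as general traffic.

  // Sketch of one possible focussed/unfocussed test.  The rule used here
  // (a concrete URI somewhere in the query) is an assumption, not a spec.
  public static class QueryFilter
  {
      public static bool IsFocussed(string subject, string predicate, string obj)
      {
          // At least one concrete URI means the query can be routed to the
          // specialised server that owns that resource.
          return IsConcreteUri(subject) || IsConcreteUri(obj);
      }

      private static bool IsConcreteUri(string term)
      {
          return term != null &&
                 (term.StartsWith("http://") || term.StartsWith("urn:"));
      }
  }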

> Creating the collection of data that your discovery server searches against
> and serves to requestors needs to be done separately, by some automatic
> process or other.  This _can_ be integrated into the DGEE too but it is
> extremely important that you consider these two aspects of data capture
> *separately* if you want to do any of the above.
> 
> Data capture and searching must be just that.  Two independent steps:
> 1. discover what's out there and store that information.

This is a matter of metanodes. Metanodes are RDF servers that only contain
information about which external resource is backlinked to which other external
resource. Say, for example, I put up RDF data about how badly my government is
doing and link to their website; I can't expect them to link to my data, so it
will be hard to find. If I register the link from my data to the government site
at a metanode, it becomes much easier to find.

The idea here is that 90% of the objects (the things a triple points to) are
located on the same server as the triple itself, so those don't need to be
accessible from a central location. The only things you need to store centrally
are the actual links that lead interested people to your metadata, plus a short
description.
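
As a sketch of what registering such a backlink could look like (the class and
method names are made up, and a real metanode would of course keep this in its
RDF store rather than in memory):

  // Hypothetical metanode registry: it stores only the cross-server links
  // and a short description, never the data itself.
  using System.Collections.Generic;

  public class Backlink
  {
      public string FromResource;   // URI of my metadata, on my own server
      public string ToResource;     // URI of the external resource it links to
      public string Description;    // short human-readable description

      public Backlink(string from, string to, string description)
      {
          FromResource = from;
          ToResource = to;
          Description = description;
      }
  }

  public class Metanode
  {
      private readonly List<Backlink> links = new List<Backlink>();

      // Called when someone registers a link from their data to a resource
      // they do not control (e.g. the government site in the example above).
      public void Register(string from, string to, string description)
      {
          links.Add(new Backlink(from, to, description));
      }

      // "Who links to this page?" - the question the metanode exists to answer.
      public IList<Backlink> LinksTo(string target)
      {
          List<Backlink> result = new List<Backlink>();
          foreach (Backlink l in links)
              if (l.ToResource == target)
                  result.Add(l);
          return result;
      }
  }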

In my vision the metanodes are VRS servers that spend some of their power on
searching the Semantic Web using automatic update notification (something which
can be implemented as an agent and is the basis of news feeds in GNU.RDF).

> 2. Search against the stored information.

Easy. Searching against stored information is simply a matter of running some SQL
queries against the database.
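
For example, assuming the captured data ends up in a plain
triples(subject, predicate, object_) table (the schema and the parameter syntax
are assumptions, of course), a lookup is nothing more than:

  // Sketch: search = one parameterised SQL query against the stored triples.
  using System.Collections.Generic;
  using System.Data;

  public class TripleSearch
  {
      private readonly IDbConnection conn;

      public TripleSearch(IDbConnection conn)
      {
          this.conn = conn;
      }

      // Everything the store knows about a given subject URI.
      public IList<string[]> About(string subjectUri)
      {
          IDbCommand cmd = conn.CreateCommand();
          cmd.CommandText =
              "SELECT subject, predicate, object_ FROM triples WHERE subject = @s";
          IDbDataParameter p = cmd.CreateParameter();
          p.ParameterName = "@s";
          p.Value = subjectUri;
          cmd.Parameters.Add(p);

          IList<string[]> rows = new List<string[]>();
          using (IDataReader r = cmd.ExecuteReader())
          {
              while (r.Read())
                  rows.Add(new string[] { r.GetString(0), r.GetString(1), r.GetString(2) });
          }
          return rows;
      }
  }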

> .... if however, the called DGEE discovery service is supposed to go off and
> search the web for stuff, then that's a different matter altogether - and I
> don't know how (regardless of the DGEE) that would be achieved successfully
> and in a timely fashion anyway.

Me neither; this is the Linking Problem again. If there is enough metadata you
could simply walk down a path with no uncertainties: one node would point to the
next, and this could be done pretty fast with a binary protocol (that's why I
like the idea of a binary protocol). However, if you need to search for something
that has no direct path leading to it, the search boils down to a complete search
of all the RDF servers. Of course it would be possible to use some techniques to
reduce the search time, but even then it would be hard.
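
To show the difference: when every resource carries a link to the next one,
resolution is just a short chain of lookups. The two delegates below stand in for
"ask the resolver which server owns this URI" and "ask that server what this
resource points to"; both are hypothetical placeholders.

  // Sketch of the "walk down a path" case of the Linking Problem.
  using System;

  public class PathWalker
  {
      // Maps a resource URI to the server that hosts it.
      private readonly Func<string, string> resolveServer;
      // Asks a server for the resource a given resource links to (null = none).
      private readonly Func<string, string, string> fetchLink;

      public PathWalker(Func<string, string> resolveServer,
                        Func<string, string, string> fetchLink)
      {
          this.resolveServer = resolveServer;
          this.fetchLink = fetchLink;
      }

      // Follows links from start until target is reached, the path ends,
      // or maxHops is exceeded.  Without such a path you are back to
      // searching every RDF server there is.
      public bool PathExists(string start, string target, int maxHops)
      {
          string current = start;
          for (int hop = 0; hop < maxHops && current != null; hop++)
          {
              if (current == target)
                  return true;
              current = fetchLink(resolveServer(current), current);
          }
          return current == target;
      }
  }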

One promising solution to the linking problem is the VRS. The VRS could host
gigantic databases without any one person having total control; if enough people
participate in one or a few mega servers, the linking problem becomes much more
tractable. This means the communication inside the VRS must be very, very fast,
though. Btw, I don't just mean the metanode information, but also normal
information.

All in all my problem boils down to this: the more servers there are in the
Semantic Web, the worse the Linking Problem gets; but the fewer servers there
are, the more power the server owners have, and that's bad.

Still, the Linking Problem stays pesky, and I'm beginning to believe I brought it
on myself with the URI-based resolver system, which determines the server a
resource is on from the URI alone. This makes it hard to proxy things. It's the
fastest resolver I can think of, though; all the others are a whole lot slower.
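
For clarity, that resolver really is nothing more than reading the host out of
the resource URI, which is also exactly why proxying is awkward: the server is
baked into the identifier. The example URI below is made up.

  // Sketch of the URI-based resolver: one string operation, no lookup round trip.
  using System;

  public static class UriResolver
  {
      // For http://rdf.example.org/people#peter the owning server is
      // simply rdf.example.org.
      public static string ServerOf(string resourceUri)
      {
          Uri uri = new Uri(resourceUri);
          return uri.Host;
      }
  }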

I think I'll have to create a GNU.RDF.RFC system to work out all these
solutions to the Linking Problem ;-).

Greetings,

Peter



