
Re: [Gnumed-devel] Approaches to maintain clinical data uptime


From: Tim Churches
Subject: Re: [Gnumed-devel] Approaches to maintain clinical data uptime
Date: Mon, 01 May 2006 09:19:05 +1000
User-agent: Thunderbird 1.5.0.2 (Windows/20060308)

Syan Tan wrote:
> What are the principles behind the solutions to the network partition
> problem? The vector clock seems to imply that sites can get blocked
> waiting if a site which is more up to date has sent a message and then
> fails for a while. I was looking for links, and started reading one
> article found on Google, but there were so many buzzwords that my eyes
> glazed over, and I wondered if it wasn't just advertising for an
> academic :(
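The blocking described here comes from the delivery rule used with vector clocks for causal ordering: a site holds back an incoming update until every update it causally depends on has arrived. A minimal Python sketch, assuming every update is broadcast to all sites and sites are numbered 0 to n-1; the class names and example payloads are hypothetical, not taken from NetEpi or GNUmed:

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: int          # index of the originating site
    clock: list[int]     # sender's vector clock just after sending
    payload: str


class CausalSite:
    """A site that delivers broadcast updates only in causal order."""

    def __init__(self, n_sites: int):
        self.clock = [0] * n_sites   # updates delivered so far, per origin site
        self.held_back: list[Message] = []
        self.delivered: list[str] = []

    def _deliverable(self, m: Message) -> bool:
        # m must be the next update from its sender...
        if m.clock[m.sender] != self.clock[m.sender] + 1:
            return False
        # ...and everything the sender had seen from other sites must be here already.
        return all(m.clock[k] <= self.clock[k]
                   for k in range(len(self.clock)) if k != m.sender)

    def receive(self, m: Message) -> None:
        self.held_back.append(m)
        progress = True
        while progress:              # one delivery may unblock others
            progress = False
            for held in list(self.held_back):
                if self._deliverable(held):
                    self.held_back.remove(held)
                    self.clock[held.sender] = held.clock[held.sender]
                    self.delivered.append(held.payload)
                    progress = True


# Clinic 1 saw clinic 0's update before writing its own; clinic 2 gets them out of order.
m_a = Message(sender=0, clock=[1, 0, 0], payload="clinic 0: register patient")
m_b = Message(sender=1, clock=[1, 1, 0], payload="clinic 1: amend same patient")
site2 = CausalSite(n_sites=3)
site2.receive(m_b)
print(site2.delivered)   # [] : m_b waits for m_a
site2.receive(m_a)
print(site2.delivered)   # ['clinic 0: register patient', 'clinic 1: amend same patient']
```

If clinic 0 went offline after clinic 1 had seen m_a but before m_a reached clinic 2, m_b would stay held back. Epidemic protocols reduce this by letting any site that already holds m_a (here, clinic 1) pass it on, so delivery does not depend on the original sender coming back.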

We were taking advice from people who had implemented an epidemic
propagation algorithm and logical clocks for update sequencing in Python,
in a high-throughput environment but with fewer than ten nodes (though
with frequent network partitions). It worked well, apparently, but it was
an internal solution and could not be open-sourced. However, they felt
that it could be re-implemented from a clean slate, in Python. One issue
is scaling to a larger number of nodes, as a matrix of clock vectors
needs to be kept, and that grows as the square of the number of nodes,
obviously. I also read several papers, but that was several months ago
and I don't recall the exact details. The advice was that blocking of
updates was not a problem, but that there are still circumstances in
which human intervention is needed to decide on update ordering. Such
cases should be very few, and the computer will probably guess correctly
in them if you permit it to, but you can't be certain.
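The "human intervention" cases are updates whose vector clocks are concurrent: neither site had seen the other's change. A minimal Python sketch of the comparison, plus one possible deterministic guess for concurrent updates that is flagged for review. The `Update` type and the tie-break rule (larger clock sum, then higher site number) are hypothetical choices, not the ones used by the people Tim consulted:

```python
from dataclasses import dataclass
from enum import Enum


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    EQUAL = "equal"
    CONCURRENT = "concurrent"


def compare(a: tuple[int, ...], b: tuple[int, ...]) -> Order:
    """Causal relation between two vector timestamps of equal length."""
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return Order.EQUAL
    if a_le_b:
        return Order.BEFORE
    if b_le_a:
        return Order.AFTER
    return Order.CONCURRENT


@dataclass(frozen=True)
class Update:
    site: int
    clock: tuple[int, ...]
    value: str


def resolve(a: Update, b: Update) -> tuple[Update, bool]:
    """Return (winning update, needs_human_review)."""
    order = compare(a.clock, b.clock)
    if order is Order.BEFORE:
        return b, False
    if order in (Order.AFTER, Order.EQUAL):
        return a, False
    # Concurrent: every site applies the same arbitrary rule, so they all agree,
    # but the choice is only a guess and is flagged for a person to check.
    return max(a, b, key=lambda u: (sum(u.clock), u.site)), True


a = Update(site=0, clock=(2, 1, 0), value="address: 1 High St")
b = Update(site=1, clock=(1, 2, 0), value="address: 9 Low Rd")
c = Update(site=0, clock=(3, 2, 0), value="address: 5 New Ave")
winner, review = resolve(a, b)
print(winner.value, review)   # address: 9 Low Rd True
winner, review = resolve(b, c)
print(winner.value, review)   # address: 5 New Ave False
```

Each vector holds one counter per site. A matrix clock, in which each site also records what it believes every other site has seen, holds n * n counters: 100 for ten sites, but 90,000 for 300 sites.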

Hopefully we'll be revisiting all this in July - happy to discuss it
more then. The problem with such things is that they really do need to
interact with the user application to some degree, so they are difficult to
implement purely on the back-end (as Slony is), but by building them
into the user application they lose all generality. No-one has built
them into open source middleware either, because there is so much
diversity in open source (and commercial) middleware.

If we implement this, then it will probably be in the NetEpi ORM layer, or
just below it. That would prevent it from being immediately usable
elsewhere (eg GNUmed) but I dare say it could provide the basis of a
GNUmed implementation if that were desired. Anyway, if we get the chance
to implement it, we'll make it as general as possible, within time and
resource limits. It is a hard problem, it seems, if the very small
number of solutions available is any guide.

But then, perhaps we just worry too much about data integrity and update
serialisation/causality. I've seen lots of distributed apps in which such
considerations are just ignored.

Tim C

>
> On Sun Apr 30 17:32, Tim Churches sent:
>
> 
>     Syan Tan wrote:
>      > Couldn't you file a request for an academic replication system, like a
>      > gossip architecture system?
> 
>     Um, file a request with whom? Academics don't do anything without being
>     paid for it, these days.
> 
>      > BTW, I'm not quite clear about why Lamport clocks, as opposed to vector
>      > clocks, are used. A Lamport clock is just one sequence number per site,
>      > which is kept ordered whenever sites send messages to each other.
>      > Vector clocks are sequence numbers kept at every site about every site,
>      > so when messages are received, changes can be causally ordered between
>      > more than one other site. What sort of ordering is being aimed for in
>      > the NetEpi multi-site application, and why?
> 
>     Sorry - I said "some variation on Lamport clocks" by which I meant a
>     vector or logical clock, as you describe - they all grew out of the
>     original Lamport idea, I believe. Causal ordering is the aim. Multiple
>     flu clinics during a flu pandemic - a person may present to more than
>     one clinic, and clinics may have intermittent or unreliable connections.
> 
>     Tim C
> 
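The difference Syan raises can be shown with a single-counter Lamport clock. It guarantees that if event x causally precedes event y then L(x) < L(y), but not the converse, so timestamps alone cannot tell a causally later update from a merely concurrent one; that is why causal ordering needs vector clocks. A minimal Python sketch (the class name and clinic scenario are illustrative only):

```python
class LamportClock:
    """One logical counter per site."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Message receipt: move past the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


clinic_a, clinic_b = LamportClock(), LamportClock()
a1 = clinic_a.tick()         # clinic A registers a patient
a2 = clinic_a.tick()         # clinic A sends the record to clinic B
b1 = clinic_b.tick()         # clinic B records an unrelated visit, independently
b2 = clinic_b.receive(a2)    # clinic B receives A's record
print(a1, a2, b1, b2)        # 1 2 1 3
```

Here b1 < a2 even though the two events are concurrent; only the direction a2 -> b2 (2 < 3) is guaranteed by the clock.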
>      > On Sun Apr 30 9:06, Tim Churches sent:
>      >
>      > James Busser wrote:
>      > > On Apr 29, 2006, at 4:35 AM, Tim Churches wrote:
>      > >
>      > >> (I keep wondering whether we should have used an EAV pattern for storage
>      > >
>      > > Educated myself (just a bit) here
>      > >
>      > > http://www.health-itworld.com/newsitems/2006/march/03-22-06-news-hitw-dynamic-data
>      > > http://www.pubmedcentral.gov/articlerender.fcgi?artid=61439
>      > > https://tspace.library.utoronto.ca/handle/1807/4677
>      > > http://www.jamia.org/cgi/content/abstract/7/5/475
>      >
>      > Thanks - we have copies of the latter three papers but I hadn't seen the
>      > first article. Of course, PostgreSQL muddies the waters, because the way
>      > it works under the bonnet (hood, engine cover) is rather similar (but
>      > not identical) to the EAV model - but all that is hidden behind the SQL
>      > interface, which is not easy to bypass.
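For readers new to the EAV (entity-attribute-value) pattern discussed here: each fact is stored as one (entity, attribute, value) row instead of one column per attribute, so a new attribute needs no schema change, but a conventional one-row-per-entity view needs a pivot query, and values lose their native types. A minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table and attribute names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eav (entity INTEGER, attribute TEXT, value TEXT,"
    " PRIMARY KEY (entity, attribute))"
)
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "name", "J Smith"),
        (1, "temperature_c", "38.4"),   # stored as text: typing is lost
        (2, "name", "K Jones"),
        (2, "cough", "yes"),
    ],
)

# A brand-new attribute is just another row; no ALTER TABLE needed.
conn.execute("INSERT INTO eav VALUES (2, 'onset_date', '2006-04-28')")

# Pivot back to one row per entity for the attributes of interest.
pivot = conn.execute(
    """
    SELECT entity,
           MAX(CASE WHEN attribute = 'name'  THEN value END) AS name,
           MAX(CASE WHEN attribute = 'cough' THEN value END) AS cough
    FROM eav
    GROUP BY entity
    ORDER BY entity
    """
).fetchall()
print(pivot)   # [(1, 'J Smith', None), (2, 'K Jones', 'yes')]
```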
>      >
>      > We really wanted to use openEHR when we started in 2003 - openEHR can
>      > be seen as a very sophisticated metadata layer which can be used with
>      > an EAV-like back-end storage schema - but no openEHR storage engines
>      > were available then, and when I asked again earlier this year, there
>      > were still none available (as open source or closed source on a
>      > commercial basis) in a production-ready form.
>      >
>      > Anyway, plain old PostgreSQL tables work rather well, and are fast and
>      > reliable for large datasets - but we will need to build our own
>      > replication engine, I now think. What we really need is multi-master DB
>      > replication which can cope with slow and unreliable networks (hence it
>      > has to use asynchronous updates, not tightly-coupled synchronous updates
>      > such as multi-phase commits) and with frequent "network partition". If
>      > we are funded to do that, then we'll write it in Python, probably using
>      > a stochastic "epidemic" model for the data propagation algorithm and
>      > some variation on Lamport logical clocks for data synchronisation. It
>      > also needs to propagate schema changes. Hopefully we can make it
>      > sufficiently general that it might have utility for GNUmed, eg when a copy
>      > of a clinic database is taken away on a laptop for use in the field, eg
>      > at a nursing home or a satellite clinic, and network connection and
>      > synchronisation only occurs occasionally. However, we need the
>      > replication to scale to 200 to 300 sites. Interestingly, most of the
>      > commercial multi-master database replication products just gloss over
>      > the issue of data integrity, or leave it up to the application - but
>      > research in the 1990s showed that that is not good enough in more
>      > complex situations with more than a few master DB instances.
>      >
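The "epidemic" propagation mentioned above is usually described as anti-entropy gossip: in each round every site reconciles its copy with one randomly chosen peer, so an update spreads like an infection without any site having to reach every other site directly. A minimal Python simulation, assuming last-writer-wins versions of the form (logical time, origin site); the function names and versioning scheme are hypothetical:

```python
import random

# Hypothetical versioning: (logical time, origin site); larger tuple wins.
Version = tuple[int, int]
Store = dict[str, tuple[Version, str]]


def merge_into(dst: Store, src: Store) -> None:
    """Copy into dst every entry from src that is newer than dst's copy."""
    for key, (version, value) in src.items():
        if key not in dst or dst[key][0] < version:
            dst[key] = (version, value)


def gossip_round(stores: list[Store], rng: random.Random) -> None:
    """Each site does a push-pull exchange with one randomly chosen peer."""
    n = len(stores)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1                        # any peer except i itself
        merge_into(stores[i], stores[j])  # pull
        merge_into(stores[j], stores[i])  # push


def rounds_to_converge(n_sites: int, seed: int = 1) -> int:
    """Rounds until every site holds every site's local update."""
    rng = random.Random(seed)
    stores: list[Store] = [{} for _ in range(n_sites)]
    for s in range(n_sites):
        stores[s][f"record-{s}"] = ((1, s), f"entered at site {s}")
    rounds = 0
    while any(store != stores[0] for store in stores):
        gossip_round(stores, rng)
        rounds += 1
    return rounds


for n in (10, 300):
    print(n, "sites:", rounds_to_converge(n), "rounds")
```

In simulations like this the number of rounds typically grows roughly with the logarithm of the number of sites, which is what makes gossip attractive at 200 to 300 sites. Last-writer-wins keeps the sketch short, but it silently drops one of two concurrent updates, which is exactly the data-integrity shortcut criticised in the paragraph above.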
>      > >> - Slony would have worked with that..).
>      >
>      > There is a Slony-2 project, being done here in Sydney, but it is
>      > focussing on multi-master synchronous updates ie multiple servers in a
>      > single data centre, for load-balancing of write tasks as well as read
>      > tasks (for which Slony-1 can be used to facilitate load-balancing).
>      >
>      > Sorry to rave on, but don't let anyone tell you that there aren't some
>      > fundamental data management issues yet to be addressed by open source or
>      > commercial software.
>      >
>      > Tim C
>      >
>      > _______________________________________________
>      > Gnumed-devel mailing list
>      > address@hidden
>      > http://lists.gnu.org/mailman/listinfo/gnumed-devel
> 
> 