From: Karsten Hilbert
Subject: Re: Re: [Gnumed-devel] forms handling / NetEpi
Date: Mon, 25 Jun 2007 11:13:26 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

Tim,

thanks for your invaluable insights.

Let me try to get a few things cleared up in my mind:

> > Tim, what were your reasons to store the form definition as
> > an XML-based blob rather than in tables (roughly) like so:
...
> Easier extensibility of the form definitions/metadata,
I can see that. In other words: it seems safer to update
data in deployed databases than to require the database
schema itself to be modified, e.g. when one needs a new
"field" in the form metadata definition.

It *can* be done via ALTER TABLE, but I agree it sounds
safer to do an UPDATE ... WHERE ... instead.
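
Just to illustrate the difference I mean (table and column
names are made up):

        -- a schema change, needed when form metadata lives in real columns:
        ALTER TABLE form_def ADD COLUMN reporting_authority text;

        -- versus a plain data change when the metadata lives in a blob:
        UPDATE form_def
                SET definition_xml = '...XML with the extra field added...'
                WHERE name = 'some_form';

The latter only touches data and thus leaves the deployed
schema alone.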

> and
> the form definitions need to be serialised anyway in order
> to share them around.
True. I was thinking of using

        COPY (SELECT ...) TO ...

to create singled-out dumps for sharing forms (these days
COPY is able to dump the result of a SELECT statement :-)
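
Something along these lines, say (table and column names
invented again):

        COPY (
                SELECT name, version, definition_xml
                FROM form_def
                WHERE name = 'some_form'
        ) TO STDOUT;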

> > One disadvantage for this is that the table might become
> > large and thus slower. Which could be alleviated by using
> > constraint-based partitioning (eg by age) and putting the
> > partitions onto more spindles.
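
(As an aside: with current PostgreSQL such constraint-based
partitioning would look roughly like the sketch below, table
and column names invented, the parent table holding one row
per field value:

        CREATE TABLE form_value_2007 (
                CHECK (entered >= '2007-01-01' AND entered < '2008-01-01')
        ) INHERITS (form_value);

        CREATE TABLE form_value_2008 (
                CHECK (entered >= '2008-01-01' AND entered < '2009-01-01')
        ) INHERITS (form_value);

With constraint_exclusion enabled the planner can then skip
partitions whose CHECK constraint rules them out for a given
query.)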

Let me rephrase this:

You store form data in a table per form because EAV for
all-form-data-in-one-table tested too slow?

If testing had NOT shown the approach to be too slow at the
time, would you have chosen it over the approach you are
taking now?

Did you test the explicit-table-for-form-definitions
approach, too? If so, I assume that wasn't a bottleneck?
(Apart from the point that "parsing XML into Python objects
was faster than parsing a SELECT result into Python
objects".)

> We tested your approach, which is a form of the well-known
> EAV (entity-attribute-value) approach,
Correct.
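
(For readers of the archive: a generic EAV layout of that
kind boils down to roughly the following, table and column
names invented:

        CREATE TABLE form_instance (
                pk serial PRIMARY KEY,
                fk_form_def integer NOT NULL,             -- which form definition
                entered timestamp NOT NULL DEFAULT now()
        );

        CREATE TABLE form_value (
                pk serial PRIMARY KEY,
                fk_instance integer NOT NULL REFERENCES form_instance(pk),
                field_name text NOT NULL,                 -- the "attribute"
                value text                                -- the "value", untyped
        );

One row per filled-in field, which is where the very large
row counts come from.)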

> and found that performance was lacking at around the
> 500,000 record mark with about ten forms of 50 fields each,
> i.e. about 250 million rows in the values tables.
Let's see:

We see about 150 patients per day. Let's say 200. Each
receives, on average, one form printout to go. Let's say 2.

That's 400 form instances per day.

Let's assume each form has 50 fields (rather high for the
average form I deal with). That's 20000 form field values
per day.

That'd make 8 000 000 form field values per year, assuming
400 days in a year.

Or 80 000 000 form field values over 10 years. Let's double
that to allow for an increase in bureaucracy: 160 million.
Double again just for good measure: 320 million.
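
Or, as a quick back-of-the-envelope check in SQL:

        SELECT 200 * 2                  AS forms_per_day,     --        400
               200 * 2 * 50             AS values_per_day,    --     20 000
               200 * 2 * 50 * 400       AS values_per_year,   --  8 000 000
               200 * 2 * 50 * 400 * 10  AS values_per_decade; -- 80 000 000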

Nothing which PostgreSQL cannot handle these days, judging
from what people shove around on the PG mailing list. Even
the deliberately padded figure is merely of the same order
as the 250 million rows you were benchmarking back in 2003
with a then-current PostgreSQL, while the realistic estimate
stays well below it, and it accrues over a 10-year period
which will see further increases in hardware and software
speed.

Still really good to be able to learn from your numbers.

BTW, one reason we don't reach your numbers is that GNUmed
isn't an epidemiologist's tool, whereas NetEpi decidedly is.

> If we were starting today, we would use openEHR
> archetypes or a variation thereof, but we'd need to spend
> 6-12 months building the engine ourselves.
If I had the manpower I'd go that route, too, yes.

> > > Data from the table for the previous versions of the form
> > > are rolled forward into the table for the new version of the form when
> > > it is deployed.
...
> There is a need for custom transformations in some rare
> roll-forward instances, which have to be written in Python
> outside the Web environment.
Sure, that's to be expected. It could likely be done via
dedicated stored procedures, perhaps written in pl/Python?
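
A sketch of what I have in mind, with purely made-up
function, table and field names:

        CREATE OR REPLACE FUNCTION roll_forward_form_v1_to_v2()
        RETURNS void AS $$
                # hypothetical sketch: copy rows from the old form table
                # into the new one, renaming one field along the way
                plan = plpy.prepare(
                        "INSERT INTO form_v2 (pk, new_field) VALUES ($1, $2)",
                        ["integer", "text"])
                for row in plpy.execute("SELECT pk, old_field FROM form_v1"):
                        plpy.execute(plan, [row["pk"], row["old_field"]])
        $$ LANGUAGE plpythonu;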

Thanks,
Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
