gnumed-devel

[Gnumed-devel] Abstraction layer performance costs


From: Hilmar Berger
Subject: [Gnumed-devel] Abstraction layer performance costs
Date: Wed, 16 Oct 2002 19:33:41 +0200 (CEST)

On Mon, 14 Oct 2002, Horst Herb wrote:

> Hilmar Berger wrote:
> 
> > if we have to do all backend communication 'by hand'. Hiding
> > implementation features in hierarchies of objects has certainly a
> > perfomance cost but simplify coding a lot. If, however, the costs to
> 
> This was my initial thought. However, considering the time I spent 
> designing abstraction layers I must say that in the same time I would 
> have written the straightforward solutions twice.
Hm, surely you have to spend time before harvesting the fruits, but that is
true for every 'invention' we wouldn't want to be without today.
> The code didn't get simpler either - the opposite is the case.
At the start the benefits might be small, but as the project grows and more
features are implemented, simple interfaces pay off. Not to forget new
members of the development team, who would benefit from a shallow learning
curve.

> The main problem is our high degree of normalisation and our wish to 
> remain language independend on the backend.
> The normalisation makes it almost impossible to autocreate the python 
> objects from backend tables; putting rules into tables regarding this 
> would make the backend structure even more awkward
> Implementing persistent python objects in a straightforward way on the 
> backend side makes it difficult to access the data from other languages 
> (like web clients)
I wasn't thinking of translating queries into native Python objects (it
should be possible, though; pyPgSQL has done something similar). I have
written an object that just holds named queries which return their results
as dictionaries. The queries are initialized from a config file/table and
carry information about their input and output variables as well as the
query string itself. This design could come in handy once there are many
different databases that should map to a specific backend. It is not
completely finished, but it works well and will be connected to the
frontend within the next few days.
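A minimal sketch of such a named-query object, using the Python DB-API with sqlite3 as a stand-in backend (the `QUERIES` registry, the `NamedQueryStore` class, and the table/column names are hypothetical illustrations; the actual GNUmed DBObject differs):

```python
import sqlite3

# Hypothetical registry standing in for the config file/table the queries
# are initialized from; each entry names its input variables and holds
# the query string itself.
QUERIES = {
    "drugs_by_atc": {
        "sql": "SELECT brandname FROM praeparate WHERE atc_code LIKE :prefix",
        "inputs": ["prefix"],
    },
}

class NamedQueryStore:
    """Holds named queries and returns their results as dictionaries."""

    def __init__(self, conn, queries):
        self.conn = conn
        self.queries = queries

    def run(self, name, **params):
        query = self.queries[name]
        cur = self.conn.cursor()
        cur.execute(query["sql"], params)
        cols = [d[0] for d in cur.description]
        # the result->dict step, timed separately in the measurements below
        return [dict(zip(cols, row)) for row in cur.fetchall()]

# usage: build a toy table and run a named query
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE praeparate (brandname TEXT, atc_code TEXT)")
conn.executemany("INSERT INTO praeparate VALUES (?, ?)",
                 [("Adalat", "C08CA05"), ("Catapresan", "C02AC01")])
store = NamedQueryStore(conn, QUERIES)
rows = store.run("drugs_by_atc", prefix="C02A%")
print(rows)  # [{'brandname': 'Catapresan'}]
```

The frontend only ever sees query names and dictionaries, so the backend schema can change without touching client code.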

I did some measurements that suggest:
1. The overhead of reading data through my interface vs. PG directly is
small if the query returns a small number of rows, and reaches at most
about 20% (depending on the number of rows fetched).
2. pyPgSQL is much slower on large queries than pgdb.
3. One of the greatest performance hits was due to logging the result to
the log file in DBObject :)

Results

1. Query: select brandname from amis_praeparate WHERE atc_code ~ '^C02A' 
(a query with a regular expression resulting in 73 rows; pgdb and
pyPgSQL are almost equal in terms of speed)

time elapsed in accessing query via PG directly: 0.961644053459

Query (cursor.execute +fetchall through DBObject) takes:  1.13282394409
Abstraction layer (result->dict): 0.00261998176575
time elapsed in accessing query 1: 1.14686703682
-----------------------------------------------------------------

2. A large query
select brandname from amis_praeparate LIMIT 20000
pgDB:
time elapsed in accessing results via PG directly: 4.30877006054

through abstraction layer:
Query takes:  4.66334688663
result->dict takes: 0.879739999771
total time elapsed in accessing query: 5.64783298969

Now using pyPgSQL:
time elapsed in accessing query via PG directly: 12.9802569151 (about 3
times pgdb)

through abstraction layer:
Query takes:  13.982041955
Fetching takes: 2.91963100433
total time elapsed in accessing query: 17.1527969837

(The query time differs between subsequent runs. With pyPgSQL,
listonly mode was used.)
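The kind of timing harness behind numbers like these can be sketched as follows. sqlite3 stands in for pgdb/pyPgSQL and the data is synthetic, so the absolute times mean nothing; only the split between the raw fetch and the result->dict conversion is illustrative:

```python
import sqlite3
import time

# synthetic table with 20000 rows, standing in for amis_praeparate
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE praeparate (brandname TEXT)")
conn.executemany("INSERT INTO praeparate VALUES (?)",
                 [("drug%d" % i,) for i in range(20000)])

cur = conn.cursor()

# direct access: execute + fetchall only
t0 = time.perf_counter()
cur.execute("SELECT brandname FROM praeparate LIMIT 20000")
rows = cur.fetchall()
t_query = time.perf_counter() - t0

# abstraction-layer overhead: converting each row tuple to a dict
t0 = time.perf_counter()
cols = [d[0] for d in cur.description]
dicts = [dict(zip(cols, r)) for r in rows]
t_convert = time.perf_counter() - t0

print("query: %.6f s, result->dict: %.6f s, total: %.6f s"
      % (t_query, t_convert, t_query + t_convert))
```

As in the measurements above, the conversion cost only becomes visible once many rows are fetched.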

> individual performance penalties are small, but they do add up.
> 0.001 seconds is nothing, but 1000 times 0.001 seconds is a pain.
I'm convinced that queries and code can be optimized so that these
performance penalties are minimized and the user won't notice the
difference. And usually the user won't access huge lists of data as I
have done here. Even then, one could think about special methods of
finding an optimal index.
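One such optimization can be sketched with sqlite3 (table, column, and index names are hypothetical): an index on the searched column lets the planner answer a prefix search, such as the ATC-code query above rewritten as a range scan, without a full table scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE praeparate (brandname TEXT, atc_code TEXT)")
# index on the column the regex/prefix search filters on
conn.execute("CREATE INDEX idx_atc ON praeparate (atc_code)")

# the prefix match ^C02A expressed as a range the index can serve
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT brandname FROM praeparate "
    "WHERE atc_code >= 'C02A' AND atc_code < 'C02B'"
).fetchall()
print(plan)  # the plan should mention a SEARCH using idx_atc
```

Whether PostgreSQL can use an index for a given regex depends on the pattern and collation, so this is only the general idea, not a claim about the amis_praeparate queries above.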

> That said - I am open for suggestions and would welcome whoever proves 
> me wrong.
I would suggest staying with the abstraction layers we already have and
waiting until we know for certain that there is no alternative to direct
PG access.

> Horst
> 
> 
> 
> _______________________________________________
> Gnumed-devel mailing list
> address@hidden
> http://mail.gnu.org/mailman/listinfo/gnumed-devel
> 




