Re: [Gnumed-devel] phrase usage scoring
From: richard terry
Subject: Re: [Gnumed-devel] phrase usage scoring
Date: Fri, 19 Sep 2003 08:31:42 +1000
User-agent: KMail/1.5
On Fri, 19 Sep 2003 12:03 am, Karsten Hilbert wrote:
> > Incremented the counter. Discuss the pitfalls however.
>
> Well, the obvious pitfalls are that
>
> a) a simple integer field will overflow eventually
Yes, at what actual number, just out of interest?
I just checked my database. The top-weighted field I found was about 700
(that's after nearly 6 years of daily use); most were pretty low. I'd hazard a
guess that I will be well and truly dead, and gnuMEd will be supplanted by
artificial intelligence, before weighting on an incremental counter fucks your
database!
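For a rough sense of scale, here is a small sketch of the arithmetic. It
assumes the counter is a 32-bit signed integer (as a PostgreSQL `integer`
column would be); the thread does not say which type is actually used, and
the function name is made up for illustration.

```python
# Assumption: the score is stored as a 32-bit signed integer.
INT32_MAX = 2**31 - 1  # 2147483647


def years_until_overflow(current_score, years_in_use, limit=INT32_MAX):
    """Estimate the years left before the counter reaches `limit`,
    assuming the phrase keeps being used at the same average rate."""
    uses_per_year = current_score / years_in_use
    return (limit - current_score) / uses_per_year


# Figures from this post: a top score of about 700 after about 6 years.
print(f"{years_until_overflow(700, 6):,.0f} years")
```

At that rate the limit is on the order of eighteen million years away, so
overflow is not a practical concern for a per-user counter like this one.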
> b) terms that are used often will have astronomically high scores
>
This discrepancy in the gaps between scores seems not to matter in practice,
for the following reasons.
Let's say you are prescribing a drug, amoxycillin, and use it for many
conditions. I just checked my pop-up list and I've used it for a total of
only 19 different conditions since 1997; of these, when one takes synonyms
out (e.g. middle ear infection, acute otitis media, otitis media; gum
infection, gingivitis; etc.), there are very few.
Even at the maximum number there is very little scrolling down the list, as
the terms we use commonly will always be in the top several on the list. With
other drugs such as beta-blockers, with narrower indications, the list is even
shorter; e.g. my beta-blocker list contains just three items.
So even if I used amoxycillin for, say, acute otitis media and its cumulative
score was thousands and thousands, and the next one on the list had a score
of 200, they are still in the same relative order. It is only if you
change your prescribing habits or phrases that you get into trouble, and
for that one needs, as I've mentioned in the docs, a mechanism to manually
re-weight or shuffle the lists. I've only ever had to do this once in the
last 6 years (I just edited my database), so you can see how rare this event
is.
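To make the point concrete, here is a minimal sketch of a plain incremental
counter per phrase, with a manual override for the rare re-weighting case.
The function names, the dictionary layout and the sample scores are all
hypothetical; this is not the actual database schema.

```python
def record_use(scores, phrase):
    """Increment the usage counter for a phrase (new phrases start at 0)."""
    scores[phrase] = scores.get(phrase, 0) + 1


def ranked(scores):
    """Return phrases ordered by score, highest first (ties alphabetical)."""
    return sorted(scores, key=lambda phrase: (-scores[phrase], phrase))


def set_weight(scores, phrase, new_score):
    """Manual re-weighting, for when prescribing habits change."""
    scores[phrase] = new_score


# Hypothetical indication list for one drug.
amoxycillin = {"acute otitis media": 5000, "sinusitis": 200, "gingivitis": 40}
record_use(amoxycillin, "sinusitis")
print(ranked(amoxycillin))

# A much smaller gap at the top gives exactly the same pop-up order.
narrow_gap = {"acute otitis media": 202, "sinusitis": 201, "gingivitis": 40}
print(ranked(narrow_gap) == ranked(amoxycillin))

# Promote a phrase by hand after a change in habits.
set_weight(amoxycillin, "gingivitis", 6000)
print(ranked(amoxycillin)[0])
```

Only the ordering reaches the pop-up list, so the size of the gap between the
top score and the next one has no visible effect until habits change.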
> Simple percented score increases will not work as they make
> all terms asymptotically reach the same weighting unless some
> sort of percentage of sum of all scores is taken into account
> which is prohibitive in terms of speed.
>
> I am not sure I see a Good solution currently.
>
> Karsten