Re: [Gnumed-devel] re: Soundex

From:	J Busser
Subject:	Re: [Gnumed-devel] re: Soundex
Date:	Sat, 21 Aug 2004 17:57:48 -0700

At 9:15 AM +1000 8/22/04, Tim Churches wrote:

...another alternative is to use a technique we have dubbed
"n-gram indexes" (since we developed the method for our record linkage
project). We still haven't written a definitive paper on it, but it is
implemented in the Febrl software and described in the manual, and there
is a paper describing its relative retrieval performance - see
http://datamining.anu.edu.au/publications/2003/kdd03-6pages.pdf

I plan to work on an improved implementation of this technique (in
Python of course) over the next several months for use in our public
health data collection systems (where case/patient look-up and
deduplication is vital, but where we have hundreds of thousands or
millions of records) - when this work is complete you might want to
evaluate it for use in GNUmed. It might be overkill for general practice
databases with a few thousand patients, but the technique is
conceptually simple and elegant and, unlike the phonetic indexing
functions, makes no assumptions about name or string morphology and
phonetics - thus it works equally well with alphabetic names from any
culture, including Pinyin Chinese names. It takes a set-theoretic
approach, and the faster, built-in set data type in Python 2.4 improves
its speed considerably.
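
To make the set-theoretic idea concrete, here is a minimal sketch in
present-day Python 3. It is not the Febrl implementation and may differ
from it in important ways. The names ngrams and NGramIndex, the choice of
bigrams, the space padding, and the Jaccard threshold of 0.5 are all
assumptions picked for illustration only.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Return the set of character n-grams of a lower-cased string.

    The string is padded with spaces so that the first and last
    characters also contribute n-grams (an illustrative choice).
    """
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class NGramIndex:
    """Hypothetical inverted index from n-grams to record ids."""

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of records containing it
        self.grams = {}                   # record id -> n-gram set of its name

    def add(self, record_id, name):
        grams = ngrams(name, self.n)
        self.grams[record_id] = grams
        for gram in grams:
            self.postings[gram].add(record_id)

    def search(self, query, threshold=0.5):
        """Return ids whose names have Jaccard similarity >= threshold."""
        q = ngrams(query, self.n)
        # Candidates are the union of the posting sets of the query's n-grams.
        candidates = set()
        for gram in q:
            candidates |= self.postings.get(gram, set())
        scored = []
        for rid in candidates:
            r = self.grams[rid]
            score = len(q & r) / len(q | r)
            if score >= threshold:
                scored.append((score, rid))
        # Best match first; ties broken by record id.
        return [rid for score, rid in sorted(scored, key=lambda t: (-t[0], t[1]))]


index = NGramIndex()
for rid, name in enumerate(["Churches", "Church", "Hilbert", "Busser"], start=1):
    index.add(rid, name)
print(index.search("Churchs"))  # misspelled query; prints [1, 2]
```

Nothing here depends on how a name is pronounced: matching is purely a
comparison of character sets, which is why the same code would apply to
romanised names from any language.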


This sounds really interesting, look forward to your progress.
