[Gnumed-devel] Re: dbgen import csv scripts for gnumed.
From: sjtan
Subject: [Gnumed-devel] Re: dbgen import csv scripts for gnumed.
Date: Fri, 27 Aug 2004 00:59:55 +1000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040616
# run in the dbgen directory
csv='../dbgen/dataset1.csv'
So that seems to be FEBRL stuff?
Yes. The aim was to see whether gnumed can interface with febrl in the
dedup use case.
The good news is that it wasn't too hard.
The steps to run a febrl dbgen test dataset with gnumed were:
1. write script(s) to import a dbgen csv file into gnumed
(correct_postcodes.py, csv_to_gnumed.py)
2. write scripts to read gnumed data into febrl
a) write a convenient sql view (v_febrl_demo_read_au.sql)
b) either
i) read the view contents into a csv file (read_v_febrl_demo.py),
and then modify the csv filename in a copy of febrl/project-deduplicate.py
or
ii) write a pgdb adaptation of DataSetSQL in febrl/dataset.py,
and use a copy of febrl/project-deduplicate.py, modifying the indata
structure to read
the modified DataSet (DataSetPGSQL)
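
A rough sketch of option i), dumping a view to csv through a DB-API 2.0
connection. This is not the actual read_v_febrl_demo.py. The function name
dump_view_to_csv, the table and the column names are made up for
illustration, the view name is only assumed from the .sql filename, and
sqlite3 stands in for PostgreSQL so the example runs without a gnumed
database.

```python
import csv
import io
import sqlite3


def dump_view_to_csv(conn, view_name, out):
    """Write every row of view_name to out as csv, header row first.

    Hypothetical helper: conn is any DB-API 2.0 connection, out is a
    writable text file object. Returns the number of data rows written.
    """
    cur = conn.cursor()
    # view_name is pasted into the SQL, so it must come from trusted code
    cur.execute("SELECT * FROM " + view_name)
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description has one entry per column; item 0 is the column name
    writer.writerow([col[0] for col in cur.description])
    count = 0
    for row in cur.fetchall():
        writer.writerow(row)
        count += 1
    return count


def demo():
    """Build a throwaway database with an assumed view and dump it."""
    conn = sqlite3.connect(":memory:")
    # made-up table and columns, only there to give the view some content
    conn.execute(
        "CREATE TABLE person (given_name TEXT, surname TEXT, postcode TEXT)"
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [("anne", "smith", "2000"), ("ann", "smith", "2000")],
    )
    conn.execute(
        "CREATE VIEW v_febrl_demo_read_au AS "
        "SELECT given_name, surname, postcode FROM person"
    )
    out = io.StringIO()
    n = dump_view_to_csv(conn, "v_febrl_demo_read_au", out)
    conn.close()
    return n, out.getvalue()
```

With a real gnumed database the same function would presumably take a
pgdb connection and an open file instead of the sqlite3 connection and
StringIO used in demo().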
Both approaches worked, although ii) uncovered a bug in the
read_records(self, start, number) method of febrl/dataset.py:
self.next_record_num is incremented by number (the number of records
to read per batch) outside the batch processing loop, when it should
be incremented with self.next_record_num += 1 inside the loop.
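
To illustrate the counter fix, here is a toy reader, not febrl's code.
The class BatchReader and its in-memory rows list are invented for the
example; only the read_records(start, number) signature and the
next_record_num attribute come from the description above.

```python
class BatchReader:
    """Toy stand-in for a dataset with a read_records(start, number) method."""

    def __init__(self, rows):
        self.rows = rows            # hypothetical in-memory record store
        self.next_record_num = 0    # index of the next record to hand out

    def read_records(self, start, number):
        """Return up to `number` records beginning at index `start`."""
        self.next_record_num = start
        batch = []
        for _ in range(number):
            if self.next_record_num >= len(self.rows):
                break  # fewer than `number` records were left
            batch.append(self.rows[self.next_record_num])
            # advance once per record actually read, inside the loop
            self.next_record_num += 1
        return batch
```

In this toy version, one visible effect of counting inside the loop is
that a short final batch leaves next_record_num at the number of records
really read, rather than start + number.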
I've put the modified dataset.py and datasetTest.py from febrl, as well
as the gnumed scripts for febrl input and output, in test-area/febrl.