On Wed, Jul 28, 2010 at 11:57:03AM -0700, Jim Busser wrote:
1) the molecule name might not match what other countries use
even in the same language (e.g. maybe they kept their own
historical molecule names instead of updating data files to INN)
but
2) within any one country data source, each molecule likely has a
distinct name
do we have agreement on the above?
fine with me
and so what about a table
ATC_WITH_NAMES
with columns
ATC
INN_EN
INN_FR
INN_DE
...
MOL_NM_CA (which may be the same or different from IN_EN)
MOL_NM_US (which may be the same or different from IN_EN)
do we have agreement that the above has may some value in keeping
together the data (at most, one record per ATC)?
A proper database structure would be:
table atc
pk integer
code varchar
who_name varchar
(it can be discussed whether the official who_name should be
treated as the english INN verbatim or whether it should be
considered just a a textual rendering of the code -
pragmatically the former approach is likely rather safe)
table atc_translation
pk integer
fk_atc integer references (atc.pk)
language_code varchar
inn varchar
molecule_name varchar (null if not existant or equal to inn)
Again, this way one does not need to change the database
structure (add a column) when a new translation becomes
available.
BTW, is there a page on the freediams website which defines
the relevant terms and abbreviations (molecule, SPC, INN,
ATC) ? This would help tremendously in making sure everyone
is talking about the same thing.