[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gnumed-devel] GNUmed upgrade preparation alert
From: |
Jim Busser |
Subject: |
Re: [Gnumed-devel] GNUmed upgrade preparation alert |
Date: |
Thu, 13 Oct 2011 09:40:22 -0700 |
On 2011-10-13, at 7:37 AM, Karsten Hilbert wrote:
> The next GNUmed release will tighten data sanity checks
> regarding demographics reference data such that it will, for
> example, no longer be possible to simultaneously store:
>
> Berlin
> BERLIN
>
> Tokyo
> TOKYO
> (or tOkYo, for that matter)
>
> The database upgrade v16 -> v17 will fail if there are any
> duplicates among each of:
>
> streets
> communities
> regions
>
> This GNUmed cannot know the capitalization you wish to keep
> in your database it cannot weed out duplicates for you. Thus
> it is necessary to check for duplicates yourself before
> attempting to upgrade (if you don't and the upgrade fails -
> no worry, just fix the duplicates and re-run the upgrade).
>
> To look for duplicates run these queries:
>
> Regions:
>
> SELECT * from dem.state where id not in (
> select max(s.id) from dem.state s group by lower(s.name),
> s.code, s.country
> );
>
> Communities:
>
> SELECT * from dem.urb where id not in (
> select max(u.id) from dem.urb u group by lower(u.name),
> lower(u.postcode), u.id_state
> );
>
> Streets:
>
> SELECT * from dem.street where id not in (
> select max(st.id) from dem.street group by lower(st.name),
> lower(st.postcode), st.id_urb
> );
>
> Then deduplicate until no more rows are returned.
Thanks, the above was really helpful to explain a subquery problem I am having
to try to be future-compatible, but contains a bug. The third query above is
missing the alias in 'from dem.street group by ... '
--> it needs 'st' after dem.street
FWIW, what gets returned by the above queries is *just* the excess rows -- the
selection of which was influenced by the sequence of creation among duplicates.
To better see each duplicate in the context of the duplication, a select like
the following – for streets – might be considered:
select id_urb, name, postcode from dem.street
where id_urb || lower(name) || lower(postcode) in (
select id_urb || lower(name) || lower(postcode) from dem.street
group by id_urb || lower(name) || lower(postcode)
having count(*) > 1
)
order by id_urb, lower(name), lower (postcode)
;
see appended for a sample of what I get.
My subquery problem was to link org to address, where I had managed (despite my
earlier precautions) to allow-in duplicates, and so when I more recently
attempted to link WHERE upper = upper it explains why I get too many rows in
the subquery which fails.
I held back to copy this to gnumed-announce, since there is maybe more here
than announce may want. Maybe include, in the correction, a link to your post
on devel archive from which people can thread more information if they want it?
http://lists.gnu.org/archive/html/gnumed-devel/2011-10/msg00115.html
-- Jim
id_urb | name | postcode
--------+---------------------+----------
9204 | 16th Ave W | V6K 4M1
9204 | 16th ave W | v6k 4m1
9204 | Ash St | V5Z 3C6
9204 | Ash ST | V5Z 3C6
9204 | Broadway W | V5Z 1K7
9204 | Broadway W | v5Z 1K7
9204 | Burrard ST | V6Z 1Y6
9204 | Burrard St | V6Z 1Y6
9204 | Burrard St | V6Z1Y6
9204 | Burrard ST | V6Z1Y6
9204 | Granville ST | V6M 3C5
9204 | Granville St | V6M 3C5
-- Jim