gnumed-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnumed-devel] GNUmed upgrade preparation alert


From: Jim Busser
Subject: Re: [Gnumed-devel] GNUmed upgrade preparation alert
Date: Thu, 13 Oct 2011 09:40:22 -0700

On 2011-10-13, at 7:37 AM, Karsten Hilbert wrote:

> The next GNUmed release will tighten data sanity checks
> regarding demographics reference data such that it will, for
> example, no longer be possible to simultaneously store:
> 
>       Berlin
>       BERLIN
> 
>       Tokyo
>       TOKYO
>       (or tOkYo, for that matter)
> 
> The database upgrade v16 -> v17 will fail if there are any
> duplicates among each of:
> 
>       streets
>       communities
>       regions
> 
> This GNUmed cannot know the capitalization you wish to keep
> in your database it cannot weed out duplicates for you. Thus
> it is necessary to check for duplicates yourself before
> attempting to upgrade (if you don't and the upgrade fails -
> no worry, just fix the duplicates and re-run the upgrade).
> 
> To look for duplicates run these queries:
> 
> Regions:
> 
>       SELECT * from dem.state where id not in (
>               select max(s.id) from dem.state s group by lower(s.name), 
> s.code, s.country
>       );
> 
> Communities:
> 
>       SELECT * from dem.urb where id not in (
>               select max(u.id) from dem.urb u group by lower(u.name), 
> lower(u.postcode), u.id_state
>       );
> 
> Streets:
> 
>       SELECT * from dem.street where id not in (
>               select max(st.id) from dem.street group by lower(st.name), 
> lower(st.postcode), st.id_urb
>       );
> 
> Then deduplicate until no more rows are returned.

Thanks, the above was really helpful to explain a subquery problem I am having 
to try to be future-compatible, but contains a bug. The third query above is 
missing the alias in 'from dem.street group by ... '

        --> it needs 'st' after dem.street

FWIW, what gets returned by the above queries is *just* the excess rows -- the 
selection of which was influenced by the sequence of creation among duplicates.

To better see each duplicate in the context of the duplication, a select like 
the following – for streets – might be considered:

        select id_urb, name, postcode from dem.street
        where id_urb || lower(name) || lower(postcode) in (
                select id_urb || lower(name) || lower(postcode) from dem.street
                group by id_urb || lower(name) || lower(postcode)
                having count(*) > 1
        )
        order by id_urb, lower(name), lower (postcode)
        ;

see appended for a sample of what I get.

My subquery problem was to link org to address, where I had managed (despite my 
earlier precautions) to allow-in duplicates, and so when I more recently 
attempted to link WHERE upper = upper it explains why I get too many rows in 
the subquery which fails.

I held back to copy this to gnumed-announce, since there is maybe more here 
than announce may want. Maybe include, in the correction, a link to your post 
on devel archive from which people can thread more information if they want it?

        http://lists.gnu.org/archive/html/gnumed-devel/2011-10/msg00115.html

-- Jim

 id_urb |        name         | postcode 
--------+---------------------+----------
   9204 | 16th Ave W          | V6K 4M1
   9204 | 16th ave W          | v6k 4m1
   9204 | Ash St              | V5Z 3C6
   9204 | Ash ST              | V5Z 3C6
   9204 | Broadway W          | V5Z 1K7
   9204 | Broadway W          | v5Z 1K7
   9204 | Burrard ST          | V6Z 1Y6
   9204 | Burrard St          | V6Z 1Y6
   9204 | Burrard St          | V6Z1Y6
   9204 | Burrard ST          | V6Z1Y6
   9204 | Granville ST        | V6M 3C5
   9204 | Granville St        | V6M 3C5

-- Jim






reply via email to

[Prev in Thread] Current Thread [Next in Thread]