gnumed-devel

Re: [Gnumed-devel] Brasil cities and states (demographics)


From: Busser, Jim
Subject: Re: [Gnumed-devel] Brasil cities and states (demographics)
Date: Wed, 16 Nov 2011 06:56:02 +0000

On 2011-11-15, at 6:47 PM, Jim Busser wrote:

> The only hiccup is that the original bootstrapped GNUmed data contained most (but not all) of Brazil's states

The above is incorrect:

1) GNUmed bootstrapped all of the states with correct (unaccented) names. Only a handful of the state abbreviations (2-character codes) were incorrect or have since been revised.
2) The patch in the Brazilian bootstrap will correctly provide pt_BR translations for the states whose names are accented:

select i18n.upd_tx('pt_BR', 'Ceara', 'Ceará');
select i18n.upd_tx('pt_BR', 'Espirito Santo', 'Espírito Santo');
select i18n.upd_tx('pt_BR', 'Goias', 'Goiás');
select i18n.upd_tx('pt_BR', 'Maranhao', 'Maranhão');
select i18n.upd_tx('pt_BR', 'Para', 'Pará');
select i18n.upd_tx('pt_BR', 'Paraiba', 'Paraíba');
select i18n.upd_tx('pt_BR', 'Parana', 'Paraná');
select i18n.upd_tx('pt_BR', 'Piaui', 'Piauí');
select i18n.upd_tx('pt_BR', 'Rondonia', 'Rondônia');
select i18n.upd_tx('pt_BR', 'Sao Paulo', 'São Paulo');
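Each pt_BR translation above is the bootstrapped unaccented name with its diacritics restored. That suggests a way to generate or sanity-check such a list, sketched here in Python. This is only an illustration and not part of the GNUmed patch. `PT_BR_STATES`, `strip_accents`, `sql_literal` and `upd_tx_statements` are hypothetical names. The sketch assumes that the unaccented form GNUmed stores is exactly what you get by removing Unicode combining marks after NFD decomposition.

```python
import unicodedata

# Hypothetical table: unaccented names as stored by the GNUmed bootstrap,
# mapped to their accented pt_BR forms (copied from the upd_tx() calls above).
PT_BR_STATES = {
    "Ceara": "Ceará",
    "Espirito Santo": "Espírito Santo",
    "Goias": "Goiás",
    "Maranhao": "Maranhão",
    "Para": "Pará",
    "Paraiba": "Paraíba",
    "Parana": "Paraná",
    "Piaui": "Piauí",
    "Rondonia": "Rondônia",
    "Sao Paulo": "São Paulo",
}


def strip_accents(text: str) -> str:
    """Drop diacritics: decompose to NFD, then remove combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sql_literal(text: str) -> str:
    """Quote text as an SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def upd_tx_statements(pairs: dict[str, str]) -> list[str]:
    """Build one i18n.upd_tx() call per pair after checking the unaccented key."""
    statements = []
    for plain, accented in pairs.items():
        # The stored name should be exactly the accent-stripped translation.
        if strip_accents(accented) != plain:
            raise ValueError(f"{plain!r} is not the unaccented form of {accented!r}")
        statements.append(
            f"select i18n.upd_tx('pt_BR', {sql_literal(plain)}, {sql_literal(accented)});"
        )
    return statements


if __name__ == "__main__":
    for statement in upd_tx_statements(PT_BR_STATES):
        print(statement)
```

Run as written, it prints the same ten statements as the patch. A pair whose accented form does not reduce to the stored name raises ValueError instead. A state name that differs from its translation in some way other than accents would need an explicit exception.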

However, my questions about how unaccented versus accented names should be populated still need an answer.

Regarding:

Found these


However:

1) Postgres does not support SQL99's convert('string', 'ENCODING').


2) to_ascii() supports only LATIN1, LATIN2, LATIN9, and WIN1250, not UTF8. Even when we specify LATIN1 (or WIN1250), the output is not what we want:

SELECT to_ascii('Ceará', 'LATIN1'); 
--> CearA 

SELECT to_ascii('Rondônia', 'LATIN1'); 
--> RondA'nia

3) Postgres 9 appears to support an unaccent() function (provided by the unaccent contrib module).


4) Python


5) Perl
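Point 2 reports some odd to_ascii() output. One plausible explanation is that the string literals reached to_ascii() as UTF-8 bytes, which it then read as LATIN1, so each accented letter became two LATIN1 characters. This has not been checked against the PostgreSQL source. The Python sketch below reproduces that misreading. `as_latin1_misread` is a hypothetical helper name.

```python
# Hypothesis (not verified against PostgreSQL): to_ascii() received the UTF-8
# bytes of each name but interpreted them as LATIN1, one byte per character.


def as_latin1_misread(text: str) -> str:
    """Decode the UTF-8 bytes of text as if they were LATIN1 (mojibake)."""
    return text.encode("utf-8").decode("latin-1")


for name in ("Ceará", "Rondônia"):
    misread = as_latin1_misread(name)
    # Show the misread string and the code points of its non-ASCII characters.
    print(name, "->", misread, [hex(ord(ch)) for ch in misread if ord(ch) > 127])
```

Under this hypothesis, "á" and "ô" each become "Ã" (0xC3) followed by a second byte. "Ã" would transliterate to "A". In Rondônia the second byte is the acute accent "´" (0xB4), which would account for the apostrophe in RondA'nia. A Unicode-aware approach that strips accents from decoded characters rather than raw bytes, such as Python's unicodedata module or unaccent(), should not have this problem.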




-- Jim
