From: Busser, Jim
Subject: Re: [Gnumed-devel] Brasil cities and states (demographics)
Date: Wed, 16 Nov 2011 06:56:02 +0000

On 2011-11-15, at 6:47 PM, Jim Busser wrote:
> The only hiccup is that the original bootstrapped GNUmed data contained most (but not all) of Brazil's states

The above is incorrect:
1) GNUmed bootstrapped all of the states with correct (unaccented) names; it was only that a handful of the state abbreviations (2-character codes) were either incorrect or subsequently revised.
2) The patch to the Brazilian bootstrap will correctly provide pt_BR translations for those states whose names are accented:

select i18n.upd_tx('pt_BR', 'Espirito Santo', 'Espírito Santo');
select i18n.upd_tx('pt_BR', 'Goias', 'Goiás');
select i18n.upd_tx('pt_BR', 'Maranhao', 'Maranhão');
select i18n.upd_tx('pt_BR', 'Para', 'Pará');
select i18n.upd_tx('pt_BR', 'Paraiba', 'Paraíba');
select i18n.upd_tx('pt_BR', 'Parana', 'Paraná');
select i18n.upd_tx('pt_BR', 'Piaui', 'Piauí');
select i18n.upd_tx('pt_BR', 'Rondonia', 'Rondônia');
select i18n.upd_tx('pt_BR', 'Sao Paulo', 'São Paulo');

However, my questions about the approach to be taken for populating unaccented vs accented names remain of interest to answer.
Regarding:
However
1) postgres does not support SQL99's convert('string', 'ENCODING')
2) to_ascii() supports only LATIN1, LATIN2, LATIN9, and WIN1250, not UTF8, and even if we made the encoding LATIN1 or WIN1250, the output does not seem to be what we want:
SELECT to_ascii('Ceará', 'LATIN1');
--> CearA
SELECT to_ascii('Rondônia', 'LATIN1');
--> RondA'nia
3) Postgres 9 appears to support an unaccent() function
4) python
5) perl
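
For option 4, here is a minimal sketch of how Python might derive the unaccented names from the accented ones. It assumes only the standard-library unicodedata module; the helper name strip_accents is hypothetical and not part of GNUmed. The idea is to decompose each character into its base letter plus combining marks (NFD normalization) and then drop the marks.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining diacritical marks removed.

    Hypothetical helper, not part of GNUmed: decompose each character
    into base letter + combining marks (NFD), then drop the marks.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ("Ceará", "Rondônia", "São Paulo"):
        print(name, "-->", strip_accents(name))
```

Unlike the to_ascii() attempts above, this works directly on Unicode strings, so it should not depend on the database encoding.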
-- Jim