gnumed-devel

Re: [Gnumed-devel] Brasil cities and states (demographics)


From: Busser, Jim
Subject: Re: [Gnumed-devel] Brasil cities and states (demographics)
Date: Wed, 16 Nov 2011 02:47:38 +0000


On 2011-11-15, at 4:54 PM, Rogerio Luz Coelho wrote:

Well, it's just that it is not that easy.

Say the doc types Paranagua (no accent): if he does not know the correct accentuation of the city name, he will never find it, correct?

Will a new entry be needed for every city? Can't the DB be told to match ã or á when "a" is typed?

Would the following work?

The source file came with a table providing both unaccented and accented values, and after quite a bit of cleanup I was able to generate a table like this:

INSERT INTO i18n_urbs_br (accented, unaccented, zip, state) VALUES ('Jitaúna', 'Jitauna', '45225-000', 'BA');
INSERT INTO i18n_urbs_br (accented, unaccented, zip, state) VALUES ('Crato', 'Crato', NULL, 'CE');
INSERT INTO i18n_urbs_br (accented, unaccented, zip, state) VALUES ('Guia', 'Guia', '63885-000', 'CE');
INSERT INTO i18n_urbs_br (accented, unaccented, zip, state) VALUES ('Quixoá', 'Quixoa', '63502-000', 'CE');
INSERT INTO i18n_urbs_br (accented, unaccented, zip, state) VALUES ('Jataí', 'Jatai', NULL, 'GO');
INSERT INTO i18n_urbs_br (accented, unaccented, zip, state) VALUES ('Anil', 'Anil', NULL, 'MA');
INSERT INTO i18n_urbs_br (accented, unaccented, zip, state) VALUES ('Sabará', 'Sabara', NULL, 'MG');
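To illustrate how a table carrying both spellings answers the "Paranagua without the accent" question, here is a minimal sketch, assuming matching is done on a lower-cased, accent-stripped key and the accented value is what gets shown. The names `fold`, `URBS_BR` and `find_city` are hypothetical, and the rows simply mirror a few of the inserts above; this is not GNUmed code.

```python
import unicodedata

def fold(text: str) -> str:
    """Lower-case text and strip combining accent marks so 'Jitaúna', 'JITAUNA' and 'jitauna' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()

# Hypothetical in-memory copy of a few (accented, unaccented, zip, state) rows of i18n_urbs_br
URBS_BR = [
    ("Jitaúna", "Jitauna", "45225-000", "BA"),
    ("Guia", "Guia", "63885-000", "CE"),
    ("Quixoá", "Quixoa", "63502-000", "CE"),
    ("Jataí", "Jatai", None, "GO"),
    ("Sabará", "Sabara", None, "MG"),
]

def find_city(typed: str) -> list:
    """Return (accented name, state) for every city whose folded unaccented name equals the folded input."""
    key = fold(typed)
    return [(accented, state) for accented, unaccented, _zip, state in URBS_BR
            if fold(unaccented) == key]

print(find_city("QUIXOÁ"))  # [('Quixoá', 'CE')]
```

Because both the typed text and the stored key are folded, the doctor finds the city whether or not he types the accent, and no extra row per spelling is needed.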

Presently, since the streets / cities / zip combinations in the source are all accented, it is easiest to populate the GNUmed urbs (cities) with the accented values.


It should be quite possible to do either of the following:

1) create i18n pt_BR 'unaccented' translations for the accented 'originals' or

2) replace wherever possible, in the GNUmed dem.urb table, the accented value with an unaccented value and then create i18n (pt_BR) accented translations.
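Option 2 boils down to a lookup keyed on (language, original value) that falls back to the original when no translation exists. A minimal sketch, assuming a hypothetical `TRANSLATIONS` mapping and `translate()` helper; it only illustrates the idea and is not GNUmed's actual i18n API.

```python
# Hypothetical translation table: (language, unaccented original) -> accented pt_BR display value
TRANSLATIONS = {
    ("pt_BR", "Sabara"): "Sabará",
    ("pt_BR", "Jatai"): "Jataí",
}

def translate(original: str, lang: str) -> str:
    """Return the translation of original for lang, falling back to the stored original value."""
    return TRANSLATIONS.get((lang, original), original)

print(translate("Sabara", "pt_BR"))  # Sabará
```

The fallback means values that need no accent (such as Crato) or users in other languages still get a sensible display without extra rows.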

The only hiccup is that the originally bootstrapped GNUmed data contained most (but not all) of Brazil's states, all of them unaccented. When the beta Brazilian datapack runs, however, any states missing from dem.states will be populated from values that may carry accents. There are not too many of these (the full list is appended).
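One way to keep the datapack from adding "Amapá" next to an existing "Amapa" is to decide what is missing with an accent-insensitive comparison. A minimal sketch, assuming hypothetical helpers `fold` and `missing_states`; the sample sets are made up and are not the real contents of dem.states.

```python
import unicodedata

def fold(text: str) -> str:
    """Lower-case text and strip combining accent marks ('Amapá' -> 'amapa')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()

def missing_states(existing_names, incoming_states):
    """Return {abbr: name} for incoming states with no accent-insensitive match among existing names."""
    known = {fold(name) for name in existing_names}
    return {abbr: name for abbr, name in incoming_states.items() if fold(name) not in known}

# Made-up sample data, NOT the real bootstrapped state list
existing = {"Amapa", "Ceara", "Sao Paulo", "Bahia"}
incoming = {"AP": "Amapá", "CE": "Ceará", "SP": "São Paulo", "BA": "Bahia", "RO": "Rondônia"}

print(missing_states(existing, incoming))  # {'RO': 'Rondônia'}
```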

Is there any Postgres function that could be applied for the UTF-8 conversion?

I found these:

http://postgresql.1045698.n5.nabble.com/GENERAL-Remove-diacritical-marks-in-SQL-td1874140.html
http://scottbarnham.com/blog/2010/12/20/make-a-slug-in-postgresql-translating-diacritics/
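Both threads are about removing diacritical marks inside SQL; PostgreSQL also ships a contrib `unaccent` module providing an `unaccent(text)` function. The underlying idea can be sketched outside the database with Unicode decomposition: split each accented letter into its base letter plus a combining mark, then drop the marks. This is a swapped-in Python illustration with a hypothetical `strip_diacritics` helper, not necessarily identical in every edge case to what `unaccent` or the linked `translate()` recipes produce.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Return text with combining accent marks removed, e.g. 'Jitaúna' -> 'Jitauna'."""
    # NFKD splits precomposed letters such as 'ú' into 'u' + U+0301 COMBINING ACUTE ACCENT
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the non-combining characters, i.e. the base letters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("São Paulo"))  # Sao Paulo
```

This is also how the unaccented column in a table like i18n_urbs_br could be regenerated from the accented one instead of being maintained by hand.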



    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('AC', 'Acre', 69900, 69999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('AL', 'Alagoas', 57000, 57999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('AM', 'Amazonas', 69000, 69299);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('AP', 'Amapá', 68900, 68999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('BA', 'Bahia', 40000, 48999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('CE', 'Ceará', 60000, 63999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('DF', 'Distrito Federal', 70000, 72799);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('ES', 'Espírito Santo', 29000, 29999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('GO', 'Goiás', 72800, 72999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('MA', 'Maranhão', 65000, 65999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('MG', 'Minas Gerais', 30000, 39999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('MS', 'Mato Grosso do Sul', 79000, 79999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('MT', 'Mato Grosso', 78000, 78899);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('PA', 'Pará', 66000, 68899);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('PB', 'Paraíba', 58000, 58999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('PE', 'Pernambuco', 50000, 56999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('PI', 'Piauí', 64000, 64999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('PR', 'Paraná', 80000, 87999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('RJ', 'Rio de Janeiro', 20000, 28999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('RN', 'Rio Grande do Norte', 59000, 59999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('RO', 'Rondônia', 78900, 78999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('RR', 'Roraima', 69300, 69399);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('RS', 'Rio Grande do Sul', 90000, 99999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('SC', 'Santa Catarina', 88000, 89999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('SE', 'Sergipe', 49000, 49999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('SP', 'São Paulo', 1000, 19999);
    INSERT INTO staging.state (abbr, name, zip1, zip2) VALUES ('TO', 'Tocantins', 77000, 77999);
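The zip1/zip2 columns are ranges of 5-digit CEP prefixes, so a CEP such as '45225-000' can be checked against (or used to fill in) its state. A minimal sketch, assuming a hypothetical `state_for_zip` helper and only a subset of the ranges above; CEPs outside the listed ranges return None.

```python
from typing import Optional

# (abbr, zip1, zip2) copied from a subset of the staging.state rows above
STATE_ZIP_RANGES = [
    ("BA", 40000, 48999),
    ("CE", 60000, 63999),
    ("GO", 72800, 72999),
    ("MG", 30000, 39999),
    ("SP", 1000, 19999),
    ("RJ", 20000, 28999),
]

def state_for_zip(cep: str) -> Optional[str]:
    """Return the state whose [zip1, zip2] range contains the CEP's 5-digit prefix, or None."""
    prefix = int(cep.replace("-", "")[:5])  # '01000-000' -> 1000
    for abbr, low, high in STATE_ZIP_RANGES:
        if low <= prefix <= high:
            return abbr
    return None

print(state_for_zip("45225-000"))  # BA
```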


