[Koha-devel] Koha 3.0 and UTF-8


From: Paul POULAIN
Subject: [Koha-devel] Koha 3.0 and UTF-8
Date: Wed, 04 Jan 2006 16:53:44 +0100
User-agent: Mozilla Thunderbird 1.0.6-7.1.20060mdk (X11/20050322)

utf8 is a go for beta testing in HEAD.

Some explanations of what I've done:

- updater/updatedatabase => will convert all tables to InnoDB (not related to utf8, just to warn you) AND set their character set and collation to utf8 / utf8_general_ci. The SQL command is: ALTER TABLE tablename DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci.
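
To make the statement above concrete, here is a minimal sketch that builds one such ALTER TABLE per table. It is written in Python for illustration only and is not the updatedatabase script itself; the function name and the example table list are assumptions, and the statements are returned as strings rather than executed.

```python
def utf8_alter_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Return one ALTER TABLE statement per table name (nothing is executed)."""
    statements = []
    for name in tables:
        # Accept only plain identifiers so a name cannot smuggle in extra SQL.
        if not name or not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected table name: {name!r}")
        statements.append(
            f"ALTER TABLE {name} DEFAULT CHARACTER SET {charset} COLLATE {collation}"
        )
    return statements


# Hypothetical table list, for illustration only.
for statement in utf8_alter_statements(["biblio", "biblioitems"]):
    print(statement)
```

Note that in MySQL the DEFAULT CHARACTER SET form sets the table's default character set; whether existing columns also need converting is worth checking on your own data.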

- *-top.inc will show the pages in utf8

- THE HARD THING: for me, mysql-client and mysql-server were set up to communicate in iso8859-1, whatever the MySQL collation! Thus, pages were displayed incorrectly, because the data was transmitted in iso8859-1! After a full day of investigation, someone on Usenet pointed me to "SET NAMES 'utf8'" as the way to tell the server that I wanted utf8. I could put this in my.cnf, but if I do that, ALL databases will "speak" utf8, which is not what we want. Thus, I added a line in Context.pm: every time a DB handle is opened, the communication is set to utf8.
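
The per-handle approach can be sketched as follows. This is a Python illustration, not the line added to Context.pm: `open_utf8_handle` and `mojibake` are hypothetical names, and `connect` is assumed to be any callable returning a DB-API style connection with `cursor()` and `execute()`. The second function shows one way an encoding mismatch between the two sides becomes visible.

```python
SET_NAMES_UTF8 = "SET NAMES 'utf8'"


def open_utf8_handle(connect):
    """Open a connection with connect() and switch that session to utf8.

    Only this handle is affected, unlike a server-wide my.cnf setting.
    """
    conn = connect()
    cursor = conn.cursor()
    cursor.execute(SET_NAMES_UTF8)
    cursor.close()
    return conn


def mojibake(text):
    """Return text as it looks when its UTF-8 bytes are read as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


print(mojibake("é"))  # prints "Ã©"
```

Issuing the statement right after connecting keeps the choice local to the application, so other databases on the same server keep their own settings.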

- How to deal with MARC records? MARC records are in MARC-8 encoding and stored in binary format in biblioitems.marc, which the ALTER TABLE does not modify (fortunately, as it's a binary format!). But... I created a marcxml column in this table, containing the XML output of the MARC record (a duplicate of the raw MARC record). I did not know what it would be used for, but now I know: the utf8 move converts the marcxml column, and the catalogue is moved to utf8 (with MARCgetbiblio using marcxml instead of raw MARC)! The last question is: is biblioitems.marc (the raw MARC record) still useful? I think not:
- iso2709 is limited to 99999 characters per record, specialised to MARC-8 encoding, and binary.
- XML has none of those limitations.
Thus, my opinion is that we should get rid of iso2709 and use XML everywhere, except when exporting data in iso2709 format. Internally, we should use only XML.
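
The 99999 limit comes from the record length being stored as five decimal digits at the start of the iso2709 record leader. A small sketch of that constraint, in Python with hypothetical helper names, might look like this:

```python
ISO2709_MAX_RECORD_LENGTH = 99999  # largest value that fits in five digits


def fits_iso2709(record_length):
    """True if a record of this length can be described by the leader."""
    return 0 <= record_length <= ISO2709_MAX_RECORD_LENGTH


def leader_length_field(record_length):
    """Format the length as the zero-padded five-digit leader field."""
    if not fits_iso2709(record_length):
        raise ValueError(f"record of length {record_length} exceeds the iso2709 limit")
    return f"{record_length:05d}"


print(leader_length_field(1234))  # prints "01234"
print(fits_iso2709(100000))       # prints "False"
```

An XML serialisation has no such fixed-width length field, which is the point made above; an export routine would still need a check like this before writing iso2709.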

Let me know if it works completely, correctly, poorly or not at all for you!
--
Paul POULAIN
Independent free software consultant
French-language coordinator for koha (free ILS http://www.koha-fr.org)



