nuxeo-localizer

Re: [Nuxeo-localizer] StructuredText + Unicode


From: Myroslav Opyr
Subject: Re: [Nuxeo-localizer] StructuredText + Unicode
Date: Sat, 22 Mar 2003 01:49:37 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; uk-UA; rv:1.3) Gecko/20030312

Ruslan Spivak wrote:

> Hello!

Hi, Ruslan,

> I did as you wrote, but when I write "text in russian":http://www.com it doesn't work: it isn't converted into a link :(
> It only works when I write "text in english":http://www.com :(
>
> I use Zope 2.6.1b + Plone 1.0.1 and start Zope with the locale option -L "ru_RU.UTF-8".

BTW [OFF]: 2.6.1 has already been released.

> Any suggestions?

It doesn't look like it can work in STX. I'll try to explain why.

The usual case is Latin-only STX, and with it everything works as expected: each character occupies one byte and everyone is happy. If you use an 8-bit charset like Windows-1251 or koi8-r, there is a slight chance that with the proper regexp (and a correctly set locale) you'll get proper results, but even that is unlikely with Ukrainian characters (Russian works OK, AFAIK).
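One plausible reading of the Ukrainian caveat, sketched in modern Python 3: in Windows-1251 every character is one byte, so byte-oriented code happens to count correctly, but a letter class written as a byte range misses the Ukrainian letters, which sit outside the contiguous А..я block (0xC0-0xFF). `NAIVE_CP1251_WORD` and `is_word_cp1251` are hypothetical names for illustration, not taken from STX or from any real locale table.

```python
import re

# Hypothetical letter class: ASCII letters plus the contiguous Windows-1251
# block 0xC0-0xFF (А..я). Not STX's or any locale's real definition.
NAIVE_CP1251_WORD = re.compile(rb"[A-Za-z\xc0-\xff]+")


def is_word_cp1251(word: str) -> bool:
    """Encode to Windows-1251 and test the bytes against the naive class."""
    raw = word.encode("cp1251")
    # Single-byte charset: the byte count equals the character count.
    return len(raw) == len(word) and NAIVE_CP1251_WORD.fullmatch(raw) is not None


if __name__ == "__main__":
    print(is_word_cp1251("Москва"))  # True: all letters fall in 0xC0-0xFF
    print(is_word_cp1251("Київ"))    # False: "ї" is 0xBF in Windows-1251
```

With such a class, Russian words pass (except those containing ё, which is 0xB8), while Ukrainian words containing і, ї, є or ґ fail.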

What is UTF-8? It is a variable-length, multibyte encoding of Unicode characters. Unicode characters are code points (originally designed as 16-bit values, now reaching up to U+10FFFF). For maximum compatibility with byte-oriented software, UTF-8 encodes ASCII (0x00-0x7f) as itself, one byte per character, while every byte of a multibyte sequence lies in 0x80-0xff, so a non-ASCII character can never be mistaken for NUL, a control code or ASCII punctuation. A character takes 1-4 bytes: Latin (ASCII) letters take 1 byte, Cyrillic letters 2 bytes, most Kanji 3 bytes, and characters beyond U+FFFF 4 bytes. All string manipulation functions see a UTF-8 string as an ordinary byte string, and only special treatment can reconstruct the Unicode string. This makes truncation of UTF-8 strings difficult, and not only truncation.

STX code relies on 1-byte characters and knows nothing about UTF-8. It meets strange codes inside the string and treats them according to its vision of Latin structured text. For proper handling, all data should be converted to Unicode from its respective charset (UTF-8, Windows-1251, koi8-u) before processing, then processed, and then encoded back to the target encoding to be placed in the output HTML. At the moment nothing like that is done. Nobody has been keen to implement it, because Zope Page Templates are really broken when it comes to automatic data conversion.
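To make the byte/character mismatch and the convert, process, convert-back approach described above concrete, here is a small sketch in modern Python 3 (Zope 2.6 itself runs on Python 2.x, where the same distinction is between `str` and `unicode`). `LINK_BYTES` is a hypothetical, deliberately simplified link pattern that only knows ASCII letters; it is not the actual STX regular expression. `render_bytes`, `render_unicode` and `truncate_utf8` are illustrative names, not Zope or STX APIs.

```python
import re

# UTF-8 byte lengths: ASCII 1, Cyrillic 2, most Kanji 3, beyond U+FFFF 4.
UTF8_LENGTHS = {ch: len(ch.encode("utf-8")) for ch in ("a", "ж", "漢", "😀")}

# Hypothetical link pattern that only knows ASCII letters (NOT the real STX
# regexp); it stands in for markup code that assumes one byte per character.
LINK_BYTES = re.compile(rb'"([A-Za-z0-9 ]+)":(http://\S+)')
# The same idea on decoded text: in str patterns \w also matches Cyrillic.
LINK_TEXT = re.compile(r'"([\w ]+)":(http://\S+)')


def render_bytes(raw: bytes) -> bytes:
    """Process the raw bytes directly, the way byte-oriented code does."""
    return LINK_BYTES.sub(rb'<a href="\2">\1</a>', raw)


def render_unicode(raw: bytes, charset: str = "utf-8",
                   target: str = "utf-8") -> bytes:
    """Decode from the source charset, process as text, encode for output."""
    text = raw.decode(charset)
    html = LINK_TEXT.sub(r'<a href="\2">\1</a>', text)
    return html.encode(target)


def truncate_utf8(raw: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 to at most max_bytes without splitting a character."""
    if len(raw) <= max_bytes:
        return raw
    cut = max_bytes
    # Continuation bytes have the form 0b10xxxxxx (0x80-0xBF); step back
    # over them so the cut falls on the first byte of a character.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut]


if __name__ == "__main__":
    print(UTF8_LENGTHS)
    russian = '"текст ссылки":http://www.com'.encode("utf-8")
    print(render_bytes(russian))                    # unchanged: no link made
    print(render_unicode(russian).decode("utf-8"))  # <a href="http://www.com">текст ссылки</a>
    print(truncate_utf8("жж".encode("utf-8"), 3))   # b'\xd0\xb6' (one "ж")
```

Decoding once at the input boundary and encoding once at the output keeps all markup logic working on characters rather than bytes; passing `charset="cp1251"` or `charset="koi8_u"` would cover the other charsets mentioned above.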

What to do in this difficult situation? Zope 2.7 will support reST (reStructuredText), which looks Unicode-ready. If your application doesn't need to be deployed right now, reST is the way to go: you can develop with it now, and it'll be released at some point.

> Thanks in advance,
> Ruslan

m.
--
Myroslav Opyr
zope.net.ua <http://zope.net.ua/> ° Ukrainian Zope Hosting
e-mail: address@hidden <mailto:address@hidden>





