Re: [Nuxeo-localizer] StructuredText + Unicode
From: Myroslav Opyr
Subject: Re: [Nuxeo-localizer] StructuredText + Unicode
Date: Sat, 22 Mar 2003 01:49:37 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; uk-UA; rv:1.3) Gecko/20030312
Ruslan Spivak wrote:
> Hello!

Hi, Ruslan,

> I did as you wrote, but
> when I write "text in russian":http://www.com it doesn't work, it
> isn't converted to a link :(
> It only works when I write "text in english":http://www.com :(
> I use Zope 2.6.1b + Plone 1.0.1 and start with locale -L "ru_RU.UTF-8"

BTW, [OFF] 2.6.1 has already been released.

> Any suggestions?
It doesn't look like this works in STX. I'll try to explain why.
The usual case is Latin STX, and with it everything works as expected:
a character occupies one byte and everyone is happy. If you use a
single-byte charset like Windows-1251 or koi8-r, there is a slight
chance that with a proper regexp you'll get proper results (if the
locale is set correctly), but even that is unlikely with Ukrainian
characters (Russian works OK, AFAIK).
What is UTF-8? It is a variable-length, multibyte encoding of Unicode
data. A Unicode character is wider than one byte. For maximum
compatibility, ASCII characters are encoded as themselves in a single
byte, and every other character becomes a sequence of bytes in the
range 0x80-0xff, so no multibyte character contains a zero byte or a
control code (0x00-0x1f). A character takes 1-4 bytes: Latin 1 byte,
Cyrillic 2 bytes, most Kanji 3 bytes. All string manipulation functions
see a UTF-8 string as an ordinary byte string, and only special
treatment can reconstruct the Unicode string. Thus truncation of UTF-8
strings is difficult, and not only truncation. The STX code relies on
1-byte characters and knows nothing about UTF-8. It meets strange codes
inside the string and treats them according to its vision of Latin
structured text. For proper handling, all data should be converted into
Unicode from its respective charset (UTF-8, Windows-1251, koi8-u)
before processing, then processed, and then encoded back (to the target
encoding) to be placed in the output HTML. At present nothing like that
is being done. Nobody has been keen to implement it, since Zope Page
Templates are really broken when it comes to automatic data conversion.
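
To make the above concrete, here is a rough sketch in Python 3. It is
not the actual StructuredText code: both link patterns and the
render_links() helper are made up for illustration. It only shows the
byte lengths, the truncation problem, and the decode / process / encode
round trip described above.

```python
import re

# Hypothetical byte-oriented link pattern: link text limited to ASCII
# letters, digits and spaces. This is NOT the real StructuredText regexp.
ASCII_LINK = re.compile(rb'"([A-Za-z0-9 ]+)":(\S+)')

# The same hypothetical pattern applied to Unicode text, where \w
# matches letters of any script.
UNICODE_LINK = re.compile(r'"([\w ]+)":(\S+)')


def render_links(raw: bytes, charset: str = "utf-8") -> bytes:
    """Decode to Unicode, convert links, then encode back to charset."""
    text = raw.decode(charset)
    html = UNICODE_LINK.sub(r'<a href="\2">\1</a>', text)
    return html.encode(charset)


russian = '"текст":http://www.com'.encode("utf-8")
english = b'"text":http://www.com'

# In UTF-8 a Latin letter takes 1 byte, a Cyrillic letter 2 bytes.
latin_len = len("t".encode("utf-8"))
cyrillic_len = len("т".encode("utf-8"))

# Cutting a UTF-8 byte string at an arbitrary offset can split a
# character, leaving bytes that no longer decode.
try:
    "текст".encode("utf-8")[:3].decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False

# The byte-oriented pattern sees the English link but not the Russian one.
ascii_matches_english = ASCII_LINK.search(english) is not None
ascii_matches_russian = ASCII_LINK.search(russian) is not None

# Working on decoded text handles the Russian link as well.
rendered = render_links(russian).decode("utf-8")
```

With koi8-u or Windows-1251 input, the same helper would be called
with charset="koi8_u" or charset="cp1251"; only the decode and encode
steps change, the processing in the middle stays the same.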
What to do in this difficult situation? Zope 2.7 will have support for
reST, which appears to be Unicode-ready. If your application does not
have to be deployed right now, reST is the way to go. You can develop
with it now, and Zope 2.7 will be released at some point.
> Thanks in advance,
> Ruslan

m.
--
Myroslav Opyr
zope.net.ua <http://zope.net.ua/> ° Ukrainian Zope Hosting
e-mail: address@hidden <mailto:address@hidden>