Re: [Nuxeo-localizer] StructuredText + Unicode


From:	Myroslav Opyr
Subject:	Re: [Nuxeo-localizer] StructuredText + Unicode
Date:	Sat, 22 Mar 2003 01:49:37 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.0; uk-UA; rv:1.3) Gecko/20030312

Ruslan Spivak wrote:

Hello!


Hi, Ruslan,

I did as you wrote, but
when I write "text in russian":http://www.com it doesn't work; it doesn't convert that to a link :(
It only works when I write "text in english":http://www.com :(
I use zope2.6.1b + plone1.0.1 and start with locale -L "ru_RU.UTF-8"


BTW, [OFF] 2.6.1 has already been released.

Any suggestions?


This doesn't appear to work in STX. I'll try to explain why.

The usual case is Latin STX, and with it everything works as expected. A char occupies one byte and everyone is happy. If you use a charset like Windows-1251 or koi8-r, then there is a slight chance that with a proper regexp you'll get proper results (if the locale is set correctly), but even that is unlikely with Ukrainian characters (Russian works OK, AFAIK).
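To illustrate the single-byte case, here is a minimal sketch in modern Python 3 (not the Zope 2.6 code itself; the pattern and the sample word are assumptions chosen for illustration). Without help from the locale, a byte-oriented pattern treats only ASCII letters as word characters, so Cyrillic text in Windows-1251 is not matched, while the same pattern applied to the decoded text works.

```python
import re

word = "текст"                      # sample Cyrillic word (assumed for illustration)
raw = word.encode("cp1251")         # Windows-1251: one byte per Cyrillic letter

# Bytes pattern: by default \w covers only ASCII letters, digits and underscore.
bytes_match = re.fullmatch(rb"\w+", raw)

# Str pattern: \w is Unicode-aware, so Cyrillic letters count as word characters.
text_match = re.fullmatch(r"\w+", raw.decode("cp1251"))

print(len(raw))                 # 5
print(bytes_match is None)      # True
print(text_match is not None)   # True
```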

What is UTF-8? It is a variable-length, multibyte encoding of Unicode characters. For maximum compatibility, ASCII characters (0x00-0x7f) are encoded as themselves in a single byte, and every other character is encoded as a sequence of bytes in the range 0x80-0xff, so zero bytes and control codes never appear inside a multibyte character. A character takes 1-4 bytes: basic Latin letters take 1 byte, Cyrillic characters 2 bytes, and Kanji usually 3 bytes. All string manipulation functions see a UTF-8 string as an ordinary byte string, and only special treatment can reconstruct the Unicode string. Thus truncating UTF-8 strings is difficult, and not only truncation. The STX code relies on 1-byte characters and knows nothing about UTF-8. It meets unfamiliar byte codes inside the string and treats them according to its own notion of Latin structured text. For proper handling, all data should first be converted to Unicode from its respective charset (UTF-8, Windows-1251, koi8-u), then processed, and then encoded back (to the target encoding) to be placed in the output HTML. At present nothing like that is done. Nobody has been eager to implement it, as Zope Page Templates are really broken when it comes to automatic data conversion.
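A rough sketch of the points above, in modern Python 3 rather than the Zope 2.6 code. The `LINK` pattern and the `render_links` helper are hypothetical stand-ins for the STX hyperlink rule, not the actual StructuredText implementation. It shows the byte length of a few sample characters (a typical kanji takes three bytes), how cutting a UTF-8 byte string at an arbitrary offset can split a character, and the decode, process, encode round trip.

```python
import re

# Bytes per character in UTF-8 (sample characters chosen for illustration).
for ch in ("a", "я", "漢"):
    print(ch, len(ch.encode("utf-8")))

# Cutting a UTF-8 byte string at an arbitrary offset can split a character.
raw = "привет".encode("utf-8")      # 6 letters, 12 bytes
try:
    raw[:5].decode("utf-8")         # offset 5 falls inside the third letter
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False

# Hypothetical stand-in for the STX hyperlink rule: "text":url
LINK = re.compile(r'"([^"]+)":(\S+)')

def render_links(data: bytes, charset: str) -> bytes:
    """Decode to text, convert "text":url to an HTML link, encode back."""
    text = data.decode(charset)
    html = LINK.sub(r'<a href="\2">\1</a>', text)
    return html.encode(charset)

source = '"текст":http://www.com'.encode("utf-8")
result = render_links(source, "utf-8").decode("utf-8")
print(result)
```

Because the substitution runs on decoded text, the link text can be in any script; the charset only matters at the two edges of the function.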

What to do in this difficult situation? Zope 2.7 will have support for reST, which looks Unicode-ready. If your application does not have to be deployed right now, reST is the way to go. You can develop with it now, and the release will come in time.

Thanks in advance,
Ruslan


m.
--
Myroslav Opyr
zope.net.ua <http://zope.net.ua/> ° Ukrainian Zope Hosting
e-mail: address@hidden <mailto:address@hidden>

Current Thread

[Nuxeo-localizer] StructuredText + Unicode, Ruslan Spivak, 2003/03/21
- Re: [Nuxeo-localizer] StructuredText + Unicode, Myroslav Opyr <=
  - Re: [Nuxeo-localizer] StructuredText + Unicode, Juan David Ibáñez Palomar, 2003/03/27
