[Nuxeo-localizer] range(128)

From:	Juan David Ibáñez Palomar
Subject:	[Nuxeo-localizer] range(128)
Date:	Fri, 24 Jan 2003 13:09:09 +0100
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2) Gecko/20021204 Debian/1.2.1-1


Hi,

Thanks Florent for the report.

I'm not willing to "fix" something just to break something else.
I prefer to go deeper into the solution you explain at the end,
"A final digression about ZPT": even if ZPT does not do it by itself,
maybe Localizer could with a dynamic patch.

Unfortunately I don't have time right now to research it myself.
This means that either you or somebody else finds a solution that
works in both situations, or you wait for me to find the time to
find one.

Just for the record I attach a zexp file. It is a folder with two
page templates (case1 and case2): case1 works when the variable
LOCALIZER_USE_ZOPE_UNICODE is set, case2 works when it isn't. To
test it use Localizer 1.0 and TranslationService 0.2. I want
something that works for both cases, nothing else is an option.


Best regards,
david


Florent Guillaume wrote:

Hi Folks,

Sorry for the crosspost but this really covers ZPT and Localizer, and is
of great interest to the Plone i18n users. Please keep your answers to
the lists where they are legitimate -- and I'd appreciate being kept as
Cc.


Ok, I got down to the reason for the infamous "UnicodeError: ASCII
decoding error: ordinal not in range(128)". Thanks to all who cooperated
in that matter.

Readers wanting the quick solution without the rest of the discussion
can skip to the part bracketed by #######.

First a reminder of the problem for those not familiar with it.

In many situations, in a multilingual Plone site using Localizer, people
got the above error.

This in fact happened in the following circumstances:

- A page template like:
       <h1 i18n:translate="edit_type_header">
       Edit an object of type
         <span i18n:name="type">
           <span i18n:translate=""
                 tal:content="python:here.getTypeInfo().Title()"
                 tal:omit-tag="">Type</span></span></h1>


- A translation for edit_type_header of the form
       Éditer un objet de type ${type}
 where the translation contains non-ASCII characters ("É" here),

- A substituted string for ${type} that itself has non-ASCII characters,
 for instance "déjà".

What happens behind the scenes during the template evaluation is complex,
but at some point the <span i18n:translate> gets evaluated, the message
catalog gets consulted and a u'déjà', as Unicode, is returned.

At that point Localizer has a mechanism to convert all Unicode
strings to their final browser encoding, as a plain string of bytes,
so for instance using UTF-8 it would substitute 'd\xc3\xa9j\xc3\xa0'.

The problem here is that this string is not destined to go to the
browser yet, but will first be used further in the ZPT processing to be
substituted for ${type}. So later in the processing, we have to
substitute
    u'Éditer un objet de type ${type}'
using the mapping
    {u'type': 'd\xc3\xa9j\xc3\xa0'}

At that point, we have a mix of Unicode (which is legitimate) and some
plain string encoded in the final output. This encoding came too soon!
We would still like to have Unicode here... If we still had it, it would
work.
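
The failure can be reproduced in a few lines. The sketch below is only
an illustration, not Localizer or ZPT code. It runs on a current Python,
where text and bytes never mix implicitly, so the hypothetical helper
coerce_like_python2 emulates the implicit ASCII decoding that Python 2
applied whenever a plain string met a Unicode one. The substitute
function is likewise a hypothetical stand-in for the ${type}
substitution step.

```python
from string import Template


def coerce_like_python2(value):
    """Emulate Python 2's implicit str -> unicode coercion (ASCII only)."""
    if isinstance(value, bytes):
        return value.decode("ascii")  # raises on any byte >= 0x80
    return value


def substitute(template, mapping):
    """Hypothetical stand-in for the ${name} substitution step."""
    text_mapping = {k: coerce_like_python2(v) for k, v in mapping.items()}
    return Template(template).substitute(text_mapping)


template = "Éditer un objet de type ${type}"

# The value is still Unicode: the substitution works.
ok = substitute(template, {"type": "déjà"})

# The value was already encoded to UTF-8 bytes: the ASCII decoding fails.
try:
    substitute(template, {"type": b"d\xc3\xa9j\xc3\xa0"})
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)
```

The second call fails on the first byte of the encoded "é", which is the
"ordinal not in range(128)" case described above.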

Fortunately, I kind of foresaw this sort of problem a few months ago,
and I included in Localizer a way to turn off its early conversion to
browser output encoding.

#######

To do that, you have to launch Zope with the LOCALIZER_USE_ZOPE_UNICODE
environment variable set to something not empty, for instance "yes".

#######
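
For illustration, a check of this kind could look as follows. This is a
sketch only: the helper name use_zope_unicode is hypothetical and the
actual test inside Localizer may be written differently. The only thing
taken from the text above is that any non-empty value enables the mode.

```python
import os


def use_zope_unicode(environ=None):
    """Return True when LOCALIZER_USE_ZOPE_UNICODE has a non-empty value."""
    if environ is None:
        environ = os.environ
    # An unset variable and an empty one are treated the same way.
    return bool(environ.get("LOCALIZER_USE_ZOPE_UNICODE", ""))
```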

Now, why did Localizer choose to do early encoding by default? The
problem is the following: during ZPT parsing, we're building something
from the concatenation of a list of strings, some of which are Unicode
if they come from a message catalog (or some TALES returning Unicode),
some of which are plain strings like most of the page template itself.

If all the plain strings are only ever pure ASCII, then there's no
problem doing a join of all of them with something Unicode, and the
result will be Unicode. That's what pure Zope 2.6 does by default. It
then, in ZPublisher, proceeds to encode that resulting Unicode string in
the preferred browser encoding and sends that. This mode is what you get
if you define LOCALIZER_USE_ZOPE_UNICODE.
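
A minimal sketch of that mode, assuming every fragment is either
pure-ASCII template markup (modelled here as bytes) or Unicode coming
from a catalog. The names render and publish are hypothetical; they are
not ZPT or ZPublisher functions.

```python
def render(fragments):
    """Join template fragments into one text string.

    Byte fragments stand for plain template strings and must be pure
    ASCII; anything else raises UnicodeDecodeError.
    """
    return "".join(
        f.decode("ascii") if isinstance(f, bytes) else f for f in fragments
    )


def publish(text, charset="utf-8"):
    """Encode exactly once, at the very end of the request."""
    return text.encode(charset)


fragments = [b"<h1>", "Éditer un objet de type ", "déjà", b"</h1>"]
body = publish(render(fragments))
```

Because the encoding happens only in publish, the intermediate values
stay Unicode and can still be substituted into other Unicode strings.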

But when Localizer was introduced, it was to be used by people who had
localized their page templates by hand and thus included a lot of
non-ASCII characters in them, in their preferred encoding, say, UTF-8,
together with a RESPONSE.setHeader('Content-Type') with that encoding.
So because of those non-ASCII characters, the strategy of the previous
paragraph wouldn't work. So Localizer decided to encode all Unicode
strings to the preferred encoding (assumed to be the same as the browser
encoding) as soon as it saw them inside the ZPT parsing.

Unfortunately, as we saw at the beginning, this can't work in the
presence of i18n:name substitutions.

As a conclusion, I recommend that Localizer use the standard Zope
behavior by default, and only enable its early conversion when some new
environment variable, for instance LOCALIZER_UNICODE_CONVERSION, is set.
This will only be useful to people who have half-translated their site
(some Unicode from the message catalog, and still some non-ASCII in the
templates).



A final digression about ZPT:

I think the correct way to build the result of a ZPT would be to build a
Unicode string as soon as TALInterpreter detects a non-ASCII string. It
would then decode the non-ASCII strings to Unicode using some kind of
site- or page-default encoding. This would avoid most of our problems,
and would anyway be more robust. It would simply mean replacing
StringIO's (actually FasterStringIO's) getvalue method with an
intelligent join that does the conversion I just outlined if needed.
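
Such a join might be sketched as below. This is an assumption-laden
illustration, not a patch for FasterStringIO: smart_join and its
default_encoding parameter are hypothetical names, plain strings are
modelled as bytes, and the choice of "utf-8" as default is arbitrary.

```python
def smart_join(fragments, default_encoding="utf-8"):
    """Join fragments, converting to Unicode only if needed.

    If every fragment is a pure-ASCII byte string, nothing is converted
    and the joined bytes are returned. Otherwise every byte fragment is
    decoded with default_encoding and a text string is returned.
    """
    if all(isinstance(f, bytes) and f.isascii() for f in fragments):
        return b"".join(fragments)
    return "".join(
        f.decode(default_encoding) if isinstance(f, bytes) else f
        for f in fragments
    )
```

With such a join, smart_join([b"d\xc3\xa9j\xc3\xa0 ", "Éditer"]) yields
text instead of an error, provided the byte fragments really are in the
assumed encoding.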

There remains the problem of deciding which is the default encoding to
use...



Thanks for any comments (and please watch where you send them!).


Florent



--
J. David Ibáñez, http://www.j-david.net
Software Engineer / Ingénieur Logiciel / Ingeniero de Software

utests.zexp
Description: Binary data
