Re: [Demexp-dev] Character encoding
From: David MENTRE
Subject: Re: [Demexp-dev] Character encoding
Date: Mon, 22 Oct 2007 08:57:46 +0200
Hello Lyu,
2007/10/22, Lyu Abe <address@hidden>:
> There's one thing I do not understand in character coding of the
> server's reply. When I display, for example, tag sets, I can read this:
>
> 'a_tag_label': u'citoyennet\xe9'
>
> in which u'citoyennet\xe9' corresponds to a Unicode-encoded text,
> right?
Yes.
> Then I do not understand why we get Unicode-encoded strings,
> while DEMEXP is supposed to use UTF-8 encoding...
"UTF-8 is the byte-oriented encoding form of Unicode."
http://www.unicode.org/faq/utf_bom.html#2
In other words, all strings on the server are stored in UTF-8, which
is a byte encoding of Unicode characters. All exchanges between the
server and the clients are done in UTF-8 as well.
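To illustrate the difference between a Unicode string and its UTF-8
bytes, here is a minimal Python sketch. It only uses the label from the
example above; the variable names are made up for illustration and
nothing here is taken from the demexp code.

```python
# The tag label as a Unicode string: one code point per character.
# \xe9 is the code point U+00E9 (e with acute accent).
label = u'citoyennet\xe9'

# Encoding to UTF-8 gives the byte sequence that travels between
# server and client. U+00E9 becomes the two bytes 0xC3 0xA9.
wire_bytes = label.encode('utf-8')

# Decoding those bytes gives back the same Unicode string.
round_trip = wire_bytes.decode('utf-8')

print(len(label))        # number of characters
print(len(wire_bytes))   # number of bytes
print(repr(wire_bytes))
```

The character count and the byte count differ because the accented
letter takes two bytes in UTF-8 while the ASCII letters take one each.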
After that, each platform is free to do any appropriate conversion,
e.g. use a 16- or 32-bit character encoding internally if it wishes.
However, you should make sure your Python code uses UTF-8 as its
encoding when it talks to the server.
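One way to be careful about this on the Python side is to convert
explicitly at the boundary with the server rather than rely on any
default. The sketch below assumes the client sees raw bytes at that
boundary; to_wire and from_wire are hypothetical helper names, not
functions of the demexp client.

```python
def to_wire(text):
    """Encode a Unicode string to UTF-8 bytes before sending it (hypothetical helper)."""
    return text.encode('utf-8')


def from_wire(data):
    """Decode UTF-8 bytes received from the server (hypothetical helper).

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    return data.decode('utf-8')


# Bytes as they would arrive from the server, decoded to a Unicode string.
label = from_wire(b'citoyennet\xc3\xa9')

# Internally, a platform may prefer a 16- or 32-bit representation.
utf16 = label.encode('utf-16-le')   # 2 bytes per character for this label
utf32 = label.encode('utf-32-le')   # 4 bytes per character

# Whatever is used internally, UTF-8 goes back on the wire.
outgoing = to_wire(label)
```

Since the server does not validate the encoding itself, decoding
strictly on the client side at least makes invalid bytes fail loudly.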
To be honest, right now the server does not really check this encoding.
The UTF-8 mostly comes from the GTK2 interface, which produces UTF-8
strings. :-) But the check should be added at some point.
Best wishes,
d.