Re: [DotGNU]UCS4 support in Xml
From: Michal Moskal
Subject: Re: [DotGNU]UCS4 support in Xml
Date: Mon, 10 Mar 2003 09:32:07 +0100
User-agent: Mutt/1.4i
On Sun, Mar 09, 2003 at 05:19:29PM -0700, minddog wrote:
> Hey,
> I just added the internal XmlStreamReader class that will complement the
> normal IO StreamReader for UCS4 support. Here's my question though, should we
> make all handling of the encoding portions of XmlStreamReader UCS4Encoding
> instead of Encoding? I'm not very educated on this subject, but UCS4 is
> basically a larger set of characters as opposed to UCS2. Is it 16-bit for UCS2
> and 32-bit for UCS4? Some help here might answer my own questions =)
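Any Unicode character can be encoded in UTF-8, UTF-16, or UCS-4. In the
first two, some characters simply need two or more bytes or words.
UTF-8 uses bytes, UTF-16 uses 16-bit words, and UCS-4 uses 32-bit words.
(Strict UCS-2 is fixed at one 16-bit word per character, so it only
covers the Basic Multilingual Plane; UTF-16 adds surrogate pairs to
reach the rest.)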
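To make the unit sizes concrete, here is a small Python sketch (not
DotGNU code; the helper name encoded_sizes is made up for illustration).
It uses Python's utf-16 and utf-32 codecs to stand in for the 16-bit and
32-bit forms, and reports how many bytes one character takes in each.

```python
# Illustrative only: encoded_sizes is a hypothetical helper, not part of
# any library. The "-le" codecs are used so no byte-order mark is added.
def encoded_sizes(ch):
    """Return the number of bytes one character needs in each encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

# ASCII letter, accented Latin letter, euro sign, and a character
# outside the Basic Multilingual Plane (U+1D11E, musical G clef).
for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
    print("U+%04X" % ord(ch), encoded_sizes(ch))
```

The last character needs two 16-bit words (a surrogate pair) in UTF-16,
while the 32-bit form always uses exactly one word.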
UCS-4 is good as an internal representation since it gives constant-time
indexing (every character takes exactly 4 bytes, period), but for
mostly ASCII text it causes a 4x space blowup compared to UTF-8.
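The trade-off can be sketched in a few lines of Python (again just an
illustration; nth_code_point_utf32 is a hypothetical helper and the
sample string is made up). With a fixed 4-byte width, the n-th character
sits at byte offset 4*n, so no scanning is needed, and the same ASCII
text takes four times the bytes it would in UTF-8.

```python
import struct

def nth_code_point_utf32(buf, n):
    """Fetch the n-th character from little-endian UTF-32 bytes.

    Fixed width means the character starts at byte offset 4 * n.
    """
    (cp,) = struct.unpack_from("<I", buf, 4 * n)
    return chr(cp)

text = "<doc>plain ascii</doc>"      # made-up, pure ASCII sample
utf8 = text.encode("utf-8")
utf32 = text.encode("utf-32-le")     # no byte-order mark

print(len(utf8), len(utf32))         # UTF-32 is 4x the size for ASCII
print(nth_code_point_utf32(utf32, 5))
```

Doing the same lookup on UTF-8 bytes would mean walking the buffer from
the start, since each character may take anywhere from 1 to 4 bytes.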
XML files mostly use UTF-8, since it is the most compact for ASCII-heavy markup.
Visit http://www.unicode.org for more info.
--
: Michal Moskal ::::: malekith/at/pld-linux.org : GCS {C,UL}++++$ a? !tv
: PLD Linux ::::::: Wroclaw University, CS Dept : {E-,w}-- {b++,e}>+++ h