dotgnu-general

Re: [DotGNU]UCS4 support in Xml


From: Michal Moskal
Subject: Re: [DotGNU]UCS4 support in Xml
Date: Mon, 10 Mar 2003 09:32:07 +0100
User-agent: Mutt/1.4i

On Sun, Mar 09, 2003 at 05:19:29PM -0700, minddog wrote:
> Hey,
>       I just added the internal XmlStreamReader class that will complement
> the normal IO StreamReader for UCS4 support.  Here's my question, though:
> should we make all the encoding-handling portions of XmlStreamReader use
> UCS4Encoding instead of Encoding?  I'm not very educated on this subject,
> but UCS4 is basically a larger set of characters as opposed to UCS2.  Is it
> 16-bit for UCS2 and 32-bit for UCS4?  Some help here might answer my own
> questions =)

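One can encode any Unicode character in UTF-8, UTF-16 (UCS-2 extended
with surrogate pairs) and UCS-4; plain UCS-2 only reaches U+FFFF. In the
variable-length encodings some characters simply need more than one
unit: UTF-8 uses 8-bit bytes (1 to 4 per character), UTF-16 uses 16-bit
words (1 or 2 per character), and UCS-4 uses exactly one 32-bit word for
every character.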
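To make the size differences concrete, here is a small illustrative sketch in Python (not DotGNU/C# code). It uses Python's built-in `utf-16-le` and `utf-32-le` codecs as stand-ins for UTF-16 and UCS-4 (UTF-32 is UCS-4 restricted to valid Unicode code points); the helper names `encoded_sizes` and `fits_in_ucs2` are made up for this example.

```python
# Illustrative sketch: bytes needed per character in each encoding.
# The "-le" codecs are used so Python does not prepend a byte-order mark.

def encoded_sizes(ch: str) -> dict:
    """Return how many bytes the single character `ch` takes per encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # UCS-2 plus surrogate pairs
        "ucs-4": len(ch.encode("utf-32-le")),   # always 4 bytes
    }

def fits_in_ucs2(ch: str) -> bool:
    """Plain UCS-2 has a single 16-bit unit, so it stops at U+FFFF."""
    return ord(ch) <= 0xFFFF

if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch), "UCS-2:", fits_in_ucs2(ch))
```

For U+0041, U+00E9, U+20AC and U+1D11E this gives 1, 2, 3 and 4 bytes in UTF-8, 2, 2, 2 and 4 bytes in UTF-16, and 4 bytes each in UCS-4; only the last one needs a surrogate pair and so cannot be written in plain UCS-2.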

UCS-4 is good as an internal representation since it gives constant-time
indexing (each and every character takes 4 bytes, period), but for
mostly-ASCII text it causes a 4x blow-up in size compared with UTF-8.
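The trade-off can be sketched the same way. The functions below (hypothetical names, Python rather than C#) fetch the i-th character from a UCS-4 buffer by direct offset and from a UTF-8 buffer by walking the lead bytes; the UTF-8 walker assumes well-formed input and does no validation.

```python
import struct

def to_ucs4(text: str) -> bytes:
    """Encode as UCS-4: little-endian UTF-32 without a byte-order mark."""
    return text.encode("utf-32-le")

def char_at_ucs4(buf: bytes, i: int) -> str:
    """Constant time: character i always starts at byte offset 4 * i."""
    (code_point,) = struct.unpack_from("<I", buf, 4 * i)
    return chr(code_point)

def _utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence, read from its lead byte (valid input assumed)."""
    if lead < 0x80:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4

def char_at_utf8(buf: bytes, i: int) -> str:
    """Linear time: skip over the first i characters one sequence at a time."""
    pos = 0
    for _ in range(i):
        pos += _utf8_seq_len(buf[pos])
    return buf[pos:pos + _utf8_seq_len(buf[pos])].decode("utf-8")

if __name__ == "__main__":
    markup = "<item>hello</item>"
    print(len(markup.encode("utf-8")), "bytes as UTF-8")
    print(len(to_ucs4(markup)), "bytes as UCS-4")
```

For the 18-character, all-ASCII string `<item>hello</item>` that is 18 bytes versus 72, which is the 4x blow-up mentioned above.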

XML files mostly use UTF-8, since it is the most compact encoding for
mostly-ASCII markup.
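Assuming XmlStreamReader has to pick between UTF-8, UTF-16 and UCS-4 before it can parse anything, here is a rough sketch of sniffing the first bytes of a document, following the autodetection heuristic in Appendix F of the XML 1.0 specification. It is Python, not the DotGNU C# code; `sniff_xml_encoding` is a hypothetical name, and EBCDIC and the unusual UCS-4 byte orders are left out.

```python
# Simplified sketch of the XML 1.0 Appendix F autodetection heuristic.
# Returns a Python codec name; EBCDIC and unusual UCS-4 byte orders
# (2143, 3412) are deliberately not handled.

# Byte-order marks, longest first so FF FE 00 00 is not mistaken for
# the UTF-16 BOM FF FE. The "utf-32"/"utf-16"/"utf-8-sig" codecs strip
# the BOM when decoding.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32"),    # UCS-4, big-endian
    (b"\xff\xfe\x00\x00", "utf-32"),    # UCS-4, little-endian
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xfe\xff", "utf-16"),
    (b"\xff\xfe", "utf-16"),
]

# Without a BOM, the first four bytes of "<?xml ..." reveal the encoding.
_NO_BOM = {
    b"\x00\x00\x00<": "utf-32-be",     # UCS-4, big-endian
    b"<\x00\x00\x00": "utf-32-le",     # UCS-4, little-endian
    b"\x00<\x00?": "utf-16-be",
    b"<\x00?\x00": "utf-16-le",
}

def sniff_xml_encoding(head: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    # Fall back to UTF-8, the XML default; a real parser would go on to
    # read the encoding declaration itself.
    return _NO_BOM.get(head[:4], "utf-8")
```

A caller would sniff the first few bytes, decode the stream with the returned codec, and then check the encoding declaration for anything more specific.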

Visit http://www.unicode.org for more info.

-- 
: Michal Moskal ::::: malekith/at/pld-linux.org :  GCS {C,UL}++++$ a? !tv
: PLD Linux ::::::: Wroclaw University, CS Dept :  {E-,w}-- {b++,e}>+++ h


