dotgnu-general

Re: [DotGNU]UCS4 support in Xml


From: Rhys Weatherley
Subject: Re: [DotGNU]UCS4 support in Xml
Date: Mon, 10 Mar 2003 19:01:33 +1000
User-agent: KMail/1.4.3

On Monday 10 March 2003 10:19 am, minddog wrote:
> Hey,
>       I just added the internal XmlStreamReader class that will complement the
> normal IO StreamReader for UCS4 support.  Here's my question though, should
> we make all handling of the encoding portions of XmlStreamReader,
> UCS4Encoding instead of Encoding?  I'm not very educated on this subject,
> but UCS4 is basically a larger set of characters opposed to UCS2.  Is it
> 16bit for UCS2 and 32bit for UCS4 ?  Some help here might answer my own
> questions =) Thanks.

UCS-2 is the character format that is used by most of C#, and that should be 
the standard way to process characters internally within the XML code.

All of the important UCS-4 characters can be represented in 16-bit form, either 
directly as single 16-bit values, or as pairs of 16-bit values (called 
surrogates; strictly speaking, the surrogate scheme is UTF-16 rather than plain 
UCS-2).  This gives an effective character set size of a little over 20 bits, 
which is pretty huge (over 1 million characters).
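
As a rough illustration of the surrogate arithmetic, here is a minimal sketch.
It is written in Python rather than C#, and the function names are made up for
this example; it is not taken from the DotGNU sources.

```python
def to_surrogates(code_point):
    """Split a code point above 0xFFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encodable range")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

Each half of the pair carries 10 bits, which is where the 20-bit figure comes
from: 2^20 = 1,048,576 extra characters on top of the ones that fit directly
in 16 bits.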

The UCS4Encoding class already takes care of converting 32-bit sequences into 
16-bit UCS-2 on the fly, inserting surrogates where necessary.  You should 
stick to UCS-2 everywhere else in the XML code, including in XmlStreamReader.  
It isn't worth using UCS-4 as the standard character set elsewhere.
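
To make the "on the fly" conversion concrete, the sketch below decodes a
buffer of 32-bit values into 16-bit units.  It only illustrates the general
idea and is not the actual UCS4Encoding code; the function name and the
`big_endian` parameter are assumptions made for this example, and it is in
Python rather than C#.

```python
import struct


def ucs4_bytes_to_16bit_units(data, big_endian=True):
    """Decode UCS-4 bytes into a list of 16-bit code units."""
    if len(data) % 4 != 0:
        raise ValueError("UCS-4 input must be a multiple of 4 bytes")
    fmt = (">" if big_endian else "<") + "%dI" % (len(data) // 4)
    units = []
    for code_point in struct.unpack(fmt, data):
        if 0xD800 <= code_point <= 0xDFFF:
            # Surrogate values are reserved and are not characters themselves.
            raise ValueError("lone surrogate value in UCS-4 input")
        if code_point < 0x10000:
            units.append(code_point)               # fits directly in 16 bits
        elif code_point <= 0x10FFFF:
            offset = code_point - 0x10000
            units.append(0xD800 + (offset >> 10))   # high surrogate
            units.append(0xDC00 + (offset & 0x3FF)) # low surrogate
        else:
            raise ValueError("value cannot be represented with surrogates")
    return units
```

With something along these lines at the encoding boundary, the rest of the
XML code only ever sees 16-bit units and never needs to know that the input
was 32-bit.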

Cheers,

Rhys.


