gnustep-dev

Re: Merging CoreBase into Base


From: Stefan Bidi
Subject: Re: Merging CoreBase into Base
Date: Mon, 12 Aug 2013 10:56:46 -0500

There are a couple of reasons to use UTF-16:
(1) The CF/Foundation APIs assume UTF-16.  CFStringGetCharacterAtIndex() and CFStringGetCharacters() would be extremely inefficient for any encoding other than ASCII, Latin1 or UTF-16.  Just look at what base has to do to support UTF-8: it traverses the string every time you call -characterAtIndex:.
(2) Almost all ICU APIs use UTF-16.
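
To illustrate point (1), here is a minimal sketch of why indexed access is expensive on UTF-8 storage. The helper names utf8_unit_at and utf16_unit_at are made up for this example and are not taken from base or corebase; the decoder also assumes well-formed, NUL-terminated input and does no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not base's actual code): return the UTF-16 code
   unit at `index` in a well-formed, NUL-terminated UTF-8 string.
   Every call decodes from the start, so the cost grows with `index`. */
static uint16_t utf8_unit_at(const unsigned char *s, size_t index)
{
    size_t pos = 0;                      /* UTF-16 units seen so far */
    assert(s != NULL);
    while (*s != 0) {
        uint32_t cp;
        int len;
        /* The lead byte tells us how long the sequence is. */
        if (*s < 0x80)      { cp = *s;        len = 1; }
        else if (*s < 0xE0) { cp = *s & 0x1F; len = 2; }
        else if (*s < 0xF0) { cp = *s & 0x0F; len = 3; }
        else                { cp = *s & 0x07; len = 4; }
        /* Fold in six payload bits from each continuation byte. */
        for (int i = 1; i < len; i++)
            cp = (cp << 6) | (s[i] & 0x3F);
        s += len;
        if (cp < 0x10000) {
            if (pos == index) return (uint16_t)cp;
            pos += 1;
        } else {
            /* Supplementary code points occupy a surrogate pair. */
            cp -= 0x10000;
            if (pos == index)     return (uint16_t)(0xD800 + (cp >> 10));
            if (pos + 1 == index) return (uint16_t)(0xDC00 + (cp & 0x3FF));
            pos += 2;
        }
    }
    return 0;                            /* index out of range */
}

/* With UTF-16 storage the same query is one array access. */
static uint16_t utf16_unit_at(const uint16_t *s, size_t index)
{
    return s[index];
}
```

Calling utf8_unit_at in a loop over every index is quadratic in the string length, whereas the UTF-16 version stays linear.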

To address your concern about endianness, I don't think this is a problem at all.  The API to the outside world is still the same.  We store all strings in the host endianness and export them with the BOM if isExternalRepresentation is specified.
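
As a rough sketch of that idea (not corebase's actual code), the following keeps UTF-16 in host byte order and only deals with byte order at the boundary. The function names utf16_export_with_bom and utf16_import are hypothetical, and treating a buffer without a BOM as host-endian is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy host-endian UTF-16 units into `out`,
   prefixed with a BOM (U+FEFF) written in host byte order.
   `out` must hold at least (count + 1) * 2 bytes.
   Returns the number of bytes written. */
static size_t utf16_export_with_bom(const uint16_t *units, size_t count,
                                    unsigned char *out)
{
    const uint16_t bom = 0xFEFF;
    memcpy(out, &bom, sizeof bom);
    memcpy(out + sizeof bom, units, count * sizeof *units);
    return (count + 1) * sizeof *units;
}

/* Hypothetical helper: read such a buffer back into host-endian units.
   A BOM that reads as 0xFFFE means the writer used the opposite byte
   order, so every unit is byte-swapped. A buffer without a BOM is
   assumed (for this sketch) to be host-endian.
   Returns the number of units stored, BOM excluded. */
static size_t utf16_import(const unsigned char *in, size_t nbytes,
                           uint16_t *units)
{
    size_t n = nbytes / 2, start = 0, count = 0;
    int swap = 0;
    if (n > 0) {
        uint16_t first;
        memcpy(&first, in, sizeof first);
        if (first == 0xFEFF)      { start = 1; }
        else if (first == 0xFFFE) { start = 1; swap = 1; }
    }
    for (size_t i = start; i < n; i++) {
        uint16_t u;
        memcpy(&u, in + 2 * i, sizeof u);
        if (swap)
            u = (uint16_t)((u >> 8) | (u << 8));
        units[count++] = u;
    }
    return count;
}
```

Everything between import and export works on plain host-endian uint16_t arrays, so byte order never leaks into the rest of the string code.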

I can't use libc functions for almost anything beyond the most basic string operations.  Not even printf can be used, because of the %@ specifier.


On Mon, Aug 12, 2013 at 10:31 AM, David Chisnall <address@hidden> wrote:
On 12 Aug 2013, at 16:26, Stefan Bidi <address@hidden> wrote:

> (2) I'm working towards making corebase use Unicode (i.e. UTF-16) internally wherever possible. I believe this is a saner choice than trying to deal with UTF-8.

I find this an odd observation.  UTF-16 is multibyte, so comes with all of the same pain as UTF-8, but has the disadvantage that it's almost always larger than UTF-8 (most two-byte characters in UTF-8 are also two-byte characters in UTF-16).  You also start hitting endian issues with UTF-16, whereas UTF-8 is endian-independent.  Finally, UTF-8 is the format that you typically want for input or output, as it's well supported by most libc functions, terminals, and so on.
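
For reference, the per-code-point sizes behind the size comparison can be written down as a small sketch. The function names utf8_bytes and utf16_bytes are made up for this example; the thresholds follow the standard UTF-8 and UTF-16 encoding ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one Unicode code point in UTF-8. */
static size_t utf8_bytes(uint32_t cp)
{
    if (cp < 0x80)    return 1;   /* ASCII */
    if (cp < 0x800)   return 2;   /* U+0080..U+07FF */
    if (cp < 0x10000) return 3;   /* rest of the BMP */
    return 4;                     /* supplementary planes */
}

/* Bytes needed to encode one Unicode code point in UTF-16. */
static size_t utf16_bytes(uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;  /* single unit or surrogate pair */
}
```

ASCII takes one byte in UTF-8 against two in UTF-16, code points from U+0080 to U+07FF take two bytes in both, and the only range where UTF-16 is smaller is U+0800 to U+FFFF (two bytes against three).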

David


