gnustep-dev

Re: New ABI NSConstantString


From: David Chisnall
Subject: Re: New ABI NSConstantString
Date: Sat, 7 Apr 2018 10:26:37 +0100

On 7 Apr 2018, at 10:21, Ivan Vučica <address@hidden> wrote:
> 
> On Sat, Apr 7, 2018, 09:50 David Chisnall <address@hidden> wrote:
> 
> 
> > My current plan is to make the format support ASCII, UTF-8, UTF-16, and 
> > UTF-32, but only generate ASCII and UTF-16 in the compiler and then decide 
> > later if we want to support generating UTF-8 and UTF-32.  I also won’t 
> > initialise the hash in the compiler initially, until we’ve decided a bit more 
> > what the hash should be.
> 
> Emojis don't fit UTF-16. Even if one dismisses CJK, ancient scripts etc, 
> constant strings are not absolutely unlikely to contain emojis.
> 
> Not supporting UTF-8 for internal storage may be reasonable, but not 
> supporting UTF-32 for strings that require it seems like a bug.

UTF-32 is no more expressive than UTF-16, and it’s not more space-efficient 
either: every Unicode code point can be expressed in one or two UTF-16 code 
units, so in the worst case you need the same number of bytes (four) in 
UTF-16 as in UTF-32, and in the best case you need half as many.  The only 
advantage of UTF-32 is that it is a fixed-length encoding of code points, 
but that isn’t actually very helpful when the APIs all refer to UTF-16 code 
units (and UTF-32 is not a fixed-length encoding of UTF-16 code units).
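
To make the byte counts concrete, here is a minimal sketch in C of the standard UTF-16 encoding of a single code point. The function names (utf16_encode, utf16_bytes, utf32_bytes) are made up for illustration and are not part of the compiler or of GNUstep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-16.
 * Returns the number of 16-bit code units written (1 or 2). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Valid scalar values are U+0000..U+10FFFF, excluding surrogates. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        /* Basic Multilingual Plane: a single code unit. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes (emoji included): a surrogate pair. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}

/* Bytes needed to store one code point in UTF-16. */
static size_t utf16_bytes(uint32_t cp)
{
    uint16_t units[2];
    return utf16_encode(cp, units) * sizeof(uint16_t);
}

/* Bytes needed to store one code point in UTF-32: always four. */
static size_t utf32_bytes(uint32_t cp)
{
    (void)cp;
    return sizeof(uint32_t);
}
```

For U+1F600 (an emoji) utf16_encode produces the pair 0xD83D 0xDE00, so utf16_bytes and utf32_bytes both give 4; for U+0041 ('A') UTF-16 needs 2 bytes against 4 for UTF-32.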

David



