
Re: [Chicken-hackers] CR #1142 and upcoming changes


From: Felix Winkelmann
Subject: Re: [Chicken-hackers] CR #1142 and upcoming changes
Date: Wed, 20 Aug 2014 20:47:07 +0200 (CEST)

>> Well, actually we might as well support several: ASCII/Latin-1, UTF-8
>> and UCS-2/UCS-4. Without UTF-8 it would just be a variable
>> element-size option. But I agree that this doesn't make maintenance
>> any easier... Let's think some more about this. We don't have to
>> decide right now.
> 
> UCS-2 is obsolete; it would need to be UTF-16 (i.e. support of
> surrogates).

Hm. Wasn't wchar_t on Windows 16 bits? Do they use UTF-16 there?
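
For reference, the UCS-2 versus UTF-16 distinction raised above comes down
to surrogate pairs: a 16-bit unit can hold any code point up to U+FFFF
directly, and UTF-16 splits anything above that into two units. The
following is only a minimal sketch in Python (not Chicken code); the
function name utf16_code_units is made up for illustration.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point.

    Hypothetical helper for illustration only; not part of any Chicken API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not encodable code points")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one unit, identical to UCS-2.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return [high, low]


if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1F600):
        units = utf16_code_units(cp)
        print("U+%04X ->" % cp, " ".join("%04X" % u for u in units))
```

A pure UCS-2 implementation has no second branch, so it cannot represent
code points above U+FFFF at all. With surrogates, the element size is no
longer fixed at 16 bits per character.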

> 
> In any case, Alex's point about the FFI is strong.  Even on Windows,
> UTF-8 is coming to be the dominant way to talk to C programs, and it's
> part of the spirit of Chicken (IIUC) that talking to C is clean and easy.
> On Posix systems, UTF-8 is massively dominant.
> 
> Similarly, on the Web, UTF-8 encodes a huge majority of all Web
> pages.  As of early 2012, UTF-8 (including pure ASCII) was at 80% (see
> <http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html>),
> and <http://w3techs.com/technologies/overview/character_encoding/all>
> shows it still rising.  These figures aren't comparable, because Google
> is using its whole index and the *effective* encoding, whereas W3Techs
> is using only a large subset (10 million sites, usually only one page per
> site) and the declared encoding (HTTP header, HTML meta, etc.). Still,
> both reports are loud and clear that UTF-8 is winning.  Not having to
> transcode web pages most of the time is a win too.

The internal representation is a different issue compared to the
external encoding. It would be nice if we could separate these two
things somehow. But perhaps it's too early for this. Also,
Unicode-related discussions quickly get out of hand. So let's postpone
this until later.


felix


