chicken-hackers

Re: [Chicken-hackers] Numbers egg interaction with other compiled code.


From: Tony Sidaway
Subject: Re: [Chicken-hackers] Numbers egg interaction with other compiled code.
Date: Sat, 24 Oct 2009 14:37:33 +0100

On 10/24/09, John Cowan <address@hidden> wrote:
> Alex Shinn scripsit:
>
>> On the other hand, in 95% of cases external libraries don't
>> need to treat a string as utf8 and it will work fine.
>
> Provided they don't mutate it or use string-ref directly, I agree.
> But if you try to truncate a string to five characters, and what you
> get is five bytes, Bad Things.
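
To make the bytes-versus-characters distinction concrete, here is a minimal sketch in portable Scheme. It assumes the string is available as a list of well-formed UTF-8 byte values; the names continuation-byte?, utf8-char-count and utf8-take are hypothetical helpers for illustration, not Chicken procedures.

```scheme
;; A byte is a UTF-8 continuation byte when its top two bits are 10,
;; i.e. when it lies in the range #x80..#xbf.
(define (continuation-byte? b)
  (= (quotient b 64) 2))

;; Count characters by counting the bytes that start a character.
(define (utf8-char-count bytes)
  (let loop ((bs bytes) (n 0))
    (cond ((null? bs) n)
          ((continuation-byte? (car bs)) (loop (cdr bs) n))
          (else (loop (cdr bs) (+ n 1))))))

;; Keep the first k characters without cutting inside a multi-byte
;; sequence. Assumes well-formed input (no stray continuation bytes).
(define (utf8-take bytes k)
  (let loop ((bs bytes) (k k) (acc '()))
    (cond ((null? bs) (reverse acc))
          ((continuation-byte? (car bs))
           (loop (cdr bs) k (cons (car bs) acc)))
          ((= k 0) (reverse acc))
          (else (loop (cdr bs) (- k 1) (cons (car bs) acc))))))

;; "a€b" as UTF-8 bytes: 5 bytes, 3 characters.
(utf8-char-count '(#x61 #xe2 #x82 #xac #x62))
;; ===> 3
(utf8-take '(#x61 #xe2 #x82 #xac #x62) 2)
;; ===> (97 226 130 172)
```

Taking the first two bytes instead would keep "a" plus a dangling lead byte, which is the kind of Bad Thing described above.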


I was under the impression that Chicken was already Unicode-aware, but
apparently its support is only partial.

The Euro symbol "€" is Unicode code point U+20AC (#x20ac); its UTF-8
encoding is the three bytes E2 82 AC.

(string-length (string #\€))
===> 1

(string=? (string #\€) (string (integer->char #x20ac)))
===> #t

(char=? (integer->char #x20ac) #\€)
===> #t

but (on my system at least):

(number->string (char->integer (string-ref (string (integer->char
#x20ac )) 0)) 16)
===> "ac"

This is deeply puzzling. string-length knows that the string is a
single character, but string-ref only lets you look at a single byte
(#xac, the low byte of #x20ac). Worse, it refuses to look at a second
byte because, as far as it's concerned, the string contains only 1 byte:

(string-ref (string (integer->char  #x20ac )) 1)
===> Error: (string-ref) out of range
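
One reading that fits both results is that the string holds a single byte containing only the low eight bits of the code point. That is my assumption from the output above, not something confirmed from Chicken's source. The arithmetic at least is easy to check in portable Scheme (low-byte is a hypothetical helper):

```scheme
;; Assumption for illustration only: the character code is truncated to
;; one byte when it is stored in the string. This matches the observed
;; output but is a guess about the implementation.
(define (low-byte code-point)
  (modulo code-point 256))

(number->string (low-byte #x20ac) 16)
;; ===> "ac"
```

Under that reading, the string-length of 1 would be a byte count that happens to coincide with the character count.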

This sounds like something that is relatively easy to fix. There's no
reason that I can think of why Chicken shouldn't be fully UTF-8-aware,
if it is capable of recognising, encoding and decoding UTF-8
characters.
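
For reference, the encoding step itself needs only a little integer arithmetic. A minimal sketch in portable Scheme, using the hypothetical name code-point->utf8 (not a Chicken procedure) and returning the bytes as a list of integers; it does not reject surrogates or values above #x10ffff:

```scheme
;; Encode one Unicode code point as a list of UTF-8 byte values.
(define (code-point->utf8 cp)
  ;; Build a continuation byte (10xxxxxx) from the low six bits of n.
  (define (cont n) (+ #x80 (modulo n 64)))
  (cond ((< cp #x80)                        ; 1 byte:  0xxxxxxx
         (list cp))
        ((< cp #x800)                       ; 2 bytes: 110xxxxx 10xxxxxx
         (list (+ #xc0 (quotient cp 64))
               (cont cp)))
        ((< cp #x10000)                     ; 3 bytes: 1110xxxx + 2 cont.
         (list (+ #xe0 (quotient cp 4096))
               (cont (quotient cp 64))
               (cont cp)))
        (else                               ; 4 bytes: 11110xxx + 3 cont.
         (list (+ #xf0 (quotient cp 262144))
               (cont (quotient cp 4096))
               (cont (quotient cp 64))
               (cont cp)))))

(map (lambda (b) (number->string b 16)) (code-point->utf8 #x20ac))
;; ===> ("e2" "82" "ac")
```

A character-indexed string-ref over such bytes would also have to scan from the start of the string to find the nth character, so it would no longer be a constant-time operation.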

Is this a limitation due to Chicken's being a Scheme-to-C implementation?



