help-smalltalk

Re: [Help-smalltalk] [Q] Unicode String?


From: Paolo Bonzini
Subject: Re: [Help-smalltalk] [Q] Unicode String?
Date: Fri, 07 Jul 2006 09:03:26 +0200
User-agent: Thunderbird 1.5.0.4 (Macintosh/20060530)

Chun Sungjin wrote:
Hi,

I've tried GNU Smalltalk and it seems good to me. But I have a problem: the current implementation does not support Unicode; it seems to support single-byte characters only. I've also tried Squeak, which seems slower than GNU Smalltalk (I'm not sure about this; it might not be correct) but has a Unicode-compatible string implementation, and I think this kind of approach is good. Is there any chance of having a Unicode-compatible string implementation in the next version of GNU Smalltalk?

What do you need exactly? The main missing thing is support for Character objects with values above 255. However, if you are content with multibyte character sets like UTF-8, or with Unicode character codes, that is already workable.

For character set translation, if you load the I18N package, GNU Smalltalk gets an iconv wrapper. The main method you need is EncodedStream>>#on:from:to: (e.g. on: 'abc' from: 'UTF-8' to: 'UCS-4').

To extract Unicode character codes from a UCS-4LE-encoded string, you can use (ByteStream on: x asByteArray) and send it #nextLong once per character. For big-endian data there is no class, but I was thinking of adding a #bigEndian method to ByteStream for the next version.
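
To make the byte layout concrete, here is a minimal sketch in Python (not GNU Smalltalk code, and not the ByteStream API) of what reading successive 32-bit longs from UCS-4 data amounts to. The function name ucs4_code_points and its little_endian flag are made up for this illustration; Python's "utf-32-le" and "utf-32-be" codecs stand in for the iconv conversion.

```python
import struct


def ucs4_code_points(data: bytes, little_endian: bool = True) -> list[int]:
    """Return the code points stored as consecutive 32-bit words in `data`.

    Hypothetical helper for illustration: each call to "nextLong" on a
    UCS-4 byte stream corresponds to one unpacked 32-bit unsigned integer.
    """
    if len(data) % 4 != 0:
        raise ValueError("UCS-4 data must be a multiple of 4 bytes long")
    byte_order = "<" if little_endian else ">"
    # One "I" (unsigned 32-bit) per character in the encoded string.
    fmt = byte_order + "I" * (len(data) // 4)
    return list(struct.unpack(fmt, data))


if __name__ == "__main__":
    text = "a\u00e9\u20ac"  # 'a', e-acute, euro sign
    # Same characters, two byte orders; neither codec writes a byte-order mark.
    print(ucs4_code_points(text.encode("utf-32-le")))
    print(ucs4_code_points(text.encode("utf-32-be"), little_endian=False))
```

Both calls yield the same list of code points; only the byte order of each 4-byte word differs, which is all a big-endian variant of the stream would have to change.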

Things that could be useful are:
   Integer>>#asUTF8String
   String class>>#utf8FromCodepoint: (same as above)
   String>>#utf8Stream
   UTF8Stream (returns Unicode character codes)
   ... (tell me what you need) ...
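
As a rough sketch of what the first and last items in that list would have to do, here is the arithmetic in Python (again only an illustration, not GNU Smalltalk code). The names utf8_from_codepoint and utf8_code_points are invented for this example, and the decoder is deliberately simplified: it checks continuation bytes but does not reject overlong encodings.

```python
def utf8_from_codepoint(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % cp)
    if cp < 0x80:            # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:           # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:         # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_code_points(data: bytes):
    """Yield code points from UTF-8 bytes, one per character (simplified)."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            extra, cp = 0, lead
        elif lead >> 5 == 0b110:
            extra, cp = 1, lead & 0x1F
        elif lead >> 4 == 0b1110:
            extra, cp = 2, lead & 0x0F
        elif lead >> 3 == 0b11110:
            extra, cp = 3, lead & 0x07
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for b in data[i + 1:i + 1 + extra]:
            if b & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte after offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits
        yield cp
        i += 1 + extra


if __name__ == "__main__":
    print(utf8_from_codepoint(0x20AC))                        # euro sign
    print(list(utf8_code_points("a\u00e9\u20ac".encode("utf-8"))))
```

The encoder corresponds to the idea behind Integer>>#asUTF8String, and the generator to a stream that returns Unicode character codes; how these would actually be named and packaged in GNU Smalltalk is still open.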

Paolo



