help-smalltalk

From: Paolo Bonzini
Subject: Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Date: Fri, 07 Jul 2006 17:59:06 +0200
User-agent: Thunderbird 1.5.0.4 (Macintosh/20060530)


> I DO think that strlen is not for Unicode (actually, the multi-byte encoded case)
> strings and is bad design: it is limited to single-byte encodings.

I think it's different from this. strlen counts bytes, while counting characters takes mbrlen, which returns the byte length of one multibyte character at a time. In Smalltalk, #size returns allocation units: only if we stored everything in UTF-32 (no, UTF-16 would not suffice) would this mean characters.
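
To make the bytes / allocation units / characters distinction concrete, here is a small Python sketch (not part of the original thread; the helper names `size_in_units` and `count_utf8_code_points` are hypothetical). It measures one four-character string as UTF-8 bytes (what strlen would count), UTF-16 code units and UTF-32 code units, and counts code points in the UTF-8 bytes by skipping continuation bytes, roughly what a loop over mbrlen does in a UTF-8 locale.

```python
# Hypothetical helper: how many storage units the same text occupies
# under several encodings.
def size_in_units(text: str) -> dict:
    return {
        "utf8_bytes": len(text.encode("utf-8")),             # what strlen() counts
        "utf16_units": len(text.encode("utf-16-le")) // 2,   # 2-byte code units, no BOM
        "utf32_units": len(text.encode("utf-32-le")) // 4,   # 4-byte code units, no BOM
        "code_points": len(text),                            # Python str length
    }


# Hypothetical helper: count code points in well-formed UTF-8 bytes.
# Every code point starts with a byte that is NOT of the form 10xxxxxx.
def count_utf8_code_points(data: bytes) -> int:
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "a\u00e9\u65e5\U0001F600"  # 'a', e-acute, a CJK ideograph, an emoji
print(size_in_units(sample))
print(count_utf8_code_points(sample.encode("utf-8")))
```

With this sample the UTF-8 form takes 10 bytes and the UTF-16 form 5 units (the emoji needs a surrogate pair), while only the UTF-32 form matches the 4 code points.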

> I DO think that modern languages should treat strings as Unicode. I DO think Smalltalk is
> MODERN :-)

I do think that modern languages should support Unicode, and you're right that GNU Smalltalk (mostly) does not. But I don't think they should dismiss byte-based character encodings like UTF-8. In my opinion these should remain the primary representation, especially when, as in UTF-8, finding the first byte of a character is never a problem (unlike JIS-0212 or GB-2312) and no escape sequences are needed (unlike ISO-2022).
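
The UTF-8 property mentioned above can be shown with a short Python sketch (again not from the thread; `char_start` is a hypothetical helper): from any byte offset you can step back to the start of the enclosing character just by skipping continuation bytes. For contrast, the sketch encodes two Chinese characters with Python's `gb2312` codec (the EUC-CN form) and checks that every byte, lead or trail, falls in the same range, so a single byte does not reveal whether it begins a character.

```python
def char_start(data: bytes, i: int) -> int:
    """Return the offset of the first byte of the UTF-8 sequence containing data[i].

    Assumes well-formed UTF-8: continuation bytes always match 10xxxxxx and
    lead bytes never do, so at most three backward steps are needed.
    """
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


data = "a\u00e9\u65e5\U0001F600".encode("utf-8")
# 'a' is byte 0, e-acute bytes 1-2, the CJK ideograph 3-5, the emoji 6-9
print([char_start(data, i) for i in range(len(data))])
# -> [0, 1, 1, 3, 3, 3, 6, 6, 6, 6]

# Contrast (illustrative): in GB-2312 as encoded by Python's "gb2312" codec,
# lead and trail bytes of a Hanzi share the range 0xA1-0xFE.
gb = "\u4e2d\u6587".encode("gb2312")
print(all(0xA1 <= b <= 0xFE for b in gb))
# -> True
```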

Paolo

