From: Paolo Bonzini
Subject: Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Date: Sun, 09 Jul 2006 16:23:58 +0200
User-agent: Thunderbird 1.5.0.4 (Macintosh/20060530)
I'm working on it in my spare time, and I attach my current prototype
patch. It is almost complete: it's only about 400 lines of new code,
mostly in i18n/Sets.st. I have defined a new UnicodeString class, and
modified Character to support characters whose Unicode code point is
> 255. For ease of testing and usage, I've also defined a syntax $<279>
that allows you to refer to a Character by its Unicode code point. It's
equivalent to "279 asCharacter" -- I could have instead inlined this at
compile-time, but I prefer to also have a more compact syntax.
The changes are mostly backwards compatible, but characters should *not*
be compared with ==, but with = unless you're sure the code point is <=
255. Similarly, they should *not* be printed with nextPut:, but with
display:, unless you're sure the code point is <= 127.
What follows are some use cases. This is in a UTF-8 locale but (subject
to the capabilities of your system's iconv function) it works just as
well in every other locale.
I am no expert in the *needs* of people using Unicode, so can you
please confirm that it is (close to) what you need? In particular, I'd
like feedback on what to do when transcoding is not enabled, because
right now the behavior is inconsistent: see the notes preceded by ***.
Without the I18N package, the behavior is incomplete; you can store
Unicode characters, but not print them correctly:
Printing a Unicode character:
st> $<279> printNl!
$<16r0117>
Converting a Unicode character to String:
*** maybe should consider returning '?'
st> $<279> asString printNl!
error: Invalid argument <16r0117>: argument must be between $<0> and
$<16r00FF>
Converting a Unicode character to a UTF-32 String:
st> ($<279> asUnicodeString) printNl!
'<16r0117>'
Converting a UTF-32 String with a Unicode character to a byte-encoded
String:
*** maybe should give an error instead
st> $<279> asUnicodeString asString printNl!
'?'
Asking the resulting Strings for their number of characters:
st> $<279> asUnicodeString numberOfCharacters printNl!
1
st> $<279> asUnicodeString asString numberOfCharacters printNl!
error: should not be implemented in this class
Converting ByteArrays or Strings to UnicodeStrings:
st> #[196 151] asUnicodeString first printNl!
error: should not be implemented in this class
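For comparison, the two fallback policies raised in the *** notes
(substitute '?' or signal an error) can be sketched outside Smalltalk.
This is only an illustration in Python, not code from the patch. Latin-1
stands in here, as an assumption, for a byte-encoded String limited to
code points <= 255, and the helper name to_byte_string is hypothetical:

```python
def to_byte_string(text, on_error="replace"):
    """Encode text into a one-byte-per-character string (Latin-1).

    on_error="replace" substitutes '?' for unencodable characters;
    on_error="strict" raises UnicodeEncodeError instead.
    """
    return text.encode("latin-1", errors=on_error)


e_dot = chr(279)  # U+0117, the character written $<279> above

# Policy 1: lossy conversion, unencodable characters become '?'
print(to_byte_string(e_dot, "replace"))

# Policy 2: refuse the conversion and report an error
try:
    to_byte_string(e_dot, "strict")
except UnicodeEncodeError as err:
    print("error:", err.reason)
```

Whichever policy is chosen, applying the same one to both the Character
and the UnicodeString conversion would remove the inconsistency noted
above.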
-----
After loading the I18N package, everything is much better:
Printing a Unicode character:
st> $<279> printNl!
$ė
Converting a Unicode character to String:
st> $<279> asString printNl!
'ė'
Converting a Unicode character to a UTF-32 String, and then back just by
printing it:
st> ($<279> asUnicodeString) printNl!
'ė'
Converting a UTF-32 String with a Unicode character to a byte-encoded
String:
st> $<279> asUnicodeString asString printNl!
'ė'
Asking the resulting Strings for their number of characters:
st> $<279> asUnicodeString numberOfCharacters printNl!
1
st> $<279> asUnicodeString asString numberOfCharacters printNl!
1
Converting ByteArrays or Strings to UnicodeStrings:
st> #[196 151] asUnicodeString first printNl!
$ė
st> #[196 151] asUnicodeString size printNl!
1
st> #[196 151] asUnicodeString numberOfCharacters printNl!
1
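As a cross-check of the last three examples: the bytes 196 and 151 are
the UTF-8 encoding of code point 279 (16r0117), which is why they
collapse to a single character once decoded. A small Python sketch,
independent of the patch, that verifies this:

```python
raw = bytes([196, 151])          # same bytes as #[196 151]
text = raw.decode("utf-8")       # interpret them as UTF-8

print(len(raw))                  # number of bytes: 2
print(len(text))                 # number of characters: 1
print(ord(text[0]))              # code point: 279
print(hex(ord(text[0])))         # 0x117
```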
Paolo