Re: {Spam?} Why string should be collection of single byte characters? (

help-smalltalk

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: {Spam?} Why string should be collection of single byte characters? (

From:	Paolo Bonzini
Subject:	Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Date:	Fri, 07 Jul 2006 17:34:59 +0200
User-agent:	Thunderbird 1.5.0.4 (Macintosh/20060530)

Sungjin Chun wrote:

Hi,

For me, string should not be limited to collection of single byte
characters. String is string not a simple collection of byte, isn't it? I
think squeak's approach (or OpenStep's approach, where abstract public
string class and concrete private subclasses of string that implements
several cases of string). But I'm not currently working hard on GNU
Smalltalk, this may not be the best idea for GNU Smalltalk's case :-)

There's already CharacterArray as a superclass of String. It probablywould not be hard to have a UnicodeString subclass of CharacterArray,and use that instead of WordArray inside the I18N package. I'd alsoneed UnicodeCharacter, probably.


I'm working on it in my spare time, I attach my current prototype patch.

Paolo

--- orig/i18n/Sets.st
+++ mod/i18n/Sets.st
@@ -89,70 +89,70 @@
 
 Namespace current: Smalltalk.I18N.Encoders!
 
-Encoder subclass: #FromUCS4
+Encoder subclass: #FromUTF32
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'i18n-Character sets'!
 
-FromUCS4 comment:
-'This class is a superclass for classes that convert from UCS4
+FromUTF32 comment:
+'This class is a superclass for classes that convert from UTF-32
 characters (encoded as 32-bit Integers) to bytes in another
 encoding (encoded as Characters).'!
 
-Encoder subclass: #ToUCS4
+Encoder subclass: #ToUTF32
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'i18n-Character sets'!
 
-ToUCS4 comment:
+ToUTF32 comment:
 'This class is a superclass for classes that convert from bytes
-(encoded as Characters) to UCS4 characters (encoded as 32-bit
+(encoded as Characters) to UTF-32 characters (encoded as 32-bit
 Integers to simplify the code and to avoid endianness conversions).'!
 
-ToUCS4 subclass: #ComposeUCS4LE
+ToUTF32 subclass: #ComposeUTF32LE
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'i18n-Character sets'!
 
-ComposeUCS4LE comment:
-'This class is used internally to provide UCS4 characters encoded as
-32-bit integers for a descendent of FromUCS4, when the starting
+ComposeUTF32LE comment:
+'This class is used internally to provide UTF-32 characters encoded as
+32-bit integers for a descendent of FromUTF32, when the starting
 encoding is little-endian.'!
 
-ToUCS4 subclass: #ComposeUCS4BE
+ToUTF32 subclass: #ComposeUTF32BE
        instanceVariableNames: ''
        classVariableNames: ''
        poolDictionaries: ''
        category: 'i18n-Character sets'!
 
-ComposeUCS4BE comment:
-'This class is used internally to provide UCS4 characters encoded as
-32-bit integers for a descendent of FromUCS4, when the starting
+ComposeUTF32BE comment:
+'This class is used internally to provide UTF-32 characters encoded as
+32-bit integers for a descendent of FromUTF32, when the starting
 encoding is big-endian.'!
 
-FromUCS4 subclass: #SplitUCS4LE
+FromUTF32 subclass: #SplitUTF32LE
        instanceVariableNames: 'wch'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'i18n-Character sets'!
 
-SplitUCS4LE comment:
+SplitUTF32LE comment:
 'This class is used internally to split into four 8-bit characters
-the 32-bit UCS4 integers coming from a descendent of ToUCS4, when
+the 32-bit UTF-32 integers coming from a descendent of ToUTF32, when
 the destination encoding is little-endian.'!
 
-FromUCS4 subclass: #SplitUCS4BE
+FromUTF32 subclass: #SplitUTF32BE
        instanceVariableNames: 'count wch'
        classVariableNames: ''
        poolDictionaries: ''
        category: 'i18n-Character sets'!
 
-SplitUCS4BE comment:
+SplitUTF32BE comment:
 'This class is used internally to split into four 8-bit characters
-the 32-bit UCS4 integers coming from a descendent of ToUCS4, when
+the 32-bit UTF-32 integers coming from a descendent of ToUTF32, when
 the destination encoding is big-endian.'!
 
 Encoder subclass: #Iconv
@@ -166,21 +166,21 @@
 'This class is used to delegate the actual character set conversion
 to the C library''s iconv function.  Most conversions use iconv as
 the only step in the conversions, sometimes the structure is 
-ToUCS4+SplitUCS4xx+Iconv or Iconv+ComposeUCS4xx+FromUCS4, rarely
+ToUTF32+SplitUTF32xx+Iconv or Iconv+ComposeUTF32xx+FromUTF32, rarely
 Iconv is skipped altogether and only Smalltalk converters are used.'!
 
-FromUCS4 subclass: #ToUTF7
+FromUTF32 subclass: #ToUTF7
        instanceVariableNames: 'left value lookahead'
        classVariableNames: 'Base64Characters DirectCharacters ToBase64'
        poolDictionaries: ''
        category: 'i18n-Encodings'!
 
 ToUTF7 comment:
-'This class implements a converter that transliterates UCS4
+'This class implements a converter that transliterates UTF-32
 characters (encoded as 32-bit Integers) to UTF-7 encoded
 characters.'!
 
-ToUCS4 subclass: #FromUTF7
+ToUTF32 subclass: #FromUTF7
        instanceVariableNames: 'shift wch lookahead'
        classVariableNames: 'DirectCharacters FromBase64'
        poolDictionaries: ''
@@ -188,7 +188,7 @@
 
 ToUTF7 comment:
 'This class implements a converter that transliterates UTF-7
-encoded characters to UCS4 values (encoded as 32-bit Integers).'!
+encoded characters to UTF-32 values (encoded as 32-bit Integers).'!
 
 Namespace current: Smalltalk.I18N!
 
@@ -241,9 +241,9 @@
 !Encoder methodsFor: 'private - initialization'!
 
 initializeFrom: fromEncoding to: toEncoding origin: aStringOrStream
-    origin := aStringOrStream isString
-       ifTrue: [ aStringOrStream readStream ]
-       ifFalse: [ aStringOrStream ].
+    origin := (aStringOrStream isKindOf: Stream)
+       ifFalse: [ aStringOrStream readStream ]
+       ifTrue: [ aStringOrStream ].
 
     self flush
 ! !
@@ -258,27 +258,27 @@
     }
 !
 
-registerEncoderFor: arrayOfAliases toUCS4: toUCS4Class fromUCS4: fromUCS4Class
+registerEncoderFor: arrayOfAliases toUTF32: toUTF32Class fromUTF32: 
fromUTF32Class
     "Register the two classes that will respectively convert from the
-     charsets in arrayOfAliases to UCS4 and vice versa.
+     charsets in arrayOfAliases to UTF-32 and vice versa.
 
      The former class is a stream that accepts characters and returns
-     (via #next) integers representing UCS-4 character codes, while
-     the latter accepts UCS-4 character codes and converts them to
+     (via #next) integers representing UTF-32 character codes, while
+     the latter accepts UTF-32 character codes and converts them to
      characters.  For an example see respectively FromUTF7 and ToUTF7
      (I admit it is not a trivial example)."
 
     EncodersRegistry := EncodersRegistry copyWith:
-       { arrayOfAliases. toUCS4Class. fromUCS4Class }
+       { arrayOfAliases. toUTF32Class. fromUTF32Class }
 ! !
 
 !EncodedStream class methodsFor: 'private - triangulating'!
 
 bigEndianPivot
     "When only one of the sides is implemented in Smalltalk
-     and the other is obtained via iconv, we use UCS-4 to
+     and the other is obtained via iconv, we use UTF-32 to
      marshal data from Smalltalk to iconv; answer whether we
-     should encode UCS-4 characters as big-endian."
+     should encode UTF-32 characters as big-endian."
     ^Memory bigEndian
 !
 
@@ -287,29 +287,119 @@
      and the other is obtained via iconv, we need a common
      pivot encoding to marshal data from Smalltalk to iconv.
      Answer the iconv name of this encoding."
-    ^self bigEndianPivot ifTrue: [ 'UCS-4BE' ] ifFalse: [ 'UCS-4LE' ]
+    ^self bigEndianPivot ifTrue: [ 'UTF-32BE' ] ifFalse: [ 'UTF-32LE' ]
 !
 
-split: input
+split: input to: encoding
     "Answer a pipe with the given input stream (which produces
-     UCS-4 character codes as integers) and whose output is
+     UTF-32 character codes as integers) and whose output is
      a series of Characters in the required pivot encoding"
-    ^self bigEndianPivot
-       ifTrue: [ SplitUCS4BE on: input from: 'words' to: 'UCS4-BE' ]
-       ifFalse: [ SplitUCS4LE on: input from: 'words' to: 'UCS4-LE' ].
+    ^(encoding = 'UCS-4BE' or: [ encoding = 'UTF-32BE' ])
+       ifTrue: [ SplitUTF32BE on: input from: 'UTF-32' to: encoding ]
+       ifFalse: [ SplitUTF32LE on: input from: 'UTF-32' to: encoding ].
 !
 
-compose: input
+compose: input from: encoding
     "Answer a pipe with the given input stream (which produces
      Characters in the required pivot encoding) and whose output
-     is a series of integer UCS-4 character codes."
-    ^self bigEndianPivot
-       ifTrue: [ ComposeUCS4BE on: input from: 'UCS4-BE' to: 'words' ]
-       ifFalse: [ ComposeUCS4LE on: input from: 'UCS4-LE' to: 'words' ].
+     is a series of integer UTF-32 character codes."
+    ^(encoding = 'UCS-4BE' or: [ encoding = 'UTF-32BE' ])
+       ifTrue: [ ComposeUTF32BE on: input from: encoding to: 'UTF-32' ]
+       ifFalse: [ ComposeUTF32LE on: input from: encoding to: 'UTF-32' ].
 ! !
 
 !EncodedStream class methodsFor: 'instance creation'!
 
+encoding: aWordArray
+    "Answer a pipe of encoders that converts aWordArray (which contains
+     Integers for the Unicode values) to the current locale's default
+     charset."
+    ^self
+       encoding: aWordArray
+       as: Locale default charset
+!
+
+encoding: aStringOrStream as: toEncoding
+    "Answer a pipe of encoders that converts aWordArray (which contains
+     Integers for the Unicode values) to the supplied encoding (which
+     can be an ASCII String or Symbol)."
+    | pivot to encoderTo pipe |
+
+    "Adopt an uniform naming"
+    to := toEncoding asString.
+    (from = 'UTF-32' or: [ from = 'UCS-4' ])
+       ifTrue: [ to := self pivotEncoding ].
+    (to = 'UTF-16' or: [ to = 'UCS-2' ])
+       ifTrue: [ to := self pivotEncoding copyReplacing: '32' with: '16' ].
+
+    "If converting to the pivot encoding, we're done."
+    pivot := 'UTF-32'.
+    ((to startsWith: 'UCS-4') or: [ to startsWith: 'UTF-32' ])
+       ifTrue: [ pivot := to ].
+    pivot = 'UTF-32' ifTrue: [ pivot := self pivotEncoding ].
+
+    encoderTo := Iconv.
+    EncodersRegistry do: [ :each |
+       ((each at: 1) includes: to)
+           ifTrue: [ encoderTo := each at: 2 ]
+    ].
+
+    pipe := aStringOrStream.
+
+    "Split UTF-32 character codes into bytes if needed by iconv."
+    encoderTo == Iconv ifTrue: [ pipe := self split: pipe to: pivot ].
+
+    "If not converting to the pivot encoding, we need one more step."
+    to = pivot ifFalse: [
+        pipe := encoderTo on: aStringOrStream from: pivot to: toEncoding ].
+    ^pipe
+!
+
+unicodeOn: aStringOrStream
+    "Answer a pipe of encoders that converts aStringOrStream (which can
+     be a string or another stream) from the current locale's default
+     charset to integers representing Unicode character codes."
+    ^self
+       unicodeOn: aStringOrStream
+       encoding: Locale default charset
+!
+
+unicodeOn: aStringOrStream encoding: fromEncoding
+    "Answer a pipe of encoders that converts aStringOrStream
+     (which can be a string or another stream) from the supplied
+     encoding (which can be an ASCII String or Symbol) to
+     integers representing Unicode character codes."
+    | from pivot encoderFrom pipe |
+
+    "Adopt an uniform naming"
+    from := fromEncoding asString.
+    (from = 'UTF-32' or: [ from = 'UCS-4' ])
+       ifTrue: [ from := aStringOrStream utf32Encoding ].
+    (from = 'UTF-16' or: [ from = 'UCS-2' ])
+       ifTrue: [ from := aStringOrStream utf16Encoding ].
+
+    pivot := 'UTF-32'.
+    ((from startsWith: 'UCS-4') or: [ from startsWith: 'UTF-32' ])
+       ifTrue: [ pivot := from ].
+    pivot = 'UTF-32' ifTrue: [ pivot := self pivotEncoding ].
+
+    encoderFrom := Iconv.
+    EncodersRegistry do: [ :each |
+       ((each at: 1) includes: from)
+           ifTrue: [ encoderFrom := each at: 2 ]
+    ].
+
+    pipe := aStringOrStream.
+
+    "If not converting from the pivot encoding, we need one more step."
+    from = pivot ifFalse: [
+        pipe := encoderFrom on: aStringOrStream from: fromEncoding to: pivot ].
+
+    "Compose iconv-produced bytes into UTF-32 character codes if needed."
+    encoderFrom == Iconv ifTrue: [ pipe := self compose: pipe from: pivot ].
+    ^pipe
+!
+
 on: aStringOrStream from: fromEncoding
     "Answer a pipe of encoders that converts aStringOrStream
      (which can be a string or another stream) from the given
@@ -340,8 +430,20 @@
     "Adopt an uniform naming"
     from := fromEncoding asString.
     to := toEncoding asString.
-    from = 'UCS-4' ifTrue: [ from := 'UCS-4BE' ].
-    to = 'UCS-4' ifTrue: [ to := 'UCS-4BE' ].
+    (from = 'UTF-32' or: [ from = 'UCS-4' ])
+       ifTrue: [ from := aStringOrStream utf32Encoding ].
+    (from = 'UTF-16' or: [ from = 'UCS-2' ])
+       ifTrue: [ from := aStringOrStream utf16Encoding ].
+    (to = 'UTF-32' or: [ to = 'UCS-4' ])
+       ifTrue: [ to := self pivotEncoding ].
+    (to = 'UTF-16' or: [ to = 'UCS-2' ])
+       ifTrue: [ to := self pivotEncoding copyReplaceAll: '32' with: '16' ].
+
+    ((from startsWith: 'UCS-4') or: [ from startsWith: 'UTF-32' ])
+       ifTrue: [ pivot := from ].
+    ((to startsWith: 'UCS-4') or: [ to startsWith: 'UTF-32' ])
+       ifTrue: [ pivot := to ].
+    pivot = 'UTF-32' ifTrue: [ pivot := self pivotEncoding ].
 
     encoderFrom := encoderTo := Iconv.
     EncodersRegistry do: [ :each |
@@ -358,12 +460,12 @@
     "Else answer a `pipe' that takes care of triangulating.
      There is an additional complication: Smalltalk encoders
      read or provide a stream of character codes (respectively
-     if the source is UCS-4, or the target is UCS-4), while iconv
+     if the source is UTF-32, or the target is UTF-32), while iconv
      expects raw bytes.  So we add an intermediate layer if
      a mixed Smalltalk+iconv conversion is done: it converts
-     character codes --> bytes (SplitUCS4xx, used if iconv will
-     convert from UCS-4) or bytes --> character code (ComposeUCS4xx,
-     used if iconv will convert to UCS-4).
+     character codes --> bytes (SplitUTF32xx, used if iconv will
+     convert from UTF-32) or bytes --> character code (ComposeUTF32xx,
+     used if iconv will convert to UTF-32).
 
      There are five different cases (remember that at least one converter
      is not iconv, so `both use iconv' and `from = pivot = to' are banned):
@@ -373,7 +475,6 @@
        from uses iconv --> iconv + Compose + non-iconv (implies to ~= pivot)
        none uses iconv --> non-iconv + non-iconv (implies neither = pivot)"
 
-    pivot := self pivotEncoding.
     pipe := aStringOrStream.
     from = pivot
        ifFalse: [
@@ -382,16 +483,16 @@
 
            pipe := encoderFrom on: pipe from: fromEncoding to: pivot.
            encoderTo == Iconv ifTrue: [
-               pipe := self split: pipe.
+               pipe := self split: pipe to: pivot.
 
                "Check if we already reached the destination format."
                to = pivot ifTrue: [ ^pipe ].
            ].
        ].
 
-    "Compose iconv-produced bytes into UCS-4 character codes if needed."
+    "Compose iconv-produced bytes into UTF-32 character codes if needed."
     encoderFrom == Iconv ifTrue: [
-       pipe := self compose: pipe
+       pipe := self compose: pipe from: pivot
     ].
 
     ^encoderTo on: pipe from: pivot to: toEncoding.
@@ -399,7 +500,7 @@
 
 Namespace current: Smalltalk.I18N.Encoders!
 
-!FromUCS4 methodsFor: 'stream operation'!
+!FromUTF32 methodsFor: 'stream operation'!
 
 species
     "We answer a string of Characters encoded in our destination
@@ -407,15 +508,15 @@
     ^String
 ! !
 
-!ToUCS4 methodsFor: 'stream operation'!
+!ToUTF32 methodsFor: 'stream operation'!
 
 species
-    "We answer a WordArray of UCS4 characters encoded as a series of
+    "We answer a WordArray of UTF-32 characters encoded as a series of
      32-bit Integers."
     ^WordArray
 ! !
 
-!ComposeUCS4LE methodsFor: 'stream operation'!
+!ComposeUTF32LE methodsFor: 'stream operation'!
 
 next
     "Answer a 32-bit integer obtained by reading four 8-bit character
@@ -426,7 +527,7 @@
      (self nextInput asInteger bitShift: 24)
 ! !
 
-!ComposeUCS4BE methodsFor: 'stream operation'!
+!ComposeUTF32BE methodsFor: 'stream operation'!
 
 next
     "Answer a 32-bit integer obtained by reading four 8-bit character
@@ -439,7 +540,7 @@
           self nextInput asInteger     
 ! !
 
-!SplitUCS4LE methodsFor: 'stream operation'!
+!SplitUTF32LE methodsFor: 'stream operation'!
 
 atEnd
     "Answer whether the receiver can produce more characters"
@@ -474,7 +575,7 @@
     wch := 1
 ! !
 
-!SplitUCS4BE methodsFor: 'stream operation'!
+!SplitUTF32BE methodsFor: 'stream operation'!
 
 atEnd
     "Answer whether the receiver can produce more characters"
@@ -670,7 +771,7 @@
 !ToUTF7 class methodsFor: 'initialization'!
 
 initialize
-    "Initialize the tables used by the UCS4-to-UTF7 converter"
+    "Initialize the tables used by the UTF-32-to-UTF-7 converter"
 
     Base64Characters := #[
         16r00 16r00 16r00 16r00 16r00 16rA8 16rFF 16r03
@@ -806,7 +907,7 @@
 !FromUTF7 class methodsFor: 'initialization'!
 
 initialize
-    "Initialize the tables used by the UTF7-to-UCS4 converter"
+    "Initialize the tables used by the UTF-7-to-UTF-32 converter"
 
     FromBase64 := #[
        62 99 99 99 63
@@ -842,7 +943,7 @@
 !FromUTF7 methodsFor: 'converting'!
 
 atEnd
-    "Answer whether the receiver can produce another UCS4 32-bit
+    "Answer whether the receiver can produce another UTF-32 32-bit
      encoded integer"
     ^lookahead isNil
 !
@@ -878,7 +979,7 @@
     "Flush any remaining state left in the encoder by the last character
      (this is because UTF-7 encodes 6 bits at a time, so it takes three
      characters before it can provide a single 16-bit character and
-     up to six characters before it can provide a full UCS-4 character)."
+     up to six characters before it can provide a full UTF-32 character)."
     shift := 0.
     lookahead := self getNext.
 ! !
@@ -977,16 +1078,42 @@
 description
     "Answer a textual description of the exception."
     ^'unknown encoding specified'! !
-
+
 
 "Now add some extensions to the system classes"
 
-(CharacterArray classPool includesKey: #DefaultEncoding)
-    ifFalse: [ CharacterArray addClassVarName: #DefaultEncoding ]!
-
 !CharacterArray class methodsFor: 'converting'!
 
 defaultEncoding
+    self subclassResponsibility!
+
+!CharacterArray methodsFor: 'converting'!
+
+encoding
+    "Answer the encoding of the receiver, assuming it is in the
+     default locale's default charset"
+
+    self class defaultEncoding asString = 'UTF-16'
+       ifTrue: [ ^self utf16Encoding ].
+    self class defaultEncoding asString = 'UTF-32'
+       ifTrue: [ ^self utf32Encoding ].
+    ^self class defaultEncoding!
+
+utf16Encoding
+    "Answer the encoding of the receiver, assuming it's UTF-16"
+    ^Memory bigEndian ifTrue: [ 'UTF-16BE' ] ifFalse: [ 'UTF-16LE' ]!
+
+utf32Encoding
+    "Answer the encoding of the receiver, assuming it's UTF-32"
+    ^Memory bigEndian ifTrue: [ 'UTF-32BE' ] ifFalse: [ 'UTF-32LE' ]! !
+
+
+(String classPool includesKey: #DefaultEncoding)
+    ifFalse: [ String addClassVarName: #DefaultEncoding ]!
+
+!String class methodsFor: 'converting'!
+
+defaultEncoding
     "Answer the default locale's default charset"
     DefaultEncoding isNil
        ifTrue: [ DefaultEncoding := Locale default charset ].
@@ -999,15 +1126,28 @@
     DefaultEncoding := aString
 ! !
 
-!CharacterArray methodsFor: 'converting'!
+!String methodsFor: 'converting'!
 
-encoding
-    "Answer the encoding of the receiver, assuming it is in the
-     default locale's default charset"
+asUnicode
+    "Return a WordArray with the contents of the receiver, interpreted
+     as the default locale character set."
+    ^(EncodedStream unicodeOn: self) contents!
+
+asUnicode: aString
+    "Return a WordArray with the contents of the receiver, interpreted
+     as the default locale character set."
+    ^(EncodedStream unicodeOn: self encoding: aString) contents! !
+
+!String methodsFor: 'converting'!
+
+utf32Encoding
+    "Assuming the receiver is encoded as UTF-16 with a proper
+     endianness marker, answer the correct encoding of the receiver."
 
-    ^self class defaultEncoding asString = 'UTF-16'
-       ifTrue: [ self utf16Encoding ]
-       ifFalse: [ self class defaultEncoding ]
+    | b1 b2 bigEndian |
+    b1 := self at: 1.          "Low byte"
+    bigEndian := b1 = 0.
+    ^bigEndian ifTrue: [ 'UTF-32BE' ] ifFalse: [ 'UTF-32LE' ]
 !
 
 utf16Encoding
@@ -1026,12 +1166,53 @@
     ^bigEndian ifTrue: [ 'UTF-16BE' ] ifFalse: [ 'UTF-16LE' ]
 ! !
 
+
+!WordArray class methodsFor: 'converting'!
+
+defaultEncoding
+    ^'UTF-32'! !
+
+!WordArray methodsFor: 'converting'!
+
+utf16Encoding
+    self shouldNotImplement! !
+
+!WordArray methodsFor: 'converting'!
+
+encoded
+    "Return a String with the contents of the receiver, converted
+     into the default locale character set."
+    ^(EncodedStream encoding: self) contents!
+
+encodedAs: aString
+    "Return a String with the contents of the receiver, converted
+     into the aString locale character set."
+    ^(EncodedStream encoding: self as: aString) contents!
+
+
 !PositionableStream methodsFor: 'converting'!
 
 encoding
     "Answer the encoding of the underlying collection"
-    ^collection encoding
-! !
+    ^collection encoding!
+
+utf16Encoding
+    "Answer the encoding of the underlying collection, assuming it's UTF-16"
+    ^collection utf16Encoding!
+
+utf32Encoding
+    "Answer the encoding of the underlying collection, assuming it's UTF-32"
+    ^collection utf32Encoding! !
+
+!Stream methodsFor: 'converting'!
+
+utf16Encoding
+    "Answer the encoding of the underlying collection, assuming it's UTF-16"
+    ^Memory bigEndian ifTrue: [ 'UTF-16BE' ] ifFalse: [ 'UTF-16LE' ]!
+
+utf32Encoding
+    "Answer the encoding of the underlying collection, assuming it's UTF-32"
+    ^Memory bigEndian ifTrue: [ 'UTF-32BE' ] ifFalse: [ 'UTF-32LE' ]! !
 
 Encoders.ToUTF7 initialize!
 Encoders.FromUTF7 initialize!

[Prev in Thread]

Current Thread

[Next in Thread]

[Help-smalltalk] Re: Starting with smalltalk, Paolo Bonzini, 2006/07/03
- Re: [Help-smalltalk] Re: Starting with smalltalk, Mike Anderson, 2006/07/05
  - Re: [Help-smalltalk] Re: Starting with smalltalk, Bram Neijt, 2006/07/05
    - Re: [Help-smalltalk] Re: Starting with smalltalk, Paolo Bonzini, 2006/07/06
    - Message not available
    - Re: [Help-smalltalk] Re: Starting with smalltalk, Paolo Bonzini, 2006/07/06
    - [Help-smalltalk] [Q] Unicode String?, Chun Sungjin, 2006/07/06
    - Re: [Help-smalltalk] [Q] Unicode String?, Paolo Bonzini, 2006/07/07
    - Re: [Help-smalltalk] [Q] Unicode String?, Chun Sungjin, 2006/07/07
    - Re: {Spam?} Re: [Help-smalltalk] [Q] Unicode String?, Paolo Bonzini, 2006/07/07
    - Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?), Sungjin Chun, 2006/07/07
    - Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?), Paolo Bonzini <=
    - Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?), Paolo Bonzini, 2006/07/09
    - Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?), Paolo Bonzini, 2006/07/07
    - Message not available
    - Message not available
    - Re: [Help-smalltalk] Re: Starting with smalltalk, Bram Neijt, 2006/07/06

Prev by Date: Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Next by Date: Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Previous by thread: Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Next by thread: Re: {Spam?} Why string should be collection of single byte characters? (WAS: Re: [Help-smalltalk] [Q] Unicode String?)
Index(es):
- Date
- Thread