help-smalltalk

Re: {Spam?} Re: [Help-smalltalk] [Q] Bug in EncodedStream?


From: Paolo Bonzini
Date: Mon, 16 Oct 2006 13:05:02 +0200
User-agent: Thunderbird 1.5.0.7 (Macintosh/20060909)


> I mean that I want example code that shows a good pattern for dealing
> with multibyte strings :-) For example, I'm not sure whether this code
> is good or not:
>
> str _ UnicodeString fromString: 'Some UTF-8 Encoded String'.

It is, if your default encoding is UTF-8 or if the encoded string starts with a byte-order mark (for the latter case you need the attached patch :-( ...).

For example, this works:

st> #[254 255 200 4 193 49 201 196] asString encoding!
'UTF-16BE'
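The byte-order-mark check that `encoding` performs can be sketched in plain Smalltalk. This is a hypothetical stand-alone helper (the names `boms` and `bomEncoding` are made up for illustration and are not part of GNU Smalltalk), written in GNU Smalltalk 3.x top-level syntax rather than the 2.2 bang syntax. It tests the marks in the same order as the patch: the 4-byte UTF-32 marks come before the 2-byte UTF-16 ones, because the UTF-16LE mark (255 254) is a prefix of the UTF-32LE mark (255 254 0 0).

```smalltalk
"Hypothetical helper, not part of GNU Smalltalk: map a leading
 byte-order mark to an encoding name, or answer nil if there is none.
 UTF-32LE is listed before UTF-16LE because its mark starts with the
 UTF-16LE mark."
| boms bomEncoding |
boms := {
    #[0 0 254 255] -> 'UTF-32BE'.
    #[255 254 0 0] -> 'UTF-32LE'.
    #[254 255] -> 'UTF-16BE'.
    #[255 254] -> 'UTF-16LE'.
    #[239 187 191] -> 'UTF-8'}.

bomEncoding := [:bytes |
    | match |
    "Answer the first mark that fits in the input and matches its prefix."
    match := boms
        detect: [:each |
            bytes size >= each key size
                and: [(bytes copyFrom: 1 to: each key size) = each key]]
        ifNone: [nil].
    match isNil ifTrue: [nil] ifFalse: [match value]].

(bomEncoding value: #[254 255 200 4 193 49 201 196]) printNl.  "'UTF-16BE'"
(bomEncoding value: #[255 254 0 0 65 0 0 0]) printNl.         "'UTF-32LE'"
(bomEncoding value: #[239 187 191 72 105]) printNl.           "'UTF-8'"
(bomEncoding value: #[72 105]) printNl.                       "nil"
```

Keeping the marks in an ordered table makes the precedence rule explicit, which the nested `and:` chains in the patch express only through their textual order.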

> str _ UnicodeString fromString: 'Some UTF-8 Encoded String' encoding:
> UTF8StringEncoding.

In GNU Smalltalk encodings are named by strings, so UTF8StringEncoding is written 'UTF-8'.

Paolo

* auto-adding address@hidden/smalltalk--devo--2.2--patch-152 to greedy revision library /Users/bonzinip/Archives/revlib
* found immediate ancestor revision in library (address@hidden/smalltalk--devo--2.2--patch-151)
* patching for this revision (address@hidden/smalltalk--devo--2.2--patch-152)
--- orig/i18n/Sets.st
+++ mod/i18n/Sets.st
@@ -1289,21 +1289,21 @@ encoding
      default locale's default charset"
 
     | encoding |
-    (self size >= 4 and: [ (self at: 1) = 0 and: [ (self at: 2) = 0 and: [
-       (self at: 3) = 254 and: [
-       (self at: 4) = 255 ]]]]) ifTrue: [ ^'UTF-32BE' ].
-    (self size >= 4 and: [ (self at: 4) = 0 and: [ (self at: 3) = 0 and: [
-       (self at: 2) = 254 and: [
-       (self at: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
+    (self size >= 4 and: [ (self valueAt: 1) = 0 and: [ (self valueAt: 2) = 0 and: [
+       (self valueAt: 3) = 254 and: [
+       (self valueAt: 4) = 255 ]]]]) ifTrue: [ ^'UTF-32BE' ].
+    (self size >= 4 and: [ (self valueAt: 4) = 0 and: [ (self valueAt: 3) = 0 and: [
+       (self valueAt: 2) = 254 and: [
+       (self valueAt: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
     (self size >= 2 and: [
-       (self at: 1) = 254 and: [
-       (self at: 2) = 255 ]]) ifTrue: [ ^'UTF-16BE' ].
+       (self valueAt: 1) = 254 and: [
+       (self valueAt: 2) = 255 ]]) ifTrue: [ ^'UTF-16BE' ].
     (self size >= 2 and: [
-       (self at: 2) = 254 and: [
-       (self at: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
-    (self size >= 3 and: [ (self at: 1) = 16rEF and: [
-       (self at: 2) = 16rBB and: [
-       (self at: 3) = 16rBF ]]]) ifTrue: [ ^'UTF-8' ].
+       (self valueAt: 2) = 254 and: [
+       (self valueAt: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
+    (self size >= 3 and: [ (self valueAt: 1) = 16rEF and: [
+       (self valueAt: 2) = 16rBB and: [
+       (self valueAt: 3) = 16rBF ]]]) ifTrue: [ ^'UTF-8' ].
 
     encoding := self class defaultEncoding.
     encoding asString = 'UTF-16' ifTrue: [ ^self utf16Encoding ].
@@ -1314,9 +1314,9 @@ utf32Encoding
     "Assuming the receiver is encoded as UTF-16 with a proper
      endianness marker, answer the correct encoding of the receiver."
 
-    (self size >= 4 and: [ (self at: 4) = 0 and: [ (self at: 3) = 0 and: [
-       (self at: 2) = 254 and: [
-       (self at: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
+    (self size >= 4 and: [ (self valueAt: 4) = 0 and: [ (self valueAt: 3) = 0 and: [
+       (self valueAt: 2) = 254 and: [
+       (self valueAt: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
     ^'UTF-32BE'
 !
 
@@ -1325,8 +1325,8 @@ utf16Encoding
      endianness marker, answer the correct encoding of the receiver."
 
     (self size >= 2 and: [
-       (self at: 2) = 254 and: [
-       (self at: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
+       (self valueAt: 2) = 254 and: [
+       (self valueAt: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
     ^'UTF-16BE'
 ! !
 

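The patch replaces every `at:` with `valueAt:`. Presumably this is because, when the receiver is a String (as in `#[...] asString encoding` above), `at:` answers a Character, and a Character never compares equal to an Integer such as 254, so none of the byte-order-mark tests could ever succeed. A minimal check of that reasoning follows; it uses `Character>>value` to obtain the integer code, which is an assumption of this sketch (the patch itself uses `valueAt:` for the same purpose).

```smalltalk
"Build a String whose first two characters carry the UTF-16BE mark."
| s |
s := #[254 255 65] asString.
((s at: 1) = 254) printNl.        "false: at: answers a Character"
((s at: 1) value = 254) printNl.  "true: Character>>value answers its code"
```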