Re: {Spam?} Re: [Help-smalltalk] [Q] Bug in EncodedStream?
From: Paolo Bonzini
Subject: Re: {Spam?} Re: [Help-smalltalk] [Q] Bug in EncodedStream?
Date: Mon, 16 Oct 2006 13:05:02 +0200
User-agent: Thunderbird 1.5.0.7 (Macintosh/20060909)
> I mean that I want example code that shows a good pattern for dealing
> with multibyte strings :-) For example, I'm not sure whether this code
> is good or not:
> str _ UnicodeString fromString: 'Some UTF-8 Encoded String'.
It is, if your default encoding is UTF-8 or if the encoded string
begins with a byte-order mark (for the latter, you need the attached
patch :-( ...).
For example, this works:
st> #[254 255 200 4 193 49 201 196] asString encoding!
'UTF-16BE'
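To make the byte-order-mark detection a little more concrete, here is a
standalone sketch of the same checks that the patched #encoding method
performs. It assumes GNU Smalltalk 3.x syntax (the 2006 sources used
bang-separated chunks), the names `boms` and `sniff` are hypothetical,
it takes a ByteArray rather than a String, and unlike the real method it
answers nil instead of falling back to the default locale's charset when
no mark is present.

```smalltalk
"Hypothetical BOM sniffer mirroring the checks in the patched #encoding method.
 The four-byte UTF-32 marks are tried before the two-byte UTF-16 ones, because
 the UTF-32LE mark (FF FE 00 00) starts with the UTF-16LE mark (FF FE)."
| boms sniff |
boms := {
    #[0 0 254 255] -> 'UTF-32BE'.
    #[255 254 0 0] -> 'UTF-32LE'.
    #[254 255] -> 'UTF-16BE'.
    #[255 254] -> 'UTF-16LE'.
    #[239 187 191] -> 'UTF-8' }.

"Answer the encoding implied by the leading bytes of a ByteArray, or nil
 when it does not start with any known byte-order mark."
sniff := [:bytes | | match |
    match := boms
        detect: [:each | bytes size >= each key size
            and: [(bytes copyFrom: 1 to: each key size) = each key]]
        ifNone: [nil].
    match isNil ifTrue: [nil] ifFalse: [match value]].

(sniff value: #[254 255 200 4 193 49 201 196]) printNl.   "prints 'UTF-16BE'"
```

Keeping the marks in a table rather than a chain of nested and: blocks
makes the ordering constraint explicit in one place.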
> str _ UnicodeString fromString: 'Some UTF-8 Encoded String' encoding: UTF8StringEncoding.
In GNU Smalltalk, UTF8StringEncoding is written simply as the string 'UTF-8'.
Paolo
* auto-adding address@hidden/smalltalk--devo--2.2--patch-152 to greedy revision library /Users/bonzinip/Archives/revlib
* found immediate ancestor revision in library (address@hidden/smalltalk--devo--2.2--patch-151)
* patching for this revision (address@hidden/smalltalk--devo--2.2--patch-152)
--- orig/i18n/Sets.st
+++ mod/i18n/Sets.st
@@ -1289,21 +1289,21 @@ encoding
default locale's default charset"
| encoding |
- (self size >= 4 and: [ (self at: 1) = 0 and: [ (self at: 2) = 0 and: [
- (self at: 3) = 254 and: [
- (self at: 4) = 255 ]]]]) ifTrue: [ ^'UTF-32BE' ].
- (self size >= 4 and: [ (self at: 4) = 0 and: [ (self at: 3) = 0 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
+ (self size >= 4 and: [ (self valueAt: 1) = 0 and: [ (self valueAt: 2) = 0 and: [
+ (self valueAt: 3) = 254 and: [
+ (self valueAt: 4) = 255 ]]]]) ifTrue: [ ^'UTF-32BE' ].
+ (self size >= 4 and: [ (self valueAt: 4) = 0 and: [ (self valueAt: 3) = 0 and: [
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
(self size >= 2 and: [
- (self at: 1) = 254 and: [
- (self at: 2) = 255 ]]) ifTrue: [ ^'UTF-16BE' ].
+ (self valueAt: 1) = 254 and: [
+ (self valueAt: 2) = 255 ]]) ifTrue: [ ^'UTF-16BE' ].
(self size >= 2 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
- (self size >= 3 and: [ (self at: 1) = 16rEF and: [
- (self at: 2) = 16rBB and: [
- (self at: 3) = 16rBF ]]]) ifTrue: [ ^'UTF-8' ].
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
+ (self size >= 3 and: [ (self valueAt: 1) = 16rEF and: [
+ (self valueAt: 2) = 16rBB and: [
+ (self valueAt: 3) = 16rBF ]]]) ifTrue: [ ^'UTF-8' ].
encoding := self class defaultEncoding.
encoding asString = 'UTF-16' ifTrue: [ ^self utf16Encoding ].
@@ -1314,9 +1314,9 @@ utf32Encoding
"Assuming the receiver is encoded as UTF-16 with a proper
endianness marker, answer the correct encoding of the receiver."
- (self size >= 4 and: [ (self at: 4) = 0 and: [ (self at: 3) = 0 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
+ (self size >= 4 and: [ (self valueAt: 4) = 0 and: [ (self valueAt: 3) = 0 and: [
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]]]) ifTrue: [ ^'UTF-32LE' ].
^'UTF-32BE'
!
@@ -1325,8 +1325,8 @@ utf16Encoding
endianness marker, answer the correct encoding of the receiver."
(self size >= 2 and: [
- (self at: 2) = 254 and: [
- (self at: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
+ (self valueAt: 2) = 254 and: [
+ (self valueAt: 1) = 255 ]]) ifTrue: [ ^'UTF-16LE' ].
^'UTF-16BE'
! !
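The whole patch swaps #at: for #valueAt:. Presumably the reason is that,
once the bytes have been turned into a String (as in the asString example
above), #at: answers a Character, and a Character is never equal to a
SmallInteger such as 254, so the old byte-order-mark tests could only
succeed on a ByteArray. A quick check, assuming GNU Smalltalk 3.x syntax
(the variable name `str` is just for illustration):

```smalltalk
"Why comparing String elements against integers needs #valueAt: (sketch, GNU Smalltalk 3.x assumed)."
| str |
str := #[254 255 200 4] asString.
((str at: 1) = 254) printNl.        "false: #at: answers the Character whose value is 254"
((str valueAt: 1) = 254) printNl.   "true: #valueAt: answers the SmallInteger 254"
((str at: 1) value = 254) printNl.  "true: Character>>#value gives the same integer"
```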