Re: [Help-smalltalk] [Q] Bug in EncodedStream?
From: Paolo Bonzini
Subject: Re: [Help-smalltalk] [Q] Bug in EncodedStream?
Date: Mon, 16 Oct 2006 10:21:29 +0200
User-agent: Thunderbird 1.5.0.7 (Macintosh/20060909)
Sungjin Chun wrote:
> Hi,
>
> When I run the following:
>
> (I18N.EncodedStream encoding: (UnicodeString fromString: '전성진'))
> contents !
>
> gst emits endless messages related to garbage collection, then crashes
> with segmentation faults.

Yes, it is a stupid bug. When using the system's iconv function, gst has
to split the UnicodeCharacters back into 8-bit Characters, and this is
where it gets stuck in an infinite loop. The first character, for
example, is $<16rC804>, and its "C8" byte is created as a
UnicodeCharacter rather than a Character. This causes another
I18N.EncodedStream to be created recursively.
The attached patch fixes the bug; thanks for reporting it.
In my testing I had only used Eastern European characters, whose bytes
are all below 0x80, so the bug never showed up.
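The following sketch shows why the two kinds of text behave differently. It is a hypothetical helper (`bytesOf` is not part of the I18N package), written assuming GNU Smalltalk 3.x syntax: it splits a code point into four bytes, most significant first, builds each byte with `Character value:` as the patch does, and flags every byte of 16r80 or above. Judging from the explanation above, those are the bytes the unpatched code created as UnicodeCharacters; $<16rC804> has one (16rC8), while the Czech letter ř (16r0159) has none.

```smalltalk
"Hypothetical helper, not part of the I18N package: answer the four bytes
 of a code point, most significant first, as byte Characters built with
 Character value: (the call the patch switches to)."
| bytesOf |
bytesOf := [:codePoint |
    (3 to: 0 by: -1) collect: [:i |
        Character value: ((codePoint bitShift: i * -8) bitAnd: 255)]].

"Compare the Korean character from the report with a Czech one."
#(16rC804 16r0159) do: [:codePoint |
    Transcript show: (codePoint printString: 16); show: ' ->'.
    (bytesOf value: codePoint) do: [:byte |
        Transcript show: ' '; show: (byte value printString: 16).
        "Flag bytes of 16r80 and above, the range that triggered the bug."
        byte value >= 16r80 ifTrue: [Transcript show: '(!)']].
    Transcript cr]
```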
> Also, is there a simple example of processing a UTF-8 encoded string?
Can you expand?
Paolo
--- orig/i18n/Sets.st
+++ mod/i18n/Sets.st
@@ -718,13 +718,13 @@ next
been extracted."
wch := answer := self nextInput codePoint.
wch := (wch bitShift: -8) + 16r1000000.
- ^(answer bitAnd: 255) asCharacter
+ ^Character value: (answer bitAnd: 255)
].
"Answer any other byte"
answer := wch bitAnd: 255.
wch := wch bitShift: -8.
- ^answer asCharacter
+ ^Character value: answer
!
flush
@@ -754,7 +754,7 @@ next
wch := answer := self nextInput codePoint.
wch := wch bitAnd: 16rFFFFFF.
count := 3.
- ^(answer bitShift: -24) asCharacter
+ ^Character value: (answer bitShift: -24)
].
"Answer any other byte. We keep things so that the byte we answer
@@ -763,7 +763,7 @@ next
wch := wch bitAnd: 16rFFFF.
wch := wch bitShift: 8.
count := count - 1.
- ^answer asCharacter
+ ^Character value: answer
!
flush
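The hunks above show only part of each `next` method. As a hedged illustration of the technique the first hunk appears to use, the sketch below (hypothetical names `input`, `nextByte` and `out`; GNU Smalltalk 3.x syntax assumed) answers the bytes of each code point least significant first. After the first byte, `wch` keeps the three pending bytes below a marker bit added with `+ 16r1000000`; each later call shifts one byte out, and once only the marker is left (`wch = 1`) the next code point is read. That end-of-code-point test is an assumption, since the real condition lies outside the visible context. Every byte is built with `Character value:`, as in the patched lines.

```smalltalk
"Hypothetical sketch of the marker-bit technique in the first hunk:
 answer the bytes of each code point, least significant first."
| input wch nextByte out |
input := ReadStream on: #(16rC804 16r0159).
wch := 1.    "marker only: no bytes pending, so the next call reads input"
nextByte := [ | answer |
    wch = 1
        ifTrue: [
            "Start a new code point: answer its low byte and keep the
             other three bytes below the marker bit."
            answer := input next.
            wch := (answer bitShift: -8) + 16r1000000.
            Character value: (answer bitAnd: 255)]
        ifFalse: [
            "Answer the next pending byte and shift it out."
            answer := wch bitAnd: 255.
            wch := wch bitShift: -8.
            Character value: answer]].

out := OrderedCollection new.
8 timesRepeat: [out add: nextByte value].
out do: [:byte | Transcript show: (byte value printString: 16); show: ' '].
Transcript cr
```

With this input, the eight calls answer the bytes 16r04 16rC8 0 0 16r59 16r01 0 0. The marker bit lets a single integer record both the pending bytes and how many of them remain, without a separate counter; the big-endian hunks use an explicit `count` instead.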