From: Paolo Bonzini
Subject: Re: [Help-smalltalk] Iliad: problem with UTF-8 in text: what the heck?????
Date: Sat, 08 Aug 2009 10:16:28 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1b3pre) Gecko/20090513 Fedora/3.0-2.3.beta2.fc11 Lightning/1.0pre Thunderbird/3.0b2
> hmm... just now Paolo's reply popped up. Might be an opportunity to ask what gst input should be encoded as? Are strings "just byte arrays" or do we have encoders and a canonical internal representation?
Strings should match whatever the LC_* environment variables say. If you manually use EncodedStream and methods such as #asString:/#asUnicodeString: you can use strings in whatever encoding you want.
> Anyways, the à is a dead giveaway, as it's the ISO-8859-1 representation of one of the multibyte markers in UTF-8. So it could be two things:
> - your browser uses ISO-8859 encoding when it should be using UTF-8
> - your input was UTF-8 encoded but got parsed as ISO-8859
Indeed.

Paolo
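
For illustration, the mix-up described above can be reproduced outside gst. This is only a minimal sketch in Python, not gst code, and the helper names `misread_as_latin1` and `repair` are made up for the example. It encodes text as UTF-8, decodes the bytes as ISO-8859-1 to show where the stray accented letters come from, and then reverses the mistake.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled):
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


# A two-byte UTF-8 sequence starts with a lead byte in the range 0xC2..0xDF.
# For U+00E8 the lead byte is 0xC3, which ISO-8859-1 displays as U+00C3.
two_byte = misread_as_latin1("\u00e8")

# A three-byte UTF-8 sequence starts with a lead byte in the range 0xE0..0xEF.
# For U+0E01 the lead byte is 0xE0, which ISO-8859-1 displays as U+00E0.
three_byte = misread_as_latin1("\u0e01")

print(repr(two_byte), repr(three_byte))
print(repair(two_byte) == "\u00e8", repair(three_byte) == "\u0e01")
```

The repair step only works while the misread bytes are still intact. Once the garbled text has been stored or re-encoded again, the original bytes may no longer be recoverable this way.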