
Re: [Help-smalltalk] Iliad: problem with UTF-8 in text: what the heck??


From: Bèrto ëd Sèra
Subject: Re: [Help-smalltalk] Iliad: problem with UTF-8 in text: what the heck?????
Date: Sat, 8 Aug 2009 21:25:00 +0300

Hi!

Headers seem okay:
Hypertext Transfer Protocol
    GET /ambaradan HTTP/1.1\r\n
    Host: localhost:8080\r\n
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko/2009080620 Gentoo Firefox/3.0.11\r\n
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
    Accept-Language: en-gb,en;q=0.8,ru;q=0.7,it;q=0.5,fr;q=0.3,es;q=0.2\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: UTF-8,*\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    Cookie: _iliad685744=viedbqwkee8t8plm00q451yqm1s7ae-u\r\n
    Cache-Control: max-age=0\r\n
    \r\n

It looks like there's something weird on my box... so I tried the manual version

((I18N.EncodedStream new nextPutAll: ' Feòrag NicBhrìde') asString)

This won't work because #nextPut: is subclassResponsibility...

((I18N.EncodedString fromString: ' Feòrag NicBhrìde') asUnicodeString)

will return ' Feòrag NicBhrìde'

ANYWAY....

Just copying and pasting the correct text from BLOX will yield the very
same result. So if I copy the example from the Transcript and paste it
here I get:
((I18N.EncodedString fromString: ' Feòrag NicBhrìde') asUnicodeString)

Which leads me to think that I might have got something wrong with
encoding at compilation time. Or that I should not use experimental
locales on my development box (which is much more likely to be the
true case). I'll make a check on Fedora, where locales are absolutely
standard.
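
For reference, the symptom discussed in the quoted message below (UTF-8 bytes being read as ISO-8859-1) can be reproduced outside gst. This is only a minimal sketch in Python, not gst code, and it assumes the problem really is a UTF-8/ISO-8859-1 mix-up; the helper name misread_utf8_as_latin1 is made up for illustration.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those same bytes as ISO-8859-1.

    This mimics a program that receives UTF-8 input but assumes a
    single-byte Latin-1 encoding (hypothetical helper, for illustration).
    """
    return text.encode("utf-8").decode("iso-8859-1")


original = "Feòrag NicBhrìde"
garbled = misread_utf8_as_latin1(original)

# Each accented letter is two bytes in UTF-8 (lead byte 0xC3), so it
# turns into two Latin-1 characters, the first of which is U+00C3.
print(ascii(garbled))  # 'Fe\xc3\xb2rag NicBhr\xc3\xacde'

# As long as no bytes were dropped, the damage can be undone by
# reversing the two steps.
restored = garbled.encode("iso-8859-1").decode("utf-8")
print(restored == original)  # True
```

If the text you see matches the garbled form, the bytes themselves are probably fine and only the decoding step is wrong, which would fit the locale theory above.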

Berto



2009/8/8 Paolo Bonzini <address@hidden>:
>
>> hmm... just now Paolo's reply popped up. Might be an opportunity
>> to ask what gst input should be encoded as? Are strings "just byte
>> arrays" or do we have encoders and a canonical internal representation?
>
> Strings should match whatever the LC_* environment variables say.  If you
> manually use EncodedStream and methods such as #asString:/#asUnicodeString:
> you can use strings in whatever encoding you want.
>
>> Anyways, the Ã is a dead giveaway, as it's the ISO-8859-1 representation
>> of one of the multibyte markers in UTF-8. So it could be two things:
>> - your browser uses ISO-8859 encoding when it should be using UTF-8
>> - your input was UTF-8 encoded but got parsed as ISO-8859
>
> Indeed.
>
> Paolo



-- 
==============================
Constitution du 24 juin 1793 - Article 35. - Quand le gouvernement
viole les droits du peuple, l'insurrection est, pour le peuple et pour
chaque portion du peuple, le plus sacré des droits et le plus
indispensable des devoirs.



