mit-scheme-devel

[MIT-Scheme-devel] UTF-8 sequences


From: Matt Birkholz
Subject: [MIT-Scheme-devel] UTF-8 sequences
Date: Wed, 18 Feb 2015 14:52:48 -0700

> From: <address@hidden>
> Date: Wed, 18 Feb 2015 10:55:29 +0100
> 
> Hello fellow Schemers!
> 
> I've run into a curious problem. I'm working with UTF-8 files. Generally
> things work very well, however (on a UTF-8 terminal):
> 
> 1 ]=> "ä"
> 
> ;Value 13: "ä"
> 
> 1 ]=> "ß"
> 
> ;Value 14: "Ã\237"

Did you first do this?

    (port/set-coding console-input-port 'UTF-8)

I did that on Ubuntu 14.04 in a Gnome Terminal where I had Set Character
Encoding to Unicode (UTF-8), and the eszett was handled correctly: it was
converted to a Latin-1 (narrow) string of length 1, "\337".
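
To make the byte arithmetic concrete, here is a minimal sketch in Python
(not MIT Scheme, and not the runtime's own decoder) of what a correctly
configured UTF-8 input port would have to do with the eszett. The variable
names are illustrative only.

```python
# The eszett (U+00DF) as a UTF-8 terminal sends it: two bytes.
utf8_bytes = "\u00df".encode("utf-8")

# Octal form of each byte, matching the "\303\237" notation in the thread.
octal_bytes = ["%03o" % b for b in utf8_bytes]

# Decoding those two bytes as UTF-8 yields a single character...
decoded = utf8_bytes.decode("utf-8")

# ...whose code point fits in one Latin-1 byte, octal 337.
latin1_octal = ["%03o" % b for b in decoded.encode("latin-1")]

print(octal_bytes, len(decoded), latin1_octal)
```

The two input bytes 303 237 collapse into one character whose Latin-1
code is 337, which is the narrow string of length 1 described above.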

> 1 ]=> "\303\244"
> 
> ;Value 15: "ä"

The "\303\244" format is interpreted as Latin-1.  With the output
port coding set to utf-8, it is written by Scheme and displayed by
Gnome Terminal as "Ã¤" (correctly, for Latin-1).
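
As a rough illustration (again in Python, under the assumption that the
string literal holds exactly the two bytes 0303 and 0244), the same pair
of bytes reads differently depending on which encoding is assumed, and a
UTF-8 output port has to re-encode each Latin-1 character separately:

```python
# The two bytes named by the escapes \303 and \244.
raw = bytes([0o303, 0o244])

# Read as Latin-1: two characters, U+00C3 (A with tilde) and U+00A4 (currency sign).
as_latin1 = raw.decode("latin-1")

# Read as UTF-8: one character, U+00E4 (a with diaeresis).
as_utf8 = raw.decode("utf-8")

# Writing the two-character Latin-1 string through a UTF-8 encoder
# produces four bytes, two per character.
written = as_latin1.encode("utf-8")

print(len(as_latin1), len(as_utf8), written.hex())
```

So the string is two characters long from Scheme's point of view, even
though a terminal that receives the raw bytes unchanged would draw them
as a single glyph.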

> 1 ]=> "\303\237"
> 
> ;Value 16: "Ã\237"
> 
> Why does ä (\303\244) work fine, but ß (\303\237) not? 

Good question!  Nothing is really working "fine" until the terminal and
Scheme agree on the character encoding, but if 0244 was not
slashified, why was 0237?  Smells like a bug.

> I've also noticed that DISPLAY works fine, while WRITE does not.

The display procedure SEEMS to work because it does not "slashify"
non-graphical, non-Latin-1 characters.
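
The difference between the two procedures can be sketched as follows.
This is a hypothetical model in Python, not the code in unpars.scm: the
helpers is_graphic, write_form and display_form are invented for
illustration, and the rule "escape anything in a Unicode control
category" is an assumption about what counts as non-graphic.  Under that
assumption, 0244 (the currency sign) is graphic while 0237 (a C1 control
code) is not, which would be one possible reason the two were treated
differently.

```python
import unicodedata

def is_graphic(ch):
    # Assumption: a character is graphic unless its Unicode general
    # category starts with "C" (control, format, unassigned, ...).
    return not unicodedata.category(ch).startswith("C")

def write_form(s):
    # Hypothetical WRITE: quote the string and "slashify" quotes,
    # backslashes, and non-graphic characters as octal escapes.
    out = []
    for ch in s:
        if ch in ('"', "\\"):
            out.append("\\" + ch)
        elif is_graphic(ch):
            out.append(ch)
        else:
            out.append("\\%03o" % ord(ch))
    return '"' + "".join(out) + '"'

def display_form(s):
    # Hypothetical DISPLAY: pass every character through untouched.
    return s

a_umlaut_bytes = bytes([0o303, 0o244]).decode("latin-1")
eszett_bytes = bytes([0o303, 0o237]).decode("latin-1")

print(write_form(a_umlaut_bytes))
print(write_form(eszett_bytes))
```

In this model display_form never escapes anything, so the raw bytes reach
the terminal and happen to form valid UTF-8, while write_form exposes the
control code as \237.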

> Does anyone have any idea how to fix this or where I should look in the
> source?

src/runtime/unpars.scm:412:(define (unparse/string string)


