[MIT-Scheme-devel] UTF-8 sequences
From: Matt Birkholz
Subject: [MIT-Scheme-devel] UTF-8 sequences
Date: Wed, 18 Feb 2015 14:52:48 -0700
> From: <address@hidden>
> Date: Wed, 18 Feb 2015 10:55:29 +0100
>
> Hello fellow Schemers!
>
> I've run into a curious problem. I'm working with UTF-8 files. Generally
> things work very well, however (on a UTF-8 terminal):
>
> 1 ]=> "ä"
>
> ;Value 13: "ä"
>
> 1 ]=> "ß"
>
> ;Value 14: "Ã\237"
Did you first do this?
(port/set-coding console-input-port 'UTF-8)
I did that in Gnome Terminal on Ubuntu 14.04, with Set Character
Encoding set to Unicode (UTF-8), and the eszett was handled correctly:
it was converted to a Latin-1 (narrow) string of length 1, "\337".
> 1 ]=> "\303\244"
>
> ;Value 15: "ä"
The "\303\244" syntax is interpreted as Latin-1, that is, as the two
characters 0303 and 0244. With the output port coding set to utf-8,
Scheme writes them and Gnome Terminal displays them as "ä"
(correctly, for Latin-1).
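
To make the byte arithmetic concrete, here is a minimal sketch in Python (not Scheme, and not anything taken from the MIT Scheme sources). The helper names `utf8_octal` and `misread_as_latin1` are made up for illustration. It only shows which octal escapes the UTF-8 encodings of a-umlaut and eszett correspond to, and which Latin-1 code points result when each byte is read as its own character.

```python
def utf8_octal(text: str) -> str:
    """Return the UTF-8 bytes of `text` as backslash-octal escapes."""
    return "".join("\\%03o" % b for b in text.encode("utf-8"))


def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode every byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


a_umlaut = "\u00e4"  # a-umlaut
eszett = "\u00df"    # eszett

print(utf8_octal(a_umlaut))  # \303\244
print(utf8_octal(eszett))    # \303\237

# Code points seen when each UTF-8 byte is taken as a Latin-1 character.
print([hex(ord(c)) for c in misread_as_latin1(a_umlaut)])  # ['0xc3', '0xa4']
print([hex(ord(c)) for c in misread_as_latin1(eszett)])    # ['0xc3', '0x9f']
```

Both characters share the lead byte 0303; they differ only in the second byte, 0244 versus 0237.
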
> 1 ]=> "\303\237"
>
> ;Value 16: "Ã\237"
>
> Why does ä (\303\244) work fine, but ß (\303\237) not?
Good question! Not that anything is working "fine" until terminal and
Scheme agree on the character encoding, but if 0244 was not
slashified, why was 0237? Smells like a bug.
> I've also noticed that DISPLAY works fine, while WRITE does not.
The display procedure SEEMS to work because it does not "slashify"
non-graphical, non-Latin-1 characters.
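
The difference between the two procedures can be sketched as follows, again in Python and only as an illustration. The rule in `is_latin1_graphic` (printable ASCII plus the 0xA0-0xFF block) is an assumption, not something checked against unpars.scm, and `write_style` and `display_style` are hypothetical stand-ins for WRITE and DISPLAY. Under that assumed rule, 0244 passes through while 0237, a C1 control code in Latin-1, gets escaped. Whether the real unparser applies this rule is exactly what would need checking in the source.

```python
def is_latin1_graphic(code: int) -> bool:
    """Assumed rule: printable ASCII or the upper Latin-1 block 0xA0-0xFF."""
    return 0x20 <= code <= 0x7E or 0xA0 <= code <= 0xFF


def write_style(s: str) -> str:
    """Quote the string and escape non-graphic characters in octal."""
    out = []
    for ch in s:
        code = ord(ch)
        if ch in '"\\':
            out.append("\\" + ch)            # escape quote and backslash
        elif is_latin1_graphic(code):
            out.append(ch)                   # graphic: emitted unchanged
        else:
            out.append("\\%03o" % code)      # non-graphic: "slashified"
    return '"' + "".join(out) + '"'


def display_style(s: str) -> str:
    """Emit the characters unchanged, with no quoting or escaping."""
    return s


# 0xC3 0xA4: both characters are graphic, so nothing is escaped.
plain = write_style("\u00c3\u00a4")
# 0xC3 0x9F: the second character is a control code, so it becomes \237.
escaped = write_style("\u00c3\u009f")
```

With a display-style writer the raw 0237 character would reach the terminal unescaped, which may be why the output merely looks right.
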
> Does anyone have any idea how to fix this or where I should look in the
> source?
src/runtime/unpars.scm:412:(define (unparse/string string)