[MIT-Scheme-devel] UTF-8 sequences

mit-scheme-devel

From:	Matt Birkholz
Subject:	[MIT-Scheme-devel] UTF-8 sequences
Date:	Wed, 18 Feb 2015 14:52:48 -0700

> From: <address@hidden>
> Date: Wed, 18 Feb 2015 10:55:29 +0100
> 
> Hello fellow Schemers!
> 
> I've run into a curious problem. I'm working with UTF-8 files. Generally
> things work very well, however (on a UTF-8 terminal):
> 
> 1 ]=> "ä"
> 
> ;Value 13: "ä"
> 
> 1 ]=> "ß"
> 
> ;Value 14: "Ã\237"

Did you first do this?

    (port/set-coding console-input-port 'UTF-8)

I did that in Gnome Terminal on Ubuntu 14.04, with Set Character
Encoding set to Unicode (UTF-8), and the eszett was handled correctly:
it was converted to a Latin-1 (narrow) string of length 1, "\337".
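For reference, here are the byte values involved. This is sketched in Python purely for illustration (it is not MIT Scheme, and octal_escapes is a hypothetical helper written for this example). Eszett is the single Latin-1 code point \337, but in UTF-8 it is the two bytes \303\237.

```python
# Compare the Latin-1 and UTF-8 encodings of eszett (U+00DF),
# showing each byte as an octal escape like the ones in the REPL output.
eszett = "\u00df"

def octal_escapes(data: bytes) -> str:
    """Render bytes as backslash-octal escapes, one per byte."""
    return "".join(f"\\{b:03o}" for b in data)

latin1 = eszett.encode("latin-1")
utf8 = eszett.encode("utf-8")

print(len(latin1), octal_escapes(latin1))  # 1 \337
print(len(utf8), octal_escapes(utf8))      # 2 \303\237
```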

> 1 ]=> "\303\244"
> 
> ;Value 15: "ä"

The "\303\244" escape syntax denotes two Latin-1 characters, not one
UTF-8 byte sequence.  With the output port coding set to UTF-8, Scheme
writes them and Gnome Terminal displays them as "Ã¤" (correctly, for
Latin-1).
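A Python sketch of the same point (illustrative only; the variable names are made up). The two escapes name the Latin-1 characters U+00C3 and U+00A4, which a UTF-8 port sends as four bytes, whereas the same two bytes decoded as UTF-8 would give a single "ä".

```python
import unicodedata

# "\303\244" names two characters, each an octal Latin-1 code point.
chars = "".join(chr(code) for code in (0o303, 0o244))
print([unicodedata.name(c) for c in chars])
# ['LATIN CAPITAL LETTER A WITH TILDE', 'CURRENCY SIGN']

# A port whose coding is UTF-8 encodes each character separately,
# so the terminal receives four bytes and shows two glyphs.
wire = chars.encode("utf-8")
print(wire.hex(" "))  # c3 83 c2 a4

# Only when the two raw bytes are decoded as UTF-8 do they form one character.
print(unicodedata.name(bytes([0o303, 0o244]).decode("utf-8")))
# LATIN SMALL LETTER A WITH DIAERESIS
```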

> 1 ]=> "\303\237"
> 
> ;Value 16: "Ã\237"
> 
> Why does ä (\303\244) work fine, but ß (\303\237) not? 

Good question!  Not that anything is working "fine" until terminal and
Scheme agree on the character encoding, but if 0244 was not
slashified, why was 0237?  Smells like a bug.
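A quick check of the two trailing bytes may help narrow it down. Read as Latin-1, 0244 (hex A4) is the currency sign, a graphic character, while 0237 (hex 9F) falls in the C1 control range 0x80-0x9F. If the unparser escapes non-graphic characters (an assumption here, not verified against unpars.scm), that alone would account for the difference. Python's unicodedata module, used only as a convenient reference, shows the categories:

```python
import unicodedata

# The second byte of each UTF-8 sequence, read as a Latin-1 character.
categories = {}
for code in (0o244, 0o237):
    categories[code] = unicodedata.category(chr(code))
    print(f"\\{code:03o}", hex(code), categories[code])
# \244 0xa4 Sc   (currency sign: a graphic symbol)
# \237 0x9f Cc   (C1 control character: not graphic)
```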

> I've also noticed that DISPLAY works fine, while WRITE does not.

The display procedure only SEEMS to work: unlike write, it does not
"slashify" non-graphical, non-Latin-1 characters.
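To illustrate the WRITE/DISPLAY difference, here is a hypothetical Python model. It is not MIT Scheme's unparser: is_graphic, write_string and display_string are names invented for this sketch, and the "graphic" test is an assumption. It also assumes the port passes Latin-1 characters through as single bytes, which is why DISPLAY can appear to work: the UTF-8 terminal reassembles the two bytes into "ß".

```python
import unicodedata

def is_graphic(ch: str) -> bool:
    # Assumed test: Unicode "C*" categories (controls, format,
    # unassigned, ...) count as non-graphic.
    return not unicodedata.category(ch).startswith("C")

def write_string(s: str) -> str:
    """WRITE-like: quote the string and octal-escape non-graphic chars."""
    out = []
    for ch in s:
        if ch in '"\\':
            out.append("\\" + ch)
        elif is_graphic(ch):
            out.append(ch)
        else:
            out.append(f"\\{ord(ch):03o}")
    return '"' + "".join(out) + '"'

def display_string(s: str) -> str:
    """DISPLAY-like: emit the characters unchanged."""
    return s

# The UTF-8 bytes of "ß", mis-read as two Latin-1 characters.
misread = bytes([0o303, 0o237]).decode("latin-1")

print(write_string(misread))  # "Ã\237", as in the transcript

# Through a Latin-1 port, DISPLAY sends the raw bytes C3 9F, and a
# UTF-8 terminal decodes them back into a single eszett.
sent = display_string(misread).encode("latin-1")
print(unicodedata.name(sent.decode("utf-8")))  # LATIN SMALL LETTER SHARP S
```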

> Does anyone have any idea how to fix this or where I should look in the
> source?

src/runtime/unpars.scm:412:(define (unparse/string string)
