bug#11197: problems with string ports and unicode
From: Ludovic Courtès
Subject: bug#11197: problems with string ports and unicode
Date: Wed, 11 Apr 2012 23:01:16 +0200
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.93 (gnu/linux)
Hi Mark,
Mark H Weaver <address@hidden> wrote:
> Okay, now I understand. The problem is that internally, string ports
> are implemented by converting the string into a stream of bytes in the
> string port's encoding, and then the string port reads those bytes.
Exactly.
[...]
> Conceptually, a string port is a textual port, not a binary port.
But not in Guile, where there’s no distinction between textual and
binary ports. One can write code like:
scheme@(guile-user)> (use-modules (rnrs io ports))
scheme@(guile-user)> (define (string->utf16 s)
                       (let ((p (with-fluids ((%default-port-encoding "UTF-16BE"))
                                  (open-input-string s))))
                         (get-bytevector-all p)))
scheme@(guile-user)> (string->utf16 "hello")
$4 = #vu8(0 104 0 101 0 108 0 108 0 111)
scheme@(guile-user)> (use-modules (rnrs bytevectors))
scheme@(guile-user)> (utf16->string $4)
$5 = "hello"
> You should be able to hand it an arbitrary string and read those
> characters from it, as described in SRFI-6, without setting
> Guile-specific fluid variables. Similarly, you should be able to
> write arbitrary characters to a string-output-port.
The SRFI-6 issue could be addressed with:
diff --git a/module/srfi/srfi-6.scm b/module/srfi/srfi-6.scm
index 098b586..ba946ec 100644
--- a/module/srfi/srfi-6.scm
+++ b/module/srfi/srfi-6.scm
@@ -1,6 +1,6 @@
 ;;; srfi-6.scm --- Basic String Ports
-;; Copyright (C) 2001, 2002, 2003, 2006 Free Software Foundation, Inc.
+;; Copyright (C) 2001, 2002, 2003, 2006, 2012 Free Software Foundation, Inc.
 ;;
 ;; This library is free software; you can redistribute it and/or
 ;; modify it under the terms of the GNU Lesser General Public
@@ -23,10 +23,16 @@
 ;;; Code:
 (define-module (srfi srfi-6)
-  #:re-export (open-input-string open-output-string get-output-string))
+  #:export (open-input-string open-output-string)
+  #:re-export (get-output-string))
-;; Currently, guile provides these functions by default, so no action
-;; is needed, and this file is just a placeholder.
+(define (open-input-string s)
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    ((@ (guile) open-input-string) s)))
+
+(define (open-output-string)
+  (with-fluids ((%default-port-encoding "UTF-8"))
+    ((@ (guile) open-output-string))))
 (cond-expand-provide (current-module) '(srfi-6))
It wouldn’t completely solve the problem.
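For reference, here is a minimal sketch of the kind of portable SRFI-6 code at stake. The procedure name ‘string->char-list’ is made up for this example; it uses only ‘open-input-string’, ‘read-char’ and ‘eof-object?’, and sets no Guile-specific fluid:

```scheme
(use-modules (srfi srfi-6))

;; Hypothetical example procedure: read every character of S back
;; through a string input port, using nothing beyond SRFI-6 and
;; standard Scheme.
(define (string->char-list s)
  (let ((port (open-input-string s)))
    (let loop ((c (read-char port))
               (acc '()))
      (if (eof-object? c)
          (reverse acc)
          (loop (read-char port) (cons c acc))))))
```

With ASCII input this works whatever the port encoding is; the case under discussion is a string containing characters that the default port encoding cannot represent.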
> IMO, string ports should use UTF-8 as their initial port encoding, since
> we know that UTF-8 can represent any Guile string. This will allow
> portable use of string ports.
The change was submitted and briefly discussed at
<http://thread.gmane.org/gmane.lisp.guile.devel/9822>.
I think the rationale was mostly backward compatibility (in 1.8 people
could mix Latin-1 textual and binary I/O), consistency with how other
ports behave, and the ability to change the default encoding of string
ports.
> I realize that this would change the existing behavior of programs that
> use binary I/O on string ports, but as things stand right now, portable
> SRFI-6 code is broken on Guile.
>
> What do you think?
In hindsight, UTF-8 does seem like a better default than the locale port
encoding (which is what %default-port-encoding is, by default), but it
does remain useful to specify a different encoding.
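As a rough sketch of what specifying the encoding per port can look like, assuming the behaviour discussed in this thread (a string port picks up ‘%default-port-encoding’ when it is created). The names ‘open-input-string/encoding’, ‘first-char’ and ‘string->bytes’ are hypothetical, not part of Guile or SRFI-6:

```scheme
(use-modules ((rnrs io ports) #:select (get-bytevector-all)))

;; Hypothetical helper: open a string input port while ENCODING is the
;; default port encoding, so the new port is created with it.
(define (open-input-string/encoding s encoding)
  (with-fluids ((%default-port-encoding encoding))
    (open-input-string s)))

;; Textual use: UTF-8 can represent any Guile string, so reading
;; characters back does not depend on the locale.
(define (first-char s)
  (read-char (open-input-string/encoding s "UTF-8")))

;; Binary use: read the port's underlying bytes.  Which bytes come out
;; for non-ASCII text depends on whether the Guile version in use
;; honours ENCODING for string ports (an assumption here).
(define (string->bytes s encoding)
  (get-bytevector-all (open-input-string/encoding s encoding)))
```

The helper only wraps the ‘with-fluids’ idiom shown earlier; it is not a proposal for a new interface.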
>>> What _is_ needed is a file coding declaration near the top of the source
>>> file, e.g. "coding: utf-8" (see "Character Encoding of Source Files" in
>>> the manual).
>>
>> Yes. And you actually need both, i.e., the ‘coding’ cookie won’t
>> magically make string ports use that encoding.
>>
>>> I tried that and it still fails for me.
>>
>> What fails exactly?
>
> It fails ungracefully (goes into an infinite loop while trying to print the
> backtrace) without the %default-port-encoding setting.
Indeed, it’s stuck in a deadlock:
--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007ffff75e1204 in __lll_lock_wait ()
    from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#1  0x00007ffff75dc4d4 in _L_lock_999 ()
    from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#2  0x00007ffff75dc2ea in pthread_mutex_lock ()
    from /nix/store/vxycd107wjbhcj720hzkw2px7s7kr724-glibc-2.12.2/lib/libpthread.so.0
#3  0x00007ffff7b30499 in scm_dynwind_pthread_mutex_lock (mutex=0x7ffff7dd28c0)
    at threads.c:1962
#4  0x00007ffff7b2bb0e in scm_mkstrport (pos=0x2, str=0x4, modes=327680,
    caller=<value optimized out>) at strports.c:287
#5  0x00007ffff7aac20b in display_backtrace_body (a=0x7fffffffc1a0)
    at backtrace.c:487
#6  0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f5d50,
    argv=0x6fa3b0, nargs=-1) at vm-i-system.c:895
#7  0x00007ffff7ac039e in scm_call_3 (proc=0x7f5d50, arg1=<value optimized out>,
    arg2=<value optimized out>, arg3=<value optimized out>) at eval.c:500
#8  0x00007ffff7b32504 in scm_internal_catch (tag=<value optimized out>,
    body=<value optimized out>, body_data=<value optimized out>,
    handler=<value optimized out>, handler_data=<value optimized out>)
    at throw.c:222
#9  0x00007ffff7aabbba in scm_display_backtrace_with_highlights (
    stack=<value optimized out>, port=<value optimized out>,
    first=<value optimized out>, depth=<value optimized out>,
    highlights=<value optimized out>) at backtrace.c:558
#10 0x00007ffff7ab725e in print_exception_and_backtrace (error_port=0x6f6170,
    tag=0x66d4c0, args=0x8e6ea0) at continuations.c:490
#11 pre_unwind_handler (error_port=0x6f6170, tag=0x66d4c0, args=0x8e6ea0)
    at continuations.c:534
#12 0x00007ffff7b46c7b in vm_regular_engine (vm=0x6f61f0, program=0x7f3ce0,
    argv=0x6fa300, nargs=-1) at vm-i-system.c:895
#13 0x00007ffff7b4846e in scm_call_with_vm (vm=0x6f61f0, proc=0x7f3ce0,
    args=<value optimized out>) at vm.c:878
#14 0x00007ffff7b296db in scm_to_stringn (str=0x8dba80, lenp=0x7fffffffc4e8,
    encoding=<value optimized out>, handler=SCM_FAILED_CONVERSION_ERROR)
    at strings.c:2102
#15 0x00007ffff7b2bb73 in scm_mkstrport (pos=0x2, str=0x8dba80, modes=196608,
    caller=<value optimized out>) at strports.c:312
--8<---------------cut here---------------end--------------->8---
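The outer ‘scm_mkstrport’ call (frame #15) fails in ‘scm_to_stringn’
while it still holds a mutex; the error handler then tries to display a
backtrace, which calls ‘scm_mkstrport’ again (frame #4) and blocks on
that same mutex.

This could be fixed by calling ‘scm_new_port_table_entry’ after having
prepared the backing buffer, but the problem is that ‘pt->encoding’ is
needed to prepare that buffer in the first place, i.e., before the
entry exists.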
Thoughts?
Ludo’.