bug-libunistring

Re: [bug-libunistring] Changing the appearance of escapes


From: Bruno Haible
Subject: Re: [bug-libunistring] Changing the appearance of escapes
Date: Thu, 16 Sep 2010 03:23:34 +0200
User-agent: KMail/1.9.9

Hello Ludo,

> Guile stores strings internally either in ISO-8859-1 (if possible) or in
> UCS-4.  When doing I/O, strings are converted from/to the input/output
> encoding using, e.g., ‘u32_conv_to_encoding’.  I/O ports have a
> conversion-failure strategy, the equivalent of ‘ilseq_handler’.

Yes, I see <http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-9.html> has
two paragraphs describing "transcoders" with an 'error-handling-mode'.
(This is, btw, similar to what GNU clisp has:
<http://clisp.cons.org/impnotes/encoding.html#make-encoding>.)

> Currently, we end up hackily parsing the result of
> ‘u32_conv_to_encoding’, looking for escape sequences (assuming the
> result of the conversion is in an ASCII-compatible encoding), and
> rewriting them.  In the worst case, a 6-character libunistring ‘\uNNNN’
> escape is converted to a 7-character R6RS ‘\uNNNN;’ escape

This is actually incorrect, since the input string might itself have
contained the characters '\\', 'u', '1', '2', '3', '4'. Once a call to
u32_conv_to_encoding has returned, you cannot tell whether such a
sequence in the output came from the input or is an escape.
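To make the ambiguity concrete, here is a small C sketch. The function escape_to_ascii is a hypothetical stand-in, not libunistring code. It copies ASCII characters and writes every other code point as a "\uXXXX" escape, which only mimics the escape shape quoted above. Two different inputs, the six characters '\\', 'u', '1', '2', '3', '4' and the single character U+1234, then produce the same bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an escaping converter (NOT libunistring code):
   ASCII code points are copied, every other code point is written as a
   "\uXXXX"-style escape with at least four hex digits.  The result is
   always NUL-terminated; if OUTSIZE is too small it is truncated. */
static void escape_to_ascii(const uint32_t *s, size_t n, char *out, size_t outsize)
{
  size_t len = 0;
  assert(outsize > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    int k = s[i] < 0x80
              ? snprintf(out + len, outsize - len, "%c", (int) s[i])
              : snprintf(out + len, outsize - len, "\\u%04lX", (unsigned long) s[i]);
    if (k < 0 || (size_t) k >= outsize - len)
      break;                     /* no room left: stop here */
    len += (size_t) k;
  }
}
```

Calling it on {'\\', 'u', '1', '2', '3', '4'} and on {0x1234} yields the identical string "\u1234" both times, so a later pass over the output has nothing to tell the two cases apart.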

> libunistring/iconv have their own syntax for escapes, which is
> different from Guile’s historical syntax, and also different from that
> used in R6RS.  The problem is, we want Guile to emit escapes in either
> of these two formats, not that of libunistring/iconv.

I think you need custom code for that. The focus of libunistring is not
on super-duper-elaborate uses of iconv(), but on providing just enough
to make iconv()-based conversion comfortable in typical use.

GNU clisp also has custom code for that. This makes it possible to
produce good error messages and exceptions when invalid characters are
present. In Guile, you might also want to implement the
'error-handling-mode' in Scheme. It is also good for performance to
minimize the number of calls to iconv_open(), that is, to call
iconv_open() only once or twice per port rather than once for every
chunk to be converted.
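As an illustration of the last point, here is a minimal sketch, assuming a glibc-style iconv(): a hypothetical struct port opens its conversion descriptor once and reuses it for every chunk. The names struct port, port_open, port_write_chunk and port_close are made up for this example; only iconv_open(), iconv() and iconv_close() are real POSIX functions.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical, minimal port: the conversion descriptor is opened once
   when the port is created and reused for every chunk written to it. */
struct port {
  iconv_t cd;      /* from iconv_open(), valid for the port's lifetime */
  char buf[256];   /* converted output (a real port would flush it) */
  size_t len;      /* number of bytes used in buf */
};

/* Returns 0, or errno (EINVAL if the encoding pair is unsupported). */
static int port_open(struct port *p, const char *tocode, const char *fromcode)
{
  p->len = 0;
  p->cd = iconv_open(tocode, fromcode);
  return p->cd == (iconv_t) -1 ? errno : 0;
}

/* Convert one chunk and append it to buf.  No iconv_open() here: the
   descriptor, including any shift state, carries over between chunks.
   Returns 0, or errno from iconv(): EILSEQ for an unconvertible or invalid
   character, E2BIG if buf is full, EINVAL if the chunk ends in the middle
   of a multibyte character (this sketch does not carry such bytes over). */
static int port_write_chunk(struct port *p, const char *chunk, size_t n)
{
  char *in = (char *) chunk;   /* iconv() takes char ** but only reads input */
  char *out = p->buf + p->len;
  size_t outleft = sizeof p->buf - p->len;
  size_t r = iconv(p->cd, &in, &n, &out, &outleft);
  p->len = (size_t) (out - p->buf);
  return r == (size_t) -1 ? errno : 0;
}

/* Emit any final shift sequence, then release the descriptor. */
static void port_close(struct port *p)
{
  char *out = p->buf + p->len;
  size_t outleft = sizeof p->buf - p->len;
  iconv(p->cd, NULL, NULL, &out, &outleft);
  p->len = (size_t) (out - p->buf);
  assert(p->len <= sizeof p->buf);
  iconv_close(p->cd);
}
```

Reusing the descriptor also preserves shift state across chunks, which a stream in a stateful encoding needs. A real port would additionally have to keep a multibyte character that is split across two chunks, and pass EILSEQ on to whatever error-handling-mode is configured.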

> Guile could do the escaping on its own if ‘u32_conv_to_encoding’ could
> stop conversion upon failure and return (i) whatever was converted, and
> (ii) the offset of the conversion failure (like ‘iconv’ does.)

It sounds like none of the functions u32_conv_to_encoding, mem_iconveha,
mem_iconveh, mem_cd_iconveh are useful for you, and you really need to
go down to the level of iconv().
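A rough sketch of what such custom code at the iconv() level could look like. It assumes glibc's iconv(), which stops with EILSEQ and leaves *inbuf pointing at a character the target encoding cannot represent. The function convert_with_escapes and its helper reserve are hypothetical. The sketch serializes the UCS-4 string as UTF-32BE (so the input charset name does not depend on host byte order), converts as much as iconv() can, and at each failure writes an R6RS-style "\x<hex>;" escape itself before resuming. It also assumes an ASCII-compatible, stateless target encoding.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Ensure *BUF has room for NEED more bytes after LEN, plus a NUL. */
static int reserve(char **buf, size_t *cap, size_t len, size_t need)
{
  if (len + need + 1 <= *cap)
    return 0;
  size_t ncap = 2 * (len + need + 1);
  char *nbuf = realloc(*buf, ncap);
  if (nbuf == NULL)
    return -1;
  *buf = nbuf;
  *cap = ncap;
  return 0;
}

/* Hypothetical helper: convert the code points SRC[0..N) to TOCODE,
   replacing each one TOCODE cannot represent with "\x<hex>;".
   Returns a malloc'd NUL-terminated string, or NULL on error. */
static char *convert_with_escapes(const uint32_t *src, size_t n, const char *tocode)
{
  unsigned char *in = NULL;
  char *out = NULL, *inptr;
  size_t cap = 0, len = 0, inleft = 4 * n;
  iconv_t cd = iconv_open(tocode, "UTF-32BE");
  if (cd == (iconv_t) -1)
    return NULL;
  in = malloc(inleft + 1);                 /* +1 avoids malloc(0) */
  if (in == NULL || reserve(&out, &cap, 0, inleft) != 0)
    goto fail;
  for (size_t i = 0; i < n; i++) {         /* serialize as big-endian */
    in[4 * i] = (unsigned char) (src[i] >> 24);
    in[4 * i + 1] = (unsigned char) (src[i] >> 16);
    in[4 * i + 2] = (unsigned char) (src[i] >> 8);
    in[4 * i + 3] = (unsigned char) src[i];
  }
  inptr = (char *) in;
  while (inleft > 0) {
    char *outptr = out + len;
    size_t outleft = cap - len - 1;        /* keep one byte for the NUL */
    size_t r = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    len = (size_t) (outptr - out);
    if (r != (size_t) -1)
      break;                               /* all input converted */
    if (errno == E2BIG) {
      if (reserve(&out, &cap, len, cap) != 0)  /* grow, then retry */
        goto fail;
    } else if (errno == EILSEQ) {
      /* inptr points at the character that could not be converted. */
      const unsigned char *p = (const unsigned char *) inptr;
      uint32_t c = (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16
                   | (uint32_t) p[2] << 8 | p[3];
      char esc[16];
      int k = snprintf(esc, sizeof esc, "\\x%lX;", (unsigned long) c);
      if (reserve(&out, &cap, len, (size_t) k) != 0)
        goto fail;
      memcpy(out + len, esc, (size_t) k);
      len += (size_t) k;
      assert(inleft >= 4);
      inptr += 4;                          /* resume after that character */
      inleft -= 4;
    } else {
      goto fail;                           /* unexpected error, e.g. EINVAL */
    }
  }
  out[len] = '\0';
  free(in);
  iconv_close(cd);
  return out;
fail:
  free(in);
  free(out);
  iconv_close(cd);
  return NULL;
}
```

Because the escape is written at the exact point where iconv() fails, nothing has to be parsed back out of the converted output. Whether literal backslashes in the input must be escaped too depends on how the output is consumed, and is left out here. Note also that POSIX allows iconv() to perform an implementation-defined conversion of an unrepresentable character instead of failing (signalled only by a nonzero return value), which this sketch does not detect.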

Note that at this level there are a couple of portability problems.
Take a look at gnulib/lib/striconv.c and gnulib/lib/striconveh.c to see
how they are worked around.

Bruno


