bug-libunistring

Re: [bug-libunistring] Changing the appearance of escapes


From: Ludovic Courtès
Subject: Re: [bug-libunistring] Changing the appearance of escapes
Date: Thu, 16 Sep 2010 22:18:50 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Hi Bruno,

Bruno Haible <address@hidden> writes:

>> Guile stores strings internally either in ISO-8859-1 (if possible) or in
>> UCS-4.  When doing I/O, strings are converted from/to the input/output
>> encoding using, e.g., ‘u32_conv_to_encoding’.  I/O ports have a
>> conversion-failure strategy, the equivalent of ‘ilseq_handler’.
>
> Yes, I see <http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-9.html> has
> two paragraphs describing "transcoders" with an 'error-handling-mode'.
> (This is, btw, similar to what GNU clisp has:
> <http://clisp.cons.org/impnotes/encoding.html#make-encoding>.)

Yes.

>> Currently, we end up hackily parsing the result of
>> ‘u32_conv_to_encoding’, looking for escape sequences (assuming the
>> result of the conversion is in an ASCII-compatible encoding), and
>> rewriting them.  In the worst case, a 6-character libunistring ‘\uNNNN’
>> escape is converted to a 7-character R6RS ‘\uNNNN;’ escape.
>
> This is actually incorrect, since the input string might actually have
> contained the characters '\\', 'u', '1', '2', '3', '4'. Once
> a call to u32_conv_to_encoding is terminated, you don't know whether such
> a sequence in the output comes from the input or is an escape.

Heh, indeed.
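
To make the ambiguity concrete, here is a small sketch.  It uses Python's
built-in ‘backslashreplace’ error handler as a stand-in for the
libunistring/iconv escaping (an assumption for illustration only; it is not
the same code path as ‘u32_conv_to_encoding’).  A literal six-character
sequence and a real U+1234 character end up as the same bytes:

```python
# Stand-in for escape-on-failure conversion: Python's "backslashreplace"
# handler, NOT libunistring's u32_conv_to_encoding.
literal_text = "\\u1234"   # six characters: \ u 1 2 3 4
real_char = "\u1234"       # one character, U+1234, not representable in ASCII

from_literal = literal_text.encode("ascii", "backslashreplace")
from_escape = real_char.encode("ascii", "backslashreplace")

# Both inputs yield the same six bytes, so the output alone cannot tell
# an escape apart from input text that merely looks like one.
ambiguous = from_literal == from_escape
print(from_literal, from_escape, ambiguous)
```

Once the conversion has returned, nothing in the output distinguishes the
two cases, which is why rewriting escapes after the fact cannot be made
reliable.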

>> libunistring/iconv have their own syntax for escapes, which is
>> different from Guile’s historical syntax, and also different from that
>> used in R6RS.  The problem is, we want Guile to emit escapes in either
>> of these two formats, not that of libunistring/iconv.
>
> I think you need custom code for that. The focus of libunistring is not
> on super-duper-elaborate uses of iconv(), but just enough to make iconv()
> based conversion comfortable to use on average.
>
> GNU clisp also has custom code for that. This makes it possible to
> produce good error messages and exceptions when invalid characters are
> present. In guile you might also want to write an 'error-handling-mode'
> in Scheme.
> It is also useful for performance to minimize the number of calls to
> iconv_open(), that is, to use iconv_open() only once or twice per port
> and not once for every chunk to be converted.
>
>> Guile could do the escaping on its own if ‘u32_conv_to_encoding’ could
>> stop conversion upon failure and return (i) whatever was converted, and
>> (ii) the offset of the conversion failure (like ‘iconv’ does.)
>
> It sounds like none of the functions u32_conv_to_encoding, mem_iconveha,
> mem_iconveh, mem_cd_iconveh are useful for you, and you really need to
> go down to the level of iconv().

OK, makes sense.  I must admit I had completely overlooked this problem
until now.  :-(
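
A rough sketch of the "stop at the failure, report the offset, escape it
ourselves" control flow.  It relies on Python's codec errors rather than
iconv(), so it only illustrates the idea and is not a proposal for the
actual C code.  The names ‘encode_with_escapes’ and ‘r6rs_style_escape’ are
hypothetical, the escape format just follows the ‘\uNNNN;’ form mentioned
above, and the target encoding is assumed to be ASCII-compatible so that the
escape text itself can always be encoded:

```python
def r6rs_style_escape(code_point):
    # Hypothetical formatter following the '\uNNNN;' form discussed above.
    return "\\u%04X;" % code_point


def encode_with_escapes(text, encoding, make_escape):
    """Encode text, replacing unencodable characters via make_escape.

    Assumes the escape text is representable in the target encoding
    (i.e. the encoding is ASCII-compatible).
    """
    out = bytearray()
    pos = 0
    while pos < len(text):
        try:
            out += text[pos:].encode(encoding)
            break
        except UnicodeEncodeError as err:
            # err.start/err.end are offsets into text[pos:]; they play the
            # role of the failure offset that iconv() reports.
            bad_start = pos + err.start
            bad_end = pos + err.end
            out += text[pos:bad_start].encode(encoding)
            for ch in text[bad_start:bad_end]:
                out += make_escape(ord(ch)).encode(encoding)
            pos = bad_end
    return bytes(out)


converted = encode_with_escapes("a\u03bbb", "ascii", r6rs_style_escape)
print(converted)
```

Because the caller sees each failure at the point where it happens, it could
also escape literal backslashes in the input there, which the post-hoc
rewriting cannot do.  This sketch re-encodes the prefix after every failure
for simplicity; real code would keep a single conversion descriptor per
port, as suggested above.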

> Note that at this level, there are a couple of portability problems.
> Take a look at gnulib/lib/striconv.c and gnulib/lib/striconveh.c.

Thanks for the tips!

Now to actually design and implement something along these lines...

Ludo’.


