Re: [bug-libunistring] Changing the appearance of escapes
From: Ludovic Courtès
Subject: Re: [bug-libunistring] Changing the appearance of escapes
Date: Fri, 17 Sep 2010 19:17:17 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
Hi Bruno,
Bruno Haible <address@hidden> writes:
> The way I recommend to do it is:
> - For ports with an input direction, store in the port an iconv_t descriptor
> from the given encoding to UTF-8. Similarly, for ports with an output
> direction, store in it an iconv_t descriptor from UTF-8 to the encoding.
> (Why UTF-8 and not UTF-32 = UCS-4? Because on all platforms you can
> convert from UTF-8 to anything and vice versa, but not from UTF-32
> from/to anything. Solaris for example.)
Hmm, OK. It’s actually not a problem to use UTF-8 instead of UCS-4 when
reading from an input port.
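
For illustration, the descriptor setup described in the quoted text could look roughly like the sketch below. The names `struct port_conv`, `port_conv_open`, `port_conv_close` and `NO_CD` are hypothetical and are not taken from Guile or libunistring; only `iconv_open` and `iconv_close` are real API. Note that `iconv_open` takes the target encoding first.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical per-port conversion state (not an existing Guile type). */
struct port_conv {
    iconv_t in_cd;   /* port encoding -> UTF-8, for reading */
    iconv_t out_cd;  /* UTF-8 -> port encoding, for writing */
};

#define NO_CD ((iconv_t) -1)

/* Open the descriptors a port needs. Returns 0 on success, -1 if the
   platform's iconv cannot convert between `encoding` and UTF-8. */
static int port_conv_open(struct port_conv *pc, const char *encoding,
                          int readable, int writable)
{
    pc->in_cd = NO_CD;
    pc->out_cd = NO_CD;

    if (readable) {
        pc->in_cd = iconv_open("UTF-8", encoding);   /* (to, from) */
        if (pc->in_cd == NO_CD)
            return -1;
    }
    if (writable) {
        pc->out_cd = iconv_open(encoding, "UTF-8");
        if (pc->out_cd == NO_CD) {
            if (pc->in_cd != NO_CD) {
                iconv_close(pc->in_cd);
                pc->in_cd = NO_CD;
            }
            return -1;
        }
    }
    return 0;
}

/* Release whichever descriptors were opened. */
static void port_conv_close(struct port_conv *pc)
{
    if (pc->in_cd != NO_CD)
        iconv_close(pc->in_cd);
    if (pc->out_cd != NO_CD)
        iconv_close(pc->out_cd);
    pc->in_cd = pc->out_cd = NO_CD;
}
```

Keeping one descriptor per direction means a bidirectional port never shares conversion state between reads and writes.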
> - In the input direction you'll also need a small buffer (up to 6 bytes or
> so) for bytes that have already been read from the stream but not yet
> converted to characters. Near this, you'll also have a character or bit
> that is used to implement the CRLF -> LF conversion.
> - The most tricky thing is to handle all possible errors and return values
> from iconv() correctly.
> - In the output direction, an iconv_t can produce a couple of bytes at the
> end, that you need to output before closing the stream. This is needed for
> stateful encodings such as CP1258, UTF-7, or UTF-16 (with BOM). But only
> if you want to support stateful encodings at all. All encodings used by
> locales are stateless.
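
A minimal sketch of the input buffer, the `iconv()` return-value handling, and the final flush from the list above, assuming a port that is fed one byte at a time. `struct in_state`, `in_feed` and `out_flush` are hypothetical names; the 8-byte buffer size is an assumption based on the "up to 6 bytes or so" estimate, and CRLF -> LF handling is left out.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input-side state: bytes read but not yet converted. */
struct in_state {
    iconv_t cd;        /* port encoding -> UTF-8 */
    char pending[8];   /* bytes of an incomplete sequence */
    size_t npending;
};

/* Feed one raw byte. Returns the number of UTF-8 bytes written to `out`
   (0 if more input is needed), or -1 on an invalid sequence. */
static int in_feed(struct in_state *st, unsigned char byte, char out[8])
{
    if (st->npending == sizeof st->pending) {
        st->npending = 0;          /* buffer overrun: treat as invalid */
        return -1;
    }
    st->pending[st->npending++] = (char) byte;

    char *src = st->pending;
    size_t srcleft = st->npending;
    char *dst = out;
    size_t dstleft = 8;

    size_t r = iconv(st->cd, &src, &srcleft, &dst, &dstleft);
    if (r == (size_t) -1 && errno != EINVAL) {
        /* EILSEQ (invalid input) or E2BIG (no room): drop the bytes. */
        st->npending = 0;
        return -1;
    }
    /* Success, or EINVAL (incomplete sequence): keep unconsumed bytes. */
    memmove(st->pending, src, srcleft);
    st->npending = srcleft;
    return (int) (dst - out);
}

/* Output side: before closing the stream, ask iconv for any closing
   shift sequence. Returns the bytes written to `out`, or -1 on error.
   For stateless encodings this writes nothing. */
static int out_flush(iconv_t cd, char *out, size_t outsize)
{
    char *dst = out;
    size_t dstleft = outsize;

    if (iconv(cd, NULL, NULL, &dst, &dstleft) == (size_t) -1)
        return -1;
    return (int) (dst - out);
}
```

The three `errno` cases are the core of the error handling: `EINVAL` means "wait for more bytes", `EILSEQ` means the input is invalid in the port encoding, and `E2BIG` means the output buffer is too small. How an invalid sequence is reported or replaced is a policy choice that this sketch does not make.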
OK.
Thanks a lot for the valuable advice!
Ludo’.