bug-libunistring

Re: [bug-libunistring] Changing the appearance of escapes


From: Ludovic Courtès
Subject: Re: [bug-libunistring] Changing the appearance of escapes
Date: Fri, 17 Sep 2010 19:17:17 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

Hi Bruno,

Bruno Haible <address@hidden> writes:

> The way I recommend to do it is:
>   - For ports with an input direction, store in the port an iconv_t descriptor
>     from the given encoding to UTF-8. Similarly, for ports with an output
>     direction, store in it an iconv_t descriptor from UTF-8 to the encoding.
>     (Why UTF-8 and not UTF-32 = UCS-4? Because on all platforms you can
>     convert from UTF-8 to anything and vice versa, but not from UTF-32
>     from/to anything. Solaris for example.)

Hmm, OK.  It’s actually not a problem to use UTF-8 instead of UCS-4 when
reading from an input port.
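
For illustration, the per-port descriptors described in the quoted text could
be set up roughly as follows. This is only a sketch: `struct port_conv`,
`port_conv_open` and `port_conv_close` are invented names, not part of Guile
or libunistring; only `iconv_open()` and `iconv_close()` are real calls.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical per-port conversion state; names are illustrative only. */
struct port_conv {
    iconv_t in_cd;   /* port encoding -> UTF-8, or (iconv_t)-1 if unused */
    iconv_t out_cd;  /* UTF-8 -> port encoding, or (iconv_t)-1 if unused */
};

/* Close whichever descriptors are open and mark them as unused. */
static void port_conv_close(struct port_conv *pc)
{
    if (pc->in_cd != (iconv_t)-1) {
        iconv_close(pc->in_cd);
        pc->in_cd = (iconv_t)-1;
    }
    if (pc->out_cd != (iconv_t)-1) {
        iconv_close(pc->out_cd);
        pc->out_cd = (iconv_t)-1;
    }
}

/* Open the descriptors a port needs, depending on its directions.
   Returns 0 on success, -1 if iconv_open() fails (for example because
   the encoding name is unknown on this platform). */
static int port_conv_open(struct port_conv *pc, const char *encoding,
                          int for_input, int for_output)
{
    assert(pc != NULL && encoding != NULL);
    pc->in_cd = (iconv_t)-1;
    pc->out_cd = (iconv_t)-1;

    if (for_input) {
        /* Note the argument order: iconv_open(tocode, fromcode). */
        pc->in_cd = iconv_open("UTF-8", encoding);
        if (pc->in_cd == (iconv_t)-1)
            return -1;
    }
    if (for_output) {
        pc->out_cd = iconv_open(encoding, "UTF-8");
        if (pc->out_cd == (iconv_t)-1) {
            port_conv_close(pc);  /* release the input descriptor, if any */
            return -1;
        }
    }
    return 0;
}
```

A bidirectional port would call `port_conv_open(&pc, "ISO-8859-1", 1, 1)` and
later `port_conv_close(&pc)`. Keeping two separate descriptors means each
direction carries its own conversion state.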

>   - In the input direction you'll also need a small buffer (up to 6 bytes or
>     so) for bytes that have already been read from the stream but not yet
>     converted to characters. Near this, you'll also have a character or bit
>     that is used to implement the CRLF -> LF conversion.
>   - The most tricky thing is to handle all possible errors and return values
>     from iconv() correctly.
>   - In the output direction, an iconv_t can produce a couple of bytes at the
>     end, that you need to output before closing the stream. This is needed for
>     stateful encodings such as CP1258, UTF-7, or UTF-16 (with BOM). But only
>     if you want to support stateful encodings at all. All encodings used by
>     locales are stateless.

OK.
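
To make the input buffer, the iconv() error cases and the final flush from
the quoted list concrete, here is a minimal sketch. `struct in_state`,
`feed_byte` and `flush_output` are hypothetical names, the 8-byte buffer
size is an assumption, and the CRLF handling is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input-side state: bytes read but not yet converted. */
struct in_state {
    iconv_t cd;        /* port encoding -> UTF-8 */
    char pending[8];   /* assumed large enough for one partial sequence */
    size_t npending;   /* number of valid bytes in pending */
};

/* Feed one byte read from the stream. Returns the number of UTF-8 bytes
   written to out (0 when more input is needed), -1 on an invalid or
   overlong sequence, -2 on any other failure such as a too-small out. */
static int feed_byte(struct in_state *st, char byte, char *out, size_t outsize)
{
    assert(st->npending < sizeof st->pending);
    st->pending[st->npending++] = byte;

    char *in = st->pending;
    size_t inleft = st->npending;
    char *outp = out;
    size_t outleft = outsize;

    size_t r = iconv(st->cd, &in, &inleft, &outp, &outleft);
    int err = (r == (size_t)-1) ? errno : 0;

    /* Keep whatever iconv() did not consume at the front of the buffer. */
    memmove(st->pending, in, inleft);
    st->npending = inleft;

    if (err == EINVAL && st->npending < sizeof st->pending)
        err = 0;  /* incomplete sequence: wait for the next byte */

    if (err != 0) {
        /* EILSEQ, a full buffer that is still incomplete, or E2BIG:
           drop the buffered bytes so the caller can resynchronize. */
        st->npending = 0;
        return (err == EILSEQ || err == EINVAL) ? -1 : -2;
    }
    return (int)(outsize - outleft);
}

/* Before closing an output stream, ask iconv() for any bytes needed to
   return to the initial shift state. Stateless encodings produce none.
   Returns the number of bytes written to out, or -1 on failure. */
static int flush_output(iconv_t cd, char *out, size_t outsize)
{
    char *outp = out;
    size_t outleft = outsize;

    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return -1;
    return (int)(outsize - outleft);
}
```

The key distinction is between EINVAL (the sequence is merely incomplete, so
keep the bytes and read more) and EILSEQ (the bytes can never form a valid
character, so the port has to apply whatever error policy it has).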

Thanks a lot for the valuable advice!

Ludo’.


