Re: [PATCH, resend] Handle multibyte codepoint width properly
From: Vladimir 'φ-coder/phcoder' Serbinenko
Subject: Re: [PATCH, resend] Handle multibyte codepoint width properly
Date: Thu, 05 Apr 2012 21:48:03 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120329 Icedove/10.0.3

On 05.04.2012 21:18, Bruno Haible wrote:
> Hi Vladimir,
> mbsnwidth returns -1 in such a case only if the option MBSW_REJECT_INVALID
> is passed as third argument. If you pass 0, mbsnwidth will not return -1;
> instead, it will assume width 1 for every invalid byte or unprintable
> character.
Ok, will use mbsnwidth instead then.
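
For readers without the gnulib source at hand, the behaviour Bruno describes can be sketched with standard C only. This is not gnulib's mbsnwidth: the name sketch_width and the flag SKETCH_REJECT_INVALID are hypothetical stand-ins, and the handling of a truncated sequence at the end of the buffer (one column per byte) is an assumption of this sketch.

```c
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700      /* needed for wcwidth() on some systems */
#endif
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical flag, modelled on the MBSW_REJECT_INVALID option above.  */
#define SKETCH_REJECT_INVALID 1

/* Display width of the N bytes at S in the current locale.
   With SKETCH_REJECT_INVALID, return -1 on an invalid sequence.
   Otherwise count width 1 for every invalid byte or unprintable
   character.  */
int
sketch_width (const char *s, size_t n, int flags)
{
  mbstate_t state;
  int width = 0;

  assert (s != NULL);
  memset (&state, 0, sizeof state);
  while (n > 0)
    {
      wchar_t wc;
      size_t len = mbrtowc (&wc, s, n, &state);
      int w;

      if (len == (size_t) -1 || len == (size_t) -2)
        {
          /* Invalid or truncated multibyte sequence.  */
          if (flags & SKETCH_REJECT_INVALID)
            return -1;
          width += 1;           /* assume one column for this byte */
          s += 1;
          n -= 1;
          memset (&state, 0, sizeof state);
          continue;
        }
      if (len == 0)
        len = 1;                /* embedded NUL byte */
      w = wcwidth (wc);
      width += (w >= 0 ? w : 1);        /* unprintable: assume width 1 */
      s += len;
      n -= len;
    }
  return width;
}
```

With flags set to 0 the function never returns -1, so a caller that only needs a column count for alignment does not need an error path.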
>>> - The function __argp_get_display_len looks very similar to mbsnwidth(),
>> Remaining is the issue due to escape sequences.
> What is the use case? PO file editors are not required to support editing
> of strings with control characters. msgfmt warns when a message in a PO file
> contains an unusual control character like ESC.
Unfortunately it does not warn often enough (see my post on bug-gettext,
which has not been answered yet). In particular, it accepts the file
cpio/ko.po from the TP without any warning even though it contains many \e
sequences. Some other control characters, like \b, go unreported as well.
(In reality the file uses an unsupported ISO-2022 variant, an encoding that
relies on many escapes, and not EUC-KR as it claims.)
>> it is used in address@hidden
> Ah, right. But I don't know how frequently it is used; maybe I and Simon
> were the only persons to ever use this? If we want to support this, not
> only mbswidth has to be modified, but basically any code that uses
> wcwidth - including libunistring. So, until this is discussed (and possibly
> generalized to more languages than 'en'), I propose to get away without
> it.
Ok. In the long term I see only two possible ways: deprecate address@hidden
or fix all those places. I don't mind if boldquot gets deprecated.
>> Done but the test is valid only for UTF-8 locales. Should I force some
>> specific locale? It's impossible to make a test working in all locales
>> since in case of e.g. ASCII we don't have such characters at all.
> In such a situation, it is best to split the test into two parts: a part
> that can be executed on every machine, and a part which can only be executed
> on a system with a UTF-8 locale. This way, the first part is not skipped
> just because the system has no UTF-8 locale.
Ok, will do. Can I include all the "normal" tests in the UTF-8 test for
simplicity?
> Please take a look how it's done in module 'mbsstr-tests':
> - test-mbsstr1.c is a test that doesn't need a particular locale.
> - test-mbsstr2.c is a test that requires a UTF-8 locale. We use the
> French one for simplicity. (If a system does not have fr_FR.UTF-8
> installed, it would be unlikely that it has ru_RU.UTF-8 installed.)
> - test-mbsstr2.sh is a wrapper script that uses the LOCALE_FR_UTF8
> value, determined by m4/locale-fr.m4, and invokes test-mbsstr2.
Ok.
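
The split Bruno suggests might look roughly like the sketch below. It is only an illustration, not the mbsstr-tests code: the function names and SKETCH_SKIP are hypothetical, the skip status 77 is the Automake convention for a skipped test, and calling setlocale directly stands in for the LOCALE_FR_UTF8 value that the wrapper script would normally pass.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Exit status that Automake's test driver treats as "skipped".  */
#define SKETCH_SKIP 77

/* Part 1: checks that hold in every locale (ASCII input only).
   Returns 0 on success, 1 on failure.  */
int
sketch_portable_part (void)
{
  wchar_t buf[8];

  return mbstowcs (buf, "abc", 8) == 3 ? 0 : 1;
}

/* Part 2: checks that need a UTF-8 locale.  LOCALE_NAME is assumed to
   name a UTF-8 locale, e.g. "fr_FR.UTF-8".  Returns SKETCH_SKIP if the
   locale is not installed, 0 on success, 1 on failure.  */
int
sketch_utf8_part (const char *locale_name)
{
  wchar_t buf[8];
  int ok;

  assert (locale_name != NULL);
  if (setlocale (LC_ALL, locale_name) == NULL)
    return SKETCH_SKIP;

  /* "\xc3\xa9" is U+00E9 in UTF-8: two bytes, one character.  */
  ok = mbstowcs (buf, "\xc3\xa9", 8) == 1;

  setlocale (LC_ALL, "C");      /* restore the default locale */
  return ok ? 0 : 1;
}
```

Keeping the two parts in separate functions (or, as in gnulib, separate programs) means the portable checks still run on machines where the UTF-8 part is skipped.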
> + if (wc == '\e' && ptr + 3 < end
> + && ptr[1] == '[' && (ptr[2] == '0' || ptr[2] == '1')
> + && ptr[3] == 'm')
> '\e' is not portable, only GCC supports it. Use '\x1b' or '\033' instead.
>
> Also, the test ptr + 3 < end is wrong. Should be written as
> end - ptr > 3
> instead. (Think of ptr = 0xFFFFFFFD, end = 0xFFFFFFFE on a 32-bit machine.)
> Sure, on many systems this won't matter, because this memory range is
> either unmapped or occupied by the stack. But in general you have no guarantee
> that the memory page from 0xFFFFC000..0xFFFFFFFF will not be used for
> malloc().
I have already been bitten by this once on sparc64 with GRUB :(
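
A minimal sketch of the check with both of Bruno's corrections applied is shown below. The helper name sketch_is_sgr is hypothetical, and it works on plain bytes rather than on the decoded wide character wc that the patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the bytes at PTR are ESC [ 0 m or ESC [ 1 m and all four
   bytes lie before END, else 0.  */
int
sketch_is_sgr (const char *ptr, const char *end)
{
  assert (ptr != NULL && ptr <= end);

  /* "end - ptr > 3" cannot wrap around, unlike "ptr + 3 < end", because
     both pointers refer to the same buffer.  '\033' is the portable
     spelling of ESC; '\e' is a compiler extension.  */
  return end - ptr > 3
         && ptr[0] == '\033'
         && ptr[1] == '['
         && (ptr[2] == '0' || ptr[2] == '1')
         && ptr[3] == 'm';
}
```

The length comparison comes first so that ptr[1] through ptr[3] are only read once it is known that four bytes are available.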
> Bruno
>
>
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko