bug-coreutils


bug#26779: expr length


From: Pádraig Brady
Subject: bug#26779: expr length
Date: Tue, 9 May 2017 05:31:03 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 09/05/17 00:04, Assaf Gordon wrote:
> Hello,
> 
>> On May 4, 2017, at 11:43, Pádraig Brady <address@hidden> wrote:
>>
>> On 04/05/17 02:59, Андрей Воронов wrote:
>>> I have found a bug in the expr utility when it calculates the
>>> length of a string in my multi-byte encoding, ru_RU.UTF-8.
>>
>> expr is listed in the plan here:
>> http://www.pixelbeat.org/docs/coreutils_i18n/
> 
> Attached a draft patch implementing multibyte support for 'expr'
> (it doesn't need any code from my previous multibyte stuff, so I'm sending it 
> separately).
> 
> Specifically, the length/index/substr operators are adjusted.
> The regex engine for the 'match' operator already supported multibyte 
> characters (only minor adjustment needed to return matched character count 
> instead of matched byte count).
> The string comparison already used 'strcoll' so I assumed they work with 
> multibyte strings.
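
To illustrate the byte-versus-character distinction behind this report,
here is a minimal sketch. It assumes the input is valid UTF-8 and does not
consult the locale at all; utf8_char_count is a hypothetical name for
illustration, not a function from the patch.

```c
#include <stddef.h>

/* Hypothetical helper: count characters in a valid UTF-8 string by
   counting every byte that is not a continuation byte (10xxxxxx).  */
static size_t
utf8_char_count (const char *s)
{
  size_t n = 0;
  for (; *s; s++)
    if (((unsigned char) *s & 0xC0) != 0x80)
      n++;   /* lead byte or ASCII byte: starts a new character */
  return n;
}
```

For a six-letter Cyrillic word encoded in UTF-8, strlen() reports 12 bytes
while this function reports 6 characters.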

Definitely needs a NEWS entry
and mention of this bug in the commit message.

Perf isn't a huge concern for expr use cases,
so I'd rather not address it in this patch, if at all.
For future reference, though, it might be nice to tell
mbschr() etc. whether the current locale is UTF-8,
maybe via some global similar to MB_CUR_MAX.  Then mbschr()
could be optimized in the UTF-8 case as per the pseudocode at:
http://www.pixelbeat.org/docs/utf8_programming.html
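
A rough sketch of that idea follows. The function find_char and its
locale_is_utf8 parameter are hypothetical stand-ins, not gnulib's mbschr()
or any existing global. It relies on the fact that in UTF-8 a byte below
0x80 never appears inside a multibyte sequence, so a plain byte search is
safe when looking for an ASCII character.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not gnulib's mbschr): find the single-byte
   character C in the multibyte string S.  LOCALE_IS_UTF8 is an assumed
   flag that the caller would compute once per run.  */
static char *
find_char (const char *s, int c, bool locale_is_utf8)
{
  /* Fast path: in UTF-8 (or any single-byte locale) a byte search
     cannot match in the middle of a character.  */
  if (locale_is_utf8 || MB_CUR_MAX == 1)
    return strchr (s, c);

  /* Generic path: step through the string one character at a time.  */
  const char *end = s + strlen (s);
  mbstate_t st;
  memset (&st, 0, sizeof st);
  while (s < end)
    {
      size_t n = mbrlen (s, end - s, &st);
      if (n == 0 || n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: treat the byte as one
             character and reset the conversion state.  */
          n = 1;
          memset (&st, 0, sizeof st);
        }
      if (n == 1 && (unsigned char) *s == (unsigned char) c)
        return (char *) s;
      s += n;
    }
  /* Like strchr, searching for '\0' finds the terminator.  */
  return c == '\0' ? (char *) end : NULL;
}
```

The generic path is only needed for encodings where a trailing byte of a
multibyte character can collide with an ASCII value.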

s/sequnce/sequence/

I notice expr isn't handled in the Red Hat/SUSE i18n patch,
so there is nothing to consider there.

Excellent work on the tests.

I've updated the status at http://www.pixelbeat.org/docs/coreutils_i18n/
as I think this is ready to land.

thanks!
Pádraig




