Re: mbcel module for Gnulib?

From:	Bruno Haible
Subject:	Re: mbcel module for Gnulib?
Date:	Thu, 13 Jul 2023 17:19:16 +0200

Hi Paul,

> > Candidates for optimization:
> > 
> > - The C locale handling
> >    https://sourceware.org/bugzilla/show_bug.cgi?id=19932
> >    https://sourceware.org/bugzilla/show_bug.cgi?id=29511
> >    It's now a clear POSIX violation. Would it make sense to get this fixed
> >    in glibc, so that gnulib's override can be dropped on future glibc
> >    versions?
> 
> Absolutely. Does наб's patch in the latter bug report look good to you? 

I just saw that наб is already at patch v16 on libc-alpha. I have now
given some feedback on it.

Since this will be fixed in glibc in the near future, I will hold off
on optimizing the invocation of hard_locale.

> Although mbiter is faster on ASCII than on non-ASCII, it's not 
> well-optimized compared to mbcel. On Fedora x86-64 here is the kernel of 
> an mbcel-based loop that merely scans ASCII and adds each byte's 
> numerical value to a sum:
> 
>   .L28: addq    %rax, %rbp
>         movl    $1, %eax
>         addq    %rax, %rbx
>         cmpq    %r12, %rbx
>         jnb     .L19
>         movsbq  (%rbx), %rax
>         testb   %al, %al
>         jns     .L28
> 
> where %rbp = sum, %rbx = pointer to next byte, and %r12 = pointer just 
> past end of input.
> 
> In contrast, with mbiter the kernel is:
> 
>   .L24: movq    136(%rsp), %rax
>         movq    112(%rsp), %r14
>         movq    $1, 144(%rsp)
>         movsbl  (%rbx), %edx
>         movb    $1, 152(%rsp)
>         leaq    1(%rax), %rbx
>         movl    %edx, 156(%rsp)
>         movq    %rbx, 136(%rsp)
>         addq    %rdx, %r15
>         movb    $0, 128(%rsp)
>         cmpq    %r14, %rbx
>         jnb     .L5
>   .L14: movzbl  (%rbx), %ecx
>         movl    %ecx, %eax
>         shrb    $5, %al
>         andl    $7, %eax
>         movl    is_basic_table(,%rax,4), %eax
>         shrl    %cl, %eax
>         testb   $1, %al
>         jne     .L24
> 
> where %r15 = sum, %rbx = pointer to next byte, %r14 = pointer just past 
> end of input.

Excellent result! This means that mbiter can and should get the following
optimizations:

  - Optimize away is_basic_table; a simpler range check for [0x00..0x7F]
    like in mbcel will speed this up.

  - The
      movb      $0, 128(%rsp)
    line should already be gone thanks to my patch "Optimize away the
    in_shift field" from yesterday.

  - There are 6 other instructions that read from or write to the struct
    on the stack. It seems that gcc does not optimize this as well as the
    struct-as-return-value situation. I'll benchmark this again...
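
To make the first point concrete, here is a rough sketch of the two tests
side by side. Every name in it (demo_table, fast_by_table, fast_by_range,
sum_ascii_run) is made up for illustration. The table is only shaped like
is_basic_table, as far as the indexing in the assembly above shows, and its
contents are an assumption: it marks exactly 0x00..0x7F, which may differ
from what the real table marks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmap in the style of is_basic_table: bit (c & 31) of
   word (c >> 5) says whether byte c takes the fast path.  For
   illustration only, it marks exactly 0x00..0x7F; the real table's
   contents may differ.  */
static const uint32_t demo_table[8] =
  { 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0, 0, 0, 0 };

/* Table lookup: index computation, a memory load, a shift and a mask.  */
static int
fast_by_table (unsigned char c)
{
  return (demo_table[c >> 5] >> (c & 31)) & 1;
}

/* Range check: a single comparison, no memory access.  */
static int
fast_by_range (unsigned char c)
{
  return c < 0x80;
}

/* Sum the leading run of ASCII bytes in [p, lim) and store in *stop the
   position where the run ended.  Non-ASCII bytes are left to the caller,
   which would handle them on the slower multibyte path.  */
static unsigned long
sum_ascii_run (const char *p, const char *lim, const char **stop)
{
  unsigned long sum = 0;
  while (p < lim && fast_by_range ((unsigned char) *p))
    sum += (unsigned char) *p++;
  *stop = p;
  return sum;
}
```

With this particular table the two predicates agree on all 256 byte values,
so the only difference is the cost of the test itself.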

> > - Resetting an mbstate_t: Should we define a function
> >       void mbszero (mbstate_t *);
> >    that clears the relevant part of an mbstate_t (i.e. 24 bytes instead
> >    of 128 bytes on BSD systems)?
> >    Advantage: performance.
> >    Drawback: Yet another gnulib-invented, nonstandard API.
> 
> It's likely worth it for mbcel on BSDish hosts. Quite possibly it's also 
> worth it for mbiter and mbuiter. Not sure it's worth it everywhere.

Good, thanks for your opinion. I'll add an 'mbszero' function, then, and
document it as recommended for use in loops.
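
As a rough sketch of the idea, not the eventual gnulib code: the names
demo_mbszero, DEMO_MBSTATE_ZERO_SIZE and demo_count_chars are hypothetical.
The size macro defaults to the whole object, which is always safe; a
platform-specific port could set it to the smaller number of bytes that
actually carry state.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Assumption: number of leading bytes of mbstate_t that carry state.
   Defaults to the whole object; a port could define a smaller,
   platform-specific value.  */
#ifndef DEMO_MBSTATE_ZERO_SIZE
# define DEMO_MBSTATE_ZERO_SIZE sizeof (mbstate_t)
#endif

/* Hypothetical stand-in for the proposed mbszero: put *ps into the
   initial conversion state.  */
static void
demo_mbszero (mbstate_t *ps)
{
  memset (ps, 0, DEMO_MBSTATE_ZERO_SIZE);
}

/* Count the characters in the N bytes starting at S, using mbrtowc in
   the current locale.  An invalid or incomplete sequence is counted as
   one byte, and the state is reset afterwards, since it cannot be
   relied upon after an encoding error.  */
static size_t
demo_count_chars (const char *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  demo_mbszero (&state);
  while (n > 0)
    {
      size_t len = mbrtowc (NULL, s, n, &state);
      if (len == (size_t) -1 || len == (size_t) -2)
        {
          len = 1;
          demo_mbszero (&state);
        }
      else if (len == 0)
        len = 1;                /* a null character occupies one byte */
      s += len;
      n -= len;
      count++;
    }
  return count;
}
```

The reset sits inside the loop, which is why its cost matters: clearing
fewer bytes per reset adds up over a long input.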
 
> Here's a summary of the results I got on Fedora 38 x86-64 on an AMD 
> Phenom II X4 910e processor dated 2010.
> 
>   user CPU sec   speedup
>   mbiter  mbcel  factor  test
>    1.735  0.478  3.630   a - ASCII text, C locale
>    1.703  0.447  3.810   b - ASCII text, UTF-8 locale
>    3.852  1.514  2.544   c - French text, C locale
>    3.544  1.600  2.215   d - French text, ISO-8859-1 locale
>    3.651  1.662  2.197   e - French text, UTF-8 locale
>   26.787 15.115  1.772   f - Greek text, C locale
>   21.651 17.106  1.266   g - Greek text, ISO-8859-7 locale
>   22.565 17.633  1.280   h - Greek text, UTF-8 locale
>   10.011  8.051  1.243   i - Chinese text, UTF-8 locale
>    9.787  7.967  1.228   j - Chinese text, GB18030 locale
> 
> With a better CPU (a Xeon W-1350 dated 2021) and a slightly-slower OS 
> (Ubuntu 23.04) I got these numbers:
> 
>   user CPU sec   speedup
>   mbiter  mbcel  factor  test
>    0.531  0.238  2.231   a - ASCII text, C locale
>    0.478  0.187  2.556   b - ASCII text, UTF-8 locale
>    1.262  0.510  2.475   c - French text, C locale
>    1.121  0.529  2.119   d - French text, ISO-8859-1 locale
>    1.080  0.571  1.891   e - French text, UTF-8 locale
>   10.349  5.876  1.761   f - Greek text, C locale
>    8.530  6.537  1.305   g - Greek text, ISO-8859-7 locale
>    8.407  6.506  1.292   h - Greek text, UTF-8 locale
>    3.427  2.578  1.329   i - Chinese text, UTF-8 locale
>    3.279  2.489  1.317   j - Chinese text, GB18030 locale

Impressive! I'll repeat these benchmarks once I have optimized mbiter
a bit more.

Bruno
