Re: mbcel module for Gnulib?
From: Bruno Haible
Subject: Re: mbcel module for Gnulib?
Date: Thu, 13 Jul 2023 17:19:16 +0200

Hi Paul,
> > Candidates for optimization:
> >
> > - The C locale handling
> > https://sourceware.org/bugzilla/show_bug.cgi?id=19932
> > https://sourceware.org/bugzilla/show_bug.cgi?id=29511
> > It's now a clear POSIX violation. Would it make sense to get this fixed
> > in glibc, so that gnulib's override can be dropped on future glibc
> > versions?
>
> Absolutely. Does наб's patch in the latter bug report look good to you?
Just saw that наб is already at patch v16, on libc-alpha. I gave a bit of
feedback about it now.
Since this will be fixed in glibc in the near future, I will hold off
from optimizing the invocation of hard_locale.
> Although mbiter is faster on ASCII than on non-ASCII, it's not
> well-optimized compared to mbcel. On Fedora x86-64 here is the kernel of
> an mbcel-based loop that merely scans ASCII and adds each byte's
> numerical value to a sum:
>
> .L28:   addq    %rax, %rbp
>         movl    $1, %eax
>         addq    %rax, %rbx
>         cmpq    %r12, %rbx
>         jnb     .L19
>         movsbq  (%rbx), %rax
>         testb   %al, %al
>         jns     .L28
>
> where %rbp = sum, %rbx = pointer to next byte, and %r12 = pointer just
> past end of input.
>
> In contrast, with mbiter the kernel is:
>
> .L24:   movq    136(%rsp), %rax
>         movq    112(%rsp), %r14
>         movq    $1, 144(%rsp)
>         movsbl  (%rbx), %edx
>         movb    $1, 152(%rsp)
>         leaq    1(%rax), %rbx
>         movl    %edx, 156(%rsp)
>         movq    %rbx, 136(%rsp)
>         addq    %rdx, %r15
>         movb    $0, 128(%rsp)
>         cmpq    %r14, %rbx
>         jnb     .L5
> .L14:   movzbl  (%rbx), %ecx
>         movl    %ecx, %eax
>         shrb    $5, %al
>         andl    $7, %eax
>         movl    is_basic_table(,%rax,4), %eax
>         shrl    %cl, %eax
>         testb   $1, %al
>         jne     .L24
>
> where %r15 = sum, %rbx = pointer to next byte, %r14 = pointer just past
> end of input.
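
For readers who prefer C to assembly, here is a rough sketch of the kind of loop the first listing could come from. It is not the mbcel code itself: the names demo_cel_t, demo_scan and demo_sum are hypothetical, and the sketch assumes an encoding in which every byte below 0x80 is a complete single-byte character (true of the ASCII, ISO-8859, UTF-8 and GB18030 cases benchmarked in this thread, but not of every encoding). The point is only that the fast path is one range check and that the result travels back as a small struct by value.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result type: one decoded character plus its byte length.
   It is small enough to be returned in registers on x86-64.  */
typedef struct
{
  wchar_t ch;         /* decoded character, meaningful when err == 0 */
  unsigned char err;  /* nonzero if the bytes at p did not decode */
  unsigned char len;  /* number of bytes consumed, always >= 1 */
} demo_cel_t;

/* Decode the character that starts at P.  Requires P < LIM.  */
static inline demo_cel_t
demo_scan (const char *p, const char *lim)
{
  unsigned char b = (unsigned char) *p;

  /* Fast path: a single range check, no table lookup, no mbstate_t.  */
  if (b < 0x80)
    return (demo_cel_t) { .ch = b, .err = 0, .len = 1 };

  /* Slow path: let the C library decode, starting from the initial
     conversion state.  */
  mbstate_t state;
  memset (&state, 0, sizeof state);
  wchar_t wc;
  size_t n = mbrtowc (&wc, p, (size_t) (lim - p), &state);
  if (n == (size_t) -1 || n == (size_t) -2)
    return (demo_cel_t) { .ch = 0, .err = 1, .len = 1 };
  if (n == 0)
    n = 1;  /* a decoded null character still occupies one byte */
  return (demo_cel_t) { .ch = wc, .err = 0, .len = (unsigned char) n };
}

/* Add up the character values in BUF[0..SIZE-1]; a byte that does not
   decode contributes its own numerical value.  */
unsigned long long
demo_sum (const char *buf, size_t size)
{
  unsigned long long sum = 0;
  const char *lim = buf + size;
  for (const char *p = buf; p < lim; )
    {
      demo_cel_t g = demo_scan (p, lim);
      sum += g.err ? (unsigned char) *p : (unsigned long long) g.ch;
      p += g.len;
    }
  return sum;
}
```

On pure ASCII input only the fast path runs, so the compiler is free to keep the sum, the pointer and the limit in registers, which is what the short listing shows.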
Excellent result!!! This means that mbiter can and should get the following
optimizations:
  - Optimize away is_basic_table; a simpler range check for [0x00..0x7F],
    like in mbcel, will speed this up.
  - The
      movb $0, 128(%rsp)
    line should already be gone thanks to my patch "Optimize away the
    in_shift field" from yesterday.
  - There are 6 other instructions that read from or write to the struct on
    the stack. It seems that gcc does not optimize this as well as the
    struct-as-return-value situation. I'll benchmark this again...
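
To make the first point concrete, here is a small sketch of the two tests side by side. The bitmap lookup mirrors the shape of the instructions at .L14 above (shift by 5, index a table of 32-bit words, shift by the low bits, test bit 0). The table demo_table is hypothetical: it marks every byte in 0x00..0x7F purely for illustration, and the contents of the real is_basic_table are not shown in this thread and may mark a different set of bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 256-bit bitmap, one bit per byte value.  Bits 0..127 are
   set; this is an assumption for the demo, not the real table.  */
static const uint32_t demo_table[8] =
  { 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0, 0, 0, 0 };

/* Table-based test: one load from memory plus two shifts and a mask.  */
static inline bool
demo_lookup (unsigned char uc)
{
  return (demo_table[uc >> 5] >> (uc & 31)) & 1;
}

/* Range-based test: a single comparison, no memory access.  */
static inline bool
demo_range (unsigned char uc)
{
  return uc < 0x80;
}

/* Count the byte values on which the two tests disagree.  */
int
demo_count_mismatches (void)
{
  int mismatches = 0;
  for (int i = 0; i < 256; i++)
    if (demo_lookup ((unsigned char) i) != demo_range ((unsigned char) i))
      mismatches++;
  return mismatches;
}
```

With this demo table the two tests agree on all 256 byte values. With a table that marks fewer bytes, switching to the range check would send more bytes down the fast path, so that change would need to be checked against what the slow path does for those bytes.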
> > - Resetting an mbstate_t: Should we define a function
> > void mbszero (mbstate_t *);
> > that clears the relevant part of an mbstate_t (i.e. 24 bytes instead
> > of 128 bytes on BSD systems)?
> > Advantage: performance.
> > Drawback: Yet another gnulib-invented, nonstandard API.
>
> It's likely worth it for mbcel on BSDish hosts. Quite possibly it's also
> worth it for mbiter and mbuiter. Not sure it's worth it everywhere.
Good, thanks for your opinion. I'll then add an 'mbszero' function and
mark it as recommended in loops.
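
A minimal sketch of what such a function might look like, under stated assumptions: the name demo_mbszero and the macro DEMO_MBSTATE_ZERO_SIZE are hypothetical, and the number of bytes that actually matter is platform-specific (the 24-versus-128 figure above is for BSD systems). The sketch therefore falls back to clearing the whole object unless a platform-specific size is supplied at build time.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical knob: how many leading bytes of an mbstate_t must be zero
   for it to describe the initial conversion state.  The safe default is
   the whole object; a port could define a smaller, verified value.  */
#ifndef DEMO_MBSTATE_ZERO_SIZE
# define DEMO_MBSTATE_ZERO_SIZE sizeof (mbstate_t)
#endif

/* Put *PS into the initial conversion state.  */
static inline void
demo_mbszero (mbstate_t *ps)
{
  memset (ps, 0, DEMO_MBSTATE_ZERO_SIZE);
}

/* Usage check: after a reset, mbsinit reports the initial state.
   Returns 1 on success, 0 otherwise.  */
int
demo_reset_yields_initial_state (void)
{
  mbstate_t state;
  demo_mbszero (&state);
  return mbsinit (&state) != 0;
}
```

In a loop that resets the state once per character, the saving would come from the smaller memset; with the default size the function does the same work as a plain memset of the whole object.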
> Here's a summary of the results I got on Fedora 38 x86-64 on an AMD
> Phenom II X4 910e processor dated 2010.
>
>     user CPU sec     speedup
>   mbiter    mbcel   factor   test
>    1.735    0.478    3.630   a - ASCII text, C locale
>    1.703    0.447    3.810   b - ASCII text, UTF-8 locale
>    3.852    1.514    2.544   c - French text, C locale
>    3.544    1.600    2.215   d - French text, ISO-8859-1 locale
>    3.651    1.662    2.197   e - French text, UTF-8 locale
>   26.787   15.115    1.772   f - Greek text, C locale
>   21.651   17.106    1.266   g - Greek text, ISO-8859-7 locale
>   22.565   17.633    1.280   h - Greek text, UTF-8 locale
>   10.011    8.051    1.243   i - Chinese text, UTF-8 locale
>    9.787    7.967    1.228   j - Chinese text, GB18030 locale
>
> With a better CPU (a Xeon W-1350 dated 2021) and a slightly-slower OS
> (Ubuntu 23.04) I got these numbers:
>
>     user CPU sec     speedup
>   mbiter    mbcel   factor   test
>    0.531    0.238    2.231   a - ASCII text, C locale
>    0.478    0.187    2.556   b - ASCII text, UTF-8 locale
>    1.262    0.510    2.475   c - French text, C locale
>    1.121    0.529    2.119   d - French text, ISO-8859-1 locale
>    1.080    0.571    1.891   e - French text, UTF-8 locale
>   10.349    5.876    1.761   f - Greek text, C locale
>    8.530    6.537    1.305   g - Greek text, ISO-8859-7 locale
>    8.407    6.506    1.292   h - Greek text, UTF-8 locale
>    3.427    2.578    1.329   i - Chinese text, UTF-8 locale
>    3.279    2.489    1.317   j - Chinese text, GB18030 locale
Impressive! I'll repeat these benchmarks, after having optimized mbiter
a bit more.
Bruno