Re: RE : Preliminary version of new regex matcher for gawk now available
From: arnold
Subject: Re: RE : Preliminary version of new regex matcher for gawk now available
Date: Mon, 29 Jul 2024 23:57:29 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Thanks for the report.
"Jason C. Kwan" <jasonckwan@yahoo.com> wrote:
> I've test-compiled gawk-with-minrx via clang and clang++ on Apple M1.
> Compilation log attached. It mostly works fine, but to the best of my
> knowledge, the existing (BAU) regex engine in gawk handled this regex
> correctly, while minRX did not:
> The test string is 39 bytes long and contains 31 valid UTF-8 code points,
> as confirmed by GNU wc:
> === === === === === === === === ===
> printf '%s' $'ab^c_1[23]*_\305\224\666~+xy$z!{3,4}:\341\204\217&\365\360\237\244\241%=' | gwc -lcm
> 0 31 39
> === === === === === === === === ===
>
> The regex in question is as plain as it gets: wrap each locale-valid
> character individually in square brackets.
> The test input was intentionally crafted with one 2-byte UTF-8 sequence made
> overlong by an extra trailing byte (\666), plus one invalid UTF-8 byte
> (\365), so neither of those bytes should match a vanilla dot ( /./ ).
>
>
>
> for __ in ~/desktop/gawk_minrx/gawk/gawk '/opt/homebrew/bin/gawk'; do
> printf '%s' $'ab^c_1[23]*_\305\224\666~+xy$z!{3,4}:\341\204\217&\365\360\237\244\241%=' | $( printf '%s' "$__" ) '{ print gsub(/./, "[&]"); print }'
> $( printf '%s' "$__" ) -V | ghead -n 2
> echo "\n\n\t$__\n"
> done
> echo uname -a
> 33
> [a][b][^][c][_][1][[][2][3][]][*][_][Ŕ][?][~][+][x][y][$][z][!][{][3][,][4][}][:][ᄏ][&][?][🤡][%][=]
> GNU Awk 5.3.60-minrx, API 4.0
> Copyright (C) 1989, 1991-2024 Free Software Foundation.
>
> ~/desktop/gawk_minrx/gawk/gawk
>
> 31
> [a][b][^][c][_][1][[][2][3][]][*][_][Ŕ]?[~][+][x][y][$][z][!][{][3][,][4][}][:][ᄏ][&]?[🤡][%][=]
> GNU Awk 5.3.0, API 4.0, (GNU MPFR 4.2.1, GNU MP 6.3.0)
> Copyright (C) 1989, 1991-2023 Free Software Foundation.
>
> /opt/homebrew/bin/gawk
>
> Darwin m1mx4CT.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64
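
The byte arithmetic in the quoted report (31 valid characters in 39 bytes, with two bytes that /./ should skip) can be cross-checked without depending on any locale or regex engine. The following is only a rough sketch, not part of gawk or minrx: the helper names input and count_utf8 are made up for illustration, the byte \266 is an assumption about what the shell produces for $'\666' (octal 666 truncated to 8 bits), and the validator is simplified (it checks lead-byte ranges and continuation bytes, but not overlong 3- and 4-byte encodings or surrogates).

```shell
#!/bin/sh
# Emit the test input from the report using printf octal escapes.
# ASSUMPTION: \266 stands in for the byte produced by $'\666'.
input() {
  printf 'ab^c_1[23]*_\305\224\266~+xy$z!{3,4}:\341\204\217&\365\360\237\244\241%%='
}

# Count valid UTF-8 sequences and stray bytes on stdin.
# od turns every byte into a decimal number, so awk only ever sees
# ASCII text and the current locale plays no role.
count_utf8() {
  od -An -v -tu1 | awk '
    { for (f = 1; f <= NF; f++) b[n++] = $f + 0 }
    END {
      i = 0
      while (i < n) {
        c = b[i]; len = 0
        if (c < 128) len = 1                      # ASCII
        else if (c >= 194 && c <= 223) len = 2    # lead byte C2..DF
        else if (c >= 224 && c <= 239) len = 3    # lead byte E0..EF
        else if (c >= 240 && c <= 244) len = 4    # lead byte F0..F4
        ok = (len > 0 && i + len <= n)
        # every following byte must be a continuation byte 80..BF
        for (k = 1; ok && k < len; k++)
          if (b[i + k] < 128 || b[i + k] > 191) ok = 0
        if (ok) { valid++; i += len } else { invalid++; i++ }
      }
      printf "%d valid, %d invalid, %d bytes\n", valid, invalid, n
    }'
}
```

Under those assumptions, running `input | count_utf8` prints `31 valid, 2 invalid, 39 bytes`, which agrees with the gwc figures above and with the 31 substitutions reported by gawk 5.3.0; the 33 reported by the minrx build corresponds to also matching the two stray bytes.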