help-gawk
Re: RE : Preliminary version of new regex matcher for gawk now available


From: arnold
Subject: Re: RE : Preliminary version of new regex matcher for gawk now available
Date: Mon, 29 Jul 2024 23:57:29 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Thanks for the report.

"Jason C. Kwan" <jasonckwan@yahoo.com> wrote:

> I've test-compiled gawk-with-minrx via clang and clang++ on Apple M1.
> Compilation log attached. It mostly works fine, but to the best of my
> knowledge, the BAU regex engine in gawk performed correctly for this regex,
> while minRX was incorrect:
> This test string has 31 valid UTF-8 code points spanning a total of 39 bytes,
> as confirmed by GNU wc:
> === === ===  === === === === === ===
>  printf '%s' $'ab^c_1[23]*_\305\224\666~+xy$z!{3,4}:\341\204\217&\365\360\237\244\241%=' | gwc -lcm
>       0      31      39
> === === ===  === === === === === ===
>
> The regex in question is as plain as it gets: isolate each locale-valid
> character individually with square brackets.
> The test input was intentionally crafted with 1 overlong sequence for 2-byte
> UTF-8, plus a UTF-8 invalid byte, so those shouldn't be matched by a
> vanilla dot ( /./ ).
>
>
>
>     for __ in ~/desktop/gawk_minrx/gawk/gawk '/opt/homebrew/bin/gawk'; do
>         printf '%s' $'ab^c_1[23]*_\305\224\666~+xy$z!{3,4}:\341\204\217&\365\360\237\244\241%=' |
>         $( printf '%s' "$__" ) '{ print gsub(/./, "[&]"); print }'
>         $( printf '%s' "$__" ) -V | ghead -n 2
>         echo "\n\n\t$__\n"
>     done
>     echo
>     uname -a
> 33
> [a][b][^][c][_][1][[][2][3][]][*][_][Ŕ][?][~][+][x][y][$][z][!][{][3][,][4][}][:][ᄏ][&][?][🤡][%][=]
> GNU Awk 5.3.60-minrx, API 4.0
> Copyright (C) 1989, 1991-2024 Free Software Foundation.
>
>  ~/desktop/gawk_minrx/gawk/gawk
>
> 31
> [a][b][^][c][_][1][[][2][3][]][*][_][Ŕ]?[~][+][x][y][$][z][!][{][3][,][4][}][:][ᄏ][&]?[🤡][%][=]
> GNU Awk 5.3.0, API 4.0, (GNU MPFR 4.2.1, GNU MP 6.3.0)
> Copyright (C) 1989, 1991-2023 Free Software Foundation.
>
>  /opt/homebrew/bin/gawk
>
> Darwin m1mx4CT.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May  1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000 arm64
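
The byte and character counts in the report can be cross-checked without
involving either regex engine. What follows is only a minimal sketch in POSIX
sh: utf8_census is a hypothetical helper (not part of gawk or MinRX). It
assumes that \666 inside the $'...' string wraps to the single byte 0xB6
(written \266 below), and it only checks lead-byte ranges and continuation
bytes, so it is looser than a full UTF-8 validator.

```shell
#!/bin/sh
# utf8_census: hypothetical helper, not part of gawk or MinRX.
# Reads bytes on stdin and prints "<valid sequences> <invalid bytes> <total bytes>".
# Simplification: second-byte restrictions after E0/ED/F0/F4 are not enforced.
utf8_census() {
    valid=0 invalid=0 total=0 need=0 seen=0
    # od prints every input byte as an unsigned decimal number.
    for b in $(od -An -v -tu1); do
        total=$((total + 1))
        if [ "$need" -gt 0 ]; then
            if [ "$b" -ge 128 ] && [ "$b" -le 191 ]; then
                # Expected continuation byte (0x80..0xBF).
                need=$((need - 1)); seen=$((seen + 1))
                if [ "$need" -eq 0 ]; then valid=$((valid + 1)); seen=0; fi
                continue
            fi
            # Sequence cut short: the bytes consumed so far are invalid,
            # and the current byte is re-examined as a fresh start below.
            invalid=$((invalid + seen)); need=0; seen=0
        fi
        if   [ "$b" -le 127 ]; then valid=$((valid + 1))              # ASCII
        elif [ "$b" -ge 194 ] && [ "$b" -le 223 ]; then need=1; seen=1  # C2..DF
        elif [ "$b" -ge 224 ] && [ "$b" -le 239 ]; then need=2; seen=1  # E0..EF
        elif [ "$b" -ge 240 ] && [ "$b" -le 244 ]; then need=3; seen=1  # F0..F4
        else invalid=$((invalid + 1))   # stray continuation, C0/C1, or F5..FF
        fi
    done
    invalid=$((invalid + seen))         # sequence truncated at end of input
    echo "$valid $invalid $total"
}

# The test string from the report (\266 stands in for \666, see assumption above;
# %% is a literal % in a printf format string).
printf 'ab^c_1[23]*_\305\224\266~+xy$z!{3,4}:\341\204\217&\365\360\237\244\241%%=' |
    utf8_census    # prints: 31 2 39
```

Under those assumptions, the 31 valid sequences agree with the count printed
by the stock gawk, and 31 + 2 = 33 agrees with the count printed by the minrx
build, which is consistent with /./ also matching the two invalid bytes there.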


