From: Aharon Robbins
Subject: Re: bug in gawk 3.1.0 regex code
Date: Sun, 4 Aug 2002 10:16:09 +0300

Greetings. Re this, posted in May:
> From: address@hidden
> To: address@hidden
> Date: Fri, 10 May 2002 03:38:42 GMT+01:00
> Subject: bug in gawk 3.1.0 regex code
>
> I believe I've just found a bug in gawk3.1.0 implementation of
> extended regular expressions. It seems to be down to the alternation
> operator; when using an end anchor '$' as a subexpression in an
> alternation and the entire matched RE is a nul-string it fails
> to match the end of string, for example;
>
> gsub(/$|2/,"x")
> print
>
> input = 12345
> expected output = 1x345x
> actual output = 1x345
>
> The start anchor '^' always works as expected;
>
> gsub(/^|2/,"x")
> print
>
> input = 12345
> expected output = x1x345
> actual output = x1x345
>
> This was with POSIX compliance enabled, although that doesn't
> affect the result.
>
> I checked on gawk3.0.6 and got exactly the same results; however,
> gawk2.15.6 gives the expected results.
> [....]

I'm sorry it's taken so long to post an official reply. I
wanted to test all the various things that had been posted.

This is a bug in the implementation of gsub, and not in the
actual regex routines. The patch below fixes the problem.
By the way, re this:
> From: address@hidden (laura fairhead)
> Newsgroups: comp.lang.awk
> Subject: Re: bug in gawk3.1.0 regex code
> Date: Fri, 10 May 2002 02:09:44 GMT
>
> I'm also investigating another possible problem with matching nul strings;
>
> input = 12345
> gsub(/2|/,"x")
> output = x1x3x4x5x
> expected = x1xx3x4x5x
> [....]
Your `expected' is incorrect. At any given starting position, the
matched text is always as long as possible. Thus, given a choice
between the empty string and the non-empty "2", the matcher chooses
the "2".

Thanks for finding this bug, and here's the patch.

Arnold
------------------- cut here ------------------------
*** ../gawk-3.1.1/builtin.c Tue Apr 16 04:40:31 2002
--- builtin.c Wed May 15 06:04:58 2002
***************
*** 1969,1977 ****
  			/*
  			 * If the current match matched the null string,
  			 * and the last match didn't and did a replacement,
! 			 * then skip this one.
  			 */
! 			if (lastmatchnonzero && matchstart == matchend) {
  				lastmatchnonzero = FALSE;
  				matches--;
  				goto empty;
--- 1968,1980 ----
  			/*
  			 * If the current match matched the null string,
  			 * and the last match didn't and did a replacement,
! 			 * and the match of the null string is at the front of
! 			 * the text (meaning right after end of the previous
! 			 * replacement), then skip this one.
  			 */
! 			if (matchstart == matchend
! 			    && lastmatchnonzero
! 			    && matchstart == text) {
  				lastmatchnonzero = FALSE;
  				matches--;
  				goto empty;