help-gawk

Re: Shift and rotate in gawk


From: Andrew J. Schorr
Subject: Re: Shift and rotate in gawk
Date: Mon, 29 Apr 2024 17:06:51 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

I've got a pretty basic question about this. Unless I'm missing something,
your script below does not actually access $0 after manipulating the
various fields contained in the record. If there's no need to access $0,
then why would you bother to change the values of $i at all instead of
just manipulating the values in a separate array? If you use a normal
array instead of $i, then there's no concern about reparsing or rebuilding
or any of that.

In other words, why not do this?

{
   f[NF] = $1
   for (i = 2; i <= NF; i++)
      f[i-1] = $i
}

Now you've got the rotated fields in the f array with no
need to worry about FS or OFS or anything else.
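As a quick check of the array version, here is a minimal sketch wrapped in
a shell function. The function name rotate_fields and the sample input are
made up for illustration, and the rotated array is joined with a single
space only to make the result visible:

```shell
# rotate_fields: hypothetical helper name, not part of the original script.
rotate_fields() {
    awk '{
        # Copy the fields into f[], shifted left by one, with $1 moved to the end.
        f[NF] = $1
        for (i = 2; i <= NF; i++)
            f[i-1] = $i

        # Join f[1..NF] with a single space purely for display.
        # No field is assigned, so $0 is never rebuilt.
        out = ""
        for (i = 1; i <= NF; i++)
            out = (i == 1 ? f[i] : out " " f[i])
        print out
    }'
}

echo '1 2 3 4 5' | rotate_fields
```

Since only f[] is written, the values of FS and OFS make no difference to
what ends up in the array.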

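Or to rotate into a single string while keeping the original separators
(the four-argument split() is a gawk extension):

{
   # s[i] is the separator between f[i] and f[i+1]; s[0] holds any
   # leading whitespace when FS is a single space
   split($0, f, FS, s)
   # everything after $1 and its separator, followed by that separator and $1
   x = (substr($0, length(s[0])+length(f[1])+length(s[1])+1) s[1] f[1])
   # or stuff it back into $0 if you need it there for some reason
   print x
}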

Or:

{
   x = ""
   # N.B. you can use FS if you prefer
   for (i = 2; i <= NF; i++)
      x = (x $i OFS)
   x = (x $1)
   # or stuff it back into $0 if you need it there for some reason
   print x
}
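To see that this last version leaves $0 alone, here is a small sketch under
assumed settings: the function name rotate_join, the choice of FS="_" and
OFS="-", and the sample input are all hypothetical. It prints the rotated
string and then the untouched record:

```shell
# rotate_join: hypothetical helper name; FS and OFS values are illustrative.
rotate_join() {
    awk -F_ -v OFS=- '{
        x = ""
        # Append $2..$NF, each followed by OFS, then $1 at the end.
        for (i = 2; i <= NF; i++)
            x = (x $i OFS)
        x = (x $1)
        print x     # rotated string, joined with OFS
        print $0    # original record, still joined with the input separator
    }'
}

echo 'a_b_c' | rotate_join
```

The fields are only read here, never assigned, so the second line of output
is the input record exactly as it was read.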

Regards,
Andy

On Sun, Apr 28, 2024 at 08:11:46PM -0700, hackerb9@member.fsf.org wrote:
> Hi folks,
> 
> Can someone help me either prove that my gawk code is correct or explain
> why it is not? I’ve been emailing back and forth with a highly respected
> expert who says it is wrong and has not believed my gawk output, my timing
> tests, or even my analysis of the gawk source code.
> 
> *The problem*: Shift the fields to the left by one and rotate $1 to $NF.
> 
> *My solution*: $(NF+1) = $1; for (i=1; i<NF; i++) $i=$(i+1); NF--
> 
> 
>    A full script that is easy to test:
> 
>    #!/bin/bash
>    # rotate.awk v1     -*- awk -*-
>    # Input: 1_2_3_4_..._999999_1000000
>    # e.g., awk -vn=1E6 'BEGIN { OFS="_"; while (++i<=n) { $i=i }; print; exit }'
> 
>    if [[ $# == 0 ]]; then cat /dev/stdin; else echo "$@"; fi |
>          ${AWK:-gawk} -F_ '
>          BEGIN { print ARGV[0] }
>    {
>        FS="_";
>        OFS="XXX";
>        $3 = "_3a_3b_3c_";
>        print "Modifying $3 to match FS (_) to test replacement with OFS (XXX)"
> 
>        print "NF is " NF
>        for (i=1; i<=3; i++)
>            print i": "$i
> 
>        $(NF+1) = $1; # Rotate, comment out this line to discard $1
>        for (i=1; i<NF; i++) $i=$(i+1)
>        NF--;
> 
>        print ""
> 
>        print "NF is " NF
>        for (i=1; i<=3; i++)
>            print i": "$i
> 
>        for (i=NF-2; i<=NF; i++)
>            print i": "$i
>    }
>    '
> 
>    Example output from /bin/time ./rotate.awk < numbers.1E6, where the file
>    numbers.1E6 was created using: awk -vn=1E6 'BEGIN { OFS="_"; while
>    (++i<=n) { $i=i }; print; exit }' > numbers.1E6
> 
>    gawk
>    Modifying $3 to match FS (_) to test replacement with OFS (XXX)
>    NF is 1000000
>    1: 1
>    2: 2
>    3: _3a_3b_3c_
> 
>    NF is 1000000
>    1: 2
>    2: _3a_3b_3c_
>    3: 4
>    999998: 999999
>    999999: 1000000
>    1000000: 1
> 
>    0.20user 0.06system 0:00.26elapsed 101%CPU (0avgtext+0avgdata 168380maxresident)k
>    0inputs+0outputs (0major+39748minor)pagefaults 0swaps
> 
> 
> 
> *The expert’s response*:
> 
>    1. It will replace all strings that match FS with the value of OFS.
>
>    2. It will reconstruct $0 NF times for every line so it’ll be slow.
>
>    3. a. Modifying a field causes awk to reconstruct $0 replacing every FS
>          with OFS.
>       b. For example, echo '1 2 3 4 5' | awk '{$(NF+1)=$1; for (i=1;i<NF;i++)
>          { OFS="<"i">"; $i=$(i+1); print }; NF--; print }'
> 
> *My current belief*:
> 
> Of course, I could be wrong, but I currently believe the expert is
> mistaken. #1 is easily testable, as is #2. #3a, if it ever was true, has not
> been true in decades: setting a field merely sets a flag that $0 needs to
> be rebuilt. #3b is incorrect because there are two cases in gawk
> <https://git.savannah.gnu.org/gitweb/?p=gawk.git&a=search&h=HEAD&st=grep&s=rebuild_record>
> where $0 is rebuilt: when OFS is set and when $0 is read, both of which the
> expert’s example does; that is, it introduces the very problem it is
> supposed to be detecting.
> 
> *What now?*
> 
> I have gone to great lengths to find flaws in my code. I don’t want to overwhelm
> people with details that they may not even be interested in. I think
> everyone can see for themselves that it works correctly, but is there
> something more I could do to demonstrate that? Should I bother explaining
> in detail the flaws in the expert's test code? Do people want to see timing
> tests showing that this method is not slow? Would it help to demonstrate
> that it is as fast or faster than the commonly seen (and incorrect)
> methods, such as k=$1; $1=""; $0 = $0 k? Even though this is a
> gawk-specific question, should I show that my code works even on the oldest
> versions of AWK still in use (e.g., MacOS’s 2007 version of Brian
> Kernighan’s *One True Awk*)? Would pointing to how $0 is rebuilt in the
> gawk source code be useful? Do people want a patch to the current gawk git
> which outputs how many times $0 has been rebuilt when it exits?
> 
> What am I missing and what more can I do?
> 
> Thank you,
> 
> —b9

-- 
Andrew Schorr                      e-mail: aschorr@telemetry-investments.com
Telemetry Investments, L.L.C.      phone:  917-305-1748
147 W 35th St, Ste 1106
New York, NY 10001-2140


