Re: Shift and rotate in gawk
From: Andrew J. Schorr
Subject: Re: Shift and rotate in gawk
Date: Mon, 29 Apr 2024 17:06:51 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
Hi,
I've got a pretty basic question about this. Unless I'm missing something,
your script below does not actually access $0 after manipulating the
various fields contained in the record. If there's no need to access $0,
then why would you bother to change the values of $i at all instead of
just manipulating the values in a separate array? If you use a normal
array instead of $i, then there's no concern about reparsing or rebuilding
or any of that.
In other words, why not do this?
   {
      f[NF] = $1
      for (i = 2; i <= NF; i++)
         f[i-1] = $i
   }
Now you've got the rotated fields in the f array with no
need to worry about FS or OFS or anything else.
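
To make the array approach concrete, here is a minimal sketch that wraps it in a shell function and joins the rotated array with a literal space for printing. The function name rotate_array and the variables n and line are made up for illustration; they are not from the original script.

```shell
#!/bin/sh
# Sketch: rotate fields left by one using a scratch array instead of $i.
# rotate_array, n and line are illustrative names, not from the thread.
rotate_array() {
    awk '{
        n = NF
        f[n] = $1                      # old first field goes to the end
        for (i = 2; i <= n; i++)
            f[i-1] = $i                # shift the remaining fields left
        line = ""
        for (i = 1; i <= n; i++)       # join with a literal space
            line = line (i > 1 ? " " : "") f[i]
        print line
    }'
}

printf 'a b c\n' | rotate_array    # prints: b c a
```

Since $0 and the fields are only read, never assigned, the record is never rebuilt here.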
Or to rotate into a single string:
   {
      split($0, f, FS, s)
      x = (substr($0, length(s[0])+length(f[1])+length(s[1])+1) s[1] f[1])
      # or stuff it back into $0 if you need it there for some reason
      print x
   }
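
As a rough illustration of what the split() version buys you: it relies on gawk's four-argument split(), which is a gawk extension, and it keeps the original separators intact. The sketch below assumes gawk is installed and uses the default FS; the function name rotate_keep_seps is made up.

```shell
#!/bin/sh
# Sketch, assuming gawk is available (four-argument split() is gawk-only).
# rotate_keep_seps is an illustrative name.
rotate_keep_seps() {
    gawk '{
        split($0, f, FS, s)    # f[] holds fields, s[] holds the separators
        # everything after the first field and its separator,
        # followed by that separator and the first field
        x = (substr($0, length(s[0]) + length(f[1]) + length(s[1]) + 1) s[1] f[1])
        print x
    }'
}

# Usage: runs of blanks between the remaining fields are kept as-is.
# printf 'a  b   c\n' | rotate_keep_seps    # prints: b   c  a
```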
Or:
   {
      x = ""
      # N.B. you can use FS if you prefer
      for (i = 2; i <= NF; i++)
         x = (x $i OFS)
      x = (x $1)
      # or stuff it back into $0 if you need it there for some reason
      print x
   }
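
For comparison, here is a small sketch that runs the in-place method from the quoted message next to the string-building loop above, using the same FS ("_") and OFS ("XXX") as the quoted test script. The function names are made up, and the in-place version assumes an awk that rebuilds $0 after NF is decremented (gawk does; I have not checked every other awk).

```shell
#!/bin/sh
# Sketch: two ways to rotate fields left by one, with FS="_" and OFS="XXX".
# rotate_in_place and rotate_by_string are illustrative names.

# In-place method from the quoted message: assign fields, then shrink NF.
rotate_in_place() {
    awk -F_ -v OFS=XXX '{
        $(NF+1) = $1
        for (i = 1; i < NF; i++)
            $i = $(i+1)
        NF--
        print                  # $0 is rebuilt with OFS when it is read here
    }'
}

# String-building method: fields are only read, never assigned.
rotate_by_string() {
    awk -F_ -v OFS=XXX '{
        x = ""
        for (i = 2; i <= NF; i++)
            x = (x $i OFS)
        x = (x $1)
        print x
    }'
}

printf '1_2_3_4\n' | rotate_in_place     # prints: 2XXX3XXX4XXX1
printf '1_2_3_4\n' | rotate_by_string    # prints: 2XXX3XXX4XXX1
```

Both produce the same line for this input; the second one simply never touches the record.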
Regards,
Andy
On Sun, Apr 28, 2024 at 08:11:46PM -0700, hackerb9@member.fsf.org wrote:
> Hi folks,
>
> Can someone help me either prove that my gawk code is correct or explain
> why it is not? I’ve been emailing back and forth with a highly respected
> expert who says it is wrong and has not believed my gawk output, my timing
> tests, or even my analysis of the gawk source code.
>
> *The problem*: Shift the fields to the left by one and rotate $1 to $NF.
>
> *My solution*: $(NF+1) = $1; for (i=1; i<NF; i++) $i=$(i+1); NF--
>
>
> A full script that is easy to test:
>
> #!/bin/bash
> # rotate.awk v1
> # -*- awk -*-
> # Input: 1_2_3_4_..._999999_1000000
> # e.g., awk -vn=1E6 'BEGIN { OFS="_"; while (++i<=n) { $i=i }; print; exit }'
>
> if [[ $# == 0 ]]; then cat /dev/stdin; else echo "$@"; fi |
> ${AWK:-gawk} -F_ '
> BEGIN { print ARGV[0] }
> {
> FS="_";
> OFS="XXX";
> $3 = "_3a_3b_3c_";
> print "Modifying $3 to match FS (_) to test replacement with OFS (XXX)"
>
> print "NF is " NF
> for (i=1; i<=3; i++)
> print i": "$i
>
> $(NF+1) = $1; # Rotate, comment out this line to discard $1
> for (i=1; i<NF; i++) $i=$(i+1)
> NF--;
>
> print ""
>
> print "NF is " NF
> for (i=1; i<=3; i++)
> print i": "$i
>
> for (i=NF-2; i<=NF; i++)
> print i": "$i
> }
> '
>
> Example output from /bin/time ./rotate.awk < numbers.1E6, where the file
> numbers.1E6 was created using, awk -vn=1E6 'BEGIN { OFS="_"; while
> (++i<=n) { $i=i }; print; exit }' > numbers.1E6.
>
> gawk
> Modifying $3 to match FS (_) to test replacement with OFS (XXX)
> NF is 1000000
> 1: 1
> 2: 2
> 3: _3a_3b_3c_
>
> NF is 1000000
> 1: 2
> 2: _3a_3b_3c_
> 3: 4
> 999998: 999999
> 999999: 1000000
> 1000000: 1
>
> 0.20user 0.06system 0:00.26elapsed 101%CPU (0avgtext+0avgdata
> 168380maxresident)k
> 0inputs+0outputs (0major+39748minor)pagefaults 0swaps
>
>
>
> *The expert’s response*:
>
> 1. It will replace all strings that match FS with the value of OFS.
> 2. It will reconstruct $0 NF times for every line so it’ll be slow.
> 3. a. Modifying a field causes awk to reconstruct $0 replacing every FS
>       with OFS.
>    b. For example, echo '1 2 3 4 5' | awk '{$(NF+1)=$1; for (i=1;i<NF;i++)
>       { OFS="<"i">"; $i=$(i+1); print }; NF--; print }'
>
> *My current belief*:
>
> Of course, I could be wrong, but I currently believe the expert is
> mistaken. #1 is easily testable as is #2. #3a, if it ever was true, has not
> been true in decades: setting a field merely sets a flag that $0 needs to
> be rebuilt. #3b is incorrect because there are two cases in gawk
> <https://git.savannah.gnu.org/gitweb/?p=gawk.git&a=search&h=HEAD&st=grep&s=rebuild_record>
> where $0 is rebuilt: when OFS is set and when $0 is read, both of which the
> expert’s example does; that is, it introduces the very problem it is
> supposed to be detecting.
>
> *What now?*
>
> I have gone to lengths to find flaws in my code. I don’t want to overwhelm
> people with details that they may not even be interested in. I think
> everyone can see for themselves that it works correctly, but is there
> something more I could do to demonstrate that? Should I bother explaining
> in detail the flaws in the expert's test code? Do people want to see timing
> tests showing that this method is not slow? Would it help to demonstrate
> that it is as fast or faster than the commonly seen (and incorrect)
> methods, such as k=$1; $1=""; $0 = $0 k? Even though this is a gawk
> specific question, should I show that my code works even on the oldest
> versions of AWK still in use (e.g., MacOS’s 2007 version of Brian
> Kernighan’s *One True Awk*)? Would pointing to how $0 is rebuilt in the
> gawk source code be useful? Do people want a patch to the current gawk git
> which outputs how many times $0 has been rebuilt when it exits?
>
> What am I missing and what more can I do?
>
> Thank you,
>
> —b9
--
Andrew Schorr e-mail: aschorr@telemetry-investments.com
Telemetry Investments, L.L.C. phone: 917-305-1748
147 W 35th St, Ste 1106
New York, NY 10001-2140