help-gawk

Re: Shift and rotate in gawk


From: Neil R. Ormos
Subject: Re: Shift and rotate in gawk
Date: Mon, 29 Apr 2024 18:18:17 -0500 (CDT)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)

hackerb9@member.fsf.org wrote:

> Can someone help me either prove that my gawk
> code is correct or explain why it is not? I've
> been emailing back and forth with a highly
> respected expert who says it is wrong and has
> not believed my gawk output, my timing tests, or
> even my analysis of the gawk source code. [...]

> [sample code elided]

> *The expert's response*: 

>    1. It will replace all strings that match FS with
>       the value of OFS.

>    2. It will reconstruct $0 NF times for every line so
>       it'll be slow.

>   3a. Modifying a field causes awk to reconstruct $0
>       replacing every FS with OFS. [...]

If your goal is to produce code that works, you can avoid the risk from side 
effects of awk's inherent field splitting and reassembly involving $0..$NF by 
adopting Andy's suggestion that you split the record explicitly and manage the 
pieces yourself.  (Andy is also a highly respected expert.)
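
As a minimal sketch of that explicit approach (not the code from this thread, which is elided above), the following assumes that "rotate" means moving the first field to the end of the record and that fields are separated by runs of spaces or tabs. The names rest, lead, fld, sep, out, and rotated are made up for illustration. The program walks the record with match() and substr() only, so it never assigns to a field or to $0 and awk has no occasion to rebuild or re-split anything.

```shell
# Hypothetical example: move the first field to the end of the record
# while keeping every original run of blanks in its original position.
rotated=$(printf 'a  b   c\n' | awk '
{
    rest = $0; n = 0; lead = ""
    if (match(rest, /^[ \t]+/)) {          # leading blanks, if any
        lead = substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
    }
    while (match(rest, /^[^ \t]+/)) {      # next field
        fld[++n] = substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
        sep[n] = ""
        if (match(rest, /^[ \t]+/)) {      # blanks that follow the field
            sep[n] = substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
        }
    }
    out = lead
    for (i = 2; i <= n; i++) out = out fld[i] sep[i - 1]
    if (n > 0) out = out fld[1] sep[n]
    print out
}')
printf '[%s]\n' "$rotated"    # prints [b  c   a]
```

This version sticks to POSIX awk features. In gawk, split() also accepts an optional fourth array argument that receives the separators, which could shorten the loop.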

If that is not an acceptable solution, then you can minimize risk, despite 
using the inherent field splitting, by:

  (a) understanding what triggers awk to
      reassemble $0 from the field variables,
      using OFS as a separator;

  (b) understanding what triggers awk to re-split
      $0 into the field variables, using FS as a
      separator; and

  (c) performing testing appropriately contrived
      to cover the universe of possible input.
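
Points (a) and (b) can be checked with a few one-liners. This is only a sketch using default FS and OFS; the variable names rebuilt, untouched, and resplit are made up for illustration.

```shell
# (a) Assigning to any field, even to itself, makes awk rebuild $0 from
#     $1..$NF joined by OFS, so every separator run becomes a single OFS.
rebuilt=$(printf 'a  b   c\n' | awk '{ $1 = $1; print }')

# Merely reading a field does not rebuild $0; the record prints unchanged.
untouched=$(printf 'a  b   c\n' | awk '{ x = $2; print }')

# (b) Assigning to $0 re-splits it into fields using FS, but the text of
#     $0 is kept exactly as assigned.
resplit=$(printf 'a  b   c\n' | awk '{ $0 = $0 "  d"; print NF ": " $0 }')

printf '[%s]\n[%s]\n[%s]\n' "$rebuilt" "$untouched" "$resplit"
# prints:
# [a b c]
# [a  b   c]
# [4: a  b   c  d]
```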

It seems that you've already done (a) and (b), so you already know that the 
expert's contention 1, especially with respect to "all", is correct for certain 
sequences of operations on $0 and the field variables, but does not apply to 
the sample code you posted.
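
One sequence for which the contention does hold is the idiom you quote in your message, k=$1; $1=""; $0 = $0 k. A small sketch with default FS and OFS (the variable name wrong is made up for illustration):

```shell
# Assigning to $1 rebuilds $0 as " b c" (empty $1, then OFS between fields),
# so the original separator runs are lost; appending k then glues "a" onto
# the last field because no separator is added before it.
wrong=$(printf 'a  b   c\n' | awk '{ k = $1; $1 = ""; $0 = $0 k; print }')
printf '[%s]\n' "$wrong"    # prints [ b ca]
```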

As for expert contentions 2 and 3a, you have already found what triggers 
reassembly of $0 in the gawk source code, and your performance measurements 
appear to be consistent with your understanding of gawk's behavior.  If your 
program produces acceptable performance, as measured over your defined input in 
an execution environment similar to what will be used in production, then there 
may be little value in attempting further to disprove contrary outside 
speculation.

> I have gone to lengths to find flaws in my
> code. I don't want to overwhelm people with
> details that they may not even be interested
> in. I think everyone can see for themselves that
> it works correctly, but is there something more
> I could do to demonstrate that? Should I bother
> explaining in detail the flaws in the expert's
> test code? Do people want to see timing tests
> showing that this method is not slow? Would it
> help to demonstrate that it is as fast or faster
> than the commonly seen (and incorrect) methods,
> such as k=$1; $1=""; $0 = $0 k? Even though this
> is a gawk specific question, should I show that
> my code works even on the oldest versions of AWK
> still in use (e.g., MacOS's 2007 version of
> Brian Kernighan's *One True Awk*)? Would
> pointing to how $0 is rebuilt in the gawk source
> code be useful? Do people want a patch to the
> current gawk git which outputs how many times $0
> has been rebuilt when it exits?
 
> What am I missing and what more can I do?

Are you obliged to obtain the respected expert's approval of your code?  Is 
there some other upside to winning the argument?

If not, why not simply thank the expert for his help, and proceed with 
development?  Wouldn't writing a few more test cases (etc.) be more productive 
than further attempts to persuade?


