bug-gawk


From: Andrew J. Schorr
Subject: Re: [bug-gawk] gawk 5.0.1 patch to allow *valid* awk variable names to be assigned to SYMTAB
Date: Mon, 21 Oct 2019 07:25:06 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

A few points:

1. Tom's suggestion involves more typing than necessary. The example below:
      FNR == 1 { for (i = 1; i <= NF; i++) mem[$i] = i }
      $(mem["Name"]) ~ /^H/ { process H.* records }
      $(mem["Amount"]) < 0 { process negative amount records }

can be rewritten as (and I use this approach all the time):
   NR == 1 {
      for (i = 1; i <= NF; i++)
         m[$i] = i
      next
   }

   $m["Name"] ~ /^H/ { process H.* records }
   $m["Amount"] < 0 { process negative amount records }

This saves 4 characters per variable reference (mem -> m, and no need
for the parentheses), so it reduces the overhead by almost 50% (9 -> 5). :-)

So yeah, it costs you a bit of typing, but it's so much cleaner than
mucking with SYMTAB.

2. If the goal here is really to process CSV files, then there's a
gawkextlib project to develop a CSV processing extension that could
probably benefit from some development/contributions.

3. If you develop your own extension, you are welcome to contribute
it to the gawkextlib project.
   http://gawkextlib.sourceforge.net/
   https://sourceforge.net/projects/gawkextlib/

Regards,
Andy

On Mon, Oct 21, 2019 at 01:50:32AM -0400, address@hidden wrote:
> Apologies for forgetting to CC the list for the last two iterations of this
> discussion.  I am correcting that mistake with this reply.
> 
> Arnold,
> 
> I understand your lack of enthusiasm, particularly after seeing the
> unexpected and undesired results when I tried to actually use my proposed
> update.
> 
> After having reviewed the manual documentation of the gawk extension API, I
> tend to agree that what I want to do is most easily done in a new extension
> function.
> 
> There do appear to be several of the delivered extension function sources I
> could use as a model for a relatively simple extension function that
> satisfies my use case.
> 
> Thank you for your understanding, guidance, and genuine consideration of my
> needs.
> 
> If I get such an extension operating correctly and robustly, is there any
> interest in my contributing that extension to the project?
> 
> Regards,
> 
> Peter
> 
> > -----Original Message-----
> > From: address@hidden <address@hidden>
> > Sent: Sunday, October 20, 2019 3:08 PM
> > To: address@hidden; address@hidden
> > Subject: Re: [bug-gawk] gawk 5.0.1 patch to allow *valid* awk variable
> > names to be assigned to SYMTAB
> > 
> > OK, I understand the use case.  You want to allow for column names that are
> > not used as variables in the program.
> > 
> > I am not overly enthusiastic about making this change. I think it encourages
> > confusion as to how SYMTAB works and should be used, and leads (or can easily
> > lead) to sloppy programming.
> > 
> > W.R.T. functions that can create and set variables, these can easily be
> > written in C as a loadable extension; the manual provides the details.  That
> > is probably the easier path to follow than patching gawk itself.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> > <address@hidden> wrote:
> > 
> > > The actual use case here is for CSV files with column header lines.
> > > At BEGINFILE or FNR == 1 time I would like to assign the column header
> > > values (checked for valid variable name format first) as real gawk
> > > variable names and assign the column number as their value.
> > >
> > > Assuming a sample CSV like this:
> > >
> > > Name,Desc,Amount
> > > Harry,Item # 1,-30
> > > Jeffery,"Groups, Pairs, and stuff",46
> > >
> > > I would like to be able to write gawk code like the following
> > > (assuming FPAT has been set to deconstruct CSV records and without a
> > > check for variable name validity to keep it simpler):
> > >
> > > FNR == 1 { for (i = 1; i <= NF; i++) SYMTAB[$i] = i }
> > > $Name ~ /^H/ { process H.* records }
> > > $Amount < 0 { process negative amount records }
> > >
> > > Obviously with Tom's suggestion this can be coded today as:
> > >
> > > FNR == 1 { for (i = 1; i <= NF; i++) mem[$i] = i }
> > > $(mem["Name"]) ~ /^H/ { process H.* records }
> > > $(mem["Amount"]) < 0 { process negative amount records }
> > >
> > > But that alternative is more typing and IMHO much less clear and is
> > > also much easier to make typing mistakes while coding.
> > >
> > > If this is insufficient reason to open up assignment to SYMTAB I will
> > > accept your decision, but that is what I was trying to accomplish with
> > > the patch I submitted.
> > >
> > > Alternatively, could (a) builtin function(s) be supplied to DTRT to
> > > create dynamic variables and assign values to them? E.G.,
> > > create_var($i[,value]) and/or assign_var($i,value).
> > >
> > > Peter
> > >
> > > > -----Original Message-----
> > > > From: address@hidden <address@hidden>
> > > > Sent: Wednesday, October 16, 2019 11:14 AM
> > > > To: address@hidden; address@hidden;
> > > > address@hidden
> > > > Cc: address@hidden
> > > > Subject: Re: [bug-gawk] gawk 5.0.1 patch to allow *valid* awk variable
> > > > names to be assigned to SYMTAB
> > > >
> > > > Peter,
> > > >
> > > > I have to agree with Tom here. From the snippet you sent, it doesn't look
> > > > like there's a significant advantage to your using SYMTAB.
> > > >
> > > > Arnold
> > > >
> > > > Tom Gray <address@hidden> wrote:
> > > >
> > > > > Hi Peter,
> > > > >
> > > > > Have you considered using your own array instead of SYMTAB[]? Let's
> > > > > call it mem[] ... to represent some generic storage space.
> > > > > Then the index will not have any "nice variable name" restrictions.
> > > > >
> > > > >       FNR == 1 { for (i = 1; i <= NF; i++) { mem[$i] = computed_value } }
> > > > >
> > > > >       if ((mem[MyVar] > x) && (mem[MyVar] < y)) { do-something }
> > > > >
> > > > > In my opinion gawk's jagged arrays of arrays are the best thing since
> > > > > sliced bread.
> > > > > Combined with recursion and indirect function calls you get incredible
> > > > > power.
> > > > >
> > > > > Do not underestimate the power of portability.  If you write something
> > > > > cool, you want others to be able to use it.
> > > > >
> > > > > Tom
> <Remainder of original chain snipped for brevity>
> --
> 

-- 
Andrew Schorr                      e-mail: address@hidden
Telemetry Investments, L.L.C.      phone:  917-305-1748
545 Fifth Ave, Suite 1108          fax:    212-425-5550
New York, NY 10017-3630


