bug-gawk

Re: Can "gawk -i extension" be made safer?


From: Stephane CHAZELAS
Subject: Re: Can "gawk -i extension" be made safer?
Date: Tue, 27 Jun 2023 09:14:56 +0100

2023-06-26 21:40:55 -0400, Andrew J. Schorr:
> Hi,
[...]
> > Note that the new -I I was suggesting would not change gawk's
> > established behaviour. That would not fix scripts that currently
> > use -i or @include, but at least would allow script writer to
> > switch to a safer API going forward.
> 
> There is actually already an -I option used for tracing:
> 
> bash-5.1$ ./gawk --help | fgrep -e -I
>         -I                      --trace

My bad, I only checked with 5.1.0 on Ubuntu 22.04; -I was
apparently added in 5.1.1.

Then maybe gawk -m / --module?

> > Maybe a @use (a la perl) or @import (a la python) or @require...
> > could be the corresponding directive.
> 
> Is it really worth adding a new directive when the problem here is actually
> that the path is not set appropriately?

The problem is that $AWKPATH is also used for -f/-E:

awk -f script.awk

has always been intended (in any awk, since the 70s) to be the
same as awk -f ./script.awk (and in implementations other
than gawk never to be the same as awk -f
/some/library/script.awk nor awk -f script.awk.awk or awk -f
/some/library/script.awk.awk).

If we remove . from $AWKPATH globally, we break that. We also
break #! /usr/bin/gawk -E scripts for the cases where they're
invoked as execve("myscript", args, env) (admittedly rare in
practice).

As someone mentioned at
https://unix.stackexchange.com/questions/749645/how-to-safely-use-gawks-i-option-or-include-directive/749910#749910
C has
      #include "file path on the filesystem (relative to the file it is included from though)"
vs
      #include <file looked up in a search path>
which removes the problem here.

So gawk could do something similar, where @include <inplace>
looks up inplace or inplace.awk in the absolute components of
$AWKPATH while @include "file.awk" remains unchanged for
backward compatibility but ideally would only treat "file.awk" as
a file path on the filesystem.

And -i/--include inplace would do the same as @include "inplace", and
-m/--module inplace would do the same as @include <inplace>.
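
To make the proposed lookup concrete, here is a minimal sketch in plain sh of
how @include <name> / -m name could resolve a module. It is not gawk code and
not an existing gawk feature: the function name resolve_module is hypothetical,
and the rule it applies (skip every $AWKPATH component that is not absolute,
then try name and name.awk) is only my reading of the suggestion above.

```shell
# Hypothetical helper, not part of gawk: resolve a module name using only
# the absolute components of $AWKPATH, trying "name" then "name.awk".
# Prints the resolved path and returns 0, or returns 1 if nothing matches.
# The body runs in a subshell so the IFS and globbing changes stay local.
resolve_module() (
    name=$1
    # Anything containing a slash is a file path, not a module name.
    case $name in
        */*) return 1 ;;
    esac
    IFS=:
    set -f    # do not glob-expand path components while splitting
    for dir in ${AWKPATH-}; do
        case $dir in
            /*) ;;           # absolute component: search it
            *)  continue ;;  # ".", "", "./lib", ...: never searched
        esac
        for candidate in "$dir/$name" "$dir/$name.awk"; do
            if [ -f "$candidate" ]; then
                printf '%s\n' "$candidate"
                return 0
            fi
        done
    done
    return 1
)
```

With AWKPATH=.:/usr/share/awk, resolve_module inplace would ignore an
inplace.awk planted in the current working directory and only consider
/usr/share/awk/inplace and /usr/share/awk/inplace.awk.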

> > My understanding is that the -Wposix mode is intended for users
> > that don't care about gawk extensions and want a awk that
> > behaves the standard way. The current behaviour of -f that looks
> > for files in $AWKPATH or looks for them with ".awk" added breaks
> > POSIX compliance, so changing the behaviour as I suggested in
> > POSIX mode would restore compliance and would be unlikely to
> > break any script.
> 
> I'm going to leave the language lawyering to Arnold. I don't know whether
> gawk --posix mode should care much about how to find source code.

See
https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/awk.html
for the specification

-f  progfile
     Specify the pathname of the file progfile containing an awk
     program

progfile is to be interpreted as a pathname (itself defined at
https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap03.html#tag_03_271).

[...]
> > I don't think AWKLIBPATH has a problem. Its default value
> > doesn't include "." or the empty string AFAICT.
> 
> I agree that the default value is fine, but presumably somebody could
> change it.

Yes, and if they do run something like

cd mysoftware && AWKLIBPATH=./lib gawk -l mylib -l myotherlib

we should probably not get in their way.

[...]
> > Also searching in the current working directory is desirable for
> > -f or -E (where IMO arguments should only be interpreted as
> > paths) and some usages of -i/@include while it is unwelcome for
> > usages of -i extension.
> 
> I'm confused by that point. You seem to say that it's desirable for
> some usages of -i, and then say that it is unwelcome. And why is it desirable
> for -E?

See above (and below).

-i stands for --include, and is also intended to include actual files as
opposed to "standard" extensions. Looking at some usages of gawk -i
through a github code search, I saw some cases of:

cd mysoftware && gawk -i myfile-to-include.awk ...

Where the intention was also to look for the file in the current
working directory.

While in gawk -i inplace, the inplace extension is intended to
be loaded from some "standard" place regardless of what the
current working directory is, hence the need to distinguish the
2 cases with --include vs --module and @include "file.awk" vs
@include <module>.

[...]
> > #! /usr/bin/safegawk -f/-E
> > 
> > We do *not* want the argument of -f/-E (which here is filled-in
> > by the system from the first (file path) argument to execve())
> > to be looked-up in $AWKPATH, only be interpreted as a file path.
> 
> That is true. I'm not sure how best to handle the gawk shebang case.
> As you suggest, it may be inadvisable for -E to do any searching at all.
> And perhaps -E should have a side-effect of disabling relative paths
> in AWKPATH and in AWKLIBPATH, on the theory that -E is used only 
> in shebang situations.
> 
> Regards,
> Andy


