bug-coreutils
bug#22128: dirname enhancement


From: Stephane Chazelas
Subject: bug#22128: dirname enhancement
Date: Fri, 11 Dec 2015 14:46:38 +0000
User-agent: Mutt/1.5.21 (2010-09-15)

2015-12-10 10:40:30 -0700, Bob Proulx:
[...]
> In this instance the first thing I thought of when I read your dirname
> -f request was a loop.
> 
>    while read dir; do dirname $dir; done < list

"read dir" expects the input in a very specific format and
depends on the current value of IFS (for instance, with the
default value of IFS, a dir called "my\dir " has to be input as
"my\\dir\ "), and it can't accept dir names with newline
characters.

Invoking the split+glob operator on $dir doesn't make sense here
unless you mean the input to be treated as a $IFS delimited list
of patterns.

If the intention was to treat the input as a list of file
paths, one per line (so can't do file paths with newline
characters), then that would rather be:

 while IFS= read -r dir; do dirname -- "$dir"; done < list
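A minimal sketch of that difference, assuming a made-up one-line list (the temporary file and the variable names plain/raw are only for illustration):

```shell
# Hypothetical one-line list: the path "my\dir " (backslash, trailing space).
list=$(mktemp)
printf '%s\n' 'my\dir ' > "$list"

# Plain read: the backslash is treated as an escape character and the
# trailing space is stripped as IFS whitespace.
plain=$(while read dir; do printf '[%s]\n' "$dir"; done < "$list")

# IFS= read -r: the line is taken as-is.
raw=$(while IFS= read -r dir; do printf '[%s]\n' "$dir"; done < "$list")

printf '%s\n' "$plain" "$raw"   # prints [mydir] then [my\dir ]
rm -f -- "$list"
```

The brackets are only there to make the trailing space visible.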

> 
> Pádraig suggested xargs which was even shorter.
> 
>   xargs dirname < filename

That expects yet another input format. This time, it can cope
with any file path, since newline can be specified using quotes
like:

"my dir
with newline"

The output of dirname however won't be post-processable.


> Both of those directly do exactly what you had asked to do.  The
> technique works not only with dirname but with every other command on
> the system too.  A technique that works with everything is much better
> than something that only works in one small place.

The while loop you can't reasonably do for large file lists as
running one dirname invocation per file is going to be
prohibitive in terms of performance.

The xargs approach, you can do only with GNU dirname as it
supports passing more than one string as an extension over the
standard.

I think here we're seeing the limits of shell scripting. OK,
dirname is the tool to get a dirname, but doing it in a loop is
not practical/efficient and produces an ambiguous output (not to
mention that file names are not necessarily valid text so the
passing of that data through text utilities can be a problem).

Extending all the utilities so that they can take a list of
arguments from stdin instead of from the command line is one
solution (and one already applied by several GNU utilities, like
--files0-from in du/sort/wc), but I agree xargs -r0 is a more
generic solution and good enough for things like dirname since
the number of invocations is minimised.
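As a rough sketch of the xargs -r0 approach, assuming GNU xargs and GNU dirname (the NUL-delimited temporary file and its three paths are invented for the example):

```shell
# Hypothetical NUL-delimited list; the second path contains a space.
list0=$(mktemp)
printf '%s\0' 'a/b' 'c d/e' 'f/g' > "$list0"

# -0: items are NUL-delimited; -r: don't run dirname on empty input.
# xargs passes as many paths as fit to each dirname invocation.
dirs=$(xargs -r0 dirname -- < "$list0")

printf '%s\n' "$dirs"
rm -f -- "$list0"
```

The output here is newline-delimited, so it is only unambiguous because none of the sample paths contains a newline.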

The --files0-from option of du/sort/wc is justified because
xargs -r0 wouldn't work there (several invocations of the
utility could end up being made, which wouldn't work for them),
but that's not the case for dirname. (I'd argue ls would need
one for its sorting though, and an option to output NUL
delimited.)

The xargs approach can't be applied to commands that take only
one argument, like basename, though.

GNU xargs addresses the problem of the stdin of the command
being redirected (like for rm -i) with its --arg-file option.
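A small sketch of what --arg-file changes, assuming GNU xargs (the temporary file, the inline sh script and the variable names are hypothetical): the arguments come from the file, so the command can still read the original stdin.

```shell
# Hypothetical NUL-delimited argument file with a single path.
list0=$(mktemp)
printf '%s\0' 'a/b' > "$list0"

# Arguments are read from "$list0"; the command's stdin is left alone,
# so "read" inside the inline script sees the piped text.
out=$(echo 'from stdin' |
  xargs -r0 --arg-file="$list0" sh -c 'read -r line; echo "$line: $1"' sh)

printf '%s\n' "$out"   # prints: from stdin: a/b
rm -f -- "$list0"
```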

The problem with dirname is that OK, GNU dirname can take
several paths as arguments but then its output is not
post-processable reliably ("dirname a/b a/c" and "dirname
$'a\na/b'" produce the same output for instance).
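To make that ambiguity concrete, here is a minimal sketch assuming GNU dirname (several operands are an extension); od -c is only used to compare the output byte for byte, and the variable names are made up:

```shell
# Two paths, each with dirname "a".
two=$(dirname -- a/b a/c | od -c)

# One path containing a newline; its dirname is "a<newline>a".
one=$(dirname -- "$(printf 'a\na/b')" | od -c)

# Both dumps are the same bytes: a, newline, a, newline.
[ "$two" = "$one" ] && echo 'identical output'
```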

Here, using another programming language/paradigm that has the
"dirname" capability and can deal with lists of strings reliably
within the same command (like perl or zsh) would be a more
reliable and efficient approach.


zsh:

files=(${(0)"$(<file.list)"}) # read a NUL delimited list
print -rN -- $files:h # print the dirnames as NUL delimited
print -rN -- ${(u)files:h} # same for unique items.

perl:

perl -MFile::Basename -0 -lpe '$_ = dirname $_' < file.list

Those can handle strings with any byte values. For the shell
pipelines, you have issues with NUL/NL, and depending on the
tool, invalid characters, long lines, things starting with "-",
things containing "="...


> Want a sorted unique list modified in some custom way?
> 
>    while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u
[...]

I would recommend reading
https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice

Here, I'd do:

< list sed -z 's/foo/bar/' | LC_ALL=C sort -zu

Assuming a NUL delimited list in "list".
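A quick sketch of that pipeline on made-up data, assuming GNU sed and GNU sort for -z (the temporary file and its three records are hypothetical; one record contains a newline on purpose):

```shell
# Hypothetical NUL-delimited list: a duplicate and a record with a newline.
list=$(mktemp)
printf '%s\0' 'foo/x' 'foo/x' 'a
foo' > "$list"

# Each NUL-terminated record is edited as a whole, then sorted and
# de-duplicated bytewise; count the records that remain.
n=$(< "$list" sed -z 's/foo/bar/' | LC_ALL=C sort -zu | tr -cd '\0' | wc -c)

echo "$n unique records"
rm -f -- "$list"
```

The two 'foo/x' records collapse into one, and the record with the embedded newline stays a single record, so two records remain.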

-- 
Stephane





