Re: [Help-bash] latest


From: Greg Wooledge
Subject: Re: [Help-bash] latest
Date: Mon, 28 Nov 2016 09:27:24 -0500
User-agent: Mutt/1.4.2.3i

On Sun, Nov 27, 2016 at 04:43:18AM +0000, Val Krem wrote:
> I am trying to find the latest 5 files in a given folder and count the
> number of rows in each of the files

To do this 100% safely, you need some non-POSIX tools.  GNU find and
GNU sort are especially helpful.

FIRST: you can't use ls, because ls mangles filenames.  GNU ls before
coreutils 9.0 has no option to print filenames with a NUL terminator
instead of a newline terminator, so those versions have no way to emit
an unambiguous stream of filenames that contain newlines.  (Newer
releases add --zero, but you can't count on it being installed.)
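
To see the problem for yourself, here is a small sketch.  The scratch
directory and the file names are made up for the demonstration, and GNU
find is assumed for -printf.  It counts the same two files once by
lines and once by NULs:

```shell
#!/usr/bin/env bash
# Scratch directory with two files; one name contains a newline.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/two"$'\n'"lines.txt"

# Line-oriented: the newline inside the second name looks like a separator.
ls_count=$(ls "$demo" | wc -l)

# NUL-oriented: emit one NUL byte per file and count the bytes.
nul_count=$(find "$demo" -maxdepth 1 -type f -printf '\0' | wc -c)

echo "ls | wc -l sees $ls_count entries; NUL counting sees $nul_count"
rm -rf "$demo"
```

The line-based count comes out one too high, because nothing in the
output of ls says which newlines belong to a name.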

SECOND: since you can't use ls, that means you can't use ls's -t option
to do the sorting by mtime.  That means you need something else to do
the sorting by mtime.  The obvious candidate is sort.  For that to work,
you need to print the mtime in addition to the filename, and all of this
has to be done with NUL terminators so that you can handle filenames
with newlines.  GNU sort has a -z option to handle NUL-terminated
records.  GNU find has a -printf option that can print file metadata
in any format you choose.  So we use these.  Then we simply remove the
timestamp from each record, and what's left is the pathname.
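
If it helps to see what that stream looks like, the following sketch
builds a scratch directory (the directory, file names and dates are
invented for the example, and GNU touch -d is assumed) and turns the
NULs into newlines purely for display.  Never do that translation in
real processing; it throws away exactly the safety we are after.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
touch -d '2016-11-01 00:00:00' "$demo/old.txt"
touch -d '2016-11-27 00:00:00' "$demo/new.txt"

# Each record is "<mtime as epoch seconds> <pathname>" followed by a NUL.
# sort -z -n -r orders the records numerically, newest first.
listing=$(find "$demo" -maxdepth 1 -type f -printf '%T@ %p\0' |
          sort -z -n -r |
          tr '\0' '\n')      # display only: NUL -> newline

printf '%s\n' "$listing"
rm -rf "$demo"
```

The record for new.txt should come out first, since its timestamp is
the larger number.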

THIRD: you need bash's read -r -d '' to handle the NUL-delimited stream.
POSIX sh can't do it.  ksh can't do it.  Only bash.

FOURTH: you can't count on head or tail to filter the NUL-delimited
stream.  (GNU coreutils 8.25 and later give them a -z option, but older
versions lack it.)  You'll just have to use the shell to count pathnames.

FINALLY: once you actually get the filenames, the rest is a piece of cake.
Just use wc -l to count lines in the expected way.  (At that point, the
output has been written in a human-readable format and CANNOT be further
processed.  If you want to process this stream of line-counts and pathnames
with more scripting, then DO NOT just call wc -l.  But that will have to
wait until you actually change the question on us.  Which is inevitable.)

So, here's one way:

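dir=/whatever
i=0
# IFS= keeps whitespace at the end of a pathname from being trimmed by read.
while ((i < 5)) && IFS= read -r -d '' rec; do
  file=${rec#* }   # remove the "mtime " prefix; the rest is the pathname
  wc -l -- "$file"
  ((i += 1))
done < <(find "$dir" -maxdepth 1 -type f -printf '%T@ %p\0' |
         sort -z -n -r)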

If you want to get fancier, you can read the pathnames into an array
and then pass them all to a single wc -l command, or whatever you feel
is appropriate.  This will be the way to go if you want to omit the -r
option from sort, to get the pathnames in ascending chronological order
instead of descending.
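
A rough sketch of that array variant, in ascending order, might look
like the following.  The function name latest_ascending, the array name
latest, and the scratch directory with its seven dated files are all
invented for illustration; GNU find, sort and touch are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array "latest" with the N most
# recently modified regular files in DIR, oldest of those N first.
latest_ascending() {
  local dir=$1 n=$2 rec start
  local -a all=()
  while IFS= read -r -d '' rec; do
    all+=("${rec#* }")          # drop the leading "mtime " field
  done < <(find "$dir" -maxdepth 1 -type f -printf '%T@ %p\0' | sort -z -n)
  # Keep only the last N entries (or everything, if there are fewer).
  start=$(( ${#all[@]} > n ? ${#all[@]} - n : 0 ))
  latest=("${all[@]:start}")
}

# Demo data: f1 is the oldest file, f7 the newest.
demo=$(mktemp -d)
for k in 1 2 3 4 5 6 7; do
  printf 'line\n' > "$demo/f$k"
  touch -d "2016-11-0$k 12:00:00" "$demo/f$k"
done

latest_ascending "$demo" 5
wc -l -- "${latest[@]}"
```

The whole list has to be read before slicing, because in ascending
order the five newest files are the last five records, and you don't
know where those start until the stream ends.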

Novices are probably looking at this script in horror.  Well, I'm sorry,
but this is how shell scripting has to be done in an environment that
allows newlines in filenames.  There is no safe way to use the simple
tools that SEEM like they should work, but don't.  Every answer you've
seen to this question in the past that uses ls -t and head/tail is wrong,
and is beyond salvage.  The whole thing has to be thrown away.  There is
no quick fix.

You could use a pipeline instead of the process substitution, but then
the while loop runs in a subshell and all work done in that subshell is
lost upon return to the parent (script).  And the question will ALWAYS
change in such a way that you need the more generalized form, if you
give the simpler answer first.  So it's better to just bite the bullet
and use the complex, generalized form.
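
Here is a minimal sketch of that difference, using a made-up
three-record stream and bash's default settings (lastpipe not enabled):

```shell
#!/usr/bin/env bash
# Pipeline: the while loop runs in a subshell, so its counter is lost.
piped=0
printf 'a\0b\0c\0' | while IFS= read -r -d '' item; do
  ((piped += 1))
done

# Process substitution: the loop runs in the current shell.
kept=0
while IFS= read -r -d '' item; do
  ((kept += 1))
done < <(printf 'a\0b\0c\0')

echo "pipeline: $piped, process substitution: $kept"
```

The first counter is still 0 after its loop, while the second one has
reached 3.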


