[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] find: doc: Fix -prune SCM example directories and make it mo
From: |
raf |
Subject: |
Re: [PATCH] find: doc: Fix -prune SCM example directories and make it more efficient |
Date: |
Wed, 23 Nov 2022 20:59:41 +1100 |
On Wed, Nov 23, 2022 at 07:07:14PM +1100, raf <raf@raf.org> wrote:
> On Tue, Nov 22, 2022 at 11:12:00PM +0100, Bernhard Voelker
> <mail@bernhard-voelker.de> wrote:
>
> > Hi raf,
>
> Hi Berny,
>
> > thanks for the patch.
> > Besides busy day job times here, I was actually waiting for the FSF legal
> > to add your entry into the 'copyright.list' file on the GNU fencepost
> > server.
> > It's not yet in place though, so no need to hurry.
>
> No worries. That'll take some time. I've emailed the
> copyright assignment request form to Craig Topham, but
> the paperwork has yet to arrive here for signing.
>
> > On 11/18/22 23:48, raf wrote:
> > > * find/find.1 - Fix -prune SCM example directories and make it more
> > > efficient
> > > * doc/find.texi - Make -prune SCM example more efficient (two ways)
> > > * NEWS - Mention the above
> > > ---
> > > NEWS | 3 +++
> > > doc/find.texi | 25 +++++++++++++++++++++----
> > > find/find.1 | 13 ++++++++++++-
> > > 3 files changed, 36 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/NEWS b/NEWS
> > > index 1fff34f8..7c6fcfb3 100644
> > > --- a/NEWS
> > > +++ b/NEWS
> > > @@ -18,6 +18,9 @@ GNU findutils NEWS - User visible changes. -*-
> > > outline -*- (allout)
> > > When generating the Texinfo manual, `makeinfo` is invoked with the
> > > --no-split
> > > option for all output formats now; this avoids files like
> > > find.info-[12].
> > > + The find.1 manual's -prune SCM example directories/output have been
> > > fixed, and
> > > + the example itself has been made more efficient (find.1 and find.texi)
> > > [#62259]
> > > + Also fixed a typo in find.texi.
> > > * Noteworthy changes in release 4.9.0 (2022-02-22) [stable]
> > > diff --git a/doc/find.texi b/doc/find.texi
> > > index 379fe646..1a36e243 100644
> > > --- a/doc/find.texi
> > > +++ b/doc/find.texi
> > > @@ -5138,13 +5138,15 @@ already found.
> > > @smallexample
> > > find repo/ \
> > > --exec test -d @{@}/.svn \; -or \
> > > --exec test -d @{@}/.git \; -or \
> > > --exec test -d @{@}/CVS \; -print -prune
> > > +-type d \
> > > +\( -exec test -d @{@}/.svn \; \
> > > +-or -exec test -d @{@}/.git \; \
> > > +-or -exec test -d @{@}/CVS \; \
> > > +\) -print -prune
> > > @end smallexample
> >
> > Indeed, the bug omitting the \( ... \) has been introduced in 2008:
> >
> > $ git diff v4.5.6b-28-gaca33f85^..v4.5.6b-28-gaca33f85 -- doc/find.texi |
> > tail -n12
> > @@ -4683,7 +4683,10 @@ searching subdirectories inside projects whose SCM
> > directory we
> > already found.
> >
> > @smallexample
> > -find repo/ -exec test -d @{@}/.svn -o -d @{@}/.git -o -d @{@}/CVS \;
> > -print -prune
> > +find repo/ \
> > +-exec test -d @{@}/.svn \; -or \
> > +-exec test -d @{@}/.git \; -or \
> > +-exec test -d @{@}/CVS \; -print -prune
> > @end smallexample
> >
> > In this example, @command{test} is used to tell if we are currently
> >
> > For find.1, this was already fixed in commit v4.6.0-55-g47d8fd38, but missed
> > to check the analog place in the texi file.
> >
> > Your patch above aims at optimizing things by letting 'find -type d'
> > pre-filter
> > directories in order to avoid -exec invocations on regular and other
> > non-directory
> > files.
> > The '-type d' test is different from 'test -d ...' because the latter
> > transparently
> > follows symbolic links while the former strictly checks on the type of the
> > entry.
> > Therefore, the patch changes the result in the case the repo workspace is a
> > symlink:
> > the new version would skip it. Well, adding -L would help.
>
> Ah, I hadn't thought of that. I can add -L.
>
> > > In this example, @command{test} is used to tell if we are currently
> > > -examining a directory which appears to the a project's root directory
> > > +examining a directory which appears to be a project's root directory
> >
> > good catch!
> >
> > > (because it has an SCM subdirectory). When we find a project root,
> > > there is no need to search inside it, and @code{-prune} makes sure
> > > that we descend no further.
> > > @@ -5153,6 +5155,21 @@ For large, complex trees like the Linux kernel,
> > > this will prevent
> > > searching a large portion of the structure, saving a good deal of
> > > time.
> > > +The @samp{-type d} clause causes the three @samp{test} shell
> > > +processes to only be executed for directories. This can be made even
> > > +more efficient by combining the three @samp{test} shell processes
> > > +into a single process:
> > > +
> > > +@smallexample
> > > +find repo/ \
> > > +-type d \
> > > +-exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d "$1"/CVS'
> > > . {} \; \
> > > +-print -prune
> > > +@end smallexample
> > > +
> > > +Note that the @samp{.} argument is just a placeholder for the unused
> > > +@samp{$0} environment variable in the @samp{sh -c} command. The
> > > +@samp{@{@}} argument is the @samp{$1} environment variable.
> > _______________________________^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > s/environment/shell/, and maybe s/variable/parameter/
>
> Are you sure? The term "environment variable" is a
> fairly standard term. The find documentation (manual
> and info) already uses it at least 16 times. There are
> no appearances there of the term "shell parameter".
>
> Should I change it anyway?
Ah, I get it. I think "shell positional parameter" would be
the most clear term.
> > > @node Security Considerations
> > > @chapter Security Considerations
> >
> > I'm still unsure about efficiency here. Summary:
> >
> > 1) Originally it was:
> >
> > $ find repo/ -exec test -d '{}/.svn' -o -d '{}/.git' -o -d '{}/CVS' \;
> > -print -prune
> >
> > -> 1 process, doing the OR-ing itself. Nice and neat, and efficient, but
> > maybe a bit
> > confusing for the novice reader: it may not be obvious that test(1) is
> > doing the OR-ing,
> > because -o is also a find(1) operator.
> > Besides readability, the test -o operator is discouraged.
> >
> > $ env '[' --help
> > ...
> > NOTE: Binary -a and -o are inherently ambiguous. Use 'test EXPR1 && test
> > EXPR2' or 'test EXPR1 || test EXPR2' instead.
> > ...
> >
> > 2) Now there's the 3x -exec, with fixed -or grouping (with or without
> > '-type d'):
> >
> > $ find repo/ -type d \
> > '(' -exec test -d '{}/.svn' \; -or \
> > -exec test -d '{}/.git' \; -or \
> > -exec test -d '{}/CVS' \; \
> > ')' -print -prune
> >
> > -> less efficient than 1) because it launches 3x test(1) via -exec.
> > OTOH good to read.
> >
> > 3. Then there's 1x -exec to launch a shell process to do the OR-ing:
> >
> > $ find repo/ \
> > -type d \
> > -exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d
> > "$1"/CVS' . {} \; \
> > -print -prune
> >
> > While there's only 1x -exec per directory, I'm wondering if launching one
> > shell process with 3x test (which is usually the shell builtin) is really
> > more efficient than 3x launching the executable found by `which test`.
> >
> > We don't have a measurement, and the results may vary on different systems,
> > but I could imagine the launching a shell is more expensive than launching
> > 3x /usr/bin/test.
> >
> > Anyhow, I like the discussion, and I believe it's good to have several
> > alternatives
> > documented with their individual pros vs. cons considerations.
> > WDYT?
>
> I just did a quick check (Debian vm on a mature macbookpro)
> and 1000 * 3 * /usr/bin/test is 3.8s and 1000 * /bin/sh is
> 1.3s (~3x faster). I'm not surprised. dash is only 2x the
> size of test (both small), and they load the same shared
> libraries.
>
> The original version of this patch just had the single
> -exec version (my preference), but it was thought better to
> leave the three -exec version in place, and add the single
> -exec version as an alternative.
>
> I think either is OK. The more important difference is
> adding -test d and not executing any sh/test processes
> for every non-directory.
>
> > > diff --git a/find/find.1 b/find/find.1
> > > index 429aa2f0..4faaa23b 100644
> > > --- a/find/find.1
> > > +++ b/find/find.1
> > > @@ -2494,6 +2494,7 @@ projects' roots:
> > > .in +4m
> > > .B $ find repo/ \e
> > > .in +4m
> > > +.B \-type d \e
> > > .B \e( \-exec test \-d \(aq{}/.svn\(aq \e; \e
> > > .B \-or \-exec test \-d \(aq{}/.git\(aq \e; \e
> > > .B \-or \-exec test \-d \(aq{}/CVS\(aq \e; \e
> > > @@ -2502,7 +2503,7 @@ projects' roots:
> > > .in -4m
> > > \&
> > > .fi
> > > -Sample output:
> > > +Sample directories:
> > > .nf
> > > \&
> > > .in +4m
> > > @@ -2513,6 +2514,16 @@ Sample output:
> > > .B repo/project4/.git
> > > .in
> > > \&
> > > +Sample output:
> > > +.nf
> > > +\&
> > > +.in +4m
> > > +.B repo/project1
> > > +.B repo/gnu/project2
> > > +.B repo/gnu/project3
> > > +.B repo/project4
> > > +.in
> > > +\&
> > > .fi
> > > In this example,
> > > .B \-prune
> >
> > +1 on this.
> >
> > Have a nice day,
> > Berny
>
> Thanks. You too.
>
> cheers,
> raf
>