findutils-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] find: doc: Fix -prune SCM example directories and make it mo


From: Bernhard Voelker
Subject: Re: [PATCH] find: doc: Fix -prune SCM example directories and make it more efficient
Date: Tue, 22 Nov 2022 23:12:00 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2

Hi raf,

thanks for the patch.
Besides busy day job times here, I was actually waiting for the FSF legal
to add your entry into the 'copyright.list' file on the GNU fencepost server.
It's not yet in place though, so no need to hurry.

On 11/18/22 23:48, raf wrote:
* find/find.1 - Fix -prune SCM example directories and make it more efficient
* doc/find.texi - Make -prune SCM example more efficient (two ways)
* NEWS - Mention the above
---
  NEWS          |  3 +++
  doc/find.texi | 25 +++++++++++++++++++++----
  find/find.1   | 13 ++++++++++++-
  3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index 1fff34f8..7c6fcfb3 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,9 @@ GNU findutils NEWS - User visible changes.      -*- outline 
-*- (allout)
    When generating the Texinfo manual, `makeinfo` is invoked with the 
--no-split
    option for all output formats now; this avoids files like find.info-[12].
+ The find.1 manual's -prune SCM example directories/output have been fixed, and
+  the example itself has been made more efficient (find.1 and find.texi) 
[#62259]
+  Also fixed a typo in find.texi.
* Noteworthy changes in release 4.9.0 (2022-02-22) [stable] diff --git a/doc/find.texi b/doc/find.texi
index 379fe646..1a36e243 100644
--- a/doc/find.texi
+++ b/doc/find.texi
@@ -5138,13 +5138,15 @@ already found.
@smallexample
  find repo/ \
--exec test -d @{@}/.svn \; -or \
--exec test -d @{@}/.git \; -or \
--exec test -d @{@}/CVS \; -print -prune
+-type d \
+\( -exec test -d @{@}/.svn \; \
+-or -exec test -d @{@}/.git \; \
+-or -exec test -d @{@}/CVS \; \
+\) -print -prune
  @end smallexample

Indeed, the bug omitting the \( ... \) has been introduced in 2008:

  $ git diff v4.5.6b-28-gaca33f85^..v4.5.6b-28-gaca33f85 -- doc/find.texi | 
tail -n12
  @@ -4683,7 +4683,10 @@ searching subdirectories inside projects whose SCM 
directory we
   already found.

   @smallexample
  -find repo/ -exec test -d @{@}/.svn -o -d @{@}/.git -o -d @{@}/CVS \; -print 
-prune
  +find repo/ \
  +-exec test -d @{@}/.svn \; -or \
  +-exec test -d @{@}/.git \; -or \
  +-exec test -d @{@}/CVS \; -print -prune
   @end smallexample

   In this example, @command{test} is used to tell if we are currently

For find.1, this was already fixed in commit v4.6.0-55-g47d8fd38, but missed
to check the analog place in the texi file.

Your patch above aims at optimizing things by letting 'find -type d' pre-filter
directories in order to avoid -exec invocations on regular and other 
non-directory
files.
The '-type d' test is different from 'test -d ...' because the latter 
transparently
follows symbolic links while the former strictly checks on the type of the 
entry.
Therefore, the patch changes the result in the case the repo workspace is a 
symlink:
the new version would skip it.  Well, adding -L would help.

  In this example, @command{test} is used to tell if we are currently
-examining a directory which appears to the a project's root directory
+examining a directory which appears to be a project's root directory

good catch!

  (because it has an SCM subdirectory).  When we find a project root,
  there is no need to search inside it, and @code{-prune} makes sure
  that we descend no further.
@@ -5153,6 +5155,21 @@ For large, complex trees like the Linux kernel, this 
will prevent
  searching a large portion of the structure, saving a good deal of
  time.
+The @samp{-type d} clause causes the three @samp{test} shell
+processes to only be executed for directories. This can be made even
+more efficient by combining the three @samp{test} shell processes
+into a single process:
+
+@smallexample
+find repo/ \
+-type d \
+-exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d "$1"/CVS' . {} 
\; \
+-print -prune
+@end smallexample
+
+Note that the @samp{.} argument is just a placeholder for the unused
+@samp{$0} environment variable in the @samp{sh -c} command. The
+@samp{@{@}} argument is the @samp{$1} environment variable.
_______________________________^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

s/environment/shell/, and maybe s/variable/parameter/

  @node Security Considerations
  @chapter Security Considerations

I'm still unsure about efficiency here.  Summary:

1) Originally it was:

  $ find repo/ -exec test -d '{}/.svn' -o -d '{}/.git' -o -d '{}/CVS' \; -print 
-prune

-> 1 process, doing the OR-ing itself.  Nice and neat, and efficient, but maybe 
a bit
confusing for the novice reader: it may not be obvious that test(1) is doing 
the OR-ing,
because -o is also a find(1) operator.
Besides readability, the test -o operator is discouraged.

  $ env '[' --help
  ...
  NOTE: Binary -a and -o are inherently ambiguous.  Use 'test EXPR1 && test
  EXPR2' or 'test EXPR1 || test EXPR2' instead.
  ...

2) Now there's the 3x -exec, with fixed -or grouping (with or without '-type 
d'):

  $ find repo/ -type d \
      '(' -exec test -d '{}/.svn' \; -or \
          -exec test -d '{}/.git' \; -or \
          -exec test -d '{}/CVS' \; \
      ')' -print -prune

-> less efficient than 1) because it launches 3x test(1) via -exec.
OTOH good to read.

3. Then there's 1x -exec to launch a shell process to do the OR-ing:

  $ find repo/ \
      -type d \
      -exec sh -c 'test -d "$1"/.svn || test -d "$1"/.git || test -d "$1"/CVS' 
. {} \; \
      -print -prune

While there's only 1x -exec per directory, I'm wondering if launching one
shell process with 3x test (which is usually the shell builtin) is really
more efficient than 3x launching the executable found by `which test`.

We don't have a measurement, and the results may vary on different systems,
but I could imagine the launching a shell is more expensive than launching
3x /usr/bin/test.

Anyhow, I like the discussion, and I believe it's good to have several 
alternatives
documented with their individual pros vs. cons considerations.
WDYT?

diff --git a/find/find.1 b/find/find.1
index 429aa2f0..4faaa23b 100644
--- a/find/find.1
+++ b/find/find.1
@@ -2494,6 +2494,7 @@ projects' roots:
  .in +4m
  .B $ find repo/ \e
  .in +4m
+.B \-type d \e
  .B \e( \-exec test \-d \(aq{}/.svn\(aq \e; \e
  .B \-or \-exec test \-d \(aq{}/.git\(aq \e; \e
  .B \-or \-exec test \-d \(aq{}/CVS\(aq \e; \e
@@ -2502,7 +2503,7 @@ projects' roots:
  .in -4m
  \&
  .fi
-Sample output:
+Sample directories:
  .nf
  \&
  .in +4m
@@ -2513,6 +2514,16 @@ Sample output:
  .B repo/project4/.git
  .in
  \&
+Sample output:
+.nf
+\&
+.in +4m
+.B repo/project1
+.B repo/gnu/project2
+.B repo/gnu/project3
+.B repo/project4
+.in
+\&
  .fi
  In this example,
  .B \-prune

+1 on this.

Have a nice day,
Berny



reply via email to

[Prev in Thread] Current Thread [Next in Thread]