[Findutils-patches] Re: make find *much* faster on reiserfs


From: James Youngman
Subject: [Findutils-patches] Re: make find *much* faster on reiserfs
Date: Sat, 21 Feb 2009 18:09:31 +0000

On Sun, Feb 15, 2009 at 11:44 PM, Jim Meyering <address@hidden> wrote:
> [reposting to the right list]
>
> Following up on this post,
>
>  http://thread.gmane.org/gmane.comp.gnu.findutils.bugs/3894
>
> now that the gnulib/fts changes are in,
> here are the parts required (in addition to getting the
> latest fts.c) in order to enable the improvement.
>
> With these, all tests pass, and valgrind says they do so cleanly.
> And my daily updatedb-run find now runs in just 3-4 minutes instead
> of over 30.
>
> From 167e1bb018f4a336b57a792e3b552dfc9993459b Mon Sep 17 00:00:00 2001
> From: Jim Meyering <address@hidden>
> Date: Sun, 15 Feb 2009 13:03:52 +0100
> Subject: [PATCH 1/2] find: enable fts's FTS_CWDFD mode
>
> * find/ftsfind.c (ftsoptions): Set the FTS_CWDFD bit.
> This is required in order to take advantage of fts' leaf-optimization.
> ---
>  find/ftsfind.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/find/ftsfind.c b/find/ftsfind.c
> index b3d44f8..4a33059 100644
> --- a/find/ftsfind.c
> +++ b/find/ftsfind.c
> @@ -93,7 +93,7 @@ static void set_close_on_exec(int fd)
>  /* FTS_TIGHT_CYCLE_CHECK tries to work around Savannah bug #17877
>  * (but actually using it doesn't fix the bug).
>  */
> -static int ftsoptions = FTS_NOSTAT|FTS_TIGHT_CYCLE_CHECK;
> +static int ftsoptions = FTS_NOSTAT|FTS_TIGHT_CYCLE_CHECK|FTS_CWDFD;
>
>  static int prev_depth = INT_MIN; /* fts_level can be < 0 */
>  static int curr_fd = -1;
> --
> 1.6.2.rc0.238.g7a114
>
>
> From 7f9db96b9da3f879a5cf73d73c6e568004f939f9 Mon Sep 17 00:00:00 2001
> From: Jim Meyering <address@hidden>
> Date: Fri, 26 Dec 2008 18:28:10 +0100
> Subject: [PATCH 2/2] find: take advantage of new gnulib/fts leaf-optimization
>
> * find/ftsfind.c (consider_visiting): Allow state.type to be 0
> when fts_info is FTS_NSOK;
>
> This allows find to process an fts entry for which fts_read returns
> FTS_NSOK (no stat) but for which find requires only type info.
> This happens on file systems that lack dirent.dtype information.
> Currently, only reiserfs is handled this way.  Until the recent
> gnulib/fts change, [97d5b66] "fts: arrange not to stat non-directories
> in more cases" this change was not necessary, because fts would always
> stat non-dir entries on a file system with no dirent.dtype information.
>
> However, combined with the gnulib change, this change lets find
> avoid many per-non-directory stat-like syscalls (i.e. fstatat)
> in some very common cases, like "find . -print" on reiserfs --
> which can be a huge performance savings.
> ---
>  find/ftsfind.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/find/ftsfind.c b/find/ftsfind.c
> index 4a33059..765221b 100644
> --- a/find/ftsfind.c
> +++ b/find/ftsfind.c
> @@ -1,6 +1,6 @@
>  /* find -- search for files in a directory hierarchy (fts version)
>    Copyright (C) 1990, 91, 92, 93, 94, 2000, 2003, 2004, 2005, 2006,
> -                 2007, 2008 Free Software Foundation, Inc.
> +                 2007, 2008, 2009 Free Software Foundation, Inc.
>
>    This program is free software: you can redistribute it and/or modify
>    it under the terms of the GNU General Public License as published by
> @@ -472,7 +472,7 @@ consider_visiting(FTS *p, FTSENT *ent)
>       || ent->fts_info == FTS_NS /* e.g. symlink loop */)
>     {
>       assert (!state.have_stat);
> -      assert (state.type != 0);
> +      assert (ent->fts_info == FTS_NSOK || state.type != 0);
>       mode = state.type;
>     }
>   else
> --
> 1.6.2.rc0.238.g7a114

Applied.  Thanks.

I tried this out on 3 million files or so on ext3, but wasn't able to
measure a performance difference (unsurprising, since ext3 does populate
d_type).  However, we now seem to have closed the stat() gap with
oldfind; a pair of tests running on a directory tree of 132807 files
gives the following numbers of stat-family calls:

TRACE.ftsfind:        7197           newfstatat
TRACE.ftsfind:        7373           fstat

TRACE.oldfind:        7195           stat
TRACE.oldfind:        7198           newfstatat
TRACE.oldfind:        7204           fstat
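
For anyone wanting to repeat the comparison, here is a rough sketch of one
way to get per-syscall counts like the ones above.  The commands actually
used to produce the TRACE.* files are not shown in this message, so the
helper name count_stat_calls, the list of syscalls it matches, and the
assumption that each TRACE.* file is a plain "strace -o" log are all
illustrative only.

```sh
#!/bin/sh
# count_stat_calls LOGFILE   (hypothetical helper, not part of findutils)
# Print "LOGFILE: COUNT SYSCALL" for each stat-family call in an strace log.
count_stat_calls() {
    log=$1
    # A plain strace line looks like "newfstatat(AT_FDCWD, ...) = 0".
    # With "strace -f" each line carries a leading PID, which the
    # optional "[0-9]+ +" prefix skips.
    sed -n -E 's/^([0-9]+ +)?(stat|lstat|fstat|newfstatat|statx)\(.*/\2/p' "$log" |
        sort | uniq -c |
        while read -r count name; do
            printf '%s: %s %s\n' "$log" "$count" "$name"
        done
}

# Possible usage (assumes strace is installed and both binaries are built):
#   strace -o TRACE.ftsfind ./find/find . >/dev/null
#   strace -o TRACE.oldfind ./find/oldfind . >/dev/null
#   count_stat_calls TRACE.ftsfind
#   count_stat_calls TRACE.oldfind
```

The function only reads an existing log, so it can be tried on a saved
trace without re-running find.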

FWIW, the only remaining gap between oldfind and ftsfind is the
handling of directory exits.   The current ftsfind code doesn't know
when we're done with each directory, so it can't accumulate a
per-directory list of arguments to process with "-execdir {} +":

$ cat /tmp/show-dir-and-arg.sh
#! /bin/sh
echo "dir=$(pwd), args=$*"
$ ./find/oldfind . -execdir /tmp/show-dir-and-arg.sh {} \+ | wc -l
47
$ ./find/find . -execdir /tmp/show-dir-and-arg.sh {} \+ | wc -l
568
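
To make the expected batching concrete, the sketch below groups a list of
paths by parent directory and prints one line per directory, in the same
"dir=..., args=..." style as the script above.  It is only an illustration
of the per-directory accumulation that "-execdir {} +" needs, not of how
either find binary implements it; the function name batch_by_dir is made
up for this example, and the "./name" argument form is an assumption about
how -execdir presents each file.

```sh
#!/bin/sh
# batch_by_dir   (hypothetical helper, not part of findutils)
# Read one path per line on stdin and print one line per parent directory,
# listing every entry seen in it.  Paths containing newlines are not handled.
batch_by_dir() {
    awk '
    {
        n = split($0, parts, "/")
        base = parts[n]
        # Everything before the final "/" is the parent directory.
        dir = substr($0, 1, length($0) - length(base) - 1)
        if (dir == "")
            dir = (n > 1 && substr($0, 1, 1) == "/") ? "/" : "."
        if (!(dir in args)) {
            order[++count] = dir        # remember first-seen order
            args[dir] = "./" base
        } else {
            args[dir] = args[dir] " ./" base
        }
    }
    END {
        for (i = 1; i <= count; i++)
            printf "dir=%s, args=%s\n", order[i], args[order[i]]
    }'
}

# Possible usage: one output line per directory, however many files it holds.
#   find . | batch_by_dir | wc -l
```

Note that the grouping can only be completed once all entries of a
directory have been seen, which is exactly the "directory exit" event
that the ftsfind code currently has no notion of.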

James.



