bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#29921: O(n^2) performance of rm -r


From: Niklas Hambüchen
Subject: bug#29921: O(n^2) performance of rm -r
Date: Sun, 31 Dec 2017 22:07:31 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Thunderbird/58.0

Hello,

in commit

  rm -r: avoid O(n^2) performance for a directory with very many entries
  http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a

it says that `rm -r`

  "now displays linear performance, even when operating on million-entry
directories on ext3 and ext4"

I found this not to be the case on my systems running ext4 on Linux 4.9
on a WD 4TB spinning disk, coreutils rm 8.28.
As reported on https://bugs.python.org/issue32453#msg309303, I got:

 nfiles     real   user     sys

 100000    0.51s  0.07s   0.43s
 200000    2.46s  0.15s   0.89s
 400000   10.78s  0.26s   2.21s
 800000   44.72s  0.58s   6.03s
1600000  180.37s  1.06s  10.70s

Each 2x increase of number of files results in 4x increased deletion
time, making for a clean O(n^2) quadratic curve.

I'm testing this with:

  set -e
  rm -rf dirtest/
  echo  100000 && (mkdir dirtest && cd dirtest && seq 1  100000 | xargs
touch) && time rm -r dirtest/
  echo  200000 && (mkdir dirtest && cd dirtest && seq 1  200000 | xargs
touch) && time rm -r dirtest/
  echo  400000 && (mkdir dirtest && cd dirtest && seq 1  400000 | xargs
touch) && time rm -r dirtest/
  echo  800000 && (mkdir dirtest && cd dirtest && seq 1  800000 | xargs
touch) && time rm -r dirtest/
  echo 1600000 && (mkdir dirtest && cd dirtest && seq 1 1600000 | xargs
touch) && time rm -r dirtest/


On another system, Ubuntu 16.04, coretuils rm 8.25, with Linux 4.10 on
ext4, I get:

 nfiles     real   user      sys

 100000    0.94s  0.06s   0.516s
 200000    2.94s  0.10s   1.060s
 400000   10.88s  0.30s   2.508s
 800000   34.60s  0.48s   4.912s
1600000  203.87s  0.99s  11.708s

Also quadratic.

Same machine on XFS:

 nfiles     real   user     sys

 100000    3.37s  0.04s   0.98s
 200000    7.20s  0.06s   2.03s
 400000   22.52s  0.16s   5.11s
 800000   53.31s  0.37s  11.46s
1600000  200.83s  0.76s  22.41s

Quadratic.

What is the complexity of `rm -r` supposed to be?
Was it really linear in the past as the patch describes, and this is a
regression?
Can we make it work linearly again?

Thanks!
Niklas





reply via email to

[Prev in Thread] Current Thread [Next in Thread]