bug-coreutils

bug#8200: cp -lr uses a lot of CPU time.


From: Jim Meyering
Subject: bug#8200: cp -lr uses a lot of CPU time.
Date: Tue, 08 Mar 2011 16:05:04 +0100

Rogier Wolff wrote:
> In my backup scripts I need a "cp -lr" every day. I make backups of
> directories that hold up to millions of files.
>
> When
>
>       cp -lr sourc dest
>
> runs for a while, it becomes CPU limited. Virtual memory is only about
> 2Mb. "resident" is under 1M.

Thank you for the bug report.
That sounds like there is a serious problem, somewhere.
If you give us enough information, we'll find the cause.

For starters, what version of cp did you use?
Run cp --version

> Top reports:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 26721 root      20   0  2456  720  468 R 58.0  0.1  65:32.60 cp
>  2855 root      20   0  2560  936  624 R 40.8  0.1  30:30.52 cp
>
> and I doubt they are half way through.
>
> I wrote an application I call "cplr" which does the obvious in the
> obvious manner, and it doesn't have this problem.
>
> I've run "strace", and determined that it is not doing system calls
> that take much CPU time. Most system calls return in microseconds.

Please give us a sample listing of the syscalls that strace
shows you when you trace one of those long-running cp commands.
A few hundred lines worth would be good.
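
In case it helps, here is a rough sketch of one way to collect such a
sample. The function name sample_syscalls, the 10-second default and the
output file name are made up for illustration. The strace options are
the usual ones: -tt adds microsecond timestamps, -T shows the time spent
inside each call, -p attaches to a running PID and -o writes to a file.
Comparing consecutive timestamps against the -T values shows how much
time passes between calls.

```shell
# Hypothetical helper (not part of coreutils or strace):
# trace a running process for a few seconds and print the first lines.
# Usage: sample_syscalls PID [SECONDS] [OUTFILE]
# Attaching to a process owned by root requires running this as root.
sample_syscalls() {
    pid=$1
    secs=${2:-10}
    out=${3:-cp.strace}
    # timeout(1) stops strace after $secs seconds; it then exits with
    # status 124, which is expected here, hence the "|| true".
    timeout "$secs" strace -tt -T -p "$pid" -o "$out" || true
    # A few hundred lines are enough for a report.
    head -n 300 "$out"
}
```

For example, "sample_syscalls 26721 10 cp.strace" would trace PID 26721
for about ten seconds.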

> The time spent /between/ system calls runs up into the hundreds of
> milliseconds. You might say: well that's way less than a
> second. Sure. But if you need to do that tens of thousands of times it
> becomes quite significant....
>
>
> So my question is: Why does cp -lr take such ridiculous amounts of CPU
> time?
>
> Or another way: BUG REPORT: cp -lr takes unnecessary amounts of CPU
> time.

What type of file system are you using, and is it nearly full?
Run this from, e.g., the source directory:  df -hT .

Ideally, you'd attach to one of those processes with gdb and step through
the code enough to tell us where it's spending its time, presumably in
coreutils-8.10/src/copy.c.  Just running "gdb -p 26721" (where 26721
is the PID of one of your running cp processes) and typing "backtrace"
at the prompt may give us a good clue.
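
If stepping through interactively is inconvenient, a few backtraces can
also be taken non-interactively. The following is only a sketch: the
function name capture_backtraces and its defaults are invented here, and
taking several samples a second apart is my own suggestion, on the
assumption that frames showing up in most samples point at the hot spot.
The gdb options used (-p, -batch, -ex) are standard.

```shell
# Hypothetical helper: print COUNT backtraces of a running process.
# Usage: capture_backtraces PID [COUNT] [INTERVAL_SECONDS]
capture_backtraces() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        # -batch makes gdb run the -ex command, detach and exit,
        # so the traced process keeps running afterwards.
        gdb -p "$pid" -batch -ex backtrace
        sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, "capture_backtraces 26721 > cp.backtraces" would record
five samples of PID 26721.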

Next best would be access to your system or a copy of your hierarchy,
but we don't even ask for that, because it's rarely feasible.
Failing that, you could give us the output of these two commands:
[if you can do this, please respond privately, not to the list]

    find YOUR_SRC_DIR -ls | xz -e > src.find.xz
    find YOUR_DST_DIR -ls | xz -e > dst.find.xz

[if you don't have xz, install it or use bzip2 -9 instead of "xz -e";
 xz is better]

With that, we'd get an idea of hard link counts and max/average
number of entries per directory, name length, etc.

However, most people don't want to share file names like that.
If you can, please put those two compressed files somewhere like
an upload site and reply with links to them.
Otherwise, please give us some statistics describing your
two hierarchies by running these commands:

These give counts of files and directories for each of your source
and destination directories:
    find YOUR_SRC_DIR -type f |wc -l
    find YOUR_SRC_DIR -type d |wc -l
    find YOUR_DST_DIR -type f |wc -l
    find YOUR_DST_DIR -type d |wc -l

Print the total number of links for each of those directories:
    find YOUR_SRC_DIR -type f -printf '%n\n'|awk '{s += $1} END {printf "%F\n", s}'
    find YOUR_DST_DIR -type f -printf '%n\n'|awk '{s += $1} END {printf "%F\n", s}'
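
If it is more convenient, the same counts can be gathered with one find
pass per hierarchy. This is only a sketch under a few assumptions: the
function names hierarchy_stats and dir_entry_stats are invented here,
GNU find is required for -printf, and the second function assumes that
no directory name contains a newline. It averages only over directories
that have at least one entry. Neither function prints any file names.

```shell
# Hypothetical helper: count regular files, directories and the sum of
# hard link counts of the regular files, in a single pass.
# Usage: hierarchy_stats DIR
hierarchy_stats() {
    # %y is the file type (f or d), %n the hard link count.
    find "$1" \( -type f -o -type d \) -printf '%y %n\n' |
    awk '$1 == "f" { files++; links += $2 }
         $1 == "d" { dirs++ }
         END { printf "files=%.0f dirs=%.0f links=%.0f\n", files, dirs, links }'
}

# Hypothetical helper: max and average number of entries per directory.
# Usage: dir_entry_stats DIR
dir_entry_stats() {
    # %h is the parent directory of each entry; count entries per parent.
    find "$1" -mindepth 1 -printf '%h\n' |
    awk '{ c[$0]++ }
         END {
             for (d in c) { n++; t += c[d]; if (c[d] > m) m = c[d] }
             if (n == 0) n = 1
             printf "max_entries=%.0f avg_entries=%.1f\n", m, t / n
         }'
}
```

Running both functions on YOUR_SRC_DIR and on YOUR_DST_DIR gives four
short lines that can be pasted into a reply.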

Jim




