bug-coreutils

bug#18681: cp Specific fail example


From: Linda Walsh
Subject: bug#18681: cp Specific fail example
Date: Sun, 12 Oct 2014 19:45:19 -0700
User-agent: Thunderbird



Bob Proulx wrote:
Wow.  Just to be clear an rsync copy took 75 to 90 minutes but a cp
---
        Actually in the case I used for illustration, it was 110 minutes,
but that was longer than normal.  Last night's figures:

: rsync took 87m, 34s      [which is fairly quick given the size of the diffs.]
: Empty-directory removal took 1m, 58s
: Find used space for /home.diff...sz=2.5GB, min=3.1GB, extsz=4.0MB, n-ext'=806
: Copying diffs to dated static snap...Time: 0m, 17s.

It wasn't a copy, but a diff between 2 volumes (the same volume, but one
is a ~24+hour-old snapshot started on the previous run).  So I look at
the differences between two temporal copies, then copy those to a 3rd
partition that starts out empty.  So rsync is comparing file times (it doesn't
do file reads, _by_ _default_, unless it needs to move the data, as indicated
by size and timestamps) -- it examines all file times/dates on my 'home'
partition, and compares those against a mostly-the-same active LVM
snapshot.  Out of 871G, on the long day, it found ~5G of changes --
last night was only 3G... it varies based on how much change happened to
the volume over the period... smallest size now is 600M, largest I've seen
has been about 18G.

Once the *difference* is on the 3rd volume ("home.diff"), I destroy
the active snapshot created 'yesterday', then recreate it as a dynamically
sized static -- enough to hold the diff.  Then cp is used to move
whatever "diffs" were put on the "diff" volume by rsync.  So
those diffs -- most of them are _likely_ to be in memory -- AND as
I mentioned, I didn't do a sync after the copy (it happens automatically,
but isn't included in the timing).

But if I used rsync to do that exact same copy, it would take at least 2-3
times as long... actually... hold on... I can copy it from that partition made
yesterday ... into the diff partition.. but will tar up the source
to prime the cache...

This is the volume:
df .
Filesystem                          Size  Used Avail Use% Mounted on
/dev/Data/Home-2014.10.08-03.07.05  5.5G  4.4G  1.1G  81%\
                                    /home/.snapdir/@GMT-2014.10.08-03.07.05
Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> du -sh .
4.4G  .

ok... running cp 1st, then remove, then rsync...:

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo cp -a . /home.diff/.
6.39sec 0.15usr 6.23sys (99.81% cpu)

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo rm -fr /home.diff/.
1.69sec 0.03usr 1.64sys (99.43% cpu)

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo rsync -aHAX . /home.diff/.
20.83sec 27.02usr 11.68sys (185.84% cpu)

----185% cpu!... hey! that's cheating and still 3x slower... here's 1 core:

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo rm -fr /home.diff/.
1.73sec 0.03usr 1.69sys (99.39% cpu)

Ishtar:.snapdir/@GMT-2014.10.08-03.07.05> \
        time sudo taskset -a 02 rsync -aHAX . /home.diff/.
38.52sec 25.92usr 11.90sys (98.18% cpu)
---
so limiting it to 1 cpu... 6x slower. (remember this is all
in memory buffered)


Note... rsync has been sped up slightly over the past couple of years
and 'cp' has slowed down somewhat over the same time period, so these
diffs used to be worse.


Then 'cp' is used to copy the image on 'home.diff' to the dynamically
sized
copy took less than 1 minute?  I find that very suspicious.
---
        Well, hopefully the above explanation is more clear and
highlights what we wanted to measure.



It appears that you are using features from rsync that do not exist in
cp.  Therefore the work being done in the task isn't equivalent work.
In that case it is probably quite reasonable for rsync to be slower
than cp.
----
Yup... Never would argue differently, but for what it does, rsync is
still pig slow, but when the amount of data you need to move is hundreds
of times smaller than the total, it can't be beat!



Also consider that if cp were to acquire all of the enhancements that
have been requested for cp as time has gone by then cp would be just
as featureful (bloated!) as rsync and likely just as slow as rsync
too.
----
        Nope... rsync is slow because it does everything over a
client-server model -- even when it is local.  So everything is written through
a pipe... that's why it can't come close to cp -- and why cp would never
be so slow -- I can't imagine it using a pipe to copy a file anywhere!


This is something to consider every time someone asks for a
creeping feature to cp.  Especially if they say they want the feature
in cp because it is faster than rsync.  The natural progression is
that cp would become rsync.
----
        Not even!  Note.  cp already has a comparison function
built in that it uses during "cp -u"... but it doesn't go through
pipes.  It used to use larger buffer sizes or maybe tell posix
to pre-alloc the destination space, dunno, but it used to be
faster.. I can't say for certain, but it seems to be using
smaller buffer sizes.  Another reason rsync is so slow -- uses
a relatively small i/o size, 1-4k last I looked. I've asked them
to increase it, but going through a pipe it won't help a lot.

This is from a different email on the rsync list from 7/26:

One might ask why rsync is so slow --
copying 800G from 1 partition to another via xfsdump/restore takes a bit under 2 hours,
or about 170MB/s, but rsync, on the same partition, transferring
less than 1/1000th as much (700MB [in a differential as I mentioned above]),
took ~70-80 minutes... or about 163kB/s.

Transfer speeds depend on many factors.  One of the largest is
transfer size (how much is transferred with 1 write/read).
Transferring 1GB, 1 meg at a time, took 2.08s to read and
1.56s to write (using direct io).

Transferring it in 4K chunks took 37.28s to read and 43.02s to write.
1k buffers are 4x slower than that!
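
The block-size effect described above can be tried with dd. This is only a minimal sketch, assuming a dd that accepts bs=1M (GNU dd does); the scratch directory, file names and the 8 MiB size are placeholders, not anything from the thread. As written, the copies go through the page cache, so the gap will be far smaller than the direct-io figures quoted above; adding oflag=direct (GNU dd, and only on filesystems that support it) gets closer to that measurement.

```shell
#!/bin/sh
# Sketch: copy the same data with large and small block sizes.
# File names and the 8 MiB size are illustrative, not from the thread.
work=$(mktemp -d)
src="$work/src.bin"

# Create an 8 MiB test file.
dd if=/dev/zero of="$src" bs=1M count=8 2>/dev/null

# Copy with 1 MiB blocks, then with 4 KiB blocks; dd reports the elapsed
# time and throughput on stderr when each copy finishes.
dd if="$src" of="$work/big.bin"   bs=1M
dd if="$src" of="$work/small.bin" bs=4k

# Both copies must be byte-identical to the source.
result=different
cmp "$src" "$work/big.bin" && cmp "$src" "$work/small.bin" && result=identical
echo "$result"
```

Only the number of read/write calls changes between the two runs (8 versus 2048 for this file size); the data moved is the same.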

Also in rsync, they've added the posix calls to reserve
space in the target location for a file being copied in.
Specifically, this is to lower disk fragmentation (does
cp do anything like that? It's been a while since I looked).




If rsync wasn't so slow at local I/O...*sigh*....

The advantage of rsync is that it can be interrupted and restarted and
the restarted process will efficiently avoid doing work that is
already done.  An interrupted and restarted cp will perform the same
work again from start to finish.
----
        I wouldn't trust that it would.  If you interrupt it at exactly
the wrong time, I'd be afraid some file might get set with the right
data but the wrong Meta info (acls, primarily).


If I am doing a simple copy from A to B then I use 'cp -av A B'.  If I
am doing it the second time then I will use rsync to avoid repeating
previously done work 'rsync -av A B'.
---
        Wouldn't cp -auv A B do the same?
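
A small sketch of what "cp -au" does on a second pass, assuming a scratch directory from mktemp; the directory and file names are made up for the illustration. cp -u copies a file only when the destination is missing or older than the source, so it decides on mtime alone.

```shell
#!/bin/sh
# Sketch: cp -u skips a destination file that is newer than the source.
# All names here are illustrative.
work=$(mktemp -d)
mkdir "$work/A"
echo "one" > "$work/A/file1"

# First pass: a plain archive copy creates B.
cp -a "$work/A" "$work/B"

# Edit the destination copy (its mtime becomes "now") and push the
# source's mtime back to 2001 so the source is clearly the older one.
echo "edited in B" > "$work/B/file1"
touch -t 200101010000 "$work/A/file1"

# Add a new file on the source side only.
echo "two" > "$work/A/file2"

# Second pass: -u copies file2 but leaves the newer B/file1 alone.
cp -au "$work/A/." "$work/B/"
kept=$(cat "$work/B/file1")
added=$(cat "$work/B/file2")
echo "$kept / $added"
```

So the two are close but not the same: rsync -a (without its own -u) re-copies any file whose size or mtime differs from the source, and would have overwritten the edited B/file1 here.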



If I want progress indication...  If I want placement of backup files
in a particular directory...  If I want other fancy features that are
provided by rsync then it is worth it to use rsync.

  $ du -s coreutils
  238920  coreutils
  $ find coreutils -type f | wc -l
  15013

  $ rm -rf junk/coreutils
  # echo 3 > /proc/sys/vm/drop_caches
  $ time cp -a coreutils junk/
  real    1m2.137s
  user    0m0.140s
  sys     0m1.724s

  $ rm -rf junk/coreutils
  $ time cp -a coreutils junk/
  real    0m2.492s
  user    0m0.060s
  sys     0m1.064s

  $ rm -rf junk/coreutils
  # echo 3 > /proc/sys/vm/drop_caches
  $ time rsync -a coreutils junk/
  real    1m5.473s
  user    0m1.280s
  sys     0m2.112s

  $ rm -rf junk/coreutils
  $ time rsync -a coreutils junk/
  real    0m3.215s
  user    0m1.184s
  sys     0m1.536s
---
By default cp -a transfers acls and ext-attrs and preserves
hard links.   Rsync doesn't do any of that by default;
you have to call them out as 'extra' with rsync.
You need to use "-aHAX" to compare them, so the above test
may not be what it seems.
Though if you don't use ACLs (which I do), then maybe the above
is almost reasonable.  Still.. should use -aHAX
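
The hard-link part of that difference is easy to check. A minimal sketch, assuming a scratch directory from mktemp and made-up file names; it only exercises cp, since rsync may not be installed everywhere.

```shell
#!/bin/sh
# Sketch: cp -a keeps two hard-linked names linked in the copy.
# Directory and file names are illustrative.
work=$(mktemp -d)
mkdir "$work/src"
echo "data" > "$work/src/a"
ln "$work/src/a" "$work/src/b"     # b is a second name for the same inode

cp -a "$work/src" "$work/dst"

# Compare inode numbers of the two names in the copy.
ino_a=$(ls -i "$work/dst/a" | awk '{print $1}')
ino_b=$(ls -i "$work/dst/b" | awk '{print $1}')
if [ "$ino_a" = "$ino_b" ]; then linked=yes; else linked=no; fi
echo "hard link preserved by cp -a: $linked"
```

With rsync the matching run would be "rsync -aH src/ dst/"; with plain -a, rsync writes a and b as two independent files.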

Is your rsync newer? i.e. does it have the posix-pre-alloc
hints?... Mine has a pre-alloc patch, but I think that was
suse-added and not the one in the mainline code.  Not sure.


rsync --version
rsync  version 3.1.0  protocol version 31
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc, SLP


I don't think mine does yet...

For normal use cp is a little faster than rsync.  Or rather rsync is a
little slower than cp.  But not enough to make a difference for
typical operations.  Having the file system cache warmed up makes a
*HUGE* difference.  Much larger than any other difference.  For copies
that take hours to run I am probably going to value the restart
ability more than raw speed.  YMMV.
----
        I'll value the accuracy of xfsdump/restore...

        Throw a few TB copies at rsync -- where all the data
won't fit in memory.... it also, I'm told, has problems with
hardlinks, acls and xattrs slowing it down, so it may be a
matter of usage...

        BUT all that said... note that I DO USE it... for the
job I'm doing in my snapper script, nothing else will do.

Cheers!
Linda

(don't ya just love performance talk?)




