
[Monotone-devel] updates to net.venge.monotone.experiment.performance


From: Eric Anderson
Subject: [Monotone-devel] updates to net.venge.monotone.experiment.performance
Date: Wed, 9 Aug 2006 22:42:37 -0700

I've synced a whole bunch more performance improvements to
net.venge.monotone.experiment.performance.  

Once I get an answer for which ones of these are good for
synchronization to mainline, I'll split them out into individual
branches referenced off of mainline. Detailed changes and performance
measurements are in the checkin messages.


Ready for mainline:

5dc45de4c282a4e93ad1384a68b38740728cd0cb: xdelta/adler32 tuning

Improvements to the xdelta code that take advantage of two facts: we
are always using a relatively small window for the adler32 hash, and
when we skip forward, we normally skip forward by a lot, so it's
faster to just recompute the rolling checksum on the new data than to
actually move the rolling checksum forward. 1.03x improvement in cpu
usage on the client, 1.12x improvement in cpu usage on the server.
This would probably give more of a performance benefit on other CPUs;
the Xeon I'm testing on has the half-latency ALU ops, so improving a
bunch of add and AND operations shows much less improvement in CPU
time than it does in instruction fetch.
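
To illustrate the recompute-versus-roll tradeoff described above, here
is a minimal sketch. It is not the xdelta code from the patch: the
WeakSum type, the function names, the plain mod-65536 sums (real
adler32 works mod 65521), and the "skip >= window" threshold are all
assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical adler32-style weak checksum over a fixed-size window.
struct WeakSum {
    std::uint32_t a = 0;   // sum of the bytes in the window (mod 65536)
    std::uint32_t b = 0;   // sum of the running values of a (mod 65536)
    std::size_t len = 0;   // window length
};

// Compute the checksum from scratch over data[pos, pos + len).
inline WeakSum compute(const std::string& data, std::size_t pos, std::size_t len) {
    assert(pos + len <= data.size());
    WeakSum s;
    s.len = len;
    for (std::size_t i = 0; i < len; ++i) {
        s.a += static_cast<unsigned char>(data[pos + i]);
        s.b += s.a;
    }
    s.a &= 0xffff;
    s.b &= 0xffff;
    return s;
}

// Slide the window one byte: drop `out` at the front, add `in` at the back.
inline void roll(WeakSum& s, unsigned char out, unsigned char in) {
    s.a = (s.a - out + in) & 0xffff;
    s.b = (s.b - static_cast<std::uint32_t>(s.len) * out + s.a) & 0xffff;
}

// Move the window starting at `pos` forward by `skip` bytes.  Rolling costs
// one step per skipped byte, recomputing costs one step per window byte, so
// for a small window and a large skip the recompute is the cheaper path.
inline WeakSum advance(const std::string& data, const WeakSum& s,
                       std::size_t pos, std::size_t skip) {
    assert(pos + skip + s.len <= data.size());
    if (skip >= s.len)
        return compute(data, pos + skip, s.len);   // assumed threshold
    WeakSum r = s;
    for (std::size_t i = 0; i < skip; ++i)
        roll(r, static_cast<unsigned char>(data[pos + i]),
                static_cast<unsigned char>(data[pos + i + r.len]));
    return r;
}
```

Both paths produce the same checksum for the new window; the only
difference is how much work is done to get there.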

-----------
22457095fd36ea02a44652d45b5e7a6788cdea06: whitespace trimming

Modify the whitespace trimming used to make cert ids so that it
appends to an existing string rather than constructing a new string
and appending the new string to the existing one. 1.01x cpu reduction
on client, 1.01x cpu reduction on server.
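
A minimal sketch of the append-in-place idea, for illustration only.
The helper name append_trimmed is hypothetical, and this version trims
leading and trailing whitespace; the actual monotone routine may strip
whitespace differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append `in` to `out` with leading and trailing
// whitespace removed, without building a temporary trimmed string first.
inline void append_trimmed(std::string& out, const std::string& in) {
    static const char ws[] = " \t\r\n";
    const std::string::size_type first = in.find_first_not_of(ws);
    if (first == std::string::npos)
        return;                          // all whitespace: nothing to append
    const std::string::size_type last = in.find_last_not_of(ws);
    assert(first <= last);
    out.append(in, first, last - first + 1);   // copy the substring directly
}
```

The saving comes from skipping the intermediate std::string (one
allocation and one copy) each time a piece is appended.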

-----------
4d389c13b3bb1235c720b8392f9574f1ddb72d13: inline verify

Move the verify function to be inline so that it disappears from
callgrind output, making it easier to find real problems.  A
statistically insignificant reduction in user time on client and
server.

-----------
27c06ef5b20ade167011e489bc2e5333eed00faf: configurable vcache size 

Allow for size of the vcache to be set by a lua hook so that people
can choose their tradeoff between memory and cpu usage.  Going from
the default size to 32MiB was worth a 1.52x cpu reduction on the
server, 32MiB to 128MiB was worth another 1.37x cpu reduction for a
cumulative 2.08x improvement from the original setting to 128MiB.
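
The cumulative figure is just the product of the two steps, since each
factor is a ratio of cpu time before to cpu time after:

```latex
\frac{t_{\text{default}}}{t_{128\,\text{MiB}}}
  = \frac{t_{\text{default}}}{t_{32\,\text{MiB}}}
    \cdot \frac{t_{32\,\text{MiB}}}{t_{128\,\text{MiB}}}
  = 1.52 \times 1.37 \approx 2.08
```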


Needs more discussion:

e1a721eb1b1bf8d64229419ac1f73bda0a855590: stop zeroing in Botan::gzip

Remove zeroing of memory used by Botan to do compression. 1.06x
reduction in client time, 1.02x in server time.  I don't think for
monotone's usage there is any security risk introduced by this change,
but it needs to be thought through a little.  I'd expect that more
improvements could be made in this manner, as Botan zeros both on free
and allocate, but my one attempt to remove more of the zeroing caused
failures.

-----------
464e510af4959231ff63352c902c689b0f1687aa: binary rosters

Patch to add in binary rosters; substantial (1.2x) speed improvement
for the client on pull, some speed improvement on annotate (only
informally tested, matters much more when annotating a file near the
end of the roster than the beginning).  A wash on the server when
serving from ascii rosters.  Based on a comparison with the timing
numbers in the next checkin, after I'd moved to binary rosters
everywhere, 1.06x speedup on the server to serve from binary rosters.
Could obsolete 4e99cc37f548b5884d63c48bc486dfe98c8d0bd2 although the
patch as written depends on it.

-----------
d6ac464bec394bf665ed8207a169c9ecdb7bbc05: fake pthread

Add in the fake-pthread hack to fake up pthread calls with no-ops so
that programs that don't really need pthreads but are forced to link
against them by a shared library dependency don't suffer.  1.20x
performance improvement.  Disabled by default in configure.  A truly
hideous hack, but a 1.2x performance improvement is substantial.

-----------
4e99cc37f548b5884d63c48bc486dfe98c8d0bd2: fast text-roster annotation

Performance tuning of annotate: create a special path for roster
parsing to make annotate go faster, and disable some of the database
cross checks; it still spends a lot of time parsing, but is overall
5-20x faster.  This patch could be obsoleted by
464e510af4959231ff63352c902c689b0f1687aa, as once binary rosters are
being used, special-casing annotation for text rosters doesn't make
any sense.  This patch must have also improved serve time, because
there was a massive drop in the cpu usage between the pull
measurements before the patch and after, but I forgot to measure it
here, so I don't know for sure that this is where it kicked in.

-----------
997a677db676734acc0d098979d2a9cee8765ec9: libcrypto ssl linking

Enable optional compilation with openssl libcrypto for the optimized
SHA1 hash.  Likely to be obsoleted by getting the fast assembly code
from libcrypto used in Botan.  Depending on how long that's expected
to take, it may be worth merging this patch now (it's disabled by
default) and letting people enable it if they want.  It was a
relatively substantial improvement at the time of the measurements,
and is probably a whole lot more now that all of the other
improvements have been applied.

        -Eric





