RE: [Bug-gnubg] Benchmarks on server class machines and resulting change
From: Ingo Macherius
Subject: RE: [Bug-gnubg] Benchmarks on server class machines and resulting change requests
Date: Mon, 3 Aug 2009 23:06:02 +0200
Christian, I've conducted your suggested experiment (batch eval of saved
matches) and can confirm your answer. Calibrate is not a suitable metric for
evaluating threading behaviour in gnubg.
The batch experiment analysed five 7pt matches 4 times each, with full
cache cleaning. Times were taken with the unix "time" command. The results are
much more like what one would expect:
- Speed peaked when the number of threads equaled the number of cores
- Adding more threads than cores slowed down the evaluation (albeit only by a
tiny bit)
- The slowdown grew with the number of threads
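
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing harness in Python. It is not the script I used (I simply ran the unix "time" command), and it does not clean any cache between runs. The make_argv callback is a hypothetical helper that the caller must supply: it should return the command line that runs the batch analysis with n threads.

```python
import statistics
import subprocess
import time


def time_command(argv, repeats=4):
    """Run argv `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the program's output; only the elapsed time matters here.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def benchmark(make_argv, thread_counts, repeats=4):
    """Map each thread count to the median wall-clock time of its runs.

    make_argv is a caller-supplied (hypothetical) function that builds
    the command line for a given number of threads.
    """
    return {n: statistics.median(time_command(make_argv(n), repeats))
            for n in thread_counts}
```

The median is used so that a single slow run does not distort the figure for a thread count.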
The odd finding is that there are still some anomalies:
- Going from 1 to 2 threads more than doubles the evaluation speed
- Adding more threads has very little effect, i.e. the gain is not linear in
the number of cores
- 2, 3 and 4 threads result in speeds very close to each other, much closer
than expected
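
One way to put numbers on observations like these is to compute speedup and parallel efficiency from the wall-clock times. The following is only a sketch under the assumption that you have one time per thread count, including a 1-thread baseline; the function name scaling_table is made up for illustration.

```python
def scaling_table(times_by_threads):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread time.

    times_by_threads maps a thread count to a wall-clock time in seconds
    and must contain an entry for 1 thread (the baseline).
    """
    base = times_by_threads[1]
    table = {}
    for n, t in sorted(times_by_threads.items()):
        speedup = base / t           # how many times faster than 1 thread
        table[n] = (speedup, speedup / n)  # efficiency 1.0 = linear scaling
    return table
```

An efficiency above 1.0 at 2 threads means the speed more than doubled, and an efficiency that drops quickly for 3 and 4 threads means the extra threads add little.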
I've attached a ZIP which contains the original OpenOffice 3.1 spreadsheet and
a PDF version of the graphs with the experiment details.
Thx a lot for your guidance!
Ingo
> -----Original Message-----
> From: Christian Anthon [mailto:address@hidden]
> Sent: Monday, August 03, 2009 12:29 PM
> To: Ingo Macherius
> Cc: address@hidden
> Subject: Re: [Bug-gnubg] Benchmarks on server class machines
> and resulting change requests
>
>
> The calibrate function sucks big time. The threaded calibrate
> function sucks even more. I'm tempted to call it useless. I
> believe that you are observing the following: There is some
> overhead involved in displaying and updating the calibration,
> and as you are increasing the number of threads more and more
> time is allocated to evaluation and less and less to
> overhead. If you really want to test the speed of the
> threading then you should analyse a match or perform a rollout.
>
> The original calibration was meant to calibrate certain
> timing functions against the speed of your computer, so
> overhead didn't really matter. That is, the function measures
> the speed of your computer, not the speed of gnubg.
>
> Christian.
>
> On Sun, Aug 2, 2009 at 5:06 PM, Ingo
> Macherius<address@hidden> wrote:
> > I have benchmarked gnubg on two server machines, with particular focus
> > on multithreading. Both machines are headless and run Debian 5.x
> > Lenny, Kernel 2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. The hardware is:
> >
> > box_A: 2xXeon 5130 @ 2GHz (4 physical cores in 2 chips)
> > box_B: 2xXeon Nocona @ 3GHz (2 physical cores plus 2 HT "cores" in 2
> > chips)
> >
> > I found two issues with current gnubg (latest CVS version as of August
> > 1st 2009, compiled with gcc 4.3.2.1 with -march=native and sse2
> > support):
> >
> > 1) The "calibrate" command output is off by a factor of 1000, i.e. it
> > reports eval/s values 1000 times too high. This holds for the figure
> > reported in the official Debian binary installed via apt-get.
> >
> > 2) The limit of 16 threads is too low. I found that to utilize the CPU
> > power to 100%, 8 threads per core are needed. Interestingly, this holds
> > for the virtual HT cores as well.
> >
> > @1: Please check the timer code, the problem seems to be in timer.c.
> > Obviously the #ifdef part for Windows is fine, but all other machines
> > use a faulty version of the timer. I can't really suggest a solution,
> > but here is some background info from wikipedia:
> > http://en.wikipedia.org/wiki/Rdtsc
> > I would help to fix this one by testing on the aforementioned machines
> > under 64 bit Linux.
> >
> > @2: I've tested with a custom gnubg binary with the bug at @1 fixed
> > the hard way, by a hardcoded division by 1000, and the thread limit
> > raised to 256. While calibrate was running I monitored CPU utilization
> > using the mpstat command.
> >
> > box_A peaks at about 202K eval/s with 8 threads per core (32 total),
> > from where on the number is static until it starts decreasing again
> > when you use hundreds of threads. Between 1 and 3 threads I see the
> > expected gain of almost 100% per thread added. Using 4 threads
> > lowers the throughput compared to 3 threads. Between 5 and 32
> > threads I see rising throughput which is linear at first and becomes
> > asymptotic as we get closer to 32 threads. Below 32 threads, mpstat
> > reports significant idle times for each CPU; at 32 I see each is using
> > 100% of the cycles.
> >
> > A very similar behavior is visible on box_B, despite the fact that 2 of
> > its "cores" are virtual HT cores.
> >
> > Extrapolating the results suggests gnubg should increase the limit for
> > the maximum number of threads to 64, maybe even 128 or 256. Rationale:
> > recent server hardware with dual quadcores has 8 cores, which should
> > be fully utilizable only with 64 threads. The suggested 128
> > anticipates future improvements. As there seems to be little to no
> > cost to higher values for max. threads, this seems like a cheap way
> > to speed up gnubg on server class machines and quad cores.
> >
> > Cheers,
> > Ingo
> >
> >
> >
> >
>
gnubg_calibrate_vs_time.zip
Description: Binary data