bug-gnubg
RE: [Bug-gnubg] Benchmarks on server class machines and resulting change


From: Ingo Macherius
Subject: RE: [Bug-gnubg] Benchmarks on server class machines and resulting change requests
Date: Mon, 3 Aug 2009 23:06:02 +0200

Christian, I've conducted your suggested experiment (batch eval of saved 
matches) and can confirm your answer. Calibrate is not a suitable metric to 
evaluate threading behaviour for gnubg.

The batch experiment analyzed five 7pt matches 4 times each, with the cache 
fully cleared. Time was measured with the Unix "time" command. The results 
are much closer to what one would expect:
- Speed peaked when the number of threads equaled the number of cores
- Adding more threads than cores slowed down the evaluation (albeit by only a 
tiny bit)
- The slowdown grew with the number of threads
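
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a timing harness. It is not the script used for the numbers in the attachment. The function names and the placeholder command are assumptions made for illustration; the actual gnubg batch invocation and the cache clearing step are deliberately left out and have to be supplied by whoever runs it.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=4):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def benchmark(make_argv, thread_counts, repeats=4):
    """Map each thread count to the mean wall-clock time of its runs."""
    results = {}
    for n in thread_counts:
        runs = time_command(make_argv(n), repeats)
        results[n] = sum(runs) / len(runs)
    return results


if __name__ == "__main__":
    # Placeholder command (assumption): starts a Python interpreter that
    # does nothing. Replace it with the real batch-analysis invocation
    # configured for n threads.
    def placeholder(n):
        return [sys.executable, "-c", "pass"]

    for n, mean in benchmark(placeholder, [1, 2, 4]).items():
        print(f"{n} threads: {mean:.3f} s")
```

Wall-clock time is the right quantity here, since CPU time summed over threads would hide exactly the effect being measured.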

The odd finding is that there are still some anomalies:
- Going from 1 to 2 threads more than doubles the evaluation speed
- Adding more threads after that has very little effect, i.e. the gain is 
not linear in the number of cores
- 2, 3 and 4 threads result in speeds very close to each other, much closer 
than expected
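
The anomalies listed above are easier to spot when the timings are turned into speedup and parallel efficiency. The sketch below shows one way to do that. The timings in the example are hypothetical values chosen to mimic the pattern described, not the measurements from the attached spreadsheet, and speedup_table is a name made up for this example.

```python
def speedup_table(times_by_threads, cores):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread time."""
    baseline = times_by_threads[1]
    rows = []
    for n in sorted(times_by_threads):
        speedup = baseline / times_by_threads[n]
        # Efficiency compares against the ideal speedup, which is capped
        # by the number of physical cores.
        efficiency = speedup / min(n, cores)
        rows.append((n, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical wall-clock seconds, NOT the measurements from the attachment.
    example = {1: 400.0, 2: 180.0, 3: 170.0, 4: 165.0, 8: 168.0}
    for n, s, e in speedup_table(example, cores=4):
        print(f"{n:2d} threads: speedup {s:.2f}x, efficiency {e:.2f}")
```

An efficiency above 1.0 marks a superlinear step such as the jump from 1 to 2 threads, and a steadily falling efficiency marks the plateau between 2 and 4 threads.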

I've attached a ZIP which contains the original OpenOffice 3.1 spreadsheet and 
a PDF version of the graphs with the experiment details.

Thx a lot for your guidance!

Ingo

> -----Original Message-----
> From: Christian Anthon [mailto:address@hidden 
> Sent: Monday, August 03, 2009 12:29 PM
> To: Ingo Macherius
> Cc: address@hidden
> Subject: Re: [Bug-gnubg] Benchmarks on server class machines 
> and resulting change requests
> 
> 
> The calibrate function sucks big time. The threaded calibrate 
> function sucks even more. I'm tempted to call it useless. I 
> believe that you are observing the following: There is some 
> overhead involved in displaying and updating the calibration, 
> and as you are increasing the number of threads more and more 
> time is allocated to evaluation and less and less to 
> overhead. If you really want to test the speed of the 
> threading then you should analyse a match or perform a rollout.
> 
> The original calibration was meant to calibrate certain 
> timing functions against the speed of your computer, so 
> overhead didn't really matter. That is, the function measures 
> the speed of your computer, not the speed of gnubg.
> 
> Christian.
> 
> On Sun, Aug 2, 2009 at 5:06 PM, Ingo Macherius<address@hidden> wrote:
> > I have benchmarked gnubg on two server machines, with particular focus
> > on multithreading. Both Machines are headless and run Debian 5.x
> > Lenny, Kernel 2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. The hardware is:
> >
> > box_A: 2xXeon 5130 @ 2GHz (4 physical cores in 2 chips)
> > box_B: 2xXeon Nocona @ 3GHz (2 physical cores plus 2 HT "cores" in 2
> > chips)
> >
> > I found two issues with current gnubg (latest CVS version as of August
> > 1st 2009, compiled with gcc 4.3.2.1 with -march=native and sse2
> > support):
> >
> > 1) The "calibrate" command output is off by a factor of 1000, i.e. it
> > reports eval/s values 1000 times too high. This also holds for the
> > figure reported by the official Debian binary installed via apt-get.
> >
> > 2) The limit of 16 threads is too low. I found that 8 threads per core
> > are needed to utilize the CPU power to 100%. Interestingly, this holds
> > for the virtual HT cores as well.
> >
> > @1: Please check the timer code, the problem seems to be in timer.c.
> > Obviously the #ifdef part for Windows is fine, but all other machines
> > use a faulty version of the timer. I can't really suggest a solution,
> > but here is some background info from Wikipedia:
> > http://en.wikipedia.org/wiki/Rdtsc
> > I would help to fix this one by testing on the aforementioned machines
> > under 64 bit Linux.
> >
> > @2: I've tested with a custom gnubg binary with the bug at @1 fixed
> > the hard way, by hardcoding a division by 1000, and with the thread
> > limit raised to 256. While calibrate was running I monitored CPU
> > utilization using the mpstat command.
> >
> > box_A peaks at about 202K eval/s with 8 threads per core (32 total),
> > from where on the number is static until it starts decreasing again
> > when you use hundreds of threads. Between 1 and 3 threads I see the
> > expected gain of almost 100% per thread added. Using 4 threads
> > lowers the throughput compared to 3 threads. Between 5 and 32
> > threads I see rising throughput, which is linear at first and becomes
> > asymptotic as we get closer to 32 threads. Below 32 threads, mpstat
> > reports significant idle times for each CPU; at 32 I see each is using
> > 100% of the cycles.
> >
> > A very similar behavior is visible on box_B, despite the fact 2 of its
> > "cores" are virtual HT cores.
> >
> > Extrapolating the results suggests gnubg should increase the limit for
> > the maximum number of threads to 64, maybe even 128 or 256. Rationale:
> > recent server hardware with dual quad-cores has 8 cores, which should
> > be fully utilizable only with 64 threads. The suggested 128
> > anticipates future improvements. As there seems to be little to no
> > cost to higher values for max. threads, this seems like a cheap way
> > to speed up gnubg on server class machines and quad cores.
> >
> > Cheers,
> > Ingo
> >
> >
> >
> > _______________________________________________
> > Bug-gnubg mailing list
> > address@hidden http://lists.gnu.org/mailman/listinfo/bug-gnubg
> >
> 
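
On the factor of 1000 reported in the quoted message: a discrepancy of exactly that size is what a millisecond/microsecond mix-up in a rate calculation would produce. The sketch below only illustrates that arithmetic. It is not a diagnosis of timer.c, whose contents are not shown in this thread, and the function name and all numbers are hypothetical.

```python
def evals_per_second(evals, elapsed_ticks, ticks_per_second):
    """Convert an eval count and an elapsed tick count to evals per second."""
    return evals * ticks_per_second / elapsed_ticks


if __name__ == "__main__":
    evals = 50_000        # hypothetical number of evaluations
    elapsed_ms = 2_000    # hypothetical elapsed time: 2 s, counted in ms

    # Consistent units: millisecond ticks, 1000 ticks per second.
    right = evals_per_second(evals, elapsed_ms, ticks_per_second=1_000)
    # Mix-up (assumed for illustration): the same millisecond count
    # treated as if it were microseconds.
    wrong = evals_per_second(evals, elapsed_ms, ticks_per_second=1_000_000)
    print(right, wrong, wrong / right)
```

Whether anything like this actually happens in the non-Windows timer path would have to be checked against the source.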

Attachment: gnubg_calibrate_vs_time.zip
Description: Binary data

