bug-gnubg


From: Ingo Macherius
Subject: [Bug-gnubg] Benchmarks on server class machines and resulting change requests
Date: Sun, 2 Aug 2009 17:06:54 +0200

I have benchmarked gnubg on two server machines, with a particular focus on 
multithreading. Both machines are headless and run Debian 5.x Lenny, kernel 
2.6.26-2-amd64 #1 SMP x86_64 GNU/Linux. The hardware is:

box_A: 2xXeon 5130 @ 2GHz (4 physical cores in 2 chips)
box_B: 2xXeon Nocona @ 3GHz (2 physical cores plus 2 HT "cores" in 2 chips)

I found two issues with current gnubg (latest CVS version as of August 1st 
2009, compiled with gcc 4.3.2.1 with -march=native and sse2 support):

1) The "calibrate" command output is off by a factor of 1000, i.e. it reports 
eval/s values 1000 times too high. This also holds for the figure reported by 
the official Debian binary installed via apt-get.

2) The limit of 16 threads is too low: I found that 8 threads per core are 
needed to utilize the CPU power to 100%. Interestingly, this holds for the 
virtual HT cores as well.

@1: Please check the timer code; the problem seems to be in timer.c. The 
#ifdef part for Windows is apparently fine, but all other platforms use a 
faulty version of the timer. I can't really suggest a solution, but here is 
some background info from Wikipedia: http://en.wikipedia.org/wiki/Rdtsc
I would be glad to help fix this one by testing on the aforementioned machines 
under 64-bit Linux.
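As background for @1, here is a minimal sketch of how an eval/s figure can be computed from a monotonic clock on Linux. It is not gnubg's timer.c: the function names now_ms and evals_per_second are hypothetical, and it assumes clock_gettime(CLOCK_MONOTONIC) rather than the RDTSC approach linked above. The point is that the rate hinges on a single unit conversion, so a timer that returns seconds where milliseconds are expected (or vice versa) shifts the result by exactly a factor of 1000.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: current time in milliseconds from a monotonic clock,
   which does not jump when the system time is set. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1000.0 + (double)ts.tv_nsec / 1.0e6;
}

/* Hypothetical helper: evaluations per second from a count and an elapsed
   time in milliseconds. The ms -> s conversion below is the one step where a
   unit mix-up changes the result by a factor of 1000: if the elapsed time
   were really in seconds but still divided by 1000 here, the reported rate
   would be 1000 times too high, matching the symptom in @1. */
static double evals_per_second(long n_evals, double elapsed_ms)
{
    double elapsed_s = elapsed_ms / 1000.0;
    return elapsed_s > 0.0 ? (double)n_evals / elapsed_s : 0.0;
}
```

On older glibc versions, such as the one shipped with Lenny, clock_gettime lives in librt, so linking with -lrt may be needed.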

@2: I tested with a custom gnubg binary in which the bug from @1 was fixed the 
hard way (by a hard-coded division by 1000) and the thread limit was raised to 
256. While calibrate was running, I monitored CPU utilization using the mpstat 
command.

box_A peaks at about 202K eval/s with 8 threads per core (32 total); from there 
on the number stays flat until it starts decreasing again when you use hundreds 
of threads. Between 1 and 3 threads I see the expected gain of almost 100% per 
thread added. Using 4 threads lowers the throughput compared to 3 threads. 
Between 5 and 32 threads the throughput rises, first linearly and then 
asymptotically as we get closer to 32 threads. Below 32 threads, mpstat 
reports significant idle time on each CPU; at 32 threads each CPU is using 
100% of its cycles.
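To compare this scaling curve with a workload that is known to be purely CPU-bound, a small pthreads harness along the following lines could be used. It is only a sketch: dummy_eval is a hypothetical stand-in for a position evaluation, not gnubg code, and the harness simply splits a fixed number of evaluations across N threads and reports evaluations per second. A purely CPU-bound load like this one would normally stop scaling at about one thread per logical CPU, so if gnubg keeps gaining up to 8 threads per core, that may hint that its threads spend part of their time waiting rather than computing.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical synthetic workload standing in for one position evaluation.
   Reading and writing the volatile sink keeps the compiler from folding it away. */
static void dummy_eval(volatile double *sink)
{
    double x = *sink;
    for (int i = 0; i < 1000; i++)
        x = x * 0.999 + (double)i;
    *sink = x;
}

typedef struct {
    long n_evals;          /* evaluations this worker performs */
    volatile double sink;  /* per-thread result, avoids sharing a cache line value */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *a = p;
    for (long i = 0; i < a->n_evals; i++)
        dummy_eval(&a->sink);
    return NULL;
}

static double monotonic_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1.0e9;
}

/* Runs total_evals evaluations split over n_threads and returns evals/s,
   or -1.0 if allocation or thread creation failed. */
static double bench_threads(int n_threads, long total_evals)
{
    pthread_t *tids = malloc(sizeof *tids * (size_t)n_threads);
    worker_arg *args = calloc((size_t)n_threads, sizeof *args);
    if (!tids || !args) {
        free(tids);
        free(args);
        return -1.0;
    }

    double t0 = monotonic_s();
    int started = 0;
    for (int i = 0; i < n_threads; i++) {
        /* spread the remainder so the total is exactly total_evals */
        args[i].n_evals = total_evals / n_threads
                        + (i < total_evals % n_threads ? 1 : 0);
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    double elapsed = monotonic_s() - t0;

    free(tids);
    free(args);
    if (started != n_threads || elapsed <= 0.0)
        return -1.0;
    return (double)total_evals / elapsed;
}
```

Build with gcc -O2 -pthread and call bench_threads(n, total) for n = 1, 2, 4, 8, ... while watching mpstat, as in the measurements above.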

A very similar behavior is visible on box_B, despite the fact that 2 of its 
"cores" are virtual HT cores.

Extrapolating the results suggests gnubg should raise the maximum number of 
threads to 64, maybe even 128 or 256. Rationale: recent server hardware with 
dual quad-cores has 8 cores, which by this measure could only be fully 
utilized with 64 threads (8 cores x 8 threads per core). The suggested 128 
anticipates future improvements. As there seems to be little to no cost to a 
higher maximum thread count, this looks like a cheap way to speed up gnubg on 
server class machines and quad-cores.
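The extrapolation above is simple arithmetic: 8 cores x 8 threads per core = 64 threads. A minimal sketch of deriving a default thread count from the number of online CPUs and clamping it to a compile-time cap follows. SKETCH_MAX_THREADS and SKETCH_THREADS_PER_CPU are hypothetical names, not gnubg's actual identifiers, and the factor of 8 is only the empirical figure from the two machines above. Note that sysconf(_SC_NPROCESSORS_ONLN) counts logical CPUs, so HT siblings are included.

```c
#include <unistd.h>

/* Hypothetical compile-time cap, standing in for gnubg's thread limit
   (16 today; the proposal above is 64, 128 or 256). */
#define SKETCH_MAX_THREADS 128

/* Hypothetical heuristic from the measurements above: ~8 threads per logical CPU. */
#define SKETCH_THREADS_PER_CPU 8

static int suggested_threads(long n_cpus)
{
    if (n_cpus < 1)
        n_cpus = 1;  /* sysconf() returns -1 on failure */
    long n = n_cpus * SKETCH_THREADS_PER_CPU;
    return n > SKETCH_MAX_THREADS ? SKETCH_MAX_THREADS : (int)n;
}

static int suggested_threads_here(void)
{
    /* _SC_NPROCESSORS_ONLN is a glibc/BSD extension, not strict POSIX. */
    return suggested_threads(sysconf(_SC_NPROCESSORS_ONLN));
}
```

With this sketch, an 8-CPU machine gets 64 threads, while a 32-CPU machine would be clamped to 128.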

Cheers,
Ingo




