POV-ray on Quad Xeon and Opteron

POV-ray

POV-ray is an open source ray tracing package that has been around for what seems like forever. Actually, since the late 1980s! It has been a favorite system performance testing package since its inception because of the heavy load it places on the CPU. It has had an SMP parallel implementation since the mid 2000s and is often used as a multi-core CPU parallel performance benchmark on both Linux and Windows.

So let's try it on our quad-socket many-core systems!

Test Systems

  • Puget Systems Peak Quad Xeon:

    • 4 x Intel Xeon E5-4624L v2 @1.9GHz 10-core
    • 64GB DDR3 1600 Reg ECC
  • Puget Systems Peak Quad Opteron:

    • 4 x AMD Opteron 6344 @2.6GHz 12-core
    • 64GB DDR3 1600 Reg ECC

Test OSes

  • Windows Server 2008 R2
  • Linux CentOS 6.5

Windows install

Installing on Windows is straightforward. Just download the install setup file from http://www.povray.org/download/ and run it. It will install a graphical interface that includes controls for setting the number of system CPU cores to use and has a menu item for running the standard benchmark.

Linux install

I could not find a repository with a povray rpm for CentOS so I built it from source. The following is the basic outline for the build (with # comments …)

# use git to grab the source tree off of github
git clone https://github.com/POV-Ray/povray.git

# go to the unix directory and run the pre configure script
cd povray/unix
./prebuild.sh

# cd back to the main directory and run configure to set the 
# build environment and create the makefile
cd ../
# it complains if you don't add a "COMPILED_BY"
./configure COMPILED_BY="dbk pugetsystems.com"

# Hey we have 40 cores we may as well try to use half of them
# for the build! ( -j 20 )
make -j 20

# su to root and do a make install
su 
make install

# done!

Results

One of the most interesting things about testing with the many-core quads is seeing how multi-threaded applications scale with thread count. We can fit that data to an Amdahl's Law curve.

Amdahl’s Law

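Amdahl’s Law basically says that the speedup of a parallel code is limited by its sequential fraction. The following formula gives us a curve to fit our scaling and performance data to. In the formula, n is the number of threads, T(n) is the run time on n threads, S(n) is the speedup over a single thread, and P is the fraction of the work that runs in parallel.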

S(n) = T(1)/T(n) = 1/( ( 1-P ) + P/n )
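To make the formula concrete, here is a minimal Python sketch of evaluating the curve and estimating P from scaling data. The function names and the sample data are illustrative assumptions: the speedups are synthetic values generated from an assumed P = 0.99, not measurements from these tests. The fit is a simple linearized least squares (rearranging the formula to 1 - 1/S = P(1 - 1/n)), which may differ from whatever curve-fitting tool was used for the plots.

```python
def amdahl_speedup(p, n):
    """Speedup S(n) = 1 / ((1 - p) + p / n) for parallel fraction p on n threads."""
    return 1.0 / ((1.0 - p) + p / n)


def fit_parallel_fraction(threads, speedups):
    """Least-squares estimate of p from measured (n, S(n)) pairs.

    Rearranging Amdahl's Law gives 1 - 1/S = p * (1 - 1/n), a line through
    the origin with slope p, so p = sum(x*y) / sum(x*x).
    At least one thread count must be greater than 1.
    """
    xs = [1.0 - 1.0 / n for n in threads]
    ys = [1.0 - 1.0 / s for s in speedups]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic data only: speedups generated from an assumed p = 0.99.
threads = [2, 4, 8, 16, 32, 40]
speedups = [amdahl_speedup(0.99, n) for n in threads]

p_est = fit_parallel_fraction(threads, speedups)
print(f"estimated P = {p_est:.4f}")
print(f"predicted speedup on 40 threads = {amdahl_speedup(p_est, 40):.1f}")
print(f"upper bound as n grows = {1.0 / (1.0 - p_est):.0f}")
```

With real data you would replace `speedups` with T(1)/T(n) computed from your own measured run times. Note how even a 1% sequential fraction caps the speedup well below the thread count.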

Notes: The most glaring result on this plot is how poorly Windows scales with Hyper-Threading on! With HT off, Windows scales much better. On Linux, HT made no difference (the Linux data points shown are with HT on), and the scaling is somewhat better than on Windows.

Notes: On the Opteron system we see both Linux and Windows scale about the same, with Windows doing slightly better.

It’s common to see PPS (pixels per second) numbers reported for POV-ray as a performance measure so we show that in the next two plots.

Notes: On the performance plots Linux clearly does much better than Windows, and again Hyper-Threading clobbers the Windows performance (it made no difference on Linux).

Notes: The Opteron also shows a significant advantage going to Linux.
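For reference, PPS is simply the number of rendered pixels divided by the render time. A minimal sketch follows; the function name, the 512x512 image size, and the 60 second render time are all hypothetical values chosen for illustration, since this post does not state the render resolution. The output is not meant to match the plotted numbers.

```python
def pixels_per_second(width, height, seconds):
    """Return the average number of pixels rendered per second for one image."""
    if seconds <= 0:
        raise ValueError("render time must be positive")
    return (width * height) / seconds


# Hypothetical example: a 512x512 image rendered in 60 seconds.
pps = pixels_per_second(512, 512, 60)
print(f"{pps:.0f} PPS")
```

Because the pixel count is fixed for a given scene, PPS is just the inverse of run time scaled by a constant, so the PPS plots rank the systems the same way the run times do.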

Hyper-Threading

The effect of hyper-threading turned out to be one of the most interesting aspects of this testing. (I did a separate post about that: "Hyper-Threading may be Killing your Parallel Performance".)

Linux was indifferent to hyper-threading up to the 40 physical cores, i.e. the results were the same with hyper-threading on or off. However, there was a nearly 20% improvement over the 40-thread result by going to 80 threads with hyper-threading on.

For Windows, hyper-threading was just bad! (as is clearly seen in the scaling and performance plots)

The following table shows hyper-threading effects for 40, 60, and 80 threads on the quad Xeon test system (40 physical cores).

Selected data points showing effect of Hyper-Threading

 
Threads   Linux HT off   Linux HT on   Windows HT off   Windows HT on
40        40 sec         40 sec        53 sec           79 sec
60        39 sec         35 sec        50 sec           78 sec
80        39 sec         32 sec        50 sec           78 sec
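The table can be reduced to percent changes with a few lines of Python. This is just a sketch of the arithmetic using the run times above; the variable and function names are my own for illustration.

```python
# Run times in seconds from the table above: threads -> (HT off, HT on).
linux = {40: (40, 40), 60: (39, 35), 80: (39, 32)}
windows = {40: (53, 79), 60: (50, 78), 80: (50, 78)}


def percent_change(ht_off, ht_on):
    """Percent change in run time when Hyper-Threading is switched on.

    Negative means the run got faster; positive means it got slower.
    """
    return 100.0 * (ht_on - ht_off) / ht_off


for name, data in (("Linux", linux), ("Windows", windows)):
    for threads, (ht_off, ht_on) in sorted(data.items()):
        change = percent_change(ht_off, ht_on)
        print(f"{name:8s}{threads:3d} threads: {change:+6.1f}% run time with HT on")

# Linux with HT on, 80 threads versus 40 threads (the "nearly 20%" in the text).
gain = 100.0 * (linux[40][1] - linux[80][1]) / linux[40][1]
print(f"Linux HT on, 40 -> 80 threads: {gain:.0f}% less run time")
```

Since the table times are rounded to whole seconds, treat the resulting percentages as approximate.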

Caveats!

  • I expected run times for Linux and Windows to be nearly the same. Some of the difference ( with hyper-threading off ) may be from compiling the Linux version from source. However, I just did a standard build without any optimization tweaks to “configure”.
  • The Windows version ( Server 2008 R2 ) is a bit dated and something like Server 2012 R2 may give different/better(?) results.
  • You have to test your demanding application with hyper-threading on and off! In the past I would by default turn hyper-threading off on server/HPC type of hardware and leave it on for “desktop” hardware. From now on I’m actually going to test!

Happy computing! –dbk