Numerical Computing Performance of 3 Intel 8-core CPUs – i9 9900K vs i7 9800X vs Xeon 2145W



Intel makes a lot of different CPUs! There are the very expensive multi-socket Xeon Scalable (Purley) processors, 58 of them! There are low power mobile and embedded processors and, of course, the single socket “Desktop PC”, “Enthusiast” and “Workstation” processors. In this post I’ll take a brief look at the numerical computing performance of three very capable 8-core processors. These CPUs are in the “sweet-spot” with 8 cores and high core clock frequencies. All three are great CPUs but there are some significant differences that can cause confusion.

i9 9900K, i7 9800X or Xeon 2145W: which processor is best for you? The answer is, as always, it depends… Hopefully this post will help you decide which processor fits best with your dependencies.


The 8-core “sweet-spot”(?)

Why do I say 8-cores is the CPU “sweet-spot”?

  • 8-core systems in the form of dual socket 4-core CPUs from both Intel and AMD were the foundation of modern parallel computing. That was the standard scientific workstation configuration through most of the 2000s. Dual 4-core system nodes in clusters were the base of distributed parallel super-computing.
  • There are a lot of applications that will scale in parallel efficiently on 8 cores. Writing parallel code can be very difficult and scaling can fall off rapidly after 4-8 processes. There are inherently parallel applications that will scale to tens of thousands of processor cores but a typical target for a programmer is to get good scaling with 4-8 cores on a single system. That, in itself, can be a remarkable achievement!
  • Modern 8-core processors as presented here offer very good performance for the cost.
  • 8 cores allow simultaneous application and job runs, which makes for an efficient workflow and good hardware utilization.
  • A system with a good 8-core CPU makes a great platform for GPU accelerated computing!

For a very simple low cost workstation you could use a processor with fewer cores but these days I feel an 8-core is a good base-line for a compute oriented workstation.

It can certainly be advantageous to have more cores. If you have code that scales well in parallel or a heavy multi-tasking workflow an Intel X-series or Xeon-W 18-core processor offers excellent performance for a very reasonable cost. In fact the 18-core processors are so good that I generally don’t recommend dual Xeon workstations very often anymore.


Important differences between i9 9900K, i7 9800X, and 2145W Xeon

The following table lists some of the specification differences between these processors that are relevant when configuring a numerical computing workstation.

Intel 8-Core i9 9900K, i7 9800X, Xeon 2145W Features

Features | i9 9900K | i7 9800X | Xeon 2145W
Code Name | Coffee Lake | Skylake-X | Skylake-W
Base Clock | 3.6GHz | 3.8GHz | 3.7GHz
Max Turbo | 5.0GHz | 4.5GHz | 4.5GHz
All Core | 4.7GHz | 4.1GHz | 4.3GHz
Cache | 16 MB | 16.5 MB | 11 MB
TDP | 95 W | 165 W | 140 W
Max Mem | 64 GB | 128 GB | 512 GB (Reg ECC)
Mem Channels | 2 | 4 | 4
Max PCIe lanes | 16 | 44 | 48
X16 GPU support | 1 | 2 | 3 (4 w/PLX)
Vector Unit | AVX2 | AVX512 | AVX512
Price | $500 | $600 | $1113

The features that will have the biggest impact on compute performance are the core clocks and the AVX vector unit. The high clock speeds, fast memory, large cache and low power consumption of the Coffee Lake processor are very compelling. However, the Vector Unit row in the table above (AVX2 vs AVX512) can have a significant impact on numerical compute performance.
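To make the clock vs. vector unit trade-off concrete, here is a rough back-of-the-envelope sketch of theoretical peak double precision throughput. It assumes 2 FMA units per core on all three processors, counts each fused multiply-add as 2 floating point operations, and uses the "All Core" clocks from the table. That last assumption is optimistic, since CPUs typically run at lower clocks under heavy AVX load, so treat the numbers as upper bounds rather than predictions. The function and dictionary names are just for illustration.

```python
# Rough theoretical peak double precision throughput (upper bound only).
# Assumptions: 2 FMA units per core, 1 FMA = 2 FLOPs, and the CPU holds
# the "All Core" turbo clock from the table (real AVX clocks are lower).
DOUBLES_PER_VECTOR = {"AVX2": 4, "AVX512": 8}  # 256-bit and 512-bit registers
FMA_UNITS_PER_CORE = 2
FLOPS_PER_FMA = 2


def peak_gflops(cores, clock_ghz, vector_unit):
    """Return cores * clock * FLOPs-per-cycle in GFLOPS."""
    flops_per_cycle = (
        DOUBLES_PER_VECTOR[vector_unit] * FMA_UNITS_PER_CORE * FLOPS_PER_FMA
    )
    return cores * clock_ghz * flops_per_cycle


# (cores, all-core clock in GHz, vector unit) taken from the table above
cpus = {
    "i9 9900K": (8, 4.7, "AVX2"),
    "i7 9800X": (8, 4.1, "AVX512"),
    "Xeon 2145W": (8, 4.3, "AVX512"),
}

for name, spec in cpus.items():
    print(f"{name}: {peak_gflops(*spec):.1f} GFLOPS theoretical peak")
```

Even with its higher clock, the AVX2 part ends up with a much lower ceiling because each AVX512 instruction works on twice as many doubles.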

Important differences to note for a system specification are the maximum amount of memory that can be used and the number of PCIe lanes. The number of PCIe lanes is particularly important for GPU accelerated workstations since it is good (but not essential) to use X16 slots for multi-GPU configurations.
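The "X16 GPU support" row follows directly from the lane counts. As a simple sketch, assuming every GPU gets a full 16 lanes straight from the CPU and ignoring lanes needed for things like NVMe storage or a PLX switch, the arithmetic looks like this (the function name is hypothetical):

```python
# How many GPUs can get a full X16 link directly from the CPU?
# Assumption: no PCIe switch (PLX) and no lanes reserved for other devices.
LANES_PER_X16_SLOT = 16


def full_x16_gpus(cpu_lanes):
    """Integer number of X16 links that fit in the CPU's lane budget."""
    return cpu_lanes // LANES_PER_X16_SLOT


for name, lanes in [("i9 9900K", 16), ("i7 9800X", 44), ("Xeon 2145W", 48)]:
    print(f"{name}: {lanes} lanes, {full_x16_gpus(lanes)} GPU(s) at X16")
```

The 44 lanes on the 9800X leave 12 lanes over after 2 GPUs, which is why a third GPU would have to run at less than X16.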

Note: 32GB non-Reg DDR4 memory modules are becoming available so it may be possible soon to have 128GB memory in a 9900K system and 256GB in a 9800X system.


Hardware under test:

I used open test-beds with the hardware but you can try different configurations using all of these components on our general “Custom Computers” page. (We do have more application oriented pages too so feel free to explore.)

  • Intel Core i9 9900K 3.6GHz 8-Core
    • Gigabyte Z390 Designare Motherboard (1 x X16 PCIe)
    • 64 GB DDR4-2666 Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti
  • Intel Core i7 9800X 3.8GHz 8-Core
    • Gigabyte X299 Designare Motherboard (2 x X16 PCIe)
    • 128GB DDR4-2666 Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti
  • Intel Xeon 2145W 3.7GHz 8-Core
    • Asus WS C422 SAGE/10G Motherboard (4 x X16 PCIe)
    • 256GB DDR4-2666 Reg ECC Memory
    • 1 TB Intel 660p M.2 SSD
    • NVIDIA RTX 2080Ti
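All three test systems use DDR4-2666, so the difference in memory channels translates directly into theoretical memory bandwidth. The sketch below assumes the usual 8 bytes per transfer per channel and every channel populated; it gives a peak figure only, and sustained bandwidth will be lower.

```python
# Theoretical peak memory bandwidth for DDR4 (peak only, not measured).
# Assumption: 64-bit (8 byte) channels, all channels populated.
BYTES_PER_TRANSFER = 8


def peak_memory_bandwidth_gbs(channels, mega_transfers_per_s=2666):
    """Return channels * MT/s * 8 bytes, expressed in GB/s."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


for name, channels in [("i9 9900K", 2), ("i7 9800X", 4), ("Xeon 2145W", 4)]:
    print(f"{name}: {peak_memory_bandwidth_gbs(channels):.1f} GB/s peak")
```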

Software:

I had the OS and applications installed on the Intel 660p M.2 drive and swapped it between the test systems.

I am running Linux for this testing but there is no reason to expect that the same types of workloads on Windows 10 would show any significant difference in performance.


Results

Linpack

An optimized Linpack benchmark can achieve near theoretical peak performance for double precision floating point on a CPU. It is the first benchmark I run on any new CPU. It is the benchmark (still) used to rank the Top500 supercomputers in the world. I feel it is the best performance indicator for numerical computation with maximally optimized software. I even went to the trouble to build an optimized Linpack for AMD Threadripper recently. The Intel optimized Linpack makes great use of the excellent MKL library. There are many programs that link to MKL for performance. This includes the very useful “numerical compute scripting” packages Anaconda Python and Mathworks MATLAB.

linpack chart

Clearly the AVX512 vector units in the 9800X and 2145W have a significant impact on Linpack performance. This is basic numerical linear algebra, which is the core of a lot of compute intensive applications.

Note: These jobs ran with 8 “real” threads since “Hyperthreads” are not useful for this calculation.

Note: These results are with a large problem size of 75000 simultaneous equations (a 75000 x 75000 “triangular solve”) and used approximately 44GB of system memory.
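The memory figure is easy to sanity check. A dense N x N matrix of doubles needs N * N * 8 bytes, and the work for the solve is roughly (2/3) N^3 floating point operations if you ignore the lower-order terms. The sketch below does that arithmetic for N = 75000. The sustained GFLOPS value in the run time estimate is a made-up round number for illustration, not one of my measured results.

```python
# Back-of-the-envelope sizing for a dense Linpack run.
BYTES_PER_DOUBLE = 8


def matrix_bytes(n):
    """Memory for an n x n matrix of double precision values."""
    return n * n * BYTES_PER_DOUBLE


def approx_flops(n):
    """Leading-order operation count, (2/3) n^3; lower-order terms ignored."""
    return (2.0 / 3.0) * float(n) ** 3


def run_seconds(n, sustained_gflops):
    """Estimated run time at a given sustained rate."""
    return approx_flops(n) / (sustained_gflops * 1e9)


n = 75000
print(f"matrix: {matrix_bytes(n) / 1e9:.1f} GB ({matrix_bytes(n) / 2**30:.1f} GiB)")
print(f"work:   {approx_flops(n):.4e} floating point operations")

# Hypothetical sustained rate, purely to show how the estimate works.
hypothetical_gflops = 500.0
print(f"time at {hypothetical_gflops:.0f} GFLOPS: "
      f"{run_seconds(n, hypothetical_gflops):.0f} s")
```

The matrix alone comes out in the same range as the memory use I saw, which is why a problem this size would not fit on a system with only 32GB.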

NAMD

I also tested with the Molecular Dynamics package NAMD. NAMD scales really well across multiple cores and it is not specifically optimized for Intel hardware. It is highly optimized code and it uses the very interesting Charm++ for its parallel capabilities. NAMD is an important program and I like it for testing since it is a good example of well optimized code that scales to massive numbers of processes and also has very good GPU acceleration that needs to be balanced by good CPU performance.

NAMD CPU

For these job runs the high all-core-turbo clock of the 9900K has the advantage. The AVX512 vector units are not that important for this code that is designed to run well on a wide variety of hardware.

Note: These jobs ran with 16 threads since “Hyperthreads” help with the way NAMD uses threads. It is always worth experimenting with Hyperthreads to see if they help or not.
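If you want to experiment with this yourself, the first step is knowing how many "real" cores and how many Hyperthreads your system reports. Here is a minimal sketch that counts both from /proc/cpuinfo on Linux. It assumes the x86 layout where each "processor" entry carries "physical id" and "core id" fields; the function name is my own, and on other platforms the file may be missing or lack those fields.

```python
# Count logical CPUs (Hyperthreads) and physical cores from /proc/cpuinfo.
# Assumption: x86 Linux layout with "processor", "physical id", "core id".
import os
from typing import Tuple


def count_cores(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical_cpus, physical_cores) parsed from cpuinfo text."""
    logical = 0
    physical = set()
    socket_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            socket_id = None  # reset for the new logical CPU entry
        elif key == "physical id":
            socket_id = value
        elif key == "core id":
            # A physical core is identified by its (socket, core) pair.
            physical.add((socket_id, value))
    return logical, len(physical)


CPUINFO_PATH = "/proc/cpuinfo"
if os.path.exists(CPUINFO_PATH):
    with open(CPUINFO_PATH) as f:
        logical, physical = count_cores(f.read())
    print(f"{logical} logical CPUs, {physical} physical cores")
```

On the systems tested here I would expect this to report 16 logical CPUs and 8 physical cores with Hyperthreading enabled. Try your job with both thread counts and keep whichever runs faster.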

Note: The performance units here are “days per nano-second” of simulation time. The 9900K would save 1 day out of a week long job run to get 1 nano-second of simulation time. Adding a GPU will dramatically increase the performance as will be seen in the next chart.
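Since “days per nano-second” is a lower-is-better unit, it can help to see the conversions written out. The sketch below turns it into the more familiar ns/day and estimates total job time. The 7.0 and 6.0 days/ns inputs are illustrative round numbers chosen to match the "1 day out of a week" statement above, not values read off the chart.

```python
# Converting NAMD-style "days per nanosecond" into job time estimates.
# The input values below are illustrative round numbers, not measurements.


def ns_per_day(days_per_ns):
    """Invert days/ns to get simulated nanoseconds per day."""
    return 1.0 / days_per_ns


def job_days(days_per_ns, simulated_ns):
    """Wall-clock days needed for a given amount of simulated time."""
    return days_per_ns * simulated_ns


def days_saved(slower_days_per_ns, faster_days_per_ns, simulated_ns):
    """Wall-clock days saved by the faster system for the same simulation."""
    return job_days(slower_days_per_ns, simulated_ns) - job_days(
        faster_days_per_ns, simulated_ns
    )


slower, faster = 7.0, 6.0  # hypothetical days/ns for two systems
print(f"slower system: {ns_per_day(slower):.3f} ns/day")
print(f"faster system: {ns_per_day(faster):.3f} ns/day")
print(f"saved for 1 ns of simulation: {days_saved(slower, faster, 1.0):.1f} day(s)")
```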

NAMD GPU

The first thing to notice is that the performance has increased by over a factor of 10 by including the NVIDIA RTX 2080Ti! There seems to be an advantage for the 9800X and 2145W when the GPU is added to the system. I’m not sure exactly why that is. These CPUs do have a lot more PCIe lanes than the 9900K but all 3 of these systems were running with 1 GPU in a full X16 slot.

Conclusions and Recommendations

All 3 of these CPUs are great!

Given that my focus is high performance numerical computing I would probably not recommend the i9 9900K. It is a very good processor and the high core clocks will give many applications excellent performance. It is limited by not having the newest Intel core architecture. At its center it is basically a Haswell core (with lots of incremental tweaks). It is also very limited as a platform CPU for a GPU accelerated system since it only supplies 16 PCIe lanes.

The i7 9800X and Xeon 2145W share the same core architecture as the high-end Intel Xeon Scalable (Purley) CPUs. There are 2 AVX512 vector units per core and the numerical compute performance is outstanding (for code that is optimized for it!). I like both of these processors a lot! The i7 9800X is part of the newly released “X-Series” processors. They offer tremendous performance value. The Xeon 2145W, and the Xeon-W series in general, also offer great performance for the cost compared to the much more expensive Xeon Scalable (Skylake-SP) processors. Both “X-series” and Xeon-W CPUs are available in a variety of core counts up to 18-core. They are great alternatives for what in the past would have been a dual socket Workstation. The Xeon-W also has the advantage of being a Xeon processor, i.e. it has more PCIe lanes, supports a larger memory footprint, has ECC memory support, works with very high-end motherboards, etc.

Note: Skylake is the newest “core” architecture for Intel. It is the basis for their high-end processors. There was a “Skylake” CPU that was based on the Haswell “core” in the desktop core-i7 line a few “generations” ago. Marketing! I believe we will see a new “core” architecture from Intel by the end of 2019 (hopefully along with a new PCIe v4 capable chipset).

Here are my recommendations for a CPU intended as the base of a numerical compute oriented Workstation.

  • For a system where cost is a significant concern I would certainly recommend the “X-series” CPUs and the 8-core i7 9800X is easy to like. If you are working with code capable of GPU acceleration you can configure a system with 2 GPUs at X16 and have what is probably the best high-end performance per dollar you can get.

  • For a more high-end Workstation capable of using 4 GPUs for acceleration and capable of large memory configurations (the best overall platform configuration), the single socket Xeon-W CPUs are the way to go. I recommend these CPUs in a single socket configuration over a dual Xeon configuration for most applications since it avoids problems that can be caused by memory contention in multi-socket systems.

I hope this post has cleared up any confusion you may have had about these different CPUs. If you still have questions go ahead and ask in the comments!

Happy computing –dbk