Here at the Pixelary, we try to squeeze every bit of performance out of our hardware. And one thing we noticed is that when using CUDA GPU rendering, Blender’s Cycles renders significantly faster on Linux than on Windows. What gives?
The above test was run on a single Titan X (Maxwell) GPU with an AMD Ryzen 7 CPU. But the slowdown isn’t limited to Maxwell-generation GPUs. Here is the result for a 1080Ti (Pascal generation):
We get the same slow performance on Windows whether or not the GPU is used for display, and it makes no difference if we render from the command line.
Not willing to accept that Windows 10 is ‘simply slower’, we set out to find a solution.
It turns out that when doing GPU rendering on Windows 8 or above, every command issued by Blender has to go through the WDDM, or Windows Display Driver Model. This driver layer is responsible for handling all the display devices, but it often adds significant overhead to compute tasks. It is a core component of Windows and cannot simply be disabled.
Luckily, the smart people at Nvidia already have a solution for it. To bypass the WDDM completely, we need to set the GPU to “Tesla Compute Cluster” mode, or TCC for short. Once TCC is enabled, the GPU is no longer visible as a display device under Windows, but it’s still accessible to all CUDA apps. We then ran all the Blender benchmarks again and here is the result:
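For readers who want to try this, here is a minimal sketch of how the switch can be scripted around `nvidia-smi`, the command line tool that ships with the Nvidia driver. It relies on the documented `driver_model.current` and `driver_model.pending` query fields and on the `-dm` option (0 for WDDM, 1 for TCC). The function names and the sample GPU index are our own illustrative choices, not part of any Nvidia tool, and the exact output format may vary between driver versions.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; the driver_model.* fields are Windows only.
QUERY_FIELDS = ["index", "name", "driver_model.current", "driver_model.pending"]


def parse_driver_models(csv_text):
    """Turn `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:  # skip blank lines
            continue
        values = [field.strip() for field in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_driver_models():
    """Ask nvidia-smi which driver model each GPU is using right now."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    return parse_driver_models(subprocess.check_output(cmd, text=True))


def tcc_command(gpu_index):
    """Build (but do not run) the command requesting TCC mode for one GPU."""
    # -dm 1 selects TCC; -dm 0 would switch the GPU back to WDDM.
    return ["nvidia-smi", "-i", str(gpu_index), "-dm", "1"]
```

Calling `query_driver_models()` on a Windows machine should list each GPU with its current and pending driver model. The command returned by `tcc_command(1)` (GPU index 1 is only an example) has to be run from an administrator prompt, and the change takes effect after a reboot, which is why the sketch builds the command instead of executing it.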
With TCC enabled, Windows performance is exactly the same as Linux!
Now, here is the bad news. TCC is only available on the GeForce Titan line and the Quadro line of GPUs; it is not available for the GeForce GTX series. It also only works if you have another GPU to drive the display output, since a TCC device cannot drive any display. But if you have to stick to a Windows environment and have a separate GPU, TCC might just be what you need to get that extra 30% performance back.
Rendering with AMD devices using OpenCL does not have this performance discrepancy.
So now we know WHY Windows is slower. We would still like to see ways to work around the WDDM limitation, through more efficient kernels or fewer calls to the WDDM. This would ensure that all GeForce users who cannot enable TCC still benefit from a speedy render.