Machine Learning / CUDA Performance - Has the 385.12 TITAN Xp Driver Made Improvements?

Hi Folks

I’m planning to build a new desktop in a few months’ time, primarily to further my studies in machine learning. My current plan is to use one or two 1080 Ti’s, as the machine-learning-focused benchmarks I managed to find show them performing similarly to the Titan Xp at much better value, which fits my build’s performance / price goals.

However, the Titan Xp recently gained significant performance increases from the optimisations in the 385.12 driver update for creative design applications such as 3D modelling and simulation. There are some benchmarks published here.

So now I’m wondering: have these optimisations significantly increased the computational (CUDA) performance of the Titan Xp in general, or are they application-specific? And would there be any performance increase for machine learning applications (cuDNN)?

Many thanks

Unlikely. The performance gains reported appear to be for graphics applications, not compute applications, and they seem to be specific to the Titan Xp. The linked article notes (emphasis mine):

There are various vendor-overclocked models of the GTX 1080 Ti available and some may be faster than the Titan Xp on specific workloads. The only way to know for sure is to try both GPUs with the specific workloads you are interested in. Keep in mind that vendor-overclocked models may or may not have been binned based on compute applications.
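For what it’s worth, that comparison doesn’t have to be elaborate. Below is a minimal sketch of the kind of measurement meant here, using CUDA events to time a kernel; the kernel is just a placeholder that you would replace with your actual workload (e.g. a cuDNN training step):

```cpp
// Minimal benchmarking sketch: time a kernel with CUDA events. The kernel
// below is only a stand-in for the real workload you care about.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void workload(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 1.0001f + 0.5f;   // placeholder work
}

int main()
{
    const int n = 1 << 26;
    float* d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    const int block = 256, grid = (n + block - 1) / block;
    workload<<<grid, block>>>(d_data, n);            // warm-up run
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 100; ++i)                    // average over many runs
        workload<<<grid, block>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average kernel time: %.3f ms\n", ms / 100.0f);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Run the same binary on each card (selecting the device with cudaSetDevice if both are in one machine) and compare the averages.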

It makes sense that the optimisations would be for graphics rather than compute. I think my plan of two 1080 Ti’s is still the better-value choice in my case.

I will bear in mind the possibility of binned parts; I suppose some benchmarks and comparisons will give me a good idea of whether that is the case.

Thanks for the info!

I think the emphasis in the last sentence of njuffa’s post was more on the “may not” part.

I personally would never use an overclocked card for CUDA work, even if it is factory-overclocked and only used for private leisure. The risk of getting stuck chasing some kind of Heisenbug just isn’t worth the few percent improvement to me.

I have no insights into how vendors qualify their overclocked GPUs. The few times I have seen particular apps being mentioned in that context, they were games.

The risk posed by an occasional incorrect computation result is likely larger in compute-centric apps than in graphics-centric apps, where many results are used only transiently (for a single frame). However, there are compute applications that are either more tolerant of numerical errors (think Monte Carlo) or self-correcting (e.g. iterative methods controlled by residual computations). The possible performance improvement from aggressively overclocked GPUs can be as much as 20% (at which point they exploit all of the typical engineering margin built into electronic products), which is not a trivial amount.
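To illustrate the self-correcting case, here is a minimal sketch (my own, not taken from any particular application) of a Jacobi iteration whose residual norm is monitored on the host. A silent arithmetic glitch, e.g. from an unstable overclock, would tend to show up as a residual that suddenly jumps instead of decreasing:

```cpp
// Sketch: Jacobi iteration for the 1D Poisson problem -u'' = f with the
// residual norm checked periodically. In exact arithmetic the residual
// decreases monotonically (slowly, for Jacobi), so a jump upward is a red
// flag worth investigating.
#include <cstdio>
#include <cmath>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// One Jacobi sweep on a uniform grid with Dirichlet boundaries u[0]=u[n-1]=0.
__global__ void jacobiSweep(const float* u_old, float* u_new,
                            const float* f, float h2, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        u_new[i] = 0.5f * (u_old[i - 1] + u_old[i + 1] + h2 * f[i]);
}

// Sum of squared residuals r_i = f_i - (-u_{i-1} + 2u_i - u_{i+1}) / h^2,
// accumulated with atomics (fine for a sketch, not tuned for performance).
__global__ void residualNorm2(const float* u, const float* f,
                              float h2, int n, float* r2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1) {
        float r = f[i] - (-u[i - 1] + 2.0f * u[i] - u[i + 1]) / h2;
        atomicAdd(r2, r * r);
    }
}

int main()
{
    const int n = 1 << 12;
    const float h = 1.0f / (n - 1), h2 = h * h;

    std::vector<float> f_h(n, 1.0f);      // constant right-hand side
    float *u0, *u1, *f, *r2;
    cudaMalloc(&u0, n * sizeof(float));
    cudaMalloc(&u1, n * sizeof(float));
    cudaMalloc(&f,  n * sizeof(float));
    cudaMalloc(&r2, sizeof(float));
    cudaMemset(u0, 0, n * sizeof(float));
    cudaMemset(u1, 0, n * sizeof(float));
    cudaMemcpy(f, f_h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int block = 256, grid = (n + block - 1) / block;
    float prev = INFINITY;
    for (int iter = 0; iter < 20000; ++iter) {
        jacobiSweep<<<grid, block>>>(u0, u1, f, h2, n);
        std::swap(u0, u1);
        if (iter % 2000 == 0) {
            cudaMemset(r2, 0, sizeof(float));
            residualNorm2<<<grid, block>>>(u0, f, h2, n, r2);
            float r2_h;
            cudaMemcpy(&r2_h, r2, sizeof(float), cudaMemcpyDeviceToHost);
            float rnorm = sqrtf(r2_h);
            printf("iter %6d  residual %e\n", iter, rnorm);
            if (rnorm > prev * 1.01f)   // 1% slack for summation-order noise
                printf("warning: residual increased -- result is suspect\n");
            prev = rnorm;
        }
    }
    cudaFree(u0); cudaFree(u1); cudaFree(f); cudaFree(r2);
    return 0;
}
```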

It is up to the individual GPU user to determine the risk / reward ratio that is best for them. There is no one right answer to the question of vendor-overclocked parts. Personally, I tend to err on the conservative side these days, after being an avid overclocker in my student days (and experiencing some weird bugs caused by overclocking, e.g. the occasional incorrect square root result from a math coprocessor overclocked by about 20%).

I’m with @tera on this one.

But I suppose if I had chosen to make my living based on Ethereum mining, I might feel differently.

Thanks for the further insight, guys. This will be my entry into accelerated computation, so I’m sure there’s a steep learning curve to climb.

From what I understand, vendor-overclocked parts are sold under the premise that they are stable at least at the advertised clock speed. That said, I appreciate that the vendors’ testing can be limited and focused more on graphics operations, as the target customers are gamers, so it may not be a guarantee for lengthy computation jobs.

Many of the vendors provide overclocking utilities that let you try to push the card even further. These utilities (at least from the vendor I’m looking to buy from) also let you underclock the card, typically to reduce heat output, so I’ll be able to drop the clocks back down to stock speeds for critical calculations.
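One simple way to build confidence at whatever clock I settle on (a sketch of my own, and no substitute for dedicated stress-test tools) would be to run a deterministic, arithmetic-heavy kernel repeatedly and compare each result bit-for-bit against the first run; any mismatch means the card is not computing reliably at those clocks:

```cpp
// Crude stability check: the same deterministic kernel is run repeatedly and
// each result is compared against a reference run. On healthy hardware at
// stable clocks every run must produce identical bit patterns.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Arithmetic-heavy, fully deterministic per-element computation.
__global__ void stress(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = (float)i * 1e-6f + 1.0f;
    for (int k = 0; k < 2000; ++k)
        x = fmaf(x, 1.000001f, 0.5f) * 0.999999f;    // keep the ALUs busy
    out[i] = x;
}

int main()
{
    const int n = 1 << 24;                 // ~16M elements, 64 MB on the GPU
    const int block = 256, grid = (n + block - 1) / block;

    float* d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    std::vector<float> ref(n), cur(n);

    stress<<<grid, block>>>(d_out, n);     // reference run
    cudaMemcpy(ref.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int run = 1; run <= 20; ++run) {
        stress<<<grid, block>>>(d_out, n);
        cudaMemcpy(cur.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
        long long mismatches = 0;
        for (int i = 0; i < n; ++i)
            if (cur[i] != ref[i]) ++mismatches;
        printf("run %2d: %lld mismatching elements\n", run, mismatches);
        if (mismatches) {
            printf("results are not reproducible at these clocks\n");
            break;
        }
    }
    cudaFree(d_out);
    return 0;
}
```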

As @njuffa mentions, it’s a risk / reward trade-off that I feel can only be optimised through experimentation. As my work will really only be focused on furthering my studies for the foreseeable future, the occasional strange behaviour will surely aid in my learning :)

Thanks again!