NUMA confusion

Started by Tangled-Universe, November 07, 2019, 03:04:06 PM

Tangled-Universe

Hi All,

A few hours ago AMD announced its new line of Threadripper CPUs, one of which is a 24-core beast at 3.8(!) GHz per core.
My current machine -pretty expensive back then- is almost 8 years old and the investment has been totally worth it.
I'm pretty much decided on getting this new 24-core CPU, allowing me to render at 16 threads and use the remaining 8 threads for creating other scenes or other 3D software stuff.

However, I feel I'm getting confused by some NUMA related discussions.
Historically, I think, NUMA had to do with multi-CPU machines, meaning more than one CPU socket.
AMD is now basically welding multiple CPUs into one package and renaming each individual CPU/CCD as a 'chiplet', and here and there I read about NUMA-related issues with renderers, especially with the Threadripper 2970WX or 2990WX performing no better than their 16-core little sister, the 2950X.

The NUMA issue report in the bug tracker seems to say that TG currently does not support multi-socket machines very well, but I'm concerned about whether TG can fully utilize the potential of this upcoming CPU.

Cheers,
Martin

penboack

Have you read this?

https://www.anandtech.com/show/15062/amds-2019-fall-update/3

I built a TR-2950X workstation back in May and I am very pleased with it, perhaps more so now that I see the prices for TR3 not to mention the TDP which will demand some serious cooling :o .

I was surprised at the pricing, it will be interesting to see the price performance ratio versus the Intel CPUs.
Intel has AVX-512, TR3 has AVX2 (256-bit). I am not sure what this will mean for render performance going forward, as I think the Terragen renderer uses Intel Embree.

WAS

I am not positive this is an issue for you with Terragen with 24 core / 48 threads but I may be wrong.

penboack

I don't think that the TR3 design is a NUMA design; the chiplet arrangement is the same as in Ryzen 3000 and EPYC.


Oshyan

NUMA is a bit complicated and I am fairly sure I do not fully understand it. With that caveat out of the way, I'll say a few things. :D

First, NUMA is not *inherent* to multi-processor systems, as far as I understand. So Terragen for example works very well on older, non-NUMA Xeon dual CPU machines and the like. The main issues have come in more recently with *single CPUs* that operate under NUMA, which includes (as best I recall) both some newer high core Xeons and some of the Threadrippers.

The other thing is that I'm not clear on exactly where fault/responsibility lies in taking advantage of NUMA architecture. There were/are needed *OS-level* changes to get best performance out of e.g. Threadripper in its NUMA mode, but there also is some need for application support. I believe, though, that if the OS presented NUMA core sets as a single pool, then Terragen would work with them as normal (subject to the same scaling issues we're already aware of, of course). I'm not certain why this isn't done except that it seems perhaps to not be possible or not as efficient (since NUMA itself is a way to hopefully improve efficiency with non-uniform memory access).

So it seems we do need to add NUMA support at some point, but I'm not sure what is involved in that.

I am also unfortunately not totally *certain* whether the 24 core Zen 2 Threadripper uses NUMA. But I *think* not, in fact I think none of the Zen 2 chips do. There's a detailed discussion of this here: https://news.ycombinator.com/item?id=21029031

And one particularly interesting comment there re: your concerns and the history of this is that the 2990WX had, according to one poster, a particularly challenging implementation of NUMA which would explain some of the problems many applications had in taking full advantage of it. From what I can tell Zen 2 is much improved in all of this so it's very well possible that TG will do just fine without any modifications on Zen 2. 

Also here: https://www.servethehome.com/amd-epyc-7002-series-rome-delivers-a-knockout/4/
"The AMD EPYC 7002 series is presented, by default, as a single NUMA node per socket, down from four in the previous generation."
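Once a machine is actually in hand, one way to see how many NUMA nodes the OS presents is to count the node directories that Linux exposes in sysfs. This is only a minimal sketch, assuming a Linux system with the usual `/sys/devices/system/node` layout; the function name `count_numa_nodes` is made up for illustration and has nothing to do with Terragen itself.

```python
import re
from pathlib import Path


def count_numa_nodes(sysfs_node_dir="/sys/devices/system/node"):
    """Count NUMA nodes by counting nodeN directories in sysfs.

    Assumption: Linux with the standard sysfs layout. Returns None when the
    directory is missing (non-Linux system, or a kernel without NUMA support).
    """
    root = Path(sysfs_node_dir)
    if not root.is_dir():
        return None
    # Each NUMA node appears as a directory named node0, node1, ...
    return sum(
        1
        for entry in root.iterdir()
        if entry.is_dir() and re.fullmatch(r"node\d+", entry.name)
    )


if __name__ == "__main__":
    nodes = count_numa_nodes()
    if nodes is None:
        print("No NUMA information found in sysfs")
    else:
        print(f"NUMA nodes visible to the OS: {nodes}")
```

A result of 1 would mean the OS sees the CPU as a single pool, which is the case where Terragen should behave as it does on any other multi-core machine.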

We'll have to wait for actual benchmarks to really know all of this for sure, of course. Which presents a challenge if you're wanting to buy a system ASAP. But personally I think it is always best to wait for others to "beta test" any new hardware (or software) release. ;)

- Oshyan

WAS

I also wanted to mention one more thing. AMD's TDP comes from a system-wide formula. It's not just the die producing the heat; it also accounts for ambient case temperature and the GPU. AMD is classically seen as running hot, but that's simply not the case. It's just misunderstood, as it's not the same formula as Intel's. Additionally, the actual thermal design limits are quite different. For example, my TDP is 65 W (much lower than a TR's), but my maximum thermal limit is 95 °C, which is unrelated to what the TDP from the system formula would suggest, and actually the same as Intel's max thermal limit on its equivalent i5-5800k.

Also, in all the years of rendering I've been doing, I've only ever had Intel CPUs die on me. I got an old FX with a dead core, and it still POSTs and works. Lol

Tangled-Universe

Thank you guys!

WAS, that last link on processor groups you posted seems pretty clear on the topic.
Over 64 cores = splitting cores in groups = potential issues.
A 24-core Threadripper does not meet this criterion.
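To put numbers on that: as far as I understand it, the Windows limit is counted in logical processors (threads), not physical cores, with at most 64 per processor group. Here is a rough sketch of the arithmetic. The function names are made up for illustration, and the result is only a lower bound, since Windows may split processors into groups differently depending on the hardware topology.

```python
import math

# Windows processor groups hold at most 64 logical processors each.
MAX_LOGICAL_PER_GROUP = 64


def min_processor_groups(logical_processors: int) -> int:
    """Smallest number of processor groups that can hold this many logical processors."""
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return math.ceil(logical_processors / MAX_LOGICAL_PER_GROUP)


def fits_in_one_group(cores: int, threads_per_core: int = 2) -> bool:
    """True if cores * threads_per_core stays within a single processor group."""
    return min_processor_groups(cores * threads_per_core) == 1


if __name__ == "__main__":
    # 24 cores with SMT = 48 logical processors
    print(min_processor_groups(24 * 2))
    print(fits_in_one_group(24))
```

So 24 cores / 48 threads stays inside a single group, while anything above 64 threads would need at least two.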

Oshyan, great to hear I'm not the only one confused by the topic! It seems there's a bit of unnecessary fuss about it, apart from the genuinely problematic examples you described.
Thanks for taking the time to look up information/discussion regarding this confusing topic.

I feel much better about this now, though frankly my enthusiasm is hard to temper anyway :P
I need a new machine and the Cinebench 20 scores for this model are at around 13000 points, while my current CPU is 1300(!) points.
Can't wait to upgrade and realize the things in my mind's eye with TG.
It would also be interesting to compare rendering performance between Linux and Windows, since the differences between those two OSes seem to become more pronounced as core counts and memory usage increase.
Also I have some plans for vdb export stuff, which adds another reason to configure my new build with dual OS.

I feel it's a pretty safe bet to go for this CPU.