NUMA is a bit complicated and I am fairly sure I do not fully understand it. With that caveat out of the way, I'll say a few things.
First, NUMA is not *inherent* to multi-processor systems, as far as I understand. So Terragen, for example, works very well on older, non-NUMA dual-CPU Xeon machines and the like. The main issues have come up more recently with *single CPUs* that operate under NUMA, which includes (as best I recall) both some newer high-core-count Xeons and some of the Threadrippers.
The other thing is that I'm not clear on exactly where the responsibility lies for taking advantage of NUMA architecture. *OS-level* changes were (and are) needed to get the best performance out of e.g. Threadripper in its NUMA mode, but some application support is needed as well. I believe, though, that if the OS presented the NUMA core sets as a single pool, Terragen would work with them as normal (subject to the same scaling issues we're already aware of, of course). I'm not certain why this isn't done, except that it is perhaps not possible or not as efficient (since NUMA itself is a way to hopefully improve efficiency when memory access is non-uniform).
So it seems we do need to add NUMA support at some point, but I'm not sure what is involved in that.
I am also unfortunately not totally *certain* whether the 24-core Zen 2 Threadripper uses NUMA. But I *think* not. In fact, I think none of the Zen 2 chips do. There's a detailed discussion of this here:
https://news.ycombinator.com/item?id=21029031
One particularly interesting comment there, relevant to your concerns and the history of this, is that the 2990WX had, according to one poster, a particularly challenging implementation of NUMA, which would explain some of the problems many applications had in taking full advantage of it. From what I can tell, Zen 2 is much improved in all of this, so it's quite possible that TG will do just fine on Zen 2 without any modifications.
Also here:
https://www.servethehome.com/amd-epyc-7002-series-rome-delivers-a-knockout/4/
"The AMD EPYC 7002 series is presented, by default, as a single NUMA node per socket, down from four in the previous generation."
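For anyone who wants to see how many NUMA nodes the OS actually presents on a given machine, here is a minimal sketch. It assumes Linux, where the kernel lists nodes as `node0`, `node1`, ... under `/sys/devices/system/node`. The function name `numa_nodes` is just made up for illustration, and this does not cover Windows or macOS.

```python
import os
import re

# Standard Linux sysfs location for NUMA node directories (Linux-only assumption).
NODE_DIR = "/sys/devices/system/node"


def numa_nodes(root=NODE_DIR):
    """Return the sorted NUMA node ids found under root, or None if root is missing."""
    if not os.path.isdir(root):
        return None
    ids = []
    for name in os.listdir(root):
        # Only directories named node<N> count; other entries are ignored.
        match = re.fullmatch(r"node(\d+)", name)
        if match and os.path.isdir(os.path.join(root, name)):
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = numa_nodes()
    if nodes is None:
        print("No NUMA information found (not Linux, or sysfs not available).")
    elif len(nodes) <= 1:
        print("The OS presents at most one memory node.")
    else:
        print(f"The OS presents {len(nodes)} NUMA nodes: {nodes}")
```

If this reports a single node, the OS is presenting the cores as one pool. More than one node means the NUMA layout is exposed and an application may need to account for it.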
We'll have to wait for actual benchmarks to really know all of this for sure, of course, which presents a challenge if you want to buy a system ASAP. But personally I think it is always best to wait for others to "beta test" any new hardware (or software) release.
- Oshyan