Quote from: Matt on September 05, 2019, 04:09:18 PM
"I like the updated curve - it is a much better fit for the data. The comparison with the green theoretical line really says a lot about the performance falloff."
Thanks
The exponential fit is actually only about 2 percentage points better than the previous one, but at least we're now 100% sure that it is constructed from correct data!
I do not mean to demotivate you by showing or discussing the falloff! I discuss it purely from the point of view of the graph.
Neither do I mean to demotivate anyone by discussing at length what the graph tells us.
It's just that it took me a lot of time and text to convince Oshyan that my initial interpretation and statement were not pulled out of thin air.
Perhaps my initial statement that 32 cores are not justifiable seemed bold and premature, but this data supports it and there's nothing I can do to change that. I'm sorry.
About extrapolation, if I still may:
I exported the non-linear regression curve to .csv and imported it into this online tool:
http://www.xuru.org/rt/NLR.asp
Then I limited the parameters to 2.
The best fit it returned from the curve was this formula: y=7.326788565 x / (x + 24.53756167)
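(For anyone who wants to reproduce this without the online tool: here's a minimal Python sketch using scipy's curve_fit, fitting the same two-parameter saturation shape directly to the four measured points. Since the tool was fed an exported curve rather than the raw points, the coefficients will differ slightly from the ones quoted below.)

import numpy as np
from scipy.optimize import curve_fit

# Measured relative scores, normalized so that 4 threads = 1.
threads = np.array([4.0, 8.0, 12.0, 16.0])
scores = np.array([1.000, 1.803, 2.448, 2.854])

# Same two-parameter saturation shape the online tool returned.
def model(x, a, b):
    return a * x / (x + b)

# Fit a and b; the starting guess is near the tool's coefficients.
(a, b), _ = curve_fit(model, threads, scores, p0=(7.0, 25.0))
print("y = %.6f x / (x + %.6f)" % (a, b))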
If we back-fit our own data we get this:
4 threads = 1.027 => measured = 1.000
8 threads = 1.801 => measured = 1.803
12 threads = 2.405 => measured = 2.448
16 threads = 2.892 => measured = 2.854
The back-fit is OK, but not utterly fantastic: it under-estimates the 12-thread score and over-estimates the 16-thread score.
The 16-thread point also has the highest variation, so any extrapolation from this wobbly point carries a relatively large uncertainty.
Knowing this and proceeding into the unknown:
24 threads = 3.623
32 threads = 4.147
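(A quick sketch to double-check: evaluating the formula above in Python reproduces both the back-fit and these extrapolated values, to within rounding.)

a, b = 7.326788565, 24.53756167

def predicted(x):
    # The fitted curve returned by the online tool.
    return a * x / (x + b)

for n in (4, 8, 12, 16, 24, 32):
    print("%2d threads => %.3f" % (n, predicted(n)))
# Prints 1.027, 1.801, 2.406, 2.892 for the back-fit,
# then 3.623 and 4.147 for 24 and 32 threads.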
Given the original curve (a pity that my software does not give me the fitting formula, so we need this online tool!), this seems a little over-estimated.
If I extend the curve in my mind's eye, I cannot imagine it exceeding a score of 4 at 32 threads.
Anyway... let's be gentle this time and say that 32 threads scores 4.147 out of 8.
It's up to each person to decide whether such efficiency is worth it. Perhaps I should have refrained from stating that it's not justifiable to buy a 32-core, but given the results I don't think it was completely unfair to say so. You buy 32, but get the performance of 16; that's what the data tells us thus far. In the end, that's why I wanted to do this test.
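To spell out that arithmetic: with the scores normalized so that 4 threads = 1, perfect linear scaling would put 32 threads at 32/4 = 8. The predicted 4.147 sits just above the ideal 16-thread score of 16/4 = 4, so a 32-core delivers roughly the work of 16 perfectly-scaling cores, i.e. 4.147 / 8 ≈ 52% efficiency.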
As I said, I may still end up with a 24-core machine, if such a CPU is released soon. Or perhaps even 32 cores, but then at least I know what to expect and can think about what else to do with the excess cores. Running 2 instances of TG, like Richard does, is not a bad idea at all!
In some ways I regret this test, as I feel a bit like I'm bringing bad or unwanted news here, while my intentions are quite different.
I just want to clarify and emphasize that my intention with this topic is gathering data to make an informed buying decision.
Not to criticize the software or open up a can of worms about its performance.
And definitely not to make its creator feel bad!
I genuinely think it's not that much better with other renderers. Just check the V-Ray benchmark database, and perhaps others.
Multi-threading is challenging. Just look at genomic assembly, for instance: by itself it's not hugely complex (churning and chopping A, C, T and G's into little bits and mapping them against a static reference genome), and yet multi-threading it is far from perfect. Let alone a renderer, which deals with so many dependencies in scene space and then chops that up into buckets that need low-level scene-space data, but also high-level data from neighbouring buckets for their own calculations.
It's vastly complicated, and I think it's definitely a testament to Matt's achievement so far.
Cheers to that!