Does hyper-threading actually help TG2 renders?

Started by coremelt, March 29, 2012, 12:05:43 PM


coremelt

I'm bored while waiting for a render so I read about hyper-threading. As I understand it, the processor reports two logical cores to the OS for every physical core, and it has two pipelines for feeding instructions, but it can only execute one task at a time. If your task is I/O bound in some way, waiting on RAM or disk, this can help.

But as I understand it, pure 3D rendering like TG2 is CPU bound anyway. So is there any actual benefit in render times? Has anyone tested switching off hyper-threading in the BIOS and then comparing on the benchmark scene?
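[A quick way to test this without touching the BIOS is to time a purely CPU-bound task with one worker per physical core versus one per logical core. A minimal sketch in Python; it assumes logical cores are exactly twice the physical count, which is only a guess from `os.cpu_count()`:]

```python
# Sketch: time CPU-bound work at N workers vs 2*N workers to see if the
# extra logical (hyper-threaded) cores help. Numbers here are illustrative.
import multiprocessing as mp
import os
import time

def burn(n):
    # Pure CPU work: no I/O, no memory pressure.
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) % 1_000_003
    return acc

def timed_run(workers, jobs=8, work=200_000):
    # Run `jobs` identical CPU-bound tasks across `workers` processes.
    start = time.perf_counter()
    with mp.Pool(workers) as pool:
        pool.map(burn, [work] * jobs)
    return time.perf_counter() - start

if __name__ == "__main__":
    logical = os.cpu_count() or 2
    physical_guess = logical // 2  # assumption: HT doubles the core count
    for w in (physical_guess, logical):
        print(f"{w} workers: {timed_run(w):.2f}s")
```

[If the second timing is noticeably lower than the first, hyper-threading is helping on that workload.]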

FrankB

Yes, that was the first thing I tested when I bought one of the first i7s in late 2009.
Simply put: using HT makes TG2 renders finish faster.


jo

Hi,

Frank is right, on any machine newer than a P4 at least. I've attached a graph showing how render times decrease as more cores are added. This is on my machine, which has 16 cores: 8 physical cores and 8 virtual cores. You can see that the curve flattens off as it goes past 8, which is when the virtual cores start to be used. This is because a virtual core has only about 20-30% of the performance of a real core. However there is still an appreciable reduction in render time. I always use 16 threads for rendering.
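[The shape of that curve follows directly from the "virtual cores are worth ~25% of a real core" figure. A back-of-envelope model (my own sketch; the 160-minute single-core time and the 25% factor are illustrative assumptions, not measurements from the graph):]

```python
# Model render time vs thread count when threads past the physical core
# count land on hyper-threaded (virtual) cores worth a fraction of a real one.
def effective_cores(threads, physical=8, ht_fraction=0.25):
    """Effective core count: full credit for real cores, partial for virtual."""
    real = min(threads, physical)
    virtual = max(0, threads - physical)
    return real + virtual * ht_fraction

def predicted_render_time(threads, one_core_time=160.0):
    # Idealised: perfectly parallel work, so time scales as 1/effective_cores.
    return one_core_time / effective_cores(threads)

for n in (4, 8, 12, 16):
    # The improvement per extra thread shrinks sharply past 8.
    print(n, round(predicted_render_time(n), 1))
```

[With these numbers, going from 8 to 16 threads only takes the model from 20 down to 16 time units: still worth having, but a much flatter slope than below 8, which matches the attached graph.]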

Regards,

Jo

Kadri


Jo, what do you think about rendering the same scene in two or more parts (or an animation) with more than one TG2 open, if memory is no issue?
Theoretically speaking, it looks like it would render much faster that way (especially past 4 cores)?

jo

Hi Kadri,

I have no particular opinions about that, as I haven't tested it myself. I think Matt has looked into it. If I remember rightly, with an animation he found that rendering two frames at once took slightly less time than the total time of two single frames, but that the time difference wasn't enough to outweigh the inconvenience of having to wait twice as long to see the resulting frames. To put it another way:

1 frame at a time takes 2 hours to render each frame
2 frames at once take 3 hours 55 minutes to render
So 2 frames rendered individually take 4 hours, against 3 hours 55 minutes together: only 5 minutes' difference in total

Please note I just made these figures up for the example, they don't necessarily reflect a real situation.
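[Using those same made-up figures, the trade-off Jo describes is throughput versus latency, which is easy to check with a few lines of arithmetic (a sketch; the numbers are the hypothetical ones from the post, not measurements):]

```python
# Throughput vs latency for the made-up figures above, in minutes.
seq_per_frame = 120             # one frame at a time: 2 hours per frame
pair_total = 3 * 60 + 55        # two frames rendered together: 3h55m
seq_total = 2 * seq_per_frame   # two frames rendered one after the other: 4h

saving = seq_total - pair_total        # total minutes saved per pair of frames
latency = pair_total - seq_per_frame   # extra wait before the first frame appears
print(saving, latency)
```

[So the pair saves 5 minutes of total render time, but the first usable frame arrives 115 minutes later than it would have: exactly the inconvenience being weighed up.]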

What I would say is that if you're going to be rendering an animation and you expect it to take a while, try rendering one frame on its own and then two frames at once, and see what happens. If you think rendering two frames at once is worthwhile then go with that. You could use the results as part of the animation so you're not losing much time over it, i.e. render frame 1 to see how long one frame takes, then render frames 2 and 3 together to see how long two frames take. Standard disclaimer that results may depend on your scene or different parts of your scene :-).

Regards,

Jo



PabloMack

Quote from: jo on March 29, 2012, 06:51:57 PM
I think Matt has looked into it. If I remember rightly with an animation he found that rendering two frames at once took slightly less time than the total time of two single frames, but that the time difference wasn't enough to outweigh the inconvenience of having to wait twice as long to see the resulting frames.

I think I understand why this is and I have used multiple instances of TG to speed renders up in the past. Watching the screen and CPU usage as the rendering happens shows that the screen is divided into squares so that all of your cores that are enabled for rendering are handed their subsections of the total picture and start working on them at the same time. If a thread finishes its square ahead of the others then it waits until all threads have finished their squares. When the last thread finishes its square then they are all handed new squares to work on at the same time. As each thread finishes its piece of the picture, I see that the CPU usage goes down by one core's worth of total CPU utilization. It is this thread idling that makes TG2 not fully utilize all cores in a single instance of a TG2 program.

Now that I have TG3, I haven't observed this behavior and I suspect that thread dispatch has been improved by a finished thread leapfrogging ahead and taking the next square to work on but I can't say for sure if it is doing this. I need to do more testing.
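[The two dispatch schemes being discussed here, lock-step batches where everyone waits for the slowest square, versus a thread "leapfrogging" straight to the next one, can be compared with a tiny scheduling simulation. A sketch with invented bucket times (none of this is TG's actual code):]

```python
import heapq

def lockstep_time(bucket_times, threads):
    # Scheme 1: hand out `threads` buckets at once; all wait for the slowest
    # before the next batch is handed out.
    total = 0.0
    for i in range(0, len(bucket_times), threads):
        total += max(bucket_times[i:i + threads])
    return total

def workqueue_time(bucket_times, threads):
    # Scheme 2: each thread grabs the next bucket the moment it is free
    # (greedy schedule; the heap tracks when each thread finishes).
    finish = [0.0] * threads
    heapq.heapify(finish)
    for t in bucket_times:
        heapq.heappush(finish, heapq.heappop(finish) + t)
    return max(finish)

# Two slow buckets (3 and 4) mixed in with quick ones, rendered on 4 threads.
times = [3, 1, 1, 1, 4, 1, 1, 1]
print(lockstep_time(times, 4), workqueue_time(times, 4))
```

[With these numbers the lock-step scheme takes 7 time units while the work-queue scheme takes 5, because in the latter the idle threads absorb the quick buckets while a slow one is still in flight.]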

jo

Hi,

AFAIK threads never waited for all current buckets to be finished before getting a new one. There is simply a list of buckets and when a thread finishes a bucket it gets a new one and starts rendering it straight away. What you will see is that towards the end of an image the thread count/CPU usage will reduce as buckets are finished but there are no more remaining in the list.
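[The bucket-list arrangement Jo describes, a shared list that threads drain until it is empty, can be sketched in a few lines. This is an illustration of the general pattern, not TG's implementation; all names are mine:]

```python
# Minimal bucket work queue: threads pull tile coordinates until none remain,
# at which point each thread simply goes idle, as described above.
import queue
import threading

def render_bucket(bucket):
    # Placeholder for the real per-tile rendering work.
    return bucket

def render(width, height, bucket_size, threads):
    buckets = queue.Queue()
    for y in range(0, height, bucket_size):
        for x in range(0, width, bucket_size):
            buckets.put((x, y))
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                b = buckets.get_nowait()  # grab the next bucket straight away
            except queue.Empty:
                return  # no buckets left in the list: this thread goes idle
            result = render_bucket(b)
            with lock:
                done.append(result)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return done

tiles = render(256, 128, 64, threads=4)
print(len(tiles))  # 4 columns x 2 rows = 8 buckets
```

[Towards the end of the queue, threads that hit `queue.Empty` drop out one by one, which is exactly the falling CPU usage visible at the end of a render.]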

We have made improvements to the threading performance over time.

Regards,

Jo

PabloMack

#8
Quote from: jo on December 13, 2013, 03:28:34 PM
AFAIK threads never waited for all current buckets to be finished before getting a new one. There is simply a list of buckets and when a thread finishes a bucket it gets a new one and starts rendering it straight away.

I think I am not remembering correctly. As I now recall, it is near the end of a frame that threads which have finished their buckets wait until the last one is done before the next frame can begin. Over the course of many frames in an animation, this idle-thread phenomenon adds up and can account for the increased performance of using multiple instances of TG2 (with fewer threads per process) over one instance that uses all threads.
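[That accumulation is easy to put rough numbers on. A sketch of the idle-time model being described (the 2-minute tail, frame counts and thread counts are invented for illustration):]

```python
# Rough model of the end-of-frame tail: if the last bucket takes `tail`
# minutes while every other thread sits idle, each frame wastes about
# (threads - 1) * tail core-minutes, and that waste scales with thread count.
def wasted_core_minutes(frames, threads, tail):
    return frames * (threads - 1) * tail

# One 16-thread instance vs two 8-thread instances splitting a 100-frame
# shot, with an assumed 2-minute tail per frame:
single = wasted_core_minutes(100, 16, 2)
split = 2 * wasted_core_minutes(50, 8, 2)
print(single, split)
```

[In this toy model the single 16-thread instance idles away 3000 core-minutes over the shot against 1400 for the two 8-thread instances, which is the effect being described: fewer threads per process means a cheaper tail per frame.]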

So these little square-shaped areas assigned one to each thread are called "buckets"? In Lightwave, one bucket seems to be one horizontal scan line.

jo

Hi,

Quote from: PabloMack on December 13, 2013, 03:40:18 PM
I think I am not remembering correctly. As I now recall, it is near the end of a frame where all threads finished with their buckets wait until the last one is done before the next frame can begin. Over the course of many frames in an animation, this idle thread phenomenon adds up and can account for the increased performance of using multiple instances of TG2 (with fewer threads per process) over one instance that uses all threads.

Yes, that can happen at the end of a render. The TG benchmark image was a good example of that: the render would proceed very steadily until there was one bucket left that took ages. If the render is a frame of an animation it could mean there is some advantage to rendering multiple frames at once. However I wouldn't change the advice I posted further up the thread.

Quote
So these little square-shaped areas assigned one to each thread are called "buckets"? 

Yes, those are the buckets. Any parameters you see referring to buckets have something to do with those squares. I was looking for a good explanation of bucket rendering online but I couldn't find one. I will have to write one up for the TG docs.

Quote
In Lightwave, one bucket seems to be one horizontal scan line.

I think the LW renderer is actually a scanline renderer, which is a different approach from bucket rendering.
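[The practical difference is just the shape of the work unit each thread is handed: a whole row versus a square tile. A sketch of the two decompositions (illustrative only, not either renderer's actual code):]

```python
# Work units handed to render threads: a scanline renderer deals out rows,
# a bucket renderer deals out square tiles. Each unit is (x, y, w, h).
def scanline_units(width, height):
    return [(0, y, width, 1) for y in range(height)]        # one row per unit

def bucket_units(width, height, size):
    return [(x, y, min(size, width - x), min(size, height - y))
            for y in range(0, height, size)
            for x in range(0, width, size)]                  # one tile per unit

print(len(scanline_units(640, 480)))     # 480 row-shaped work units
print(len(bucket_units(640, 480, 64)))   # 10 x 8 = 80 square work units
```

[Buckets give coarser, more cache-friendly chunks of the image; the clamping with `min` handles edge tiles when the image size is not a multiple of the bucket size.]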

Regards,

Jo