Rendering via CUDA

Started by PorcupineFloyd, October 01, 2009, 07:25:47 AM

Previous topic - Next topic

PorcupineFloyd

Errrrr I've just stumbled upon this: http://flam4.sourceforge.net/

This is basically an Apophysis-compatible fractal flame renderer which utilizes CUDA to do the computation.

I've downloaded it, opened a single .flame file, hit "Render!" and was completely shocked. It rendered at pretty nice quality at 5 - 7 FPS. Then I found that it can also render flames to disk, so I typed in a resolution of 3600 x 2400 and a quality of 2500 and hit render. After like 2 - 5 minutes it was done and I had to pick my jaw up off the floor.
Last week I was rendering the same flame in Apophysis and it took like 3 or 4 hours to complete.

Now imagine if Terragen could benefit from this, in one way or another. Maybe just parts of the rendering, if the whole thing can't be done, but you know... from 4 hours to 4 minutes - and that's just a GTX 260. Why bother with an i7 then?
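
To give a rough idea of why this maps so well onto a GPU: a fractal flame is basically the same handful of simple transforms iterated millions of times, and every orbit is independent, so each GPU thread can just run its own. Below is a tiny, hypothetical CUDA sketch of that idea - it is not Flam4's actual code, just the general shape of a chaos-game kernel accumulating hits into a histogram (tone-mapping the histogram into an image is left out):

#include <cstdio>
#include <cuda_runtime.h>

#define WIDTH  512
#define HEIGHT 512
#define ITERS  10000

// One thread = one independent chaos-game orbit. Each thread iterates a
// pair of affine maps and accumulates hits into a global histogram.
__global__ void chaosGame(unsigned int *histogram, unsigned int seed)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Cheap per-thread xorshift RNG (illustrative only).
    unsigned int rng = seed ^ (tid * 2654435761u);
    float x = 0.0f, y = 0.0f;

    for (int i = 0; i < ITERS; ++i) {
        rng ^= rng << 13; rng ^= rng >> 17; rng ^= rng << 5;

        float nx, ny;
        if (rng & 1) {            // transform 1: shrink toward the centre
            nx = 0.5f * x + 0.25f;
            ny = 0.5f * y + 0.25f;
        } else {                  // transform 2: rotate and shrink
            nx = -0.5f * y + 0.75f;
            ny =  0.5f * x + 0.25f;
        }
        x = nx; y = ny;

        int px = (int)(x * (WIDTH  - 1));
        int py = (int)(y * (HEIGHT - 1));
        if (px >= 0 && px < WIDTH && py >= 0 && py < HEIGHT)
            atomicAdd(&histogram[py * WIDTH + px], 1u);
    }
}

int main()
{
    unsigned int *d_hist;
    cudaMalloc(&d_hist, WIDTH * HEIGHT * sizeof(unsigned int));
    cudaMemset(d_hist, 0, WIDTH * HEIGHT * sizeof(unsigned int));

    chaosGame<<<256, 256>>>(d_hist, 12345u);      // 65,536 orbits in parallel

    unsigned int *h_hist = new unsigned int[WIDTH * HEIGHT];
    cudaMemcpy(h_hist, d_hist, WIDTH * HEIGHT * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);           // implicit sync with the kernel
    printf("histogram[centre] = %u\n", h_hist[(HEIGHT / 2) * WIDTH + WIDTH / 2]);

    delete[] h_hist;
    cudaFree(d_hist);
    return 0;
}

The real renderer obviously does far more per iteration (colour, variations, filtering), but the structure is the same, which is why the card chews through it so quickly.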

rcallicotte

Can't wait to try it.  Good find!  Thanks.
So this is Disney World.  Can we live here?

penang

Quote from: PorcupineFloyd on November 03, 2009, 06:19:27 PM
Errrrr I've just stumbled upon this: http://flam4.sourceforge.net/

This is basically an Apophysis-compatible fractal flame renderer which utilizes CUDA to do the computation.

I've downloaded it, opened a single .flame file, hit "Render!" and was completely shocked. It rendered at pretty nice quality at 5 - 7 FPS. Then I found that it can also render flames to disk, so I typed in a resolution of 3600 x 2400 and a quality of 2500 and hit render. After like 2 - 5 minutes it was done and I had to pick my jaw up off the floor.
Last week I was rendering the same flame in Apophysis and it took like 3 or 4 hours to complete.

Now imagine if Terragen could benefit from this, in one way or another. Maybe just parts of the rendering, if the whole thing can't be done, but you know... from 4 hours to 4 minutes - and that's just a GTX 260. Why bother with an i7 then?
Yes, I am a user of Flam4 as well, and yes, it *is* that fast!

Think of the cost of GPUs and then think of the cost of an i7. Given the example I outlined in my previous message (the HD 4870's performance is 10x that of the best i7 on the market), I can only come to one conclusion:

If the rendering in Terragen can be offloaded to the GPU (either Nvidia or ATI or both), then the performance of Terragen would jump by at least 10x, and, most importantly, the MARKET for Terragen would expand as well!

Imagine people not having to pay for 10 very expensive i7s and still enjoying that kind of rendering speed... think of how many more people trying out Terragen would gladly PAY to get it!

Ultimately it's going to be a win-win for both the users and the owners of Terragen!

PG

Obviously you can imagine that Planetside don't want to get into this yet, for very good reasons: they haven't perfected the program for CPU rendering yet, so if they started on integrated GPU rendering now they'd end up making two programs simultaneously. I'm still going to advocate my distributed computing idea here. While it's uneconomical and inefficient for Planetside to start working with CUDA or Stream or indeed OpenCL now, there are those of us in the community who already have experience with it.

I'm still waiting for David from nVidia to come back with ideas on how this would best work with CUDA (I don't know anyone at ATI, unfortunately), but from what he got from the dev team last time we spoke, they think it should be about a five month job with a team of 2 or 3 depending on the implementation. I mentioned a batching idea to them using a similar system to BigBen's BBAST program and they reckoned about 3 months for that. Creating a project for BOINC would be another month or so.
Figured out how to do clicky signatures

PorcupineFloyd

Or maybe Terragen could use some kind of external renderer, just like Maya and 3ds Max are able to do. I've seen some projects (like FurryBall) that are in fact external GPU renderers for those platforms. This way it would be easier for anybody to program an external rendering engine for Terragen. Or perhaps Planetside could hire another coder just for this (offloading some of the computation to the GPU).
It could also be a matter of offloading some parts of the rendering or workflow to the GPU, be it the preview, GI, or simply the computation of populations.
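
Population placement, for example, looks like a naturally parallel job: every candidate instance can be jittered, tested against a density mask and placed independently of all the others. Here is a purely hypothetical CUDA sketch of that idea - I have no idea how Terragen's populator actually works internally, and the hashNoise / terrainHeight functions below are just stand-ins for whatever shader and heightfield lookups the real thing would use:

#include <cstdio>
#include <cuda_runtime.h>

#define GRID 1024                        // 1024 x 1024 candidate cells

struct Instance { float x, y, z; int alive; };

// Stand-in "noise" used for jitter and for the density mask.
__device__ float hashNoise(int ix, int iy)
{
    unsigned int h = (unsigned int)ix * 73856093u ^ (unsigned int)iy * 19349663u;
    h ^= h >> 13; h *= 0x5bd1e995u; h ^= h >> 15;
    return (h & 0xFFFFFF) / 16777215.0f;           // 0..1
}

// Stand-in terrain: a real populator would sample the heightfield here.
__device__ float terrainHeight(float x, float y)
{
    return 0.0f;
}

// One thread per candidate cell: jitter a position inside the cell,
// evaluate the density mask, and keep or reject the instance.
__global__ void populate(Instance *out, float cellSize, float coverage)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= GRID || iy >= GRID) return;

    float x = (ix + hashNoise(ix + 7, iy)) * cellSize;
    float y = (iy + hashNoise(ix, iy + 7)) * cellSize;

    Instance inst;
    inst.x = x;
    inst.y = y;
    inst.z = terrainHeight(x, y);
    inst.alive = (hashNoise(ix, iy) < coverage) ? 1 : 0;

    out[iy * GRID + ix] = inst;
}

int main()
{
    Instance *d_out;
    cudaMalloc(&d_out, GRID * GRID * sizeof(Instance));

    dim3 block(16, 16);
    dim3 grid((GRID + 15) / 16, (GRID + 15) / 16);
    populate<<<grid, block>>>(d_out, 2.0f, 0.4f);  // ~40% coverage

    Instance *h_out = new Instance[GRID * GRID];
    cudaMemcpy(h_out, d_out, GRID * GRID * sizeof(Instance),
               cudaMemcpyDeviceToHost);

    long long count = 0;
    for (int i = 0; i < GRID * GRID; ++i) count += h_out[i].alive;
    printf("placed %lld of %d candidate instances\n", count, GRID * GRID);

    delete[] h_out;
    cudaFree(d_out);
    return 0;
}

A million candidates is one kernel launch; the slow part in reality would be the terrain and density lookups, not the bookkeeping.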

Oshyan

While we would love to take advantage of this technology soon, as PG has said the reality is that it's not really mature yet. OpenCL is the best bet since it is not specific to any one company's GPU technology, but it is still not finalized, much less widely understood or deployed.

This is also an important and very telling quote: "they think it should be about a five month job with a team of 2 or 3 depending on the implementation". Now if you estimate conservatively, which is always a good idea in software development, you would probably say 6 months with 3 people. We only have 2 developers *at all* currently, and even if we were able to add another one, that would mean stopping development on everything else to concentrate on this for 6 months. Is it worth it? Questionable. There are a lot of other features that could be added in that time.

The other thing to consider is that as time goes on and these systems and APIs mature, it will get easier and faster to develop for them. What may be a 6-month, 3-person job now could become a 3-month, 2-person job in a year. That seems like a much better use of time to me.

In the end we have to choose our development targets very carefully. We have limited resources and a lot of features to work on. The more successful we are (the more licenses we are able to sell), the more we can put back into financing faster and better development, and that's something we're committed to. So tell your friends to buy TG, or better yet buy it for them for Christmas. ;)

- Oshyan

penang

#21
Let me share a little bit of my programming experience:

There is always an endless to-do list for any worthwhile project: streamline this, speed up that, add this feature, fix that bug... and so on.

It's a chore to keep up with all of this, I tell you, but as someone who makes a living doing programming, I've learned to keep my head up by drawing up a plan.

You see, bug fixing is important, but I need to be able to sort out those bug reports and determine which bugs must be fixed NOW and which can wait.

Same with features.

There are features that would be nice to have but wouldn't add much to the program as a whole. These I put in the "to do" section of the list.

But then there are things that would add A LOT to the program. These I put in the "urgent" section of the list.

Speaking of Terragen, offloading the rendering to the GPU fits that description.

Nvidia doesn't have any GPU that can do double precision yet (maybe they will have one available by 2011) but ATI does, now.

ATI doesn't have CUDA, but it does have other tools available for programmers. Please refer to this page: http://developer.amd.com/GPU/Pages/default.aspx

There are SDKs available which can help programmers offload their computations onto ATI's GPUs.

Maybe the guys behind Terragen can take a look at what ATI has on offer, and maybe give it a spin.

My estimate of a 10x speedup with an ATI 4870 GPU, versus a 4-core i7 CPU, turns out to be very conservative.

A friend of mine who does video programming told me that by offloading double-precision calculations to a 4870 GPU he consistently gets over 40 times the speed of the best i7 on the market. He also told me that in some cases he has managed to tune the application to get almost a 60x speed boost.

40 - 60 times!!! Can you imagine that?

But even if Terragen gets a 10x speed boost, I would already be very happy!

And btw, I use the 4870 as the example since it has been on the market since July of last year and a lot of people are using it.

Did you know that the new 5870 (which is still in short supply right now) has TWICE the stream processors of the 4870? Instead of 800 stream processors (4870), the 5870 packs 1,600!

Which means that by next year, people using a 5870 would get at least a 20x speed boost (with the possibility of up to a 120x speed boost!) if Terragen could offload its rendering to the GPU.

Which means something that normally takes four hours to render could be completed in just two minutes.

How many of you would pick your jaws from the floor, if that happens?


:)

Quote from: Oshyan on November 06, 2009, 10:31:37 PM
While we would love to take advantage of this technology soon, as PG has said the reality is that it's not really mature yet. OpenCL is the best bet since it is not specific to any one company's GPU technology, but it is still not finalized, much less widely understood or deployed.

This is also an important and very telling quote: "they think it should be about a five month job with a team of 2 or 3 depending on the implementation". Now if you estimate conservatively, which is always a good idea in software development, you would probably say 6 months with 3 people. We only have 2 developers *at all* currently, and even if we were able to add another one, that would mean stopping development on everything else to concentrate on this for 6 months. Is it worth it? Questionable. There are a lot of other features that could be added in that time.

The other thing to consider is that as time goes on and these systems and APIs mature, it will get easier and faster to develop for them. What may be a 6-month, 3-person job now could become a 3-month, 2-person job in a year. That seems like a much better use of time to me.

In the end we have to choose our development targets very carefully. We have limited resources and a lot of features to work on. The more successful we are (the more licenses we are able to sell), the more we can put back into financing faster and better development, and that's something we're committed to. So tell your friends to buy TG, or better yet buy it for them for Christmas. ;)

- Oshyan

Oshyan

We've been doing this for a long time, hopefully we've learned how to prioritize development tasks by now. ;)

I think your estimates of speedup are appealing, but they're really based on a lot of assumptions. As yet no one has ported an existing production renderer to the GPU 1:1. There have been versions of CPU renderers converted to GPU renderers, with *similar* features, *similar* output, etc. (e.g. Vray RT and others), but as yet I haven't seen a successful CPU-to-GPU direct port where the features and output are identical, at least not for any major production rendering system. There's a good reason for this - rendering systems are highly complex. GPUs are great at some of the tasks necessary to make a fast renderer, but other tasks have to be adapted to work the way GPUs are most efficient.

There are also very important memory considerations and potential pitfalls. For example, right now people run into memory issues with TG2 scenes on machines with 4+GB of RAM and 64-bit OSs. Now imagine trying to render on a graphics card with 1GB of RAM (the current max for any normal consumer-level card). Of course there are various ways around this limitation, but it's just an example to show that it's not as simple as "just do it on the graphics card, it'll be faster".
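
Just to illustrate what "working around it" tends to mean in practice - and this is a generic, hypothetical sketch, not anything from TG2's code - you end up streaming the data through the card in chunks that fit in video memory, paying for a bus transfer on every pass:

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Trivial stand-in for whatever per-element work the renderer would do
// on the chunk that is currently resident in video memory.
__global__ void processChunk(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

int main()
{
    const size_t total = 64 * 1024 * 1024;         // pretend 256 MB of scene data
    const size_t chunk = 8 * 1024 * 1024;          // only 32 MB resident at a time

    std::vector<float> host(total, 2.0f);

    float *d_buf;
    cudaMalloc(&d_buf, chunk * sizeof(float));

    // Stream the data through the card one chunk at a time.
    for (size_t offset = 0; offset < total; offset += chunk) {
        size_t n = (total - offset < chunk) ? (total - offset) : chunk;

        cudaMemcpy(d_buf, host.data() + offset, n * sizeof(float),
                   cudaMemcpyHostToDevice);

        int threads = 256;
        int blocks = (int)((n + threads - 1) / threads);
        processChunk<<<blocks, threads>>>(d_buf, (int)n);

        cudaMemcpy(host.data() + offset, d_buf, n * sizeof(float),
                   cudaMemcpyDeviceToHost);
    }

    printf("host[0] = %f (expected 2.0)\n", host[0]);   // 2 * 0.5 + 1 = 2
    cudaFree(d_buf);
    return 0;
}

Every one of those transfers costs time, which is part of why "just do it on the graphics card" is not automatically a 10x win for a renderer whose working set is bigger than the card's memory.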

The long and the short of it is that, while the potential speedups are exciting, a lot of research would need to be done just to see how feasible it would be and what kind of speedup might be possible in practice. Then there is the actual implementation time. We're definitely keeping an eye on these technologies and will take advantage of them if and when we can do so to greatest effect. For the time being we have a lot of headroom in multithreading efficiency, caching, new rendering methods for objects (coming in the next release), and other areas that will be far more widely supported and available sooner.

- Oshyan

Kadri

#23
Oshyan, last night I was close to posting some things here like what you said (certainly not as technical).
I have a programmer friend, and when programmers say 2 months, in the end it is most of the time 2 or 3 times the length of what they said :D
I know LightWave and follow the other 3D programs. This CUDA/OpenCL thing is in its infancy right now. I am sure it will be seen in all of them in time.
But every team has its own schedule, I am sure. Who doesn't want rendering in their program to be 10 times faster? It would be a killer feature.
Don't get me wrong, these are nice things to read here, and everyone has the right to say something about this, guys.
In the end, what you all want are good things for TG2. :D

Anyway...

But I have a question, Oshyan. Maybe we won't see this on the rendering front in TG2 in the near term, but what about the 3D preview?
Is there a chance that we could see it there first? That would be a very good feature too. Not everything has to be perfect to be useful...

Cheers.

Kadri.

PorcupineFloyd

It would be lovely simply to have a 3D preview that utilizes all cores, and populators that don't take hours to populate a bigger area with trees or grass. Maybe the populators could easily use GPUs? It shouldn't be that hard to implement (compared to the whole renderer).

And as for what Oshyan said - I'm really happy that I bought a TG2 license. There is something special about very small software firms that make their products really valuable. You don't think long when deciding whether or not to spend the money.

PG

Quote from: penang on November 06, 2009, 11:29:22 PM
Nvidia doesn't have any GPU that can do double precision yet (maybe they will have one available by 2011) but ATI does, now.

The GTX 300 series has double precision, and CUDA 3.0 will fully utilise it too. Nvidia also accounts for about 70% of users in the GPU market.
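
For what it's worth, using double precision from CUDA is mostly a compile-target question. The hypothetical kernel below just uses plain double, but it has to be built for compute capability 1.3 or higher (e.g. nvcc -arch=sm_13), otherwise the compiler demotes the doubles to float and warns about it:

#include <cstdio>
#include <cuda_runtime.h>

// Double-precision dot-product partial sums: each thread accumulates
// its strided share of the two vectors in doubles.
__global__ void dotPartial(const double *a, const double *b,
                           double *partial, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    double sum = 0.0;
    for (int i = tid; i < n; i += stride)
        sum += a[i] * b[i];
    partial[tid] = sum;
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256, blocks = 64;

    double *h_a = new double[n], *h_b = new double[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0; h_b[i] = 2.0; }

    double *d_a, *d_b, *d_partial;
    cudaMalloc(&d_a, n * sizeof(double));
    cudaMalloc(&d_b, n * sizeof(double));
    cudaMalloc(&d_partial, threads * blocks * sizeof(double));
    cudaMemcpy(d_a, h_a, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, n * sizeof(double), cudaMemcpyHostToDevice);

    dotPartial<<<blocks, threads>>>(d_a, d_b, d_partial, n);

    double *h_partial = new double[threads * blocks];
    cudaMemcpy(h_partial, d_partial, threads * blocks * sizeof(double),
               cudaMemcpyDeviceToHost);

    double dot = 0.0;                      // finish the reduction on the CPU
    for (int i = 0; i < threads * blocks; ++i) dot += h_partial[i];
    printf("dot = %f (expected %f)\n", dot, 2.0 * n);

    delete[] h_a; delete[] h_b; delete[] h_partial;
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_partial);
    return 0;
}

Whether the hardware then runs it at a useful speed is a separate question - early double-precision units are much slower than the single-precision ones on the same card.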
Figured out how to do clicky signatures

Oshyan

The current 3D preview uses a modified version of the normal rendering engine, so if we were able to GPU-accelerate that, we ought to have GPU acceleration for the main renderer too. It's basically the same set of problems, and hence the same issues mentioned above. However, I do think multithreading for the 3D preview (and the populators) would help a lot. Currently TG2 scales well to 4 and in some cases 8 threads on appropriate hardware, and the 3D preview is single-threaded. So imagine it being 4 or almost 8 times faster. ;D

- Oshyan

PorcupineFloyd

Yes, it's kind of obvious that both the main renderer and the preview use the same engine, but making it use more than one thread would really make it useful :) Right now, on more complex projects, I'd rather set up a render node with a quality of 0.3 to act as a preview than use the preview window, because of how slow it is.

Kadri

Thanks , Oshyan.
It seems the next 1-2 builds will be very effective :D

Kadri.

Oshyan

Just to clarify, the next build will not include a multithreaded preview. But it's something we do, of course, want to include in the future. And yes, the new object rendering method coming in the next release will be a nice improvement. :)

- Oshyan