Rendering via CUDA

Started by PorcupineFloyd, October 01, 2009, 07:25:47 AM


PorcupineFloyd

http://www.mentalimages.com/products/iray

Looks like it's going to happen really soon. I wonder what advantages it will bring in terms of performance, and whether it'll be possible to implement CUDA-based rendering in TG2.

PG

#1
CUDA is/was already being used in a render farm program called BURP (Big and Ugly Rendering Project) on BOINC. They're a right pain about prioritising projects, but anyone could send them a project to be rendered and it would be done through distributed computing.
It's not very well integrated with particular programs though, unfortunately. This looks like it's actually a plugin, which would be really cool. And yeah, it'd be amazing to run TG2 on your GPU, but GPU rendering and GPGPU computing (or crunching) is typically massively intensive and can wear cards out pretty quickly. I've had to buy two new GPUs from running BOINC and Folding@home.

I'm still hopeful about the distributed rendering idea for TG2 that I posted a while ago. People seemed pretty interested in it, and it would lessen the load on people's GPUs if they only had to do a small section of a scene. I've been working with CUDA for the last few months and am in contact with a guy at nVidia who's helping me with it, so if Planetside are interested in helping me develop an application for BOINC (not coding, obviously they're way too busy for that) then we could have this.


Oh and those images. OMFG. And they probably rendered in 10 minutes on a GPU.

Edit: For those not familiar with just how powerful GPUs are, here's a comparison of the evolution of nVidia chips versus Intel CPUs.


Tangled-Universe

I always appreciate graphs explaining stuff, but not when they lack legends for the axes :(
I mean, there's virtually no difference on the left half of the X-axis and from there on the difference becomes huge, but what are the conditions? Are they purely experimental and not representative of users at home, etc.?

PG

Yeah, I just googled it :D The x axis is just time, 2003 to 2008. So from the NV30 to the GT200 nVidia chips, and from the Intel Northwood up to the Harpertown.

Henry Blewer

I burned out my graphics card just a while ago. I asked too much of it. I do not think this is a good idea. :'(

Cyber-Angel

I'd rather use my GPU than the current situation we have with CPU rendering, which I'm sure is a good way to shorten the life of your CPU; those of us with GPUs capable of TG2 rendering, and I speak for myself here. I think that right now TG2 uses what's called software rendering, where the rendering is done entirely by the software (thus CPU intensive), versus hardware rendering, where the software hands off the rendering to the hardware at render time, which I believe is faster (hardware permitting).

I mean, from what I've read, Mental Ray and Maya both have hardware rendering, albeit on certified hardware, so maybe that is what will have to happen somewhere down the road in the future of Terragen, if hardware rendering is implemented.

;D

Regards to you.

Cyber-Angel     

PG

Terragen should benefit massively from GPU rendering. Not just in terms of performance. CUDA allows you to arrange threads into blocks that use the same piece of shared memory and can collaborate on the kernel they're processing. This should mean that GI errors should be avoided and possibly missing polys in populations or displacements.
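
For anyone curious what blocks and shared memory actually look like, here's a minimal, purely illustrative CUDA sketch; none of it is TG2 code, and the tile size and half-brightness maths are made up. Whether any of this actually avoids GI artefacts is another question (see jo's reply below).

// Illustrative only: threads are grouped into 16x16 blocks, and each block
// cooperates through __shared__ memory.
__global__ void shadeTile(const float *in, float *out, int width)
{
    __shared__ float tile[16][16];          // visible to every thread in this block

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    tile[threadIdx.y][threadIdx.x] = in[y * width + x];   // each thread loads one sample
    __syncthreads();                        // wait until the whole block has loaded

    // threads in the same block could now read each other's samples from 'tile'
    out[y * width + x] = 0.5f * tile[threadIdx.y][threadIdx.x];
}

// Host side: one 16x16 block per image tile (width and height assumed to be
// multiples of 16 to keep the sketch short).
// dim3 block(16, 16);
// dim3 grid(width / 16, height / 16);
// shadeTile<<<grid, block>>>(d_in, d_out, width);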

jo

Hi PG,

Quote from: PG on October 02, 2009, 12:09:37 PM
Terragen should benefit massively from GPU rendering. Not just in terms of performance. CUDA allows you to arrange threads into blocks that use the same piece of shared memory and can collaborate on the kernel they're processing. This should mean that GI errors should be avoided and possibly missing polys in populations or displacements.

I'm sorry, but that's just really wrong in so many ways :-). I was going to try and explain it but can't think where to start. I mean that in the best possible way, I'm not trying to be offensive.

Regards,

Jo

jo

Hi,

Quote from: Cyber-Angel on October 02, 2009, 02:05:08 AM
I'd rather use my GPU than the current situation we have with CPU rendering, which I'm sure is a good way to shorten the life of your CPU; those of us with GPUs capable of TG2 rendering, and I speak for myself here.

Are you worried the transistors will wear out? ;-) If you're worried about overheating then you need a better CPU cooler, I guess.

Quote
I think that right now TG2 uses what's called software rendering, where the rendering is done entirely by the software (thus CPU intensive), versus hardware rendering, where the software hands off the rendering to the hardware at render time, which I believe is faster (hardware permitting).

The difference is in hardware. A GPU is essentially a very specialised vector processor (like the SSE unit on CPUs) with lots of cores, so it can do a lot of work in parallel. Their architecture is becoming more general, with better support for floating point numbers, but I think a lot of what is enabling GPGPU-type stuff is actually the software layers like CUDA that sit between the developer and the GPU and make it easier to program.

You still have to write your application to suit what GPUs are good at, which is processing lots of similar data in parallel.

I think there is still some settling down needed before TG2 could be made to use the GPU for final rendering. TG2 uses double precision floating point numbers extensively. GPGPU stuff seems to only just be settling down to fully supporting single precision floating point, and double precision is still a bit rudimentary. There also needs to be a clear leader in an API which looks like it will stick around and make it worth investing in. CUDA is still restricted to NVIDIA cards as far as I'm aware. I don't even own one (which is accidental rather than deliberate). I don't think CUDA is a good bet long term. I like the idea of OpenCL, which NVIDIA along with others do support, as a cross-platform, processor-agnostic API. It will be interesting to see how it works out. One of the good things about it is that it also supports vector units on CPUs, so if there were parts of TG2 which we could rewrite to work with OpenCL then that would also get us SSE unit support on the CPU, for example.
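
As a concrete illustration of the double precision point: on NVIDIA hardware of this generation, double precision needs compute capability 1.3 (GT200) or better, and you can query that through the standard CUDA runtime API. A minimal sketch, with error handling left out:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Compute capability 1.3 or higher is what gives you double precision
        // on this generation of NVIDIA cards.
        bool hasDouble = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
        printf("%s: compute capability %d.%d, double precision: %s\n",
               prop.name, prop.major, prop.minor, hasDouble ? "yes" : "no");
    }
    return 0;
}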

Regards,

Jo

PG

#9
As far as I've been told by nVidia's rep, CUDA can be used with a driver API that lets you execute a single function with different arguments on each thread. So whatever function governs, well, whatever it is, you execute that but change the input arguments for which bit needs to be rendered next. That's what he said, anyway.
I'm still relatively new to CUDA, so I'll ask the rep if he thinks it's viable.
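
For what it's worth, the usual way this works in CUDA doesn't need different arguments per thread at all: every thread runs the same kernel and derives its own piece of work from its block and thread IDs. A rough sketch; the "bucket" idea here is hypothetical and not how TG2 actually splits up a frame:

// Every thread computes a unique bucket index from its IDs and works on that
// bucket; the kernel itself is launched once with the same arguments for all.
__global__ void renderBuckets(float *image, int numBuckets)
{
    int bucket = blockIdx.x * blockDim.x + threadIdx.x;   // unique per thread
    if (bucket < numBuckets) {
        image[bucket] = (float)bucket;   // stand-in for real shading work
    }
}

// Launch: 256 threads per block, enough blocks to cover every bucket.
// renderBuckets<<<(numBuckets + 255) / 256, 256>>>(d_image, numBuckets);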

penang

#10
Quote from: jo on October 09, 2009, 01:29:49 AM
I think there is still some settling down needed before TG2 could be made to use the GPU for final rendering. TG2 uses double precision floating point numbers extensively. GPGPU stuff seems to only just be settling down to fully supporting single precision floating point, and double precision is still a bit rudimentary. There also needs to be a clear leader in an API which looks like it will stick around and make it worth investing in. CUDA is still restricted to NVIDIA cards as far as I'm aware.

To answer the double precision point: ATI has a GPU that does double precision. http://ati.amd.com/products/streamprocessor/specs.html

Cards are already on the market, made by AMD itself. Very expensive for the moment, more than 900 dollars for a card with that GPU and 2GB of GDDR5.

To answer CUDA being restricted to Nvidia cards: http://www.maximumpc.com/article/news/cuda_running_a_radeon

But there is an alternative, OpenCL, which AMD backs, as well as Intel, Nvidia, IBM, Samsung and many more.

So back to the question: will we be able to offload Terragen's rendering to the GPU if the GPU can do double precision?

PG

Well, the GeForce GTX 2xx series can support double precision, with the 192-core card achieving 30 FPU/s with 64-bit units. My 216-core card can do 37. Plus they're about £100 :D
Still waiting for word from nVidia on the capability of CUDA for this kind of thing.

PorcupineFloyd

It's still a matter of coding / porting it so it'll work with CUDA or ATI's version. I really wonder what benefits OpenCL will bring.

PG

I don't know a massive amount about OpenCL; does anyone know how it uses threads for GPGPU? CUDA up to 2.3 runs one function across a given number of threads and performs the operations on different values on each core, while 3.0 will allow multiple functions to be run. From what I've read, OpenCL and Stream are different, but I couldn't find anything specific.

penang

#14
Since we don't have the code base of TG2, (almost) no one can know whether it can be ported to CUDA / OpenCL.

However, the availability of GPUs that do double precision FP means a lot.

Refer to this page --- http://en.wikipedia.org/wiki/AMD_FireStream#AMD_stream_processing_lineup

Take, for instance, the ATI FireStream 9270 (or HD 4870).

It has 800 stream cores and can run a maximum of 16,384 threads in parallel, clocked at 750 MHz.

Compare this to one of the best Intel (desktop) CPUs of today: the i7-960, with 4 cores, clocked at 3.46 GHz.

Do the math and you'll come to the realization that, for real BANG for your buck, GPUs really outshine CPUs.

Even if we halve the performance of GPUs for double precision FP, and halve it again just for fun, the net result is still mind boggling: the HD 4870 would still be capable of rendering pictures with double precision FP at 10x the speed of the i7-960!
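
For anyone who actually wants to do that math, here's a rough back-of-the-envelope using the peak figures commonly quoted at the time (treat them as assumptions, and remember no renderer gets anywhere near peak on either kind of chip):

HD 4870 / FireStream 9270, single precision peak: 800 cores x 750 MHz x 2 FLOPs per clock ≈ 1.2 TFLOPS
Double precision on that chip runs at roughly one fifth of that: ≈ 240 GFLOPS
Core i7, 4 cores at 3.46 GHz, 4 double precision FLOPs per cycle per core via SSE: ≈ 55 GFLOPS

So on paper the GPU's double precision peak is several times the CPU's, and its single precision peak is more than twenty times higher. How much of that survives contact with a real renderer is exactly the open question in this thread.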