Heterogeneous System Architecture (HSA)

Kadri · February 16, 2014, 04:53:31 PM

I wasn't aware of this.

Probably, like CUDA (less so) and OpenCL (more so), this too will take a long time before all of us get to see it.

http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/

Like many things, it looks nice on paper.

But to me it looks more reasonable than some approaches that use only the GPU.

PabloMack · February 16, 2014, 08:44:28 PM

I hope you don't mind if I chime in.

I just bought one of these AMD APUs. I have done a lot of research in this area lately, so I will share some of my thoughts with you. To build the new system, I started with a CoolerMaster Elite 130 enclosure. It is very inexpensive, very compact and very nice looking. It is also well-ventilated, considering that these small enclosures usually house low-power systems. I mounted an ASRock FM2A88X-ITX+ motherboard into it. I have never used the Mini-ITX form factor before, so the experience was thrilling. These boards are tiny compared to ATX and the CPU dominates the board. The A10-7850K is currently AMD's flagship Kaveri APU and it comes with a comparatively modest heat sink and cooling fan. Since I am not overclocking, I plan to use what came in the box. This little enclosure takes a standard ATX power supply, but the little recess in the back of the enclosure tends to cover one edge of the PSU's air intake.

What makes HSA different from traditional systems is the integration of the graphics unit with a multi-core CPU on the same die. But Kaveri is different from all previous APUs in one important respect. Systems have traditionally been designed with the GPU and its memory isolated from the CPU and its memory. The two send data back and forth over a high-speed serial link; for a discrete graphics card that link is PCI Express (AMD's "HyperTransport" is its processor interconnect). With Kaveri, the CPU and GPU are on the same die and can actually share the same memory. But in code that talks to a GPU in the traditional way, as with CUDA, the programmer has to set aside redundant buffers in both CPU and GPU memory, and the program still sends data back and forth, using more memory than is now actually needed. The main program has to send a lot of data over the link to the GPU, and the same has to happen in the other direction. Keeping the memories separate does cut down on memory contention in traditional systems, but there is a significant lag while the communication takes place. If the communication lag plus the GPU computation takes longer than the CPU would have taken to do the work itself, then GPU code actually slows down program execution. So programmers who write applications that take advantage of the GPU must design their code wisely.
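The break-even condition described above can be written down as a small timing model. This is only a sketch under simple assumptions (one fixed latency per direction, constant link bandwidth, no overlap of transfer and compute); the function names and all the numbers in the example are hypothetical and are not measurements of any real system.

```python
def offload_time_s(bytes_to_gpu, bytes_from_gpu, link_bytes_per_s,
                   link_latency_s, gpu_compute_s):
    """Estimated wall time for one GPU offload over a CPU-GPU link.

    Assumed model: one latency hit per direction, plus the time to
    push all bytes through the link, plus the GPU compute time.
    """
    transfer_s = (2 * link_latency_s
                  + (bytes_to_gpu + bytes_from_gpu) / link_bytes_per_s)
    return transfer_s + gpu_compute_s


def offload_pays_off(cpu_compute_s, **offload_params):
    """True only if the whole round trip beats doing the work on the CPU."""
    return offload_time_s(**offload_params) < cpu_compute_s


# Hypothetical numbers: 64 MB each way over an 8 GB/s link.
params = dict(bytes_to_gpu=64e6, bytes_from_gpu=64e6,
              link_bytes_per_s=8e9, link_latency_s=10e-6,
              gpu_compute_s=0.005)

print(offload_pays_off(cpu_compute_s=0.050, **params))  # CPU is slow: offload wins
print(offload_pays_off(cpu_compute_s=0.020, **params))  # transfers eat the gain
```

In this model the GPU kernel itself is four to ten times faster than the CPU in both cases; whether the offload is worth it depends entirely on how much of that gain the transfers consume.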

What is new with the Kaveri processors is that the CPU can actually pass the GPU the addresses of where the data resides in shared memory. The lag time for communications is virtually eliminated. To take advantage of this, AMD wrote a new API called Mantle that can run 50% to 300% faster than DirectX. As I understand it, the popular video game "Battlefield 4" takes advantage of this new API and runs very fast on a Kaveri processor. I got a free license for this game when I bought the A10-7850K. The gotcha with shared memory is that you must have very fast memory in your system, because the CPU and GPU share an address space and so contend for the same data paths to main memory. One very exciting thing about Kaveri is that it supports 4K video. I plan to move my installation of Hitfilm 2 Ultimate over to this new computer so that I can do true 4K editing.
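The difference between copying buffers and passing an address can be illustrated, by analogy only, in plain Python. This is not HSA, Mantle, or any GPU API: `double_all` is a hypothetical stand-in for a GPU kernel, a `bytearray` copy stands in for a transfer over the link, and a `memoryview` stands in for handing over a pointer into shared memory.

```python
def double_all(buf):
    """Stand-in for a GPU kernel: doubles every byte in place (mod 256)."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def run_with_copies(host):
    """Traditional model: a separate device buffer and two transfers."""
    device = bytearray(host)   # copy host -> device (redundant buffer)
    double_all(device)         # "GPU" works on its own copy
    host[:] = device           # copy device -> host
    return 2 * len(host)       # bytes moved over the "link"


def run_shared(host):
    """Shared-memory model: the kernel gets a view of the same bytes."""
    double_all(memoryview(host))  # no copy, results land in host directly
    return 0                      # bytes moved over the "link"


a = bytearray([1, 2, 3, 200])
b = bytearray([1, 2, 3, 200])
print(run_with_copies(a), list(a))
print(run_shared(b), list(b))
```

Both paths produce the same result; the shared path simply never allocates the second buffer or moves any bytes, which is the saving the post describes.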

I have another reason for wanting HSA. Since 1988 I have been designing a parallel programming language. In 2009 I started to write a code generator that supports the x86 32-bit architecture. I plan to migrate to 64 bits once the backend that I use supports it. The backend is also undergoing changes to support the ARM-7 architecture. I was very excited to find out that AMD will be supporting HSA programming through what they call HSAIL (HSA Intermediate Language). This is sort of an assembly language for programming the GPU, as I understand it. It keeps the hardware somewhat abstract so that the details of future implementations will be hidden from the programmer. Once I modify my compiler to generate HSAIL code and implement more parallel programming features, this new language will be able to do parallel operations implicitly, something that neither CUDA nor OpenCL can do. They are very explicit non-parallel programming languages.
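The contrast between explicit and implicit parallelism can be sketched in generic terms. The code below is neither the language described in this post nor CUDA, OpenCL, or HSAIL syntax; `add_kernel`, `launch`, and `Vec` are hypothetical names, and everything runs serially on the CPU. It only shows where the per-element indexing is written: by the programmer (explicit style) or hidden behind a whole-array operation (implicit style).

```python
def add_kernel(i, a, b, out):
    """Explicit style: the programmer writes the work for one index."""
    out[i] = a[i] + b[i]


def launch(kernel, n, *args):
    """Explicit style: the programmer also asks for n instances.

    A real GPU runtime would run these instances in parallel;
    this stand-in just loops over the indices.
    """
    for i in range(n):
        kernel(i, *args)


class Vec:
    """Implicit style: '+' applies to whole vectors, so the language
    or library is free to decide how to parallelize it."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        if len(self.items) != len(other.items):
            raise ValueError("length mismatch")
        return Vec(x + y for x, y in zip(self.items, other.items))


out = [0, 0, 0]
launch(add_kernel, 3, [1, 2, 3], [10, 20, 30], out)
print(out)

print((Vec([1, 2, 3]) + Vec([10, 20, 30])).items)
```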

For those of you who are interested in programming languages, I have prepared a number of training videos on the one I have been developing. I have posted them on YouTube:
https://www.youtube.com/watch?v=BK5HJtWLZrI
https://www.youtube.com/watch?v=JCQtcaFwjJM
https://www.youtube.com/watch?v=8-IbcNSmwQ4
https://www.youtube.com/watch?v=ldjE1m2OtAM
https://www.youtube.com/watch?v=X1-xk0dh0FI
https://www.youtube.com/watch?v=TNuxAI6MvHg
https://www.youtube.com/watch?v=wwuD79eX7p8
https://www.youtube.com/watch?v=Nsr4DZ8Ogmo
https://www.youtube.com/watch?v=QEOIelAsktI

Also some tutorials:
https://www.youtube.com/watch?v=XBHc4SOL-Ms
https://www.youtube.com/watch?v=Gf0gCYkrvfI
https://www.youtube.com/watch?v=hj2XUn3VQvo
https://www.youtube.com/watch?v=Dlw6h05OjCM
https://www.youtube.com/watch?v=9sbRsOHGPw0
https://www.youtube.com/watch?v=y_QM-t6uGUQ
https://www.youtube.com/watch?v=a_CoFjPescw

Kadri · February 17, 2014, 12:12:50 PM

Thanks Pablo.
I did not like CUDA because it was only for Nvidia hardware.
Do you know how HSA compares in that respect? Does it have any chance?
