AMD's Guide To Using Boltzmann ROCK/ROCR & HCC On Linux

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • AMD's Guide To Using Boltzmann ROCK/ROCR & HCC On Linux

    Phoronix: AMD's Guide To Using Boltzmann ROCK/ROCR & HCC On Linux

    Last week AMD launched GPUOpen and began shipping their new and open code. Today the company has published a guide for taking advantage of the Boltzmann stack with their Radeon Open Compute Kernel and Runtime...

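    As a quick sanity check of a Boltzmann ROCK/ROCR install, a minimal program that initializes the runtime and lists the agents it exposes looks roughly like the sketch below. This is not taken from AMD's guide; it only uses the standard HSA runtime API that ROCR implements, and it assumes the libhsa-runtime64 library and headers from the stack are installed (link with -lhsa-runtime64).

```cpp
// Hypothetical sanity check (not from AMD's guide): initialize the ROCR/HSA
// runtime and print the agents it exposes.
#include <cstdio>
#include <hsa.h>

static hsa_status_t print_agent(hsa_agent_t agent, void* /*data*/)
{
    char name[64] = {0};
    hsa_device_type_t type;

    hsa_agent_get_info(agent, HSA_AGENT_INFO_NAME, name);
    hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
    std::printf("agent: %s (%s)\n", name,
                type == HSA_DEVICE_TYPE_GPU ? "GPU" : "CPU/other");
    return HSA_STATUS_SUCCESS;   // keep iterating over the remaining agents
}

int main()
{
    if (hsa_init() != HSA_STATUS_SUCCESS) {
        std::fprintf(stderr, "hsa_init failed -- is the amdkfd driver loaded?\n");
        return 1;
    }
    hsa_iterate_agents(print_agent, nullptr);
    hsa_shut_down();
    return 0;
}
```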

  • #2
    How far back can I hope to see HSA support for discrete GPUs? I assume at some point it becomes technically impractical for the architecture, but how far back is that?

    • #3
      CI is the first generation with support for user queues, HW scheduling and AQL, but there's a limit on MEC microcode store size, so at the moment we can't fit support for PM4 (what graphics uses), AQL (what HSA uses) and HW scheduling (what HSA also uses) in a single image. Kaveri has two MEC blocks, so we were able to configure one for AQL+HWS and the other for AQL+PM4, but IIRC the dGPUs only have a single MEC block, so that approach won't work there.

      VI doesn't have those limits, and it also adds the ability to interrupt the execution of long-running shader threads and roll the context out to memory so you can context-switch even if individual shaders are running for seconds or minutes. We call that Compute Wave Save/Restore (CWSR) and the latest KFD release (part of the Boltzmann stack under GPUOpen) includes CWSR support for Carrizo.

      I'm tempted to add support for Hawaii even if it means we don't have HW scheduling (although I need to check if PCIE atomics work on it) -- if so then Bonaire support would more-or-less come along for the ride -- but going back to SI gets hard because it doesn't have any of the uncore support required for HSA. Ask me again in a month if it makes sense to define an HSA subset that could run on SI, I should have an answer by then.

      For hardware earlier than SI the ISA changes completely, so it probably doesn't make sense at all.
      Last edited by bridgman; 06 February 2016, 11:44 AM.

      • #4
        bridgman
        I thought Carrizo has a VI GPU. Did you mean Kaveri has two MEC blocks and has been configured for AQL+HWS and AQL+PM4?

        • #5
          What about Tonga? It is VI after all.

          • #6
            Tonga paths are already in the code we released, but they have had hardly any testing. The initial release is really focused on Fiji.

            We did initial dGPU development on Tonga but switched over as Fijis became available, which was pretty early in the process.
            Last edited by bridgman; 02 February 2016, 08:24 AM.

            • #7
              All of this seems like a very good start, but to me it is missing:

              - Some performance figures. Are these HSA ports of the tools better than the OpenCL versions, performance-wise? How do they compare with CUDA? For example, hccaffe says it has been ported to hcc with C++AMP (see the sketch at the end of this post): what are the gains with Fiji compared to a 16-core CPU, on AlexNet for example? And compared to CUDA on a similar NVIDIA card? If there are no figures, it gives the impression the performance is bad and you don't want to show it.
              - Some information about the plans to mainline your branches into the master branches of these libraries. When? Are you even going to do that? The problem with separate branches is that the libraries change often and get bug fixes, so the branches will soon be outdated. At least we have some info for gcc and OpenMP support.
              - Is just Fiji supported? Tonga should get support too, as well as older SI cards in some form (OK, perhaps they're no good for HSA, but they could do hcc; hccaffe shouldn't require HSA as it seems to do now).

              I understand you don't want to spend time on older cards, but if a dev has an old SI card and wants to try whether your libraries work, and he can't get them working on his card, he's not going to buy a new card just to test. And when the time comes to buy new cards to extend the compute capabilities of his lab, he's going to buy NVIDIA because his try with AMD didn't work.
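              For those who haven't seen it, the C++AMP style that hcc compiles looks roughly like the sketch below -- a made-up vector add, not code taken from hccaffe, assuming the usual C++AMP <amp.h> header and concurrency namespace:

```cpp
// Rough, hypothetical example of the C++AMP dialect that hcc compiles:
// a trivial vector add, not code taken from hccaffe.
#include <amp.h>
#include <vector>

void vec_add(const std::vector<float>& a,
             const std::vector<float>& b,
             std::vector<float>& c)
{
    using namespace concurrency;

    const int n = static_cast<int>(c.size());
    array_view<const float, 1> av(n, a);
    array_view<const float, 1> bv(n, b);
    array_view<float, 1> cv(n, c);
    cv.discard_data();            // no need to copy c's old contents to the GPU

    // The lambda is compiled for the GPU; restrict(amp) marks it as such.
    parallel_for_each(extent<1>(n), [=](index<1> i) restrict(amp) {
        cv[i] = av[i] + bv[i];
    });
    cv.synchronize();             // copy the result back into c
}
```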
              Last edited by mannerov; 02 February 2016, 09:26 AM.

              • #8
                Originally posted by mannerov View Post
                All of this seems like a very good start, but to me it is missing:

                - Some performance figures. Are these HSA ports of the tools better than the OpenCL versions, performance-wise? How do they compare with CUDA? For example, hccaffe says it has been ported to hcc with C++AMP: what are the gains with Fiji compared to a 16-core CPU, on AlexNet for example? And compared to CUDA on a similar NVIDIA card? If there are no figures, it gives the impression the performance is bad and you don't want to show it.
                It's just a "getting started" guide.

                Originally posted by mannerov View Post
                - Some information about the plans to mainline your branches into the master branches of these libraries. When? Are you even going to do that? The problem with separate branches is that the libraries change often and get bug fixes, so the branches will soon be outdated. At least we have some info for gcc and OpenMP support.
                Thought I had already talked about that both here and on IRC, but in case I didn't:

                - current code is not upstreamable because it relies on hard-pinning memory from userspace (as do the other HPC APIs out there today) -- see the sketch after this list for what that looks like on the application side
                - we wanted to get the functionality into devs' hands early (it's a "developer preview") but are in the process of adding support for GPU access to unpinned memory
                - an updated RFC should go out in the next day or two describing the associated changes
                - we're trying to get into 4.6; not sure we will be able to get it all in, but we will definitely start pushing dGPU support and Carrizo fixes into 4.6
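                To make the pinning point concrete, the sketch below is a rough, hypothetical illustration (not code from our stack) of what applications do today: host memory gets explicitly registered, which pins it, before the GPU uses it. With unpinned-memory support that explicit step should no longer be needed for ordinary system memory.

```cpp
// Hypothetical illustration of "hard-pinning from userspace" with the
// ROCR (HSA) runtime: the buffer is registered (pinned) before GPU use.
#include <cstdio>
#include <cstdlib>
#include <hsa.h>

int main()
{
    const size_t bytes = 1u << 20;          // 1 MiB scratch buffer
    void* buf = std::malloc(bytes);

    if (buf == nullptr || hsa_init() != HSA_STATUS_SUCCESS)
        return 1;

    // Pin the pages so the GPU can access them by address.
    if (hsa_memory_register(buf, bytes) != HSA_STATUS_SUCCESS)
        std::fprintf(stderr, "registration (pinning) failed\n");

    /* ... build queues and dispatch kernels that read/write buf ... */

    hsa_memory_deregister(buf, bytes);      // unpin
    hsa_shut_down();
    std::free(buf);
    return 0;
}
```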

                Originally posted by mannerov View Post
                - Is just Fiji supported? Tonga should get support too, as well as older SI cards in some form (OK, perhaps they're no good for HSA, but they could do hcc; hccaffe shouldn't require HSA as it seems to do now).
                - Fiji is the real focus based on power efficiency, bandwidth and compute power, with Kaveri & Carrizo also supported. Tonga has HW issues re: HSA support -- we have workarounds in the code for some, but we don't know yet if we can work around all of them

                - SI cards don't have any HSA features other than a limited ability to access memory via ATC over the PCIE bus -- no user queues, no HW scheduler, nothing. The ability to support HSA features really starts with CI and goes up from there

                Originally posted by mannerov View Post
                I understand you don't want to spend time on older cards, but if a dev has an old SI card and wants to try whether your libraries work, and he can't get them working on his card, he's not going to buy a new card just to test. And when the time comes to buy new cards to extend the compute capabilities of his lab, he's going to buy NVIDIA because his try with AMD didn't work.
                - this is why we are being very clear about what is supported -- we have Kaveri and Carrizo support today and are looking to see if we can add support for other hardware, but I really don't think SI hardware has much of a chance here. The SI generation is essentially NI with a GCN core; it wasn't until CI that we started adding HSA hardware support

                • #9
                  Response unapproved and bogus "next page" message again. Bleah!

                  ... and "Edit" locks up forever, which is maybe the most aggravating. The only way to save a work-in-progress post without writing it in a different tool is to post the response and then edit it... but when it gets moderated you can't even edit, and you don't know when it is going to appear so you *can* finish working on it.

                  ... and when you click on the "next page" link your unapproved message disappears along with the "Edit" button. Auggh!!!
                  Last edited by bridgman; 02 February 2016, 11:04 PM.

                  • #10
                    OK, can't wait any longer, will leave post as is and come back in the morning.

                    mannerov, re: publishing performance numbers -- AFAIK we weren't planning to do that until the 1.0 release; this is just the first of the developer previews we talked about at SC15.