AMD's Background On The ROCm OpenCL Stack

Written by Michael Larabel in AMD on 5 July 2017 at 06:32 AM EDT. 27 Comments

A ROCm (Radeon Open eCosystem) developer at AMD has shared some of their background work on their OpenCL compiler stack, including the LLVM focus, as well as some of their current performance focuses for this open-source compute offering.

Gregory Stoner has written in a GitHub comment about their past and ongoing work around ROCm OpenCL performance. A Phoronix reader pointed this comment out to me, and it's interesting enough that I've copied it below. His comments came up in a thread about why ROCm performance is currently slow for Ethereum mining.

Now the AMDGPUpro driver for Vega10 supports the new Lightning Compiler and ROCm stack as well.

When we started the ROCm project, we made a decision to build out a fully open source solution, which meant we needed to move away from the traditional Shader Compiler used in our graphics stack, since it was staying proprietary. The traditional flow was a two-stage compiler: we would compile the code to an intermediate language, HSAIL, which would then be picked up, finalized, and compiled by our shader compiler. This is the same backend used by graphics shaders.

This journey started in earnest a little over a year ago, when we looked for the best way forward to a fully open source compiler. We began with the LLVM R600 codebase, which needed a bit of work to become a production-class compiler. But it was the right foundation to meet our goal of a fully open stack.

With this transition, we know we will have performance gaps, which we are working to close. What we need from the community is help in testing a broader set of applications, reporting the gaps, and potentially doing some analysis of why they occur. One thing we have also seen is that sometimes you need to code differently for the LLVM compiler than for the SC-based compiler to get the best performance out of it.

We are now active in the LLVM community, pushing upgrades to the code base to better enable GPU computing. Our changes are also upstreamed into the LLVM repository.

Note one significant change: the compiler now generates GCN ISA binary objects directly. This change makes it easier for the compiler to support inline ASM for all of our languages (OpenCL, HCC, HIP), as well as native assembler and disassembler support. It is also a critical foundation for our math library and MIOpen projects.

For the last year, we have spent more time focusing on FIJI and Vega10 with Deep Learning Frameworks, MIOpen, and GEMM solvers. We have also been filling in the gaps in LLVM for the optimizations we need for GPU computing, and improving the scheduler, register allocator, loop optimizer, and a lot more. It is a bit of work, as you can imagine. But we have already seen where the effort has been worth it, since it is faster on a number of the codes.

We test things like the following on the compiler:

Benchmarks: BabelStream, SHOC, Mixbench, Lattice, ViennaCL, CoMD, LULESH, XSBench, Rodinia, DeepBench
Libraries: clFFT, rocBLAS, rocFFT, MIOpen
Applications:
    OpenCL: Torch-CL, Gromacs
    HIP: Caffe, Torch, TensorFlow
    HCC: NAMD
Internal tests we built up for OpenCL performance
Conformance tests:
    OpenCL 1.2 and 2.0 conformance tests
    HCC conformance tests

Note that the above is a small sample of what we run on the compiler. We do A/B comparisons.

New tests recently added: Radeon Rays, SideFX Houdini Test, Blender, Radeon ProRender.
We are in the process of adding a number of currency mining apps.
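
As an illustration only (this is not AMD's tooling, and not part of Stoner's comment), the A/B comparisons mentioned above could be sketched roughly as follows. The function names, the 5% noise threshold, and the timing values are all assumptions made up for this example; the idea is simply to run the same benchmarks under two compiler builds and flag where the new build is slower.

```python
from statistics import geometric_mean


def ab_compare(baseline, candidate, threshold=0.05):
    """Compare per-benchmark runtimes (seconds, lower is better).

    baseline and candidate map benchmark name -> runtime for compiler A
    and compiler B. Only benchmarks present in both runs are compared.
    threshold is a hypothetical noise band: speedups within +/- 5% of
    1.0 are reported as neutral.
    """
    report = {}
    for name in sorted(baseline.keys() & candidate.keys()):
        speedup = baseline[name] / candidate[name]  # > 1.0 means B is faster
        if speedup < 1.0 - threshold:
            verdict = "regression"
        elif speedup > 1.0 + threshold:
            verdict = "improvement"
        else:
            verdict = "neutral"
        report[name] = (speedup, verdict)
    return report


def overall_speedup(report):
    """Geometric mean of the per-benchmark speedups."""
    return geometric_mean([speedup for speedup, _ in report.values()])


if __name__ == "__main__":
    # Placeholder timings invented for this sketch, not measured results.
    old_compiler = {"bench_a": 12.0, "bench_b": 30.0, "bench_c": 8.0}
    new_compiler = {"bench_a": 10.0, "bench_b": 36.0, "bench_c": 8.1}
    results = ab_compare(old_compiler, new_compiler)
    for name, (speedup, verdict) in results.items():
        print(f"{name}: {speedup:.2f}x ({verdict})")
    print(f"overall: {overall_speedup(results):.2f}x")
```

A geometric mean is used for the summary so that one very large speedup or slowdown does not dominate the overall figure the way an arithmetic mean of ratios would.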

On the ray tracing side, we are just starting performance analysis and optimization that is more specific to this class of work. What you will see over the summer is that we will be focusing on compiler optimizations for currency mining and ray tracing. I just have to stage this work in with the team. I saw you referenced the Phoronix article: for ROCm 1.5, the new compiler was faster than LLVM/HSAIL/SC on FIJI for Blender, but for Luxmark we were slower. //www.phoronix.com/review/rocm-15-opencl&num=2

One thing I will leave you with is that we built a standardized loader, linker, and object format. This allows us to do something you never could do with the AMDGPUpro driver: upgrade the compiler before we release a new driver. So we can now address issues independently of the base driver for OpenCL, HCC, and HIP and the base LLVM compiler foundation.

And for those wondering, I will shortly have some more OpenCL benchmarks of ROCm 1.6, which was released last week.
