Benchmarking Radeon Open Compute ROCm 1.4 OpenCL
Last month's ROCm 1.4 release from AMD/GPUOpen delivered on OpenCL support, although for this initial release not all of the code is open-source yet. I tried out ROCm 1.4 with the currently supported GPUs to see how its OpenCL performance compares to the AMDGPU-PRO OpenCL implementation.
December's ROCm 1.4 release brought an OpenCL 2.0 compatible kernel language implementation and an OpenCL 1.2 compatible runtime. ROCm, for the uninitiated, is the Radeon Open Compute effort for HPC and Ultrascale GPU computing.
The project is explained on their GitHub project site, "Using our knowledge of the HSA Standards and, more importantly, the HSA Runtime, we have been able to successfully extended support to the dGPU with critical features for accelerating NUMA computation. As a result, the ROCK driver is composed of several components based on our efforts to develop the Heterogeneous System Architecture for APUs, including the new AMDGPU driver, the Kernel Fusion Driver (KFD), the HSA+ Runtime and an LLVM based compilation stack which provides support for key languages."
This developer preview of OpenCL support in ROCm is limited to Fiji (R9 Fury series) and Baffin/Ellesmere (Radeon RX 400 series) hardware, so for this article I tested ROCm 1.4 using the Radeon R9 Fury, RX 460, and RX 480. AMD provides binary Debian packages of ROCm built for Ubuntu 16.04, which include a patched Linux 4.6 based kernel.
ROCm 1.4 still depends upon some binary blobs, but the developers are working to either deprecate that functionality or open-source those components in the future. The OpenCL component is entirely closed-source for this release, along with hsa-ext-rocr-dev. As AMD developers have explained in our forums, they are still working to fully open up their OpenCL driver stack, and in the long run this ROCm approach will be their solution: they are no longer investing in the Clover-based OpenCL Gallium3D driver, etc.
On a clean installation of Ubuntu 16.04 LTS x86_64 it was very easy to set up ROCm 1.4, and I was quickly off to running OpenCL benchmarks after making some adjustments to the test profiles to deal with all of the components being housed in /opt/rocm rather than conventional paths.
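For readers wondering what such a path adjustment can look like, here is a minimal sketch in Python. It is not the change made to the test profiles for this article. The helper name opencl_build_env is hypothetical, and the opencl/include and opencl/lib/x86_64 subdirectories are assumptions about the layout under /opt/rocm that should be checked on the installed system. CPATH and LIBRARY_PATH are the environment variables GCC consults for extra header and library directories, and LD_LIBRARY_PATH is read by the dynamic linker at run time.

```python
import os


def opencl_build_env(prefix="/opt/rocm", base_env=None):
    """Return a copy of an environment with a non-standard OpenCL prefix added.

    The subdirectory names below are assumptions for illustration only;
    verify the real layout under the prefix before relying on them.
    """
    env = dict(os.environ if base_env is None else base_env)
    include_dir = os.path.join(prefix, "opencl", "include")    # assumed layout
    lib_dir = os.path.join(prefix, "opencl", "lib", "x86_64")  # assumed layout

    def prepend(var, value):
        # Put the new directory first so it wins over system-wide copies.
        old = env.get(var)
        env[var] = value if not old else value + os.pathsep + old

    prepend("CPATH", include_dir)        # extra header search path for GCC
    prepend("LIBRARY_PATH", lib_dir)     # extra link-time library path for GCC
    prepend("LD_LIBRARY_PATH", lib_dir)  # run-time library search path
    return env
```

The returned dictionary could then be passed as the env argument of subprocess.run when building or launching an OpenCL test, leaving the calling shell's environment untouched.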
With the same GPUs on the same system I also ran some OpenCL benchmarks using the AMDGPU-PRO 16.50 driver. The OpenCL driver with ROCm 1.4 was identified as AMD-APP 2300.5, while on AMDGPU-PRO 16.50 it was marked as AMD-APP 2236.5. The rest of the stack, including the Linux 4.6-based kernel, was set up by the ROCm installation through its Debian archive. All of the OpenCL benchmarks were run via the open-source Phoronix Test Suite benchmarking software. The Ubuntu 16.04 test system had an Intel Xeon E3-1280 v5 Skylake CPU, MSI C236A WORKSTATION motherboard, 16GB of DDR4 memory, and a 256GB OCZ RD400 NVMe SSD.
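When switching between two OpenCL stacks like this, it helps to confirm which driver build is actually active before each run. The following is a rough sketch, assuming the driver reports a string containing "AMD-APP" followed by a dotted build number, optionally in parentheses. The helper name amd_app_build and the sample strings are hypothetical illustrations, not captured output from either driver.

```python
import re


def amd_app_build(version_string):
    """Extract the AMD-APP build number as a tuple of ints, or None if absent."""
    match = re.search(r"AMD-APP\s*\(?(\d+(?:\.\d+)*)\)?", version_string)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Illustrative strings only; the exact text a driver reports may differ.
rocm_build = amd_app_build("OpenCL 1.2 AMD-APP (2300.5)")
pro_build = amd_app_build("AMD-APP 2236.5")

# Tuples compare element by element, so this checks which build is newer.
rocm_is_newer = rocm_build > pro_build
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes when build numbers have different digit counts.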