LLVM Merges Initial Support For OpenMP Kernel Language

Written by Michael Larabel in LLVM on 6 October 2023 at 10:00 AM EDT.
Merged into LLVM 18 Git yesterday was initial support for the OpenMP kernel language, an effort to provide performance-portable GPU code as an alternative to the likes of the proprietary CUDA.

This work is a set of extensions to LLVM OpenMP developed by researchers at Stony Brook University and Lawrence Livermore National Laboratory (LLNL). See their 2023 research paper for the full details; the abstract sums up the OpenMP kernel language effort:
"In this work, we introduce extensions to LLVM OpenMP, transforming it into a versatile and performance portable kernel language for GPU programming. These extensions allow for the seamless porting of programs from kernel languages to high-performance OpenMP GPU programs with minimal modifications. To evaluate our extension, we implemented a proof-of-concept prototype that contains a subset of extensions we proposed. We ported six established CUDA proxy and benchmark applications and evaluated their performance on both AMD and NVIDIA platforms. By comparing with native versions (HIP and CUDA), our results show that OpenMP, augmented with our extensions, can not only match but also in some cases exceed the performance of kernel languages, thereby offering performance portability with minimal effort from application developers."

Merged yesterday to LLVM Git is just the initial support for this OpenMP kernel language:
"This patch starts the support for OpenMP kernel language, basically to write OpenMP target region in SIMT style, similar to kernel languages such as CUDA. What included in this first patch is the ompx_bare clause for target teams directive. When ompx_bare exists, globalization is disabled such that local variables will not be globalized. The runtime init/deinit function calls will not be emitted. That being said, almost all OpenMP executable directives are not supported in the region, such as parallel, task. This patch doesn't include the Sema checks for that, so the use of them is UB. Simple directives, such as atomic, can be used. We provide a set of APIs (for C, they are prefix with ompx_; for C++, they are in ompx namespace) to get thread id, block id, etc."
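To illustrate what this SIMT-style programming looks like, here is a minimal vector-add sketch using the ompx_bare clause and ompx_-prefixed query functions described in the commit message. This is an illustrative assumption-laden example, not vetted code: the exact function names (ompx_block_id_x and friends) follow the naming pattern the commit describes, and building it requires a Clang/LLVM 18 (or newer) toolchain with GPU offloading configured.

```c
#include <stdio.h>
#include <ompx.h>   // LLVM offload header providing the ompx_ query APIs

#define N 1024
#define BLOCK_SIZE 256

int main(void) {
  float a[N], b[N], c[N];
  for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

  // A CUDA-style launch of <<<N / BLOCK_SIZE, BLOCK_SIZE>>> maps onto
  // num_teams(...) and thread_limit(...). The ompx_bare clause disables
  // globalization of local variables and skips the runtime init/deinit
  // calls, so the region behaves like a bare SIMT kernel.
  #pragma omp target teams ompx_bare num_teams(N / BLOCK_SIZE) \
      thread_limit(BLOCK_SIZE) map(to: a, b) map(from: c)
  {
    // Equivalent of blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    int i = ompx_block_id_x() * ompx_block_dim_x() + ompx_thread_id_x();
    if (i < N)
      c[i] = a[i] + b[i];
  }

  printf("c[10] = %f\n", c[10]);  // expected: 30.0 (10 + 2*10)
  return 0;
}
```

Note that, per the commit, most OpenMP executable directives (parallel, task, etc.) must not appear inside an ompx_bare region, and since the Sema checks for this are not yet in place, using them there is undefined behavior.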

Ultimately the hope is these extensions will ease the transition from kernel languages like CUDA to OpenMP in a portable and cross-vendor manner. As outlined in the aforelinked research paper, the early proof-of-concept performance results have been very promising compared to NVIDIA CUDA and AMD HIP. It will be very interesting to see how this OpenMP kernel language work progresses with mainline LLVM.