AMD Begins Prototyping CRIU Support For ROCm Compute

Written by Michael Larabel in AMD on 1 May 2021 at 12:45 PM EDT.
As part of AMD's growing HPC focus and the maturing of its Radeon Open eCosystem (ROCm) GPU compute stack, the company closed out this week by making public a prototype implementation of CRIU support for ROCm.

AMD's Radeon open-source graphics software developers are working on Checkpoint/Restore In Userspace (CRIU) handling for ROCm. CRIU makes it possible to freeze a running process and archive it to disk so that it can be thawed/restored later on. This user-space solution is, of course, much trickier when it comes to handling processes interacting with the GPU.
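For readers unfamiliar with the freeze/thaw workflow, here is a minimal sketch of how an ordinary CPU-only process might be checkpointed and restored with the stock `criu` command-line tool. The helper names (`criu_dump_cmd`, `criu_restore_cmd`, `checkpoint`) and the example image directory are illustrative assumptions, not anything from the AMD patches, and nothing here covers GPU state, which is exactly what the prototype work is meant to address.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def criu_dump_cmd(pid: int, images_dir: str, leave_running: bool = False) -> List[str]:
    """Build the criu command that checkpoints the process tree rooted at pid."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir, "--shell-job"]
    if leave_running:
        # By default criu kills the task after dumping; this keeps it alive.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir: str) -> List[str]:
    """Build the criu command that restores a process tree from saved images."""
    return ["criu", "restore", "--images-dir", images_dir, "--shell-job"]


def checkpoint(pid: int, images_dir: str) -> None:
    """Hypothetical helper: freeze a process and archive its state to disk."""
    if shutil.which("criu") is None:
        raise RuntimeError("criu not found in PATH")
    Path(images_dir).mkdir(parents=True, exist_ok=True)
    # criu generally needs root privileges to dump another process.
    subprocess.run(criu_dump_cmd(pid, images_dir), check=True)


if __name__ == "__main__":
    # Only print the commands; running them requires criu and a real PID.
    print(" ".join(criu_dump_cmd(1234, "/tmp/ckpt")))
    print(" ".join(criu_restore_cmd("/tmp/ckpt")))
```

The command builders are kept separate from the function that actually runs `criu` so the command lines can be inspected without root access or a live target process.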

Overnight an initial set of patches was posted for the AMD Radeon graphics "AMDKFD" kernel code to support CRIU. These 17 patches, with more than two thousand lines of new kernel code, are still in a "request for comments" / prototyping stage.

Ultimately they are working towards upstreaming this checkpoint/restore support in the AMDKFD driver, where it will be usable by the ROCm stack so that ROCm applications can be CRIU'ed. The new kernel ioctl for these capabilities is not yet finalized, so it may be a while before this support is squared away.

In any case, for those interested in CRIU around AMD Radeon compute workloads, see this patch series for more details.