AMD Proposing Redesign For How Linux GPU Drivers Work - Explicit Fences Everywhere

Written by Michael Larabel in Mesa on 20 April 2021 at 08:26 AM EDT. 38 Comments

Well-known open-source AMD Linux graphics driver developer Marek Olšák published an initial proposal this week for "a redesign of how Linux graphics drivers work."

This redesign, which can safely co-exist with the current driver behavior, is about using explicit fences everywhere and a new memory management approach that doesn't make use of buffer object (BO) fences.

Here's the summary of the current situation and what Marek (and in turn, the AMD driver developers) are looking to address around Linux video memory handling:

The current Linux graphics architecture was initially designed for GPUs with only one graphics queue where everything was executed in the submission order and per-BO fences were used for memory management and CPU-GPU synchronization, not GPU-GPU synchronization. Later, multiple queues were added on top, which required the introduction of implicit GPU-GPU synchronization between queues of different processes using per-BO fences. Recently, even parallel execution within one queue was enabled where a command buffer starts draws and compute shaders, but doesn't wait for them, enabling parallelism between back-to-back command buffers. Modesetting also uses per-BO fences for scheduling flips. Our GPU scheduler was created to enable all those use cases, and it's the only reason why the scheduler exists.

The GPU scheduler, implicit synchronization, BO-fence-based memory management, and the tracking of per-BO fences increase CPU overhead and latency, and reduce parallelism. There is a desire to replace all of them with something much simpler.
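To make the distinction in the summary above more concrete, here is a toy Python sketch of the two models. It is not taken from Marek's proposal or from any real driver: the names `Fence`, `Buffer`, `submit_implicit` and `submit_explicit` are hypothetical, and the "submission" here only computes which fences a job would have to wait on.

```python
from dataclasses import dataclass, field


@dataclass
class Fence:
    """Hypothetical stand-in for a GPU fence: signaled once the job is done."""
    name: str
    signaled: bool = False


@dataclass
class Buffer:
    """Hypothetical buffer object (BO) carrying its own fence list."""
    name: str
    fences: list = field(default_factory=list)


def submit_implicit(job, buffers):
    """Implicit sync: wait on every unsignaled fence attached to every BO
    the job touches, then attach the job's own fence to each of those BOs."""
    waits = [f for bo in buffers for f in bo.fences if not f.signaled]
    done = Fence(job)
    for bo in buffers:
        bo.fences.append(done)  # per-BO tracking cost paid on every submit
    return waits, done


def submit_explicit(job, wait_fences):
    """Explicit sync: the caller names exactly the fences to wait on;
    no per-BO bookkeeping happens at submission time."""
    waits = [f for f in wait_fences if not f.signaled]
    return waits, Fence(job)


shared = Buffer("shared")

# Implicit: job B is serialized behind job A just because both touch `shared`.
_, fence_a = submit_implicit("job A", [shared])
waits_b, _ = submit_implicit("job B", [shared])
print([f.name for f in waits_b])

# Explicit: job D waits on job C only because the caller asked for it.
_, fence_c = submit_explicit("job C", [])
waits_d, _ = submit_explicit("job D", [fence_c])
print([f.name for f in waits_d])
```

In this simplified model the implicit path has to walk and update a fence list on every buffer for every submission, which is the kind of per-BO tracking the summary blames for extra CPU overhead and lost parallelism. The explicit path leaves the dependency decision to the caller.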

So while the current open-source Linux graphics drivers are quite performant and generally competitive with Windows, there is a desire to do better. With his proposed simpler approach, Marek is ultimately hoping for lower latency and greater performance through the removal of per-buffer-object fences. The proposal also lays out changes around per-process VRAM usage quota handling and more.

Under the plan, implicit synchronization would be deprecated over time. The switch could potentially happen as part of new hardware bring-up in the driver, where new hardware would only be implemented with the explicit-fences-everywhere design.

Those interested in all the fine technical details of Marek's proposal can find it on the mailing list.

Intel driver developer Jason Ekstrand commented that they have been looking at pursuing a "user-space fences" approach, which it turns out was also a suggestion made by AMD's Windows driver developers, but it was then deemed insufficient and may introduce new security implications. Long story short, Linux GPU memory management is now being discussed for significant improvement or re-architecting moving forward. With Intel moving ahead with its high-performance discrete graphics endeavours and AMD picking up more sizable wins for Linux-based supercomputers and more, further enhancing these kernel drivers is of interest to all parties.
