"Guilty" API Proposed For Better Communicating Why Radeon GPUs Hang/Reset

  • "Guilty" API Proposed For Better Communicating Why Radeon GPUs Hang/Reset

    Phoronix: "Guilty" API Proposed For Better Communicating Why Radeon GPUs Hang/Reset

    A set of patches to the AMDGPU Linux kernel driver and Mesa's RADV Vulkan driver would make it easier to relay the reasons why a GPU hang/reset occurs, so that user-space software can be better informed about any issues...

  • #2
    How about trying to implement something like this in DRM? It's something Windows has had since Vista, while Linux still doesn't have it (although I heard AMDGPU has some very limited reset ability). Also, I've always wondered why incorrect graphics API usage leads to GPU hangs on Linux, while this is not the case on Windows. The fact that the Linux graphics stack is so prone to GPU hangs implies that WDDM is still years ahead.


    • #3
      What I want to see is the user session recovering after a GPU hang. That's still not happening.


      • #4
        I'm not 100% sure if the bug is in the Radeon driver, Firefox, or KWin (or a combination), but I do get intermittent Wayland crashes when playing embedded videos in Firefox (RX 6700 GPU).

        Would be nice to track down the cause since this is the only real thing that causes instability in my current setup.


        • #5
          Originally posted by user1
          How about trying to implement something like this in DRM? It's something Windows has had since Vista, while Linux still doesn't have it (although I heard AMDGPU has some very limited reset ability). Also, I've always wondered why incorrect graphics API usage leads to GPU hangs on Linux, while this is not the case on Windows. The fact that the Linux graphics stack is so prone to GPU hangs implies that WDDM is still years ahead.
          Well:

          1. At least with amdgpu+sway/wlroots, GPU resets worked fine for me (I had several of them due to a GPU firmware bug).
          2. You can crash Windows GPU drivers in the same way too. I had issues in Blender on the Windows notebook from my workplace.
          3. Microsoft is dictating stuff like WDDM to GPU manufacturers. So yeah, it is easier when you do not have to fight someone like Nvidia to implement basic stuff in their driver.


          • #6
            I use Fedora 38 with the default Gnome desktop on Wayland.
            Never had a single crash using my cheap-ass RX 6400.
            I've only had to follow the instructions here:


            • #7
              Originally posted by user1
              How about trying to implement something like this in DRM? It's something Windows has had since Vista, while Linux still doesn't have it (although I heard AMDGPU has some very limited reset ability). Also, I've always wondered why incorrect graphics API usage leads to GPU hangs on Linux, while this is not the case on Windows. The fact that the Linux graphics stack is so prone to GPU hangs implies that WDDM is still years ahead.
              My Arc GPU resets just fine and I even get to keep my kwin_wayland session. I would like to see no resets at all, but I guess Arc is too new for now.


              • #8
                Originally posted by -MacNuke-

                Well:

                1. At least with amdgpu+sway/wlroots, GPU resets worked fine for me (I had several of them due to a GPU firmware bug).
                2. You can crash Windows GPU drivers in the same way too. I had issues in Blender on the Windows notebook from my workplace.
                3. Microsoft is dictating stuff like WDDM to GPU manufacturers. So yeah, it is easier when you do not have to fight someone like Nvidia to implement basic stuff in their driver.
                1. That's what I was talking about, but again, it's much more limited and exclusive to AMDGPU. On Windows, on the other hand, it's standardized across drivers thanks to WDDM and works in many more cases.
                2. If the graphics driver is buggy (which is true of the AMD Windows drivers, at least on RDNA), then even the timeout recovery might not work 100% of the time. I assume Blender crashed your GPU drivers simply because of driver bugs, not because of incorrect graphics API usage like on Linux, which again is the reason GPU hangs are not really a thing on Windows.
                3. Yeah, nothing is worse than the way Nvidia supports Linux, which overall is an even bigger PITA. The whole Mesa/DRM infrastructure is of course a step in the right direction, because this way we have at least some GPU driver standardization for non-Nvidia cards, but it still lags behind WDDM. I mean, look at the WDDM Wikipedia article and see how many features have been introduced into WDDM over the years, all of them of course standardized across all the vendors (even when ignoring some Windows-specific features). I bet DRM only has a fraction of that. Even things like GPU temperature reading are part of WDDM and standardized.


                • #9
                  Originally posted by RejectModernity

                  My Arc GPU resets just fine and I even get to keep my kwin_wayland session. I would like to see no resets at all, but I guess Arc is too new for now.
                  It's nice to hear that Intel also has it, but it seems like on Linux the reset mechanism is implemented separately in each driver. On Windows, however, it's standardized across drivers.


                  • #10
                    Originally posted by user1
                    is the reason gpu hangs are not really a thing on Windows
                    You say it like it is a regular thing on Linux. I have used Linux for like 15 years now and GPU resets are a very rare thing for me. I have had more GPU driver crashes across all vendors under Windows 10 and 11 than on Linux, while using Linux for many more things.

                    And like I said, if the Linux GPU subsystem people could dictate what GPU vendors have to implement, such things would be standard on Linux too. That's why things are working better with amdgpu (and maybe Intel) than with Nvidia's stuff.
