Thread: Performance Counters Stuck For Linux GPU Drivers

  1. #1
    Join Date
    Jan 2007
    Posts
    14,647

    Phoronix: Performance Counters Stuck For Linux GPU Drivers

    While it mostly concerns developers, another current shortcoming of the open-source Linux graphics drivers is the lack of suitable performance counters support...

    http://www.phoronix.com/vr.php?view=MTI5MTQ

  2. #2
    Join Date
    Aug 2011
    Location
    Hillsboro, Oregon
    Posts
    134

    We've actually been meaning to support GL_AMD_performance_monitor for some time now. I started on it a while back, but got busy with other things and never finished it.

    We do have a few things: intel-gpu-top shows how busy the GPU is, a breakdown by unit, render vs. blitter usage, and basic counters like VS/PS invocations. However, the 'top' style interface is not always the most useful, since it doesn't let you capture data over time. We also have a small proof-of-concept program called 'chaps' in intel-gpu-tools which exposes Ironlake's MI_REPORT_PERF counters. Ultimately, the idea is to expose those via GL_AMD_performance_monitor. We currently don't have that for Sandy Bridge/Ivy Bridge though, sadly. As Michael mentioned, there are some hoops to jump through, but I think once someone writes the code that should be doable.

    Another new tool is Eric's INTEL_DEBUG=shader_time, which shows a breakdown of how many clock cycles were used by each vertex shader, 8-wide fragment shader, and 16-wide fragment shader. It's extremely useful for determining which shaders are the most expensive (sometimes large shaders are seldom used, while smaller shaders are used extremely frequently, so guessing doesn't always work). That allows us to focus our optimization efforts. (Sadly this is Ivybridge only, since the timestamp register didn't exist prior to that.)

    But overall I agree, we need more performance counters, and need to expose them to application developers.

  3. #3
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,072

    I'd like some Radeon performance counters. Pretty please?

  4. #4
    Join Date
    Feb 2009
    Location
    France
    Posts
    291

    Yeah, I was pleased to see intel-gpu-top and have been looking forward to doing the same on nouveau/nvidia. Anyway, it is going to be hard to find hw-independent measures to expose to game developers, but good candidates would be shader execution time, memory bandwidth usage (in percent), PCIe bandwidth usage, shader engine usage, and the usual cache misses (those will be hard to make hw-independent and could live in the hw-dependent part).
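
    To make the "usage in percent" idea concrete, here is a minimal sketch in C of how a hardware-specific raw counter could be turned into a hardware-independent percentage. Everything in it is an assumption for illustration: the function name, the idea that the driver exposes a monotonically increasing 64-bit counter (bytes transferred, busy cycles, ...), and that the peak rate of the part is known. It is not taken from nouveau or any other driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert two readings of a raw, monotonically
 * increasing counter into a utilization percentage.
 *
 * count_start / count_end : raw counter values at the start and end of
 *                           the sampling interval (assumed 64-bit)
 * elapsed_seconds         : length of the sampling interval
 * peak_units_per_second   : theoretical maximum rate for this hardware,
 *                           in the same unit as the counter
 */
double utilization_percent(uint64_t count_start, uint64_t count_end,
                           double elapsed_seconds,
                           double peak_units_per_second)
{
    assert(elapsed_seconds > 0.0 && peak_units_per_second > 0.0);

    /* Unsigned subtraction still gives the right delta if the counter
     * wrapped around once during the interval. */
    uint64_t delta = count_end - count_start;

    double pct = 100.0 * (double)delta /
                 (peak_units_per_second * elapsed_seconds);

    /* Clamp: measurement jitter can push the ratio slightly above 100. */
    if (pct > 100.0)
        pct = 100.0;
    return pct;
}
```

    The hw-dependent part is then reduced to two things: which raw counter to read and what the peak rate is. The value handed to the application has the same meaning on every GPU.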

  5. #5
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,072

    Does nv expose stall reasons? Those would be nice too, if they exist. For example, the fragment blocks not being able to run because the vertex shader has not completed for that whole block, and so on. (Define block as thread group, or whatever it's called on nv.)

  6. #6
    Join Date
    Aug 2011
    Location
    Hillsboro, Oregon
    Posts
    134

    Quote Originally Posted by MPF View Post
    Yeah, I was pleased to see intel-gpu-top and have been looking forward to doing the same on nouveau/nvidia. Anyway, it is going to be hard to find hw-independent measures to expose to game developers, but good candidates would be shader execution time, memory bandwidth usage (in percent), PCIe bandwidth usage, shader engine usage, and the usual cache misses (those will be hard to make hw-independent and could live in the hw-dependent part).
    That's one of the nice things about the GL_AMD_performance_monitor extension, though...it just exposes a generic counter mechanism. Applications can query GL to get a list of available counters (organized in groups), and then get data from them. It doesn't actually specify what counters are available, so you can expose whatever your hardware offers.
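
    As a rough illustration of what "get data from them" looks like on the application side: the extension returns results as a flat buffer of 32-bit words, laid out as repeated {group ID, counter ID, value} records, where the value takes one or two words depending on the counter's type. The sketch below decodes such a buffer in plain C without needing a GL context. The names `counter_type`, `counter_sample`, `type_lookup_fn`, `demo_type_lookup` and `parse_perfmon_results` are hypothetical, and the local enum only stands in for the GL counter-type enums (GL_UNSIGNED_INT, GL_UNSIGNED_INT64_AMD, GL_FLOAT, GL_PERCENTAGE_AMD).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the counter types defined by the extension. Real code
 * would compare against the GL enums from the extension header. */
typedef enum { CT_UINT, CT_UINT64, CT_FLOAT, CT_PERCENTAGE } counter_type;

typedef struct {
    uint32_t     group;
    uint32_t     counter;
    counter_type type;
    double       value;   /* widened for convenience; lossy above 2^53 */
} counter_sample;

/* Hypothetical callback returning the type of (group, counter). With a
 * real driver this would wrap glGetPerfMonitorCounterInfoAMD() queried
 * with GL_COUNTER_TYPE_AMD. */
typedef counter_type (*type_lookup_fn)(uint32_t group, uint32_t counter);

/* Made-up lookup for demonstration only: counter 0 is a 32-bit count,
 * counter 1 a 64-bit count, anything else a percentage. */
counter_type demo_type_lookup(uint32_t group, uint32_t counter)
{
    (void)group;
    if (counter == 0) return CT_UINT;
    if (counter == 1) return CT_UINT64;
    return CT_PERCENTAGE;
}

/* Walk a result buffer of {group, counter, value} records.
 * Returns the number of samples decoded (at most max_out); stops early
 * on a truncated record. */
size_t parse_perfmon_results(const uint32_t *words, size_t word_count,
                             type_lookup_fn lookup,
                             counter_sample *out, size_t max_out)
{
    assert(words != NULL && lookup != NULL && out != NULL);

    size_t i = 0, n = 0;
    /* Each record needs two header words plus at least one value word. */
    while (i + 2 < word_count && n < max_out) {
        counter_sample s;
        s.group   = words[i];
        s.counter = words[i + 1];
        s.type    = lookup(s.group, s.counter);

        if (s.type == CT_UINT64) {
            if (i + 4 > word_count)
                break;                      /* truncated 64-bit value */
            uint64_t v;
            memcpy(&v, &words[i + 2], sizeof v);
            s.value = (double)v;
            i += 4;
        } else if (s.type == CT_UINT) {
            s.value = (double)words[i + 2];
            i += 3;
        } else {                            /* float or percentage */
            float f;
            memcpy(&f, &words[i + 2], sizeof f);
            s.value = f;
            i += 3;
        }
        out[n++] = s;
    }
    return n;
}
```

    In a real application the buffer would come from glGetPerfMonitorCounterDataAMD() with GL_PERFMON_RESULT_AMD, after polling GL_PERFMON_RESULT_AVAILABLE_AMD, and the group and counter names would be fetched with glGetPerfMonitorGroupStringAMD() and glGetPerfMonitorCounterStringAMD(). Nothing in the decoding loop depends on which counters the hardware offers, which is the point being made above.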

  7. #7
    Join Date
    Feb 2009
    Location
    France
    Posts
    291

    Quote Originally Posted by Kayden View Post
    That's one of the nice things about the GL_AMD_performance_monitor extension, though...it just exposes a generic counter mechanism. Applications can query GL to get a list of available counters (organized in groups), and then get data from them. It doesn't actually specify what counters are available, so you can expose whatever your hardware offers.
    I'm not a big fan of exposing everything in a hw-dependent way, but this is better than nothing for sure! Exposing a more restricted subset of the performance counters, with clearly defined semantics, should be of interest to game developers.
