Page 2 of 2
Results 11 to 20 of 20

Thread: Radeon Gallium3D R600g Color Tiling Performance

  1. #11
    Join Date
    Oct 2012
    Posts
    17

    Default

    Quote Originally Posted by pingufunkybeat View Post
    In your case, it is about 4ms faster at rendering a single texture.

    Optimising 4ms away is really hard work, especially if it consists of 100 different milliseconds collected across different parts of the driver. That's what my armchair response was about.
    4 ms is really a very long time, a huge number of CPU instructions. Plus, CPU utilization is not high. This is not about code optimization. I am certain it is not a hundred little things; it must be a couple of big ones.

  2. #12
    Join Date
    Aug 2012
    Posts
    245

    Default

    Quote Originally Posted by pingufunkybeat View Post
    In your case, it is about 4ms faster at rendering a single texture.

    Optimising 4ms away is really hard work, especially if it consists of 100 different milliseconds collected across different parts of the driver. That's what my armchair response was about.
    Please elaborate on what optimizations should be done on rendering a single image? In such a simple process, pretty much the same "calls" to the GPU should be made, no?
    The driver's bottleneck is the CPU; that's where it runs as a program, no? But low CPU usage pretty much eliminates this possibility for this case. So the GPU is the bottleneck, and something must be wrong there. Such a simple case indicates that something is done wrong: a speed-up feature not used, or extra/different usage of the GPU. And it doesn't seem like many small things, more like a couple of bigger ones, as mentioned. I doubt it took AMD 15 years to optimize rendering a (single) texture. One possibility, maybe invalid: could it have to do with texture compression? Test it with a simple gradient instead and see there too. xD
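    The gradient idea could be prototyped along these lines; a minimal sketch (plain Python, no GL bindings, function name is my own) that builds an uncompressed RGBA gradient suitable for a glTexImage2D upload, so texture compression is taken out of the equation entirely:

```python
def make_gradient_rgba(width, height):
    """Build a horizontal black-to-white gradient as raw RGBA bytes.

    An uncompressed buffer like this can be handed straight to
    glTexImage2D(..., GL_RGBA, GL_UNSIGNED_BYTE, data), so texture
    compression cannot play any role in the benchmark.
    """
    data = bytearray(width * height * 4)
    for y in range(height):
        for x in range(width):
            v = x * 255 // (width - 1) if width > 1 else 0
            i = (y * width + x) * 4
            data[i:i + 4] = bytes((v, v, v, 255))  # opaque grey ramp
    return bytes(data)

tex = make_gradient_rgba(256, 256)
```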

  3. #13
    Join Date
    Jun 2009
    Posts
    2,927

    Default

    Quote Originally Posted by Rigaldo View Post
    Please elaborate on what optimizations should be done on rendering a single image? In such a simple process, pretty much the same "calls" to the GPU should be made, no?
    Like I said, this is for driver developers to answer, I lack the knowledge. Marek and Alex have already written that all hardware functionality is used (only HiZ is not on by default). If I remember correctly, Jerome Glisse did profile the driver and couldn't find one single bottleneck, but many small ones. I can't find a link at the moment, perhaps somebody is better at googling.

    The driver's bottleneck is the CPU, it's where it works as a program, no? But low CPU usage pretty much eliminates this possibility for this case.
    Only if they operate completely asynchronously. If the GPU ever has to wait for the driver before continuing, then no.

    Even if your processor is mostly idle, a simple cache miss might cause a considerable delay while your GPU is waiting for the next instruction.

    But again, I'm not a GPU developer. I just don't believe that using less than 100% of CPU all the time means that there are no bottlenecks in the driver. A 1ms delay is a 1ms delay, even if it only happens occasionally.
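    That point can be made concrete with a little arithmetic; a hypothetical sketch (all numbers made up for illustration) of how occasional 1 ms sync stalls barely move CPU utilization yet still cost real frame time:

```python
# Hypothetical numbers: a 4 ms GPU frame, 0.5 ms of driver CPU work,
# and an occasional 1 ms stall where the GPU waits on the driver.
frames = 1000
gpu_ms, cpu_ms, stall_ms = 4.0, 0.5, 1.0
stall_every = 10  # one frame in ten hits the stall

total_ms = 0.0
for f in range(frames):
    frame = max(gpu_ms, cpu_ms)       # CPU and GPU work overlap
    if f % stall_every == 0:
        frame += stall_ms             # serialization: everyone waits
    total_ms += frame

cpu_util = frames * cpu_ms / total_ms  # fraction of time the CPU is busy
fps = frames / (total_ms / 1000.0)

print(f"CPU utilization: {cpu_util:.0%}")  # ~12%, far from saturated
print(f"FPS: {fps:.0f} (would be {1000 / gpu_ms:.0f} without stalls)")
```

    With these made-up numbers the CPU sits around 12% busy, yet the stalls still shave the frame rate from 250 to about 244, which is exactly the "low CPU usage does not prove no driver bottleneck" argument.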

  4. #14
    Join Date
    Jul 2009
    Posts
    250

    Default

    Quote Originally Posted by pingufunkybeat View Post
    Like I said, this is for driver developers to answer, I lack the knowledge. Marek and Alex have already written that all hardware functionality is used (only HiZ is not on by default). If I remember correctly, Jerome Glisse did profile the driver and couldn't find one single bottleneck, but many small ones. I can't find a link at the moment, perhaps somebody is better at googling.


    Only if they operate completely asynchronously. If the GPU ever has to wait for the driver before continuing, then no.

    Even if your processor is mostly idle, a simple cache miss might cause a considerable delay while your GPU is waiting for the next instruction.

    But again, I'm not a GPU developer. I just don't believe that using less than 100% of CPU all the time means that there are no bottlenecks in the driver. A 1ms delay is a 1ms delay, even if it only happens occasionally.
    Aren't open source drivers single-threaded? I wouldn't wonder that the CPU isn't fully used. It would be interesting to see the CPU run in single-core mode and then compare to Catalyst...

  5. #15
    Join Date
    Dec 2007
    Posts
    2,341

    Default

    2D tiling should show a bigger improvement on bigger cards since they have more memory channels.
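    The reason tiling interacts with memory channels comes down to address arithmetic; a toy sketch (the 8x8 tile size and layout are illustrative, not R600's actual tile format) comparing linear and tiled addressing:

```python
TILE = 8  # toy 8x8 pixel tile; real R600 tile modes differ

def linear_offset(x, y, pitch):
    """Row-major addressing: a vertical neighbour is a whole pitch away."""
    return y * pitch + x

def tiled_offset(x, y, pitch):
    """Tiled addressing: each 8x8 block of pixels is contiguous, so
    vertically adjacent texels tend to share cache lines and pages,
    and blocks can be spread across memory channels."""
    tiles_per_row = pitch // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    return tile_index * TILE * TILE + (y % TILE) * TILE + (x % TILE)

# Distance in pixels between (0,0) and the texel directly below it:
pitch = 1024
print(linear_offset(0, 1, pitch) - linear_offset(0, 0, pitch))  # 1024
print(tiled_offset(0, 1, pitch) - tiled_offset(0, 0, pitch))    # 8
```

    A vertical step costs a full 1024-pixel stride in the linear layout but only 8 in the tiled one, which is why tiling helps 2D access patterns and why more memory channels amplify the gain.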

  6. #16
    Join Date
    May 2007
    Posts
    231

    Default

    Quote Originally Posted by tmikov View Post
    Fair enough. But what would be a good synthetic stress test?

    Also, do you have an idea why the blob is faster? Could it be memory clocks, power management, etc?
    This Mesa demo, fill, will probably have double the performance with 2D tiling on (depends on the GPU; high-end GPUs benefit more from 2D tiling):
    http://cgit.freedesktop.org/mesa/dem...rc/perf/fill.c

    There is not a single thing that explains the gap between the open source driver and the closed source driver. There is no secret way to do things; we have tools to capture the fglrx command stream, and there is nothing fundamentally different. Proper power management support, using the on-chip governor to manage clocks, will probably improve performance a bit; better buffer placement heuristics, a better shader compiler, less CPU overhead, less CP stalling, ... many little things like those add up.
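    A fill benchmark like the linked fill.c boils down to pixels written per second; a rough sketch (illustrative numbers, not measurements, and the helper names are my own) of how such a result translates into the write bandwidth that tiling improves:

```python
def fill_rate(width, height, fps, overdraw=1):
    """Pixels written per second for a fullscreen fill benchmark."""
    return width * height * fps * overdraw

def bandwidth_gb_s(pixels_per_s, bytes_per_pixel=4):
    """Approximate write bandwidth implied by a fill rate (RGBA8)."""
    return pixels_per_s * bytes_per_pixel / 1e9

# Illustrative only: a 1920x1080 fullscreen fill running at 500 fps.
px = fill_rate(1920, 1080, 500)
print(f"{px / 1e9:.2f} Gpixels/s, ~{bandwidth_gb_s(px):.1f} GB/s written")
```

    If 2D tiling roughly doubles the achievable write bandwidth, the fps in a pure fill test should scale with it, which is why this demo isolates the tiling effect well.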

  7. #17
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,034

    Default

    @tmikov

    Tried profiling it yet? Oprofile will show the CPU use, radeontop the GPU use, and latencytop any waits.

  8. #18
    Join Date
    Aug 2012
    Posts
    245

    Default

    Quote Originally Posted by pingufunkybeat View Post
    Like I said, this is for driver developers to answer, I lack the knowledge. Marek and Alex have already written that all hardware functionality is used (only HiZ is not on by default). If I remember correctly, Jerome Glisse did profile the driver and couldn't find one single bottleneck, but many small ones. I can't find a link at the moment, perhaps somebody is better at googling.


    Only if they operate completely asynchronously. If the GPU ever has to wait for the driver before continuing, then no.

    Even if your processor is mostly idle, a simple cache miss might cause a considerable delay while your GPU is waiting for the next instruction.

    But again, I'm not a GPU developer. I just don't believe that using less than 100% of CPU all the time means that there are no bottlenecks in the driver. A 1ms delay is a 1ms delay, even if it only happens occasionally.
    Neither am I a GPU developer, of course. At best, I've tried programming with OpenGL, so even there I know just a few basics. But I still don't think it's a few small things here and there. And someone may know better whether compression can play any part in the specific benchmark, of course, because if it can, we likely found the culprit in the "single texture test" and maybe radeon is doing fine, who knows.

  9. #19
    Join Date
    Sep 2008
    Location
    Vilnius, Lithuania
    Posts
    2,537

    Default

    This makes me wonder... It's just a wild guess, but are you making sure that the GPU is trying to render only your texture, and nothing else? If the test was running like glxgears, then the difference in performance could very well be due to the GPU having to render all the windows in the background and such, as well as the test object. And even if it is running fullscreen, are there any guarantees that the GPU is not trying to render something in the background or offscreen before drawing the test texture on top?

  10. #20
    Join Date
    Mar 2012
    Posts
    116

    Default

    Quote Originally Posted by GreatEmerald View Post
    This makes me wonder... It's just a wild guess, but are you making sure that the GPU is trying to render only your texture, and nothing else? If the test was running like glxgears, then the difference in performance could very well be due to the GPU having to render all the windows in the background and such as well as the test object. And even if it is running fullscreen, are there any guarantees that the GPU is not trying to render something in the background or offscreen, before drawing the test texture on top?
    Try "sudo init 3" and using GLES.
