Thread: 2d tiling + sb -> no improvement in fill rate, curious

  1. #1
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,103

    After upgrading my ddx, I finally got 2D tiling on my RV710. It was supposed to be the thing that increases fill rate on bandwidth-limited cards.

    The mesa-demos fill bench had the exact same numbers with and without 2d tiling. Adding SB on top of 2d tiling improved some numbers, but that too had some curious results in the last test.

    This card, according to specs, is capable of 2.3 gigapixels/sec. It has only gotten about half that on the open drivers for years. Tiling was supposed to improve that, but it didn't. Any ideas on why it made no difference are welcome.

    Everything was measured on the default power profile, which equals high profile on this card.

    3.7.10, mesa 9.1.1, ddx 7.1.0, libdrm 2.4.44

    The numbers, both with and without 2d tiling:
    Simple fill: 1.3 billion pixels/second
    Blended fill: 1.1 billion pixels/second
    Textured fill: 1.1 billion pixels/second
    Shader1 fill: 1.1 billion pixels/second
    Shader2 fill: 543.8 million pixels/second
    With SB:
    Simple fill: 1.3 billion pixels/second
    Blended fill: 1.1 billion pixels/second
    Textured fill: 1.2 billion pixels/second
    Shader1 fill: 1.2 billion pixels/second
    Shader2 fill: 588.0 million pixels/second
    SB gave some minor improvement. However, note the shader2 value: almost exactly half of shader1.

    Shader2 consists of shader1 + many no-ops that should be optimized out. By printing the results with R600_DEBUG=sb,sbstat,ps I could see both shaders were optimized to the exact same instructions.


    So, we have two curious things here:
    - why is the fillrate still only half of hw ability
    - why is the exact same shader half the speed, when only the pre-optimized shader differs
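
To put the two questions above into numbers, here is a minimal sketch that parses the fill output quoted in this post and compares it against the 2.3 gigapixels/sec figure. The names `parse_fill_output`, `WITHOUT_SB` and `SPEC` are made up for illustration, and the regular expression only assumes the line format shown in this thread.

```python
import re

# Multipliers for the unit words used in the benchmark output.
UNITS = {"million": 1e6, "billion": 1e9}

# Matches lines such as "Shader2 fill: 543.8 million pixels/second".
LINE = re.compile(
    r"^\s*(?P<name>.+?) fill:\s*(?P<value>[\d.]+) "
    r"(?P<unit>million|billion) pixels/second\s*$"
)


def parse_fill_output(text):
    """Return {test name: pixels per second} for every recognised line."""
    rates = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rates[m["name"]] = float(m["value"]) * UNITS[m["unit"]]
    return rates


# Numbers copied from the post (same with and without 2D tiling, no SB).
WITHOUT_SB = """\
Simple fill: 1.3 billion pixels/second
Blended fill: 1.1 billion pixels/second
Textured fill: 1.1 billion pixels/second
Shader1 fill: 1.1 billion pixels/second
Shader2 fill: 543.8 million pixels/second
"""

SPEC = 2.3e9  # pixels/second, the figure quoted in the post

rates = parse_fill_output(WITHOUT_SB)
for name, rate in rates.items():
    print(f"{name:9s} {rate / 1e9:5.2f} Gpix/s  {rate / SPEC:4.0%} of spec")
print(f"Shader2 / Shader1 = {rates['Shader2'] / rates['Shader1']:.2f}")
```

With these inputs, Simple fill lands at roughly 57% of the quoted spec, and Shader2 at roughly 0.49 of Shader1.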

  2. #2
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,103

    Noting that the SB test was done on mesa git, not 9.1.1.

  3. #3
    Join Date
    Jul 2009
    Location
    Germany
    Posts
    492

    Quote: Originally Posted by curaga
    - why is the exact same shader half the speed, when only the pre-optimized shader differs
    This test (Shader2 fill) is weird. I just ran the tests with and without a compositing desktop:

    with:
    Code:
       Simple fill: 7.6 billion pixels/second
       Blended fill: 7.6 billion pixels/second
       Textured fill: 7.6 billion pixels/second
       Shader1 fill: 7.6 billion pixels/second
       Shader2 fill: 4.6 billion pixels/second
    without:
    Code:
       Simple fill: 7.6 billion pixels/second
       Blended fill: 7.6 billion pixels/second
       Textured fill: 7.6 billion pixels/second
       Shader1 fill: 7.6 billion pixels/second
       Shader2 fill: 3.8 billion pixels/second
    So it's slower when there's less other workload on the GPU!?

  4. #4
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,103

    What's your card? I'm curious how far 7.6 is from the specs of the hw.

  5. #5
    Join Date
    Jul 2009
    Location
    Germany
    Posts
    492

    Quote: Originally Posted by curaga
    What's your card? I'm curious how far 7.6 is from the specs of the hw.
    ATI Radeon HD5770 (Evergreen/Juniper). Spec is 12 billion pixels/second AFAIK.

    /edit:
    I just checked the source of fill. If you remove lines 181-184 (where it calls swap buffers every 128 iterations), I get the numbers below, and the output is still correct:

    Code:
       Simple fill: 8.2 billion pixels/second
       Blended fill: 7.9 billion pixels/second
       Textured fill: 8.1 billion pixels/second
       Shader1 fill: 8.4 billion pixels/second
       Shader2 fill: 5.2 billion pixels/second
    Shader2 is still slower, but at least the other ones are closer to spec.

    /edit2:
    and without the glFinish() it is still rendering correctly but the result is this:

    Code:
       Simple fill: 16.1 billion pixels/second
       Blended fill: 13.5 billion pixels/second
       Textured fill: 13.3 billion pixels/second
       Shader1 fill: 13.3 billion pixels/second
       Shader2 fill: 9.6 billion pixels/second
    which is above spec. But I'm not sure if it is allowed to do that :-D
    Last edited by droste; 05-26-2013 at 05:29 PM.

  6. #6
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,103

    Well without a swap, the driver is allowed to detect that you're overwriting the same buffer, and skip all rendering but the last.

    Removing the glFinish() gets you invalid results, since the timing is CPU-side.
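
The effect of dropping the finish call can be illustrated with a toy model. This is not OpenGL and not the benchmark's code: `FakeGpu`, `measure` and the 2 ms per draw are invented for the sketch. A worker thread stands in for the GPU, `draw()` only queues work, and `finish()` blocks until the queue is drained, which is the role glFinish() plays in the real test.

```python
import queue
import threading
import time


class FakeGpu:
    """Toy asynchronous device: draws are queued and executed by a worker."""

    def __init__(self, seconds_per_draw):
        self.seconds_per_draw = seconds_per_draw
        self.commands = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            self.commands.get()
            time.sleep(self.seconds_per_draw)  # pretend to rasterize
            self.commands.task_done()

    def draw(self):
        # Returns immediately; the work happens later on the worker thread.
        self.commands.put("draw")

    def finish(self):
        # Block until every queued draw has been executed.
        self.commands.join()


def measure(draws, pixels_per_draw, wait_for_gpu):
    """Return pixels/second as seen by a CPU-side timer."""
    gpu = FakeGpu(seconds_per_draw=0.002)
    start = time.perf_counter()
    for _ in range(draws):
        gpu.draw()
    if wait_for_gpu:
        gpu.finish()
    elapsed = time.perf_counter() - start
    gpu.finish()  # let the queued work complete before returning
    return draws * pixels_per_draw / elapsed


if __name__ == "__main__":
    print("with finish:   ", measure(50, 1000, wait_for_gpu=True))
    print("without finish:", measure(50, 1000, wait_for_gpu=False))
```

Without the wait, the timer only measures how fast the CPU can queue commands, so the reported rate says nothing about how fast the device actually fills pixels.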

  7. #7
    Join Date
    Jul 2009
    Location
    Germany
    Posts
    492

    Quote: Originally Posted by curaga
    Well without a swap, the driver is allowed to detect that you're overwriting the same buffer, and skip all rendering but the last.
    Well yes. But if you want to benchmark this, the correct thing to do is either swap after every draw or not swap at all. What's the reasoning for swapping every 128th iteration?

    Quote: Originally Posted by curaga
    Removing the glFinish() gets you invalid results, since the timing is CPU-side.
    Yeah makes sense.

  8. #8
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,103

    Quote: Originally Posted by droste
    Well yes. But if you want to benchmark this, the correct thing to do is either swap after every draw or not swap at all. What's the reasoning for swapping every 128th iteration?
    The comment says it's there to please old drivers, so I gather it assumes a driver that is both dumb (no overwrite check) and limited (no long queues).
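
For reference, the three swap policies being compared here (every draw, every 128th draw, never) can be sketched as a loop like the one below. This is only an illustration of the loop shape, not the code from fill; `run_fill_loop`, `draw`, `swap` and `swap_every` are hypothetical names, and the exact iteration on which the real benchmark swaps may differ.

```python
def run_fill_loop(iterations, draw, swap, swap_every=None):
    """Call draw() `iterations` times, swapping after every `swap_every`-th draw.

    swap_every=None means never swap. Returns the number of swaps issued.
    """
    swaps = 0
    for i in range(iterations):
        draw()
        if swap_every is not None and (i + 1) % swap_every == 0:
            swap()
            swaps += 1
    return swaps


if __name__ == "__main__":
    noop = lambda: None  # placeholders for the real draw and swap calls
    for label, interval in [("every draw", 1), ("every 128th", 128), ("never", None)]:
        print(label, run_fill_loop(1024, noop, noop, interval))
```

Swapping every 128th draw keeps the queue between swaps bounded while issuing far fewer swaps than one per draw, which is one way to read the "old drivers" comment.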

  9. #9
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,103

    Testing another random bench: http://www.graphics.stanford.edu/cou...-fall/as1.html

    This one renders straight to front, using fixed mode. It also gave 1.3Gpix/s, confirming this number.

  10. #10
    Join Date
    Jul 2009
    Location
    Germany
    Posts
    492

    Quote: Originally Posted by curaga
    The comment says it's there to please old drivers, so I gather it assumes a driver that is both dumb (no overwrite check) and limited (no long queues).
    Yes of course. My point is that it distorts the result. Nonetheless, none of this explains why Shader2 fill is so slow.

    Quote: Originally Posted by curaga
    Testing another random bench: http://www.graphics.stanford.edu/cou...-fall/as1.html

    This one renders straight to front, using fixed mode. It also gave 1.3Gpix/s, confirming this number.
    Code:
    --------------------------------------------------
    Vendor:      X.Org
    Renderer:    Gallium 0.4 on AMD JUNIPER
    Version:     3.0 Mesa 9.2.0 (git-44a117a)
    Visual:      RGBA=<8,8,8,0>  Z=<24>  double=1
    Geometry:    800x800+7+28
    Screen:      1920x1080
    --------------------------------------------------
    Fill Rate:      13466.93 MPix/second
    Triangle Rate:  54.30 Mtri/second
    For me it shows above-spec speed.
