Thread: R6xx/R7XX kernel 2.6.33 module performance hacks

  1. #1
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    134

    R6xx/R7XX kernel 2.6.33 module performance hacks

    Here are some performance patches I wrote for the 2.6.33 kernel module, for r6xx/r7xx chipsets.

    http://pastebin.ca/1743103
    http://pastebin.ca/1743100

    It made Torcs playable on my laptop.

    The x11perf -aa10text benchmark shows more than a 5% improvement on my laptop.

    If you can, please post before-and-after results from the command x11perf -aa10text.

  2. #2
    Join Date
    Jan 2009
    Location
    Italy
    Posts
    82


    *_RING are macros, so they are already inlined. You also left the OUT_RING and ADVANCE_RING macros in place, so the ring may be written twice (with strange side effects, since the tail pointer may be changed twice).
    Furthermore, I fail to parse the
    Code:
    if (write < ...
    statements; typo?

  3. #3
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    134

    Quote, originally posted by tettamanti:
    > *_RING are macros, so they are already inlined. You also left the OUT_RING and ADVANCE_RING macros in place, so the ring may be written twice (with strange side effects, since the tail pointer may be changed twice).
    > Furthermore, I fail to parse the
    > Code:
    > if (write < ...
    > statements; typo?

    Each macro performs a write, an index increment, and a mask operation. My patch checks whether the mask operation is needed at all; if it isn't, it executes only the writes. The resulting code takes about 1/4 of the time to execute, because the CPU can execute both writes in parallel. This gives you two writes per CPU cycle.

    With the macro, the write and the index increment can be done in parallel, but the mask operation cannot overlap with the next write because of a data dependency: the next write needs the masked index. So you get one write every two CPU cycles.
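    A minimal C sketch of the two code shapes described above, assuming a hypothetical power-of-two ring. The names struct ring, RING_SIZE, emit2_masked and emit2_fast are illustrative only, not the radeon driver's actual macros or structures. The masked version mirrors the write/increment/mask sequence; the fast path tests once for wrap-around and then stores at fixed offsets from one base index, falling back to masking near the end of the buffer. The sketch shows the structure only; it does not measure the claimed speedup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring buffer for illustration only; RING_SIZE must be a
 * power of two so that "index & RING_MASK" wraps the index to 0. */
#define RING_SIZE 16u
#define RING_MASK (RING_SIZE - 1u)

struct ring {
    uint32_t buf[RING_SIZE];
    unsigned int write; /* next free slot, always < RING_SIZE */
};

/* Macro-style emission: write, increment, mask for every word.
 * Each mask result feeds the index of the next store, which is the
 * data dependency described in the post. */
static void emit2_masked(struct ring *r, uint32_t a, uint32_t b)
{
    r->buf[r->write++] = a;
    r->write &= RING_MASK;
    r->buf[r->write++] = b;
    r->write &= RING_MASK;
}

/* Fast path: check once whether both words fit before the end of the
 * buffer. If they do, store at fixed offsets from the same base index
 * (the two stores do not depend on each other) and return without
 * touching the ring again; otherwise fall back to the masked sequence. */
static void emit2_fast(struct ring *r, uint32_t a, uint32_t b)
{
    unsigned int w = r->write;

    assert(w < RING_SIZE);
    if (w + 2u <= RING_SIZE) {
        r->buf[w] = a;
        r->buf[w + 1u] = b;
        r->write = (w + 2u) & RING_MASK; /* RING_SIZE wraps to 0 */
        return;
    }
    emit2_masked(r, a, b);
}
```

    Both variants leave the same ring contents and final index; only the fast path lets the compiler use constant offsets from w. Whether that turns into the claimed 4x depends on the CPU and on how the compiler schedules the masked version.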

  4. #4
    Join Date
    Dec 2008
    Posts
    989

    Did you try benchmarking with progs/demos/gltestperf? It locks up the machine here.

  5. #5
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    134

    Quote, originally posted by monraaf:
    > Did you try benchmarking with progs/demos/gltestperf? It locks up the machine here.

    I didn't check with that one. I will try it.

    I did have someone test it with UT2004, with great results.

  6. #6
    Join Date
    Dec 2008
    Posts
    989

    Just to be clear, your patch isn't what's causing this, because I haven't tried it yet. Judging by its name, it looks like an OpenGL benchmark utility that could be used to test your performance hacks (if it doesn't kill your system).

  7. #7
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    134

    On my machine, gltestperf hangs in the same spot with or without my patches.

  8. #8
    Join Date
    Jan 2010
    Posts
    21

    Quote, originally posted by Obscene_CNN:
    > I didn't check with that one. I will try it.
    >
    > I did have someone test it with UT2004, with great results.

    That someone would be me.

    x11perf shows some improvement:

    Before:

    Code:
    4800000 reps @   0.0012 msec (855000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0012 msec (848000.0/sec): Char in 80-char aa line (Charter 10)
    After:
    Code:
    4800000 reps @   0.0011 msec (883000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0012 msec (866000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0011 msec (942000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0011 msec (906000.0/sec): Char in 80-char aa line (Charter 10)
    4800000 reps @   0.0012 msec (844000.0/sec): Char in 80-char aa line (Charter 10)
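    To put these numbers in perspective, here is a small C sketch that averages the reps/sec figures from the two sets of runs, computes the percentage change, and measures the spread of the "after" runs. Only the figures come from the post; the function names (mean, percent_change, spread) are illustrative.

```c
#include <stddef.h>

/* reps/sec figures copied from the x11perf -aa10text runs above */
static const double before_rps[] = { 855000.0, 848000.0 };
static const double after_rps[]  = { 883000.0, 866000.0, 942000.0,
                                     906000.0, 844000.0 };

/* Arithmetic mean of n samples (n must be > 0). */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Percentage change from the "before" mean to the "after" mean. */
static double percent_change(double before, double after)
{
    return 100.0 * (after - before) / before;
}

/* Spread (max - min) of n samples (n must be > 0), a rough measure
 * of run-to-run noise. */
static double spread(const double *v, size_t n)
{
    double lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo)
            lo = v[i];
        if (v[i] > hi)
            hi = v[i];
    }
    return hi - lo;
}
```

    With these figures the means are 851500 and 888200 reps/sec, roughly a 4.3% increase, while the "after" runs alone span 98000 reps/sec (about 11% of their mean). The spread within a single set of runs is larger than the difference between the means.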
    glxgears gives the same fps but runs a lot smoother; without these 21 patches, the gears would pause for a split second.

    The most noticeable difference (for me) is in UT2004: previously unplayable maps (13-17 fps, now around 19-22 fps) are now bearable, and playable maps have better fps and, most importantly (as with glxgears), no more pauses or stutter. I guess these patches improved the minimum frame rate and eliminated the pauses.

    My system is an AMD X2 3800 with a Radeon HD 3850 AGP (8x), at a resolution of 1600x1050.

    BTW, the patches applied cleanly on 2.6.32 radeon-testing. I played UT2004 for about an hour and haven't had any crashes or rendering bugs (AFAICS). The system also suspended to RAM and resumed correctly this morning.
    Last edited by xming; 01-09-2010 at 03:53 AM.

  9. #9
    Join Date
    Jan 2009
    Location
    Italy
    Posts
    82

    Oops, I totally missed the "return" inside the if block; OK, you don't touch the ring twice if the shortcut is taken.
    Anyway, what I see here is that GCC schedules the write to the ring (using a temp register for the index) between the increment and the masking, trying to fill the pipeline.
    One difference is that in your fast path the index becomes an immediate. I tend to be wary of such optimizations (open-coding), though: it's far too easy to introduce bugs when the open-coded parts are not kept in sync.
    One thing worth trying is moving the wrap test into BEGIN_RING and setting the mask to ~0 when wrapping isn't needed; GCC seems to be smart enough to skip the AND in this case.
    The side effect is that if you pass the wrong number of words to BEGIN_RING, you end up writing past the end of the ring...
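    A rough C sketch of that suggestion, assuming a hypothetical ring and an emit context standing in for the locals BEGIN_RING sets up. The names begin_ring, out_ring, advance_ring and struct emit_ctx are illustrative, not the radeon driver's code. The wrap test happens once per batch; when the batch cannot wrap, the mask becomes ~0 and the per-word AND no longer changes the index. A debug-only counter catches the failure mode mentioned above: emitting more words than were reserved.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 16u /* hypothetical; must be a power of two */

struct ring {
    uint32_t buf[RING_SIZE];
    unsigned int write; /* next free slot, always < RING_SIZE */
};

/* Hypothetical stand-in for the locals BEGIN_RING declares. */
struct emit_ctx {
    struct ring *r;
    unsigned int write;    /* local copy of the write index */
    unsigned int mask;     /* RING_SIZE - 1, or ~0u if no wrap is possible */
    unsigned int reserved; /* words requested in begin_ring() */
    unsigned int emitted;  /* words actually written so far */
};

/* Do the wrap test once for the whole batch of n words. */
static struct emit_ctx begin_ring(struct ring *r, unsigned int n)
{
    struct emit_ctx c = { r, r->write, RING_SIZE - 1u, n, 0u };

    assert(n <= RING_SIZE && r->write < RING_SIZE);
    if (c.write + n <= RING_SIZE)
        c.mask = ~0u; /* the AND in out_ring() becomes a no-op */
    return c;
}

static void out_ring(struct emit_ctx *c, uint32_t x)
{
    /* With mask == ~0u, emitting more words than reserved would run past
     * the end of buf, so check the count (debug builds only). */
    assert(c->emitted < c->reserved);
    c->emitted++;
    c->r->buf[c->write++] = x;
    c->write &= c->mask;
}

static void advance_ring(struct emit_ctx *c)
{
    /* After a no-wrap batch the index may equal RING_SIZE; wrap it here. */
    c->r->write = c->write & (RING_SIZE - 1u);
}
```

    The sketch does not show whether GCC actually removes the AND; that depends on what the optimizer can prove about mask in the surrounding code.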

  10. #10
    Join Date
    May 2009
    Location
    Richland, WA
    Posts
    134

    tettamanti,

    Well, no matter how GCC tries to hide the mask op, it's still at least 4 times slower than my optimized code.

    Yes, it's ugly and hard to maintain, but it can't be done faster in C. I'm almost willing to bet a beer that this won't make it into the kernel (I would bet the beer, but I know they'd patch it in to win the bet and then patch it back out), but that's why I call it a hack.


    NOTE: I have discovered that x11perf is not a very good test case, as its results can fluctuate by 5%. My patches do, however, give a noticeable improvement in 3D games that bog down a system.
