R500 Texture Semaphores Merged To Master
The R500 texture semaphore work, which I wrote about and tested earlier this month, has been merged to Mesa master. This feature of the R300 Gallium3D open-source driver can deliver double-digit performance gains in some texture-heavy OpenGL workloads.
Tom Stellard, the former Google Summer of Code student who has since been hired by AMD, pushed several of his "r300/compiler" patches to mainline Mesa last night. The most significant one implements the texture semaphore for Radeon X1000 (R500) hardware:
r300/compiler: Implement the texture semaphore
The texture semaphore allows for prefetching of texture data. On my RV515, this increases the FPS of Lightsmark by 33% (This is with the reg_rename pass enabled, which is enabled in the next commit). There is a new env variable now called RADEON_TEX_GROUP, which allows you to specify the maximum number of texture lookups to do at once. The default is 8, but different values could produce better results for various application / card combinations.
The benchmarks I did earlier this month confirmed these double-digit gains for certain texture-using OpenGL workloads like Lightsmark. Additionally, the RADEON_TEX_GROUP default of eight seemed to be an ideal number for delivering maximum performance.
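Since the commit message notes that other values may suit some application and card combinations better, here is a minimal Python sketch of how one might sweep RADEON_TEX_GROUP for a given program. Only the variable name and its default of 8 come from the commit; the helper names, the candidate values, and the use of wall-clock time as a rough stand-in for FPS are assumptions made for illustration.

```python
import os
import subprocess
import sys
import time


def env_with_tex_group(group_size, base_env=None):
    """Return a copy of the environment with RADEON_TEX_GROUP set."""
    env = dict(os.environ if base_env is None else base_env)
    env["RADEON_TEX_GROUP"] = str(group_size)
    return env


def time_command(cmd, group_size):
    """Run cmd once with the given group size; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env_with_tex_group(group_size), check=True)
    return time.perf_counter() - start


def sweep(cmd, group_sizes=(2, 4, 8)):
    """Time cmd for each candidate value (candidates are arbitrary picks)."""
    return {size: time_command(cmd, size) for size in group_sizes}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tex_group_sweep.py <benchmark command...>
    for size, seconds in sweep(sys.argv[1:]).items():
        print(f"RADEON_TEX_GROUP={size}: {seconds:.2f}s")
```

Wall-clock time is only meaningful for benchmarks that run a fixed amount of work; for a benchmark that reports FPS itself, it would make more sense to capture and compare its own output.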
Tom's other commits last night allow merged instructions to be scheduled on demand, prevent the register allocator from creating non-native swizzles, stop the scheduler from pairing output writes with GPR writes, and enable the register rename pass on R500 so that it runs before the optimization passes.
The R300 compiler work on mainline Mesa can be viewed from CGit. This performance work will be part of Mesa 7.12/8.0. Read the earlier article for more information.