Announcement

**xxmitsu** · 05 February 2024, 07:53 AM

Hi Michael,

When you have time, could you please do a comparison benchmark of the current state of Zink vs RadeonSI ACO vs RadeonSI LLVM?

**FireBurn** · 05 February 2024, 08:44 AM

Kinda pointless telling us about this without explaining what VOPD is or how this will be useful

**V1tol** · 05 February 2024, 09:08 AM

This PR applies cleanly on top of 24.0.0. Time to see if there is any noticeable difference in performance.

**zcansi** · 05 February 2024, 09:16 AM

Originally posted by FireBurn

Kinda pointless telling us about this without explaining what VOPD is or how this will be useful

Seems to be an instruction telling the GPU to process two arithmetic instructions in parallel. So this optimization looks for compatible pairs of instructions and unifies them in the right way so they can be executed in parallel.

Corrections / additions welcome, this is just from skimming the PR...
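To make the "look for compatible pairs and fuse them" idea concrete, here is a minimal toy sketch. It is not the ACO implementation: the `Instr` type, the `PAIRABLE` allowlist, and the adjacency-only pairing are all simplifying assumptions made for illustration, and real VOPD encoding has further restrictions (for example on register banks) that are ignored here.

```python
from dataclasses import dataclass

# Illustrative allowlist only (assumption); the real set of VOPD-capable
# opcodes is defined by the RDNA3 ISA, not by this sketch.
PAIRABLE = {"v_add_f32", "v_sub_f32", "v_mul_f32", "v_mov_b32"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical, heavily simplified vector instruction."""
    op: str
    dst: str
    srcs: tuple


def independent(a: Instr, b: Instr) -> bool:
    """True if a and b can run side by side without changing results.

    b must not read or overwrite a's destination, and b must not
    overwrite a register that a still needs to read.
    """
    return a.dst != b.dst and a.dst not in b.srcs and b.dst not in a.srcs


def pair_adjacent(instrs):
    """Greedily fuse neighbouring instructions into 2-tuples where allowed.

    Returns a list of issue slots: a 2-tuple is a fused pair, a 1-tuple
    is an instruction that is issued on its own.
    """
    slots = []
    i = 0
    while i < len(instrs):
        a = instrs[i]
        b = instrs[i + 1] if i + 1 < len(instrs) else None
        if (b is not None and a.op in PAIRABLE and b.op in PAIRABLE
                and independent(a, b)):
            slots.append((a, b))
            i += 2
        else:
            slots.append((a,))
            i += 1
    return slots


program = [
    Instr("v_add_f32", "v0", ("v1", "v2")),
    Instr("v_mul_f32", "v3", ("v4", "v5")),  # independent of the add: fused
    Instr("v_add_f32", "v6", ("v0", "v3")),
    Instr("v_mul_f32", "v7", ("v6", "v1")),  # reads v6: cannot be fused
]
slots = pair_adjacent(program)
print(len(program), "instructions in", len(slots), "issue slots")
```

A real scheduler would also reorder instructions to bring pairable ones next to each other; this sketch only fuses instructions that are already adjacent.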

**davidbepo** · 05 February 2024, 10:44 AM

Originally posted by FireBurn

Kinda pointless telling us about this without explaining what VOPD is or how this will be useful

VOPD allows Wave32 instructions to use the doubled shader count that Navi 3 has (but AMD doesn't advertise).

**Kjell** · 05 February 2024, 12:14 PM

Originally posted by V1tol

This PR applies cleanly on top of 24.0.0. Time to see if there is any noticeable difference in performance.

Any updates?

**QwertyChouskie** · 05 February 2024, 02:17 PM

From some quick research, it seems this is specifically for RDNA3.

**kiffmet** · 05 February 2024, 03:52 PM

Originally posted by FireBurn

Kinda pointless telling us about this without explaining what VOPD is or how this will be useful

It's the technical term for the dual-issue stuff introduced with RDNA3. Essentially, compared to RDNA2 the architecture has twice as many ALUs, but with the catch that the additional ones can only be utilized whenever two arithmetic/logic operations can be combined into a single VOPD instruction.

So the best-case scenario is double the throughput (twice the TFLOPS). For games it's probably closer to +5 to +20% performance, depending on how much optimization the shader compiler can do.
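The gap between "double the throughput" and a modest gain in games can be illustrated with a back-of-the-envelope model. This is only a sketch under stated assumptions, not a measurement: `paired_fraction` (the share of vector ALU instructions that end up inside a VOPD pair) and `alu_share` (the share of frame time that is bound by those instructions) are hypothetical parameters, and the model ignores every other bottleneck.

```python
def estimated_speedup(paired_fraction: float, alu_share: float = 1.0) -> float:
    """Rough speedup estimate from dual-issuing a fraction of ALU work.

    Assumptions (illustrative only):
    - each fused pair issues in the time of one instruction, so the
      ALU-bound part shrinks to (1 - paired_fraction / 2) of its time;
    - the remaining (1 - alu_share) of the time is unaffected.
    """
    if not (0.0 <= paired_fraction <= 1.0 and 0.0 <= alu_share <= 1.0):
        raise ValueError("fractions must be within [0, 1]")
    alu_time = alu_share * (1.0 - paired_fraction / 2.0)
    other_time = 1.0 - alu_share
    return 1.0 / (alu_time + other_time)


# Every instruction paired, fully ALU-bound: the ideal 2x case.
print(estimated_speedup(1.0))
# A fifth of the instructions paired, fully ALU-bound: roughly 1.11x.
print(round(estimated_speedup(0.2), 2))
```

Under this model, the quoted +5 to +20% range would correspond to the compiler pairing only part of the ALU work, or to shaders that are only partly ALU-bound.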

**marek** · 05 February 2024, 07:00 PM

2x ALU is automatic with Wave64. Only Wave32 must use VOPD to utilize it.

VOPD Scheduler For Valve's ACO Compiler Merged Into Mesa 24.1
