What reference implementation are you talking about? An HD2900 should have ample bandwidth and shader power for decode. The Gallium decode work is just being implemented now. Christian is working on an RV710, which is the same level as the GPU in the Ontario APUs but a generation behind it, and he is getting good results.
The HD2900 is the reference implementation for any shader-based video acceleration, because the HD2900 does not have a UVD unit but can handle the H.264 codec via shaders.

"and he is getting good results."

Wow, then I'm happy.