This post seems to rest on two basic assertions:
1) Flash must have CPU-access to the decoded video surfaces.
2) Flash can't obtain CPU-access to the decoded video surfaces.
I believe that both of these assertions are wrong, at the very least for VDPAU, and probably for other video APIs as Gwenole claims above; he should know.
Taking the points in reverse order:
VDPAU currently has two different ways of obtaining CPU access to the surfaces. One can use APIs such as VdpVideoSurfaceGetBitsYCbCr or VdpOutputSurfaceGetBitsNative to download the data to the CPU, and then act on the data in any way. I don't believe any media players currently do this, because there's very little point. Alternatively, one can use the VDPAU presentation queue to present the data to an X pixmap. One can then use X APIs, or OpenGL's GLX_EXT_texture_from_pixmap to composite this data with application UI elements. At least XBMC uses this method (specifically GLX_EXT_tfp) very successfully today, even on low-end platforms; it is a very well tested path.
Finally, more mechanisms will be made available in the near future.
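To make the download route above a bit more concrete, here is a minimal sketch of the CPU-side buffer bookkeeping, assuming the application asks for the data in NV12 layout (one Y plane plus one interleaved CbCr plane at half resolution). The type nv12_layout and the function nv12_layout_for are hypothetical names made up for this illustration and are not part of VDPAU. As far as I understand the API, the resulting plane pointers and pitches are what one would hand to VdpVideoSurfaceGetBitsYCbCr as its destination data and pitches; the call itself is not shown, since it needs a live VDPAU device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type (not a VDPAU type): describes the two
 * CPU-side planes of a tightly packed NV12 image. */
typedef struct {
    uint32_t pitches[2]; /* bytes per row: [0] = Y, [1] = interleaved CbCr */
    size_t   sizes[2];   /* bytes per plane, same order */
} nv12_layout;

/* Compute plane pitches and sizes for a width x height NV12 image.
 * Assumption: rows are tightly packed; a real application may well
 * prefer to pad each row for alignment. */
static nv12_layout nv12_layout_for(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);

    nv12_layout l;
    /* Chroma is subsampled by two in both directions; round up so
     * odd dimensions are still fully covered. */
    uint32_t chroma_w = (width + 1) / 2;
    uint32_t chroma_h = (height + 1) / 2;

    l.pitches[0] = width;        /* one byte per luma sample */
    l.pitches[1] = chroma_w * 2; /* one Cb byte + one Cr byte per chroma sample */
    l.sizes[0] = (size_t)l.pitches[0] * height;
    l.sizes[1] = (size_t)l.pitches[1] * chroma_h;
    return l;
}
```

For a 1920x1080 frame this works out to roughly 3 MB per frame, which gives a feel for how much data such a download moves.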
On to the first assertion:
I imagine that the only reason Flash requires CPU-access to the frames is to render/blend the UI on the CPU. I don't think this is the correct approach; GPU acceleration should be used for the UI rendering (or at least upload and blending). VDPAU itself has various rendering/blending/scaling operations built in specifically for this purpose. Alternatively, you could get the video into OpenGL and then use OpenGL's rendering/blending/scaling operations, as XBMC does. Do also note that VDPAU's VdpVideoMixer fully performs the YUV->RGB conversions you mentioned on the GPU, and if Flash really needs to, it can download the resultant RGB data after that step with almost no effort.
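For reference, the per-pixel work under discussion is small and regular, which is exactly why it suits the GPU. The following is a CPU-side sketch of the two steps, assuming BT.601 limited-range video and 8-bit straight (non-premultiplied) alpha for the UI. The names rgb8, ycbcr_to_rgb_bt601 and blend_over are hypothetical and exist only for this illustration; this is not how VdpVideoMixer or OpenGL implement it internally, just the arithmetic they would be doing per pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pixel type for this sketch. */
typedef struct { uint8_t r, g, b; } rgb8;

static uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* Convert one YCbCr sample to RGB.
 * Assumption: BT.601 coefficients, limited range (Y in 16..235),
 * using the common fixed-point approximation scaled by 256. */
static rgb8 ycbcr_to_rgb_bt601(uint8_t y, uint8_t cb, uint8_t cr)
{
    int c = 298 * ((int)y - 16);
    int d = (int)cb - 128;
    int e = (int)cr - 128;

    rgb8 out;
    out.r = clamp_u8((c + 409 * e + 128) / 256);
    out.g = clamp_u8((c - 100 * d - 208 * e + 128) / 256);
    out.b = clamp_u8((c + 516 * d + 128) / 256);
    return out;
}

/* Blend one 8-bit channel: alpha = 255 shows only the UI,
 * alpha = 0 shows only the video. Rounds to nearest. */
static uint8_t blend_channel(uint8_t ui, uint8_t video, uint8_t alpha)
{
    return (uint8_t)((ui * alpha + video * (255 - alpha) + 127) / 255);
}

/* Composite a UI pixel over a video pixel ("over" operator,
 * straight alpha, opaque video underneath). */
static rgb8 blend_over(rgb8 ui, uint8_t alpha, rgb8 video)
{
    assert(sizeof(rgb8) == 3); /* sanity check on the sketch's pixel type */

    rgb8 out;
    out.r = blend_channel(ui.r, video.r, alpha);
    out.g = blend_channel(ui.g, video.g, alpha);
    out.b = blend_channel(ui.b, video.b, alpha);
    return out;
}
```

Doing this on the CPU means touching every pixel of every frame after a download, whereas on the GPU the decoded surface never has to leave video memory.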
I also take issue with your point that Flash is somehow fundamentally different to other media players. Specifically, MPlayer renders UI/OSD, subtitles, etc. on top of the video using VDPAU features. XBMC renders a potentially complex and pretty UI over the video using OpenGL. I believe both of these applications, and others, can also perform network streaming at the same time, for example. It sounds like they're both doing the exact same thing that Flash needs to.
Please note that I haven't yet read your "flash uses the GPU" post, or at least not recently. I'll go read it now. Still, I doubt that will change my mind that widely available APIs (across platforms and vendors) such as OpenGL are the correct way to accelerate graphics-oriented applications such as Flash.
In summary: If you have any issues understanding or using VDPAU, please feel free to contact NVIDIA. We'd be very happy to help you.