Sounds like trouble. AFAIK a big part of the blitting code for newer cards lives in userspace, in both the EXA and OpenGL drivers, because it's just too bloody huge (it requires putting the GPU's 3D engine into a specific state to work).
Maybe the framebuffer drivers would also need hardware-specific userspace acceleration libraries for that to work sanely?
Had another look. There are actually blit functions for both the r100-r500 and r600 series, and yes, the r600 version seems to involve setting up shaders to do the move.
Thing is, these blit functions just move a contiguous block of memory, without the per-line offsets (pitch) and such that you need to move an arbitrary rectangle from one part of the framebuffer to another. I'd need to add another function to each chipset's dispatch table.
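To make the difference concrete, here's a rough CPU-side sketch in C of an arbitrary rectangle move. It is not what the GPU blitter does, and `copy_rect` and its parameters are made up for illustration. The point is that each row of the rectangle is contiguous, but rows sit `pitch` bytes apart, so a single contiguous copy only works when the rectangle spans whole lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move a width x height pixel rectangle between two
 * pitched surfaces (possibly the same framebuffer). Each row is contiguous,
 * but consecutive rows are `pitch` bytes apart, so the rectangle cannot be
 * moved with a single contiguous copy unless it spans whole lines. */
static void copy_rect(uint8_t *dst, size_t dst_pitch, size_t dst_x, size_t dst_y,
                      const uint8_t *src, size_t src_pitch, size_t src_x, size_t src_y,
                      size_t width, size_t height, size_t bpp)
{
    size_t row_bytes = width * bpp;

    /* The rectangle must fit inside a line on both sides. */
    assert(src_x * bpp + row_bytes <= src_pitch);
    assert(dst_x * bpp + row_bytes <= dst_pitch);

    const uint8_t *s = src + src_y * src_pitch + src_x * bpp;
    uint8_t *d = dst + dst_y * dst_pitch + dst_x * bpp;

    if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination starts later in memory (e.g. scrolling down within one
         * framebuffer with a shared pitch): copy rows bottom-up so no source
         * row is overwritten before it has been read. */
        for (size_t row = height; row-- > 0;)
            memmove(d + row * dst_pitch, s + row * src_pitch, row_bytes);
    } else {
        /* Otherwise copy top-down; memmove covers overlap within a row. */
        for (size_t row = 0; row < height; row++)
            memmove(d + row * dst_pitch, s + row * src_pitch, row_bytes);
    }
}
```

The row order matters for console scrolling, where source and destination overlap in the same framebuffer; the sketch assumes both sides share one pitch in that case.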
The good news is that it looks like one implementation would be sufficient for all of r100-r500. I'll have a go at implementing this when I get set up (I'm half-way through moving house). I can't do anything about the r600+ chips, as I don't have one to mess with.
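Roughly what I have in mind for the dispatch side, as a sketch with made-up names: `blit_ops`, `copy_rect`, `do_copy_rect` and the family list are hypothetical and don't match the driver's actual structures. The r100-r500 entries all point at one shared implementation, and chips without one (r600+ for now) leave the hook NULL so the caller falls back to a software copy.

```c
#include <assert.h>
#include <stddef.h>

struct rect { unsigned x, y, w, h; };

/* Which path handled the copy; only here so the sketch is testable. */
enum blit_path { BLIT_HW_SHARED, BLIT_SOFTWARE };

/* Hypothetical shared implementation for r100-r500. A real version would
 * program the 2D engine with source/destination offsets and pitches. */
static enum blit_path r100_r500_copy_rect(const struct rect *src,
                                          unsigned dst_x, unsigned dst_y)
{
    (void)src; (void)dst_x; (void)dst_y;
    return BLIT_HW_SHARED;
}

/* Fallback for chips with no hardware rectangle copy (r600+ for now). */
static enum blit_path sw_copy_rect(const struct rect *src,
                                   unsigned dst_x, unsigned dst_y)
{
    (void)src; (void)dst_x; (void)dst_y;
    return BLIT_SOFTWARE;
}

/* Hypothetical per-chipset dispatch table entry. */
struct blit_ops {
    const char *family;
    enum blit_path (*copy_rect)(const struct rect *src,
                                unsigned dst_x, unsigned dst_y);
};

/* Abbreviated list of families for illustration. */
static const struct blit_ops chip_ops[] = {
    { "r100", r100_r500_copy_rect },
    { "r300", r100_r500_copy_rect },
    { "r500", r100_r500_copy_rect },
    { "r600", NULL }, /* no hardware implementation yet */
};

/* Use the chip's hook if it has one, otherwise fall back to software. */
static enum blit_path do_copy_rect(const struct blit_ops *ops,
                                   const struct rect *src,
                                   unsigned dst_x, unsigned dst_y)
{
    assert(ops != NULL);
    if (ops->copy_rect)
        return ops->copy_rect(src, dst_x, dst_y);
    return sw_copy_rect(src, dst_x, dst_y);
}
```

Leaving the hook NULL rather than stubbing it out per chip keeps the fallback decision in one place, so r600+ support could be added later just by filling in its entry.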