Unfortunately, scrolling (where the source and destination rectangles overlap) turns out to be one of the harder things to do with the 3D engine. There was a lot of discussion during the initial driver support for r6xx and higher about finding the right balance between "go fast" and "no corruption". The technical issue is that the 3D engine does a blit by using a texture for the source rectangle and a render target for the destination rectangle. Each has its own cache, so writes to the destination rectangle don't update the cached copy of the source rectangle, which is what you would want when doing a blit. Even worse, the render operation is spread across multiple SIMDs, and there is no guarantee that the copy operations for one scanline will complete before the operations for the next one begin. The hardware guarantees correctly ordered writes between triangles/quads (IIRC) but not within a single primitive.
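
To make the overlap hazard concrete, here is a rough CPU-side sketch in C. It is only a model under stated assumptions, not driver code: the function names are made up for illustration, the surface is a plain byte array, and the "naive" version picks one particular bad ordering (ascending rows), whereas the real hardware promises no ordering at all within a primitive. The "staged" version shows one generic way to sidestep the problem by copying through a scratch buffer so that source and destination never overlap; I'm not claiming this is what the r6xx code actually does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of a "scroll down by dy rows" blit inside one surface.
 * Source rows are [0, height - dy), destination rows are [dy, height).
 */

/* Naive version: copies rows in ascending order, so once y >= dy it reads
 * rows that earlier iterations have already overwritten. */
void scroll_down_naive(unsigned char *fb, int width, int height, int dy)
{
    assert(width > 0 && height > 0 && dy >= 0 && dy <= height);
    for (int y = 0; y + dy < height; y++) {
        memcpy(fb + (size_t)(y + dy) * (size_t)width,
               fb + (size_t)y * (size_t)width,
               (size_t)width);
    }
}

/* Staged version: copy the source rows to a scratch buffer first, then copy
 * the scratch buffer to the destination. Neither copy has overlapping
 * source and destination, so ordering no longer matters.
 * Returns 0 on success, -1 if the scratch allocation fails. */
int scroll_down_staged(unsigned char *fb, int width, int height, int dy)
{
    assert(width > 0 && height > 0 && dy >= 0 && dy <= height);
    size_t bytes = (size_t)(height - dy) * (size_t)width;
    if (bytes == 0)
        return 0;                      /* nothing to move */

    unsigned char *tmp = malloc(bytes);
    if (tmp == NULL)
        return -1;

    memcpy(tmp, fb, bytes);                              /* source -> scratch */
    memcpy(fb + (size_t)dy * (size_t)width, tmp, bytes); /* scratch -> dest   */
    free(tmp);
    return 0;
}
```

For a one-pixel-wide surface holding rows 1, 2, 3, 4 and dy = 1, the naive version ends up with 1, 1, 1, 1 (row 0 smeared down the whole surface), while the staged version gives the expected 1, 1, 2, 3. The obvious cost of staging is the extra copy and scratch memory, which is the "go fast" versus "no corruption" trade-off in miniature.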