Cairo 1.12.4 Brings Worthwhile Changes
Phoronix: Cairo 1.12.4 Brings Worthwhile Changes
Taking a break from his crazy activity on the Intel driver and SNA acceleration architecture, Chris Wilson released Cairo 1.12.4 today. There are some worthwhile changes and new features in this release, making it worth the upgrade...
I've seen the image backend of cairo used often as the baseline for performance comparisons but I still don't know what it is.
It's standard software rendering AFAIK. (no xrender, or GL acceleration, etc.)
Originally Posted by liam
Originally Posted by smitty3268
Hmm, I thought xlib was software (hence the zero copy to gpu) while the xrender back end would have created the image on the gpu to start with?
The xlib backend rasterizes on the CPU, but uses the GPU (via XRender and EXA) for filling, copying, and compositing (which tend to be more frequent operations). I think it may also use XRender for path rendering after tessellating to trapezoids (which isn't hw accelerated by any driver I know of now, but could be in theory).
Originally Posted by TechMage89
That sounds right, but what is the difference with the image backend?
cairo-xlib tessellates the high-level paths from the user into trapezoids and sends those to the X server. The ddx then rasterises the trapezoids into a mask and composites that onto the destination. Both Nvidia and glamor use trapezoid shaders to avoid rasterising with the CPU, SNA uses the same high-speed scanline rasteriser as cairo-image (both try to eliminate the intermediate mask), and EXA uses the slow pixman trapezoid rasterisation routines and the extra compositing step. (For -intel the CPU is faster at generating the RLE opacity mask and sending it as geometry to the GPU than the current GPUs are at executing the branch-heavy trapezoid shader. The ultimate question is whether we can tolerate using MSAA and have GPUs that are fast enough...)
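To make the two passes described above concrete, here is a minimal Python sketch: first rasterise one trapezoid into a coverage mask, then composite that mask onto the destination. This is an illustration only, not code from cairo, pixman, or any driver. The function names (`trapezoid_mask`, `composite_over`), the greyscale pixel format, and the fixed supersampling grid are all assumptions made for the example.

```python
def trapezoid_mask(width, height, top, bottom, left, right, samples=4):
    """Pass 1: rasterise one trapezoid into a coverage mask (0.0 .. 1.0).

    top/bottom are the y extents; left and right are (x_at_top, x_at_bottom)
    pairs describing the two slanted edges. Coverage is estimated with a
    samples x samples grid per pixel (a hypothetical, simplistic scheme).
    """
    mask = [[0.0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            hits = 0
            for sy in range(samples):
                y = py + (sy + 0.5) / samples
                if not (top <= y < bottom):
                    continue
                # Interpolate both edges at this sample row.
                t = (y - top) / (bottom - top)
                x_left = left[0] + t * (left[1] - left[0])
                x_right = right[0] + t * (right[1] - right[0])
                for sx in range(samples):
                    x = px + (sx + 0.5) / samples
                    if x_left <= x < x_right:
                        hits += 1
            mask[py][px] = hits / (samples * samples)
    return mask


def composite_over(dest, mask, colour):
    """Pass 2: blend a solid greyscale colour onto dest through the mask."""
    return [
        [colour * m + d * (1.0 - m) for d, m in zip(dest_row, mask_row)]
        for dest_row, mask_row in zip(dest, mask)
    ]


if __name__ == "__main__":
    mask = trapezoid_mask(4, 4, top=1, bottom=3, left=(1, 1), right=(3, 3))
    dest = [[0.0] * 4 for _ in range(4)]
    for row in composite_over(dest, mask, 1.0):
        print(row)
```

The intermediate `mask` is exactly the extra buffer that cairo-image and SNA are said to try to eliminate: a single-pass rasteriser would blend each span into the destination as soon as its coverage is known.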
Originally Posted by liam
cairo-image rasterises directly from the general complex polygon computed for the path (convert the curves into straight lines, convolve with a pen etc.). This essentially folds the two passes performed by cairo-xlib into one and eliminates the very computationally expensive Bentley-Ottmann routine for tessellating trapezoids. On the downside, cairo-image only uses a single core (and no GPU offload) for its rasterisation. Also, more work can be done for cairo-image to process the path without requiring an intermediate polygonisation (e.g. walk splines within the scanline rasteriser, use a hairline renderer for thin pens, compute offset curves, etc.).
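As a rough illustration of the "convert the curves into straight lines" step, the sketch below flattens a cubic Bézier curve into a polyline in Python. It is not cairo's implementation: the names `cubic_point` and `flatten_cubic` are hypothetical, and the fixed segment count is an assumption chosen to keep the example short, where a real rasteriser would more likely subdivide until it meets an error tolerance.

```python
def cubic_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t (0.0 .. 1.0)."""
    u = 1.0 - t
    x = u * u * u * p0[0] + 3 * u * u * t * p1[0] + 3 * u * t * t * p2[0] + t * t * t * p3[0]
    y = u * u * u * p0[1] + 3 * u * u * t * p1[1] + 3 * u * t * t * p2[1] + t * t * t * p3[1]
    return (x, y)


def flatten_cubic(p0, p1, p2, p3, segments=8):
    """Approximate the curve with `segments` straight lines.

    Returns segments + 1 points; consecutive points form the edges of the
    polygon that a scanline rasteriser would then consume.
    """
    return [cubic_point(p0, p1, p2, p3, i / segments) for i in range(segments + 1)]


if __name__ == "__main__":
    for point in flatten_cubic((0, 0), (0, 1), (1, 1), (1, 0)):
        print(point)
```

The resulting point list is the "intermediate polygonisation" mentioned above; walking the spline inside the scanline rasteriser would avoid building this list at all.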
The next step to speed up cairo-xlib would be to eliminate the trapezoids and send paths directly to X: fix the protocol to be more useful for cairo, which would coincidentally also enable separate render threads within cairo. For Nvidia, they would then couple up their driver to use their existing NV_path acceleration, and I would do something similar for SNA (as usual, look at the early experiments in cairo-drm) if the GPU was not the bottleneck.
But does it work on Wayland?
Wow, I just came across this response.
Originally Posted by ickle
Thanks so much for the clear and detailed explanation!
Do you happen to know how Microsoft has managed to accelerate 2D operations so effectively with the GPU? As you point out, the branch-heavy code seems as if it would be a problem for them as well (I'm assuming they don't use the CPU for that).
Last edited by liam; 12-29-2012 at 06:34 PM.