I already pointed out that yes, the draw call overhead in OGL is lower than in D3D on Windows, because D3D runs almost entirely in a separate process for security and stability reasons (so there are extra context switches and IPC for every D3D call), while the GL drivers do not.
It's like comparing OpenGL on classic Mesa vs OpenGL on Gallium. You get totally different performance numbers even though they're both using the exact same frontend Mesa OpenGL code. You even see classic Mesa running faster, or with more features, or with more stability in some cases, even though Gallium is generally considered the superior backend.
Where things go in D3D's favor is the API requiring less overhead (and yes, this _is real_, which is specifically why NVIDIA created all those fancy NV_ extensions for bindless graphics and direct state access that aren't in OpenGL proper -- they really do make a difference!) and the fact that the D3D drivers on Windows are usually of higher quality than their OGL counterparts, simply because the drivers are tested and vetted by far more apps.
It's D3D 10 that copied OpenGL's solution and became faster than its predecessors (but it seems it's still slower than OpenGL). The Windows approach has nothing to do with stability and security; it was just a design mistake.
Since OpenGL's IHV drivers have a user-mode component to them, IHVs have the ability to implement marshalling, thus improving performance. There is still kernel-mode switching, but the theoretical maximum number of switches under OpenGL implementations is simply equal to the Direct3D standard behavior.
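To make the marshalling point concrete, here is a toy model in Python. It is only a sketch under assumptions: `ToyDriver`, `submit`, `flush` and `count_switches` are made-up names, not any real driver or OpenGL/Direct3D API, and a "kernel switch" is just a counter. The user-mode part buffers commands and crosses into the kernel once per batch. With a batch size of 1 you get one switch per call, which is the worst case the paragraph above describes.

```python
class ToyDriver:
    """Toy model of a user-mode driver component (hypothetical, not a real API)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # commands buffered before a flush
        self.pending = []             # user-mode command buffer
        self.executed = []            # commands that reached the "kernel"
        self.kernel_switches = 0      # simulated user-to-kernel transitions

    def submit(self, command):
        # Queue the command in user mode; only flush once the batch is full.
        self.pending.append(command)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One simulated kernel transition hands over the whole batch.
        if not self.pending:
            return
        self.kernel_switches += 1
        self.executed.extend(self.pending)
        self.pending.clear()


def count_switches(num_calls, batch_size):
    """Issue num_calls draw commands and return the simulated switch count."""
    driver = ToyDriver(batch_size)
    for i in range(num_calls):
        driver.submit(("draw", i))
    driver.flush()  # push out any partially filled batch
    return driver.kernel_switches


if __name__ == "__main__":
    print("no batching:", count_switches(1000, 1))
    print("batches of 64:", count_switches(1000, 64))
```

The batch size of 64 is arbitrary. Real drivers decide when to flush based on things this toy ignores (buffer space, synchronization points, explicit flush calls), so the numbers only illustrate the counting argument, not actual performance.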
Last edited by kraftman; 08-14-2012 at 04:16 AM.