I, for one, would really like to know how the binary blobs are different. I remember people stating that they don't use Mesa and work quite differently from normal drivers, but no details are ever mentioned.
It's relatively simple, actually.
ATI makes use of the DRI. Their OpenGL libraries thus work with the existing Mesa GL stuff -- they are just drivers including the hardware-specific commands and optimizations that are executed by the Mesa GL API. The kernel module includes the kernel-component DRI and framebuffer bits that all DRI-based drivers need. In short, the secret sauce is really just the hardware-specific components and the internal accelerated GL implementation, plus some extras like power management. As a result, they are limited by the Xorg/kernel/DRI/Mesa interfaces, and cannot offer features or performance fixes that aren't also possible in the Open Source drivers (which are pushing the new interfaces like DRI2 to fix performance problems).
The NVIDIA drivers do not use Mesa, nor do they use DRI or XAA/EXA. Their libGL.so.1 completely replaces Mesa and talks directly to a proprietary interface exported by the kernel blob. The X driver blob likewise plugs in as an X protocol handler and talks over the proprietary interface (and NVIDIA's libGL) to the kernel. Their secret sauce is pretty much _everything_ in the stack above the X protocol and the GL API. Since they define all of their own internal protocols, they have never been limited by the design flaws of the Open Source stack and hence have been capable of getting better performance and more features (e.g., they had accelerated indirect rendering long before AIGLX or Xglx ever came into being). NVIDIA did what they did partly because it meant reusing more of their existing code instead of having to write Linux/UNIX-specific code for DRI/Mesa/XAA integration, and also because they knew that those interfaces kinda sucked compared to what they already had and would impose ugly limitations.
The reason these binary blobs break so often is that both the kernel and Xorg have internal APIs for basic things (memory management in the kernel) and domain-specific things (hooking into the X protocol handlers) which can and do change to incorporate performance improvements or bug fixes, and the binary blobs cannot be updated by the kernel/Xorg developers at the same time as they change the interface. You thus end up waiting for ATI/NVIDIA to release new blobs that incorporate the interface changes.
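To make the breakage concrete, here is a rough Python sketch of the kind of version check a server can do when it loads a driver module. It is a toy model, not Xorg's or the kernel's actual code: `AbiVersion` and `can_load` are made-up names, and the simple major/minor rule is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbiVersion:
    """Hypothetical (major, minor) version of an internal driver interface."""
    major: int
    minor: int


def can_load(server: AbiVersion, module: AbiVersion) -> bool:
    """Return True if a module built against `module` can run on `server`.

    Assumed rule: the major number changes on incompatible interface
    changes, the minor number on backwards-compatible additions.
    """
    return module.major == server.major and module.minor <= server.minor


# A blob compiled against interface 5.0 ...
blob = AbiVersion(5, 0)

# ... still loads on a server that only added things (5.2) ...
print(can_load(AbiVersion(5, 2), blob))

# ... but is refused once the interface changes incompatibly (6.0).
print(can_load(AbiVersion(6, 0), blob))
```

An open driver gets rebuilt (or patched) in the same commit that bumps the interface, so it never sees the second case. A blob stays at the version it was compiled against until the vendor ships a new one.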
fglrx replaces libGL.so.1.2 too, of course, but not the other files. If this is the trick for keeping the install package small instead of having it grow with every new Xorg ABI change, I really prefer NVIDIA's solution...
Oh, I wasn't aware that fglrx replaced libGL. My writeup is a little off, then, probably.
NVIDIA most certainly does not replace the whole Xorg stack. Just the parts of it that translate X commands into actual rendering operations, and (I think) the display management bits. The core non-rendering stuff, the input stuff, and other bits are not touched by NVIDIA's drivers at all.
The idea with things like EGL+KMS+DRI2+GEM/TTM behind the X server is that the Open Source stack will do exactly what NVIDIA does now, albeit using open multi-driver protocols instead of NVIDIA's proprietary stuff. All of the hardware rendering is pulled out of Xorg and put into the GL drivers and all of the display management is put into KMS, and then Xorg really just becomes a window arbiter, event dispatcher, and protocol handler... which is all it is when running the NVIDIA driver.
Wayland is the same thing, just with the X protocol removed and replaced with a bare-bones non-network-capable protocol, and a lot less code as a result of that. (The basic idea there being that if RENDER is implemented by having the client send commands to another process which just issues the same GL commands through the same libGL that the client could've used itself, why not just remove all that protocol and make the client do it? Then merge the compositor into the window server so you don't need protocol for an external process to access other processes' window pixmaps, ask clients to manage their own window position and size and hope they all behave consistently, and voilà: you've got a super-simple window system that is far lighter -- but notably less flexible -- than X.)
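A rough way to picture the difference, as a toy Python model: in the X-style path every drawing command is a request to the server, while in the Wayland-style path the client draws into its own buffer and only hands the finished buffer over. None of the classes below correspond to real X11 or Wayland APIs. `Buffer`, `XStyleServer`, and `WaylandStyleCompositor` are hypothetical names, and `draw` merely stands in for issuing a GL command.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """Stand-in for a window's pixel storage; records commands drawn into it."""
    pixels: list = field(default_factory=list)

    def draw(self, command: str) -> None:
        # Placeholder for issuing a real GL command against this buffer.
        self.pixels.append(command)


class XStyleServer:
    """Clients send drawing requests; the server renders on their behalf."""

    def __init__(self) -> None:
        self.windows = {}
        self.requests_handled = 0

    def render_request(self, window_id: int, command: str) -> None:
        self.requests_handled += 1
        self.windows.setdefault(window_id, Buffer()).draw(command)


class WaylandStyleCompositor:
    """Clients render themselves and only hand over finished buffers."""

    def __init__(self) -> None:
        self.surfaces = {}
        self.requests_handled = 0

    def attach(self, surface_id: int, buffer: Buffer) -> None:
        self.requests_handled += 1
        self.surfaces[surface_id] = buffer


commands = ["clear", "rect", "text"]

# X-style: one protocol request per drawing command.
x_server = XStyleServer()
for cmd in commands:
    x_server.render_request(1, cmd)

# Wayland-style: the client draws locally, then makes a single request.
compositor = WaylandStyleCompositor()
client_buffer = Buffer()
for cmd in commands:
    client_buffer.draw(cmd)
compositor.attach(1, client_buffer)

print(x_server.requests_handled, compositor.requests_handled)
```

Both paths end up with the same window contents. The difference is how much protocol sits between the client and the pixels, which is where the "a lot less code" comes from.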