It also gets confusing because GEM defines some calls as "driver specific", so you have "the Intel implementation of GEM" vs other implementations of GEM where the driver-specific calls are defined differently. I'm guessing that those driver-specific calls are where TTM shows through, but I stress that this is only a guess.
I suspect that when people talk about "entirely GEM" they really mean "Intel's implementation of GEM with its particular driver-specific calls", not something that is "more true to GEM" than any other implementation.
A few months back the discussion on the boards was primarily around the fact that Keith felt the GEM code he had written should be directly usable on non-Intel GPUs, while other devs working on those GPUs disagreed. Given that you have knowledgeable, experienced people on both sides of the argument, it's not at all clear which of them is right; the best anyone can do is add their own opinion to the mix. I suspect the true answer is along the lines of "yeah, I guess we could use the code Keith wrote but it would be a lot more work and risk than leveraging what we already have running (i.e. TTM)".
The other source of confusion is that an IGP (Integrated Graphics Processor) part has no dedicated video memory, just a reserved area of system memory. With the right cache tricks the CPU can access that reserved area directly, as if it were ordinary system memory, which is quicker than accessing it through the GPU. This is useful for IGP parts but not applicable to discrete parts with their separate video memory, so I suspect there is some sentiment that some of the Intel GEM code is not applicable to discrete GPUs. No idea how true this is these days.
The important point to understand is that system memory can be accessed more quickly than dedicated video memory by the *CPU*, while dedicated video memory can be accessed more quickly than system memory by the *GPU*.
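To make that concrete, here is a toy sketch in Python. It is not driver code and nothing in it comes from GEM or TTM; the names `FASTER_POOL` and `faster_pool_for` are made up purely for illustration. It only encodes the qualitative rule above for a discrete card, and on an IGP there is no separate video memory to choose from in the first place.

```python
# Toy model only: the qualitative rule for a discrete GPU, not driver code.
# All names are hypothetical and do not correspond to any GEM or TTM API.

FASTER_POOL = {
    "cpu": "system memory",           # the CPU's quicker pool
    "gpu": "dedicated video memory",  # the GPU's quicker pool
}


def faster_pool_for(accessor: str) -> str:
    """Return the memory pool that the given accessor reaches more quickly."""
    key = accessor.lower()
    if key not in FASTER_POOL:
        raise ValueError(f"unknown accessor: {accessor!r}")
    return FASTER_POOL[key]


if __name__ == "__main__":
    for who in ("CPU", "GPU"):
        print(f"{who}: faster access to {faster_pool_for(who)}")
```

So, roughly speaking, a buffer the CPU touches a lot would rather live in system memory and a buffer the GPU touches a lot would rather live in video memory, and as far as I understand it, deciding where each buffer goes is a big part of what a memory manager has to do on a discrete part.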