Concerning Multi-GPU Graphics Cards
Hello again. I have a question regarding how GPU companies like ATI and NVIDIA make Crossfire and SLI "work".
From my understanding, it is completely up to the application (such as a video game) to "see" multiple GPUs on any given graphics card. And, as far as I know, multiple GPUs cannot "share" a common memory pool, and GPU #1 has to copy all its information to GPU #2 in order to work together.
Why can't two or more GPUs share a common memory pool, and if eight GPUs are strung together via Crossfire/SLI, why can't any game or other application "see" them as one big GPU on one single graphics card?
Obviously I have an incomplete view of GPU and Crossfire/SLI technology, but the current techniques used to make multiple GPUs work together seem awfully tacky to me.
Could someone help me here?
No, your understanding is fine. The problem is that high-end GPUs require extremely fast, wide memory buses to keep up with the processing power, and nobody has a good way to bridge that kind of bus from card to card yet. The PCIe link used for a GPU is usually 16 lanes wide and can transfer between 4 and 8 GB/sec in each direction. External PCIe links tend to be single lane, or between 0.25 and 0.5 GB/sec, similar to SATA.
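For anyone who wants to see where those PCIe numbers come from, here is a rough sketch of the arithmetic. It assumes the PCIe 1.x and 2.0 line rates (2.5 and 5.0 GT/sec per lane) and their 8b/10b encoding; the function name is just something made up for illustration.

```python
# Rough one-direction PCIe bandwidth. Assumes PCIe 1.x / 2.0 line rates and
# 8b/10b encoding; the function name is illustrative, not from any library.

def pcie_gb_per_sec(gigatransfers_per_sec, lanes):
    """Usable bandwidth in GB/sec, one direction, for a PCIe 1.x/2.0 link."""
    encoding_efficiency = 8 / 10  # 8b/10b: 8 data bits per 10 bits on the wire
    bits_per_byte = 8
    per_lane = gigatransfers_per_sec * encoding_efficiency / bits_per_byte
    return per_lane * lanes

for name, rate in (("PCIe 1.x", 2.5), ("PCIe 2.0", 5.0)):
    print(f"{name}: x1 = {pcie_gb_per_sec(rate, 1):.2f} GB/sec, "
          f"x16 = {pcie_gb_per_sec(rate, 16):.2f} GB/sec")
```

The x16 results land on the 4 and 8 GB/sec figures, and the x1 results on the 0.25 and 0.5 GB/sec figures for a single-lane link.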
A typical high end GPU has a memory bandwidth between 50 and 100 GB/sec - yes, maybe 100x the fastest card-to-card link available today and over 10x faster than the most exotic interconnects used in high end supercomputers. If the inter-card connections could keep up with the kind of memory bandwidth you need to run a GPU at full speed then running off a single memory pool would be a lot more attractive. Even then you couldn't afford to do many accesses to the shared memory because sharing a memory pool also means sharing bandwidth. Multi-socket and cluster OSes try to keep a high degree of affinity between process memory and the CPU running the process, since even with a high degree of interconnect "there's no place like local memory".
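To put the gap in perspective, here is a quick back-of-the-envelope sketch using the bandwidth figures from this post. The 256 MB working set and the helper names are hypothetical, picked only to make the comparison concrete.

```python
# Compare how long it takes to move the same data over different paths.
# Bandwidth figures are the ones quoted in the post; the 256 MB working set
# and all names here are hypothetical, chosen only for illustration.

LINKS_GB_PER_SEC = {
    "local GPU memory (high end)": 100.0,
    "local GPU memory (low end)": 50.0,
    "PCIe x16 (fast)": 8.0,
    "PCIe x16 (slow)": 4.0,
    "external single-lane PCIe": 0.5,
}

def transfer_ms(size_mb, gb_per_sec):
    """Milliseconds needed to move size_mb megabytes at gb_per_sec."""
    return size_mb / 1024 / gb_per_sec * 1000

def shared_bandwidth(pool_gb_per_sec, gpu_count):
    """Best-case per-GPU bandwidth if several GPUs split one memory pool."""
    return pool_gb_per_sec / gpu_count

working_set_mb = 256  # hypothetical amount of data touched per frame
for name, bandwidth in LINKS_GB_PER_SEC.items():
    print(f"{name:30s} {transfer_ms(working_set_mb, bandwidth):8.2f} ms")

# Two GPUs on one 100 GB/sec pool get at most half of it each.
print(shared_bandwidth(100.0, 2), "GB/sec per GPU")
```

Even in this idealized view, anything that has to cross a card-to-card link costs far more than a local access, and splitting one pool between GPUs cuts what each one can pull from it.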
Last edited by bridgman; 03-02-2009 at 12:00 AM.
Pardon me for poking my nose in, but 50 GB/sec isn't exactly out of the reach of HyperTransport, and unless I read the version 3 specification wrong, it contains data on how to manage a processor-to-processor link over 3 meters. I would think an interconnect based on HyperTransport just might be able to satisfy some of these conditions.
I do think all this will become possible and it is getting closer. GPU performance and bandwidth requirements are continuing to grow as well, however, so I don't think we can assume that external buses will automatically catch up with GPU requirements.
The HT links being used today are more on the order of 8GB/sec although the number continues to climb. You can double the numbers if you count both directions but I don't think that maps typical GPU access patterns well so IMO the performance would be driven more by single-direction bandwidth.
Even a 32-bit HT3.1 link (spec'ed but never implemented outside a lab AFAIK) only gives 25 GB/sec in one direction and the cost of a connection like that would be *much* higher than anything used today. I haven't seen a spec for what an external 32-bit >3GHz cable would look like but it sure would be a lot more $$ than what we all use today.
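The HT numbers fall out of simple arithmetic, roughly as sketched below. This assumes a double-data-rate link (two transfers per clock) and a 3.2 GHz top clock for HT3.1; the post above only says ">3GHz", so treat that clock as my assumption. The function name is made up for illustration.

```python
# One-direction HyperTransport bandwidth, assuming a double-data-rate link.
# The 3.2 GHz clock for HT3.1 is an assumption; the function name is
# illustrative only.

def ht_gb_per_sec(clock_ghz, width_bits):
    """Bandwidth in GB/sec, one direction: clock x 2 transfers x bytes wide."""
    transfers_per_clock = 2  # data moves on both clock edges
    return clock_ghz * transfers_per_clock * width_bits / 8

print(f"16-bit @ 2.0 GHz: {ht_gb_per_sec(2.0, 16):.1f} GB/sec")  # ~8 GB/sec class
print(f"32-bit @ 3.2 GHz: {ht_gb_per_sec(3.2, 32):.1f} GB/sec")  # ~25 GB/sec class
```

Doubling either figure gives the both-directions number, which is where the larger marketing figures tend to come from.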
Last edited by bridgman; 03-02-2009 at 12:15 PM.
After seeing people drop $600 plus ... each ... on various Nvidia cards, I've gotten the feeling that some people will do anything if it means .0001 frames more with all details turned off. (Yes, I'm actually fussing at some of the people who have asked in the Quake Live forums how to turn everything off while NOT running Radeon 7000s or GeForce DDRs.)
Anyways, random fussing aside, thanks for the clarification on why that wouldn't work... "yet" I guess.
We ask the same question internally for every new generation of GPUs. It's the high-tech version of "are we there yet?"
Video Card: 768 MB DirectX 10 Graphics Card with Shader 3.0 support (Nvidia .... still going strong despite no true public announcement concerning the rumored game. .... procedural system that enables multiple layers of dynamic clouds; thus, ... Umbra is GPU accelerated occlusion culling software developed by Umbra
DX10 is shader model 4.0 AFAIK, not 3.0.