Yep, it's a single port, but:
- relatively wide (256-384 bits on high-end GPUs vs. 64 bits per channel on CPUs)
- long bursts in and out of memory to maximise transfer rate
- caches optimized for throughput (e.g. read-only caches for textures)
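A quick bit of arithmetic on the widths above: bytes moved per bus transfer. This is only a sketch, and it assumes (hypothetically) that the GPU interface and a CPU channel run at the same per-pin transfer rate, so width is the only difference; real parts also differ in data rate and channel count.

```python
# Bytes per transfer for the bus widths quoted above.
# Hypothetical assumption: equal per-pin transfer rate on both sides,
# so the ratio reflects width alone.

def bytes_per_transfer(bus_width_bits: int) -> int:
    """Bytes delivered in one transfer across a bus of the given width."""
    return bus_width_bits // 8

def width_ratio(gpu_bits: int, cpu_channel_bits: int = 64) -> float:
    """How many 64-bit CPU channels' worth of width one GPU interface has."""
    return gpu_bits / cpu_channel_bits

for gpu_bits in (256, 384):
    print(f"{gpu_bits}-bit GPU bus: {bytes_per_transfer(gpu_bits)} B/transfer, "
          f"{width_ratio(gpu_bits):.0f}x a 64-bit channel")
```

With these numbers, a 256-bit bus moves 32 bytes per transfer (4x a channel) and a 384-bit bus moves 48 bytes (6x).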
Once you get onto the GPU, the caches, registers and local stores do have many banks/ports to support simultaneous access.
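To illustrate why banking matters for simultaneous access, here is a toy model of a banked on-chip store. The 32-bank count, the `addr % num_banks` mapping and the broadcast rule for identical words are all assumptions for illustration, not a description of any particular GPU; the point is just that accesses spread across banks proceed together, while accesses that pile onto one bank get serialized.

```python
from collections import Counter

def conflict_degree(word_addresses, num_banks=32):
    """Serialized passes needed for one group of simultaneous accesses.

    Hypothetical assumptions: word i lives in bank i % num_banks, each bank
    serves one distinct word per cycle, and repeated reads of the same word
    are broadcast, so duplicates count only once.
    """
    per_bank = Counter(addr % num_banks for addr in set(word_addresses))
    return max(per_bank.values())


lanes = range(32)
print(conflict_degree([i for i in lanes]))       # 1: unit stride, one word per bank
print(conflict_degree([2 * i for i in lanes]))   # 2: two words land in each even bank
print(conflict_degree([32 * i for i in lanes]))  # 32: every access hits bank 0
print(conflict_degree([0 for _ in lanes]))       # 1: same word, broadcast
```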
In the "old days", feeding the display consumed a large share of the available bandwidth, so having a dedicated display port and an on-chip shift register really helped.
It was actually the on-chip shift register that made the biggest difference: it allowed an entire row to be read from the DRAM array and loaded into the shift register in a single RAS/CAS cycle, after which the graphics engine had full-time access to the memory interface while the data was shifted out to the display. A normal memory cycle could only access a single bit of the row, vs. the full-row access of a VRAM.
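A toy model of the difference described above, counting how many cycles the random-access port spends refreshing one row of the display. It assumes a single x1 chip where every RAS/CAS access costs one port cycle; the class and function names are made up for the sketch.

```python
from collections import deque

class ToyVRAM:
    """Toy x1 VRAM: a DRAM array with a random-access port plus a
    row-wide serial shift register feeding the display.
    Assumption: each RAS/CAS access on the random-access port costs one cycle."""

    def __init__(self, rows: int, row_bits: int) -> None:
        self.array = [[0] * row_bits for _ in range(rows)]
        self.shift_reg = deque()
        self.port_cycles = 0  # cycles the random-access port was busy

    def read_bit(self, row: int, col: int) -> int:
        """Conventional access: one RAS/CAS cycle returns a single bit."""
        self.port_cycles += 1
        return self.array[row][col]

    def transfer_row(self, row: int) -> None:
        """Read-transfer: one RAS/CAS cycle copies the whole row into the shift register."""
        self.port_cycles += 1
        self.shift_reg = deque(self.array[row])

    def shift_out(self) -> int:
        """Serial port: clocks one bit to the display; the random-access port stays free."""
        return self.shift_reg.popleft()


def scan_row_conventional(chip: ToyVRAM, row: int) -> list:
    """Display refresh without a shift register: one port cycle per bit."""
    return [chip.read_bit(row, c) for c in range(len(chip.array[row]))]


def scan_row_vram(chip: ToyVRAM, row: int) -> list:
    """Display refresh with the shift register: one port cycle per row."""
    chip.transfer_row(row)
    return [chip.shift_out() for _ in range(len(chip.array[row]))]


chip = ToyVRAM(rows=4, row_bits=512)
chip.array[0] = [c % 2 for c in range(512)]

before = chip.port_cycles
a = scan_row_conventional(chip, 0)
conventional_cost = chip.port_cycles - before

before = chip.port_cycles
b = scan_row_vram(chip, 0)
vram_cost = chip.port_cycles - before

print(conventional_cost, vram_cost, a == b)  # 512 1 True
```

Both paths deliver the same bits to the display, but in this model the shift-register path ties up the port for 1 cycle per row instead of 512, leaving the rest for the graphics engine.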
Nowadays the same approach is still used, but rather than a wide on-chip shift register, the sequence is:
- the memory controller starts a page-mode burst
- the DRAM reads an entire row (DRAM always does this, even for a single-bit access)
- the memory controller burst-transfers the row (or 1/2, 1/4, etc.) into an on-chip line buffer
- the graphics engine gets full-time access to memory while data is shifted from the line buffer to the display
Note that modern GPUs support multiple displays so you typically have multiple line buffers as well.
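A rough sketch of that sequence with two displays, each with its own line buffer. Everything numeric here is hypothetical (32 bytes per cycle, i.e. a 256-bit bus; a 1000-cycle scanline period; 4 bytes per pixel), and it simplifies by treating each display's scanline as one burst, ignoring that it may span several DRAM rows or be fetched in fractions.

```python
import math
from collections import deque

class LineBuffer:
    """On-chip FIFO: filled by a memory burst, drained by one display."""

    def __init__(self) -> None:
        self.fifo = deque()

    def fill(self, data: bytes) -> None:
        self.fifo.extend(data)

    def drain(self, n: int) -> bytes:
        """Display side: pull n bytes without touching the memory interface."""
        return bytes(self.fifo.popleft() for _ in range(n))


def schedule_scanline(frame_rows: dict, buffers: dict, bus_bytes: int,
                      line_period: int) -> list:
    """Burst each display's next scanline into its own line buffer, then
    hand the remaining cycles of the period to the graphics engine.

    frame_rows maps display name -> bytes for its next scanline.
    Returns the owner of the memory interface for each cycle in the period.
    """
    owners = []
    for name, row in frame_rows.items():
        buffers[name].fill(row)  # burst lands in this display's own buffer
        owners += [f"burst:{name}"] * math.ceil(len(row) / bus_bytes)
    if len(owners) > line_period:
        raise ValueError("display refresh alone exceeds the line period")
    owners += ["engine"] * (line_period - len(owners))
    return owners


buffers = {"primary": LineBuffer(), "secondary": LineBuffer()}
rows = {"primary": bytes(1920 * 4), "secondary": bytes(1280 * 4)}
owners = schedule_scanline(rows, buffers, bus_bytes=32, line_period=1000)
print(owners.count("burst:primary"), owners.count("burst:secondary"),
      owners.count("engine"))  # 240 160 600
print(len(buffers["primary"].drain(1920 * 4)))  # 7680
```

Under these assumptions the two bursts take 400 cycles and the engine gets the other 600, while each display drains its own buffer independently.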
