Thread: Modern cards framebuffer organization

  1. #1
    Join Date
    Dec 2012
    Posts
    3

    Modern cards framebuffer organization

    Hi all!

    Does anybody know how the framebuffer is implemented in modern cards? I've heard that in the old days dual-ported DRAM (called VRAM) was used: the RAMDAC read from one port while the GPU wrote to the other. But today, AFAIK, IHVs use DDR/GDDR, which has only a single port. Did buses become fast enough to handle both reads and writes to the framebuffer through a single port, or is there dual-ported VRAM hidden inside the GPU chip (since there are no such chips mounted on the PCB)?

  2. #2
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,386

    Yep, single port but:

    - relatively wide (256-384 bits on high end GPUs vs 64 bits/channel on CPUs)
    - long bursts in and out of memory to maximise transfer rate
    - caches optimized for throughput (eg read-only caches for textures)

    Once you get onto the GPU the caches, registers and local stores do have many banks/ports in order to support simultaneous access.

    In "older days" feeding the display consumed a big part of the available bandwidth, so having a dedicated port and on-chip shift register really helped.

    It was actually the on-chip shift register that made the biggest difference -- that allowed an entire row to be read from the DRAM array and dropped into the shift register with a single RAS/CAS cycle, then the graphics engine would have full time access to the memory interface while data was shifted out to the display. Normal memory cycles could only access a single bit from the row on each access, vs the full-row access of a VRAM.
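
    To put rough numbers on the bandwidth point above, here is a back-of-the-envelope sketch. The resolutions, bus widths and transfer rates are made-up illustrative values, not the specs of any particular card, and blanking intervals are ignored.

```python
def scanout_bandwidth(width, height, bytes_per_pixel, refresh_hz):
    """Bytes per second needed just to feed the display (blanking ignored)."""
    return width * height * bytes_per_pixel * refresh_hz


def bus_bandwidth(bus_width_bits, transfers_per_second):
    """Peak bytes per second a memory bus of the given width can move."""
    return bus_width_bits // 8 * transfers_per_second


def display_share(scanout, bus):
    """Fraction of peak memory bandwidth consumed by display refresh."""
    return scanout / bus


# Hypothetical "older days" setup: 1024x768, 8-bit colour, 70 Hz,
# on a 32-bit memory bus doing 20 million transfers per second.
old_share = display_share(
    scanout_bandwidth(1024, 768, 1, 70),
    bus_bandwidth(32, 20_000_000),
)

# Hypothetical modern setup: 1920x1080, 32-bit colour, 60 Hz,
# on a 256-bit bus doing 5 billion transfers per second.
new_share = display_share(
    scanout_bandwidth(1920, 1080, 4, 60),
    bus_bandwidth(256, 5_000_000_000),
)

print(f"old: {old_share:.1%} of peak bandwidth goes to the display")
print(f"new: {new_share:.2%} of peak bandwidth goes to the display")
```

    With numbers like these the display takes well over half of the old bus but a fraction of a percent of the new one, which is why a dedicated display port mattered then and matters much less now.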

    Nowadays the same approach is still used but rather than having a wide on-chip shift register the sequence is :

    - memory controller starts a page-mode burst
    - DRAM reads an entire row (DRAM always does this even for a single-bit access)
    - memory controller burst-transfers the row (or 1/2, 1/4 etc..) into on-chip line buffer
    - graphics engine gets full time access to memory while data is shifted from line buffer to display

    Note that modern GPUs support multiple displays so you typically have multiple line buffers as well.
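
    The sequence above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the function name, burst size and cycle counts are hypothetical, memory cycles and pixel clocks are treated as the same unit, and real display controllers prefetch and arbitrate in more complicated ways.

```python
from collections import deque


def scan_out_line(row, burst_pixels, burst_cycles):
    """Toy model of feeding one scanline through an on-chip line buffer.

    row          -- pixel values of one scanline as stored in memory
    burst_pixels -- pixels delivered to the line buffer per burst
    burst_cycles -- memory cycles one burst occupies on the bus
    Returns (pixels shown, bus cycles used for display, cycles left over).
    """
    line_buffer = deque()
    shown = []
    bus_cycles_for_display = 0

    for start in range(0, len(row), burst_pixels):
        # Memory controller: one burst moves a chunk of the row on chip.
        line_buffer.extend(row[start:start + burst_pixels])
        bus_cycles_for_display += burst_cycles

        # Display: drains the line buffer one pixel per pixel clock.
        # The memory bus is not needed here, so the graphics engine has it.
        while line_buffer:
            shown.append(line_buffer.popleft())

    free_cycles = len(shown) - bus_cycles_for_display
    return shown, bus_cycles_for_display, free_cycles


row = list(range(1024))
shown, display_cycles, free_cycles = scan_out_line(row, burst_pixels=256, burst_cycles=16)
print(display_cycles, "cycles spent on display bursts,", free_cycles, "left for rendering")
```

    In this model the display only touches the memory bus during the short bursts; the rest of the scanline time is available to the graphics engine.
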
    Last edited by bridgman; 12-16-2012 at 07:51 AM.

  3. #3
    Join Date
    Dec 2012
    Posts
    3

    Thanks for the answer! This seems to be rare and precious information.

    Quote Originally Posted by bridgman View Post
    In "older days" feeding the display consumed a big part of the available bandwidth, so having a dedicated port and on-chip shift register really helped.

    It was actually the on-chip shift register that made the biggest difference -- that allowed an entire row to be read from the DRAM array and dropped into the shift register with a single RAS/CAS cycle, then the graphics engine would have full time access to the memory interface while data was shifted out to the display. Normal memory cycles could only access a single bit from the row on each access, vs the full-row access of a VRAM.
    Does this mean that the GPU's main processor itself did the work of feeding the display from this shift register? Were both ports of the VRAM connected to the GPU itself? I thought it was an external device, the RAMDAC, which read the memory and sent colour bits to the encoder.

    Quote Originally Posted by bridgman View Post
    - memory controller burst-transfers the row (or 1/2, 1/4 etc..) into on-chip line buffer
    - graphics engine gets full time access to memory while data is shifted from line buffer to display

    Note that modern GPUs support multiple displays so you typically have multiple line buffers as well.
    Is the on-chip line buffer just a part of on-chip memory? Is it fast enough, and is that why the shift register is no longer used?
    What is the graphics engine? I know these are a lot of questions; could you please provide a link to any texts on the subject?

  5. #5
    Join Date
    Apr 2008
    Location
    Saskatchewan, Canada
    Posts
    460

    It's been a few years since I worked on graphics chips, but AFAIR the VRAM output was connected directly to the RAMDAC for the screen, so the GPU didn't have to read the data, just clock it out.

    With more modern chips the GPU reads the framebuffer data just like any other memory and sends it to the display itself. It may have to read from multiple buffers for overlays, cursor, etc to generate the final result to output by merging them together. Also, the framebuffer may not be linear (e.g. tiled for better render performance), so that adds more complications to the display hardware.
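
    As a rough illustration of the last two points, here is a minimal sketch of display hardware reading one scanline out of a tiled framebuffer and merging a cursor on top. The 4x4 tile layout and the function names are assumptions made up for this example; real GPUs use vendor-specific tiling formats and more elaborate blending.

```python
def tiled_offset(x, y, width, tile_w=4, tile_h=4):
    """Offset of pixel (x, y) in a framebuffer stored tile by tile.

    Assumes width is a multiple of tile_w and that tiles are laid out
    row-major, with pixels inside each tile also row-major.
    """
    tiles_per_row = width // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    return tile_index * tile_w * tile_h + (y % tile_h) * tile_w + (x % tile_w)


def compose_scanline(fb, y, width, cursor=None, cursor_x=0, cursor_y=0):
    """Build the final scanline y: untile the framebuffer, then overlay a cursor.

    cursor is a list of rows; None entries are treated as transparent.
    """
    line = [fb[tiled_offset(x, y, width)] for x in range(width)]
    if cursor is not None:
        cursor_row = y - cursor_y
        if 0 <= cursor_row < len(cursor):
            for i, pixel in enumerate(cursor[cursor_row]):
                x = cursor_x + i
                if pixel is not None and 0 <= x < width:
                    line[x] = pixel
    return line


# Build an 8x4 tiled framebuffer where pixel (x, y) holds the value y * 8 + x.
WIDTH, HEIGHT = 8, 4
fb = [0] * (WIDTH * HEIGHT)
for y in range(HEIGHT):
    for x in range(WIDTH):
        fb[tiled_offset(x, y, WIDTH)] = y * WIDTH + x

plain = compose_scanline(fb, 1, WIDTH)
with_cursor = compose_scanline(fb, 1, WIDTH, cursor=[["C", None]], cursor_x=2, cursor_y=1)
print(plain)
print(with_cursor)
```

    Consecutive pixels on a scanline are not consecutive in memory here, so the display side needs its own address calculation before it can merge the layers.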
