You're totally wrong about the VGA part. A shader can run the raster and texture mapping work itself, but using ROPs and TMUs makes it faster (if you theoretically cut all the TMUs and ROPs, the VGA would still produce the same graphics at 60% speed). There are only as many ROPs and TMUs as are needed to assist the stream processors, so you count only the teraflops. A 512-bit Fermi has 700 gflops of 64-bit instructions at 1.3-1.4 GHz, but you must count simple 32-bit stream add operations. So you multiply by 6 (FMAC = 3 ops, 64-bit dual-issue cores = 2 ops), and it's 4+ teraflops. CellBE, for example, has 250-i-gflops or 3-s-teraflops; RSX (200 gflops) uses 6 SPEs = +1.85 tflops. In D3D-to-OGL translation, gflops don't matter much; what matters is the instruction set. If you have many emulation and JIT instructions then you are fast. See the l3c part here for example: http://en.wikipedia.org/wiki/Loongson
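To make the multiply-by-6 step concrete, here is a rough sketch of that counting in Python. It reads the 700 figure as gflops (an assumption, since that is the only reading that lands on 4+ teraflops), and the function and constant names are made up for illustration. This is just the counting convention from the post, not a vendor formula.

```python
# Sketch of the counting convention described in the post.
# Assumption: the quoted 700 figure is in gflops (64-bit instructions).
FP64_GFLOPS = 700        # figure quoted in the post for a 512-bit Fermi
OPS_PER_FMAC = 3         # the post counts one FMAC as 3 simple ops
DUAL_ISSUE_FACTOR = 2    # the post counts a 64-bit dual-issue core as 2 ops


def simple_op_gflops(fp64_gflops, ops_per_fmac=OPS_PER_FMAC,
                     issue_factor=DUAL_ISSUE_FACTOR):
    """Scale a 64-bit gflops figure to simple 32-bit ops (hypothetical helper)."""
    return fp64_gflops * ops_per_fmac * issue_factor


total = simple_op_gflops(FP64_GFLOPS)
print(f"{total} gflops = {total / 1000:.1f} teraflops")
```

Changing either factor shows how much the final number depends on what you decide to count as one op.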


