It's a bit of both. In theory we have all the hardware details; in practice we have the hardware details from when the hardware was designed, which is theoretically the same as what was implemented but in practice may be a bit different. The last couple of generations haven't had that problem but it was a real challenge back in the 5xx/6xx days. We can also comb through bug reports and the associated fixes, but anyone who has searched an engineering-level bug tracker for matches to their symptoms understands how fruitless that can be despite everyone's best efforts.
In many cases we know what registers need to be programmed and what to put in them, but there are subtle sequencing dependencies which are not documented anywhere.
The devs have fairly easy access to 99% of the HW details and can work through the driver and hardware teams to get to maybe 99.9% fairly easily. That last 0.1% is a real pain though... and quite often the chip seems completely broken until you find that last 0.1% ;(
There are all kinds of troubleshooting tools; it's just that modern high-end GPUs are really big and complex. That works OK when the effort is spread across thousands of people, but it's a bit hard for a small team to do the same. That's why I have hope for piggybacking on the original design effort.
Normally the devs work from HW documentation, diagnostic code and information they get from hardware and driver teams (not the actual driver code). Anything pulled directly from the proprietary driver code needs to be reviewed to a greater extent than normal. We basically need to differentiate between "how the chip is programmed" (which we want to release), "how our driver is designed", and "how the chip is designed" (we don't want to release the last two). The practices are constantly evolving, but this is what we do today.
EDIT - I wonder if having to delete and repost rather than edit is artificially inflating my post count?



