A while back I posted that I was looking for an RROD Xbox 360; I actually sent one off to MEFAS to get digested for solder joint inspection on the GPU, using a process called "dye and pry". In this process, the motherboard is flooded with red ink, and then the GPU is mechanically pried off the board. The red ink flows into any tiny cracks in the solder balls and, at least in theory, when you pry the GPU off, the cracked regions shear first, leaving visible red spots at the points of failure.
I was a bit puzzled by these results because there was no "catastrophic" failure (pools of red ink over a connection interface), just partial cracking. Partial cracking isn't terribly uncommon, and many products work quite well despite such artifacts. However, after reading the [SeattlePI RRoD] article, it makes more sense: if Microsoft shorted safety margins around many of the design parameters to get the product out on time, the summation of many partial failures could lead to a total system failure, with symptoms that vaguely cluster together but are difficult to trace to any single root cause. Heisenbugs. Yuck.
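To put rough numbers on that "summation of many partial failures" intuition, here is a back-of-the-envelope sketch. Everything in it is an assumption made for illustration: the per-item failure chance, the item counts, and above all the idea that marginal joints or parameters fail independently. None of these figures come from the dye and pry results or from the article.

```python
# Back-of-the-envelope sketch: how many small, individually tolerable
# risks can add up to a noticeable system-level failure rate.
# All numbers are hypothetical, and independence between items is assumed.

def p_any_failure(p_each: float, n: int) -> float:
    """Chance that at least one of n independent marginal items fails,
    given that each one fails with probability p_each."""
    return 1.0 - (1.0 - p_each) ** n

if __name__ == "__main__":
    p_each = 0.001  # hypothetical: each marginal item is 99.9% reliable
    for n in (1, 10, 100, 500):
        print(f"{n:4d} marginal items -> "
              f"{p_any_failure(p_each, n):.1%} chance of some failure")
```

The point is only the shape of the curve: each item looks fine when inspected alone, yet the chance that something, somewhere, gives out grows steadily with the number of thin margins, and no single item stands out as the culprit.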