In my previous post about the volatile keyword, sverx left an insightful comment, noting that because nothing changes the SRAM data outside of my code, it shouldn't really need to be marked as volatile. That's true: nothing else modifies it at run-time.
So what was the actual problem and why did marking it as volatile correct it?
For one thing, as sverx mentions in the comment, SRAM can only be written 8 bits at a time. That's what I'm doing: I cast my data to 8-bit chars and loop through, writing them one at a time. But now I'm wondering if newer versions of gcc saw that, decided I was stupid, and optimized the loop into 16- or 32-bit writes. That would explain why adding debugging messages in the inner loop would change the behavior. Marking the data as volatile might also have been enough to scare the compiler off from over-optimizing, which fixed the problem as well, although not quite as correctly.
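To make the byte-at-a-time idea concrete, here is a rough sketch in C. It is not the actual save code from this project: the names `sram_write` and `sram_matches` are made up for illustration, and an ordinary array stands in for the real SRAM region so the sketch compiles anywhere. The point it tries to show is that putting the volatile qualifier on the destination pointer, rather than on the data being saved, should keep the compiler from merging the 8-bit stores into wider ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy len bytes into SRAM one byte at a time.
 * Because dst points to volatile uint8_t, each store counts as an
 * observable access, so the compiler is expected to emit a separate
 * 8-bit write per iteration instead of combining them. */
void sram_write(volatile uint8_t *dst, const void *src, size_t len)
{
    const uint8_t *bytes = (const uint8_t *)src;
    for (size_t i = 0; i < len; i++) {
        dst[i] = bytes[i];
    }
}

/* Hypothetical helper: read SRAM back byte by byte and compare it with
 * the expected data. Returns 1 if every byte matches, 0 otherwise. */
int sram_matches(const volatile uint8_t *sram, const void *expected, size_t len)
{
    const uint8_t *bytes = (const uint8_t *)expected;
    for (size_t i = 0; i < len; i++) {
        if (sram[i] != bytes[i]) {
            return 0;
        }
    }
    return 1;
}
```

On real hardware, `dst` would be a pointer into the SRAM address range rather than a local array, and a call would look something like `sram_write(sram_base, &save_data, sizeof save_data)`, where `sram_base` and `save_data` are placeholders. The data being saved stays a plain, non-volatile object; only the accesses to SRAM are constrained.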