Thursday, August 1, 2019

Crazy crashing bug

Normally, after I test on an emulator for a while, I eventually throw the game onto hardware and get frustrated at it not working.

THIS TIME, however, I ran into something new. The game was working on my primary-use emulator, Mesen (which is accurate, easy to use, has a wonderful debugger, and an author who is always adding new features). It worked fine on hardware. But this week I was away from my computer a bit and wanted to do some testing, so I threw it on my phone, which uses the quite common emulator fceux. And on my phone, it now won't run AT ALL.

fceux is a pretty good emulator. Not quite as accurate as Mesen, but usually good enough. What could cause the game to crash on it but not on Mesen or real hardware?

As I usually do when I get stumped with a hard-to-debug issue, I did a git bisect. Git bisect is the one feature of git that, for a single-developer project, makes it better than subversion. You mark one commit as good and a later one as bad, and it walks you through the changes in between until you find the one specific change that introduced the behavior. I won't go into detail here about it, but if you haven't used it, you're missing out.
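For anyone who hasn't used it, the idea behind bisecting is just a binary search over history. Here is a rough Python sketch of that idea, not git's actual implementation. The commit names and the `crashes_on_fceux` check are made up for illustration, and it assumes the first commit is good, the last is bad, and everything after the breaking commit stays broken.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the breaking change is at mid or earlier
        else:
            lo = mid  # the breaking change is after mid
    return hi


# Hypothetical history: 100 commits, where "c73" introduced the crash.
commits = [f"c{i}" for i in range(100)]
tested = []


def crashes_on_fceux(commit):
    # Stand-in for "build this commit and try it on the emulator".
    tested.append(commit)
    return int(commit[1:]) >= 73


idx = first_bad(commits, crashes_on_fceux)
print(commits[idx], "found after", len(tested), "test builds")
```

With 100 commits, that takes a handful of test builds instead of up to 98, which is why it's worth the trouble even on a one-person project.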

Anyway, I did a git bisect, and discovered that the breaking change was just adding a new room to my game. No new code, no new features. With some experimentation, it turned out that adding ANY new room would break it. Which sounds like maybe I was pushing something over a bank boundary in my ROM. So I dug up my assembler's map file, and no. Nothing was being pushed out of place.

So what could it be? fceux has a debugger, but for some reason, I can't understand how to use it. Did I really need to learn its debugger just for this?

Well, because the problem happened somewhere on startup, the next thing to try (before learning the new debugger) was to insert a bit of code that would turn the screen blue and then infinitely loop. I'd add that at my first line of code, compile and test.  It turned the screen blue on fceux, as it should. Which meant I was getting that far in my code.  For the next few hours, I moved that snippet of code forward and back, trying to find the exact function call that was crashing.

The result? It turns out I was calling the update function of the music engine (ggsound) before ever calling the initialization function of the library. OK, that's bad. But why would it work on hardware and Mesen but not fceux? Suddenly it occurred to me. Mesen and hardware start up with mostly random values in RAM. fceux starts with zeros. I'm sure that some branch somewhere in the music engine branches on zero, and breaks. The chance of that piece of memory being 0 in Mesen or on hardware is just 1 in 256. So I never saw it crash on those platforms.
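To make the 1-in-256 reasoning concrete, here is a toy Python model of the situation. It is only a sketch of my guess above: the `ENGINE_STATE` address and the "crash if the byte is zero" rule are invented for illustration and are not ggsound's real memory layout or logic. It also assumes every byte value is equally likely at power-on, which real hardware doesn't strictly guarantee.

```python
RAM_SIZE = 0x800       # the NES has 2 KiB of work RAM
ENGINE_STATE = 0x0300  # hypothetical address, NOT ggsound's real layout


def update_without_init(ram):
    """Toy stand-in for calling update before init.

    Returns True if the (invented) zero check sends us off into the weeds.
    """
    return ram[ENGINE_STATE] == 0


# An emulator that zero-fills RAM at power-on hits the bad path every time.
zero_filled = bytearray(RAM_SIZE)
always_crashes = update_without_init(zero_filled)

# With random power-on RAM, only the value of that one byte matters,
# so try all 256 possibilities and see which ones crash.
crashing_values = []
for value in range(256):
    ram = bytearray(RAM_SIZE)
    ram[ENGINE_STATE] = value
    if update_without_init(ram):
        crashing_values.append(value)

print("zero-filled RAM crashes:", always_crashes)
print("crashing values:", crashing_values, "out of 256")
```

One bad value out of 256 means a randomized start almost always gets lucky, while a zero-filled start fails on every single boot.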

That said, I still have no idea why it didn't happen until I added one more room to the game. Probably some dark voodoo. I don't want to know.

