Monday, July 24, 2017

Optimization and C

When I started this next game in C, I knew there would be places where C isn't fast enough and I'd have to drop to assembly to optimize things, particularly because the 6502 processor really isn't a suitable target for C.  There's just too large a gap between how C structures things and how the 6502 needs things to be done.

Well, this week I decided to see how long my background updating routine takes.  The easiest way to see how long a long-running routine takes on the NES is to play with the color/bw register.  You can set a frame to render in B&W, and when the routine finishes, change the register to color.  The screen will switch partway through the frame, and you can see by where the color starts how much of your frame's computation time you've used.

In this example picture, you can see that the routine being checked runs from the beginning of rendering, to somewhere around 10% of the screen height.  So it's taking up somewhere in the general order of 10% of the total processing time available.
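
As a rough sketch of the trick, assuming the standard PPU mask register at $2001 (where bit 0 turns on grayscale), a timing wrapper might look something like this in C. The helper names are made up for illustration, and the register is passed in as a pointer only so the same code can be exercised off-hardware; this is not the game's actual code.

```c
#include <stdint.h>

/* Bit 0 of the PPU mask register ($2001) selects grayscale output. */
#define PPU_MASK_GRAYSCALE 0x01u

/* Hypothetical helper: mask value to write just before the timed routine. */
static uint8_t mask_grayscale_on(uint8_t base_mask)
{
    return (uint8_t)(base_mask | PPU_MASK_GRAYSCALE);
}

/* Hypothetical helper: mask value to write once the routine has finished. */
static uint8_t mask_grayscale_off(uint8_t base_mask)
{
    return (uint8_t)(base_mask & (uint8_t)~PPU_MASK_GRAYSCALE);
}

/* Hypothetical wrapper: `ppu_mask` would point at 0x2001 on real hardware,
   `base_mask` is whatever rendering flags the game normally uses, and
   `routine` is the code being timed. */
static void time_with_grayscale(volatile uint8_t *ppu_mask, uint8_t base_mask,
                                void (*routine)(void))
{
    *ppu_mask = mask_grayscale_on(base_mask);   /* screen goes B&W here */
    routine();
    *ppu_mask = mask_grayscale_off(base_mask);  /* color resumes here   */
}
```

On hardware the call would be something like time_with_grayscale((volatile uint8_t*)0x2001, mask, update_background), where mask and update_background stand in for the game's own values.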

Well, my background updating routine (which renders a new slice of background just offscreen to prepare for scrolling) was taking somewhere around 40% of my total time.  It was horrible.  After playing around, it turns out that this loop was the culprit:

    while (temp > 0) {  //temp is just a counter of how many times to do this
        cj = *tempPtr2;  //get the current metatile into cj
        slicePtr = (u8*) metatile_ptr[ci]; //figure out which slice array to use
        ci += 4;
        if (ci == 16) {    //if we've finished a tile, go to the next one down
            ci = 0;
            tempPtr2 += yIncrement;

It's not important to get into the details of exactly what this loop is doing, but a few things ended up being problematic:

1.  I look up slicePtr each time through the loop, although if you pay attention, it turns out there are only 4 different values it can be.  Pulling those out of the loop gained me about 5%.  

2. More importantly, and this is where C starts to fall apart, getting a value by index into an array can be super-slow if the array isn't constant (i.e. if it's a non-constant pointer pointing at an array of data).  This is because the 6502 only allows indirect indexed addressing through a pointer stored in the zero page.  And what exactly does that nonsense mean?  The 6502 has a single special page of memory, the "zero page" (the first 256 bytes), that's, well, special.  To do a pointer-based lookup, you first have to copy the pointer to the zero page, then do an index from that.  So for a single slicePtr[cj] lookup, it's something like:
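    lda slicePtr      ; copy the low byte of the pointer to the zero page
    sta tempPtr
    lda slicePtr+1    ; copy the high byte
    sta tempPtr+1
    ldy cj            ; the index goes in Y
    lda (tempPtr),y   ; indirect indexed load through the zero-page copy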

That's 21 clock cycles, if I haven't forgotten all my instruction timings in the months since making an Atari game.

So to improve this, I allocated space on the zero page for 4 pointers, and not only pulled them out of the loop (like I was talking about in step 1), but dropped to inline assembly, and saved them on the zero page once, so I wouldn't have to jump through those hoops every time.   This ended up being a huge savings in time.

3. By this point, I figured that since I had optimized it this far, I might as well go further: unroll the loop a little, and do most of the computation in inline assembly:

    tempPtrA = (u8*)metatile_ptr[ci];
    ci += 4;
    tempPtrB = (u8*)metatile_ptr[ci];
    ci += 4;
    tempPtrC = (u8*)metatile_ptr[ci];
    ci += 4;
    tempPtrD = (u8*)metatile_ptr[ci];

    while (temp >= 4) {  //temp is just a counter of how many times to do this
        cj = *tempPtr2;  //get the current metatile into cj
        __asm__("ldx %v", vram_buffer_current);
        __asm__("ldy %v", cj);
        __asm__("lda (%v),y", tempPtrA);
        __asm__("sta %v,x", vram_buffer);
        __asm__("inx");  //advance to the next buffer slot

        __asm__("ldy %v", cj);
        __asm__("lda (%v),y", tempPtrB);
        __asm__("sta %v,x", vram_buffer);
        __asm__("inx");

        __asm__("ldy %v", cj);
        __asm__("lda (%v),y", tempPtrC);
        __asm__("sta %v,x", vram_buffer);
        __asm__("inx");

        __asm__("ldy %v", cj);
        __asm__("lda (%v),y", tempPtrD);
        __asm__("sta %v,x", vram_buffer);
        __asm__("inx");

        __asm__("stx %v", vram_buffer_current);  //save the updated buffer position

        tempPtr2 += yIncrement;
        temp = temp - 4;
    }

It's certainly a bit uglier, but it went from the whole routine being somewhere around 40% of my frame time, to about 10%.   I'll take that optimization.
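
For anyone who would rather see the shape of those changes without the 6502 details, here is a portable C sketch of the same two ideas: look the four tables up once before the loop, then handle four output bytes per pass. The function name, parameters, and data layout are stand-ins invented for illustration, not the game's real code, and a modern compiler would likely do much of this on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the slice routine: for each metatile index read
   from `metatiles`, emit one byte from each of four lookup tables. */
static size_t fill_slice(const uint8_t *const tables[4],
                         const uint8_t *metatiles, size_t stride,
                         size_t count, uint8_t *out)
{
    /* Hoisted: the four tables are fetched once, not on every iteration. */
    const uint8_t *a = tables[0];
    const uint8_t *b = tables[1];
    const uint8_t *c = tables[2];
    const uint8_t *d = tables[3];
    size_t n = 0;

    /* Unrolled: each pass writes four bytes and consumes four counts. */
    while (count >= 4) {
        uint8_t m = *metatiles;   /* current metatile index */
        out[n++] = a[m];
        out[n++] = b[m];
        out[n++] = c[m];
        out[n++] = d[m];
        metatiles += stride;      /* step to the next metatile down */
        count -= 4;
    }
    return n;                     /* number of bytes written */
}
```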

Edit: And if you're wondering why my variable names are so awful, that's once again an artifact of the 6502 way of doing things.  C's method of allocating local variables on the stack isn't a particularly good fit for the 6502, so it's much faster to allocate a handful of generic global variables on the zero page that can be reused all over the place.  So most of these oddly named variables are common globals that are used for all sorts of things (i.e. ci and cj are common index variables that I use in all sorts of loops).
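
A minimal sketch of what that can look like with cc65, assuming the linker configuration defines a ZEROPAGE segment (the stock NES setup does). The pragmas are guarded so the snippet still compiles with other compilers, and sum_bytes is a made-up example routine, not something from the game.

```c
/* cc65 only: place the variables declared below in the zero page segment. */
#ifdef __CC65__
#pragma bss-name (push, "ZEROPAGE")
#endif

unsigned char ci;   /* shared index, reused by many loops */
unsigned char cj;   /* second shared index */

#ifdef __CC65__
#pragma bss-name (pop)
#endif

/* Hypothetical routine that borrows the shared counter instead of declaring
   a local loop index on cc65's software stack. */
unsigned int sum_bytes(const unsigned char *data, unsigned char len)
{
    unsigned int total = 0;
    for (ci = 0; ci < len; ++ci) {
        total += data[ci];
    }
    return total;
}
```

The obvious catch is that any function called from inside the loop must not touch ci, which is the price of sharing globals this way.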

Thursday, July 13, 2017

Level loading and scrolling engine

Well, after a bit of work here, a bit there, between baby feedings and lack of sleep, I've managed to get the first bits of my level-loading and scrolling engine done.

First, I needed to figure out a level format. The NES native background tiles are 8x8 pixels, but levels stored at that resolution end up being huge in ROM space, so most people use something else. Sometimes that's some sort of compression (gzip? run length encoding?), but that works best with games that either only scroll one direction, or that have a big RAM buffer on the cartridge to decompress the data into.  The cheap cartridge mapper/board I'm planning to use, GT-ROM (which has some awesome features), doesn't have extra RAM.  So that's out.  Instead, I'm going to try doing 32x32 pixel metatiles, where each "tile" of my level data represents 4x4 hardware tiles.  The convenient thing about this is that the NES stores background palette data as one attribute byte per 32x32 pixel chunk, so this will simplify that calculation.
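
To make the space savings concrete: one byte of level data now stands for 16 hardware tiles, so a level map is 1/16th the size of a raw tile map (plus the metatile definitions themselves). Here is a small C sketch of how a lookup through such a format could work. The struct layout and the tile_at helper are assumptions made for illustration, not the actual format the game uses.

```c
#include <stdint.h>

#define METATILE_TILES 4u   /* 4x4 hardware tiles = 32x32 pixels */

/* Hypothetical definition of one metatile: 16 tile indices, row-major. */
typedef struct {
    uint8_t tiles[METATILE_TILES * METATILE_TILES];
} Metatile;

/* Returns the hardware tile at (tile_x, tile_y), measured in 8x8 tiles.
   `level` holds one metatile id per 32x32 pixel cell, row-major, and is
   `level_width` metatiles wide. */
static uint8_t tile_at(const Metatile *defs, const uint8_t *level,
                       uint16_t level_width, uint16_t tile_x, uint16_t tile_y)
{
    uint16_t mx = tile_x / METATILE_TILES;      /* which metatile column */
    uint16_t my = tile_y / METATILE_TILES;      /* which metatile row    */
    uint8_t id = level[(uint32_t)my * level_width + mx];
    uint8_t sub = (uint8_t)((tile_y % METATILE_TILES) * METATILE_TILES
                            + (tile_x % METATILE_TILES));
    return defs[id].tiles[sub];
}
```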

So first, I had to build the system for defining and accessing metatiles, and figure out tooling so I'd have a UI to work with them.  Luckily a guy on the nesdev forums made a cool editor that does just that.  I don't particularly like the level editor piece of that tool, but it's perfect for a metatile editor, and it can spit out the definition in JSON format, which made it really easy to write a Python script that gets run as part of my build process to transform the JSON format into the necessary source code data format.

Then, for actual level editing, I decided to stick with Tiled, which I used with Robo-Ninja and which is super flexible.  It can save levels in a nice textual format, so again, I wrote a Python script to be run as part of the build process to convert these levels into the format the game needs.

Next up was drawing a level to the screen, and then scrolling.  It's late and the baby finally fell asleep, so I'll talk about that later.  Until then, here's an ugly picture of my test level data.  Frankengraphics gave me some beautiful background tiles to work with, and I turned them into a horrible ugly mess for now. But at least it's starting to look something like a game....

As an addendum:  Python drives me crazy. I know the kids say it's great, but the significant whitespace, the duck-typing, the ugly-formatted documentation - it all drives me nuts.

Friday, June 30, 2017

I had a baby

The new baby showed up a few weeks sooner than we expected.  Everyone is healthy.  But I'm tired. That's all.

Tuesday, June 6, 2017

Blaster MetroidVania

Well, I've been working on the next game, but not saying much.  Mostly because the majority of my free time involves working on fixing things in my new house, or getting ready for the arrival of a new baby in a couple weeks.  So when I do manage to squeeze in some development, I don't have time to say much about it.

That said, I've found someone to collaborate with on the next game, so things are moving forward!  We're doing a metroidvania-style Blaster Master-inspired adventure.

First things first, while Frankengraphics is working on some preliminary art, I'm trying to get a scrolling engine working.  Which on the NES, is a beast.  The nametables (the chunk of video memory that tells the graphics chip which background tiles to draw where) work somewhat similarly to how they worked on the GBA, but with a few things that make the whole thing more complicated, the primary ones being a slow processor, and a slow and unwieldy method for accessing video memory.

The biggest hassle is doing the math to keep track of what address corresponds to what location on the screen, once the screen starts scrolling around.  I won't go into detail now (I'm too tired -- see this post from Spacey McRacey which talks about some of it), but it's math that's simple until you have to do it quickly on a primitive 6502.
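
For a taste of that math, here is a simplified C sketch covering only the horizontal case. It assumes two nametables side by side at $2000 and $2400 (each 32 tiles wide and 30 tall), so the scrollable area wraps every 64 tile columns. The function name and the two-nametable arrangement are assumptions for illustration; a multidirectional engine also has to handle vertical wrapping and the attribute tables.

```c
#include <stdint.h>

#define NT_WIDTH  32u   /* tiles per nametable row    */
#define NT_HEIGHT 30u   /* tile rows per nametable    */

/* Hypothetical helper: maps a world tile column and a tile row to the PPU
   address of that tile, assuming nametables at $2000 (left) and $2400
   (right). */
static uint16_t nametable_addr(uint16_t tile_x, uint8_t tile_y)
{
    uint16_t x = (uint16_t)(tile_x % (2u * NT_WIDTH));  /* wrap over both */
    uint16_t base = (x < NT_WIDTH) ? 0x2000u : 0x2400u;
    return (uint16_t)(base
                      + (uint16_t)(tile_y % NT_HEIGHT) * NT_WIDTH
                      + (x % NT_WIDTH));
}
```

On a modern CPU the divisions and modulo are free; on the 6502 each one has to be turned into shifts, masks, and compares, which is where the time goes.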

All that to say, getting a multidirectional scrolling engine working isn't quick work.

Tonight, I got fed up with trying to debug my function that renders a new column of tiles when scrolling left or right, and decided it was time to add some on-screen debugging.  Usually with any project, there's a point where I get fed up enough to write some decent on-screen debugging features, and that time is now.  The trick is, because I'm trying to debug background tile rendering, I need to make my debugging tools use sprites instead.  So while normally a text system on the NES is rendered using background tiles (because the NES can only render 8 sprites per scanline), I'm going to display it using sprites for now.  Which means no matter how badly I goof up the background layout, I should be (in theory) able to see debug information on the screen as sprites.
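
A sprite-based debug readout could be sketched roughly like this in C. It relies on the standard 4-byte sprite entry (Y, tile, attributes, X) in a 256-byte OAM shadow buffer. The function name and the HEX_TILE_BASE value are assumptions: they presume the sprite pattern table holds glyphs for 0-9 and A-F in sixteen consecutive tiles, which may not match any real tileset.

```c
#include <stdint.h>

/* Assumption: sprite tiles 0x10..0x1F hold the glyphs 0-9, A-F. */
#define HEX_TILE_BASE 0x10u

/* Hypothetical helper: writes `value` as two hex-digit sprites into the OAM
   shadow buffer `oam` (256 bytes, 64 sprites of 4 bytes each), starting at
   sprite number `first_sprite`.  Returns the next free sprite number. */
static uint8_t debug_hex_byte(uint8_t *oam, uint8_t first_sprite,
                              uint8_t x, uint8_t y, uint8_t value)
{
    uint8_t i;
    if (first_sprite > 62u) {
        return first_sprite;            /* no room left for two sprites */
    }
    for (i = 0; i < 2u; ++i) {
        uint8_t digit = (i == 0) ? (uint8_t)(value >> 4)
                                 : (uint8_t)(value & 0x0Fu);
        uint8_t *s = oam + (uint16_t)(first_sprite + i) * 4u;
        s[0] = y;                                 /* Y position          */
        s[1] = (uint8_t)(HEX_TILE_BASE + digit);  /* tile index          */
        s[2] = 0;                                 /* palette 0, no flip  */
        s[3] = (uint8_t)(x + 8u * i);             /* X position          */
    }
    return (uint8_t)(first_sprite + 2u);
}
```

With the 8-sprites-per-scanline limit, a row of debug text built this way tops out at four bytes' worth of digits.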

I haven't gotten very far with the debugging system yet.  Mainly because I don't have any sprite code at all yet, so I need to start from zero.  That was tonight's goal: get some basic functions set up for rendering general sprites.  Then next time I can add some debug functions that dump data to the screen.

Ok, bedtime.  I realize this post has been pretty rambly and incoherent.  I guess that means I'm tired.

Monday, May 1, 2017

Sixer problems

Well, once again the folks at AtariAge are smart.

The usual cause of this problem is accidentally reading from a write-only register.  On many systems, this will return the address you are trying to read from, but some will give garbage.  But it's an easy mistake to miss:

    lda #13 ; loads the value 13 into the accumulator
    lda 13  ; loads the value at memory location 13 into the accumulator
            ; (location 13 is actually a write-only register, so some systems will happen to load 13 into the accumulator, but others will load garbage)

There's actually a setting in Stella (the emulator) that lets you force it to return garbage when reading from these registers, so by setting that flag, I can reproduce the problem that happens on the sixers.  And knowing what sort of typo to look for, I can probably sort this out relatively quickly.

Sunday, April 30, 2017

Atari Anguna: bugs on a sixer

First, a brief bit of Atari history:  The original Atari 2600s had 6 switches on the front -- the power, B/W, Select, Reset, and one difficulty switch per player.  Later models moved the difficulty switches to the back.  Collectors now have goofy names for all the Atari versions.  The original "heavy sixers" were bigger and heavier than the "light sixers" but both are sixers (having the 6 switches).  You could also have a 4-switch "woody" (that still sported the awesome late 70's wood-grain), or the darker "vader".  Or poor saps like me have the cheap little re-released Junior.

Well, I had only tested Anguna on my Junior, and a 4-switch woody (and in the Stella emulator).  It worked fine, so I figured we were good (theoretically, all the models are supposed to work the same).

Well, I just found out that that's not the case.  The main character doesn't animate correctly on the sixers.  I'm not completely sure what to do about it -- how do I debug a problem that only shows up on hardware I don't have?

Current possibilities, from least to most painful:

  1. Somebody on the AtariAge forums has a guess about a possible difference in the hardware, and I can just go fix it (this would be a long shot)
  2. Get a sixer.  That means either buying one (maybe $50 if I'm lucky) or borrowing one.
  3. Make changes. Email them to someone with a sixer. Wait for report. Try again.

Saturday, April 22, 2017

Why retro games?

Recently my friend Bryan asked the question "Have you written before about what motivates you to create retro games?"

I somewhat hinted at it a few years ago, but there are a few reasons:

Nostalgia is one piece of it. I grew up playing Atari and NES, and so the idea of being able to make a quality game for the system I grew up with is really fun.  As a kid, I wondered what it was like to make games for these systems, and spent hours sketching game designs on paper.  It's fun to finally get to do what I dreamed about back then. (I've wanted to make a NES game ever since the system first came out. After 30 years of preparation, I finally feel like I can pull it off!)

Another factor is the challenge, and how different it is from my day-to-day work programming.  I like programming, but being so close to the hardware is a nice change.  The problems are somewhat similar to what I might solve at work, but really different in actual code.  And learning how new (old) video game systems work is not only a fun challenge, but also interesting from a history/nostalgia perspective (i.e. "oh, THAT'S how and why they did that on that game!").  Add in some crazy limitations (only 76 processor cycles per scanline!?) and you have a good fun technical challenge.

Really, the biggest reason, though, might be about the community involved.  For modern platforms (PCs, phone games, etc), the indie market is incredibly flooded right now. There are tons and tons of games being created all the time.  While there are a few crazy success stories like Flappy Bird, the vast majority of indie games don't really ever get noticed.  So you either pick a goal of just trying to be financially profitable (which requires more commitment than I'm willing to make right now, and a team of artists, musicians, etc), or just be happy making a goofy little game for the fun of it (which is what I did with Robo-Ninja).

In the case of Robo-Ninja, about 3000 people installed it.  Other than my friends, I got about one message about it, a couple of store ratings (some 1 star, some 3-5 stars), and that's about it.  I had fun making it, I enjoyed hearing from my friends that played it, but it just felt lonely, like I was riding a big crowded subway car alone in a big city.

In contrast, when I started making Anguna for Gameboy Advance 12 years ago, the GBA was just at the end of its commercial life, so it wasn't about retro gaming: it was just about the fun of making a homebrew game for a video game console.  What surprised me was hearing people talk about it. I'd get emails from people asking about it, reviews appeared on some websites, and there was a community of people to talk about it with.

Atari Anguna was also started just for the fun of it, but once I finished, people on the AtariAge forums expressed an interest in the game.  Realistically, it was a lot fewer than the 3000 that had downloaded Robo-Ninja. But it was a lot more fun and rewarding, having a small number of people be vocally interested, as opposed to a few thousand nameless installs.

The height of this was at last year's Portland Retro Games Expo when I got to chat with other Atari and NES developers, many of whom had made games I had played, and a number of whom had played and enjoyed Anguna.

So really, it's not just about the "retro" aspect of retro games, although that's part of it.  It's also about knowing that people play and enjoy my games, and getting to talk to them about it.  Right now that means retro gaming is the fun niche where I can make a cool game without a big team, have a ton of fun doing it, and hear about people enjoying it.  

Oh yeah, and retro games are hip right now.  It's fun to be doing something hip.

But who knows, I've always been a bit non-committal in my hobby projects. Maybe I'll suddenly decide I want to try something else in a couple years once I finish this game....
