All posts by Eevee

Weekly roundup: Vacay

Post Syndicated from Eevee original https://eev.ee/dev/2019/07/16/weekly-roundup-vacay/

I’m burnt out. I just can’t get into anything. And I’ve been dealing with a huge stack of accumulated errands from last month. And it’s fucking hot in here and that just pisses me off all the time???

So I’m trying to step back and chill and draw and hang out with folks and whatever. Sorry. I don’t know why I’m apologizing.

  • fox flux: Added some sparkles to a key.

  • mario maker: Made Star Anise’s Dream Land (5TQJG0MNG), a happy-go-lucky level inspired by my cat, and Koopa Valley (463-9CJPVG), an attempt at some standard friendly SMW-like fare. Also made half of like six other levels, but I’m having trouble even finishing those.

  • art: I’ve been drawing, just, a bunch of porn. It’s nice to be getting back into that. Drawing, I mean, not porn. But porn too.

See you next week.

Weekly roundup: Let’s try that again

Post Syndicated from Eevee original https://eev.ee/dev/2019/07/02/weekly-roundup-lets-try-that-again/

Hello, hello! It’s been a while. June ended up being an avalanche of errands and personal problems that neatly segued into each other, over and over. Good times! I think everything’s settled down now, but who knows.

Anyway, that gives us three weeks to catch up on:

  • fox flux: Finished and committed a bunch of half-implemented ideas in an attempt to get git clean for once (still more to go though); took a crack at porting sound effects from MilkyTracker and sfxr to Sunvox, which was much harder than expected; experimented with a nighttime palette; drew some new vastly improved swimming sprites from scratch.

    Did some work on the camera, which has always been pretty lazy. (I’ve improved it a lot since that recording, so don’t judge it too harshly.) Started on a redone menu, which should be a great improvement over the demo’s menu which was just “resume” and “quit”. Redrew the base dialogue portraits, and they look fantastic, but apparently I never tweeted about that, but you can see it in the next link!

    After spending all this time on miscellaneous mechanics and other bits and pieces, I decided it was finally time to get a basic gameplay loop going — enter a level, get some stuff, leave the level. The results are extremely rough, but I’ve made a start! It’s turning into a game! Which is weird because it was already a game once!

  • secret game engine thing: Not a lot, but I’ve cleared some design roadblocks that were seriously getting in the way.

  • art: Some doodles. Also I drew some beautiful gift art for my and Ash’s Metapodth anniversary.

  • alice’s day off: Wrote some stuff! It’s a miracle.

Currently attempting to get my ass back in gear, with moderate success.

Weekly roundup: Ironically stable

Post Syndicated from Eevee original https://eev.ee/dev/2019/06/12/weekly-roundup-ironically-stable/

I remain on a fox flux kick. Keep trying to do other stuff as well and then not doing that? Hm.

  • fox flux: Documented the hell out of all my rewritten collision code, removed some old hacks, put some methods on a new type that was an ad-hoc table before, and fixed a final remaining edge case in a satisfying way. Did kinda start writing about all this but didn’t finish it yet.

    Then I fixed all the stuff I’d broken about pushing in the process, and cleaned it up somewhat.

    Water is gradually improving but still kinda rough.

    Also added some experimental candy? Candy is pretty good.

    I did some more overhauling of the palette; I’m really really liking how it’s coming out.

    And also a preposterous amount of brainstorming. Like I’ve got half a dozen sheets of paper with tiny 8pt notes crammed on them. This ought to be a fun game.

Welp, back to that, then.

Weekly roundup: Exactly at the top

Post Syndicated from Eevee original https://eev.ee/dev/2019/06/05/weekly-roundup-exactly-at-the-top/

Hello! I’ve been a little preoccupied with meatspace things again, but here is some digital stuff.

  • fox flux: I have been a busy little beaver. I consolidated 1D and 2D motion, made ground adherence more conservative about how far it tries to drop you, and totally overhauled climbing to not incredibly suck. But who cares about any of that.

    What I really did is spend like a solid week overhauling collision detection. Finally, after years of wanting it, I have overlap resolution and nearly zero-cost contact detection! Which means that if objects overlap by some horrible twist of fate, instead of freely clipping through each other, they’re now free to move apart but not closer together. It’s god damn magic. Also I now know exactly where you’re touching objects which will probably come in handy for like, critters that walk back and forth on a platform without walking off it? Or something? I forget exactly why I wanted that but hey it’s nice.

    As an added bonus, I can finally fix climbing off the top of ladders — instead of hopping off the top and then landing, you stop at exactly the top, which is incredibly satisfying.

    I will almost certainly be wringing a blog post out of all this.

  • art: I worked more on that animation and then kinda forgot about it. Hm. Also some doodling or whatever?

    I drew a little… comic? Series of panels? I drew a thing about a ground adherence bug I ran into, and also a general explanation of ground adherence. It’s on Twitter, though it seems worth preserving elsewhere, once I figure out where that is.

  • gleam: I finally made some kind of real start on an editor for the little Flora VNs I put together. It doesn’t do a lot yet, but it has some UI, which is backwards from how I usually make these things, so that’s promising.

  • stream: Ash streamed some Spyro while I commentated, and then I streamed some Hat in Time while they commentated, and that was all great.

I am juggling too many things but I extremely want to get them all moving so I guess I’ll get back to it!

Weekly roundup: Pushing it

Post Syndicated from Eevee original https://eev.ee/dev/2019/05/15/weekly-roundup-pushing-it/

I remember saying something about balancing my time better, and that did not happen.

  • fox flux: I basically spent the whole week working on push physics. It was tough going at first, but I finally got it working correctly which feels like a goddamn Christmas miracle.

    I probably did some sprite work in there somewhere too, to let my brain cool down a bit.

    I’m excited about this game, ah! There’s a ton of work to go but I’m actually starting to see some mechanics come together.

  • stream: Ash and I played a ridiculous adventure game for a bit. Hm, maybe we should finish that. It’ll be on YouTube, uh, eventually.

Not so much this week; I ended up nocturnal and that threw me entirely for a loop. Back to waking with the sun now and feeling pretty good, so, fingers crossed.

Weekly roundup: In flux 2

Post Syndicated from Eevee original https://eev.ee/dev/2019/05/05/weekly-roundup-in-flux-2/

  • fox flux: I’m not sure what happened but I mostly did fox flux this week! It’s kind of a huge mess at the moment — I have a thousand lines of uncommitted changes from a dozen different half-finished experimental ideas, which makes starting on a new idea a bit daunting. So I spent some time finishing up and committing about half of that stuff, and then… um… started a few new half-finished experimental ideas. I am good at software development.

    I got a bit lost in the weeds trying to make the physics of pushing blocks work a bit better, which I’d still like to do, but I think it might require completely rethinking how pushing works (mainly in order to avoid a two-pixel gap in some situations, sigh, but that kinda thing’s important to me) and also redoing how friction and whatnot works. I can’t wait.

    Also been finishing up some visual effects I started ages ago but didn’t quite figure out, filling in some missing pixel art (which I think I got a little better + faster at), and fleshing out mechanics + trying out some new ones. It turns out, if you think your game needs more mechanics, a good place to start is to implement the existing ones so you can run around and play with them freely and see what new stuff comes to mind. Who knew?

  • art: I painted a picture. Not porn, for once! I’m definitely gonna do this more often; it was quicker and easier than I expected, and came out better too.

I missed working on fox flux and am glad to be doing it again, but I’ve clearly gotta balance my time across other stuff a bit better, too.

Weekly roundup: Bit of this, bit of that

Post Syndicated from Eevee original https://eev.ee/dev/2019/04/29/weekly-roundup-bit-of-this-bit-of-that/

I don’t have a cool theme or pun this week!

  • irl: I did a whole bunch of errands, aggressively slashed my tab/email count, and went hiking. Very exciting for you, I know.

  • secret thing: I taught it to animate tiles and movement, and tried this out with a conveyor belt, which instantly threw a wrench in my whole plan. Hm, well, I’ll figure it out. I wrote about the concept for $4 patrons (who will also be getting a bunch of beta builds when this is usable), if you’re interested.

  • cherry kisses: I have like four logic bugs reported by several different people, and they all feel related, but they’re also completely impossible. Like there is no way any of these could’ve happened. Except they did. And I have no goddamn idea how. I’ve spent like a day and a half wrestling with this and have barely made progress so far, but I would really like to make the game not randomly crash for folks.

    At least it autosaves, I guess.

  • art: I drew more things and I increasingly like them! I don’t know what happened, but I hit a point where I’m aggressively attacking all kinds of small details that I don’t do quite right — details that, formerly, I’d just glaze over because it was hard enough getting the general pose right. So that’s good.

    Still working on categorizing old SFW art, too. There’s just a whole lot of it.

  • fox flux: I picked this back up, but didn’t actually make tangible progress until Sunday, but I’m listing it anyway to pad this list out a bit.

  • streaming: Ash played through Doom II totally blind, while I provided commentary, which I guess doesn’t make it totally blind. Anyway we have a whole playlist of this nonsense now.

I’m juggling half a dozen things and am generally excited about all of them! It’s a nice way to feel.

Weekly roundup: Back to normal

Post Syndicated from Eevee original https://eev.ee/dev/2019/04/22/weekly-roundup-back-to-normal/

As I said before, I was occupied for a bit, but now I should finally be able to get back to doing these weekly! I did manage to get a few things done over the past three weeks:

  • flora: Finished up and published a Luneko species sheet! Happy April Fool.

    That’s Anise. Anise is the April fool, and also he’s happy.

  • blog: I wrote about how the particle wipe generator works, in lurid detail! I think it’s an interesting little read, even if you have no use for the tool itself.

    I also spent a lot of time backfilling old art on my (clean) art gallery. It’s not updated quite yet; there’s a lot to go through, shockingly so, and I haven’t even made it through year one yet. Honestly, I’m kind of embarrassed by how much my output declined over time.

  • art: Speaking of, I’m back to drawing regularly, instead of just saying I wish I were drawing regularly! I think I’ve actually been drawing pretty regularly for like two weeks now. Most of it is porn. I should probably draw some not-porn, too. It’s just, you know, porn is a lot of fun to draw.

  • secret thing: I laid some groundwork for the little game engine I’m writing and haven’t really talked about yet. More on that, including maybe even a name, once I feel like I have some kinda proof of concept.

  • sudoku thing: I taught it about extra regions so now it can be used to play hyper sudoku? I don’t know why I’m even making this. It’s kind of unusable until I add undo/redo and puzzle generation, and both of those are effort. I guess I’ll see if my spite is strong enough to power me through both.

  • streaming: Ash and I played video games on the internet while high and you can watch it if you really want to for some reason.

Hey, that’s not too bad a haul, considering I didn’t even have time to work for most of the month! Got some good stuff going on, glad to see I’m up to speed again at last.

Particle wipe generator

Post Syndicated from Eevee original https://eev.ee/release/2019/04/20/particle-wipe-generator/

Animation of solid orange transitioning to green via a swirl of little fox face shapes

🔗 Particle wipe generator on itch or hosted locally
🔗 Source code

This is a tool for making particle wipes, a type of transition whose name I made up because I don’t think they have a well-known name! They can be used in Ren’Py, RPG Maker, or anything that lets you write a shader.

Most of my games have done screen transitions with simple fades, and I wanted to try something different here, but I couldn’t find a tool to make the effect I wanted. So I wrote my own. If you’re interested, here’s how it works:

The idea

I was inspired by two things. One is Cave Story’s transitions.

The end of Cave Story's intro cutscene, which transitions to gameplay with an animated pattern of diamonds

That looks rad, right? I think it does, anyway. I wanted to do something similar myself.

At a glance, this effect looks pretty simple. The screen is sliced into a grid. A diamond shape starts expanding from the center of each cell until the cell is filled. By staggering when each cell starts, you can make an animation that seems to wipe from the bottom upwards, or from the edges inwards, or who knows what else.

Here’s a frame from the above capture, showing the grid. You can see from the blocks near the middle that it’s the same as the tile grid.

Notice that Cave Story transitions either from a scene to a solid color, or vice versa. Offhand, I don’t think the game ever transitions directly between two scenes.

My guess is that it’s manually drawing solid color on top of the tilemap until the entire screen is obscured, switching maps in the background, then reversing the progress. The various sizes of diamond might even be physical sprites on a foreground layer!

That poses a slight problem for me, because I want to be able to transition directly between scenes as well. Enter inspiration number two: Ren’Py.

Ren’Py is a visual novel engine, and it supports a ton of screen transitions. That makes sense, since visual novels generally don’t have much animated art, so most of the animation happens in transitions and sprite effects.

One such transition is a generic one called ImageDissolve, which can do a mask transition (another term I made up). It takes a grayscale mask, which tells it the order to reveal pixels. Where the mask is black, the corresponding pixels of the “after” scene are shown almost immediately; where the mask is white, those “after” pixels are the last to appear.

(I suddenly realize that Ren’Py does that backwards, with white pixels being first, but that doesn’t make sense to me since black pixels are zero.)

That’s a bit of a mouthful to describe with text, so here’s a basic example. A linear gradient from black to white will play out as a straight wipe in the same direction.

This approach can capture any kind of transition where pixels are revealed in a given order, and if I implement it with a shader (which is very easy), I can emulate the Cave Story style without being limited to a solid color! Neat!
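As a rough illustration of the mask idea, here is a CPU-side sketch in Python. It is not the shader and not Ren’Py’s implementation; it follows the black-first convention used in this post, and the name `mask_transition` and the nested-list image format are made up for the example.

```python
def mask_transition(before, after, mask, t):
    """Pick, per pixel, between two same-sized frames using a grayscale mask.

    before, after: 2D lists of pixel values (they are only copied around).
    mask: 2D list of floats in [0, 1]; lower values are revealed earlier.
    t: playback progress in [0, 1].
    """
    return [
        [a if t >= m else b for b, a, m in zip(brow, arow, mrow)]
        for brow, arow, mrow in zip(before, after, mask)
    ]


# A left-to-right linear gradient plays out as a straight wipe.
width = 5
gradient = [[x / (width - 1) for x in range(width)]]
before = [["B"] * width]
after = [["A"] * width]
halfway = mask_transition(before, after, gradient, 0.5)
```

Halfway through, the left part of the row shows the "after" frame and the right part still shows the "before" frame, which is the straight wipe described above.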

The problem

The problem is… how do I generate the mask image? I searched around a bit and found folks who’d made transitions for use with Ren’Py, but no explanation of how they did it.

Let me think about emulating Cave Story’s effect using the mask approach, one step at a time.

Forget about the wipe effect for now and concentrate on a single cell. When the diamond is just starting to appear, it should be black. When it completely fills the cell — i.e., when it’s big enough that its edges just barely touch the cell corners — it should be white. In the middle somewhere, it should be medium gray. Imagining (or drawing) a few cases suggests a simple diamond gradient, which seems correct.
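A diamond gradient for one cell could be sketched like this in Python. The assumptions are mine: a square cell, distances measured from the middle of each pixel, and a made-up function name `diamond_stamp`.

```python
def diamond_stamp(size):
    """Grayscale values in [0, 1] for one square cell of a diamond wipe."""
    center = size / 2
    # A diamond is a "circle" under Manhattan distance. It covers the cell
    # once it reaches a corner, which is size/2 + size/2 away from the center.
    full = size
    cell = []
    for y in range(size):
        row = []
        for x in range(size):
            # Measure from the middle of each pixel (an assumption here)
            dist = abs(x + 0.5 - center) + abs(y + 0.5 - center)
            row.append(dist / full)
        cell.append(row)
    return cell


cell = diamond_stamp(4)
```

Because the values are measured at pixel centers, the corner pixels come out slightly below 1; the exact corner of the cell would be exactly 1.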

Now for the wipe effect. All it really does is stagger when the animation starts. The upwards wipe, for example, starts animating all the cells on the bottom row, waits some short amount of time, then starts animating all the cells on the next row up, and so on. An ASCII diagram of this process (for a simplified, smaller screen) might look like:

   00 |                     XXXXXXXX
   01 |                  XXXXXXXX
R  02 |               XXXXXXXX
o  03 |            XXXXXXXX
w  04 |         XXXXXXXX
   05 |      XXXXXXXX
   06 |   XXXXXXXX
   07 |XXXXXXXX
      +-----------------------------
                  Time

My example cell above spans the full range from black to white, but if I want to stagger the cells like this, I need to squash that into a smaller range. What range, though? To get that scaling right, I need to know the total time the entire animation takes.

That’s kind of a weird question, because nothing I’m working with actually measures time! I only have numbers from 0 to 1; the amount of time is really a matter of how fast you play back the animation.

So let me approach this the other way around. The total “time” is 1, the full range of values I’m working with. My example has 8 rows. Each row starts playing after the row beneath it is ⅜ of the way through its animation; call this fraction the “delay”. There are 7 such delays, one fewer than the number of rows, because the first row doesn’t have a delay.

If the length of a single cell’s animation is \(t\) (which is actually a fraction of the length of the whole animation), then the last row starts after a total delay of \(t \times (8 - 1) \times \frac38\). Its own length is \(t\), and that should bring us to the end of the animation, so:

$$
\begin{align*}
1 &= t \times (8 - 1) \times \frac38 + t \\
&= t \times (7 \times \frac38 + 1) \\
&= t \times \frac{29}{8} \\
\Rightarrow t &= \frac{8}{29} \approx 0.276
\end{align*}
$$

And indeed, if you count characters in the diagram, each bar is 8 long out of a total width of 29. Neat! All I have to do is make the bottom cells range from 0 to 0.276, the next row up range from 0.103 to 0.379 (the same size range, but moved up by \(\frac{8}{29} \times \frac38 = \frac{3}{29}\)), and so on.
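The same arithmetic can be checked with a small Python sketch. The helper names `cell_length` and `cell_range` are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def cell_length(steps, delay):
    """Length of one cell's animation, as a fraction of the whole animation."""
    return 1 / ((steps - 1) * delay + 1)


def cell_range(step, steps, delay):
    """(start, end) of the mask values used by cells at the given step."""
    t = cell_length(steps, delay)
    start = step * delay * t
    return start, start + t


t = cell_length(8, Fraction(3, 8))                 # 8/29
second_row = cell_range(1, 8, Fraction(3, 8))      # (3/29, 11/29)
last_row = cell_range(7, 8, Fraction(3, 8))        # ends exactly at 1
```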

Easy. Blog post done.

Except…

I wanted to use hearts. And hearts create two new issues.

The first is that I don’t know how big a heart would have to grow to cover the entire cell. For diamonds, that was easy: they’re symmetrical in the same way as squares, so it’s obvious that they just need to be big enough to touch the corners. But how big does a heart have to be to fit a square entirely inside it? Do I gauge it by hand in an image editor, or what?

The real problem there is that a heart is, presumably, a bitmap rather than a simple shape with properties I can examine mathematically. And even if it were a shape, the math would get pretty ugly pretty quickly.

But the second problem is worse. Hearts aren’t vertically symmetrical, which means a neighboring heart might poke into a cell and start covering pixels that the native heart hasn’t covered yet.

A 3×3 grid of hearts expanding out of their cells, showing that the top of a heart can grow into the cell above.

This complicates things considerably. If I took the naïve approach of gluing together a bunch of independent cells, then the top of each heart would reach the top of its cell and flatten out into a hard border! Sounds ugly, especially since the grid isn’t really supposed to be visible in the animation. (Technically this could happen with diamonds too, if the delay were high enough, but their symmetry makes it much harder to notice.)

Now, I could fudge my way through both of these problems with sufficient abuse of an imaging library. Draw a very tiny black heart, then draw a slightly bigger almost-black heart, and keep expanding until every pixel has a color, then either scale the colors or go back and do it again knowing the correct range.

But that’s not a very satisfying solution, and it’s not very precise — which is important when I only have 256 values to work with. I can do better!

Doing better

The approach I used for diamonds above is fairly promising. It’s most of the way to a blueprint for figuring out exactly what shade each pixel of the mask should be, independently of any other pixel. The position within a cell tells me how far along in the cell’s animation the pixel is (center black, corners white, everything else somewhere in the middle), and the delay tells me how to scale that to fit correctly in the full animation.

Those seem like reasonable steps. All I have to do is fix them to work with an arbitrary “particle” shape. Somehow.

Step 1: the stamp

Forgetting about the overall animation worked before, so I’ll do it again and concentrate on a single prototype cell. That cell will be repeated (with some adjustment) all over the final mask, so I call it a stamp.

I already know in advance that a single cell doesn’t need to scale from 0 to 1, since I’ll be adjusting it later anyway, so that frees me up to use any arbitrary quantity — as long as it’s scaled by some consistent factor I can eliminate later. A little thinking suggests that what I really want to know is: given a pixel \((x, y)\) within a cell, how big does the heart particle have to grow to hit that pixel? I can express that as a fraction of the particle’s original size (since it should grow proportionally), and then worry about scaling it down later.

The first thing I want to do is change my coordinate system. Consider: for a 10×10 cell, the center is at the point (5, 5), which neighbors pixels (4, 4) and (5, 5). But that would mean the heart would touch the pixel at (5, 5) immediately, whereas it would need to cross a whole pixel to reach (4, 4), even though both pixels touch the center!

Close zoom of the problem described above, with the top-left corners of pixels in red, and their centers in blue

Pixel coordinates refer to the top left corners of the pixels, indicated in red above. The center is a point, not a pixel, and it’s clearly much closer to one pixel coordinate than the other. The fix is to use the centers of pixels, indicated in blue, which are the same distance from where the heart starts. Phew!

(If you don’t do this, you’ll get a very noticeable diagonal gash where the particle touched lower-right pixels earlier than upper-left ones. Guess how I found that out!)

While I’m at it, pixel coordinates are relative to the upper-left corner of the cell, but the most interesting point here is the center. So let’s make them relative to that, too. That means (4, 4) and (5, 5) should really be (-½, -½) and (½, ½), or more generally: given a center at \((c_x, c_y)\), the point I’m actually interested in is \((x + \frac{1}{2} - c_x, y + \frac{1}{2} - c_y)\). Call this, I dunno, \((d_x, d_y)\).

Back to the actual problem, which is: how big does the particle need to grow to hit this point?

Like I said before, I could try scaling the particle up bit by bit (maybe binary search?) until it touches the point, but that still feels goofy and imprecise.

You know, it sucks that the particle is a two-dimensional shape. It would be swell if I could eliminate a dimension here, or something.

And here I borrow a couple techniques from collision detection. Scaling the particle up is equivalent to scaling the entire cell down. If I scaled the cell down towards the origin, the point would trace a straight line.

Animation of a grid scaling down towards the origin, showing that a point traces a straight line

This is very helpful. It means I can solve this problem with a raycast: fire a straight ray into the particle, towards its center, and check each pixel it hits until I find an opaque one. That’ll give me a perfect answer!

But where does the ray start? I have a point in the grid, but not a point on the particle. So the first question is: if the particle scaled up just enough that the edge of the particle image touched the point, where on the particle would that contact be?

The same point as before, but with the particle grown to barely touch it

Call the particle dimensions \(p_w\) by \(p_h\). (My heart is contained within a square, but that isn’t strictly necessary.) In order to reach x-coordinate \(d_x\), the particle would have to be twice as wide as the distance from the y-axis to that point — because it’s centered! — which is \(\left|2 d_x\right|\) pixels wide. Its scale, relative to its original size, would thus be \(\frac{\left|2 d_x\right|}{p_w}\). The scale for touching the y-coordinate would be computed the same way. To actually touch the point, the particle has to reach whichever coordinate is further away, so its scale must be:

$$
s = \max\left(\frac{\left|2 d_x\right|}{p_w}, \frac{\left|2 d_y\right|}{p_h}\right)
$$

A special case crops up here: for a cell with an odd width and height, the center pixel is exactly aligned with the origin, and the scale computes to zero. I’m doing some division in a moment, so that’s very bad — but the center pixel is effectively touched immediately, so I can say the final answer for this pixel is 0 and skip the rest of this anyway.
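The coordinate shift and the formula for \(s\) translate fairly directly into code. This is a minimal Python sketch with hypothetical function names, assuming the cell center sits at half the cell's width and height.

```python
def center_offset(x, y, cell_w, cell_h):
    """Offset (dx, dy) from the cell center to the middle of pixel (x, y)."""
    return x + 0.5 - cell_w / 2, y + 0.5 - cell_h / 2


def bounding_scale(dx, dy, particle_w, particle_h):
    """Scale at which the edge of the particle image reaches (dx, dy)."""
    return max(abs(2 * dx) / particle_w, abs(2 * dy) / particle_h)


dx, dy = center_offset(5, 5, 10, 10)       # (0.5, 0.5)
s = bounding_scale(dx, dy, 10, 10)         # 0.1

# Odd-sized cell: the center pixel lines up with the origin, so s is 0
# and that pixel should be treated as touched immediately.
s_center = bounding_scale(*center_offset(2, 2, 5, 5), 10, 10)
```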

Now for the fun part! When the expanding particle hits the point of interest, it makes contact at some point on the original particle image. If the necessary scale is \(s\), the contact point is the center of the particle, offset by \(\left(\frac{d_x}{s}, \frac{d_y}{s}\right)\).

And now I raycast from that point to the center of the particle and check every pixel that ray crosses, using a modified Bresenham’s algorithm — originally intended for drawing pixel-perfect lines, but perfectly suited for casting a ray through a grid as well. (Conveniently, I’d already implemented this sort of raycast for collision detection for this very same game! Then I ended up not using it, hm.)

When I find an opaque-ish pixel (alpha of 0.5 or greater), I take the distance from the contact point to the center and divide it by that pixel’s distance from the center. That ratio, which is always at least 1, tells me how much bigger the particle has to grow for the opaque-ish pixel I found to actually touch the point.

Multiply that ratio by the \(s\) I found earlier, and the result is exactly what I was looking for: the scale of the particle when it touches the point!
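Here is a rough Python sketch of that whole step. To keep it short it swaps the Bresenham-style traversal for plain fixed-step sampling along the ray, so it is less exact than what is described above. The function name `touch_scale`, the `step` parameter, and the nested-list alpha format are all assumptions.

```python
import math


def touch_scale(alpha, dx, dy, step=1 / 16):
    """Approximate scale at which the particle first covers offset (dx, dy).

    alpha: 2D list of alpha values in [0, 1] for the particle image.
    dx, dy: offset from the cell center to the middle of a pixel.
    step: sampling distance along the ray, in particle pixels.
    """
    height, width = len(alpha), len(alpha[0])
    s = max(abs(2 * dx) / width, abs(2 * dy) / height)
    if s == 0:
        return 0.0  # the center pixel is touched immediately
    # Contact point on the unscaled particle, relative to its center
    ox, oy = dx / s, dy / s
    samples = max(1, math.ceil(math.hypot(ox, oy) / step))
    for i in range(samples):
        f = 1 - i / samples  # 1 at the contact point, shrinking towards the center
        # Clamp because the contact point sits exactly on the image edge
        px = min(width - 1, max(0, int(width / 2 + ox * f)))
        py = min(height - 1, max(0, int(height / 2 + oy * f)))
        if alpha[py][px] >= 0.5:
            # This spot is a fraction f of the way out to the edge, so the
            # particle must grow by 1/f beyond s for it to reach the point.
            return s / f
    return math.inf  # the ray never crossed an opaque-ish pixel


solid = [[1.0] * 4 for _ in range(4)]
dot = [[1.0 if 1 <= x <= 2 and 1 <= y <= 2 else 0.0 for x in range(4)]
       for y in range(4)]
solid_scale = touch_scale(solid, 2, 0)
dot_scale = touch_scale(dot, 2, 0)
```

For the fully opaque image the answer is exactly the bounding scale. For the small 2×2 dot the true answer is 2, and the sampled result lands slightly above that; a smaller `step` tightens it.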

Now, raycasting for every pixel in the stamp — a thousand times even for a dinky cell size of 32×32 — is not exactly speedy. But it’s not unbearably slow, either. And this is something that’s generated once and played back a bunch of times, so why not spend a little CPU time upfront making it as high-quality as I can manage?

Anyway, that’s the hard part done! Now I can put the mask together.

Step 2: the mask

With the stamp generated, I also know how big the particle has to grow for the entire cell to be covered: it’s just the highest scale in the stamp. For a simple full-screen effect, all I’d have to do at this point is scale the stamp values into the range [0, 1] and copy them to every cell in the grid.

But that’s boring; I wanted a wipe, which requires a couple more twiddles.

The wipe is essentially a second animation that controls when each cell’s individual animation starts. Above I considered a row-by-row wipe; for Cherry Kisses I ended up with a column-by-column wipe; Cave Story also has an “inwards” wipe. All of these can be generalized as numbered steps in a grid:

By row      By column   Inwards
77777777    76543210    01233210
66666666    76543210    12344321
55555555    76543210    23455432
44444444    76543210    34566543
33333333    76543210    34566543
22222222    76543210    23455432
11111111    76543210    12344321
00000000    76543210    01233210
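The three grids above can be generated with a few lines of Python. The function names are hypothetical, and the "inwards" rule (distance to the nearest vertical edge plus distance to the nearest horizontal edge) is simply the pattern read off the diagram.

```python
def by_row(cols, rows):
    """Bottom row starts first (step 0); each row above starts one step later."""
    return [[rows - 1 - r] * cols for r in range(rows)]


def by_column(cols, rows):
    """Rightmost column starts first; each column to the left is one step later."""
    return [[cols - 1 - c for c in range(cols)] for _ in range(rows)]


def inwards(cols, rows):
    """Corners start first and the middle of the grid starts last."""
    return [
        [min(c, cols - 1 - c) + min(r, rows - 1 - r) for c in range(cols)]
        for r in range(rows)
    ]


def count_steps(grid):
    """Number of distinct steps n, assuming they are numbered from 0."""
    return max(max(row) for row in grid) + 1


grid = inwards(8, 8)
top_row = "".join(str(step) for step in grid[0])
```

Note that the 8×8 inwards grid only has 7 steps, so its \(n\) differs from the row and column wipes.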

The math is basically already done; I did it above. Given the number of steps \(n\) and the delay \(d\) (a fraction of the cell animation time), I can find the length of a cell animation \(t\) as follows:

$$
\begin{align*}
1 &= t \times (n - 1) \times d + t \\
&= t \times ((n - 1) \times d + 1) \\
\Rightarrow t &= \frac{1}{n d - d + 1}
\end{align*}
$$

The process for generating the whole mask is thus:

  1. Iterate over each pixel of the mask.
  2. Figure out what cell it’s in, and the step for that cell.
  3. Find the corresponding value in the stamp, scale it to the size of a cell animation, and add in the delay.
  4. Write that to the mask.

Once every pixel is done, the mask is complete!
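The four steps above can be sketched in Python as follows. This ignores overlap between cells, and it assumes the stamp is a nested list of scales for one cell and the step grid is numbered from 0; `build_mask` is a made-up name.

```python
def build_mask(stamp, steps, delay):
    """Assemble a mask of floats in [0, 1] from a one-cell stamp.

    stamp: 2D list of particle scales for a single cell.
    steps: 2D list with the step number of each cell in the grid.
    delay: start delay between steps, as a fraction of one cell animation.
    """
    cell_h, cell_w = len(stamp), len(stamp[0])
    full = max(max(row) for row in stamp)      # scale that covers the whole cell
    n = max(max(row) for row in steps) + 1     # number of steps
    t = 1 / ((n - 1) * delay + 1)              # length of one cell animation
    mask = []
    for y in range(len(steps) * cell_h):
        row = []
        for x in range(len(steps[0]) * cell_w):
            step = steps[y // cell_h][x // cell_w]
            progress = stamp[y % cell_h][x % cell_w] / full
            # Shift by this cell's start delay, then squash into the full range
            row.append((step * delay + progress) * t)
        mask.append(row)
    return mask


# Two cells stacked vertically, bottom one first, with a tiny 2x2 stamp
stamp = [[0.0, 1.0], [1.0, 2.0]]
mask = build_mask(stamp, [[1], [0]], 0.5)
```

The bottom cell starts at 0 and the top cell ends at 1, so the mask uses the whole range. Converting the floats to 8-bit gray values is left out here.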

Except… this didn’t handle the overlap issue. No problem, though; that’s surprisingly simple to fix.

First, expand the stamp to the size of a 3×3 block of cells. The maximum scale for a stamp should only be taken from the central cell; the others are for the following process.

Then, when reading a pixel’s stamp in step 3 above, read it from the central cell — and also from the neighboring cells. In those neighbors, I read from the stamp cell on the opposite side, in order to know how long it would take for the heart to grow out of that cell and into this one.

(Diagonal neighbors aren’t shown here, but you get the idea.)

Since different cells may have different start times, I may need to add/subtract some extra delay from the neighbors’ values. Then I take the smallest of all these samples to figure out the earliest time that any heart — either in this cell, or one of its neighbors — hits the pixel.
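A hedged Python sketch of that neighbor lookup: it assumes the 3×3 stamp has already been divided by the maximum of its central cell, and the name `earliest_hit` and its argument layout are inventions for this example. The result is in units of one cell animation, so it still needs multiplying by \(t\) before it goes into the mask.

```python
def earliest_hit(stamp3, steps, delay, x, y, cell_w, cell_h):
    """Earliest time any particle (own cell or a neighbor's) reaches (x, y).

    stamp3: 2D list covering a 3x3 block of cells around one particle.
    steps: 2D list with the step number of each cell in the grid.
    delay: start delay between steps, as a fraction of one cell animation.
    """
    col, row = x // cell_w, y // cell_h
    best = None
    for oy in (-1, 0, 1):
        for ox in (-1, 0, 1):
            ncol, nrow = col + ox, row + oy
            if not (0 <= nrow < len(steps) and 0 <= ncol < len(steps[0])):
                continue  # no cell there, so no particle to worry about
            # Seen from a neighbor on the left (ox = -1), this pixel lies one
            # cell to its right, so read the opposite side of the stamp.
            sx = x % cell_w + (1 - ox) * cell_w
            sy = y % cell_h + (1 - oy) * cell_h
            value = steps[nrow][ncol] * delay + stamp3[sy][sx]
            if best is None or value < best:
                best = value
    return best


# 1x1 cells, two stacked vertically; the particle grows upwards easily
stamp3 = [
    [3.0, 1.05, 3.0],
    [3.0, 1.0, 3.0],
    [3.0, 3.0, 3.0],
]
steps = [[1], [0]]
top = earliest_hit(stamp3, steps, 0.1, 0, 0, 1, 1)
bottom = earliest_hit(stamp3, steps, 0.1, 0, 1, 1, 1)
```

In this toy setup the top pixel is reached first by the particle from the cell below it (1.05, versus 1.1 for its own particle), while the bottom pixel is reached first by its own particle.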

And hey, presto, we’re done! Here’s a (somewhat laggy) recording I took of the very first time I got this working for Cherry Kisses:

It ended up a little nicer-looking than that, of course. (Feel free to play the game to see it in action?) And if you’re curious, here’s the mask from the final game:

Caveats

There’s one teeny tiny problem still lingering in this approach. I assume that the whole animation ends when the last cell animation ends, but because of cell overlap, it might actually end earlier. And indeed, when I went back to check the Cherry Kisses mask, I found that it ends early — the brightest color in it is #e2e2e2. Oops. So much for that ludicrous accuracy!

That’s still fixable by taking overlap into account when finding the maximum value in the stamp, but I haven’t done it yet, and it’s a bit more complicated if the grid pattern has adjacent cells that are more than 1 step apart. (Those cases can also lead to particles mashing against the cell edge too early, which could be fixed by using a 5×5 or larger stamp…)

That’s it

Yep. The particle generator is just this logic with some knobs bolted on. It has a couple extra features, like a “halo” that highlights the transition point, and using all three color channels for extra precision, but it’s all built on the same basic idea.

There’s a lot of room for experimentation and variety here, and I’ve probably only scratched the surface. This is only a tiny subset of what can be done with a transition mask, too — it needn’t rely on a grid at all! See what you can come up with.

Oh, and here’s the exact shader I used in Cherry Kisses. It’s for LÖVE, so it has a couple non-standard #defines and globals, but you get the idea. The “ramp” is just a tolerance that adds a soft edge around the transition.

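extern Image mask;   // grayscale transition mask; only the red channel is read
extern float t;      // current progress, on the same scale as the mask values
extern float ramp;   // width of the soft edge around the transition point

vec4 effect(vec4 color, Image texture, vec2 tex_coords, vec2 screen_coords) {
    vec4 pixel = Texel(texture, tex_coords) * color;
    // Look up where this screen pixel falls in the reveal order
    float discriminator = Texel(mask, screen_coords / love_ScreenSize.xy).r;
    // Transparent until t is within ramp/2 of the mask value, opaque ramp/2 past it
    float alpha = clamp((t - discriminator) / ramp + 0.5, 0.0, 1.0);
    pixel.a *= alpha;
    return pixel;
}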

Happy transitioning!

Weekly roundup: Cherry Kisses

Post Syndicated from Eevee original https://eev.ee/dev/2019/04/01/weekly-roundup-cherry-kisses/

Hello hello! I finally finished that game I was working on, so now I might actually be back to some sort of normal dev schedule.

Except I’m gonna be kinda occupied for the next week or so. So.

  • cherry kisses: Hey hey I finished making Cherry Kisses, which is super duper NSFW! It’s probably the most well-designed and polished thing I/we have released, though. Whoops! I gotta stop accidentally making sex games.

    There’s some little niceties in there. Maybe I should, like, write about it sometime.

    Also attempted to get it to work on Android, which… is… non-trivial, despite LÖVE being “able” to target Android.

  • particle wipe generator: Cherry Kisses includes a cool heart transition between scenes, which took a surprising amount of effort to create, so I packaged up the code and put some dials and knobs on it. Now you can use the particle wipe generator to make your own particley transitions! Also hosted locally.

    Maybe I should, like, write about how this works sometime. Can’t wait for someone to tell me how I could’ve done it a thousand times more easily.

  • irl: We did some spring cleaning! Very exciting for anyone who doesn’t live here, I know. Our bedroom is no longer half-full of half-unpacked boxes, which is pretty nice.

  • sudoku: I occasionally waste time with a nice sudoku app, which has a free ad-supported version and a paid version. It has a sequel now, which is even better, but which only comes free with ads. I am incensed by this so I started writing my own JS player out of spite. Unclear whether my spite will last long enough to produce something usable.

Pretty happy to be back to makin’ things! I love that I could spin off some throwaway helper code into a little gamedev tool, and I’d definitely like to do more of that sort of thing in the future.

Weekly roundup: Strawberry Jam 3

Post Syndicated from Eevee original https://eev.ee/dev/2019/03/07/weekly-roundup-strawberry-jam-3/

Another double feature. Surprise, basically all I’ve been doing is working on a jam game. Should be back to normal once I get this out the door.

  • art: I doodled, like, once, but then was lost in gamedev all day every day.

  • cerise: I finally, like, planned out the progression of the game, and sat down to write it. Prose is harder for me than you’d think, and there’ve been a number of interruptions over the past month, so it’s taking a little while. I’m nearly done and am pretty happy with what I have so far, though.

    Actually, reading my notes here, I’m amazed how much of the game was done in only the last two weeks of February! I didn’t even have the passage of time working yet, and that’s the most fundamental part of the game.

    I did also cobble together a thing for generating particle wipes, which I think is pretty cool; I’ll probably write about it and release a web version in the near future.

Should be done in a day or two, but I’ve been saying that for a week, so who knows.

Weekly roundup: Off to a good start

Post Syndicated from Eevee original https://eev.ee/dev/2019/02/18/weekly-roundup-off-to-a-good-start/

I already missed a week! But I was mostly working on a jam game so that’s not too surprising.

  • art: Trying to keep up a semi-regular drawing schedule, with mixed success. I’m still working on my character lineup painting, though I’ve hit a few awkward spots that are proving difficult to fix. Sketched some stuff a few times. Drew Lexy with a sword.

  • music: I made part of a song, which was originally going to be for the Strawberry Jam game, and then I forgot about it, whoops. And now I will very much not have time to do music. It was coming out pretty okay, though, which is encouraging!

  • alice’s day off: Hey, remember this? Our wildly explicit VN for last year’s Strawberry Jam, which we intended to finish a few months later, and then, didn’t? Still working on it! I wrote out rough drafts for a couple more routes. This’ll probably become my main priority after this month, so I can actually get another thing done and off my plate.

  • cerise: Spent a lot of time on engine work (some of it not even necessary) and probably not enough on the game itself, and now I only have ten days left oops. Parallax layers are now actors, Tiled support is cleaned up a lot, sprites support four angles (or really, an arbitrary number of angles), physics were updated for top-down mode, and I finally implemented raycasting for realsies. Cleaned up dialogue code a lot, again, and put together basic dialogue UI.

    Fixing raycasting was a fun little problem, and free top-down movement offered an interesting little vector puzzle. Maybe I’ll write about those sometime.

    I also spent a little time porting some of my other LÖVE games to use my updated engine code, which also means they should run on LÖVE 11. I’m not finished yet, but once the month is over, I’d like to get updated releases out. It’ll only really matter for Linux users, since the Windows and Mac downloads include their own copy of the LÖVE runtime, but I’m a Linux user, so.

I better, uh, go get to work on this game.

Weekly roundup: Spectacular return

Post Syndicated from Eevee original https://eev.ee/dev/2019/02/05/weekly-roundup-spectacular-return/

Hey! Miss these? Great! I’m doing them again and no one can stop me.

  • art: I spent half the week rendering. Something, something, joke about rendering and EEVEE. No but really, I found out I’m kind of okay at this and set out to paint a whole lineup of all my Floraverse characters, which turned out to be really hard and time-consuming, but anyway here’s Lexy and slightly weirder Lexy and Cerise. Only, like, seven more to go.

    Hm. If only I’d constructed some sort of art website to put this kind of work on. If… only.

  • fox flux: This game keeps plodding along. I added a little blowing-a-kiss mechanic a while ago, and I finally gave it a real animation, which I then spruced up a bit more after recording that GIF. Also been cleaning up a big mess of half-finished features I left for myself, including particle effects for— well, that would spoil it!

  • strawberry jam: I’m running Strawberry Jam 3, the low-pressure month-long horny game jam! I haven’t gotten very far on my game yet, but most of the work is going to be upfront planning (I hope), so that’s not too worrisome. I just started writing code today, and hopefully will have some kinda rough skeleton done by the 25% point on Friday.

    This is gonna be most of my month! What an exciting topic to come back to.

More coming down the pipe; I’m accelerating all the time.

Eevee gained 2977 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2019/01/14/eevee-gained-2977-experience-points/

Eevee grew to level 32!

This has been a surreal and difficult year, but everything turned out much better in the end.

I can’t possibly do the whole story justice, and I’m not eager to rehash it anyway, so here’s the incredibly short version. The players are myself, my partner Ash (formerly Mel, aka glip), and their (at the time) husband Marl.

Helpful context: for years, Ash has been the target of a stalking slash gossip campaign. A group of folks on a forum infamous for this sort of thing likes to dig through our online footprints for dirt and compile lengthy lists of awful things we’ve allegedly done. Every time this happened, we dropped everything and investigated. It’s exhausting. Virtually everything we’ve been accused of has been some combination of long since resolved, wildly embellished, carefully trimmed to remove any explanatory context, completely misunderstood, distorted through rounds of telephone, or occasionally outright fabricated — and what’s not, we gladly apologize for and try to repair. But there are so many fractalline complaints that no casual observer could possibly double-check the evidence (it sometimes takes weeks for us to comb through it all), and we can’t respond effectively without producing a massive tome that no one will bother to read.

This is where we’re starting from.

In early April, someone posted logs from 2012 of Marl having horny chats with someone who was 15/16 and suggesting a variety of other shady behavior. The teenager in question was someone Marl had briefly hired to help him assemble con merch; Ash and I had barely interacted with her at all and didn’t even know they’d spoken outside of that. Nevertheless, “warnings” about all three of us began to circulate rapidly, Ash’s friends started getting doxxed, and folks bailed on us in droves — all while Ash and I were still trying to grasp what was even going on.

Marl offered a general apology, told us the logs were bogus, then became upset and withdrew. He didn’t keep logs of his own, so we had little else to go on and had to trust him. I found some oddities in the logs: enough to make me skeptical of them and more trusting of Marl, but nothing concrete.

Ash was completely exhausted with this, which was by no means the first accusation leveled at them over events they hadn’t even known about. They couldn’t take any more, were on the verge of a breakdown, and decided to abandon the internet altogether. That left me as the obvious conduit for anyone trying to get at Ash, and I am very bad at not grouching about something annoying, so this presented a very tangible risk. Ash is more important to me than being online, so I left as well.

For various reasons, not least of which is that the forum had our address and was still whipping a rather lot of people into a bloodthirsty frenzy, we no longer felt safe in our home. We left that too.

We stayed with Marl’s parents for a while, which gave Ash time to think. They started to feel the full weight of a lot of things, big and small, that Marl had done over the course of their ten-year marriage: lots of breaches of trust; stretching Ash’s patience as far as it would go and then promising to improve for just long enough; leaving us to deal with accusations levelled against him with zero information more than once.

He also eventually admitted that the logs were not entirely bogus, although he never clarified more specifically, so I have no way to know what he actually did or not. At the very least, he did slide into the DMs of a high schooler (who was also his employee, no less).

We subsequently evicted him from our lives, leaving him with his parents when we moved to a new place.


I’m told the teenager dropped off the forum (which she’d been posting on anonymously), and no one but Marl knows her identity, so she’s effectively vanished. We haven’t had contact with Marl in months. That just leaves us.

I’ve explained a lot of this in gratuitous detail on Twitter, and it’s been relatively quiet for a while now, but the initial confused mess can’t be undone. Gossip cannot be un-spread. To this day, we still get folks trying to warn people away from us, based primarily on what Marl did behind our backs.

Oh, well. Can’t please everyone, right? Does that actually apply here?

It drives me nuts to be misrepresented, but on the other hand, maybe it’s okay that people who take gossip at face value are self-selecting themselves out of my internet experience.

Anyway, that’s why my output was a bit low last year: I was chased from my home and thought I would be leaving the internet forever! Then I had to spend a few months getting settled. Plus I’ve been on and off ADHD meds since May, which has kind of thrown me for a loop, but I finally got that all sorted out just a few days ago. Now I can finally get back to, um, whatever it is that I do.


In lighter news!

We live in Colorado Springs now! It’s beautiful and lovely and actually has weather, which is a nice change after five years in Vegas.

I changed my name! It was in part to stay out of public records so we wouldn’t be doxxed again, but then they doxxed the name change, so, that didn’t work. Oh, well, I’m still happy I did it. I’m Evelyn Woods now. That’s right: I legally changed my name to Eevee.

Ash and I are engaged! Also I love them a lot. Marl injected a lot of invisible, ambiguous tension into the household; without that smothering us, we are flourishing. We went through hell together and made it out the other side. I’m… well, I’m really happy.

We got a new cat: Cheeseball, a Lykoi! He loves to make friends and also fight them, and his antics helped a lot over the summer. He’s very good.

So good, in fact, that over the summer I started working on Cheezball Rising, a game about Cheeseball for the Game Boy Color! It is hard and I am not very far along. Also I’ve been in outer space and haven’t worked on it much in several months. But I’ve been blogging the whole thing which is at least moderately interesting.

I also wrote a stub of a game for the GBA in Rust over the past week for a game jam, though it hasn’t gotten especially far either.

And, some other games? Probably? I think Alice’s Day Off was this most recent February, right? God, that feels like it was a decade ago. So much for finishing it by June.

I kinda-sorta kept up with art over the summer, but art requires a certain kind of mood for me, and I… wasn’t in it. Which is a shame, because I was starting to feel like I was getting somewhere.

I slopped together little Pelican-based art galleries for my SFW and NSFW art, which I’d been meaning to do for a while!

I don’t know. I stopped tracking what I was doing every day quite so closely, since I wasn’t doing much every day for a while there. Maybe I’ll start the weekly roundup posts back up? Did anyone read those?


What about 2019, then?

I feel unleashed and am absolutely certain this will be a fantastic year. Mostly I have to catch up on everything I didn’t do last year. Well, that’s fine. Let’s see, what do I even have in the air right now:

  • Cheezball Rising, the GBC game
  • fox flux advance, the GBA game, maybe
  • fox flux, the continuation of the desktop game
  • Alice’s Day Off, which was only released as a demo
  • idchoppers, the Rust Doom tool
  • art, writing, music
  • idk half a dozen other things, god

So, the usual: make stuff.

Cheezball Rising: Collision detection, part 1

Post Syndicated from Eevee original https://eev.ee/blog/2018/11/28/cheezball-rising-collision-detection-part-1/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I bash my head against a rock. Sorry, I mean I bash Star Anise against a rock. It’s about collision detection.

Previously: I draw some text to the screen.
Next: more collision detection, and fixed-point arithmetic.

Recap

Last time I avoided doing collision detection by writing a little dialogue system instead. It was cute, and definitely something that needed doing, but something much more crucial still looms.

Animation of the text box sliding up and scrolling out the text

I’ve put it off as long as I can. If I want to get anywhere with actual gameplay, I’m going to need some collision detection.

Background and upfront decisions

Collision detection is hard. It’s a lot of math that happens a few pixels at a time. Small mistakes can have dramatic consequences, yet be obscure enough that you don’t even notice them. Even using an off-the-shelf physics engine often requires dealing with a mountain of subtle quirks. And did I mention I have to do it on a Game Boy?

Someday I’ll write an article about everything I’ve picked up about collision detection, but I haven’t yet, so you get the quick version. The problem is that an object is moving around, and it should be unable to move into solid objects. There are two basic schools of thought about the solution.


Discrete collision observes that an object moves in steps — a little chunk of movement every frame — and simply teleports the object to its new location, then checks whether it now overlaps anything.

Illustration of an object attempting to move into a wall

(Note that all of these diagrams show very exaggerated motion. In most games, objects are slow and frames are short, so nothing moves more than a pixel or two at a time. That’s another reason collision detection is hard: the steps are so small that it can be difficult to see what’s actually going on.)

If it does overlap, you might try to push it out of whatever it’s overlapping, or you might cancel the movement entirely and simply not move the object that frame.

Both approaches have drawbacks. Pushing an object out of an obstacle isn’t too difficult a problem, but it’s possible that the object will be pushed out into another obstacle, and now you have a complicated problem. (At this point, though, you could just give up and fall back to cancelling the movement.)

But cancelling the movement means that an object might get “stuck” a pixel or two away from a wall and never be able to butt up against it. The faster the object is trying to move, the bigger the risk that this might happen.
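
The "stuck a pixel or two away" problem is easy to reproduce. This is a minimal one-dimensional sketch, assuming a single wall and an object that only moves right; `overlaps` and `step_discrete` are hypothetical helpers, not code from any real engine.

```python
def overlaps(a_left, a_right, b_left, b_right):
    """True if two 1-D spans share any area; touching edges do not count."""
    return a_left < b_right and b_left < a_right


def step_discrete(x, width, dx, wall_left, wall_right):
    """Teleport by dx, then cancel the whole move if the new spot overlaps the wall."""
    new_x = x + dx
    if overlaps(new_x, new_x + width, wall_left, wall_right):
        return x  # cancelled: the object does not move this frame
    return new_x


x = 0  # left edge of a 4-pixel-wide object; the wall is solid from x=10 to x=20
for _ in range(5):
    x = step_discrete(x, width=4, dx=4, wall_left=10, wall_right=20)
gap = 10 - (x + 4)  # space left between the object's right edge and the wall
```

With a speed of 4, the object stops at `x = 4` with a 2-pixel gap it can never close, because every further step would land inside the wall.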

That said, this is exactly how the original Doom engine handles collision, and it seems to work well enough there. On the other hand, Doom is first-person so you can’t easily tell if you’re butting right up against a wall; a pixel gap is far more obvious in a game like this. On the other other hand, Doom also has bugs where a fast monster can open a locked door from its other side, because the initial teleport briefly moves the monster far enough into the door that it’s touching the other (unlocked) side.

Sorry. I have very conflicting feelings about this thicket of drawbacks and possible workarounds.

Either way, discrete collision has one other big drawback: tunnelling. Since the movement is done by teleporting, a very fast object might teleport right past a thin barrier. Only the new position is checked for collisions, so the barrier is never noticed. (This is how you travel to parallel universes in Mario 64 — by building up enough speed that Mario teleports through walls without ever momentarily overlapping them.)

Illustration of an object passing through a wall or erroneously pushing into one
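
Tunnelling falls straight out of the "only check the destination" rule. A rough sketch, again in one dimension with made-up helper names:

```python
def overlaps(a_left, a_right, b_left, b_right):
    """True if two 1-D spans share any area; touching edges do not count."""
    return a_left < b_right and b_left < a_right


def teleport_hits_wall(x, width, dx, wall_left, wall_right):
    """Discrete check: only the destination is tested, never the path to it."""
    new_x = x + dx
    return overlaps(new_x, new_x + width, wall_left, wall_right)


# A 4-pixel-wide object at x=0 and a 2-pixel-thick wall spanning x=10..12.
slow_hit = teleport_hits_wall(0, 4, 8, 10, 12)    # lands on x=8..12, overlapping the wall
fast_hit = teleport_hits_wall(0, 4, 20, 10, 12)   # lands on x=20..24, past the wall
```

The slow object is caught, but the fast one reports no collision at all even though the wall sits squarely in its path.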

There are some other potential gotchas, though they’re rare enough that I’ve never seen anyone mention them. One that stands out to me is that you don’t know the order that an object collided with obstacles, which might make a difference if the obstacles have special behavior when collided with and the order of that behavior matters.


Continuous collision detection observes that game physics are trying to simulate continuous motion, like happens in the real world, and tries to apply that to movement as well. Instead of teleporting, objects slide until they hit something. Tunnelling is thus impossible, and there’s no need to handle collisions since they’re prevented in the first place.

Illustration of an object sliding towards a wall and stopping when it touches

This has some clear advantages, in that it eliminates all the pitfalls of discrete collision! It even functions as a superset — if you want some object to act discretely, you could simply teleport it and then attempt to “move” it along the zero vector.

That said, continuous collision introduces some of its own problems. The biggest (for my purposes, anyway) is that it’s definitely more complicated to implement. “Sliding” means figuring out which obstacle would be hit first. You can do raycasting in the direction of movement and see what the ray hits first, though that’s imprecise and opens you up to new kinds of edge cases. If you’re lucky, you’re using something like Unity and can cast the entire shape as a single unit. Otherwise, well, you have to do a bunch of math to find everything in the swept path, then sort them in the order they’d be hit.

The other big problem is that it’s more work at runtime. With discrete collision, you only need to check for collisions in the new location. That only costs more time when a lot of objects are bunched together in one place, which is unlikely. With continuous collision, everything along the swept path needs to be examined, and that means that the faster an object moves, the more expensive its movement becomes.

So, not quite a silver bullet for the tunnelling problem. But that’s not a surprise; the only way to prevent tunnelling is to check for objects between the start and end positions.
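
As a rough one-dimensional sketch of the sliding idea (a simplification, not code from any of the games): instead of testing only the destination, look at every wall ahead and let the nearest one cap the movement. `slide_right` and the wall tuples are invented for illustration, and the sketch ignores walls the object already overlaps.

```python
def slide_right(x, width, dx, walls):
    """Move right by up to dx, stopping flush against the first wall in the path."""
    right = x + width
    allowed = dx
    for wall_left, _wall_right in walls:
        # Only walls starting at or beyond the right edge can block rightward motion.
        if wall_left >= right:
            allowed = min(allowed, wall_left - right)
    return x + allowed


walls = [(30, 32), (10, 12)]           # deliberately listed out of order
fast = slide_right(0, 4, 20, walls)    # stops with its right edge touching x=10
slow = slide_right(0, 4, 3, walls)     # nothing in the way, moves the full distance
```

In one dimension, taking the minimum stands in for sorting obstacles by hit order. In two dimensions, working out which obstacle is hit first is where the extra math comes in.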


Which, then, do I want to implement here?

For platforms without floating point (including the PICO-8 and Game Boy), there’s a third, hybrid option. If everything’s expressed with integers (or fixed point), then the universe has a Planck length: a minimum distance that every other distance must be an integral multiple of. You can thus fake continuous collision by doing repeated steps of discrete collision, one Planck length at a time. Objects will be collided with in the correct order, and you can simply stop at the first overlap.

Of course, this eats up a lot of time, since it involves doing collision detection numerous times per object per frame. So unless your Planck length is really big, I’m not sure it’s worth it.
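
Here is what that hybrid might look like, as a sketch under the assumption that the Planck length is one pixel and there is a single wall. The `checks` counter is only there to show the cost; all names are hypothetical.

```python
def overlaps(a_left, a_right, b_left, b_right):
    """True if two 1-D spans share any area; touching edges do not count."""
    return a_left < b_right and b_left < a_right


def move_stepwise(x, width, dx, wall_left, wall_right):
    """Fake continuous motion: advance one unit at a time, stop before the first overlap."""
    step = 1 if dx > 0 else -1
    checks = 0
    for _ in range(abs(dx)):
        checks += 1
        nxt = x + step
        if overlaps(nxt, nxt + width, wall_left, wall_right):
            break  # the next unit would be inside the wall, so stay here
        x = nxt
    return x, checks


# Same setup that tunnelled before: 4-pixel object, wall at x=10..12, speed 20.
final_x, checks = move_stepwise(0, 4, 20, 10, 12)
```

The object now stops flush against the wall at `x = 6`, but it took seven overlap checks to get there instead of one.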

Instead, I’m going to try for continuous collision. It’s closer to “correct” (whatever that means), and it’s what I did for all of my other games so far. It’s definitely harder, thornier, more complicated, and slower, but dammit I like it. It should also save me from encountering surprise bugs later on, which means I can write collision code once and then pretty much forget about it. Ideal.

Getting started

Star Anise is the only entity at the moment, so as a first pass, I’m only going to implement collision with the world.

World collision is much easier! Everything is laid out in a fixed grid, so I already know where the cells are. Finding potential overlaps is fairly simple, and best of all, I don’t need to sort anything to know what order the cells are in.

Right away, I find I have another decision to make. I would normally want to use vector math here — the motion is some distance in some direction, and hey, that’s a vector. But vectors take up twice as much space (read: twice as many registers), and a lot of vector operations rely on division or square roots which are non-trivial on this hardware.

With a great reluctant sigh, I thus commit to one more approximation, one made on 8-bit hardware since time immemorial. I won’t actually move in the direction of motion; instead, I’ll move along the x-axis, then move along the y-axis separately. Diagonal movement could theoretically cut across some corners (or be unable to fit through very tight gaps), but those are very minor and unlikely inconveniences. More importantly, this handwaving can’t allow any impossible motion.
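
A toy version of that approximation, assuming a point-sized entity on a coarse grid with one solid cell (the `SOLID` set and function names are invented for this sketch):

```python
SOLID = {(2, 1)}  # hypothetical map: a single solid cell at column 2, row 1


def is_free(x, y):
    return (x, y) not in SOLID


def move_axis_separated(x, y, dx, dy):
    """Apply dx (until blocked), then dy (until blocked); never a true diagonal step."""
    step = 1 if dx > 0 else -1
    for _ in range(abs(dx)):
        if not is_free(x + step, y):
            break
        x += step
    step = 1 if dy > 0 else -1
    for _ in range(abs(dy)):
        if not is_free(x, y + step):
            break
        y += step
    return x, y


beside = move_axis_separated(1, 1, 1, 1)   # x is blocked by the cell, y still moves
above = move_axis_separated(1, 0, 1, 1)    # x moves, then y is blocked by the cell
```

Both results slide along the obstacle rather than stopping dead, and neither ends up inside the solid cell, which is the "can't allow any impossible motion" property.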

I’ve already taken for granted that entities will all be axis-aligned rectangles. I’m definitely not dealing with slopes on a goddamn Game Boy. That was hard enough to do from scratch on a modern computer.

But I’m getting ahead of myself. First things first: you may recall that Star Anise’s movement is a bit of a hack. Pressing a direction button only adds to or subtracts from the sprite coordinates in the OAM buffer; his position isn’t actually stored in RAM anywhere. In fact, thanks to my slightly nonlinear storytelling across these posts, his movement isn’t stored anywhere either! The input-reading code writes directly to the OAM buffer. Whoops. I intended to fix that later, and now it’s later, so here we go.

; Somewhere in RAM, before anise_facing etc
anise_x:
    db
anise_y:
    db

So far, so good. OAM is populated in two places (and I should fix that later, too): once during setup, and once in the main game loop. Both will need to be updated to use these values.

Setup needs to initialize them first, of course:

    ld a, 64
    ld [anise_x], a
    ld [anise_y], a
    ; ... initialize anise_facing, etc ...

And now the OAM setup can be fixed. But, surprise! I left myself another hardcoded knot to untangle: even the relative positions of the sprites are hardcoded. Okay, so, those need to be put somewhere too. Eventually I’m going to need some kinda entity structure, but since there’s only one entity, I’ll just slap it into a constant somewhere.

(I guess my programming philosophy is leaking out a bit here. Don’t worry about structure until you need it, and you don’t need it until you need it twice. Once code works for one thing, it’s relatively straightforward to make it work for n things, and you have fewer things to worry about while you’re just trying to make something work.)

; In ROM somewhere
ANISE_SPRITE_POSITIONS:
    db -2, -20
    db -8, -14
    db 0, -14

It’s not immediately obvious from looking at these numbers, but I’m taking Star Anise’s position to mean the point on the ground between his feet. That’s the best approximation of where he is, after all.

(Early in game development, it seems natural to treat position as the upper-left corner of the sprite, so you can simply draw the sprite at the entity’s position — but that tangles the world model up with the sprite you happen to have at the moment. Imagine the havoc it’d wreak if you changed the size of the sprite later!)

Okay, now I can finally—

What? How does the code know there are exactly 3 sprites, on this byte-level platform? Because I’m hardcoding it. Shut up already I’ll fix it later

    ; Load the x and y coordinates into the b and c registers
    ld hl, anise_x
    ld b, [hl]
    inc hl
    ld c, [hl]
    ; Leave hl pointing at the sprite positions, which are
    ; ordered so that hl+ will step through them correctly
    ld hl, ANISE_SPRITE_POSITIONS

    ; ANTENNA
    ; x-coord
    ; The x coordinate needs to be added to the sprite offset,
    ; AND the built-in OAM offset (8, 16).  Reading the sprite
    ; offset first allows me to use hl+.
    ld a, [hl+]
    add a, b
    add a, 8
    ; Previously, hl pointed into the OAM buffer and advanced
    ; throughout this code, but now I'm using hl for something
    ; else, so I use direct addresses of positions within the
    ; buffer.  Obviously this is a kludge and won't work once
    ; I stop hardcoding sprites' positions in OAM, but, you
    ; know, I'll fix it later.
    ld [oam_buffer + 1], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 0], a
    ; This stuff is still hardcoded.
    ; chr index
    xor a
    ld [oam_buffer + 2], a
    ; attributes
    ld [oam_buffer + 3], a

    ; The rest of this is not surprising.

    ; LEFT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 5], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 4], a
    ; chr index
    ld a, 2
    ld [oam_buffer + 6], a
    ; attributes
    ld a, %00000001
    ld [oam_buffer + 7], a

    ; RIGHT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 9], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 8], a
    ; chr index
    ld a, 4
    ld [oam_buffer + 10], a
    ; attributes
    ld a, %00000001
    ld [oam_buffer + 11], a
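
The arithmetic in that block is easier to check away from the registers. This Python sketch redoes it for the starting position of (64, 64); the offsets come from `ANISE_SPRITE_POSITIONS` above, while `oam_coords` is just a name picked for the sketch.

```python
OAM_OFFSET_X = 8    # the hardware draws a sprite at (oam_x - 8, oam_y - 16)
OAM_OFFSET_Y = 16
ANISE_SPRITE_POSITIONS = [(-2, -20), (-8, -14), (0, -14)]  # (x, y) per sprite


def oam_coords(x, y):
    """Return (oam_y, oam_x) for each sprite, in OAM's y-then-x byte order."""
    out = []
    for off_x, off_y in ANISE_SPRITE_POSITIONS:
        oam_x = (x + off_x + OAM_OFFSET_X) & 0xFF  # 8-bit add wraps, like `add a, b`
        oam_y = (y + off_y + OAM_OFFSET_Y) & 0xFF
        out.append((oam_y, oam_x))
    return out


coords = oam_coords(64, 64)
```

For (64, 64) this gives `(60, 70)` for the antenna and `(66, 64)` and `(66, 72)` for the two body halves, which are the values the assembly writes to bytes 0/1, 4/5, and 8/9 of the buffer.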

Boot up the game, and… it looks the same! That’s going to be a running theme for a little bit here. Sorry, this isn’t a particularly screenshot-heavy post. It’s all gonna be math and code for a while.

Now I need to split apart the code that reads input and applies movement to OAM. Reading input gets much simpler, since it doesn’t have to do anything any more, just compute a dx and dy.

This code does still have looming questions, such as how to handle pressing two opposite directions (which is impossible on hardware but easy on an emulator), or whether diagonal movement should be fixed so that Anise doesn’t move at \(\sqrt{2}\) times his normal movement speed.

Later. Seriously the actual code has so many XXX and TODO and FIXME comments that I edit out of these posts.

    ; Anise update loop
    ; Stick dx and dy in the b and c registers.
    ld a, [buttons]
    ; b/c: dx/dy
    ld b, 0
    ld c, 0
    bit PADB_LEFT, a
    jr z, .skip_left
    dec b
.skip_left:
    bit PADB_RIGHT, a
    jr z, .skip_right
    inc b
.skip_right:
    bit PADB_UP, a
    jr z, .skip_up
    dec c
.skip_up:
    bit PADB_DOWN, a
    jr z, .skip_down
    inc c
.skip_down:

    ; For now just add b and c to Anise's coordinates.  This
    ; is where collision detection will go in a moment!
    ld a, [anise_x]
    add a, b
    ld [anise_x], a
    ld a, [anise_y]
    add a, c
    ld [anise_y], a
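
The same input logic in Python, as a sketch for poking at the two open questions mentioned above. The bit numbers follow the commonly used `hardware.inc` definitions, which is an assumption here; the layout of the `buttons` byte in the actual game may differ.

```python
import math

# Bit numbers as in the commonly used hardware.inc; assumed, not confirmed by the post.
PADB_RIGHT, PADB_LEFT, PADB_UP, PADB_DOWN = 4, 5, 6, 7


def read_direction(buttons):
    """Turn a byte of held-button bits into (dx, dy), one unit per pressed direction."""
    dx = dy = 0
    if buttons & (1 << PADB_LEFT):
        dx -= 1
    if buttons & (1 << PADB_RIGHT):
        dx += 1
    if buttons & (1 << PADB_UP):
        dy -= 1
    if buttons & (1 << PADB_DOWN):
        dy += 1
    return dx, dy


both = read_direction((1 << PADB_LEFT) | (1 << PADB_RIGHT))      # opposite directions cancel
diagonal = read_direction((1 << PADB_RIGHT) | (1 << PADB_DOWN))
diagonal_speed = math.hypot(*diagonal)                           # about 1.414 pixels per frame
```

Because each direction is tested independently, left plus right comes out as no movement, and a diagonal covers roughly 1.414 pixels per frame instead of 1.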

All that’s left is to more explicitly update the OAM buffer!

This code ends up looking fairly similar to the setup code. So similar, in fact, that I wonder if these blocks should be merged, but I’ll do that later:

    ; Load x and y into b and c
    ld hl, anise_x
    ld b, [hl]
    inc hl
    ld c, [hl]
    ; Point hl at the sprite positions
    ld hl, ANISE_SPRITE_POSITIONS

    ; ANTENNA
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 1], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 0], a
    ; LEFT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 5], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 4], a
    ; RIGHT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 9], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 8], a

Phew! And the game plays exactly the same as before. Programming is so rewarding.

On to the main course!

Collision detection, sort of

So. First pass. Star Anise can only collide with the map.

Ah, but first, what size is Star Anise himself? I’ve only given him a position, not a hitbox. I could use his sprite as the hitbox, but with his helmet being much bigger than his body, that’ll make it seem like he can’t get closer than a foot to anything else. I’d prefer if he had an explicit radius.

; in ROM somewhere
ANISE_RADIUS:
    db 3

Remember, Star Anise’s position is the point between his feet. This describes his hitbox as a square, centered at that point, with sides 6 pixels long. The top and bottom edges of his hitbox are thus at y - r and y + r, which makes for some pleasing symmetry.

(Making hitboxes square doesn’t save a lot of effort or anything, but switching to rectangles later on wouldn’t be especially difficult either.)
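
Spelled out as a quick sketch (the `hitbox` helper is hypothetical; only `ANISE_RADIUS` comes from the code above):

```python
ANISE_RADIUS = 3


def hitbox(x, y, r=ANISE_RADIUS):
    """Return (left, top, right, bottom) of the square centered on the position."""
    return (x - r, y - r, x + r, y + r)


left, top, right, bottom = hitbox(64, 64)
side = right - left
```

At the starting position of (64, 64), the box runs from 61 to 67 on both axes, so each side is 6 pixels long.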

The plan

My plan for moving rightwards, which I came up with after a lot of very careful and very messy sketching, looks like this:

  1. Figure out which rows I’m spanning.

  2. Move right until the next grid line. No new obstacle can possibly be encountered until then, so there’s nothing to check.

    (Unless I’m somehow already overlapping an obstacle, of course, but then I’d rather be able to move out of the obstacle than stay stuck and possibly softlock the game.)

  3. In the next grid column, check every cell that’s in a spanned row. If any of those cells block us, stop here. Otherwise, move to the next grid line (8 pixels).

  4. Repeat until I run out of movement.

    (It’s very unlikely the previous step would happen more than once; an entity would have to move more than 8 pixels per frame, which is 3 entire screen widths per second.)

Here’s a diagram. In this case, step 3 checks two cells for each column, but it might check more or fewer depending on how the entity is positioned. (It’ll never need to check more than one cell more than the entity’s height in cells, rounded up.)

Illustration of the above algorithm

Seems straightforward enough. But wait!

Edge case

I’ll save you a bunch of debugging anguish on my part and skip to the punchline: there’s an edge case.

I mean, literally, the case of when the entity’s edge is already against a grid line. That’ll happen fairly frequently — every time an entity collides with the map, it’ll naturally stop with its edge aligned to the grid.

The problem is all the way back in step 1. Remember, I said that to figure out which grid row or column a point belongs to, I need to divide by 8 (or shift right by 3). So the rows an entity spans must count from its top edge divided by 8, to its bottom edge divided by 8. Right?

Well…

Diagram showing division by 8 for several possible positions; when the bottom of the entity touches a grid line, it appears to be jutting into the row below

Everything’s fine until the entity’s bottom edge is exactly flush with the grid line, as in the last example. Then it seems to be jutting into the row below, even though no part of it is actually inside that row. If the entity tried to move rightwards from here, it might get blocked on something in row 1! Even worse, if row 1 were a solid wall that it had just run into, it wouldn’t be able to move left or right at all!

What happened here? There’s a hint in how I laid out the diagram.

There’s something akin to the fencepost problem here. I’ve been talking about rows and columns of the grid as if they were regions — “row 1” labels a rectangular strip of the world. But pixel coordinates don’t describe regions! They describe points. A pixel is a square area, but a pixel coordinate is the point at the upper left corner of that area.

In the incorrect example, the bottom of the entity is at y = 8, even though the row of pixels described by y = 8 doesn’t contain any part of the hitbox. I’m using the coordinate of the pixel’s top edge to describe a box’s bottom edge, and it falls apart when I try to reinterpret that coordinate as a region. In terms of area, y = 8 really names the first row of pixels that the entity doesn’t overlap.

To work around this, I need to adjust how I convert a coordinate to the corresponding grid cell, but only when that coordinate describes the right or bottom of a bounding box. Bottom pixel 8 should belong to row 0, but 9 should still end up in row 1.

As luck would have it, I’m using integers for coordinates, which means there’s a Planck length — a minimum distance of which all other distances are a multiple. That length is, of course, 1 pixel. If I subtract that length from a bottom coordinate, I get the next nearest coordinate going upwards. If the original coordinate was on a grid line, it’ll retreat back into the cell above; otherwise, it’ll stay in the same cell. You can check this with the diagram, if you need some convincing.

(This works for any fixed point system; integers are the special case of fixed point with zero fractional bits. It would not work so easily with floating point — subtracting the smallest possible float value will usually do nothing, because there’s not enough precision to express the difference. But then, if you have floating point, you probably have division and can write vector-based collision instead of taking grid-based shortcuts.)
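
To make that floating-point caveat concrete, here is a quick check in Python, used purely as a calculator (the game itself is all assembly). The edge value of 8 is a made-up example of a bottom edge sitting exactly on a grid line, and `exclusive_cell_int` is a hypothetical helper name.

```python
import math

PLANCK = 1  # integer coordinates: the smallest possible step is one pixel

def exclusive_cell_int(edge):
    # Integer version: step back one pixel, then floor-divide by 8
    return (edge - PLANCK) >> 3

# With integers, an edge at 8 retreats into cell 0, while 9 stays in cell 1
int_cells = [exclusive_cell_int(8), exclusive_cell_int(9)]

# With floats, the smallest positive value (about 5e-324) is far too small
# to register next to 8.0, so the subtraction changes nothing
tiny = 5e-324
float_unchanged = (8.0 - tiny) == 8.0
float_cell = math.floor((8.0 - tiny) / 8)
```

The integer version lands in cells 0 and 1 as intended, while the float version leaves 8.0 untouched and still reports cell 1.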

All that is to say, I just need to subtract 1 before shifting. For clarity, I’ll write these as macros that convert a coordinate in the a register to a grid cell. I call the top or left conversion inclusive, because it includes the pixel the coordinate refers to; conversely, the bottom and right conversion is exclusive, like how a bottom of 8 actually excludes the pixels at y = 8.

; Given a point on the top or left of a box, convert it to the
; containing grid cell.
ToInclusiveCell: MACRO
    ; This is just floor division
    srl a
    srl a
    srl a
ENDM
; Given a point on the bottom or right of a box, convert it to
; the containing grid cell.
ToExclusiveCell: MACRO
    ; Deal with the exclusive edge by subtracting the planck
    ; length, then flooring
    dec a
    srl a
    srl a
    srl a
ENDM
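
As a sanity check on the two macros, here is a small Python model of them. It is only a sketch for poking at the arithmetic: the function names mirror the macros, `spanned_rows` is a hypothetical helper, and values are masked to 8 bits to imitate the a register.

```python
def to_inclusive_cell(a):
    # Model of ToInclusiveCell: three logical right shifts on an 8-bit value
    return (a & 0xFF) >> 3

def to_exclusive_cell(a):
    # Model of ToExclusiveCell: dec a (wrapping at 8 bits), then shift
    return ((a - 1) & 0xFF) >> 3

def spanned_rows(y, r):
    # Rows covered by a hitbox whose edges are at y - r and y + r
    first = to_inclusive_cell(y - r)
    last = to_exclusive_cell(y + r)
    return list(range(first, last + 1))
```

With a radius of 3, `spanned_rows(5, 3)` covers only row 0 even though the bottom edge sits at y = 8, while `spanned_rows(6, 3)` covers rows 0 and 1. Note that in this model an exclusive edge of 0 wraps around to cell 31, which is the kind of underflow the 8-bit version has to watch out for.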

At last, I can write some damn code!

Some damn code

    ; Here, b and c contain dx and dy, the desired movement.

    ; First, figure out which columns we might collide with.
    ; The NEAREST is the first one to our right that we're not
    ; already overlapping, i.e. the one /after/ the one
    ; containing our right edge.  That's Exc(x + r) + 1.
    ; The FURTHEST is the column that /will/ contain our right
    ; edge.  That's Exc(x + r + dx).
    ld hl, ANISE_RADIUS
    ; Put the NEAREST column in d
    ld a, [anise_x]             ; a = x
    add a, [hl]                 ; a = x + r
    ld e, a                     ; e = x + r
    ToExclusiveCell
    inc a                       ; a = Exc(x + r) + 1
    ld d, a                     ; d = Exc(x + r) + 1
    ; Put the FURTHEST column in e
    ld a, e                     ; a = x + r
    add a, b                    ; a = x + r + dx
    ToExclusiveCell
    ld e, a                     ; e = Exc(x + r + dx)

    ; Loop over columns in [d, e].
    ; If d > e, this movement doesn't cross a grid line, so
    ; nothing can stop us and we can skip all this logic.
    ld a, e
    cp d
    jp c, .done_x
    ; We don't need dx for now, so stash bc for some work space
    push bc
.x_row_scan:
    ; For each column we might cross: check whether any of the
    ; rows we span will block us.
    ; Hm.  This code probably should've been outside the loop.
    ld a, [anise_y]
    ld hl, ANISE_RADIUS
    sub a, [hl]
    ToInclusiveCell
    ld b, a                     ; b = minimum y
    ld a, [anise_y]
    add a, [hl]
    ToExclusiveCell
    ld c, a                     ; c = maximum/current y

.x_column_scan:
    ; Put the cell's row and column in bc, and call a function
    ; to check its "map flags".  I'll define that in a moment,
    ; but for now I'll assume that if bit 0 is set, that means
    ; the cell is solid.
    ; This is also why the inner loop counts down with c, not
    ; up with b: get_cell_flags wants the y coord in c, and
    ; this way, it's already there!
    push bc
    ld b, d
    call get_cell_flags
    pop bc
    ; If this produces zero, we can skip ahead
    and a, $01
    jr z, .not_blocked

    ; We're blocked!  Stop here.  Set x so that we're butted
    ; against this cell, which means subtract our radius from
    ; its x coordinate.
    ; Note that this can't possibly move us further than dx,
    ; because dx was /supposed/ to move us INTO this cell.
    ld a, d
    ; This is a /left/ shift three times, for cell -> pixel
    sla a
    sla a
    sla a
    sub a, [hl]
    ld [anise_x], a
    ; Somewhat confusing pop, to restore dx and dy.
    pop bc
    jp .done_x

.not_blocked:
    ; Not blocked, so loop to the next cell in this column
    dec c
    ld a, c
    cp b
    jr nc, .x_column_scan

    ; Finished checking one column successfully, so continue on
    ; to the next one
    inc d
    ld a, e
    cp d
    jr nc, .x_row_scan

    ; Done, and we never hit anything!  Update our position to
    ; what was requested
    pop bc
    ld a, [anise_x]
    add a, b
    ld [anise_x], a
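
If the register juggling obscures the shape of the algorithm, the same logic can be sketched in Python. This is a simplified model and not a translation: it ignores 8-bit wraparound, assumes non-negative coordinates, and `is_solid(col, row)` is a hypothetical stand-in for the map lookup.

```python
def to_inclusive_cell(a):
    return a >> 3

def to_exclusive_cell(a):
    return (a - 1) >> 3

def move_right(x, y, r, dx, is_solid):
    """Return the new x after trying to move right by dx pixels."""
    # Columns we might enter: the one after our right edge, through the
    # one that would contain our right edge after the full movement
    nearest = to_exclusive_cell(x + r) + 1
    furthest = to_exclusive_cell(x + r + dx)
    # Rows we span, computed once up front
    top_row = to_inclusive_cell(y - r)
    bottom_row = to_exclusive_cell(y + r)
    for col in range(nearest, furthest + 1):
        # Count down through the rows, like the inner loop above
        for row in range(bottom_row, top_row - 1, -1):
            if is_solid(col, row):
                # Blocked: butt our right edge against this cell
                return col * 8 - r
    # Never hit anything, so take the whole movement
    return x + dx
```

For example, with a single solid cell at column 4, row 1, an entity at x = 20, y = 12 with radius 3 asking for dx = 10 stops at x = 29, leaving its right edge flush at 32. Asking to move further right from there leaves it at 29.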

I’ve also gotta implement get_cell_flags, which is slightly uglier than I anticipated.

; Fetches properties for the map cell at the given coordinates.
; In: bc = x/y coordinates
; Out: a = flags
get_cell_flags:
    push hl
    push de
    ; I have to figure out what char is at these coordinates,
    ; which means consulting the map, which means doing math.
    ; The map is currently 16 (big) tiles wide, or 32 chars,
    ; so the byte for the indicated char is at b + 32 * c.
    ld hl, TEST_MAP_1
    ; Add x coordinate.  hl is 16 bits, so extend b to 16 bits
    ; using the d and e registers separately, then add.
    ld d, 0
    ld e, b
    add hl, de
    ; Add y coordinate, with stride of 32, which we can do
    ; without multiplying by shifting left 5.  Alas, there are
    ; no 16-bit shifts, so I have to do this by hand.
    ; First get the 5 high bits by copying y into d, then
    ; shifting the 3 low bits off the right end.
    ld d, c
    srl d
    srl d
    srl d
    ; Then get the low 3 bits into the high 3 by swapping,
    ; shifting, and masking them off.
    ld a, c
    swap a
    sla a
    and a, $e0
    ld e, a
    ; Not sure that was really any faster than just shifting
    ; left through the carry flag 5 times.  Oh well.  Add.
    add hl, de

    ; At last, we know the char.  I don't have real flags at
    ; the moment, so I just hardcoded the four chars that make
    ; up the small rock tile.
    ld a, [hl]
    cp a, 2
    jr z, .blocking
    cp a, 3
    jr z, .blocking
    cp a, 12
    jr z, .blocking
    cp a, 13
    jr z, .blocking
    jr .not_blocking
    ; The rest should not be too surprising.
.blocking:
    ld a, 1
    jr .done
.not_blocking:
    xor a
.done:
    pop de
    pop hl
    ret
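
The swap-and-mask trick is easy to get wrong, so here is a Python model that checks it against plain multiplication. The function names are hypothetical, and the map is assumed to be a flat byte array with 32 chars per row, as described in the comments above.

```python
def swap_nibbles(v):
    # Model of `swap a`: exchange the high and low nibbles of a byte
    return ((v << 4) | (v >> 4)) & 0xFF

def row_offset(y):
    # High byte: the top 5 bits of y, i.e. y shifted right by 3
    d = y >> 3
    # Low byte: swap, shift left once, then keep only the top 3 bits
    e = ((swap_nibbles(y) << 1) & 0xFF) & 0xE0
    return (d << 8) | e

def cell_offset(x, y):
    # Offset of the char at column x, row y in a 32-wide map
    return x + row_offset(y)

# The four chars hardcoded as the small rock tile
SOLID_CHARS = {2, 3, 12, 13}

def cell_flags(tilemap, x, y):
    # Bit 0 set means the cell is solid
    return 1 if tilemap[cell_offset(x, y)] in SOLID_CHARS else 0
```

`row_offset(y)` agrees with `y * 32` for every 8-bit y, so the two-register dance really is a 16-bit shift left by 5.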

And that’s it!

That’s not it

The code I wrote only applies when moving right. It doesn’t handle moving left at all.

And here I run into a downside of continuous collision, at least in this particular case. Because of the special behavior of right/bottom edges, I can’t simply flip a sign to make this code work for leftwards movement as well. For example, the set of columns I might cross going rightwards is calculated exclusively, because my right edge is the one in front… but if I’m moving leftwards, it’s calculated inclusively. Those columns are also in reverse order and thus need iterating over backwards, so an inc somewhere becomes a dec, and so on.
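
To show what actually changes between the two directions, here is a Python sketch of just the column ranges. The leftward formula is my mirror-image reading of the rightward one, so treat it as an assumption and not as the code that ended up in the game; coordinates are assumed non-negative and 8-bit wraparound is ignored.

```python
def to_inclusive_cell(a):
    return a >> 3

def to_exclusive_cell(a):
    return (a - 1) >> 3

def columns_crossed_right(x, r, dx):
    # Leading edge is the right edge, converted exclusively; scan upward
    nearest = to_exclusive_cell(x + r) + 1
    furthest = to_exclusive_cell(x + r + dx)
    return list(range(nearest, furthest + 1))

def columns_crossed_left(x, r, dx):
    # dx is negative here.  Leading edge is the left edge, converted
    # inclusively; scan downward
    nearest = to_inclusive_cell(x - r) - 1
    furthest = to_inclusive_cell(x - r + dx)
    return list(range(nearest, furthest - 1, -1))
```

With x = 20 and r = 3, moving right by 10 crosses columns 3 and 4, while moving left by 10 crosses columns 1 and 0, in that order. The `+ 1` became a `- 1`, the conversion flipped, and the loop runs backwards.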

I have two uncomfortable options for handling this. One is to add all the required conditional tests and jumps, but that adds a decent CPU cost to code that’s fairly small and potentially very hot, and complicates code that’s a bit dense and delicate to begin with. The other option is to copy-paste the whole shebang and adjust it as needed to go leftwards.

Guess which I did!

    ld a, b
    cp a, $80
    jp nc, .negative_x
.positive_x:
    ; ... everything above ...
    jp .done_x
.negative_x:
    ; ... everything above, flipped ...
.done_x:
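
The comparison against $80 is a sign test on a two's complement byte. A tiny Python model, with hypothetical helper names, spells out what it is checking:

```python
def is_negative(byte):
    # Same test as `cp a, $80` then `jp nc`: no carry means byte >= $80,
    # which is exactly when the top (sign) bit is set
    return byte >= 0x80

def to_signed(byte):
    # Interpret an 8-bit value as a signed movement
    return byte - 0x100 if is_negative(byte) else byte
```

So a dx byte of $fe takes the negative branch and means "2 pixels left", while $03 takes the positive branch.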

Ugh. Don’t worry, though — it gets worse later on!

I could copy-paste for y movement too and give myself a total of four blocks of similar code, but I’ll hold off on that for now.

Ah.

You want the payoff, don’t you.

Well, I’m warning you now: the next post gets much hairier, and if I show you a GIF now, there won’t be any payoff next time.

You sure? Really?

No going back!

Star Anise walking around, but not through a rock!

I admit, this was pretty damn satisfying the first time it actually worked. Collision detection is a pain in the ass, but it’s the first step to making a game feel like a game. Games are about working within limitations, after all!

An aside: debugging

I’ve made this adventure seem much easier than it actually was by eliding all the mistakes. I made a lot of mistakes, and as I said upfront, it can be very difficult to notice heisenbugs or figure out exactly what’s causing them.

One thing that helped tremendously near the beginning was to hack Star Anise to have a fourth sprite: a solid black 6×6 square under his feet. That let me see where he was actually supposed to be able to stand. Highly recommend it. All I did was copy/paste everywhere that mentioned his sprites to add a fourth one, and position it centered under his feet.

(On any other system, I’d just draw collision rectangles everywhere, but the Game Boy is sprite-based so that’s not really gonna fly.)

I also had pretty good success with writing intermediate values to unused bytes in RAM, so I could inspect them in mGBA’s memory viewer even after the movement was finished. And of course, as an absolute last resort, bgb has an interactive graphical debugger. (Nothing against bgb per se; I just prefer not to rely on closed-source software running in Wine if I can at all get away with it.)

To be continued

Obviously, this isn’t anywhere near done. There’s no concept of collision with other entities, and before that’s even a possibility, I need a concept of other entities. I left myself a long trail of do-it-laters. There are even risks of overflow and underflow in a couple places, which I didn’t bother pointing out because I completely overhaul this code later.

But it’s a big step forward, and now I just need a few more big steps forward. (I say, four months later, long after all those steps are done.)

I already have some future ideas in mind, like: what if a map tile weren’t completely solid, but had its own radius? Could I implement corner cutting, where the game gently guides you if you get stuck on a corner by only a single pixel? What about having tiles that are 45° angles, just to cut down on the overt squareness of the map?

Well. Maybe, you know, later.

Anyway, that brings us up to commit da7478e. It’s all downhill from here.

Next time: more collision detection, and fixed-point arithmetic!

Cheezball Rising: Opening a dialogue

Post Syndicated from Eevee original https://eev.ee/blog/2018/10/09/cheezball-rising-opening-a-dialogue/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I draw some text!

Previously: I get a Game Boy to meow.
Next: collision detection, ohh nooo

Recap

The previous episode was a diversion (and left an open problem that I only solved after writing it), so the actual state of the game is unchanged.

Star Anise walking around a moon environment in-game, animated in all four directions

Where should I actually go from here? Collision detection is an obvious place, but that’s hard. Let’s start with something a little easier: displaying scrolling dialogue text. This is likely to be a dialogue-heavy game, so I might as well get started on that now.

Planning

On any other platform, I’d dive right into it: draw a box on the screen somewhere, fill it with text.

On the Game Boy, it’s not quite that simple. I can’t just write text to the screen; I can only place tiles and sprites.

Let’s look at how, say, Pokémon Yellow handles its menu.

Pokémon Yellow with several levels of menu open

This looks — feels — like it’s being drawn on top of the map, and that sub-menus open on top of other menus. But it’s all an illusion! There’s no “on top” here. This is a completely flat image made up of tiles, like anything else.

The same screenshot, scaled up, with a grid showing the edges of tiles

This is why Pokémon has such a conspicuously blocky font: all the glyphs are drawn to fit in a single 8×8 char, so “drawing” text is as simple as mapping letters to char indexes and drawing them onto the background. The map and the menu are all on the same layer, and the game simply redraws whatever was underneath when you close something. Part of the illusion is that the game is clever enough to hide any sprites that would overlap the menu — because sprites would draw on top! (The Game Boy Color has some twiddles for controlling this layering, but Yellow was originally designed for the monochrome Game Boy.)

A critical reason that this actually works is that in Pokémon, the camera is always aligned to the grid. It scrolls smoothly while you’re walking, but you can’t actually open the menu (or pick up an item, or talk to someone, or do anything else that might show text) until you’ve stopped moving. If you could, the menu would be misaligned, because it’s part of the same grid as the map!

This poses a slight problem for my game. Star Anise isn’t locked to the grid like the Pokémon protagonist is, and unlike Link’s Awakening, I do want to have areas larger than the screen that can scroll around freely.

I know offhand that there are a couple ways to do this. One is the window, an optional extra opaque layer that draws on top of the background, with its top-left corner anchored to any point on the screen. Another is to change some display registers in the middle of the screen redrawing. If you’re thinking of any games with a status bar at the bottom or right, chances are they use the window; games with a status bar at the top have to use display register tricks.

But I don’t want to worry about any of this right now, before I even have text drawing. I know it’s possible, so I’ll deal with it later. For now, drawing directly onto the background is good enough.

Font decisions

Let’s get back to the font itself. I’m not in love with the 8×8 aesthetic; what are my other options? I do like the text in Oracle of Ages, so let’s have a look at that:

Oracle of Ages, also scaled up with a grid, showing its taller text

Ah, this is the same approach again, except that letters are now allowed to peek up into the char above. So these are 8×16, but the letters all occupy a box that’s more like 6×9, offering much more familiar proportions. Oracle of Ages is designed for the Game Boy Color, which has twice as much char storage space, so it makes sense that they’d take advantage of it for text like this.

It’s not bad, but the space it affords is still fairly… limited. Only 16 letters will fit in a line, just as with Pokémon, and that means a lot of carefully wording things to be short and use mostly short words as well. That’s not gonna cut it for the amount of dialogue I expect to have.

(You may be wondering, as I did, how Oracle pulled off this grid-aligned textbox. In small buildings and the overworld, each room is exactly the size of the screen, so there’s no scrolling and no worry about misaligned text. But how does the game handle showing text inside a dungeon, where a room is bigger than the screen and can scroll freely? The answer is: it doesn’t! The textbox is just placed as close as possible to the position shown in this screenshot, so the edges might be misaligned by up to 4 pixels. In 20 years, I never noticed this until I thought to check how they were handling it. I’m sure there’s a lesson, here.)

What other options do I have? It seems like I’m limited to multiples of 8 here, surely. (The answer may be obvious to some of you, but shh, don’t read ahead.)

The answer lies in the very last game released for the Game Boy Color: Harry Potter and the Chamber of Secrets. Whatever deep secrets were learned during the Game Boy’s lifetime will surely be encapsulated within this, er, movie tie-in game.

Harry Potter and the Chamber of Secrets, also scaled up with a grid, showing its text isn't fixed to the grid

Hot damn. That is a ton of text in a relatively small amount of space! And it doesn’t fit the grid! How did they do that?

The answer is… exactly how you’d think!

Tile display for the above screenshot, showing that the text is simply written across consecutive tiles

With a fixed-width font like in Pokémon and Zelda games, the entire character set is stored in VRAM, and text is drawn by drawing a string of characters. With a variable-width font like in Harry Potter, a block of VRAM is reserved for text, and text is drawn into those chars, in software. Essentially, some chars are used like a canvas and have text rendered to them on the fly. The contents of the background layer might look like this in the two cases:

Illustration of fixed width versus variable width text
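
Here is the same contrast as a rough Python sketch. Everything in it is illustrative: the function names, the assumption that fixed-width glyphs are stored in order starting from "A", and the 6-pixel glyph widths in the note below are all made up for the example.

```python
def fixed_width_tilemap(text, first_glyph_tile=0):
    # One char per letter: the tilemap holds glyph indexes directly
    return [first_glyph_tile + ord(ch) - ord("A") for ch in text]

def variable_width_tilemap(columns, first_canvas_tile=0x80):
    # The tilemap is just consecutive canvas chars; the letters live in
    # the pixel data of those chars, not in the tilemap
    return [first_canvas_tile + i for i in range(columns)]

def tiles_needed_fixed(text):
    return len(text)

def tiles_needed_variable(text, widths):
    # Total pixel width, rounded up to whole 8-pixel chars
    total = sum(widths[ch] for ch in text)
    return -(-total // 8)
```

With 6-pixel-wide glyphs, seven letters need 7 chars in the fixed-width scheme but only 6 columns of canvas in the variable-width one, and the gap widens as lines get longer.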

Some pros of this approach:

  • Since the number of chars required is constant and the font is never loaded directly into char memory, the font can have arbitrarily many glyphs in it. Multiple fonts could be used at the same time, even. (Of course, if you have more than 256 glyphs, you’ll have to come up with a multi-byte encoding for actually storing the text…)

  • A lot more text can fit in one line while still remaining readable.

  • It has the potential to look very cool. I definitely want to squeeze every last drop of fancy-pants graphical stuff that I can from this hardware.

And, cons:

  • It’s definitely more complicated! But I only have to write the code once, and since the game won’t be doing anything but drawing dialogue while the box is up, I don’t think I’ll be in danger of blowing my CPU budget.

  • Colored text becomes a bit trickier. But still possible, so, we can worry about that later.

  • Fixed text that doesn’t scroll, like on menus and whatnot, will be something of a problem — this whole idea relies on amortizing the text rendering across multiple frames. On the other hand, this game shouldn’t have too much of that, and this sounds like a good excuse to hand-draw fixed text (which can then be much more visually interesting). At worst, I could just render the fixed text ahead of time.

Well, I’m sold. Let’s give it a shot.

First pass

Well, I want to do something on a button press, so, let’s do that.

A lot of games (older ones especially) have bugs from switching “modes” in the same frame that something else happens. I don’t entirely understand why that’s so common and should probably ask some speedrunners, but I should be fine if I do mode-switching first thing in the frame, and then start over a new frame when switching back to “world” mode. Right? Sure.

    ; ... button reading code in main loop ...
    bit BUTTON_A, a
    jp nz, .do_show_dialogue

    ; ... main loop ...

    ; Loop again when done
    jp vblank_loop

.do_show_dialogue:
    call show_dialogue
    jp vblank_loop

The extra level of indirection added by .do_show_dialogue is just so the dialogue code itself isn’t responsible for knowing where the main loop point is; it can just ret.

Now to actually do something. This is a first pass, so I want to do as little as possible. I’ll definitely need a palette for drawing the text — and here I’m cutting into my 8-palette budget again, which I don’t love, but I can figure that out later. (Maybe with some shenanigans involving changing the palettes mid-redraw, even.)

PALETTE_TEXT:
    ; Black background, white text...  then gray shadow, maybe?
    dcolor $000000
    dcolor $ffffff
    dcolor $999999
    dcolor $666666

show_dialogue:
    ; Have to disable the LCD to do video work.  Later I can do
    ; a less jarring transition
    DisableLCD

    ; Copy the palette into slot 7 for now
    ld a, %10111000
    ld [rBCPS], a
    ld hl, PALETTE_TEXT
    REPT 8
    ld a, [hl+]
    ld [rBCPD], a
    ENDR
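
That magic %10111000 is less mysterious when built from its parts. This Python sketch assumes the usual Game Boy Color palette register layout (bits 0 to 5 are a byte address into palette memory, bit 7 turns on auto-increment); `bcps_value` is a hypothetical helper, not something from the game.

```python
def bcps_value(palette, color=0, auto_increment=True):
    # Each palette is 8 bytes: 4 colors at 2 bytes apiece
    address = palette * 8 + color * 2
    # Bit 7 asks the hardware to advance the address after every write
    return (0x80 if auto_increment else 0) | address
```

Palette 7 starts at byte 56, so `bcps_value(7)` comes out to %10111000, and the eight writes in the REPT block fill that palette's four colors.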

I also know ahead of time what chars will need to go where on the screen, so I can fill them in now.

Note that I really ought to blank them all out, especially since they may still contain text from some previous dialogue, but I don’t do that yet.

An obvious question is: which tiles? I think I said before that with 512 chars available, and ¾ of those still being enough to cover the entire screen in unique chars, I’m okay with dedicating a quarter of my space to UI stuff, including text. To keep that stuff “out of the way”, I’ll put them at the “end” — bank 1, starting from $80.

I’m thinking of having characters be about the same proportions as in the Oracle games. Those games use 5 rows of tiles, like this:

top of line 1
bottom of line 1
top of line 2
bottom of line 2
blank

Since the font is aligned to the bottom and only peeks a little bit into the top char, the very top row is mostly blank, and that serves as a top margin. The bottom row is explicitly blank for a bottom margin that’s nearly the same size. The space at the top of line 2 then works as line spacing.

I’m not fixed to the grid, so I can control line spacing a little more explicitly. But I’ll get to that later and do something really simple for now, where $ff is a blank tile:

+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|80|82|84|86|88|8a|8c|8e|90|92|94|96|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|81|83|85|87|89|8b|8d|8f|91|93|95|97|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+

This gives me a canvas for drawing a single line of text. The staggering means that the first letter will draw to adjacent chars $80 and $81, rather than distant cousins like $80 and $a0.

You may notice that the below code updates chars across the entire width of the grid, not merely the screen. There’s not really any good reason for that.

    ; Fill text rows with tiles (blank border, custom tiles)
    ; The screen has 144/8 = 18 rows, so skip the first 14 rows
    ld hl, $9800 + 32 * 14
    ; Top row, all tile 255
    ld a, 255
    ld c, 32
.loop1:
    ld [hl+], a
    dec c
    jr nz, .loop1

    ; Text row 1: 255 on the edges, then middle goes 128, 130, ...
    ld a, 255
    ld [hl+], a
    ld a, 128
    ld c, 30
.loop2:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop2
    ld a, 255
    ld [hl+], a

    ; Text row 2: same as above, but middle is 129, 131, ...
    ld a, 255
    ld [hl+], a
    ld a, 129
    ld c, 30
.loop3:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop3
    ld a, 255
    ld [hl+], a

    ; Bottom row, all tile 255
    ld a, 255
    ld c, 32
.loop4:
    ld [hl+], a
    dec c
    jr nz, .loop4
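
For comparison, the same four rows can be described in a few lines of Python. This is only a model of what ends up in the tilemap, with hypothetical names, and it covers the full 32-char grid width just like the loops above.

```python
BLANK = 0xFF
# Tilemap address of the first dialogue row: skip 14 rows of 32 chars
START = 0x9800 + 32 * 14

def dialogue_rows(width=32, first_tile=0x80):
    inner = width - 2
    border = [BLANK] * width
    # Top halves use even chars, bottom halves the odd char right after
    top = [BLANK] + [first_tile + 2 * i for i in range(inner)] + [BLANK]
    bottom = [BLANK] + [first_tile + 2 * i + 1 for i in range(inner)] + [BLANK]
    return [border, top, bottom, border]
```

The staggering shows up as `2 * i` and `2 * i + 1`: each column of text is a vertically adjacent pair of consecutively numbered chars.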

Now I need to repeat all of that, but in bank 1, to specify the char bank (1) and palette (7) for the corresponding tiles. Those are the same for the entire dialogue box, though, so this part is easier.

    ; Switch to VRAM bank 1
    ld a, 1
    ldh [rVBK], a

    ld a, %00001111  ; bank 1, palette 7
    ld hl, $9800 + 32 * 14
    ld c, 32 * 4  ; 4 rows
.loop5:
    ld [hl+], a
    dec c
    jr nz, .loop5

    EnableLCD
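
The attribute byte written in that loop packs two fields together. A one-function Python sketch, assuming the standard Game Boy Color background attribute layout (palette in bits 0 to 2, char bank in bit 3) and ignoring the other bits:

```python
def bg_attributes(palette, bank):
    # Bits 0-2: background palette number; bit 3: which VRAM bank the
    # char data comes from
    return (bank << 3) | palette
```

Bank 1 with palette 7 gives %00001111, which is the value stored into all 128 cells of the dialogue box.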

Time to get some real work done. Which raises the question: how do I actually do this?

If you recall, each 8-pixel row of a char is stored in two bytes. The two-bit palette index for each pixel is split across the corresponding bit in each byte, with the low bit in the first byte and the high bit in the second. If the leftmost pixel is palette index 01, then bit 7 in the first byte will be 1, and bit 7 in the second byte will be 0.
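
As a refresher, here is that layout in Python. It is a sketch assuming the usual Game Boy arrangement, where the first byte of a row carries the low bit of each pixel's index and the second byte carries the high bit; `encode_row` is a hypothetical helper.

```python
def encode_row(pixels):
    # pixels: eight palette indexes (0-3), leftmost first
    low = high = 0
    for i, index in enumerate(pixels):
        bit = 7 - i  # the leftmost pixel lives in bit 7
        low |= (index & 1) << bit
        high |= ((index >> 1) & 1) << bit
    return low, high
```

A row that only uses indexes 0 and 1, like the glyph rows further down, leaves the second byte entirely zero.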

Now, a blank char is all zeroes. To write a (left-aligned) glyph into a blank char, all I need to do is… well, I could overwrite it, but I could just as well OR it. To write a second glyph into the unused space, all I need to do is shift it right by the width of the space used so far, and OR it on top. The unusual split layout of the palette data is actually handy here, because it means the size of the shift matches the number of pixels, and I don’t have to worry about overflow.

0 0 0 0 0 0 0 0  <- blank char

1 1 1 1 0 0 0 0  <- some byte from the first glyph
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 0 0 0  <- ORed together to display first character

          1 1 1 1 0 0 0 0  <- some byte from the second glyph,
                              shifted by 4 (plus a kerning pixel)
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 1 1 1  <- ORed together to display first two characters

The obvious question is, well, what happens to the bits from the second character that didn’t fit? I’ll worry about that a bit later.
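
The shift-and-OR step from the diagram looks like this as a Python sketch. The `spilled` part is my guess at what the deferred overflow handling would need to look at, so treat it as an assumption; `draw_row` is a hypothetical name.

```python
def draw_row(canvas, glyph_row, x_offset):
    # Slide the glyph right to the first free pixel and OR it in
    shifted = glyph_row >> x_offset
    # Whatever fell off the right edge belongs in the next char over
    spilled = (glyph_row << (8 - x_offset)) & 0xFF
    return canvas | shifted, spilled
```

Replaying the diagram: drawing %11110000 at offset 0 gives %11110000, and drawing it again at offset 5 gives %11110111, with one leftover pixel (%10000000) waiting for a home.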

Oh, and finally, I’ll need a font, plus some text to display. This is still just a proof of concept, so I’ll add in a couple glyphs by hand.

; somewhere in ROM
font:
; A
    ; First byte indicates the width of the glyph, which I need
    ; to know because the width varies!
    db 6
    dw `00000000
    dw `00000000
    dw `01110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11111000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
; B
    db 6
    dw `00000000
    dw `00000000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000

text:
    ; Shakespeare it ain't.
    ; Need to end with a NUL here so I know where the text
    ; ends.  This isn't C, there's no automatic termination!
    db "ABABAAA", 0

And here we go!

    ; ----------------------------------------------------------
    ; Setup done!  Real work begins here
    ; b: x-offset within current tile
    ; de: text cursor + current character tiles
    ; hl: current VRAM tile being drawn into
    ld b, 0
    ld de, text
    ld hl, $8800

    ; This loop waits for the next vblank, then draws a letter.
    ; Text thus displays at ~60 characters per second.
.next_letter:
    ; This is probably way more LCD disabling than is strictly
    ; necessary, but I don't want to worry about it yet
    EnableLCD
    call wait_for_vblank
    DisableLCD

    ld a, [de]                  ; get current character
    and a                       ; if NUL, we're done!
    jr z, .done
    inc de                      ; otherwise, increment

    ; Get the glyph from the font, which means computing
    ; font + 33 * a.
    ; A little register juggling.  hl points to the current
    ; char in VRAM being drawn to, but I can only do a 16-bit
    ; add into hl.  de I don't need until the next loop,
    ; since I already read from it.  So I'm going to push de
    ; AND hl, compute the glyph address in hl, put it in de,
    ; then restore hl.
    push de
    push hl
    ; The text is written in ASCII, but the glyphs start at 0
    sub a, 65
    ld hl, font
    ld de, 33                   ; 1 width byte + 2 chars * 16 bytes
    ; This could probably be faster with long multiplication
    and a
.letter_stride:
    jr z, .skip_letter_stride
    add hl, de
    dec a
    jr .letter_stride
.skip_letter_stride:
    ; Move the glyph address into de, and restore hl
    ld d, h
    ld e, l
    pop hl

    ; Read the first byte, which is the character width.  This
    ; overwrites the character, but I have the glyph address,
    ; so I don't need it any more
    ld a, [de]
    inc de

    ; Copy into current chars
    ; Part 1: Copy the left part into the current chars
    push af                     ; stash width
    ; A glyph is two chars or 32 bytes, so row_copy 32 times
    ld c, 32
    ; b is the next x position we're free to write to.
    ; Incrementing it here makes the inner loop simpler, since
    ; it can't be zero.  But it also means two jumps per loop,
    ; so, ultimately this was a pretty silly idea.
    inc b
.row_copy:
    ld a, [de]                  ; read next row of character

    ; Shift right by b places with an inner loop
    push bc                     ; preserve b while shifting
    dec b
.shift:                         ; shift right by b bits
    jr z, .done_shift
    srl a
    dec b
    jr .shift
.done_shift:
    pop bc

    ; Write the updated byte to VRAM
    or a, [hl]                  ; OR with current tile
    ld [hl+], a
    inc de
    dec c
    jr nz, .row_copy
    pop af                      ; restore width

    ; Part 2: Copy whatever's left into the next char
    ; TODO  :)

    ; Cleanup for next iteration
    ; Undo the b increment from way above
    dec b
    ; It's possible I overflowed into the next column, in which
    ; case I want to leave hl where it is: pointing at the next
    ; column.  Otherwise, I need to back it up to where it was.
    ; Of course, I also need to update b, the x offset.
    add a, b                    ; a <- new x offset
    ; If the new x offset is 8 or more, that's actually the next
    ; column
    cp a, 8
    jr nc, .wrap_to_next_tile
    ld bc, -32                  ; a < 8: back hl up
    add hl, bc
    jr .done_wrap
.wrap_to_next_tile:
    sub a, 8                    ; a >= 8: subtract tile width
    ld b, a
.done_wrap:
    ; Either way, store the new x offset into b
    ld b, a

    ; And loop!
    pop de                      ; pop text pointer
    jr .next_letter

.done:
    ; Undo any goofy stuff I did, and get outta here
    EnableLCD
    ; Remember to reset bank to 0!
    xor a
    ldh [rVBK], a
    ret

Phew! That was a lot, but hopefully it wasn’t too bad. I hit a few minor stumbling blocks, but as I recall, most of them were of the “I get the conditions backwards every single time I use cp augh” flavor. (In fact, if you look at the actual commit the above is based on, you may notice that I had the condition at the very end mixed up! It’s a miracle it managed to print part of the second letter at all.)

There are a lot of caveats in this first pass, including that there’s nothing to erase the dialogue box and reshow the map underneath it. (But I might end up using the window for this anyway, so there’s no need for that.)

As a proof of concept, though, it’s a great start!

Screenshot of Anise, with a black dialogue box that says: A|

That’s the letter A, followed by the first two pixels of the letter B. I didn’t implement the part where letters spill into the next column, yet.

Guess I’d better do that!

Second pass

One of the big problems with the first pass was that I had to turn the screen off to do the actual work safely. Shifting a bunch of bytes by some amount is a little slow, since I can only shift one bit at a time and have to do it within a loop, and vblank only lasts for about 6.5% of the entire duration of the frame. If I continued like this, the screen would constantly flicker on and off every time I drew a new letter. Yikes.
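
That 6.5% figure falls out of the LCD timing. Here’s a quick back-of-the-envelope check in Python (not part of the game; the constant names are made up for this sketch), assuming the commonly documented figures of 144 visible scanlines, 10 vblank scanlines, and 114 machine cycles per scanline at normal speed:

```python
# Back-of-the-envelope LCD timing, in machine cycles at normal speed.
# The values are the commonly documented ones; the names are hypothetical.
CYCLES_PER_SCANLINE = 114
VISIBLE_SCANLINES = 144
VBLANK_SCANLINES = 10

def vblank_cycles():
    """Machine cycles available during one vblank."""
    return VBLANK_SCANLINES * CYCLES_PER_SCANLINE

def vblank_fraction():
    """Fraction of a whole frame spent in vblank."""
    total = VISIBLE_SCANLINES + VBLANK_SCANLINES
    return VBLANK_SCANLINES / total

print(vblank_cycles())                     # 1140
print(round(vblank_fraction() * 100, 1))   # 6.5
```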

I’ll solve this the same way I solve pretty much any other vblank problem: do the actual work into a buffer, then just copy that buffer during vblank. Since I intend to draw no more than one character per frame, and each character glyph is no wider than a single char column, I only need a buffer big enough to span two columns. Text covers two rows, also, so that’s four tiles total.

I also need to zero out the tile buffer when I first start drawing text — otherwise it may still have garbage left over from the last time text was displayed! — and this seems like a great opportunity to introduce a little fill function. Maybe then I’ll do the right damn thing and clear out other stuff on startup.

; Utility code section

; fill c bytes starting at hl with a
; NOTE: c must not be zero
fill:
    ld [hl+], a
    dec c
    jr nz, fill
    ret

; ...

; Stick this at a fixed nice address for now, just so it's easy
; for me to look at and debug
SECTION "Text buffer", WRAM0[$C200]
text_buffer:
    ; Text is up to 8x16 but may span two columns, so carve out
    ; enough space for four tiles
    ds $40

show_dialogue:
    DisableLCD
    ; ... setup stuff ...
    EnableLCD

    ; Zero out the tile buffer
    xor a
    ld hl, text_buffer
    ld c, $40
    call fill

That first round of disabling and enabling the LCD is still necessary, because the setup work takes a little time, but I can get rid of that later too. For now, the priority is fixing the text scroll (and supporting text that spans more than one tile).

The code is the same up until I start copying the glyph into the tiles. Now it doesn’t go to VRAM, but into the buffer.

There’s another change here, too. Previously, I shifted the glyph right, letting bits fall off the right end and disappear. But the bits that drop off the end are exactly the bits that I need to draw to the next char. I could do a left shift to retrieve them, but I had a different idea: rotate the glyph instead.

Say I want to draw a glyph offset by 3 pixels. Then I want to do this:

abcdefgh  <- original glyph bits
fghabcde  <- rotate right 3
00011111  <- mask, which is just $ff shifted right 3

000abcde  <- rotated glyph AND mask gives the left part

11100000  <- mask, inverted
fgh00000  <- rotated glyph AND inverted mask gives the right part

The time and code savings aren’t huge, exactly, and nothing else is going on while text is rendering so it’s not like time is at a premium here. But hey this feels clever so let’s do it.
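
If the bit twiddling is easier to follow in a higher-level language, here’s a small Python model of the same idea. It’s only a sketch for checking the logic, not the game’s code, and the function names are invented for illustration:

```python
def rotate_right(byte, n):
    """Rotate an 8-bit value right by n places, like n rrca instructions."""
    n %= 8
    return ((byte >> n) | (byte << (8 - n))) & 0xFF

def split_row(glyph_row, x_offset):
    """Return (left, right) bytes for a glyph row drawn x_offset pixels in."""
    rotated = rotate_right(glyph_row, x_offset)
    mask = 0xFF >> x_offset          # same as srl on $ff, x_offset times
    left = rotated & mask            # pixels that stay in the current column
    right = rotated & (mask ^ 0xFF)  # pixels that spill into the next column
    return left, right

left, right = split_row(0b10110101, 3)
print(f"{left:08b} {right:08b}")     # 00010110 10100000
```

The left byte comes out the same as a plain right shift, and the right byte is exactly the bits that shift would have thrown away.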

    ; Copy into current chars
    push af                     ; stash width
    ld c, 32                    ; 32 bytes per glyph
    ld hl, text_buffer          ; new!
    ; This is still silly.
    inc b
.row_copy:
    ld a, [de]                  ; read next row of character
    ; Rotate right by b - 1 pixels -- remember, b contains the
    ; x-offset within the current tile where to start drawing
    push bc                     ; preserve b while shifting
    ld c, $ff                   ; initialize the mask
    dec b
    jr z, .skip_rotate
.rotate:
    ; Rotate the glyph (a), but shift the mask (c), so that the
    ; left end of the mask fills up with zeroes
    rrca
    srl c
    dec b
    jr nz, .rotate
.skip_rotate:
    push af                     ; preserve glyph
    and a, c                    ; mask right pixels
    ; Draw to left half of text buffer
    or a, [hl]                  ; OR with current tile
    ld [hl+], a
    ; Write the remaining bits to right half
    ld a, c                     ; put mask in a...
    cpl                         ; ...to invert it
    ld c, a                     ; then put it back
    pop af                      ; restore unmasked glyph
    and a, c                    ; mask left pixels
    ld [hl+], a                 ; and store them!
    ; Clean up after myself, and loop to the next row
    inc de                      ; next row of glyph
    pop bc                      ; restore counter!
    dec c
    jr nz, .row_copy
    pop af                      ; restore width

The use of the stack is a little confusing (and don’t worry, it only gets worse in later posts). Note for example that c is used as the loop counter, but since I don’t actually need its value within the body of the loop, I can push it right at the beginning and use c to hold the mask, then pop the loop counter back into place at the end.

This is where I first started to feel register pressure, especially when addresses eat up two of them. My options are pretty limited: I can store stuff on the stack, or store stuff in RAM. The stack is arguably harder to follow (and easier to fuck up, which I’ve done several times), but either way there’s the register ambiguity.

Which is shorter/faster? Well:

  • A push/pop pair takes 2 bytes and 7 cycles.

  • Immediate writing to RAM and immediate reading back from it takes 6 bytes and 8 cycles, and can only be done with a, so I’d probably have to copy into and out of some other register too.

  • Putting an address in hl, writing to it, then reading from it takes 5 bytes and 7 cycles, but requires that I can preserve hl. (On the other hand, if I can preserve the value of hl across a loop or something, then it’s amortized away and the read/write is only 2 bytes and 4 cycles. But if that’s the case, chances are that I’m not under enough register pressure to need using RAM in the first place.)

  • Parts of high RAM ($ff80 and up) are available for program use, and they can be read or written with the same instructions that operate on the control knobs starting at $ff00. A high RAM read and write takes 4 bytes and 6 cycles, which isn’t too bad, but once again I have to go through the a register so I’ll probably need some other copies.

Stack it is, then.
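
Those totals can be double-checked by adding up per-instruction costs. The table below is a sketch in Python using the sizes and machine-cycle counts from the usual opcode references; the option names are just labels for this example:

```python
# (bytes, machine cycles) per instruction, per the usual opcode tables.
COST = {
    "push rr":    (1, 4),
    "pop rr":     (1, 3),
    "ld [nn], a": (3, 4),
    "ld a, [nn]": (3, 4),
    "ld hl, nn":  (3, 3),
    "ld [hl], a": (1, 2),
    "ld a, [hl]": (1, 2),
    "ldh [n], a": (2, 3),
    "ldh a, [n]": (2, 3),
}

# The four ways of stashing a value described above.
OPTIONS = {
    "stack":    ["push rr", "pop rr"],
    "absolute": ["ld [nn], a", "ld a, [nn]"],
    "via hl":   ["ld hl, nn", "ld [hl], a", "ld a, [hl]"],
    "high ram": ["ldh [n], a", "ldh a, [n]"],
}

def total_cost(instructions):
    """Sum the size and cycle count of a list of instructions."""
    size = sum(COST[name][0] for name in instructions)
    cycles = sum(COST[name][1] for name in instructions)
    return size, cycles

for option, instructions in OPTIONS.items():
    print(option, total_cost(instructions))
```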

Anyway! Where were we. I need to now copy the buffer into VRAM.

You may have noticed that the buffer isn’t quite populated in char format. Instead, it’s populated like one big 16-pixel char, with the first 16 bits corresponding to the 16 pixels spanning both columns. VRAM, of course, expects to get all the pixels from the first column, then all the pixels from the second column. If that’s not clear, here’s what I have (where the bits are in order from left to right, top to bottom):

AAAAAAAA BBBBBBBB  <- high bits for first row of pixels
aaaaaaaa bbbbbbbb  <- low bits for first row of pixels
... other rows ...

And here’s what I need to put in VRAM:

AAAAAAAA  <- high bits for first row of left column of pixels
aaaaaaaa  <- low bits for first row of left column of pixels
... other rows of left column ...
BBBBBBBB  <- high bits for first row of right column of pixels
bbbbbbbb  <- low bits for first row of right column of pixels
... other rows of right column ...

I hope that makes sense! To fix this, I use two loops (one for each column), and in each loop I copy every other byte into VRAM. That deinterlaces the buffer.
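
In Python terms, the two loops amount to taking the even-indexed bytes and then the odd-indexed bytes. This is only a model of the idea, with a made-up function name and a tiny stand-in buffer:

```python
def deinterlace(buffer):
    """Reorder an interleaved two-column buffer: left bytes, then right bytes."""
    left = buffer[0::2]    # bytes 0, 2, 4, ...: the left column
    right = buffer[1::2]   # bytes 1, 3, 5, ...: the right column
    return left + right

# A tiny stand-in for the 64-byte text buffer: L0 R0 L1 R1 ...
buffer = ["L0", "R0", "L1", "R1", "L2", "R2"]
print(deinterlace(buffer))  # ['L0', 'L1', 'L2', 'R0', 'R1', 'R2']
```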

    ; Draw the buffered tiles to vram
    ; The text buffer is treated like it's 16 pixels wide, but
    ; VRAM is of course only 8 pixels wide, so we need to do
    ; this in two iterations: the left two tiles, then the right
    pop hl                      ; restore hl (VRAM)
    push af                     ; stash width, again
    call wait_for_vblank        ; always wait before drawing
    push bc
    push de
    ; Draw the left two tiles
    ld c, $20
    ld de, text_buffer
.draw_left:
    ld a, [de]
    ; This double inc fixes the interlacing
    inc de
    inc de
    ld [hl+], a
    dec c
    jr nz, .draw_left
    ; Draw the right two tiles
    ld c, $20
    ; This time, start from the SECOND byte, which will grab
    ; all the bytes skipped by the previous loop
    ld de, text_buffer + 1
.draw_right:
    ld a, [de]
    inc de
    inc de
    ld [hl+], a
    dec c
    jr nz, .draw_right
    pop de
    pop bc
    pop af                      ; restore width, again

Just about done! There’s one last thing to do before looping to the next character. If this character did in fact span both columns, then the buffer needs to be moved to the left by one column. Here’s a simplified diagram, pretending chars are 5×5 and I just drew a B:

+-----+-----+.....+
| A  B|B    |     .
|A A B| B   |     .
|AAA B|B    |     .
|A A B| B   |     .
|A A B|B    |     .
+-----+-----+.....+

The left column is completely full, so I don’t need to buffer it any more. The next character wants to draw in the last partially full column, which here is the one containing the B; it’ll also want an empty right column to overflow into if necessary.
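
Here’s the bookkeeping as a small Python model, assuming the interleaved buffer layout described earlier (left byte, right byte, left byte, and so on). The function name is hypothetical, and like the assembly, it leaves stale data in the right column on purpose:

```python
def advance(buffer, x_offset, glyph_width):
    """Advance the pixel offset; on overflow, slide the right column left.

    buffer is interleaved: even indexes are the left column, odd indexes
    the right column. Returns (new_x_offset, wrapped).
    """
    x_offset += glyph_width
    wrapped = x_offset >= 8
    if wrapped:
        x_offset -= 8
        # Walk backwards, copying each right byte over its left neighbor.
        # The right column keeps stale data until the next glyph is drawn.
        for i in range(len(buffer) - 1, 0, -2):
            buffer[i - 1] = buffer[i]
    return x_offset, wrapped

buffer = [10, 11, 20, 21]          # L0 R0 L1 R1
print(advance(buffer, 5, 6))       # (3, True)
print(buffer)                      # [11, 11, 21, 21]
```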

    ; Increment the pixel offset and deal with overflow
    add a, b                    ; a <- new x offset
    ; Regardless of whether this glyph overflowed, the VRAM
    ; pointer was left at the beginning of the next (empty)
    ; column, and it needs rewinding to the right column
    ld bc, -32                  ; move the VRAM pointer back...
    add hl, bc                  ; ...to the start of the char
    cp a, 8
    jr nc, .wrap_to_next_char
    ; The new offset is less than 8, so this character didn't
    ; actually draw anything in the right column.  Move the
    ; VRAM pointer back a second time, to the left column,
    ; which still has space left
    add hl, bc
    jr .done_wrap
.wrap_to_next_char:
    ; The new offset is 8 or more, so this character drew into
    ; the next char.  Subtract 8, but also shift the text buffer
    ; by copying all the "right" chars over the "left" chars
    sub a, 8                    ; a >= 8: subtract char width
    push hl
    push af
    ; The easy way to do this is to walk backwards through the
    ; buffer.  This leaves garbage in the right column, but
    ; that's okay -- it gets overwritten in the next loop,
    ; before the buffer is copied into VRAM.
    ld hl, text_buffer + $40 - 1
    ld c, $20
.shift_buffer:
    ld a, [hl-]
    ld [hl-], a
    dec c
    jr nz, .shift_buffer
    pop af
    pop hl
.done_wrap:
    ld b, a                     ; either way, store into b

    ; Loop
    pop de                      ; pop text pointer
    jp .next_letter

And the test run:

Screenshot of Anise, with a black dialogue box that says: ABABAAA

Hey hey, success!

Quick diversion: Anise corruption

I didn’t mention it above because I didn’t actually use it yet, but while doing that second pass, I split the button-polling code out into its own function, read_input. I thought I might need it in dialogue as well (which has its own vblank loop and thus needs to do its own polling), but I didn’t get that far yet, so it’s still only called from the main loop.

While testing out the dialogue, I notice a teeny tiny problem.

A screenshot similar to the above, but with some mild graphical corruption on Anise

Well, yes, obviously there’s the problem of the textbox drawing underneath the player. Which is mostly a problem because the textbox doesn’t go away, ever. I’ll worry about that later.

The other problem is that Anise’s sprite is corrupt. Again. Argh!

A little investigation suggests that, once again, I’m blowing my vblank budget. But this time, it’s a little more reasonable. Remember, I’m overwriting Anise’s sprite after handling movement. That means I do a bunch of logic followed by writing to char data. No wonder there’s a problem. I must’ve just slightly overrun vblank when I split out read_input (or checked for the dialogue button press in the first place?), since call has a teeny tiny bit of overhead.

That approach is a little inconsistent, as well. Remember how I handle OAM: I write to a buffer, which is then copied to real OAM during the next vblank. But I’m updating the sprite immediately. That means when Anise turns, the sprite updates on the very next frame, but the movement isn’t visible until the frame after that. Whoops.

So, a buffer! I could make this into a more general mechanism later, but for now I only care about fixing Anise. I can revisit this when I have, uh, a second sprite.

; in ram somewhere

anise_sprites_address:
    dw

Now, Anise is composed of three objects, which is six chars, which is 96 bytes. The fastest way to copy bytes by hand is something like this:

    ld hl, source
    ld de, destination
    ld c, 96
.loop:
    ld a, [hl+]
    ld [de], a
    inc de
    dec c
    jr nz, .loop

Each iteration of the loop copies 1 byte and takes 10 cycles: 2 each for the two loads and the inc de, 1 for the dec c, and 3 for the taken jr. (It’s possible to shave a couple cycles off in some specific cases, and unrolling would save some time, but let’s stay general for now.) That’s 960 cycles, plus 8 for the setup, minus one on the final jr, for 967 total. But vblank only lasts 1140 cycles! That’s most of the budget blown for updating a single entity. This can’t possibly work.

Enter a feature exclusive to the Game Boy Color: GDMA, or general DMA. This is similar to OAM DMA, except that it can copy (nearly) anything to anywhere. Also (unlike OAM DMA), the CPU pauses while the copy is taking place, so there’s no need to carefully time a busy loop. It’s configured by writing to five control registers (which takes 5 cycles each), and then it copies two bytes per cycle, for a total of 73 cycles. That’s about 13 times faster. Seems worth a try.

(Note that I’m not using double-speed CPU mode yet, as an incentive to not blow my CPU budget early on. Turning that on would halve the time taken by the manual loop, but wouldn’t affect GDMA.)

GDMA has a couple restrictions: most notably, it can only copy multiples of 16 bytes, and only to/from addresses that are aligned to 16 bytes. But each char is 16 bytes, so that works out just fine.

The five GDMA registers are, alas, simply named 1 through 5. The first two are the source address; the next two are the destination address; the last is the amount to copy. Or, well, it’s the amount to copy, divided by 16, minus 1. (The high bit is reserved for turning on a different kind of DMA that copies a little at a time, 16 bytes per hblank.) Writing to the last register triggers the copy.
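
To make the encoding concrete, here’s a Python sketch that works out the five byte values for a given copy. The function names and the $C300 source address are hypothetical, and the cycle estimate just reuses the rough figures above (5 cycles per register write, two bytes copied per cycle):

```python
def gdma_registers(source, destination, length):
    """Return the five byte values to write to registers 1 through 5."""
    if source % 16 or destination % 16:
        raise ValueError("addresses must be aligned to 16 bytes")
    if length % 16 or not 16 <= length <= 0x800:
        raise ValueError("length must be a multiple of 16, from 16 to 2048")
    return (
        source >> 8,                 # 1: source, high byte
        source & 0xFF,               # 2: source, low byte
        (destination >> 8) & 0x1F,   # 3: destination, high byte (top 3 bits ignored)
        destination & 0xFF,          # 4: destination, low byte
        length // 16 - 1,            # 5: length; high bit clear means copy right away
    )

def gdma_cycles(length):
    """Rough cost: five register writes, then two bytes per cycle."""
    return 5 * 5 + length // 2

# Six chars (96 bytes) from a hypothetical buffer at $C300 to VRAM at $8000
print(gdma_registers(0xC300, 0x8000, 96))  # (195, 0, 0, 0, 5)
print(gdma_cycles(96))                     # 73
```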

Plugging in this buffer is easy enough, then:

    ; Update Anise's current sprite.  Use DMA here because...
    ; well, geez, it's too slow otherwise.
    ld hl, anise_sprites_address
    ld a, [hl+]
    ld [rHDMA1], a
    ld a, [hl]
    ld [rHDMA2], a
    ; I want to write to $8000 which is where Anise's sprite is
    ; hardcoded to live, and the top three bits are ignored so
    ; that the destination is always in VRAM, so $0000 works too
    ld a, HIGH($0000)
    ld [rHDMA3], a
    ld a, LOW($0000)
    ld [rHDMA4], a
    ; And copy!
    ld a, (32 * 3) / 16 - 1
    ld [rHDMA5], a

Finally, instead of actually overwriting Anise’s sprite, I write the address of the new sprite into the buffer:

    ; Store the new sprite address, to be updated during vblank
    ld a, h
    ld [anise_sprites_address], a
    ld a, l
    ld [anise_sprites_address + 1], a

And done! Now I can walk around just fine. It looks basically like the screenshot from the previous section, so I don’t think you need a new one.

Note that this copy will always happen, since there’s no condition for skipping it when there’s nothing to do. That’s fine for now; later I’ll turn this into a list, and after copying everything I’ll simply clear the list.

Crisis averted, or at least deferred until later. Back to the dialogue!

Interlude: A font

Writing out the glyphs by hand is not going to cut it. It was fairly annoying for two letters, let alone an entire alphabet.

Nothing about this part was especially interesting. I used LÖVE’s font format, which puts all glyphs in a single horizontal strip. The color of the top-left pixel is used as a sentinel; any pixel in the top row that’s the same color indicates the start of a new glyph.

(I note that LÖVE actually recommends against using this format, but the alternatives are more complicated and require platform-specific software — whereas I can slop this format together in any image editor without much trouble.)

I then turned this into Game Boy tiles much the same way as with the sprite loader, except with the extra logic to split on the sentinel pixels and pad each glyph to eight pixels wide. I won’t reproduce the whole script here, but it’s on GitHub if you want to see it.
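
For a rough idea of the splitting step, here’s a Python sketch. It is not the script from the repository; the function names are invented, and it assumes the sentinel column is only a separator and isn’t part of the glyph itself:

```python
def split_glyphs(top_row):
    """Return (start, end) pixel column ranges for each glyph in the strip.

    top_row is the top row of pixels; top_row[0] is the sentinel color.
    Each sentinel pixel marks a boundary, and a glyph occupies the columns
    after it, up to (not including) the next sentinel.
    """
    sentinel = top_row[0]
    starts = [x for x, color in enumerate(top_row) if color == sentinel]
    ends = starts[1:] + [len(top_row)]
    return [(start + 1, end) for start, end in zip(starts, ends) if end > start + 1]

def pad_row(pixels, width=8, blank=0):
    """Pad one row of a glyph out to a full char width."""
    return list(pixels) + [blank] * (width - len(pixels))

# "S" is the sentinel color; "." stands in for any other pixel
top_row = list("S...S.....S..")
print(split_glyphs(top_row))   # [(1, 4), (5, 10), (11, 13)]
print(pad_row([1, 2, 3]))      # [1, 2, 3, 0, 0, 0, 0, 0]
```

The width of each glyph (end minus start) is what would end up in the width byte at the front of each glyph’s data.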

The font itself is, well, a font? I initially tried to give it a little personality, but that made some of the characters weirdly wide and was a bit hard to read, so I revisited it and ended up with this:

Pixel font covering all of ASCII

I like it, at least! The characters all have shadows built right in, and you can see at the end that I was starting to play with some non-ASCII characters. Because I can do that!

Third pass

One major obstacle remains: I can only have one line of text right now, when there’s plenty of space for two.

The obvious first thing I need to do is alter the dialogue box’s char map. It currently has a whole char’s worth of padding on every side. What a waste. I want this instead:

+--+--+--+--+--+--+--+--+--+--+--+--+---+
|80|82|84|86|88|8a|8c|8e|90|92|94|96|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+
|81|83|85|87|89|8b|8d|8f|91|93|95|97|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+
|a8|aa|ac|ae|b0|b2|b4|b6|b8|ba|bc|be|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+
|a9|ab|ad|af|b1|b3|b5|b7|b9|bb|bd|bf|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+

The second row begins with char $a8 because that’s $80 + 40.
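
The pattern is regular enough to generate. As a sanity check on the numbers, here’s a Python sketch that builds the four rows of char indexes; the names are made up for this example, and it assumes a 20-char-wide screen with text chars starting at $80:

```python
SCREEN_WIDTH_TILES = 20
TEXT_START_TILE = 0x80

def dialogue_tilemap():
    """Return four rows of char indexes for a two-line dialogue box."""
    rows = []
    for row in range(4):
        # text_line: which line of text; half: top (0) or bottom (1) char
        text_line, half = divmod(row, 2)
        first = TEXT_START_TILE + text_line * SCREEN_WIDTH_TILES * 2 + half
        rows.append([first + 2 * column for column in range(SCREEN_WIDTH_TILES)])
    return rows

for row in dialogue_tilemap():
    print(" ".join(f"{tile:02x}" for tile in row[:4]), "...")
```

Each line of text uses 40 chars (20 columns, two chars tall), which is where the jump from $80 to $a8 comes from.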

Obviously I’ll need to change the setup code to make the above pattern. But while I’m in here… remember, the setup code is the only remaining place that disables the LCD to do its work. Can I do everything within vblank instead?

I’m actually not sure, but there’s an easy way to reduce the CPU cost. Instead of setting up the whole dialogue box at once, I can do it one row at a time, starting from the bottom. That will cut the vblank pressure by a factor of four, and it’ll create a cool slide-up effect when the dialogue box opens!

Let’s give it a try. I’ll move the real code into a function, since it’ll run multiple times now. I’ll also introduce a few constants, since I’m getting tired of all the magic numbers everywhere.

SCREEN_WIDTH_TILES EQU 20
CANVAS_WIDTH_TILES EQU 32
SCREEN_HEIGHT_TILES EQU 18
CANVAS_HEIGHT_TILES EQU 32
BYTES_PER_TILE EQU 16
TEXT_START_TILE_1 EQU 128
TEXT_START_TILE_2 EQU TEXT_START_TILE_1 + SCREEN_WIDTH_TILES * 2

; Fill a row in the tilemap in a way that's helpful to dialogue.
; hl: where to start filling
; b: tile to start with
fill_tilemap_row:
    ; Populate bank 0, the tile proper
    xor a
    ldh [rVBK], a

    ld c, SCREEN_WIDTH_TILES
    ld a, b
.loop0:
    ld [hl+], a
    ; Each successive tile in a row increases by 2!
    add a, 2
    dec c
    jr nz, .loop0

    ; Populate bank 1, the bank and palette
    ld a, 1
    ldh [rVBK], a
    ld a, %00001111  ; bank 1, palette 7
    ld c, SCREEN_WIDTH_TILES
    dec hl
.loop1:
    ld [hl-], a
    dec c
    jr nz, .loop1

    ret

Now replace the setup code with four calls to this function, waiting for vblank between successive calls.

    ; Row 4
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 1)
    ld b, TEXT_START_TILE_2 + 1
    call fill_tilemap_row

    ; Row 3
    call wait_for_vblank
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 2)
    ld b, TEXT_START_TILE_2
    call fill_tilemap_row

    ; Row 2
    call wait_for_vblank
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 3)
    ld b, TEXT_START_TILE_1 + 1
    call fill_tilemap_row

    ; Row 1
    call wait_for_vblank
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 4)
    ld b, TEXT_START_TILE_1
    call fill_tilemap_row

Cool. I have a full font now, too, so I might as well try it out with some more interesting text.

SECTION "Font", ROMX
text:
    db "The quick brown fox jumps over the     lazy dog's back.  AOOWWRRR!!!!", 0

Now I just need to— oh, hang on.

Animation of the text box sliding up and scrolling out the text

Hey, it already works! Magic.

(I did also change the initial value for the x-offset to 4 rather than 0, so the text doesn’t start against the left edge of the screen.)

Well. Not really. The code I wrote doesn’t actually know when to stop writing, so it continues off the end of the first line and onto the second. You may notice the conspicuous number of extra spaces in the new text.

Still, it looks right, and this was a lot of effort already, and it’s not actually plugged into anything yet, so I called this a success and shelved it for now. Quit while you’re ahead, right?

Future work

Obviously this is still a bit rough.

That thing where the player can walk on top of the textbox is a bit of a problem, since the same thing happens if the textbox opens while the player is near the bottom of the screen. There are a couple solutions to this, and they’ll really depend on how I end up deciding to display the box.

I actually wanted the glyphs to be drawn a little lower than normal on the top line, to add half a char or so of padding around them, but I tried it and got a buffer overrun that I didn’t feel like investigating. That’s an obvious thing to fix next time I touch this code.

What about word wrapping? I’ve written about that before and clearly have strong opinions about it, but I really don’t want to do dynamic word wrapping with a variable-width font on a Game Boy. Instead, I’ll probably store dialogue in some other format and use another converter script to do the word-wrapping ahead of time. That’ll also save me from writing large amounts of dialogue in, um, assembly. And if/when I want any fancy-pants special effects within dialogue, I can describe them with a human-readable format and then convert that to more assembly-friendly bytecode instructions.
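
As a sketch of what that ahead-of-time wrapping could look like, here’s a greedy word wrapper in Python that measures lines in pixels. Nothing like this exists in the project yet as far as this post says; the function name, the width lookup, and the example widths are all hypothetical:

```python
def wrap_text(text, glyph_width, line_width):
    """Greedily wrap text so each line fits within line_width pixels.

    glyph_width is a function mapping a character to its width in pixels.
    A single word wider than the line is left on a line of its own.
    """
    def measure(s):
        return sum(glyph_width(ch) for ch in s)

    lines = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > line_width:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

# Made-up widths: narrow glyphs are 3 pixels, everything else is 6
def example_width(ch):
    return 3 if ch in "il " else 6

print(wrap_text("ill will kill", example_width, 40))  # ['ill will', 'kill']
```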

The dialogue box still doesn’t go away, partly because it draws right on top of the map, and I don’t have any easy way to repair the map right now. I’ll probably switch to one of those other mechanisms for showing the box later that won’t require clobbering the map, and then this problem will pretty much solve itself.

What about menus? Those will either have to go inside the dialogue box (which means the question being asked isn’t visible, oof), or they’ll have to go in a smaller box above it like in Pokémon. But the latter solution means I can’t use the window or display trickery — both of those only work reliably for horizontal splits. I’m not quite sure how to handle this, yet.

And then, what of portraits? Most games get away without them by having a silent protagonist, which makes it obvious who’s talking. But Anise is anything but silent, so I need a stronger indicator. I obviously can’t overlay a big transparent portrait on the background, like I do in my LÖVE games. I think I can reserve space for them in the status bar, which will go underneath the dialogue box. I’ll have to see how it works out. Maybe I could also use a different text color for every speaker?

After all that, I can start worrying about other frills like colored text and pauses and whatever. Phew.

To be continued

That brings us up to commit a173db, which is slightly beyond the second release (which includes a one-line textbox)! Also that was three months ago oh dear. I think I’ll be putting out a new release soon, stay tuned!

Next time: collision detection! I am doomed.

Cheezball Rising: Opening a dialogue

Post Syndicated from Eevee original https://eev.ee/blog/2018/09/08/cheezball-rising-opening-a-dialogue/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I draw some text!

Previously: I get a Game Boy to meow.
Next: collision detection, ohh nooo

Recap

The previous episode was a diversion (and left an open problem that I only solved after writing it), so the actual state of the game is unchanged.

Star Anise walking around a moon environment in-game, animated in all four directions

Where should I actually go from here? Collision detection is an obvious place, but that’s hard. Let’s start with something a little easier: displaying scrolling dialogue text. This is likely to be a dialogue-heavy game, so I might as well get started on that now.

Planning

On any other platform, I’d dive right into it: draw a box on the screen somewhere, fill it with text.

On the Game Boy, it’s not quite that simple. I can’t just write text to the screen; I can only place tiles and sprites.

Let’s look at how, say, Pokémon Yellow handles its menu.

Pokémon Yellow with several levels of menu open

This looks — feels — like it’s being drawn on top of the map, and that sub-menus open on top of other menus. But it’s all an illusion! There’s no “on top” here. This is a completely flat image made up of tiles, like anything else.

The same screenshot, scaled up, with a grid showing the edges of tiles

This is why Pokémon has such a conspicuously blocky font: all the glyphs are drawn to fit in a single 8×8 char, so “drawing” text is as simple as mapping letters to char indexes and drawing them onto the background. The map and the menu are all on the same layer, and the game simply redraws whatever was underneath when you close something. Part of the illusion is that the game is clever enough to hide any sprites that would overlap the menu — because sprites would draw on top! (The Game Boy Color has some twiddles for controlling this layering, but Yellow was originally designed for the monochrome Game Boy.)

A critical reason that this actually works is that in Pokémon, the camera is always aligned to the grid. It scrolls smoothly while you’re walking, but you can’t actually open the menu (or pick up an item, or talk to someone, or do anything else that might show text) until you’ve stopped moving. If you could, the menu would be misaligned, because it’s part of the same grid as the map!

This poses a slight problem for my game. Star Anise isn’t locked to the grid like the Pokémon protagonist is, and unlike Link’s Awakening, I do want to have areas larger than the screen that can scroll around freely.

I know offhand that there are a couple ways to do this. One is the window, an optional extra opaque layer that draws on top of the background, with its top-left corner anchored to any point on the screen. Another is to change some display registers in the middle of the screen redrawing. The Oracle games combine both features to have a status bar at the top of the screen but a scrolling map underneath.

But I don’t want to worry about any of this right now, before I even have text drawing. I know it’s possible, so I’ll deal with it later. For now, drawing directly onto the background is good enough.

Font decisions

Let’s get back to the font itself. I’m not in love with the 8×8 aesthetic; what are my other options? I do like the text in Oracle of Ages, so let’s have a look at that:

Oracle of Ages, also scaled up with a grid, showing its taller text

Ah, this is the same approach again, except that letters are now allowed to peek up into the char above. So these are 8×16, but the letters all occupy a box that’s more like 6×9, offering much more familiar proportions. Oracle of Ages is designed for the Game Boy Color, which has twice as much char storage space, so it makes sense that they’d take advantage of it for text like this.

It’s not bad, but the space it affords is still fairly… limited. Only 16 letters will fit in a line, just as with Pokémon, and that means a lot of carefully wording things to be short and use mostly short words as well. That’s not gonna cut it for the amount of dialogue I expect to have.

What other options do I have? It seems like I’m limited to multiples of 8 here, surely. (The answer may be obvious to some of you, but shh, don’t read ahead.)

The answer lies in the very last game released for the Game Boy Color: Harry Potter and the Chamber of Secrets. Whatever deep secrets were learned during the Game Boy’s lifetime will surely be encapsulated within this, er, movie tie-in game.

Harry Potter and the Chamber of Secrets, also scaled up with a grid, showing its text isn't fixed to the grid

Hot damn. That is a ton of text in a relatively small amount of space! And it doesn’t fit the grid! How did they do that?

The answer is… exactly how you’d think!

Tile display for the above screenshot, showing that the text is simply written across consecutive tiles

With a fixed-width font like in Pokémon and Zelda games, the entire character set is stored in VRAM, and text is drawn by drawing a string of characters. With a variable-width font like in Harry Potter, a block of VRAM is reserved for text, and text is drawn into those chars, in software. Essentially, some chars are used like a canvas and have text rendered to them on the fly. The contents of the background layer might look like this in the two cases:

Illustration of fixed width versus variable width text

Some pros of this approach:

  • Since the number of chars required is constant and the font is never loaded directly into char memory, the font can have arbitrarily many glyphs in it. Multiple fonts could be used at the same time, even. (Of course, if you have more than 256 glyphs, you’ll have to come up with a multi-byte encoding for actually storing the text…)

  • A lot more text can fit in one line while still remaining readable.

  • It has the potential to look extremely cool and maybe even vaguely technically impressive.

And, cons:

  • It’s definitely more complicated! But I only have to write the code once, and since the game won’t be doing anything but drawing dialogue while the box is up, I don’t think I’ll be in danger of blowing my CPU budget.

  • Colored text becomes a bit trickier. But still possible, so, we can worry about that later.

Well, I’m sold. Let’s give it a shot.

First pass

Well, I want to do something on a button press, so, let’s do that.

A lot of games (older ones especially) have bugs from switching “modes” in the same frame that something else happens. I don’t entirely understand why that’s so common and should probably ask some speedrunners, but I should be fine if I do mode-switching first thing in the frame, and then start over a new frame when switching back to “world” mode. Right? Sure.

    ; ... button reading code in main loop ...
    bit BUTTON_A, a
    jp nz, .do_show_dialogue

    ; ... main loop ...

    ; Loop again when done
    jp vblank_loop

.do_show_dialogue:
    call show_dialogue
    jp vblank_loop

The extra level of indirection added by .do_show_dialogue is just so the dialogue code itself isn’t responsible for knowing where the main loop point is; it can just ret.

Now to actually do something. This is a first pass, so I want to do as little as possible. I’ll definitely need a palette for drawing the text — and here I’m cutting into my 8-palette budget again, which I don’t love, but I can figure that out later. (Maybe with some shenanigans involving changing the palettes mid-redraw, even.)

PALETTE_TEXT:
    ; Black background, white text...  then gray shadow, maybe?
    dcolor $000000
    dcolor $ffffff
    dcolor $999999
    dcolor $666666

show_dialogue:
    ; Have to disable the LCD to do video work.  Later I can do
    ; a less jarring transition
    DisableLCD

    ; Copy the palette into slot 7 for now
    ld a, %10111000
    ld [rBCPS], a
    ld hl, PALETTE_TEXT
    REPT 8
    ld a, [hl+]
    ld [rBCPD], a
    ENDR

I also know ahead of time what chars will need to go where on the screen, so I can fill them in now.

Note that I really ought to blank them all out, especially since they may still contain text from some previous dialogue, but I don’t do that yet.

An obvious question is: which tiles? I think I said before that with 512 chars available, and ¾ of those still being enough to cover the entire screen in unique chars, I’m okay with dedicating a quarter of my space to UI stuff, including text. To keep that stuff “out of the way”, I’ll put them at the “end” — bank 1, starting from $80.

I’m thinking of having characters be about the same proportions as in the Oracle games. Those games use 5 rows of tiles, like this:

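+---+------------------+
| 1 | top of line 1    |
+---+------------------+
| 2 | bottom of line 1 |
+---+------------------+
| 3 | top of line 2    |
+---+------------------+
| 4 | bottom of line 2 |
+---+------------------+
| 5 | blank            |
+---+------------------+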

Since the font is aligned to the bottom and only peeks a little bit into the top char, the very top row is mostly blank, and that serves as a top margin. The bottom row is explicitly blank for a bottom margin that’s nearly the same size. The space at the top of line 2 then works as line spacing.

I’m not fixed to the grid, so I can control line spacing a little more explicitly. But I’ll get to that later and do something really simple for now, where $ff is a blank tile:

+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|80|82|84|86|88|8a|8c|8e|90|92|94|96|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|81|83|85|87|89|8b|8d|8f|91|93|95|97|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+

This gives me a canvas for drawing a single line of text. The staggering means that the first letter will draw to adjacent chars $80 and $81, rather than distant cousins like $80 and $a0.
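To make the staggering concrete, here is a rough Python sketch of the four rows of tile indices described above. It is only a model of the layout, not the game's code; the names `dialogue_rows`, `BLANK`, `FIRST_TEXT`, and `MAP_WIDTH` are made up for illustration.

```python
BLANK = 0xFF       # tile index used for the blank border
FIRST_TEXT = 0x80  # first char reserved for text
MAP_WIDTH = 32     # the background map is 32 tiles wide


def dialogue_rows(width=MAP_WIDTH):
    """Return four rows of tile indices for a one-line text box."""
    inner = width - 2  # columns between the blank left and right edges
    # Each column uses two consecutive chars: even on top, odd below
    top = [FIRST_TEXT + 2 * i for i in range(inner)]
    bottom = [FIRST_TEXT + 2 * i + 1 for i in range(inner)]
    border = [BLANK] * width
    return [
        border,
        [BLANK] + top + [BLANK],
        [BLANK] + bottom + [BLANK],
        border,
    ]
```

Printing `[hex(t) for t in dialogue_rows()[1][:4]]` gives the start of the first text row, which should line up with the grid drawn above.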

You may notice that the below code updates chars across the entire width of the grid, not merely the screen. There’s not really any good reason for that.

    ; Fill text rows with tiles (blank border, custom tiles)
    ; The screen has 144/8 = 18 rows, so skip the first 14 rows
    ld hl, $9800 + 32 * 14
    ; Top row, all tile 255
    ld a, 255
    ld c, 32
.loop1:
    ld [hl+], a
    dec c
    jr nz, .loop1

    ; Text row 1: 255 on the edges, then middle goes 128, 130, ...
    ld a, 255
    ld [hl+], a
    ld a, 128
    ld c, 30
.loop2:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop2
    ld a, 255
    ld [hl+], a

    ; Text row 2: same as above, but middle is 129, 131, ...
    ld a, 255
    ld [hl+], a
    ld a, 129
    ld c, 30
.loop3:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop3
    ld a, 255
    ld [hl+], a

    ; Bottom row, all tile 255
    ld a, 255
    ld c, 32
.loop4:
    ld [hl+], a
    dec c
    jr nz, .loop4

Now I need to repeat all of that, but in bank 1, to specify the char bank (1) and palette (7) for the corresponding tiles. Those are the same for the entire dialogue box, though, so this part is easier.

    ; Switch to VRAM bank 1
    ld a, 1
    ldh [rVBK], a

    ld a, %00001111  ; bank 1, palette 7
    ld hl, $9800 + 32 * 14
    ld c, 32 * 4  ; 4 rows
.loop5:
    ld [hl+], a
    dec c
    jr nz, .loop5

    EnableLCD

Time to get some real work done. Which raises the question: how do I actually do this?

If you recall, each 8-pixel row of a char is stored in two bytes. The two-bit palette index for each pixel is split across the corresponding bit in each byte: the first byte holds the low bit, and the second byte holds the high bit. If the leftmost pixel is palette index 2 (binary 10), then bit 7 in the first byte will be 0, and bit 7 in the second byte will be 1.
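As a quick illustration of that split layout, here is a small Python sketch that packs one row of eight palette indices into its two bytes. It assumes the usual Game Boy arrangement where the first byte carries the low bit of each index and the second byte carries the high bit; `encode_row` is a hypothetical helper, not part of the game.

```python
def encode_row(pixels):
    """Pack eight 2-bit palette indices into (first_byte, second_byte)."""
    if len(pixels) != 8:
        raise ValueError("a char row is exactly 8 pixels wide")
    low = high = 0
    for index in pixels:
        if not 0 <= index <= 3:
            raise ValueError("palette indices are 0 through 3")
        # The leftmost pixel ends up in bit 7 of each byte
        low = (low << 1) | (index & 1)
        high = (high << 1) | ((index >> 1) & 1)
    return low, high
```

For example, `encode_row([3, 2, 1, 0, 3, 2, 1, 0])` produces the pair `(0xAA, 0xCC)`.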

Now, a blank char is all zeroes. To write a (left-aligned) glyph into a blank char, all I need to do is… well, I could overwrite it, but I could just as well OR it. To write a second glyph into the unused space, all I need to do is shift it right by the width of the space used so far, and OR it on top. The unusual split layout of the palette data is actually handy here, because it means the size of the shift matches the number of pixels, and I don’t have to worry about overflow.

0 0 0 0 0 0 0 0  <- blank glyph

1 1 1 1 0 0 0 0  <- some byte from the first glyph
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 0 0 0  <- ORed together to display first character

          1 1 1 1 0 0 0 0  <- some byte from the second glyph,
                              shifted by 4 (plus a kerning pixel)
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 1 1 1  <- ORed together to display first two characters

The obvious question is, well, what happens to the bits from the second character that didn’t fit? I’ll worry about that a bit later.
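The shift-and-OR idea, including the bits that fall off the right edge, can be sketched in Python like this. It is a model of one byte of one row only, and `draw_row` is a name I made up; the real thing has to do this for all 32 bytes of a glyph.

```python
def draw_row(left, glyph_row, x):
    """OR one 8-pixel glyph row into `left`, starting x pixels in.

    Returns (new_left, spill), where spill holds whatever bits
    did not fit and belong in the next char over.
    """
    if not 0 <= x <= 7:
        raise ValueError("x offset must be within the char")
    # Treat the two chars as one 16-bit-wide window
    window = (glyph_row << 8) >> x
    new_left = left | (window >> 8)
    spill = window & 0xFF
    return new_left, spill
```

With the values from the diagram, `draw_row(0b11110000, 0b11110000, 5)` gives `0b11110111` for the current char, plus a spill of `0b10000000` that the diagram quietly drops.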

Oh, and finally, I’ll need a font, plus some text to display. This is still just a proof of concept, so I’ll add in a couple glyphs by hand.

; somewhere in ROM
font:
; A
    ; First byte indicates the width of the glyph, which I need
    ; to know because the width varies!
    db 6
    dw `00000000
    dw `00000000
    dw `01110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11111000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
; B
    db 6
    dw `00000000
    dw `00000000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000

text:
    ; Shakespeare it ain't.
    ; Need to end with a NUL here so I know where the text
    ; ends.  This isn't C, there's no automatic termination!
    db "ABABAAA", 0

And here we go!

    ; ----------------------------------------------------------
    ; Setup done!  Real work begins here
    ; b: x-offset within current tile
    ; de: text cursor + current character tiles
    ; hl: current VRAM tile being drawn into
    ld b, 0
    ld de, text
    ld hl, $8800

    ; This loop waits for the next vblank, then draws a letter.
    ; Text thus displays at ~60 characters per second.
.next_letter:
    ; This is probably way more LCD disabling than is strictly
    ; necessary, but I don't want to worry about it yet
    EnableLCD
    call wait_for_vblank
    DisableLCD

    ld a, [de]                  ; get current character
    and a                       ; if NUL, we're done!
    jr z, .done
    inc de                      ; otherwise, increment

    ; Get the glyph from the font, which means computing
    ; font + 33 * a.
    ; A little register juggling.  hl points to the current
    ; char in VRAM being drawn to, but I can only do a 16-bit
    ; add into hl.  de I don't need until the next loop,
    ; since I already read from it.  So I'm going to push de
    ; AND hl, compute the glyph address in hl, put it in de,
    ; then restore hl.
    push de
    push hl
    ; The text is written in ASCII, but the glyphs start at 0
    sub a, 65
    ld hl, font
    ld de, 33                   ; 1 width byte + 16 * 2 tiles
    ; This could probably be faster with long multiplication
    and a
.letter_stride:
    jr z, .skip_letter_stride
    add hl, de
    dec a
    jr .letter_stride
.skip_letter_stride:
    ; Move the glyph address into de, and restore hl
    ld d, h
    ld e, l
    pop hl

    ; Read the first byte, which is the character width.  This
    ; overwrites the character, but I have the glyph address,
    ; so I don't need it any more
    ld a, [de]
    inc de

    ; Copy into current chars
    ; Part 1: Copy the left part into the current chars
    push af                     ; stash width
    ; A glyph is two chars or 32 bytes, so row_copy 32 times
    ld c, 32
    ; b is the next x position we're free to write to.
    ; Incrementing it here makes the inner loop simpler, since
    ; it can't be zero.  But it also means two jumps per loop,
    ; so, ultimately this was a pretty silly idea.
    inc b
.row_copy:
    ld a, [de]                  ; read next row of character

    ; Shift right by b places with an inner loop
    push bc                     ; preserve b while shifting
    dec b
.shift:                         ; shift right by b bits
    jr z, .done_shift
    srl a
    dec b
    jr .shift
.done_shift:
    pop bc

    ; Write the updated byte to VRAM
    or a, [hl]                  ; OR with current tile
    ld [hl+], a
    inc de
    dec c
    jr nz, .row_copy
    pop af                      ; restore width

    ; Part 2: Copy whatever's left into the next char
    ; TODO  :)

    ; Cleanup for next iteration
    ; Undo the b increment from way above
    dec b
    ; It's possible I overflowed into the next column, in which
    ; case I want to leave hl where it is: pointing at the next
    ; column.  Otherwise, I need to back it up to where it was.
    ; Of course, I also need to update b, the x offset.
    add a, b                    ; a <- new x offset
    ; If the new x offset is 8 or more, that's actually the next
    ; column
    cp a, 8
    jr nc, .wrap_to_next_tile
    ld bc, -32                  ; a < 8: back hl up
    add hl, bc
    jr .done_wrap
.wrap_to_next_tile:
    sub a, 8                    ; a >= 8: subtract tile width
    ld b, a
.done_wrap:
    ; Either way, store the new x offset into b
    ld b, a

    ; And loop!
    pop de                      ; pop text pointer
    jr .next_letter

.done:
    ; Undo any goofy stuff I did, and get outta here
    EnableLCD
    ; Remember to reset bank to 0!
    xor a
    ldh [rVBK], a
    ret

Phew! That was a lot, but hopefully it wasn’t too bad. I hit a few minor stumbling blocks, but as I recall, most of them were of the “I get the conditions backwards every single time I use cp augh” flavor. (In fact, if you look at the actual commit the above is based on, you may notice that I had the condition at the very end mixed up! It’s a miracle it managed to print part of the second letter at all.)

There are a lot of caveats in this first pass, including that there’s nothing to erase the dialogue box and reshow the map underneath it. (But I might end up using the window for this anyway, so there’s no need for that.)

As a proof of concept, though, it’s a great start!

Screenshot of Anise, with a black dialogue box that says: A|

That’s the letter A, followed by the first two pixels of the letter B. I didn’t implement the part where letters spill into the next column yet.

Guess I’d better do that!

Second pass

One of the big problems with the first pass was that I had to turn the screen off to do the actual work safely. Shifting a bunch of bytes by some amount is a little slow, since I can only shift one bit at a time and have to do it within a loop, and vblank only lasts for about 6.5% of the entire duration of the frame.
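For the curious, the 6.5% figure falls out of the scanline counts. A frame is 154 scanlines, of which 144 are visible and 10 are vblank, and each scanline takes 456 dots. A tiny Python sketch of the arithmetic, with names invented for the occasion:

```python
VISIBLE_LINES = 144  # scanlines actually drawn
VBLANK_LINES = 10    # scanlines spent in vblank
DOTS_PER_LINE = 456  # dots (clock ticks) per scanline


def vblank_fraction():
    """Fraction of each frame spent in vblank."""
    return VBLANK_LINES / (VISIBLE_LINES + VBLANK_LINES)


def vblank_dots():
    """Number of dots available during one vblank."""
    return VBLANK_LINES * DOTS_PER_LINE
```

That works out to 4560 dots out of 70224 per frame, which is not a lot of time to shift 32 bytes one bit at a time.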

SECTION "Text buffer", WRAM0[$C200]
text_buffer:
    ; Text is up to 8×16 but may span two columns, so carve out
    ; enough space for four tiles
    ds $40

SECTION "Text rendering", ROM0
PALETTE_TEXT:
    dcolor $000000
    dcolor $ffffff
    dcolor $999999
    dcolor $666666

show_dialogue:
    ; TODO blank out the second half of bank 1 before all this,
    ; maybe on the fly to average out the cpu time

    ; TODO get rid of this with a slide-up effect
    DisableLCD

    ; Set up palette
    ld a, %10111000
    ld [rBCPS], a
    ld hl, PALETTE_TEXT
    REPT 8
    ld a, [hl+]
    ld [rBCPD], a
    ENDR

    ; Fill text rows with tiles (blank border, custom tiles)
    ld hl, $9800 + 32 * 14
    ; Top row, all tile 255
    ld a, 255
    ld c, 32
.loop1:
    ld [hl+], a
    dec c
    jr nz, .loop1

    ; Text row 1: 255 on the edges, then middle goes 128, 130, ...
    ld a, 255
    ld [hl+], a
    ld a, 128
    ld c, 30
.loop2:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop2
    ld a, 255
    ld [hl+], a

    ; Text row 2: same as above, but middle is 129, 131, ...
    ld a, 255
    ld [hl+], a
    ld a, 129
    ld c, 30
.loop3:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop3
    ld a, 255
    ld [hl+], a

    ; Bottom row, all tile 255
    ld a, 255
    ld c, 32
.loop4:
    ld [hl+], a
    dec c
    jr nz, .loop4

    ; Repeat all of the above, but in bank 1, which specifies the
    ; character bank and palette.  Luckily, that's the same for
    ; everyone.
    ld a, 1
    ldh [rVBK], a
    ld a, %00001111  ; bank 1, palette 7
    ld hl, $9800 + 32 * 14
    ld c, 32 * 4  ; 4 rows
.loop5:
    ld [hl+], a
    dec c
    jr nz, .loop5

    EnableLCD

    ; Zero out the tile buffer
    xor a
    ld hl, text_buffer
    ld c, $40
    call fill

    ; ----------------------------------------------------------
    ; Setup done!  Real work begins here
    ; b: x-offset within current tile
    ; de: text cursor + current character tiles
    ; hl: current VRAM tile being drawn into + buffer pointer
    ld b, 0
    ld de, text
    ld hl, $8800

    ; The basic problem here is to shift a byte and split it
    ; across two other bytes, like so:
    ;      yyyyy YYY
    ;   xxx00000 00000000
    ;           ↓
    ;   xxxyyyyy YYY00000
    ; To do this, we rotate the byte, mask the low bits, OR them
    ; with the first byte, restore it, mask the high bits, and
    ; then store that directly as the second byte (which should
    ; be all zeroes anyway).

.next_letter:
    ld a, [de]                  ; get current character
    and a                       ; if NUL, we're done!
    jp z, .done
    inc de                      ; otherwise, increment

    ; Get the font character
    push de                     ; from here, de is tiles
    ; Alas, I can only add to hl, so I need to compute the font
    ; character address in hl and /then/ put it in de.  But I
    ; already pushed de, so I can use that as scratch space.
    push hl
    sub a, 65   ; TODO temporary
    ld hl, font
    ld de, 33                   ; 1 width byte + 16 * 2 tiles
    ; TODO can we speed striding up with long mult?
    and a
.letter_stride:
    jr z, .skip_letter_stride
    add hl, de
    dec a
    jr .letter_stride
.skip_letter_stride:
    ld d, h                     ; move char tile addr to de
    ld e, l

    ld a, [de]                  ; read width
    inc de

    ; Copy into current tiles
    push af                     ; stash width
    ld c, 32                    ; 32 bytes per glyph (two chars)
    ld hl, text_buffer
    ; FIXME? this makes the loop simpler since i only test after
    ; the dec, but it also is the 1px kerning between characters...
    inc b
.row_copy:
    ld a, [de]                  ; read next row of character
    ; Rotate right by b - 1 pixels
    push bc                     ; preserve b while shifting
    ld c, $ff                   ; create a mask
    dec b
    jr z, .skip_rotate
.rotate:
    rrca
    srl c
    dec b
    jr nz, .rotate
.skip_rotate:
    push af
    and a, c                    ; mask right pixels
    ; Draw to left half of text buffer
    or a, [hl]                  ; OR with current tile
    ld [hl+], a
    ; Write the remaining bits to right half
    ld a, c                     ; put mask in a...
    cpl                         ; ...to invert it
    ld c, a                     ; then put it back
    pop af                      ; restore unmasked pixels
    and a, c                    ; mask left pixels
    ld [hl+], a                 ; and store them!
    ; Loop and cleanup
    inc de                      ; next row of character
    pop bc                      ; restore counter!
    dec c
    jr nz, .row_copy
    pop af                      ; restore width

    ; Draw the buffered tiles to vram
    ; The text buffer is treated like it's 16 pixels wide, but
    ; VRAM is of course only 8 pixels wide, so we need to do
    ; this in two iterations: the left two tiles, then the right
    ; TODO explain this with a fucking diagram because i feel
    ; like i'm wrong about it anyway
    pop hl                      ; restore hl (VRAM)
    push af                     ; stash width, again
    call wait_for_vblank        ; always wait before drawing
    push bc
    push de
    ; Draw the left two tiles
    ld c, $20
    ld de, text_buffer
.draw_left:
    ld a, [de]
    inc de
    inc de
    ld [hl+], a
    dec c
    jr nz, .draw_left
    ; Draw the right two tiles
    ld c, $20
    ld de, text_buffer + 1
.draw_right:
    ld a, [de]
    inc de
    inc de
    ld [hl+], a
    dec c
    jr nz, .draw_right
    pop de
    pop bc
    pop af                      ; restore width, again

    ; Increment the pixel offset and deal with overflow
    ; TODO it's possible we're at 9 pixels wide, thanks to the
    ; kerning pixel, uh oh.  but that pixel would be empty,
    ; right?  wait, no, it comes /before/...  well fuck
    ; TODO actually that might make something weird happen due
    ; to the inc b above, maybe...?
    add a, b                    ; a <- new x offset
    ld bc, -32                  ; move the VRAM pointer back...
    add hl, bc                  ; ...to the start of the tile
    cp a, 8
    jr nc, .wrap_to_next_tile
    ; The new offset is less than 8, so this character didn't
    ; draw into the next tile.  Move the VRAM pointer back
    ; another two tiles, to the column we started in
    add hl, bc
    jr .done_wrap
.wrap_to_next_tile:
    ; The new offset is 8 or more, so this character drew into
    ; the next tile.  Subtract 8, but also shift the text buffer
    ; by copying all the "right" tiles over the "left" tiles
    sub a, 8                    ; a >= 8: subtract tile width
    push hl
    push af
    ld hl, text_buffer + $40 - 1
    ld c, $20
.shift_buffer:
    ld a, [hl-]
    ld [hl-], a
    dec c
    jr nz, .shift_buffer
    pop af
    pop hl
.done_wrap:
    ld b, a                     ; either way, store into b

    ; Loop
    pop de                      ; pop text pointer
    jp .next_letter

.done:
    EnableLCD                   ; TODO get rid of me with a buffer
    ; Remember to reset bank to 0!
    xor a
    ldh [rVBK], a
    ret

wait_for_vblank:
    xor a                       ; clear the vblank flag
    ld [vblank_flag], a
.vblank_loop:
    halt                        ; wait for interrupt
    ld a, [vblank_flag]         ; was it a vblank interrupt?
    and a
    jr z, .vblank_loop          ; if not, keep waiting
    ret
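The rotate-and-mask trick described in the comments above is easier to poke at outside of assembly. Here is a rough Python model of what happens to a single byte; `rotate_right` and `split_row` are hypothetical names, and this only mirrors the idea, not the register juggling.

```python
def rotate_right(byte, count):
    """Rotate an 8-bit value right, like repeated rrca."""
    count %= 8
    return ((byte >> count) | (byte << (8 - count))) & 0xFF


def split_row(byte, x):
    """Split one glyph row drawn x pixels into the left tile.

    Returns (left_bits, right_bits): the part that is ORed into
    the left tile and the part stored in the right tile.
    """
    rotated = rotate_right(byte, x)
    mask = 0xFF >> x  # mirrors the repeated `srl c` on the mask
    left_bits = rotated & mask
    right_bits = rotated & (~mask & 0xFF)
    return left_bits, right_bits
```

Because the byte is rotated rather than shifted, the pixels that fall off the right edge reappear at the top of the byte, which is exactly where they need to be for the next tile.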

  • future ideas: how will this work with a status bar, how do i do portraits, how do i hide sprites behind this, how do i handle the map not being aligned (contrast with pokemon which draws the entire menu on the background)

Lingering problems:

  • note on word wrapping
  • alignment, window
  • prompts will probably have to go inside the text box? Hmm. That’s tricky.
  • portraits!

content/2016-10-20-word-wrapping-dialogue.markdown

  • the dialogue box does not actually go away, but I think the window will solve this

To be continued

This work doesn’t correspond to a commit at all; it exists only as a local stash. I’ll clean it up later, once I figure out what to actually do with it.

Next time: dialogue! With moderately less suffering along the way!

Cheezball Rising: Resounding failure

Post Syndicated from Eevee original https://eev.ee/blog/2018/09/06/cheezball-rising-resounding-failure/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I cannot get a goddamn Game Boy to meow at me.

Previously: maps and sprites.
Next: text!

Recap

With the power of Aseprite, Tiled, and some Python I slopped together, the game has evolved beyond Test Art and into Regular Art.

Star Anise walking around a moon environment in-game, animated in all four directions

I’ve got so much work to do on this, so it’s time to prioritize. What is absolutely crucial to this game?

The answer, of course, is to make Anise meow. Specifically, to make him AOOOWR.

Brief audio primer

What we perceive as sound is the vibration of our eardrums, caused by vibration of the air against them. Eardrums can only move along a single axis (in or out), so no matter what chaotic things the air is doing, what we hear at a given instant is flattened down to a single scalar number: how far the eardrum has displaced from its normal position.

(There’s also a bunch of stuff about tiny hairs in the back of your ear, but, close enough. Also it’s really two numbers since you have two ears, but stereo channels tend to be handled separately.)

Digital audio is nothing more than a sequence of those numbers. Of course, we can’t record the displacement at every single instant, because there are infinitely many instants; instead, we take measurements (samples) at regular intervals. The interval is usually a very small fraction of a second; the number of samples taken per second is called the sample rate, and is generally measured in hertz (Hz), which just means “per second”. A very common sample rate is 44100 Hz, which means a measurement was taken every 0.0000227 seconds.

I say “measurement” but the same idea applies for generating sounds, which is what the Game Boy does. Want to make a square wave? Just generate a block of all the same positive sample, then another block of all the same negative sample, and alternate back and forth. That’s why it’s depicted as a square — that’s the graph of how the samples vary over time.
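The "blocks of samples" idea is simple enough to sketch in Python. This is just an illustration of generating a square wave as a list of numbers; `square_wave` and its parameters are made up here, and it has nothing to do with how the Game Boy hardware does it internally.

```python
def square_wave(frequency, sample_rate, count, amplitude=1.0):
    """Generate `count` samples of a square wave.

    The first half of each period is +amplitude and the second
    half is -amplitude.
    """
    period = sample_rate / frequency  # samples per full cycle
    samples = []
    for i in range(count):
        if (i % period) < period / 2:
            samples.append(amplitude)
        else:
            samples.append(-amplitude)
    return samples
```

At 100 Hz with a sample rate of 800 Hz, one full cycle is eight samples: four high, then four low.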

Okay! I hope that was enough because it’s like 80% of everything I know about audio. Let’s get to the Game Boy.

Game Boy audio

The Game Boy contains, within its mysterious depths, a teeny tiny synthesizer. It offers a vast array of four whole channels (instruments) to choose from: a square wave, also a square wave, a wavetable, and white noise. They can each be controlled with a handful of registers, and will continually produce whatever tone they’re configured for. By changing their parameters at regular intervals, you can create a pleasing sequence of varying tones, which you humans call “music”.

Making music is, I’m sure, going to be an absolute nightmare. What music authoring tools am I possibly going to dig up that exactly conform to the Game Boy hardware? I can’t even begin to imagine what this pipeline might look like.

Luckily, that’s not what this post is about, because I chickened out and tried something way easier instead.

Before I set out into the wilderness myself, I did want to get an emulator to create any kind of noise at all, just to give myself a starting point. There are an awful lot of audio twiddles, so I dug up a Game Boy sound tutorial.

I became a little skeptical when the author admitted they didn’t know what a square wave was, but they did provide a brief snippet of code at the end that’s claimed to produce a sound:

NR52_REG = 0x80;
NR51_REG = 0x11;
NR50_REG = 0x77;

NR10_REG = 0x1E;
NR11_REG = 0x10;
NR12_REG = 0xF3;
NR13_REG = 0x00;
NR14_REG = 0x87;

That’s C, written for the much-maligned GBDK, which for some reason uses regular assignment to write to a specific address? It’s easy enough to translate to rgbasm:

    ; Enable sound globally
    ld a, $80
    ldh [rAUDENA], a
    ; Enable channel 1 in stereo
    ld a, $11
    ldh [rAUDTERM], a
    ; Set volume
    ld a, $77
    ldh [rAUDVOL], a

    ; Configure channel 1.  See below
    ld a, $1e
    ldh [rAUD1SWEEP], a
    ld a, $10
    ldh [rAUD1LEN], a
    ld a, $f3
    ldh [rAUD1ENV], a
    ld a, $00
    ldh [rAUD1LOW], a
    ld a, $85
    ldh [rAUD1HIGH], a

It sounds like this.

Some explanation may be in order. This is a big ol’ mess and you could just as well read the wiki’s article on the sound controller, so feel free to skip ahead a bit.

First, the official names for all of the sound registers are terrible. They’re all named “NRxy” — “noise register” perhaps? — where x is the channel number (or 5 for master settings) and y is just whatever. Thankfully, hardware.inc provides some aliases that make a little more sense, and those are what I’ve used above.

The very first thing I have to do is set the high bit of AUDENA (NR52), which toggles sound on or off entirely. The sound system isn’t like the LCD, which I might turn off temporarily while doing a lot of graphics loading; when the high bit of AUDENA is off, all the other sound registers are wiped to zero and cannot be written until sound is enabled again.

The other important master registers are AUDVOL (NR50) and AUDTERM (NR51). Both of them are split into two identical nybbles, each controlling the left or right output channel. AUDVOL controls the master volume, from 0 to 7. (As I understand it, the high bit is used to enable audio output from extra synthesizer hardware on the cartridge, a feature I don’t believe any game ever actually used.) AUDTERM enables channels/instruments, one bit per channel. The above code turns on channel 1, the square wave, at max volume in stereo.

Then there’s just, you know, sound stuff.

AUD1HIGH (NR14) and AUD1LOW (NR13) are a bit of a clusterfuck, and one shared by all except the white noise channel. The high bit of AUD1HIGH is the “init” bit and triggers the sound to actually play (or restart), which is why it’s set last. The second highest bit, bit 6, controls timing: if it’s set, then the channel will only play for as long as a time given by AUD1LEN; if not, the channel will play indefinitely.

Finally, the interesting part: the lower three bits of AUD1HIGH and the entirety of AUD1LOW combine to make an 11-bit frequency. Or, rather, if those 11 bits are \(n\), then the frequency is \(\frac{131072}{2048-n}\). (Since their value appears in the denominator, they really express… inverse time, not frequency, but that’s neither here nor there.) The code above sets that 11-bit value to $500, for a frequency of 171 Hz, which in A440 is about an F3.
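Here is that formula as a small Python sketch, taking the raw values written to AUD1LOW and AUD1HIGH. The function name `channel_frequency` is my own invention.

```python
def channel_frequency(low, high):
    """Frequency in Hz from the AUD1LOW and AUD1HIGH register values."""
    # Low three bits of the high register + all eight of the low one
    n = ((high & 0x07) << 8) | (low & 0xFF)
    return 131072 / (2048 - n)
```

Feeding it the values from the assembly above, `channel_frequency(0x00, 0x85)`, gives roughly 170.7 Hz.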

AUD1SWEEP (NR10) can automatically slide the frequency over time. It distinguishes channel 1 from channel 2, which is otherwise identical but doesn’t have sweep functionality. The lower three bits are the magnitude of each change; bit 3 is a sign bit (0 for up, 1 for down), and bits 6–4 are a time that controls how often the frequency changes. (Setting the time to zero disables the sweep.) Given a magnitude of \(n\) and time \(t\), every \(\frac{t}{128}\) seconds, the frequency is multiplied by \(1 ± \frac{1}{2^n}\).

Note that when I say “frequency” here, I’m referring to the 11-bit “frequency” value, not the actual frequency in Hz. A “frequency” of $400 corresponds to 128 Hz, but halving it to $200 produces 85 Hz, a decrease of about a third. Doubling it is impossible, because $800 doesn’t fit in 11 bits. This setup seems, ah, interesting to make music with. Can’t wait!

The above code sets this register to $1e, so \(t = 1\), \(n = 6\), and the frequency is decreasing; thus every \(\frac{1}{128}\) seconds, the “frequency” drops by \(\frac{1}{64}\).
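A rough Python sketch of decoding the sweep register and applying one step follows. The names are hypothetical, and I'm assuming the change is computed with an integer right shift (so it rounds down) and ignoring what happens if the result overflows 11 bits.

```python
def decode_sweep(value):
    """Split AUD1SWEEP into (time, decreasing, magnitude)."""
    time = (value >> 4) & 0x07
    decreasing = bool(value & 0x08)
    magnitude = value & 0x07
    return time, decreasing, magnitude


def sweep_step(n, decreasing, magnitude):
    """Apply one sweep step to the 11-bit "frequency" value n."""
    delta = n >> magnitude  # assumption: rounds down like a shift
    return n - delta if decreasing else n + delta
```

Starting from $500 (1280) with a magnitude of 6, one downward step subtracts 20, leaving 1260.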

Next is AUD1LEN (NR11), so named because its lower six bits set how long the sound will play. Again we have inverse time: given a value \(t\) in the low six bits, the sound will play for \(\frac{64-t}{256}\) seconds. Here those six bits are $10, or 16, so the sound lasts for \(\frac{48}{256} = \frac{3}{16} = 0.1875\) seconds. Except… as mentioned above, this only applies if bit 6 of AUD1HIGH is set, which it isn’t, so this doesn’t apply at all and there’s no point in setting any of these bits. Hm.

The two high bits of AUD1LEN select the duty cycle, which is how long the square wave is high versus low. (A “normal” square wave thus has a duty of 50%.) Our value of 0 selects 12.5% high; the other values are 25% for 1, 50% for 2, or 75% for 3. I do wonder if the author of this code meant to use 50% duty and put the bit in the wrong place? If so, AUD1LEN should be $80, not $10.
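Both halves of AUD1LEN can be decoded with a few lines of Python. This is a sketch using the duty values listed above; `decode_length` and `DUTY` are names I made up.

```python
# Duty cycle selected by the two high bits, as fractions
DUTY = {0: 0.125, 1: 0.25, 2: 0.5, 3: 0.75}


def decode_length(value):
    """Return (seconds, duty) for an AUD1LEN register value.

    The length only matters when bit 6 of AUD1HIGH is set.
    """
    t = value & 0x3F
    seconds = (64 - t) / 256
    duty = DUTY[(value >> 6) & 0x03]
    return seconds, duty
```

So `decode_length(0x10)` is 0.1875 seconds at 12.5% duty, and the suspected intended value, `decode_length(0x80)`, is 0.25 seconds at 50% duty.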

Finally, AUD1ENV selects the volume envelope, which can increase or decrease over time. Curiously, the resolution is higher here than in AUDVOL — the entire high nybble is the value of the envelope. This value can be changed automatically over time in increments of 1: bit 3 controls the direction (0 to decrease, 1 to increase) and the low three bits control how often the value changes, counted in \(\frac{1}{64}\) seconds. For our value of $f3, the volume starts out at max and decreases every \(\frac{3}{64}\) seconds, so it’ll stop completely (or at least be muted?) after fifteen steps or \(\frac{45}{64} ≈ 0.7\) seconds.
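The envelope arithmetic can be checked with a short Python sketch. The helper names are hypothetical, and this only covers the simple case of a decreasing envelope running all the way down to zero.

```python
def decode_envelope(value):
    """Split AUD1ENV into (volume, increasing, period)."""
    volume = value >> 4
    increasing = bool(value & 0x08)
    period = value & 0x07  # counted in 1/64 seconds
    return volume, increasing, period


def fade_out_seconds(value):
    """Seconds until a decreasing envelope reaches zero, else None."""
    volume, increasing, period = decode_envelope(value)
    if increasing or period == 0:
        return None  # the volume never fades to zero on its own
    return volume * period / 64
```

For $f3 that is fifteen steps of 3/64 seconds each, or 0.703125 seconds.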

And hey, that’s all more or less what I see if I record mGBA’s output in Audacity!

Waveform of the above sound

Boy! What a horrible slog. Don’t worry; that’s a good 75% of everything there is to know about the sound registers. The second square wave is exactly the same except it can’t do a frequency sweep. The white noise channel is similar, except that instead of frequency, it has a few knobs for controlling how the noise is generated. And the waveform channel is what the rest of this post is about—

“Hang on!” I hear you cry. “That’s a mighty funny-looking ‘square’ wave.”

It sure is! The Game Boy has some mighty funny sound hardware. Don’t worry about it. I don’t have any explanation, anyway. I know the weird slope shapes are due to a high-pass filter capacitor that constantly degrades the signal gradually towards silence, but I don’t know why the waveform isn’t centered at zero. (Note that mGBA has a bug and currently generates audio inverted, which is hard to notice audibly but which means the above graph is upside-down.)

The thing I actually wanted to do

Right, back to the thing I actually wanted to do.

I have a sound. I want to play it on a Game Boy. I know this is possible, because Pokémon Yellow does it.

Channel 3 is a wavetable channel, which means I can define a completely arbitrary waveform (read: sound) and channel 3 will play it for me. The correct approach seems obvious: slice the sound into small chunks and ask channel 3 to play them in sequence.

How hard could this possibly be?

Channel 3

Channel 3 plays a waveform from waveform RAM, which is a block of 16 bytes in register space, from $FF30 through $FF3F. Each nybble is one sample, so I have 32 samples whose values can range from 0 to 15.
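
To make the layout concrete, here is a rough Python sketch that packs 32 four-bit samples into the 16 bytes of wave RAM. The helper name is hypothetical, and putting the earlier sample in the high nybble follows the ordering the resampler's comment later attributes to the manual.

```python
def pack_wave_ram(samples):
    """Pack 32 four-bit samples into 16 bytes, two samples per byte."""
    if len(samples) != 32:
        raise ValueError("wave RAM holds exactly 32 samples")
    if any(not 0 <= s <= 15 for s in samples):
        raise ValueError("each sample must fit in a nybble (0-15)")
    # Assumption: the high nybble of each byte is the sample that plays first
    return bytes((samples[i] << 4) | samples[i + 1] for i in range(0, 32, 2))

ramp = [i // 2 for i in range(32)]   # 0, 0, 1, 1, ... 15, 15
packed = pack_wave_ram(ramp)
print(len(packed), packed.hex())     # 16 00112233445566778899aabbccddeeff
```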

32 samples is not a whole lot; remember, a common audio rate is 44100 Hz. To keep that up, I’d need to fill the buffer almost 1400 times per second. I can use a lower sample rate, but what? I guess I’ll figure that out later.

First things first: I need to take my sound and cram it into this format, somehow. Here’s the sound I’m starting with.

The original recording was a bit quiet, so I popped it open in Audacity and stretched it to max volume. I only have 4-bit samples, remember, and trying to cram a quiet sound into such a low bit depth will lose most of the detail.

(A very weird thing about sound is that samples are really just measurements of volume. Every feature of sound is nothing more than a change in volume.)

Now I need to turn this into a sequence of nybbles. From previous adventures, I know that Python has a handy wave module for reading sample data directly from a WAV file, and so I wrote a crappy resampler:

import wave

TARGET_RATE = 32768

with wave.open('aowr.wav') as w:
    nchannels, sample_width, framerate, nframes, _, _ = w.getparams()
    outdata = bytearray()
    gbdata = bytearray()

    frames_per_note = framerate // TARGET_RATE
    nybble = None
    while True:
        data = w.readframes(frames_per_note)
        if not data:
            break

        n = 0
        total = 0
        # Left and right channels are interleaved; this will pick up data from only channel 0
        for i in range(0, len(data), nchannels * sample_width):
            frame = int.from_bytes(data[i : i + sample_width], 'little', signed=True)
            n += 1
            total += frame

        # Crush the new sample to a nybble
        crushed_frame = int(total / n) >> (sample_width * 8 - 4)
        # Expand it back to the full sample size, to make a WAV simulating how it should sound
        encoded_crushed_frame = (crushed_frame << (sample_width * 8 - 4)).to_bytes(sample_width, 'little', signed=True)
        outdata.extend(encoded_crushed_frame * (nchannels * frames_per_note))

        # Combine every two nybbles together.  The manual shows that the high nybble plays first.
        # WAV data is signed (-8 to 7 after crushing), but Game Boy nybbles are
        # unsigned (0 to 15), so add 8 to shift the range
        if nybble is None:
            nybble = crushed_frame + 8
        else:
            byte = (nybble << 4) | (crushed_frame + 8)
            gbdata.append(byte)
            nybble = None

    with wave.open('aowrcrush.wav', 'wb') as wout:
        wout.setparams(w.getparams())
        wout.writeframes(outdata)

with open('build/aowr.dat', 'wb') as f:
    f.write(gbdata)

This is incredibly bad. It integer-divides the original rate by the target rate, so if I try to resample 44100 to 32768, I’ll end up recreating the same sound again.

I don’t know why I started with 32768, either. The resulting data is too big to even fit in a section! Kicking it down to 8192 is a bit better (5 samples to 1, so the real final rate is 8820), but if I get any smaller, too many samples cancel each other out and I end up with silence! I have no idea what I am doing help.
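
For what it's worth, the integer-division problem can be sidestepped by letting the chunk sizes vary. The following Python sketch is only one possible approach, not what the script above does: `effective_rate` reproduces the rate the script really ends up with, and `downsample` (a made-up helper) averages chunks whose boundaries follow the exact ratio between the two rates.

```python
def effective_rate(source_rate, target_rate):
    """Rate actually produced when the chunk size is source // target."""
    frames_per_note = source_rate // target_rate
    return source_rate / frames_per_note

def downsample(samples, source_rate, target_rate):
    """Average input samples into output samples at exactly target_rate.

    Output sample k covers input indices from k * source // target up to
    (k + 1) * source // target, so chunk sizes vary to absorb the remainder.
    """
    if target_rate > source_rate:
        raise ValueError("this sketch only handles downsampling")
    out = []
    k = 0
    while True:
        start = k * source_rate // target_rate
        end = (k + 1) * source_rate // target_rate
        if start >= len(samples):
            break
        chunk = samples[start:end]
        out.append(sum(chunk) // len(chunk))
        k += 1
    return out

print(effective_rate(44100, 32768))   # 44100.0
print(effective_rate(44100, 8192))    # 8820.0
print(len(downsample(list(range(44100)), 44100, 8192)))  # 8192
```

Plain averaging is a very crude filter, so this is still far from a proper resampler; it only fixes the rate mismatch.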

The aowrcrush.wav file sounds a little atrocious, fair warning.

But it seems to be correct, if I open it alongside the original:

Waveforms of the original sound and its bitcrushed form; the latter is very blocky

Crushing it to four bits caused the graph to stay fixed to only 16 possible values, which is why it’s less smooth. Reducing the sample rate made each sample last longer, which is why it’s made up of short horizontal chunks. (I resampled it back to 44100 for this comparison, so really it’s made of short horizontal chunks because each sample appears five times; Audacity wouldn’t show an actual 8192 Hz file like this.)

It doesn’t sound great, but maybe it’ll be softened when played through a Game Boy. Worst case, I can try cleaning it up later. Let’s get to the good part: playing it!

Playing with channel 3

Here we go! First the global setup stuff I had before.

    ; Enable sound globally
    ld a, $80
    ldh [rAUDENA], a
    ; Map instruments to channels
    ld a, $44
    ldh [rAUDTERM], a
    ; Set volume
    ld a, $77
    ldh [rAUDVOL], a

Then some bits specific to channel 3.

    ld a, $80
    ldh [rAUD3ENA], a
    ld a, $ff
    ldh [rAUD3LEN], a
    ld a, $20
    ldh [rAUD3LEVEL], a
SAMPLE_RATE EQU 8192
CH3_FREQUENCY set 2048 - 65536/(SAMPLE_RATE / 32)
    ld a, LOW(CH3_FREQUENCY)
    ldh [rAUD3LOW], a
    ld a, $80 | HIGH(CH3_FREQUENCY)
    ldh [rAUD3HIGH], a

Channel 3 has its own bit for toggling it on or off in AUD3ENA (NR30); none of the other bits are used. The other new register is AUD3LEVEL (NR32), which is sort of a global volume control. The only bits used are 6 and 5, which make a two-bit selector. The options are:

  • 00: mute
  • 01: play nybbles as given
  • 10: play nybbles shifted right 1
  • 11: play nybbles shifted right 2

Three of those are obviously useless, so 01 it is! That’s where I get the $20.
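
The selector is easy to model. This Python sketch (with hypothetical function names) applies each of the four options to a nybble and shows where the $20 comes from.

```python
def apply_output_level(nybble, selector):
    """Model the two-bit AUD3LEVEL selector from the list above."""
    if selector == 0b00:
        return 0                      # 00: mute
    return nybble >> (selector - 1)   # 01: as given, 10: >> 1, 11: >> 2

def level_register(selector):
    """Place the selector in bits 6 and 5 of the register byte."""
    return (selector & 0b11) << 5

print([apply_output_level(15, s) for s in range(4)])  # [0, 15, 7, 3]
print(hex(level_register(0b01)))                      # 0x20
```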

Figuring out the frequency is a little more clumsy. I used some rgbasm features here to do it for me, and it took a bit of fiddling to get it right. For example, why am I using 65536 instead of 131072, the factor I said was used for the square wave?

The answer is that for the longest time I kept getting this absolutely horrible output, recorded directly from mGBA:

I had no idea what this was supposed to be. Turns out it’s, well, roughly what happens when you halve the Game Boy’s idea of frequency. I finally found out from the gbdev wiki that this coefficient is different for channel 3. I’m guessing the factor of 2 has something to do with there being two nybbles per byte?

Then there’s the division by 32, which neither the manual nor the gbdev wiki mentions. The frequency doesn’t describe how often one sample plays, but how often the entire buffer plays. Which does make some sense: the “normal” use for channel 3 is as a custom instrument, so you’d want to apply the frequency to the entire waveform to get the right notes out. This was even more of a nightmare to figure out, since it produced… well, mostly just garbage. I’ll leave it to your imagination.
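
Here is the same calculation as the CH3_FREQUENCY expression, redone in Python as a quick check. The function name is invented for this sketch; the constants come straight from the assembly above.

```python
def ch3_frequency(sample_rate, samples_per_buffer=32):
    """Value for channel 3's 11-bit frequency register."""
    buffers_per_second = sample_rate // samples_per_buffer
    return 2048 - 65536 // buffers_per_second

freq = ch3_frequency(8192)
low, high = freq & 0xFF, freq >> 8
print(freq, hex(low), hex(high))   # 1792 0x0 0x7
print(hex(0x80 | high))            # 0x87, the byte written to AUD3HIGH
```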

    ld a, 256 - 4096 / (SAMPLE_RATE / 32)
    ldh [rTMA], a
    ld a, 4
    ldh [rTAC], a

Oho! TMA and TAC are new.

The CPU has a timer register, TIMA, which counts up every… well, every so often. It’s only a single byte, and when it overflows, it generates a timer interrupt. It then resets to the value of TMA.

TAC is the timer controller. Bit 2 enables the timer, and the lower two bits select how fast the clock counts up.

Above, I’m using clock speed 00, which is 4096 Hz. The expression for TMA computes SAMPLE_RATE / 32, which is the number of times per second that the entire waveform should play, and then divides that into 4096 to get the number of timer ticks that the waveform plays for. Subtract that from 256, and I have the value TIMA should start with to ensure that it overflows at the right intervals.

I note that this will cause a timer interrupt 256 times per second, which sounds like a lot on a CPU-constrained system. It’s only 4 or 5 interrupts per frame, though, so maybe it won’t intrude too much. I’ll burn down that bridge when I come to it.
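
The timer numbers can be checked the same way. In this Python sketch the function name is hypothetical, and the frame rate of roughly 59.7 Hz is an approximation used only to estimate interrupts per frame.

```python
def timer_modulo(sample_rate, timer_hz=4096, samples_per_buffer=32):
    """TMA value so that TIMA overflows once per full waveform."""
    buffers_per_second = sample_rate // samples_per_buffer
    ticks_per_buffer = timer_hz // buffers_per_second
    return 256 - ticks_per_buffer

tma = timer_modulo(8192)
interrupts_per_second = 4096 / (256 - tma)
print(tma, interrupts_per_second)               # 240 256.0
print(round(interrupts_per_second / 59.7, 2))   # 4.29
```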

Now I just need to enable timer interrupts:

start:
    ; Enable interrupts
    ld a, IEF_TIMER | IEF_VBLANK
    ldh [rIE], a

And of course do a call in the timer interrupt, which you may remember is a fixed place in the header:

SECTION "Timer overflow interrupt", ROM0[$0050]
    call update_aowr
    reti

One last gotcha: I discovered that timer interrupts can fire during OAM DMA, a time when most of the memory map is inaccessible. That’s pretty bad! So I also added di and ei around my DMA call.

Okay! I’m so close! All that’s left is the implementation of update_aowr.

Updating the waveform

aowr:
INCBIN "build/aowr.dat"
aowr_end:

; ...

update_aowr:
    push hl
    push bc
    push de
    push af

    ; The current play position is stored in music_offset, a
    ; word in RAM somewhere.  Load its value into de
    ld hl, music_offset
    ld d, [hl]
    inc hl
    ld e, [hl]

    ; Compare this to aowr_end.  If it's >=, we've reached the
    ; end of the sound, so stop here.  (Note that the timer
    ; interrupt will keep firing!  This code is a first pass.)
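    ld hl, aowr_end
    ld a, d
    cp a, h
    jr c, .continue     ; high byte is lower, so not at the end yet
    jr nz, .done        ; high byte is higher, so past the end
    ld a, e
    cp a, l
    jr nc, .done        ; high bytes equal and low byte >=, so stop
.continue: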

    ; Copy the play position back into hl, and copy 16 bytes
    ; into waveform RAM.  This unrolled loop is as quick as
    ; possible, to keep the gap between chunks short.
    ld h, d
    ld l, e
_addr = _AUD3WAVERAM
    REPT 16
    ld a, [hl+]
    ldh [_addr], a
_addr = _addr + 1
    ENDR

    ; Write the new play position into music_offset
    ld d, h
    ld e, l
    ld hl, music_offset
    ld [hl], d
    inc hl
    ld [hl], e
.done:
    pop af
    pop de
    pop bc
    pop hl
    ret

Perfect! Let’s give it a try.

Hey, that’s not too bad! I can see wiring that up to a button and pressing it relentlessly. It’s a bit rough, but it’s not bad for this first attempt.

That was mGBA, though, and I’ve had surprising problems before because I was reading or writing when the actual hardware wouldn’t let me. I guess it wouldn’t hurt to try in bgb. (warning: very bad)

OH NO

What has happened.

Tragedy

A lot of fussing around, reading about obscure trivia, and being directed to SamplePlayer taught me a valuable lesson: you cannot write to waveform RAM while the wave channel is playing.

Okay. No problem. I’ll just turn it off, write to wave RAM, then turn it back on. Turning it off clears the frequency, but that’s fine, I can just write it again.

    ; Disable channel 3 to allow writing to wave RAM
    xor a
    ldh [rAUD3ENA], a

    ; ... do the copy ...

    ld a, $80
    ldh [rAUD3ENA], a
    ld a, LOW(CH3_FREQUENCY)
    ldh [rAUD3LOW], a
    ld a, $80 | HIGH(CH3_FREQUENCY)
    ldh [rAUD3HIGH], a

Okay! Perfect! I’m so ready for a meow!!!

why god why

This is what I get in mGBA and SameBoy. Ironically, it plays fine in bgb.

It seems I have come to an impasse.

Why

After a Herculean amount of debugging and discussion with people who actually know what they’re talking about, here’s what I understand to be happening.

When the wave channel first starts playing, it doesn’t correctly read the very first nybble; instead, it uses the high nybble of whatever was already in its own internal buffer.

Disabling the wave channel sets its internal buffer to all zeroes.

I disable the wave channel every time it plays. Effectively, every 32nd sample starting with the first is treated as zero, which is the most extreme negative value, which is why the playback looks like this (bearing in mind that mGBA’s audio is currently upside-down):
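
A tiny Python model shows where the spikes land. It is only a simulation of the behavior as described above (the first sample of each buffer being read as zero), not of the real hardware.

```python
def simulate_restart_glitch(samples, buffer_size=32):
    """Replace the first sample of every buffer with 0, as described above."""
    return [0 if i % buffer_size == 0 else s for i, s in enumerate(samples)]

steady = [8] * 96   # three buffers of mid-level samples
glitched = simulate_restart_glitch(steady)
print([i for i, s in enumerate(glitched) if s == 0])  # [0, 32, 64]
```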

The above sound's waveform, which resembles the original, but with regularly spaced spikes

For whatever reason, bgb doesn’t emulate this spiking, so it plays fine. I’m told the spiking also happens on actual hardware, but the speakers are cheap so it’s harder to notice.

SamplePlayer isn’t much help here, because it’s subject to the same problem.

A ray of hope, dashed

But wait! There’s one last thing I can try. Pokémon Yellow has freeform sounds in it, and it doesn’t have this spiking! There’s even a fan disassembly of it!

Alas. Pokémon Yellow doesn’t use channel 3 to play back sounds. It uses channel 1.

How, you ask? Remember when I said earlier that hearing is really just detecting changes in volume? Pokémon Yellow plays a constant square wave and simply toggles it on and off, very rapidly. Channel 3 is 4-bit; the sounds Pokémon Yellow plays are 1-bit, on or off. It’s baffling, but it does work.
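
To give a feel for what 1-bit audio data looks like, here is a naive Python sketch that thresholds signed samples into on/off bits and packs them into bytes. This is not Pokémon Yellow's actual encoder; the threshold and the most-significant-bit-first packing are both assumptions made for illustration.

```python
def to_one_bit(samples, threshold=0):
    """Reduce signed samples to on/off bits using a naive threshold."""
    return [1 if s > threshold else 0 for s in samples]

def pack_bits(bits):
    """Pack bits into bytes, most significant bit first, zero-padding the end."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk += [0] * (8 - len(chunk))
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

bits = to_one_bit([100, -50, 30, -20, 5, -5, 70, -70, 10])
print(bits)                   # [1, 0, 1, 0, 1, 0, 1, 0, 1]
print(pack_bits(bits).hex())  # aa80
```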

I don’t think it’ll work for me, since that means 32 times as many interrupts. In fact, Pokémon Yellow uses a busy loop as a timer, so it effectively freezes the entire rest of the game anytime it plays a Pikachu sound. I’d rather not do that, but… I don’t seem to have a lot of options.

And so I’ve reached a dead end. The spiking seems to be a fundamental bug with the Game Boy sound hardware. I’ve found evidence that it may even still exist in the GBA, which uses a superset of the same hardware. I can’t fix it, I don’t see how to work around it, and it sounds really incredibly bad.

After days of effort trying to get this to work, I had to shelve it.

The title of this post is a sort of pun, you see, a play on words—

To be continued

This work doesn’t correspond to a commit at all; it exists only as a local stash.

Next time: dialogue! And this time it works!

Cheezball Rising: Maps and sprites

Post Syndicated from Eevee original https://eev.ee/blog/2018/07/15/cheezball-rising-maps-and-sprites/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I get a little asset pipeline working and finally have a real map.

Previously: spring cleaning.
Next: resounding failure.

Recap

The last post only covered some minor problems (including, I grant you, being totally broken), so the current state of the game is basically unchanged from before.

A space cat roams around on a grassy background

That grass pattern, the grass sprite itself, and the color scheme are all hardcoded — written directly into the source code, by hand. If this game is going to get very far at all, I urgently need a better way to inject some art.

Constraints

The Game Boy imposes some fairly harsh constraints on the artwork — which is part of the charm! But now I have to figure out how to work within those constraints most effectively. Here’s what I’ve got to work with.

Bear in mind that I intend for the game to be based around 16×16, um, tiles. Okay, it’s extremely confusing that “tile” might refer either to the base size of the artwork or to the Game Boy’s native 8×8 tiles, so I’m going to call the 16×16 art units tiles and the Game Boy’s basic 8×8 unit a character (which is what the manual does), or char for short.

  • The background layer is a grid of 8×8 characters, each of which uses one of eight 4-color background palettes.

  • The object layer is a set of 8×16 character pairs, each of which uses one of eight 3-color object palettes. These palettes are 3-color because color 0 is always transparent.

  • No more than 40 objects can appear on screen at the same time. (There is a way to weasel past this limit, but it requires considerable trickery.)

  • No more than 10 objects can appear in the same row of pixels. (I believe this is a hard limit.)

  • There are three blocks of 256 chars each. I can divide this between the background and objects more or less however I want, though neither can have more than two blocks (= 512 chars).

I’m intending for the game to be based around a 16×16 grid, a fairly common size for the Game Boy. That makes me a little concerned about the per-row object limit — each entity will need to have two Game Boy objects side by side, so I’m really limited to only five entities sharing the same row of pixels. I can’t do much about that quite yet (and only have one entity anyway), but it’s likely to affect how I design maps and draw sprites.

The next biggest problem is colors. Each object palette can only have three colors, which in practice means a shadow/outline color, a highlight color, and a base color. This is why every NPC and overworld critter in Pokémon GSC and the Zeldas is basically monochromatic. They pull it off really well by making very effective use of the highlight and shadow colors.

Since 16×16 sprites are composed of multiple Game Boy objects, it’s possible to overcome this limit by giving each part of the sprite a different palette. Unfortunately, objects being 8×16 means the sprites are split vertically, when it would be most useful to have different colors for e.g. the head and body. I wish the Game Boy supported 16×8 objects! That’d help a ton with the per-row limit, too. Alas, a few decades too late to change it now.


As for the number of chars… well, let’s see. The whole screen is only 160×144, which is 20×18 or 360 chars, so I could allocate two blocks to the background and have 512 — more than enough to cover the entire screen in unique chars! (I expect one block to be more than enough for objects, since I can only show 80 object chars at once anyway.)

On the other hand, I’ll need to reserve some of that space for text and UI and whatnot, and each 16×16 tile is composed of four chars. If I very generously allocate a whole block to window dressing (enough for all of ISO-8859-1?), that leaves 256 chars, which is 64 tiles, which is a tileset that fits in an eight-by-eight square.
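
The budget above, spelled out as arithmetic in Python. Every number comes from the constraints listed earlier; only the variable names are new.

```python
import math

SCREEN_WIDTH, SCREEN_HEIGHT = 160, 144   # pixels
CHAR_SIZE = 8                            # a char is 8x8 pixels
CHARS_PER_BLOCK = 256
CHARS_PER_TILE = 4                       # a 16x16 tile is 2x2 chars

chars_on_screen = (SCREEN_WIDTH // CHAR_SIZE) * (SCREEN_HEIGHT // CHAR_SIZE)
background_chars = 2 * CHARS_PER_BLOCK             # two blocks for the background
map_chars = background_chars - CHARS_PER_BLOCK     # minus one block for text and UI
tile_count = map_chars // CHARS_PER_TILE
tileset_side = math.isqrt(tile_count)

print(chars_on_screen, background_chars, tile_count, tileset_side)  # 360 512 64 8
```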

For comparison’s sake, even fox flux’s relatively limited tileset is a sixteen-tile square — four times as big. This feels a little dire.

But how can it be dire, when I have enough sprite space to fill the screen and then some?

Let’s see here. A pretty good chunk of the fox flux tileset is unused or outright blank. Some of these tiles are art for moving objects that happened to fit in the grid, and those wouldn’t be in the background tileset. And while all of the tiles are distinct, a lot of the basic terrain has some significant overlap:

A set of dirt tiles from fox flux, colored to indicate where different tiles have identical corners

All of the regions of the same color are identical. These 9 distinct tiles could fit into 20 chars if they shared the common parts, rather than the 36 required by naïvely cutting each one into four dedicated chars.
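
Counting the savings is straightforward to sketch in Python: cut each tile into four chars and see how many are distinct. The helpers and the toy tiles below are invented for illustration and have nothing to do with the real fox flux art.

```python
def split_into_chars(tile):
    """Cut a 16x16 tile (16 rows of 16 pixel values) into four 8x8 chars."""
    chars = []
    for top in (0, 8):
        for left in (0, 8):
            chars.append(tuple(
                tuple(row[left:left + 8]) for row in tile[top:top + 8]
            ))
    return chars

def count_chars(tiles):
    """Return (naive char count, distinct char count) for a list of tiles."""
    all_chars = [char for tile in tiles for char in split_into_chars(tile)]
    return len(all_chars), len(set(all_chars))

# Two toy tiles that share an identical left half
blank = [[0] * 16 for _ in range(16)]
half = [[0] * 8 + [1] * 8 for _ in range(16)]
print(count_chars([blank, half]))  # (8, 2)
```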

(The fox flux grid is 32×32, so everything is twice as big as it will be on the Game Boy, but you get the idea.)

I’m feeling a little better about this, especially knowing I do have enough space to cover the whole screen. Worst case, I could draw the map as though it were a single bitmap. I don’t want to have to rely on that if I can get away with it, though — I suspect I’d need to constantly load chars on the fly, and copying stuff around eats into my CPU budget surprisingly quickly.

Research

That does get me wondering: what, exactly, do the Oracle games do? I haven’t done any precise measurements, but I’m pretty sure they have more than sixty-four distinct map tiles throughout their large connected worlds. Let’s have a look!

Oracle of Ages and its live tilemap, in the graveyard, showing the graveyard tileset

Here I am in the graveyard near the start of Oracle of Ages. The “creepy tree” here is distinct and doesn’t really appear anywhere else, so I found it in the tile viewer (lower right) and will be keeping an eye on it. Note that only the left half of the face is visible; the right half is using the same tiles, flipped horizontally. (The colors are different because the tile viewer shows the literal colors, whereas the game itself is being drawn with a shader.)

Let’s walk left one screen.

Oracle of Ages and its live tilemap, outside of the graveyard

Now, this is interesting. The creepy tree is still on the screen here, so its tiles are naturally still loaded. But a bunch of tiles on the left — parts of the dungeon entrance and other graveyard things — have been replaced by town tiles. I’m several screens away from the town!

The next screen up has no creepy trees, but its tiles remain. Of course, they’d have to, since the creepy tree is still visible during a transition. I have to go left from there before the tree disappears:

Oracle of Ages and its live tilemap, with tiles spelling SHOP clearly visible

Wow! At a glance, this looks like enough tiles to draw the entire town.

This is fascinating. The Oracle games have several transitions between major areas, marked by fade-outs or palette changes — the purple-tinted graveyard is an obvious example. But it looks like there are also minor transitions that update the tileset while I’m still several screens away from where those tiles are used. The screens around the transition only use common tiles like grass and regular trees, so I never notice anything is happening.

That’s cute, clever, and an easy way to make screen transitions work without having to figure out what tiles are becoming unused as they slide off the screen!

At this point I realize I may be getting ahead of myself. Screen transitions? I don’t have a map yet! Hell, I don’t even have a camera. Time to back up and make something I can build on.

Designing a tileset

I’m pretty tired of manually translating art into bits. It’s 2018, dammit. I want to use all the regular tools I would use for this, I want the Game Boy’s limitations to be expressed as simply as possible, and I want minimal friction between the source artwork and the game.

Here’s my idea. I know I only have 8 palettes to work with, so I’m decreeing that tilesets will be stored as paletted PNGs. The first four colors in the image palette will become the first Game Boy palette; the next four colors become the second Game Boy palette; and so on. If I then resize Aseprite’s palette panel to be four colors wide, I’ll have an instant view of all my available combinations of colors.
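
In code, the convention boils down to a division and a remainder. A minimal Python sketch, with hypothetical helper names:

```python
def split_palette_index(index):
    """Map an image palette index to (Game Boy palette, color within it)."""
    return divmod(index, 4)

def chunk_palettes(colors):
    """Group a flat list of image colors into Game Boy palettes of four."""
    return [colors[i:i + 4] for i in range(0, len(colors), 4)]

print(split_palette_index(6))            # (1, 2)
print(chunk_palettes(list("abcdefgh")))  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]
```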

This already has some problems — for starters, if the same color appears in multiple palettes (which will almost certainly happen, for the sake of cohesion), I’m very likely to confuse the hell out of myself. I also have no idea how to extend this into multiple tilesets, but for now I’ll pretend the entire game world only uses a single tileset.

I could instead dynamically infer the palettes based on what combinations of colors are actually used, but after more than a couple tiles, it would be a nightmare for a human to keep track of what those combinations are. With this approach, all a human needs to do is color-drop a pixel from a particular tile and look at what row the color’s in.

After a quick jaunt into the pixel mines, here are some tiles.

A small set of pastel yellow moon tiles

Or, as viewed in Aseprite:

The same set of tiles, as seen in an editor, with the four-color palette visible

That’s only one palette, but hopefully you can see what I’m going for here. It’s enough to get started.

At this point, I started writing a little Python script that used Pillow to inspect the colors and pixels and dump them out to rgbasm-flavored source code. The script itself is not especially interesting: run through each 8×8 block of pixels, look at each pixel’s palette index, mod 4 to get the index within the Game Boy palette, print out as backtick literals. (I could spit out raw binary data, but I wanted to be able to inspect the intermediate form easily. Maybe later.)

The results:

SECTION "Map dumping test", ROM0
TEST_PALETTES:
    dw %0101011110111101
    dw %0101011100011110
    dw %0100101010111100
    dw %0100011001111000
    ; ... enough zeroes to make eight palettes ...
; sorry, in the script I was calling them "tiles", not "chars"
TEST_TILES:
    ; tile 0 at 0, 0
    dw `00001000
    dw `00000000
    dw `00100000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `20000000
    dw `20000002
    ; ... etc ...
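
A rough Python sketch of how output in this shape could be generated, without Pillow so it stands alone. It is not the actual script: the helper names are made up, and converting 8-bit color channels by dropping the low three bits is an assumption. The bit layout (red in the low five bits, then green, then blue) is the Game Boy Color's palette format.

```python
def color_to_rgbasm(r, g, b):
    """Format an 8-bit RGB color as a 15-bit Game Boy Color palette word."""
    word = (r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10)
    return "    dw %" + format(word, "016b")

def char_to_rgbasm(pixels):
    """Format an 8x8 grid of image palette indices as backtick literals."""
    lines = []
    for row in pixels:
        digits = "".join(str(index % 4) for index in row)
        lines.append("    dw `" + digits)
    return "\n".join(lines)

print(color_to_rgbasm(232, 232, 168))              #     dw %0101011110111101
print(char_to_rgbasm([[4, 5, 6, 7, 0, 1, 2, 3]]))  #     dw `01230123
```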

And hey, I already have code that can load palettes and chars, so all I have to do is swap out the old labels for these ones.

Now I have a tileset I can load into the game, which is very exciting, except that I can’t see any of them because I still don’t have a map. I could draw a test map by hand, I suppose, but the whole point of this exercise was to avoid ever doing that again.

Drawing a map

In keeping with the “it’s 2018 dammit” approach, I elect to use Tiled for drawing the maps. I’ve used it for several LÖVE games, and while its general-purposeness makes it a little clumsy at times, it’s flexible enough to express basically anything.

I make a tileset and create a map. I choose 256×256 pixels (16×16 tiles), the same size as the Game Boy screen buffer, and fill it with arbitrary terrain. In retrospect, I probably should’ve made it the size of the screen, since I still don’t have a camera. Oh, well.

Here, I hit a minor roadblock. I want to do as much work as possible upfront, so I want to store the map in the ROM as chars, not tiles. That means I need to know what chars make up each tile, which is determined by the script that converts the image to char data. Multiple maps might use the same tileset, and a map might use multiple tilesets, so it seems like I’ll need some intermediate build assets with this information…
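
The tile-to-char expansion itself is simple once that mapping exists. In this Python sketch, `tile_chars` stands in for the intermediate information described above; its shape and the top-left, top-right, bottom-left, bottom-right ordering are assumptions.

```python
def tiles_to_chars(tile_map, tile_chars):
    """Expand a grid of tile ids into a grid of char ids.

    tile_chars maps a tile id to its four char ids in the order
    (top-left, top-right, bottom-left, bottom-right).
    """
    rows = []
    for tile_row in tile_map:
        top, bottom = [], []
        for tile_id in tile_row:
            tl, tr, bl, br = tile_chars[tile_id]
            top += [tl, tr]
            bottom += [bl, br]
        rows += [top, bottom]
    return rows

tile_chars = {0: (0, 1, 2, 3), 1: (4, 5, 6, 7)}
print(tiles_to_chars([[0, 1]], tile_chars))  # [[0, 1, 4, 5], [2, 3, 6, 7]]

full_map = tiles_to_chars([[0] * 16 for _ in range(16)], tile_chars)
print(len(full_map), len(full_map[0]))       # 32 32
```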

(In retrospect again, I realize that the game may need to know about tiles rather than just chars, since there’ll surely be at least a few map tiles that act like entities — switches and the like — and those need to function as single units. I guess I’ll work that out later.)

This is all looking like an awful lot of messing around (and a lot of potential points of failure) before I can get anything on the dang screen. I waffle for a bit, then decide to start with a single step that simultaneously dumps the tiles and the map. I can split it up when I actually have more than one of either.

You can check out the resulting script if you like, but again, I don’t think it’s particularly interesting. It enforces a few more constraints than before, and adds a TEST_MAP_1 label containing all the char data, row by row. Loading that into VRAM is almost comically simple:

    ; Read from the test map
    ld hl, $9800
    ld de, TEST_MAP_1
    ld bc, 1024
    call copy16

The screen buffer is 32×32 chars, or 1024 bytes. As you may suspect, copy16 is like copy, but it takes a 16-bit count in bc.

; copy bc bytes from de to hl
; NOTE: bc must not be zero
copy16:
    ld a, [de]
    inc de
    ld [hl+], a
    dec bc
    ; dec bc doesn't set flags, so gotta check by hand
    ld a, b
    or a, c
    jr nz, copy16
    ret

Hm. It’s a little harder to justify the bc = 0 case as a feature here, since that would try to overwrite every single byte in the entire address space. Don’t do that, then.
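
A quick Python model of the loop's counter makes the wraparound visible. It only counts iterations and copies nothing; the function name is invented for this sketch.

```python
def copy16_iterations(bc):
    """Count how many bytes the copy16 loop would copy for a given bc."""
    count = 0
    while True:
        count += 1               # one byte is copied per pass
        bc = (bc - 1) & 0xFFFF   # dec bc wraps around at 16 bits
        if bc == 0:              # ld a, b / or a, c / jr nz
            return count

print(copy16_iterations(1024))  # 1024
print(copy16_iterations(0))     # 65536
```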

Anise, in-game, walking on the moon tiles

Now, at long long last, I have a background with some actual art! It’s starting to feel like something! I’ve even got something resembling a workflow.

My desktop, showing the moon tiles in an image editor, the map put together in Tiled, and the game running in mGBA

All in a day’s work. Good time to call it, right?

Except

I just wrote this char loading code…

And there’s still one thing hardcoded…

I wonder if I could do something about that…?

Sprites

Above, I conspicuously did not mention how I integrated the Python script into the build system. And, well, I didn’t do that. I ran it manually and put it somewhere and committed it all as-is. You currently (still!) can’t actually build the game without repeating my steps. You can’t even just put the output in the right place, because you also have to delete some debug output from the middle of the file.

It gets worse! Here’s how.

I have some Anise walking sprites, too, drawn in Aseprite. They’re pretty cute and I’d love to have them in the game, now that I have some Real Art™ for the background.

Star Anise, walking forwards

Why not throw these at the same script and hack them into animating?

Unfortunately, this introduces a bit of manual work, as animation often does. (My kingdom for a way to embed a small simple animation in a larger spritesheet in Aseprite!) I’ve typically animated every critter in its own Aseprite file — or stacked several vertically in the same file when their animations are similar enough — and then exported as a sheet with the frames running off horizontally. You can see this at work in fox flux, e.g. on its critter sheet.

But Star Anise introduces a wrinkle that prevents even that slightly clumsy workflow from working.

You may have noticed that the walking sprite above blows the color budget considerably, using a whopping five colors. The secret is that Anise himself fits in a 16×16 square, and then his antenna is a third 8×16 sprite drawn on top. I can’t simply export him as a spritesheet, because the antenna needs to be separate, and it’s not even aligned to the grid. It doesn’t even stay in the same place consistently!

I could maybe hack something together that would automatically pull the incompatible pixels into a separate sprite. I might need to, since — spoiler alert — there are an awful lot of Lunekos in this game. For now, though, I did the dumbest thing that works and copied his frames to their own sheet by hand.

Star Anise's walking frames laid out in a spritesheet

The background is actually cyan, not transparent. I had to do this because my setup expects multiple sets of four colors — the first color in an object palette is still there, even if it’s ignored — and only one color in an indexed PNG can be transparent. (Don’t @ me about PNG pixel formats.) I could’ve adjusted it to work with sets of three colors and put the transparent one at the end so the palette column trick still worked, but… this was easier.

Here’s the best part: I took the main function from my tile loading script, copy-pasted it within the same file, and edited the copy to dump these sprites sans map. So now not only is there no build system, but half of the loading script is inaccessible! Sorry. We’re getting into experiment territory and I am going to start making a lot of messes while I figure out what I actually want.

Using these within the game was just as easy as before — replace some labels with new ones — and the only real change was to use a third OAM slot for the antenna. (The antenna has to appear first; when sprites overlap, the one with the lowest index appears on top.)

That did make updating OAM a little clumsy; you may recall that before, I loaded the x and y positions into b and c, updated them, then wrote them back into OAM:

    ; set b/c to the y/x coordinates
    ld hl, oam_buffer
    ld b, [hl]
    inc hl
    ld c, [hl]
    bit BUTTON_LEFT, a
    jr z, .skip_left
    dec c
.skip_left:
    bit BUTTON_RIGHT, a
    jr z, .skip_right
    inc c
.skip_right:
    bit BUTTON_UP, a
    jr z, .skip_up
    dec b
.skip_up:
    bit BUTTON_DOWN, a
    jr z, .skip_down
    inc b
.skip_down:
    ld [hl], c
    dec hl
    ld [hl], b
    ld a, c
    add a, 8
    ld hl, oam_buffer + 5
    ld [hl], a
    dec hl
    ld [hl], b

The above approach required that I hardcode the 8-pixel offset between the left and right halves. With the antenna in the mix, I would’ve had to hardcode another more convoluted offset, and I didn’t like the sound of that. So I changed it to inc and dec the OAM coordinates directly and immediately:

    ; Anise update loop
    ; bc = size of one OAM entry, used to step from one object to the next
    ld bc, 4
    bit BUTTON_LEFT, a
    jr z, .skip_left
    ld hl, oam_buffer + 1
    dec [hl]
    add hl, bc
    dec [hl]
    add hl, bc
    dec [hl]
.skip_left:
    ; ... etc ...

Eventually I should stop doing this and have an actual canonical x/y position for Anise somewhere. But I didn’t do that yet.

I did also take this opportunity to change my LCDC flags so that background chars start counting from zero at $9000, fixing the misunderstanding I had before. That’s nice.

Anyway, tada, Star Anise can slide around, but now with his antenna.

Not good enough.

Animating

It’s time to animate something. And this time around, all I’ve got are bytes to work with. Oh, boy!

Right out of the gate, I have two options. I could load all of Anise’s sprites into VRAM upfront and change the char numbers in OAM to animate him, or I could reserve some specific chars and overwrite them to animate him.

The first choice makes sense for an entity that might exist multiple times at once, like enemies or… virtually anything in the game world, really. But there’s only ever one player, and he’s likely to have a whole lot of spritework, which I would prefer not to have clogging up my char space for the entire duration of the game. So while I might use the first approach for most other things, I’m going to animate Anise by overwriting the actual graphics. Every frame.

First things first. I’m going to need some state, which I’ve been avoiding by relying on OAM. At the very least, I need to know which way Anise is facing — which isn’t necessarily the direction he’s moving, because he should keep his facing when he stops. I also need to know which animation frame he’s on, and how many LCD frames are left until he should advance to the next one.

Let’s refer to the time between vblanks as a “tic” for now, to avoid the ambiguity of a “frame” when talking about animation.

A good start, then, would be some constants.

FACING_DOWN   EQU 0
FACING_UP     EQU 1
FACING_RIGHT  EQU 2
FACING_LEFT   EQU 3

ANIMATION_LENGTH EQU 5

ANIMATION_LENGTH is the length of every frame. I don’t especially want to give every frame its own distinct duration if I can avoid it; this will be complicated enough as it is. I fiddled with the frame duration in Aseprite for a bit and landed on 83ms as a nice speed, and that’s 5 tics.

I also need a place for this state, so I add some more stuff to my RAM block.

anise_facing:
    db
anise_frame:
    db
anise_frame_countdown:
    db

And initialize it in setup.

    ld a, FACING_DOWN
    ld [anise_facing], a
    ld a, ANIMATION_LENGTH
    ld [anise_frame_countdown], a

Presumably, one day, I’ll have multiple entities, and they’ll all share a similar structure, which I’ll have to traverse manually. For now, it’s easier to follow the code if I give every field its own label.

I have four levels of hierarchy here: the spriteset (which for now is always Anise’s), the pose (I only have one: walking), the facing, and the frame. I need to traverse all four, but luckily I can ignore the first two for now.

I don’t want to animate Anise when he’s not moving, so I changed the OAM updating code to also ld d, 1 if there’s any movement at all, and skip over all the animation stuff if d is still zero.

    ; ... read input ...

    ; This was before I knew the 'or a' trick; these two ops
    ; could be replaced with 'xor a; or d'
    ld a, d
    cp a, 0
    jp z, .no_movement

    ; ... all the animation code will go here ...

.no_movement:
    ; and after this we repeat the main loop

This does have the side effect that Anise will simply freeze in mid-walk when stopped, rather than returning to his standing pose. I still haven’t fixed that; I could special-case it, but I usually treat “standing” as its own one-frame animation, so it feels like something that ought to come when I implement poses.

Next I decrement the countdown, which is the number of tics left until the frame ought to change. If this is nonzero, I don’t need to do anything.

    ld a, [anise_frame_countdown]
    dec a
    ld [anise_frame_countdown], a
    jp nz, .no_movement
    ld a, ANIMATION_LENGTH
    ld [anise_frame_countdown], a

Again, this isn’t actually right. If Anise’s state changes, such as between standing and walking, then this should be ignored because he’s switching to a new animation. But this is a pose thing again, so I’m deferring it until later.

Next I need to advance the current frame. I don’t have modulo on hand and even simple ifs are kind of annoying, so I was naughty here and used bitops to roll from frame 3 to frame 0. This would obviously not work if the number of frames were not a power of two.

    ld a, [anise_frame]
    inc a
    and a, 4 - 1
    ld [anise_frame], a

Yet again, if Anise changes direction, the frame should be reset to zero… but it ain’t.
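For comparison, here is roughly what the frame bump might look like without the power-of-two assumption. It's a sketch, not the code in the repository: ANISE_FRAME_COUNT is a hypothetical constant, and the fragment is meant to sit inside the same routine so the local label has a scope.

```asm
ANISE_FRAME_COUNT EQU 4     ; hypothetical constant; any count works

    ld a, [anise_frame]
    inc a
    cp a, ANISE_FRAME_COUNT
    jr c, .frame_ok         ; carry set: a is still below the count
    xor a                   ; otherwise wrap around to frame 0
.frame_ok:
    ld [anise_frame], a
```

That's three instructions and a label where the mask needed one instruction, which is why the bitop is tempting for as long as the frame count cooperates.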

Now, let’s think for a second. I know what frame I want. I have a label for the upper-left corner of the spritesheet, and I want to get to the upper-left corner of the appropriate frame. Each frame has 3 objects; each object has 2 chars; each char is 16 bytes.

    ld hl, ANISE_TEST_TILES
    ; Skip ahead 3 sprites * the current frame
    ld bc, 3 * 2 * 16
    ; Remember, zero iterations is also possible
    or a
    jr z, .skip_advancing_frame
.advance_frame:    
    add hl, bc
    dec a
    jr nz, .advance_frame
.skip_advancing_frame:
    ; Copy the sprites into VRAM
    ; They're consecutive in both the data and VRAM, so only
    ; one copy is necessary.  And bc is already right!
    ld d, h
    ld e, l
    ld hl, $8000
    call copy16

Hey, look at that!

Star Anise walking around in-game, now animated

Only one small problem: I forgot about facing, so Anise will always face forwards no matter how he moves. Whoops!

Facing

I need to actually track which way Anise is facing, which is a surprisingly subtle question. He might even be facing away from his own direction of movement, if for example he was thrown backwards by some external force.

A decent first approximation is to use the last button that was pressed. (That’s still not quite right — if you hold down, hold down+right, and then release right, he should obviously face down. But it’s a start.)

I don’t yet track which buttons were pressed this frame, but it’s easy enough to add. While I’m at it, I might as well track which buttons were released, too. I amend the input reading code thusly, based on the straightforward insight that a button was pressed this frame iff it is currently 1 and was previously 0.

    ; a now contains the current buttons
    ld hl, buttons
    ld b, [hl]                  ; b <- previous buttons
    ld [hl], a                  ; a -> current buttons
    cpl
    and a, b
    ld [buttons_released], a    ; a = ~new & old, i.e. released
    ld a, [hl]                  ; a <- current buttons
    cpl
    or a, b
    cpl
    ld [buttons_pressed], a     ; a = ~(~new | old), i.e. pressed

I like that cute trick for getting the pressed buttons. I need a & ~b, but cpl only works on a, so I would’ve had to juggle a bunch of registers. But applying De Morgan’s law produces ~(~a | b), which only requires complementing a. (Full disclosure: I didn’t actually try register juggling, and for all I know it could end up shorter somehow.)
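One possible version of the register juggling, offered as an untested sketch: since hl still points at buttons, which now holds the current state, the previous state can be complemented in a and then masked against memory directly with `and a, [hl]`.

```asm
    ; a now contains the current buttons
    ld hl, buttons
    ld b, [hl]                  ; b <- previous buttons
    ld [hl], a                  ; current buttons -> RAM
    cpl
    and a, b
    ld [buttons_released], a    ; a = ~new & old, i.e. released
    ld a, b                     ; a <- previous buttons
    cpl
    and a, [hl]
    ld [buttons_pressed], a     ; a = ~old & new, i.e. pressed
```

This comes out one instruction shorter than the De Morgan version, at the cost of relying on hl not having moved.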

Next I check the just-pressed buttons and update facing accordingly. It looks a lot like the code for checking the currently-held buttons, except that I only use the first button I find.

    ld hl, anise_facing
    ld a, [buttons_pressed]
    bit BUTTON_LEFT, a
    jr z, .skip_left2
    ld [hl], FACING_LEFT
    jr .skip_down2
.skip_left2:
    ; ... you get the idea ...

And finally, amend the sprite choosing code to pick the right facing, too.

    ld hl, ANISE_TEST_TILES

    ; Skip ahead a number of /rows/, corresponding to facing
    ld a, [anise_facing]
    and a, %11                      ; cap to 0-3, just in case
    jr z, .skip_stride_row
    ; This is like before, but times 4 frames
    ld bc, 4 * 3 * 2 * 16
.stride_row:
    add hl, bc
    dec a
    jr nz, .stride_row
.skip_stride_row:

    ; Bumping the frame here is convenient, since it leaves the
    ; frame in a for the next part
    ld a, [anise_frame]
    inc a
    and a, 4 - 1
    ld [anise_frame], a

    ; ... continue on with picking the frame ...

Hardcoding the number of frames here is… unfortunate. I should probably flip the spritesheet so the frames go down and each column is a facing; then there’ll always be a fixed number of columns to skip over.

But who cares about that? Look at Anise go! Yeah!

Star Anise walking around in-game, now animated in all four directions

Well, yes, there is one final problem, which is that the antenna is misaligned when walking left or right… because its positioning is different than when walking up or down, and I don’t have any easy way to encode that at the moment. It’s still like that, in fact. I’m sure I’ll fix it eventually.

More vblank woes

I didn’t run into this problem until a little while later, but I might as well mention it now. The above code writes into VRAM in the middle of updating entities — updating them very simply, perhaps, but updating nonetheless. If that updating takes longer than vblank, the write will fail.

I expected this, though not quite so soon. It’s a disadvantage of swapping the char data rather than the char references: 32× more writing to do, which will take 32× longer. The solution is similar to what I do for OAM: defer the write until the next vblank. I’m already doing that with Anise’s position, anyway, and it makes no sense to have his position and animation updated on different frames.

I ended up special-casing this for Anise, though it wouldn’t be too hard to extend this into a queue of tiles to copy. It’s nothing too world-shaking; I just store the address of Anise’s current sprite in RAM, then copy it over during vblank, just after the OAM DMA.
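A minimal sketch of that deferral might look like the following. It makes some assumptions: copy16 is taken to copy bc bytes from de to hl, as its earlier use suggests; the address is stored high byte first; and the two-byte RAM reservation is shown only for completeness. The real code may well differ.

```asm
; In the RAM block: two bytes for the source address, high byte first
anise_sprites_address:
    ds 2

    ; In the update code, once hl points at the chosen frame:
    ld a, h
    ld [anise_sprites_address], a
    ld a, l
    ld [anise_sprites_address + 1], a

    ; Later, in the vblank handler, just after the OAM DMA:
    ld hl, anise_sprites_address
    ld a, [hl+]
    ld d, a                     ; de <- source address
    ld e, [hl]
    ld hl, $8000                ; destination: Anise's chars in VRAM
    ld bc, 3 * 2 * 16           ; one frame: 3 objects, 2 chars each
    call copy16
```

Storing only the address keeps the update code free of VRAM writes entirely, so it no longer matters how long the update takes.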

I did try doing this with one of the Game Boy Color’s new features, general-purpose DMA, which can copy from basically anywhere in ROM or RAM to basically anywhere in VRAM. It involves five registers: you write the source address in the first two, the destination in the next two, and the length in the fifth, which triggers the copy. The CPU simply freezes until the copy is done, so there are no goofy timing issues here.

    ld hl, anise_sprites_address
    ld a, [hl+]
    ld [rHDMA1], a
    ld a, [hl]
    ld [rHDMA2], a
    ld a, HIGH($0000)
    ld [rHDMA3], a
    ld a, LOW($0000)
    ld [rHDMA4], a
    ; To copy X bytes, write X / 16 - 1 to this register
    ld a, (32 * 3) / 16 - 1
    ld [rHDMA5], a

General-purpose DMA can copy 16 bytes every 8 cycles, or ½ cycle per byte. The fastest possible manual copy would be an unrolled series of ld a, [hl+]; ld [bc], a; inc bc which takes a whopping 6 cycles per byte — twelve times slower! This is a neat feature.
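To make the comparison concrete, an unrolled manual copy of one 16-byte char could be sketched like this, with the cycle cost of each instruction noted. The register roles (hl as source, bc as destination) are an assumption made for this example.

```asm
    ; hl = source, bc = destination; copies 16 bytes
    REPT 16
    ld a, [hl+]     ; 2 cycles
    ld [bc], a      ; 2 cycles
    inc bc          ; 2 cycles
    ENDR
    ; 16 bytes * 6 cycles = 96 cycles, versus 8 cycles for
    ; general-purpose DMA to move the same 16 bytes
```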

FYI, it’s also possible to have a copy done piecemeal during hblanks, though that sounds a bit fragile to me.
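Going by the hardware documentation, the hblank variant uses the same five registers; the only difference is that bit 7 of the length byte is set. A sketch, assuming the source and destination registers were filled in exactly as in the general-purpose example:

```asm
    ; Source and destination are already in rHDMA1 through rHDMA4
    ; Setting bit 7 requests hblank DMA rather than general-purpose DMA
    ld a, $80 | ((32 * 3) / 16 - 1)
    ld [rHDMA5], a
    ; The hardware then copies 16 bytes during each hblank, so this
    ; 96-byte transfer is spread across 6 hblanks
```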

Future work

I’ve laid some very basic groundwork here, and there’s plenty more to do, which I will get back to later! It’s just me hacking all this together, after all, and I like flitting between different systems.

I will definitely need to figure out how the heck multiple tilesets work and when they get switched out. How do I even use multiple tilesets, each with its own set of palettes? What’s the workflow if I want to use the same tiles with several different palettes, like how the graveyard in Oracle of Ages is tinted purple? And I didn’t even implement character de-duplication yet… which will require some metadata for each tile… aw, geez.

And I still haven’t fixed the build system! Maybe you can understand why I’m hesitant to impose more structure on this idea quite yet.

To be continued

That brings us to commit 59ff18. Except for a commit about the build that I skipped. Whatever. This post has been a little more draining to write, perhaps because it forced me to confront and explain a bunch of hokey decisions.

Next time: resounding failure!