
Gamedev from scratch 1: Scaffolding

Post Syndicated from Eevee original https://eev.ee/blog/2021/01/26/gamedev-from-scratch-1-scaffolding/

Welcome to part 1 of this narrative series about writing a complete video game from scratch, using the PICO-8. This is actually the second part, because in this house (unlike Lua) we index from 0, so if you’re new here you may want to consult the introductory stuff and table of contents in part zero.

If you’ve been following along, welcome back, and let’s dive right in!

← Part 0: Groundwork

Recap and short-term plans

So far, I have… this. Which is something, and certainly much more than nothing, but all told not a lot.

Star Anise walking around the screen and turning to face the way he's moving

Most conspicuously, this is going to be a platformer, so I need gravity. The problem with gravity is that it means things are always moving downwards, and if there’s nothing to stop them, they will continue off indefinitely into the void.

What I am trying to say here is that I feel the looming spectre of collision detection hanging over me. I’m going to need it, and I’m going to need it real soon.

And, hey, that sucks. Collision detection is a real big pain in the ass to write, so needing it this early is a hell of a big spike in the learning curve. Luckily for you, someone else has already written it: me!

Before I can get to that, though, I need to add some structure to the code I have so far. Everything I’ve written is designed to work for Star Anise and only Star Anise. That’s perfectly fine when he’s the only thing in the game, but I don’t expect he’ll stay alone for long! Collision detection in particular is a pretty major component of a platformer, so I definitely want to be able to reuse it for other things in the game. Also, collision detection is a big fucking hairy mess, so I definitely want to be able to shove it in a corner somewhere I’ll never have to look at it again.

A good start would be to build towards having a corner to shove it into.

Adding some structure

As of where I left off last time, my special _update() and _draw() functions are mostly full of code for updating and drawing Star Anise. That doesn’t really sit right with me; as the main entry points, they should be about updating and drawing the game itself. Star Anise is part of the game, but he isn’t the whole game. All that code that’s specific to him should be put off in a little box somewhere. Cats love to be in little boxes, you see.

This raises the question of how I want to structure this project in general. And, I note: structuring a software project is hard, and you only really get a good sense of how to do it from experience. I’m still not sure I have a good sense of how to do it. Hell, I’m not convinced anyone has a good sense of how to do it.

Thankfully, this is a game, so it’s pretty obvious how to break it into pieces. (The tradeoff is that everything in a game ends up entangled with everything else no matter how you structure it, alas.) Star Anise is a separate thing in the game, so he might as well be a separate thing in the code. Later on I’ll need some more abstract structuring, but as an extremely rough guideline: if I can give it a name, it’s a good candidate to be made into a thing.

But what, exactly, is a thing in code? Most commonly (but not always), a thing is implemented with what’s called an object — a little bundle of data (what it is) with code (what it can do). I already have both of these parts for Star Anise: he has data like his position and which way he’s facing, and he has code for doing things like updating or drawing himself. A great first step would be to extract that stuff into an object, after which some other structure might reveal itself.

I do need to do one thing before I can get to that, though. You see, Lua is one of the few languages in common use today that doesn’t quite have built-in support for objects. Instead, it has all the building blocks you need to craft your own system for making objects. On the one hand, the way it does that is very slick and clever. On the other hand, it means you can’t write much Lua without cobbling together some arcane nonsense first, and also no one’s code quite works the same way.

Which brings me to the following magnificent monstrosity:

function nop(...) return ... end

--------------------------------
-- simple object type
local obj = {init = nop}
obj.__index = obj

function obj:__call(...)
    local o = setmetatable({}, self)
    return o, o:init(...)
end

-- subclassing
function obj:extend(proto)
    proto = proto or {}

    -- copy meta values, since lua doesn't walk the prototype chain to find them
    for k, v in pairs(self) do
        if sub(k, 1, 2) == "__" then
            proto[k] = v
        end
    end

    proto.__index = proto
    proto.__super = self

    return setmetatable(proto, self)
end

How does this work? What does this mean? What is a prototype chain, anyway? Dearest reader: it extremely does not matter. No one cares. I would have to stare at this for ten minutes to even begin to explain it. Every line is oozing with subtlety. To be honest, even though I describe this series as “from scratch”, this is one of the very few things that I copy/pasted wholesale from an earlier game. I know this does the bare minimum I need and I absolutely do not want to waste time reinventing it incorrectly. To drive that point home: I wrote collision detection from scratch, but I copy/pasted this. (But if you really want to know, I’ll explain it in an appendix.)

Feel free to copy/paste mine, if you like. You can also find a number of tiny Lua object systems floating around online, but with tokens at a premium, I wanted something microscopic. This basically does constructors, inheritance, and nothing else.

(Oh, I don’t think I mentioned, but the -- prefix indicates a Lua comment. Comments are ignored by the computer and tend to contain notes that are helpful for humans to follow. They don’t count against the PICO-8 token limit, but they do count against the total size limit, alas.)

The upshot is that I can now write stuff like this:

local vec = obj:extend{}

function vec:init(x, y)
    self.x = x or 0
    self.y = y or 0
end

function vec:__add(v)
    return vec(self.x + v.x, self.y + v.y)
end

function vec:__sub(v)
    return vec(self.x - v.x, self.y - v.y)
end

function vec:iadd(v)
    self.x += v.x
    self.y += v.y
end

This creates a… well, terminology is tricky, but I’ll call it a type while doing air-quotes and glancing behind me to see if any Haskell programmers are listening. (It’s not much like the notion of a type in many other languages, but it’s the closest I’m going to get.) Now I can combine an x- and y-coordinate together as a single object, a single thing, without having to juggle them separately. I’m calling that kind of thing a vec, short for vector, the name mathematicians give to a set of coordinates. (More or less. That’s not quite right, but don’t worry about it yet.)

After the above incantation, I can create a vec by calling it like a function. Note that the arguments ultimately arrive in vec:init, loosely called a constructor, which stores them in self.x and self.y — where self is the vec being created.

-- this is example code, not part of the game
local a = vec(1, 2)
print("x = "..a.x.." y = "..a.y)  -- x = 1 y = 2

That iadd thing is a method, a special function that I can call on a vec. It’s like every vec carries around its own little bag of functions anywhere it appears — and since they’re specific to vec, I don’t have to worry about reusing names. (In fact, reusing names can be very helpful, as we’ll see later!)

The name iadd is (very!) short for “in-place add”, suggesting that the first vector adds the second vector to itself rather than creating a new third vector. That’s something I expect to be doing a lot, and making a method for it saves me some precious tokens.

-- example code
local v = vec(1, 2)
local w = vec(3, 4)
v:iadd(w)
print("x = "..v.x.." y = "..v.y)  -- x = 4 y = 6

Finally, those funny __add and __sub methods are special to Lua (if enchanted correctly, which is part of what the obj gobbledygook does) — they let me use + and - on my vecs just like they were numbers.

-- example code
local q = vec(1, 2)
local r = vec(3, 4)
local s = q + r
print("x = "..s.x.." y = "..s.y)  -- x = 4 y = 6

This is the core idea of objects. A vec has some data — x and y — and some code — for adding another vec to itself. If I later discover some new thing I want a vec to be able to do, I can add another method here, and it’ll be available on every vec throughout my game. I can repeat myself a little bit less, and I can keep these related ideas together, separate from everything else.
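To make that concrete, here is a minimal sketch in plain Lua (so string.sub rather than PICO-8's sub, and no +=). The scaled method is a hypothetical addition, not something from the game. The point is only that a vec created before the method existed still picks it up.

```lua
local function nop(...) return ... end

-- plain-Lua port of the obj snippet above
local obj = {init = nop}
obj.__index = obj

function obj:__call(...)
    local o = setmetatable({}, self)
    return o, o:init(...)
end

function obj:extend(proto)
    proto = proto or {}
    for k, v in pairs(self) do
        if string.sub(k, 1, 2) == "__" then
            proto[k] = v
        end
    end
    proto.__index = proto
    proto.__super = self
    return setmetatable(proto, self)
end

local vec = obj:extend{}

function vec:init(x, y)
    self.x = x or 0
    self.y = y or 0
end

function vec:__add(v)
    return vec(self.x + v.x, self.y + v.y)
end

-- this vec is created before the new method is defined
local a = vec(1, 2)

-- hypothetical new ability: return a copy scaled by n
function vec:scaled(n)
    return vec(self.x * n, self.y * n)
end

local b = a:scaled(3)
print(b.x .. ", " .. b.y)  -- 3, 6
```

The method is looked up when it is called, not when the vec is created, so existing vecs do not need to be rebuilt after a method is added.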

Get the basic gist? I hope so, because I’ve really gotta get a move on here.

Objectifying Star Anise

Now that I have a way to define objects, I can turn Star Anise into one.

function b2n(b)
    return b and 1 or 0
end

local t = 0
local player

local anise_stand = {1, 2, 17, 18, 33, 34}
local anise_jump = {3, 2, 17, 18, 19, 35}
local anise = obj:extend{
    move = vec(),
    left = false,
}

function anise:init(pos)
    self.pos = pos
end

function anise:update()
    if self.move.x > 0 then
        self.left = false
    elseif self.move.x < 0 then
        self.left = true
    end

    self.pos:iadd(self.move)
end

function anise:draw()
    local pose = anise_stand
    if (self.move.x ~= 0 or self.move.y ~= 0) and t % 8 < 4 then
        pose = anise_jump
    end
    local y = self.pos.y
    local x0 = self.pos.x
    local dx = 8
    if self.left then
        dx = -8
        x0 += 8
    end
    local x = x0
    for i = 1, #pose do
        spr(pose[i], x, y, 1, 1, self.left)
        if i % 2 == 0 then
            x = x0
            y += 8
        else
            x += dx
        end
    end
end

function _init()
    player = anise(vec(64, 64))
end

function _update()
    t += 1
    t %= 120
    player.move = vec(
        b2n(btn(➡️)) - b2n(btn(⬅️)),
        b2n(btn(⬇️)) - b2n(btn(⬆️)))
    player:update()
end

function _draw()
    cls()
    player:draw()
end

What a mouthful! But for the most part, this is the same code as before, just rearranged. For example, the new anise:draw() method has basically been cut and pasted from my old _draw() — all except the cls() call, since that has nothing to do with drawing Star Anise.

I’ve combined the px and py variables into a single vector, pos (short for “position”), which I now have to refer to as self.pos — that’s so PICO-8 knows whose pos I’m talking about. After all, it’s theoretically possible for me to create more than one Star Anise now. I won’t, but PICO-8 doesn’t know that!

A Star Anise object is created and assigned to player when the game starts, and then _update() calls player:update() and _draw() calls player:draw() to get the same effects as before.

I did make one moderately dramatic change in this code. The wordy code I had for reading buttons has become much more compact and inscrutable, and the moving variable is gone. A big part of the reason for this is that I consider Star Anise’s movement to be part of himself, but reading input to be part of the game, so I wanted to split them up. That means moving is a bit awkward, since I previously updated it as part of reading input. Instead, I’ve turned Star Anise’s movement into another vector, which I set in _update() using this mouthful:

-- top-level
function b2n(b)
    return b and 1 or 0
end

-- in _update()
    player.move = vec(
        b2n(btn(➡️)) - b2n(btn(⬅️)),
        b2n(btn(⬇️)) - b2n(btn(⬆️)))

The b2n() function turns a button into a number, and I only use it here. It turns true into 1 and false into 0. Think of it as measuring “how much” the button is held down, from 0 to 1, except of course there can’t be any answer in the middle.

Unpacking that a bit further, b2n(btn(➡️)) - b2n(btn(⬅️)) means “how much we’re holding right, minus how much we’re holding left”. If the player is only holding the right button, that’s 1 – 0 = 1. If they’re only holding the left button, that’s 0 – 1 = -1. If they’re holding both or neither, that’s 0. The results are the same as before, but the code is smaller.
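The four cases can be checked directly. This is a small sketch in plain Lua, with booleans standing in for the btn() results. The axis helper is a hypothetical name used only here.

```lua
local function b2n(b)
    return b and 1 or 0
end

-- hypothetical helper: the horizontal half of the expression in _update()
local function axis(right, left)
    return b2n(right) - b2n(left)
end

print(axis(true, false))   -- 1   (right only)
print(axis(false, true))   -- -1  (left only)
print(axis(true, true))    -- 0   (both)
print(axis(false, false))  -- 0   (neither)
```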

Once Star Anise’s move is set, the rest works similarly to before: I update left based on horizontal movement (but leave it alone when there isn’t any), I alter his position (now using :iadd()), and I use the walk animation when he’s moving at all. And that’s it!

From one to many

I like to use the term “actor” to refer to a distinct thing in the game world; it conjures a charming and concrete image of various characters performing on a stage. I think I picked it up from the Doom source code. “Entity” is more common and is used heavily in Unity, but can be confused with an “entity–component–system” setup, which Unity also supports. And then there are heretics who refer to game things as “objects” even though that’s also a programming term.

This code is a fine start, but it’s not quite what I want. There’s nothing here actually called an actor, for starters. My setup still only works for Star Anise!

I’d better fix that. The notion of an “actor” is pretty vague, so a generic actor won’t do much by itself, but it’s nice to define one as a template for how I expect real actors to work.

local actor = obj:extend{}

function actor:init(pos)
    self.pos = pos
end

function actor:update()
end

function actor:draw()
end

How does a blank actor update or draw itself? By doing nothing.

(I do assume that every actor has a position; this may not necessarily be the case in games with very broad ideas about what an “actor” is, but it’s reasonable enough for my purposes.)

Now, to link this with Star Anise, I’ll have anise inherit from actor. That means he’ll become a specialized kind of actor, and in particular, all the methods on actor will also appear on anise. You may notice that anise was previously a specialized kind of obj (like actor and vec) — in fact, the only reason I can call vec(x, y) like a function is that it inherits some magic stuff from obj. Surprise!

local anise = actor:extend{

I can now delete anise:init(), since it’s identical to actor:init(). I still have anise:update() and anise:draw(), which override the methods on actor, so those don’t need changing.
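As a sketch of what inheriting buys, here is a hypothetical second actor type in plain Lua. The rock type is made up for illustration, and pos is a bare table rather than a vec to keep the example short.

```lua
local function nop(...) return ... end

-- plain-Lua port of the obj snippet
local obj = {init = nop}
obj.__index = obj

function obj:__call(...)
    local o = setmetatable({}, self)
    return o, o:init(...)
end

function obj:extend(proto)
    proto = proto or {}
    for k, v in pairs(self) do
        if string.sub(k, 1, 2) == "__" then
            proto[k] = v
        end
    end
    proto.__index = proto
    proto.__super = self
    return setmetatable(proto, self)
end

local actor = obj:extend{}

function actor:init(pos)
    self.pos = pos
end

function actor:update()
end

function actor:draw()
end

-- hypothetical actor type: defines no init of its own
local rock = actor:extend{}

-- overrides actor:update()
function rock:update()
    self.pos.y = self.pos.y + 1
end

local r = rock({x = 10, y = 20})  -- actor.init stores pos
r:update()                        -- rock's override runs
r:draw()                          -- inherited do-nothing draw
print(r.pos.x .. ", " .. r.pos.y) -- 10, 21
```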

Everything still only works for Star Anise, but I’m getting closer! I only need one more change. Instead of having only player, I will make a list of actors.

-- at the top
local actors = {}

function _init()
    player = anise(vec(64, 64))
    add(actors, player)
end

function _update()
    -- ...mostly same as before...
    for actor in all(actors) do
        actor:update()
    end
end

function _draw()
    cls()
    for actor in all(actors) do
        actor:draw()
    end
end

This does pretty much what it reads like. The add() function, specific to PICO-8, adds an item to the end of a list. The all() function, also specific to PICO-8, helps go through a list. And the for blocks mean, for each thing in this list, run this code.
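For readers coming from ordinary Lua, here are rough stand-ins for those two built-ins. This is only an approximation for illustration: PICO-8's real add() and all() may behave differently in edge cases, such as removing items in the middle of a loop.

```lua
-- approximate plain-Lua versions of PICO-8's add() and all()
local function add(t, v)
    t[#t + 1] = v   -- append to the end of the list
    return v
end

local function all(t)
    local i = 0
    -- the for loop calls this repeatedly until it returns nil
    return function()
        i = i + 1
        return t[i]
    end
end

local names = {}
add(names, "anise")
add(names, "purrl")

local seen = {}
for name in all(names) do
    seen[#seen + 1] = name
end
print(table.concat(seen, ", "))  -- anise, purrl
```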

Now, at last, I have something that could work for actors other than Star Anise. All I need to do is define them and add them to the actors list, and they’ll automatically be updated and drawn, just like him!

Admittedly, this hasn’t gotten me anywhere concrete. The game still plays exactly the same as it did when I started. I’m betting that I’ll eventually have more than one actor, though, so I might as well lay the groundwork for that now while it’s easy. It doesn’t take much effort, and I find that if I give myself little early inroads like this, it feels like less of a slog to later come back and expand on the ideas. This is the sort of thing I meant by more structure revealing itself — once I have one actor, a natural next step is to allow for several actors.

Preparing for collision detection

I’ve put it off long enough. I can’t avoid it any longer. But it’s complicated enough to deserve its own post, so I don’t quite want to do it yet.

Instead, I’ll write as much code as possible except for the actual collision detection. There’s a bit more work to do to plug it in.

For example: what am I going to collide with? The only thing in the universe, currently, is Star Anise himself. It would be nice to have, say, some ground. And that’s a great excuse to toodle around a bit in the sprite editor.

A set of simple ground tiles, drawn in the PICO-8 sprite editor

I went through several iterations before landing on this. Star Anise lives on a moon, so that was my guiding principle. The moon is gray and dusty and pitted, so at first I tried drawing a tile with tiny craters in it. Unfortunately, that was a busy mess to look at when tiled, and I didn’t think I’d have enough tile space for having different variants of tiles. I’m already using 9 tiles here just to have neat edges.

And so I landed on this simple pattern with just enough texture to be reminiscent of something, which is all you really need with low-res sprite art. It worked out well enough to survive, nearly unchanged, all the way to the final game. It was inspired by a vague memory of Starbound’s moondust tiles, which I was pretty sure had diagonal striping, though I didn’t actually look at them to be sure.

You may notice I drew these on the second tab of sprites. I want to be able to find tiles quickly when drawing maps, so I thought I’d put “terrain” on a dedicated tab and reserve the first one for Star Anise, other actors, special effects, and other less-common tiles. That turned out to be a good idea.

You may also notice that one of those dots on the middle right is lit up. How mysterious! We’ll get to that next time.

With a few simple tiles drawn, I can sprinkle a couple in the map tab. I know I want Metroid-style discrete screens, so I’m not worried about camera scrolling yet; the top-left corner (16×16 tiles) is enough to play with for now.

I draw two rows of tiles at the bottom of that screen. It’s a little hard to gauge since the toolbar and status bar get in the way, but the bottom row of the screen will be at y = 15. You can also hold Spacebar to get a grid, with squares indicating every half-screen.

PICO-8's map editor, showing two rows of moon tiles

Finally, to make this appear in the game, I need only ask PICO-8 to draw the map before I draw actors on top of it.

function _draw()
    cls()
    map(0, 0, 0, 0, 32, 32)
    for actor in all(actors) do
        actor:draw()
    end
end

The PICO-8 map() function takes (at least) six arguments: the top-left corner of the map to start drawing from, measured in tiles; the top-left corner on the screen to draw to, measured in pixels; and the width/height of the rectangle to draw from the map, measured in tiles. This will draw a 32×32 block of tiles from the top-left corner of the map to the top-left corner of the screen.

Of course, with no collision detection, those tiles are nothing more than background pixels, and the game treats them as such.

Star Anise standing in front of the moon tiles

No problem. I can fix that. Sort of.

Not quite collision detection

I’m not going into collision detection yet, but I can give you a taste, to give you an idea of the goals.

The core of it comes down to this line, from the end of anise:update().

    self.pos:iadd(self.move)

That moves Star Anise by one pixel in each direction the player is holding. What I want to do is stop him when he hits something solid.

Hm, sounds hard. Let’s think for a moment about a simpler problem: how can I stop him falling through the ground, in the dumbest way possible?

The ground is flat, and it takes up the bottom two rows of tiles. That means its top edge is 14 tiles, or 112 pixels, below the top of the screen. Thus, Star Anise should not be able to move below that line.

But wait! Star Anise’s position is a single point at his top left, not even inside his helmet. What I really want is for his feet to not pass below that line, and the bottom of his feet is three tiles (24 pixels) below his position. Thus, his position should not pass below y = 112 – 24 = 88.

That sounds doable.

    self.pos:iadd(self.move)
    if self.pos.y > 88 then
        self.pos.y = 88
    end

And sure enough, it works!

Star Anise walking through the air, but not through the floor

This isn’t going to get us very far, of course. He still walks through the air, he can still walk off the screen, and if I change the terrain then the code won’t be right any more. I’m also pretty sure I didn’t actually write this in practice. But hopefully it gives you the teeniest idea of the problem we’re going to solve next time.
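The arithmetic behind that 88 can be spelled out with named constants. This is a plain-Lua sketch, not code from the game. All of the names are hypothetical, and it still assumes flat ground across the whole screen.

```lua
local tile_size = 8         -- pixels per tile (14 tiles = 112 pixels)
local screen_tiles = 16     -- one screen is 16×16 tiles
local ground_rows = 2       -- rows of ground at the bottom of the screen
local anise_tiles_tall = 3  -- his sprite is three tiles high

-- top edge of the ground, in pixels from the top of the screen
local ground_top = (screen_tiles - ground_rows) * tile_size

-- lowest allowed position for his top-left corner
local max_y = ground_top - anise_tiles_tall * tile_size

-- hypothetical helper doing the same job as the if block above
local function clamp_to_floor(y)
    if y > max_y then
        return max_y
    end
    return y
end

print(ground_top)           -- 112
print(max_y)                -- 88
print(clamp_to_floor(100))  -- 88
```

Changing ground_rows updates the limit, but real terrain with gaps or steps needs actual collision detection.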

Part 2: Collision → (coming soon!)

Appendix: the Lua object model

Really, really, really quickly, here’s how that obj snippet works.

Lua’s primary data structure is the table. It can be used to make ordered lists of things, as I did above with actors, but it can also be used for arbitrary mappings. I can assign some value to a particular key, then quickly look that key up again later. Kind of like a Rolodex.

local lunekos = {
    anise = "star anise is the best",
    purrl = "purrl is very lovely",
}
print(lunekos['anise'])

Note that the values (and keys!) don’t have to be strings; they can be anything you like, even other tables. But for string keys, you can do something special:

print(lunekos.anise)  -- same as above

Everywhere you see a dot (or colon) used in Lua, that’s actually looking up a string in a table.
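A short sketch of that equivalence, using a made-up table. The colon form is the dot form with the table itself passed as the first argument.

```lua
-- illustrative table, not from the game
local cat = {name = "anise"}

function cat.greet(self, greeting)
    return greeting .. ", " .. self.name
end

local a = cat.greet(cat, "hello")     -- dot: pass the table explicitly
local b = cat:greet("hello")          -- colon: the table is passed for you
local c = cat["greet"](cat, "hello")  -- what both of those boil down to

print(a)                   -- hello, anise
print(a == b and b == c)   -- true
```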

With me so far? Hope so.

Any Lua table can also be assigned a metatable, which is another table full of various magic stuff that affects the first table’s behavior. Most of the magic stuff takes the form of a special key, starting with two underscores, whose value is a function that will be called in particular circumstances. That function is then called a metamethod. (There’s a whole section on this in the Lua book, and a summary of metamethods on the Lua wiki.)

One common use for metamethods is to make normal Lua operators work on tables. For example, you can make a table that can be called like a function by providing the __call metamethod.

local t = {
    stuff = 5678,
}
local meta = {
    -- this is just a regular table key with a function for its value
    __call = function(tbl)
        print("my stuff is " .. tbl['stuff'])
    end,
}
setmetatable(t, meta)
t()  -- my stuff is 5678
t['stuff'] = "yoinky"
t()  -- my stuff is yoinky

One especially useful metamethod is __index, which is called when you try to read a key from the table, but the key doesn’t exist.

local counts = {
    apples = 5,
    bananas = 3,
}
setmetatable(counts, {
    __index = function(tbl, key)
        return 0
    end,
})
print(counts.bananas)  -- 3
print(counts.mangoes)  -- 0
print(counts.apples)  -- 5

Instead of a function, __index can also be another (third!) table, in which case the key will be looked up in that table instead. And if that table has a metatable with an __index, Lua will follow that too, and keep on going until it gets an answer.

This is essentially what’s called prototypal inheritance, as seen in JavaScript (and more subtly in Python): an object consists of its own values plus a prototype, and if code tries to fetch something from the object that doesn’t exist, the prototype is checked instead. Since the prototype might have its own prototype, the whole sequence is called the prototype chain.
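Here is a tiny chain built by hand, as a sketch with made-up tables. Each table falls back to the next one through its metatable's __index.

```lua
-- three illustrative tables linked into a chain
local grandparent = {color = "gray"}
local parent = setmetatable({size = "small"}, {__index = grandparent})
local child = setmetatable({name = "twig"}, {__index = parent})

print(child.name)    -- twig   (child's own key)
print(child.size)    -- small  (found one step up the chain)
print(child.color)   -- gray   (found two steps up)
print(child.weight)  -- nil    (nobody in the chain has it)

-- rawget skips metatables, showing what child really contains
print(rawget(child, "color"))  -- nil
```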

That’s all you need to know to follow the obj snippet, so here it is again.

function nop(...) return ... end

local obj = {init = nop}
obj.__index = obj

function obj:__call(...)
    local o = setmetatable({}, self)
    return o, o:init(...)
end

-- subclassing
function obj:extend(proto)
    proto = proto or {}

    -- copy meta values, since lua doesn't walk the prototype chain to find them
    for k, v in pairs(self) do
        if sub(k, 1, 2) == "__" then
            proto[k] = v
        end
    end

    proto.__index = proto
    proto.__super = self

    return setmetatable(proto, self)
end

The idea is that types are used both as metatables and prototypes — they are always their own __index. At first, we have only obj, which looks like this:

local obj = {
    init = nop,
    __index = obj,
    __call = function() ... end,
    extend = function() ... end,
}

Now we use obj:extend{} to create a new type. Follow along and see what happens. Lua only looks for metamethods like __call directly in the metatable and ignores __index, so I copy them into the new prototype. Then I make the prototype its own __index, as with obj, and also remember the “superclass” as __super (though I never end up using it). Finally I set the “superclass” as the prototype’s metatable.

(Oh, by the way: in Lua, if you call a function with only a single table or string literal as its argument, you can leave off the parentheses. So foo{} just means foo({}).)

That produces something like the following, noting that this is not quite real Lua syntax:

local vec = {
    __index = vec,
    __super = obj,
    __call = obj.__call,

    METATABLE = obj,
}

Remember this syntax?

function vec:init(x, y)
    self.x = x or 0
    self.y = y or 0
end

That is exactly equivalent to:

vec.init = function(self, x, y)
    self.x = x or 0
    self.y = y or 0
end

So after all is said and done, we have:

local vec = {
    __index = vec,
    __super = obj,
    __call = obj.__call,

    init = function() ... end,
    __add = function() ... end,
    __sub = function() ... end,
    iadd = function() ... end,

    METATABLE = obj,
}

Now for the magic part. When I call vec(), Lua checks the metatable. (The __call in the main table does nothing!) The metatable is obj, which does have a __call, so Lua calls that function and inserts vec as the first argument. Then obj.__call creates an empty table, assigns self (which is the first argument, so vec) as the empty table’s metatable, and calls the new table’s init method.

Ah, but the new table is empty, so it doesn’t have an init method. No problem: it has a metatable with an __index, so Lua consults that instead. The metatable’s __index is vec, and vec does contain an init, so that’s what gets called. (If there were no vec.init, then Lua would see that vec also has a metatable with an __index, and continue along. That’s why I didn’t need an anise.init.)

That’s also why defining vec:__add works — it puts the __add metamethod into vec, which becomes the metatable for all vector objects, thus automatically making + work on them.
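The difference between ordinary key lookups and metamethod lookups can be seen in isolation. This is a plain-Lua sketch with made-up tables (base, derived, inst) that shows why extend copies the double-underscore keys instead of relying on __index.

```lua
local base = {}
base.__index = base
base.__call = function() return "called" end

-- derived reaches base's ordinary keys through the __index chain
local derived = setmetatable({}, base)
derived.__index = derived

local inst = setmetatable({}, derived)

-- a plain key lookup follows the chain and finds base.__call
print(inst.__call == base.__call)  -- true

-- but the call operator only looks in derived itself, so this fails
local ok_before = pcall(inst)
print(ok_before)  -- false

-- copying the metamethod across, as the loop in extend does, fixes it
derived.__call = base.__call
local ok_after, result = pcall(inst)
print(ok_after)  -- true
print(result)    -- called
```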

That’s all there is to it. It’s possible to get much more elaborate with this in a number of ways, but this is the bare minimum — and it could still be trimmed down further.

Note that you can’t actually call obj itself. Pop quiz: why not?

Gamedev from scratch 0: Groundwork

Post Syndicated from Eevee original https://eev.ee/blog/2020/11/30/gamedev-from-scratch-0-groundwork/

You may recall that I once had the ambitious idea to write a book on game development, walking the reader through making simple games from scratch in a variety of different environments, starting from simple level editors and culminating in some “real” engine.

That never quite materialized. As it turns out, writing a book is a huge slog, publishers want almost all of the proceeds, and LaTeX is an endless rabbit hole of distractions that probably consumed more time than actually writing. Also, a book about programming with no copy/paste or animations or hyperlinks kind of sucks.

I thus present to you Plan B: a series of blog posts. This is a narrative reconstruction of a small game I made recently, Star Anise Chronicles: Oh No Wheres Twig??. It took me less than two weeks and I kept quite a few snapshots of the game’s progress, so you’ll get to see a somewhat realistic jaunt through the process of creating a small game from very nearly nothing.

And unlike your typical programming tutorial, I can guarantee that this won’t get you as far as a half-assed Mario clone and then abruptly end. The game has original art and sound, a title screen, an ending, cutscenes, dialogue, UI, and more — so this series will necessarily cover how all of that came about. I will tell you why I made particular decisions, mention planned features I cut, show you the tradeoffs I made, and confess when I made life harder for myself. You know, all the stuff you actually go through when doing game development (or, frankly, any kind of software development).

The target audience is (ideally) anyone who knows what a computer is, so hopefully you can follow along no matter what your experience level. Enjoy!


This is part zero, and it’s mostly introductory stuff. Please don’t skip it! I promise there’s some meat in the latter half.

Table of contents

Here’s what you have to look forward to (though it is of course a WIP until the series is done). Occasionally there’ll be a snapshot of the game, but these were made on a whim during development and aren’t particularly meaningful as milestones.

For reference, I started working on the game the morning of April 29, and I released it the night of May 10, for a total of twelve days.

  • Part 0 (you are here): introduction, tour of PICO-8, putting something on the screen, moving around, measuring time, simple sprite animation

Introduction

This is not a tutorial. Please set your expectations accordingly. Honestly, I don’t even like tutorials — too many of them are framed as something that will teach you a skill, but then only tell you what buttons to press to recreate what the author already made, with no insight as to why they made their decisions or even why they pressed those particular buttons. They often leave you hanging, with no clear next steps, no explanation of what to adjust to get different results.

I’ve never seen a platformer tutorial that actually produced a finished game. Most of them give you just enough to have a stock sprite (poorly) jump around on the screen, perhaps collect some coins, and that’s it. How do you fix the controls, add cutscenes, even make a damn title screen? That’s all left up to you.

This is something much better than a tutorial: a story. I made a video game — a real, complete video game — and I will tell you everything I can remember doing and thinking along the way. Every careful decision, every rushed tradeoff, every boneheaded mistake, every weird diversion. I don’t guarantee that anything I did is necessarily a good idea, but everything I did is an idea, and sometimes that’s all you need to get the gears turning.

If you’re interested in making video games, I don’t promise that this series will teach you anything. But with a little effort, you can probably learn something. And to be frank, if you’re starting with zero knowledge but still manage to muddle through the whole series, you’ve got more than enough curiosity and determination to succeed at whatever you feel like doing.

The game in question is Star Anise Chronicles: Oh No Wheres Twig??, which I made with the PICO-8. (If you are from the future, I specifically used version 0.2.0i; later versions may have added conveniences I’m not using.) This is not a whizbang fully-featured game engine like Godot or Unity. If I want to draw something, I have to draw it myself. If I want physics, I have to write them myself. If I want shaders… well, that’s not going to happen, but a little ingenuity can still go a long way.

And that kind of ingenuity is what makes game development appealing to me in the first place. It’s one big puzzle: given the tools I have, what’s the most interesting thing I can make with the least amount of hapless flailing? That question will come up a number of times in this series.

If any of this sounds appealing to you, keep reading! Follow along if you can. You can get the PICO-8 (tragically not open source) for $15, and chances are you already own it — it was in the itch.io BLM bundle, so if you bought that, you’re free to download it whenever you want.

Conventions

In order to replicate the experience of reading the book, I’m porting these little “admonition” boxes from what I’d started. I have a somewhat meandering writing style, and hopefully these will help get tangents out of the main text, while also better highlighting warnings and gotchas.

Here they are, in no particular order:

I reserve the right to invent more, if they’re needed and/or funny.

Setting expectations, again

Game development is about a lot more than programming, but this will contain an awful lot of programming. The PICO-8 in particular tends to blur the lines between code and assets if you want to do anything fancy.

That puts me in a tricky position as an author. I want this to be accessible to people with little or no programming experience, but I can’t realistically explain every single line of code I write, or this series will never end (and will be more noise than signal for intermediate programmers).

Thus, I’m trusting you to look up basic concepts on your own if you need to. I’m writing this to fill a perceived gap, so I’ll try to focus on the gaps — finding resources on from-scratch collision detection is a crapshoot, but the web is awash in explanations of what a “variable” is. PICO-8 uses a programming language called Lua which is pretty simple and easy to pick up, so if you’re having trouble, maybe thumb through the Programming in Lua book a bit too.

Of course, if you’re just here for the ride and not too worried about writing your own game, you can skip ahead whenever you like. I’m not your mom.

(Oh, and if you’ve used Lua before, you should know that PICO-8’s Lua has been modified from stock Lua. The precise list of changes would be a big block of stuff in the middle of this already too long intro, so I’ve put it at the bottom. The upshot is: numbers are fixed-point instead of floating-point, you can use compound assignment, and the standard library is almost completely different.)

That’s probably enough words with no pictures. Time to get started.

The PICO-8

A fresh PICO-8 window, with white old-school text on a small black screen and a command prompt

As mentioned, this is a game built with the PICO-8. I promised I’d tell you a story, but I can’t even explain why I chose PICO-8 if you don’t know what the thing is.

PICO-8 is a “fantasy console” — a genre that it pioneered. It has a fixed screen size, its own palette, its own font, a little chiptune synthesizer, its own idea of what buttons the player can press, and so on. It’s like an emulator for an 8-bit handheld that doesn’t actually exist, plus a bunch of relatively friendly tools for making cartridges for that handheld. It even has some arbitrary limitations to preserve that aesthetic. (I carefully avoid calling them artificial limitations, because there are some technical reasons for them, and a lot of programmers do a thing with their face if you say “artificial” to them. Like you’ve just spat in their lunch.)

If you’ve got PICO-8 open, you can type splore at this little command prompt to open the cartridge explorer, which lets you download and play cartridges that have been posted to the PICO-8 BBS (forum). You might want to try a few to get a sense of what the PICO-8 can do, though bear in mind that some of the best games are incredible feats of ingenuity and not representative. A good place to start is the “featured” tab, which lists games that… I believe have been hand-picked as high-quality? Some suggestions:

  • Star Anise Chronicles: Oh No Wheres Twig is in there, as is our older (and first!) game Under Construction.

  • The original PICO-8 version of Celeste, if you weren’t aware of its origins.

  • Dusk Child, one of the earliest games I played and a big inspiration — it’s pretty and expansive, but doesn’t do anything I couldn’t figure out.

  • Just One Boss, which is just so damn crisp.

  • Dank Tomb, a dungeon crawler with absolutely beautiful lighting effects.

  • PicoHot, which is absolute fucking nonsense how dare you.

Note that when playing most games, the PICO-8 functions as though it only had six buttons: a directional pad bound to the arrow keys, and “O” and “X” buttons bound to the Z and X keys. Most games refer to those buttons by name (the PICO-8 font has built-in symbols for them) rather than keyboard key, since you might be playing on a controller or with some other bindings. You can always press Esc for the built-in menu.

Had fun? Great! Pressing Esc takes you back to the prompt. From there, you can press Esc again to switch to the editor (and vice versa).

Now, this is not a PICO-8 tutorial. But the PICO-8’s design and constraints immensely impact how much I could do and how I planned to do it, so I can’t very well explain my thought process without that context. Luckily, all the code and assets for the last game you played stay loaded, so I might as well give you the whirlwind tour. Even if you’re not following along with an actual copy of PICO-8, you should keep reading so you understand what I’ve got to work with.

Code editor

A very small text editor, populated with code

This is the code editor, a very tiny text editor. If you’ve loaded Under Construction, feel free to page through and see what I did. (Keyboard shortcuts help a lot; see the manual for a full list of them. There are also some cheat sheets floating around, though they focus more on programming capabilities.)

You may have noticed the ominous 7695/8192 in the bottom right. That’s hinting at one of the PICO-8’s limitations: the token count. A cartridge’s source code cannot exceed 8192 tokens, or it will not run at all. A “token” is, in general terms, a single “word” of code — a number like 133, a name like animframedelay, an operator like +, a keyword like function, and so on. The term “token” is borrowed from the field of parsing, which is an entire tangent you are free to look up yourself.

The PICO-8’s definition of “token” is slightly different from its typical usage and includes a few exceptions. The common Lua keywords local and end don’t count at all; nor do commas, periods, semicolons, or comments. A string of any length is one token. A pair of parentheses, brackets, or braces only counts as one token. Negative literal numbers (e.g., -25) are one token.

The token limit is the most oppressive of the limits on your code, but there are two others. The full size of your code cannot exceed 64KiB, though in practice I’ve never come anywhere near that size and I think you’d only approach it if you were committing some serious shenanigans. Of more concern, the compressed size of your code cannot exceed 15,616 bytes. I do wind up battling that one near the end of this project (as I did with Under Construction), and it can be extra frustrating since it’s hard to gauge exactly what impact any particular change will have on compression. Thankfully, and unlike with the token limit, the PICO-8 will still run a game that’s over the compressed size; it just physically cannot export it to a cartridge.

Incidentally, you can use Alt and an arrow key to move between the editors.

Sprite editor

A very small sprite editor, showing the mole player character from Under Construction

Here we have a tiny pixel art editor. As you might have guessed, the “native” size for a tile is 8 × 8 pixels, though you can use the bottom of the two sliders to edit bigger blocks of tiles at a time. (The screen is 128 × 128 pixels, or 16 × 16 tiles.) You have at your disposal a spritesheet of 256 such tiles, which are arranged at the bottom of the screen in four tabs of 64 tiles each. 001 here is the tile number. Each tile has its own set of 8 flags you can toggle on and off, which are represented by the eight circles just above the tabs; here, all the flags are off. The flags do nothing by themselves, but you can use them for whatever you like, and they turn out to be pretty handy.

The palette is 16 colors, as shown. There are 16 more colors on the “secret palette” which I’ll be dipping into later, but you can only swap them in; you can never have more than 16 distinct colors on screen at the same time. This is reminiscent of how some early systems actually worked.

Map editor

A very small map editor, showing the upper left of a cave-like area from Under Construction

The map editor edits the map. You only get one; if you want to carve it up somehow, that’s up to you. It’s extremely simple: you have a grid of 128 × 64 tiles (that’s 8 × 4 screenfuls), and you can pick which tile goes in each cell. No layers, no stacking, no two things in the same cell. You can pan around with the middle mouse button and zoom with the mouse wheel (or check the manual for the keyboard equivalents).

The especially nice thing about the map is that you can draw entire blocks of it with the built-in map function, which saves a whole lot of tokens over drawing a bunch of tiles by hand. Even if you’re making a game that doesn’t have a literal map, it’s a convenient way to define and draw blocks of multiple tiles.

The catch is that the bottom half of the spritesheet and the bottom half of the map are shared, so you can’t actually have a full map and a full set of tiles in the same cartridge. You could have a full 8 × 4 map and 128 tiles, or you could have a full set of 256 tiles but only an 8 × 2 map, or you can split the space up somehow, but you can’t have the maximum of both. Drawing in the bottom half of one will immediately update the other with garbage. It’s beautiful, actually, if you’re into the aesthetic of arbitrary memory being drawn as tiles.

If you have a cartridge open, you can see this yourself: check out the bottom half of the map (it helps to use Tab or the buttons in the upper left to hide the tile palette) and tabs 2 and 3 of the sprite editor. If they’re not both completely empty, something will be full of garbage. Try drawing in one or the other, if you like, and you’ll see the other update with junk. That’s the memory layout of pixel data being interpreted as map data, or vice versa. Cool, right?
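
To put rough numbers on that overlap, here is a small sketch in plain Lua. It assumes the memory layout given in the PICO-8 manual, where the shared region starts at the first tile of spritesheet row 8 and at map row 32; the function name is made up for illustration.

```lua
-- assumption: layout as described in the pico-8 manual.
-- the spritesheet is 128 pixels wide at 4 bits per pixel, so one
-- row of pixels is 64 bytes and one row of 8x8 tiles is 512 bytes.
-- a map row is 128 cells at one byte each, so a row of tiles
-- covers the same bytes as 4 map rows (512 / 128).
local map_rows_per_sprite_row = 4

-- map_rows_sharing is a hypothetical helper: given a row of tiles
-- in the bottom half of the spritesheet (8 through 15), return
-- the first and last map rows stored in the same bytes.
function map_rows_sharing(sprite_row)
    local first = 32 + (sprite_row - 8) * map_rows_per_sprite_row
    return first, first + map_rows_per_sprite_row - 1
end
```

Under those assumptions, the first row of tab 2 in the sprite editor shares its bytes with map rows 32 through 35, and each further row of tiles eats another four map rows.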

Sound editor

A very small sound editor, showing a sound as bars representing pitch
The same sound, but shown using a tracker-like interface

The sound editor (or SFX editor) does a lot, despite being very simple conceptually, and it can be a little intimidating if you’ve never worked with sound or music before. These screenshots are the two display modes, “pitch mode” and “tracker mode” — allegedly pitch mode is more suitable for sound effects and tracker mode is more suitable for music, but I honestly have no idea how anyone does anything in pitch mode, and I use tracker mode for both. Your mileage may vary. As with the map editor, use Tab or the buttons in the top-left to switch views.

There are 64 sound effects to work with, each consisting of 32 notes played by a little chiptune synth. Notes consist of a pitch (i.e., the actual note being played), an instrument, the volume, and an optional effect.

I could say an awful lot about sound and chiptunes and what any of this means, but this is not a chiptuning tutorial, so I’ll save that for when I actually make some sounds for the game. Do feel free to mess around here, though.

There’s also a music editor, but all it does is arrange several sound effects to play at the same time, so it’s not especially interesting.

And that’s everything at my disposal! I guess that means it’s time to get started, for real. Go back to the command prompt and use reboot to get a fresh blank cartridge, if you’re planning on following along.

Inspiration

The first step to making a game is having a game you want to make.

I started on this at the end of April, after a very rushed month spent preparing the Steam release of Cherry Kisses. I was pretty pumped about having just published something in a very visible place for the first time, and I wanted to keep that energy going, but I didn’t want to immediately jump into an even larger thing. I wanted to make something small, something self-contained, something I could do entirely on my own. (My spouse is the better artist by far, and they did all the art for Cherry Kisses.)

The PICO-8 came to mind as the obvious platform to use. For one, the limitations make it very difficult for a game’s scope to balloon very far; you will simply run out of space and have to cut some ideas. For two, the art and audio are fairly low-resolution, so I wouldn’t have much opportunity to endlessly fuss over trying to make them perfect. For three, it runs in a browser, even on phones, so the resulting game would be easy for anyone to play. (Having to download a thing will discourage a surprising amount of casual passersby, especially if the thing is fairly small and thus low-reward.)

I also just find the PICO-8 endlessly charming, and I hadn’t touched it in a couple years and was curious how it had improved in the interim. It’s great for a game started on a whim, too, since I can jump in and start slapping stuff on the screen without worrying that my ADHD brain will start fretting over how everything should be organized.

That only left the question of what to make.

Two and a half years prior — almost three, now — I’d started on a platformer where you played as Star Anise, my cat’s fursona. It was intended to be a goofy Metroidvania where you collected cat-themed powers, ran around defeating little monsters, collected useless garbage, and generally left a trail of minor mayhem in your wake. Sadly, it was interrupted by real-life events and we haven’t touched it since.

A clip of a pastel game where a small cat meows loudly and shoots a bubble gun that knocks jars off of shelves.

I loved how this game was shaping up! It was so goofy, but its goofiness really opened up the design. Star Anise is great to build a game around. I can give him all manner of strong yet absurd motivations, and as long as I tie them to something vaguely cat-themed, they’ll be memorable and feel sensible. I can load him up with goofy cat-themed powers without needing any kind of justification, because he’s a cat, and everyone knows cats are basically magic anyway. He has a group of friends already built in: other cats. And most importantly, he’s just fun to play as, because everything he does is ridiculous and overboard, but you never have to feel guilty about his mischief because he’s a cat.

It’s such a good hook. I’ve wanted to make a whole series of little Star Anise games, but the furthest I’d gotten so far was Star Anise Chronicles: Escape from the Chamber of Despair — which is good, but is also a text adventure, one of the most impenetrable genres imaginable.

So why not take another crack at it? I couldn’t fit the entire original vision into a PICO-8 game, but surely I’d have enough room for Star Anise, a few of the abilities we’d come up with, and some things to interact with. At long last, a Star Anise platformer.

You could say the stars aligned. The stars. Get it? Like Star Anise. Okay.

From zero to something

Before I could do anything, I needed some art. Okay, that’s not true; I could have boxes moving around on the screen, but I’ve done this enough that I am beyond tired of boxes. If I’m gonna make a Star Anise game then I want to have Star Anise on the damn screen right from the start.

And right away I had to make some decisions. I wanted this to be a little bit Metroidvania style, where Star Anise gained his handful of powers throughout the game and could then explore new areas.

That meant I wanted as much map space as humanly possible, so from the very beginning I knew the sprite/map split I wanted: all map. 32 screens, but only 128 sprites.

And that made several other decisions, automatically. I probably wouldn’t have enough sprite space to include a gun and enemies and whatnot, but a puzzler would let me skip all of that.

This is why I chose PICO-8! The game basically decided its own design with only minimal input from me. Puzzle platformer with some powerups.

Now, to draw Star Anise, which meant deciding how big he should be. A very conspicuous part of his design is his huge helmet, which wouldn’t fit especially well in a single 8×8 tile, or even in two of them stacked. I decided to go one bigger and make a 2×3 block.

A charming little Star Anise sprite, with some extra bits next to him

This wasn’t especially complicated to draw. At this size, it feels like a lot of the sprites draw themselves, too. It did help that I’d already seen my spouse’s interpretation of Star Anise from the prototype game above, but I think the general lesson there is to look at existing art that’s similar to what you want to draw and reverse-engineer the bits that make it work. Here, I made a big circle, squeezed in the narrowest possible face — a pixel each for the eyes, then three pixels for spacing — and gave him a rectangle for his body. Toss a couple stars into the inside of the helmet and, presto, that’s Star Anise.

You might be wondering about those weird extra tiles on the side! I’ll get to those in a moment.

With Star Anise drawn, the obvious first thing is to put him on the dang screen.

function _init()
end

function _update()
end

function _draw()
    cls()
    spr(1, 64, 64, 2, 3)
end

Some explanation may be in order. For starters, a “function” is a block of code that can be used repeatedly. (But then, this is not a programming tutorial.) These particular functions are special to the PICO-8: _init runs when the cartridge starts, _update runs every frame, and _draw also runs every frame.

What’s a frame, you ask? Well, you know how movies aren’t really showing movement, but are more like a very fast slideshow? Real life is “continuous”; that is, events occur smoothly over time, so when an object moves, it goes through every point between where it started and where it ends up. But we have no way to record that motion in full, because that would be an infinite amount of information! The best we can do is take a lot of snapshots very close together. And it turns out our eyes also work with snapshots (more or less), so it works well enough.

Likewise, simulating continuous behavior is extremely difficult, so video games tend to cheat the same way. We slice time into thin chunks — also called frames — and during each one, we move everything in the world ahead by that amount of time. If frames are short enough, you get the illusion that the world is behaving smoothly. Surprise! It’s all fake.

Modern games can (or should) deal with a varying frame rate, where each frame is a slightly (or greatly) different duration for any of myriad reasons. Since the PICO-8 is a faux-retro console, I’ll be using the retro term tic. It means the same thing, but it’s sometimes used for older systems where the framerate is reliably fixed, usually because it’s tied to (or even enforced by) hardware somewhere. Here it’s just emulated, but, you know, close enough.

Right, so, back to the PICO-8 itself. Every tic (of which there are 30 per second), the PICO-8 does two things: it calls _update to advance the game, then it calls _draw to draw the new state of the game to the screen. You might immediately wonder: why have these be separate if they happen one after the other anyway? Great question! The answer is that the PICO-8 does something clever — if it notices that the _update + _draw combination is taking longer than one tic (and the game is thus starting to lag), it will automatically drop down to 15 FPS. In this mode, it will call _update twice and then call _draw. Here is a terrible ASCII diagram.

        | tic                   | tic                   |
--------+-----------------------+-----------------------+
30 FPS: | _update() _draw()     | _update() _draw()     |
--------+-----------------------+-----------------------+
15 FPS: | _update() _update() _draw()                   |

As you can see, the game still updates twice in the same amount of time, so it still runs at the same speed, but it only draws half as often. With any luck, that saves enough effort that the game can keep running at the intended speed.
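
The diagram can also be read as a loop. What follows is only a simplified model of the scheduling described above, written in plain Lua; it is not how PICO-8 is actually implemented, and run_tics is a made-up name.

```lua
-- simplified model, not pico-8's real scheduler.
-- runs the given number of tics, calling update once per tic and
-- draw either after every update (30 fps) or after every second
-- update (15 fps, when lagging is true).
function run_tics(tics, lagging, update, draw)
    local pending = 0
    for i = 1, tics do
        update()
        pending = pending + 1
        if not lagging or pending == 2 then
            draw()
            pending = 0
        end
    end
end

-- count the calls over four tics in the 15 fps mode
local updates, draws = 0, 0
run_tics(4, true,
    function() updates = updates + 1 end,
    function() draws = draws + 1 end)
-- updates is now 4 and draws is 2
```

The point of the model is that the number of update calls depends only on how many tics have passed, so game logic runs at the same speed in both modes.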

All of that is to say: the _draw function draws to the screen.

The first thing you (usually) want to do in _draw is clear the screen, which is accomplished by the charmingly terse cls(). If you don’t do this, your game will merrily draw right on top of whatever was on the screen previously: the prompt, a previous game, even the code editor.

After that, I called spr() to draw Star Anise. The usual arguments are spr(n, x, y), where n is the sprite number (visible near the middle of the screen in the sprite editor) and x, y say where to place him. He’s made up of six tiles, and you might think that drawing six tiles would thus require calling spr() six times, but it helpfully takes two more optional arguments: the width and height, in tiles, of a single rectangle to take from the spritesheet. The above code thus draws a block 2 tiles wide and 3 tiles tall, starting from tile 1, at the coordinates (64, 64), the center of the screen.

As is programming tradition, sprites are drawn from their top-left corner, so the initial tile is the top-left of the rectangle that gets drawn, and the coordinates are where the top-left of the drawn rectangle appears on screen. Thus, Star Anise appears with his top left “corner” in the middle of the screen.
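
If the goal were to center the whole sprite instead, the top-left anchor means shifting up and left by half the sprite's size. A minimal sketch, assuming the 128 × 128 screen and 8 × 8 tiles described earlier; centered_topleft is a hypothetical helper, not something the game uses.

```lua
-- hypothetical helper, not part of the game's code.
-- returns the top-left position that centers a block of
-- w by h tiles on the 128x128 screen.
function centered_topleft(w, h)
    -- half the screen is 64 pixels; half a tile is 4 pixels
    return 64 - w * 4, 64 - h * 4
end
-- for the 2x3 star anise block this gives 56, 52
```

Passing those two numbers as the x and y arguments to spr() would put the middle of the 2-by-3 block at the middle of the screen.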

Star Anise standing near the middle of the screen, as promised

There he is! How immensely satisfying. I always try to get something “real” drawing as early as humanly possible. It helps me feel like I’ve made some progress, like I’m working on a specific game and have made steps towards making it exist. This is already, quite clearly, a Star Anise game, but that wouldn’t be obvious if I’d started out with rectangles.

Now what? A good start would be to have him move around a bit. That’s easy enough if I introduce some state.

I do need to check what buttons the player is pressing, which I can do with btn(b), where b is the button… number. Left is button 0, right is button 1, up is button 2… but that makes for some unreadable garbage, so instead, let’s use a recently-introduced shortcut. If you hold Shift and press U, D, L, R, O, or X, the PICO-8 will insert a symbol representing that button. (I will be representing those symbols as ⬆️⬇️⬅️➡️🅾️❎, which is how the PICO-8 stores them on disk.)

That’s enough to move him around:

function _init()
end

local px = 64
local py = 64

function _update()
    if btn(⬆️) then
        py -= 1
    end
    if btn(⬇️) then
        py += 1
    end
    if btn(⬅️) then
        px -= 1
    end
    if btn(➡️) then
        px += 1
    end
end

function _draw()
    cls()
    spr(1, px, py, 2, 3)
end

Here I’ve put his position (still anchored at his top-left) into some variables, and during _update() I update them. (If you’re familiar with Lua, you may balk at += and -= — these are extensions added by PICO-8, and they save enough space that they’re definitely worth it.)
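
For comparison, the same four checks can be driven from a table. This is only a sketch of an alternative and not what the game does. It uses plain button numbers (down being 3 continues the numbering above), spells out the arithmetic without += so it also runs in stock Lua, and takes the button test as a parameter so it can be tried outside PICO-8. The names moves and step are made up.

```lua
-- each entry: button number, change in x, change in y
local moves = {
    {0, -1, 0}, -- left
    {1, 1, 0},  -- right
    {2, 0, -1}, -- up
    {3, 0, 1},  -- down
}

-- pressed is any function that answers "is button b held?";
-- inside pico-8 that would simply be btn.
function step(px, py, pressed)
    for i = 1, #moves do
        local m = moves[i]
        if pressed(m[1]) then
            px = px + m[2]
            py = py + m[3]
        end
    end
    return px, py
end
```

In a cartridge, the body of _update() would then shrink to px, py = step(px, py, btn). Whether that actually saves tokens over four plain if blocks is something to measure rather than assume.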

Star Anise sliding around the screen

This is already halfway to being a game — it does something when I press buttons! Excellent. But also weird. This doesn’t look like Star Anise is walking around; it looks like he’s a static image being dragged by an invisible cursor or something. A very easy aesthetic improvement would be to make him not moonwalk when moving left.

That’s easy enough; the spr() function takes two more optional arguments, indicating whether to flip the sprite horizontally and/or vertically. I can just slap those in when he’s moving left. Or, well, not quite — I want to flip him when the last direction he moved was left. If he moves left and then stops, or moves left and then up and down, he should still be facing left.

function _init()
end

local px = 64
local py = 64
local left = false

function _update()
    if btn(⬆️) then
        py -= 1
    end
    if btn(⬇️) then
        py += 1
    end
    if btn(⬅️) then
        px -= 1
        left = true
    end
    if btn(➡️) then
        px += 1
        left = false
    end
end

function _draw()
    cls()
    spr(1, px, py, 2, 3, left)
end

Star Anise sliding around the screen, but turning around when moving left

Making progress, but obviously he’d look a lot better if he were animated, right?

Which, finally, brings us back to those extra tiles I drew. They’re copies of Star Anise’s legs and antenna, lightly edited to look like he’s in mid-step. The legs are sticking out all the way, and the antenna is adjusted to be… positioned slightly differently, since it’s bouncy. It’s a bit rough, but I can touch it up later.

Star Anise's walk animation

Note that I’ve crammed as much movement into as little space as possible here. This is only a two-frame animation, so the leg movement is exaggerated to get the most bang for my buck. I don’t even duplicate the entirety of Star Anise for the other frame; instead, I only copied the tiles that change. That’ll make him more complicated to draw, but it does save me sprite space. Remember, I only have 128 tiles available, and 9 of them is already 7% gone. (Writing more code to save on limited asset space is, in my experience, a pretty common PICO-8 tactic.)

Unfortunately, this makes flipping his sprite somewhat more complicated. I can’t just use that argument to spr(), because— well, I’ll get to that in a second. Here’s the updated code.

local anise_stand = {1, 2, 17, 18, 33, 34}
local anise_jump = {3, 2, 17, 18, 19, 35}

function _init()
end

local t = 0
local px = 64
local py = 64
local left = false
local moving = false

function _update()
    t += 1
    t %= 120

    moving = false
    if btn(⬆️) then
        py -= 1
        moving = true
    end
    if btn(⬇️) then
        py += 1
        moving = true
    end
    if btn(⬅️) then
        px -= 1
        moving = true
        left = true
    end
    if btn(➡️) then
        px += 1
        moving = true
        left = false
    end
end

function _draw()
    cls()

    local pose = anise_stand
    if moving and t % 8 < 4 then
        pose = anise_jump
    end
    local y = py
    local x0 = px
    local dx = 8
    if left then
        dx = -8
        x0 += 8
    end
    local x = x0
    for i = 1, #pose do
        spr(pose[i], x, y, 1, 1, left)
        if i % 2 == 0 then
            x = x0
            y += 8
        else
            x += dx
        end
    end
end

That sure got longer in a hurry! A quick overview:

I’ve introduced a global called t to act as a clock. I intend to use this for animation and other global cycles, so I don’t care about the actual time — that’s why I take it mod 120.

If you’re not familiar, the % (or “modulus”) operator gives you the remainder after division. It’s super duper useful and I wish we taught it as a primitive math operation! You can think of it like “clock arithmetic” — if it’s 9 o’clock and you wait 4 hours, it becomes 1 o’clock, which is the remainder when you divide 9 + 4 by 12. Or you can think of it as removing all chunks of something — to convert the 24-hour “13 o’clock” to 12-hour, you remove all the 12s, leaving just 1 behind. Or you can think of it as coiling the entire number line into a circle, so after 11 you wrap around to 0 and start over. (That’s not quite how clocks work, but using 0–11 turns out to be much simpler than using 1–12.)
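
Here is the clock example as a few lines of plain Lua, in case seeing it run helps. clock_add is a made-up name, and it uses the 0–11 clock face from the last sentence above.

```lua
-- hypothetical helper: the hour shown on a 0-11 clock face after
-- waiting some number of hours.
function clock_add(hour, wait)
    return (hour + wait) % 12
end
-- clock_add(9, 4) gives 1: nine o'clock plus four hours
-- clock_add(11, 1) gives 0: one past 11 wraps around
-- in stock lua the result takes the sign of the right-hand side,
-- so clock_add(0, -1) gives 11 rather than -1
```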

The upshot here is that t will hit 119 and then wrap back around to zero, which is important because PICO-8 numbers can’t go any higher than 32767. If I left it to its own devices, it would still wrap around, but to the more cumbersome -32768. I don’t want a negative clock!
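
Stock Lua numbers don't overflow at that point, so the following is only an imitation of the wraparound, written so it can be tried outside PICO-8. wrap16 is a made-up name; PICO-8 does this inside its number type, not in a function.

```lua
-- plain-lua imitation of how the whole-number part of a pico-8
-- number wraps around at the edges of its range.
function wrap16(n)
    return (n + 32768) % 65536 - 32768
end
-- wrap16(32767) gives 32767, still in range
-- wrap16(32767 + 1) gives -32768, the negative clock to avoid
```

A clock kept mod 120 never gets anywhere near that edge, which is the whole point of wrapping it by hand.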

But why 120? Because I want to be able to divide the clock cycle into smaller animation cycles, and I can only do that evenly if the whole clock’s length is a multiple of the smaller cycle’s length. (On a more powerful system, I’d have a more elaborate animation setup, but that would cost more space and code than I’m willing to spend here.) Consider if I had a clock that wrapped around at 10, and I wanted an animation 3 tics long. I would use modulo 3 to shrink the clock, resulting in:

clock: | 0 1 2 3 4 5 6 7 8 9 | 0 1 2 3
frame: | 0 1 2 0 1 2 0 1 2 0 | 0 1 2 0

Whoops! Frame 0 will show twice in a row, intermittently, even seemingly at random. That’s not great. For the best chance of avoiding that problem without having to think too hard about it, I want a clock whose length is divisible by as much stuff as possible — a highly composite number. And, of course, 120 is one such number.
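
The hiccup is easy to reproduce in a few lines of plain Lua. This is just a sketch for checking the arithmetic; frames_shown and fits_clock are made-up names.

```lua
-- list the frame shown on each tic, for a clock that wraps at
-- clock_len and an animation that is cycle tics long.
function frames_shown(clock_len, cycle, tics)
    local frames = {}
    local t = 0
    for i = 1, tics do
        frames[i] = t % cycle
        t = (t + 1) % clock_len
    end
    return frames
end

-- an animation cycle stays even across the wrap only when it
-- divides the clock length exactly.
function fits_clock(clock_len, cycle)
    return clock_len % cycle == 0
end
```

frames_shown(10, 3, 12) produces 0 1 2 0 1 2 0 1 2 0 0 1, with the doubled 0 right where the clock wraps. fits_clock(120, 8) is true, as it is for cycle lengths 2, 3, 4, 5, 6, 10 and 12. A 7-tic cycle would still hiccup, so 120 is a good default, not a guarantee.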

Next, I track whether Star Anise is moving at all, so I know whether to play the walk animation. Note that I always assume he isn’t moving, and then correct myself if it turns out he is; otherwise, the new value of moving would persist into future tics and he’d never stop.

That brings me to the new drawing code, which is a little tricky, so here it is a bit at a time:

-- top of the file
local anise_stand = {1, 2, 17, 18, 33, 34}
local anise_jump = {3, 2, 17, 18, 19, 35}

    -- in _draw()
    local pose = anise_stand
    if moving and t % 8 < 4 then
        pose = anise_jump
    end

This decides which tiles I’m going to draw. I can’t draw the walking part (which I’ve called “jump” because it does look like a jump in isolation, and I’ll be reusing them for that later) as a single block with spr() like before, and I’d like to share the code, so both frames are now assembled from individual tiles.

Note that tiles 1, 2, 17, 18, 33, and 34 are exactly the ones I was drawing in a single spr() call before. (The numbers increase by 16 when jumping to the next row, which makes sense, because each row has 16 tiles in it.) The other set is similar, but it has the alternate tiles substituted in.
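
That add-16-per-row rule is enough to generate the standing list instead of typing it out. A small sketch, assuming the 16-tiles-per-row spritesheet described above; block_tiles is a hypothetical helper, not something in the game.

```lua
-- hypothetical helper: list the tile numbers in a block of
-- w by h tiles whose top-left tile is number first.
function block_tiles(first, w, h)
    local tiles = {}
    for row = 0, h - 1 do
        for col = 0, w - 1 do
            -- 16 tiles per spritesheet row
            tiles[#tiles + 1] = first + row * 16 + col
        end
    end
    return tiles
end
-- block_tiles(1, 2, 3) gives 1, 2, 17, 18, 33, 34
```

On a real cartridge the literal table is almost certainly cheaper in tokens, so treat this as a way to check the numbering and not as a suggestion to replace it.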

I only want to use the jump tiles if Star Anise is moving, and if t % 8 < 4. That % turns my 120-tic clock into an 8-tic clock, then checks if we’re in the first half of it. Essentially: if it’s before noon, show the alternate frame; otherwise, show the normal standing frame.

The use of a global timer does have some subtle drawbacks here. If I tap an arrow key to move Star Anise only very briefly, then he may or may not animate, depending on whether the tap happens to be during the “stand” or “jump” intervals. A more powerful system, where every animation kept track of its own time, would always briefly show him moving. (On the other hand, this is an interesting aesthetic in its own right that kinda complements the very low-res and exaggerated animation.)
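
For the curious, here is roughly what the per-animation alternative could look like. This is a sketch of the idea only, not code from the game, and every name in it is made up. The clock restarts whenever movement stops, so even a one-tic tap starts from the first frame of the walk cycle.

```lua
-- not what this game does: a sketch of an animation that keeps
-- its own clock. all names here are made up for illustration.
function new_anim(frame_len, frame_count)
    return {t = 0, frame_len = frame_len, frame_count = frame_count}
end

-- call once per tic; restarts the clock whenever movement stops
function anim_update(anim, moving)
    if moving then
        anim.t = (anim.t + 1) % (anim.frame_len * anim.frame_count)
    else
        anim.t = 0
    end
end

-- which frame of the cycle to show, counting from 0
function anim_frame(anim)
    return (anim.t - anim.t % anim.frame_len) / anim.frame_len
end
```

With new_anim(4, 2) and frame 0 mapped to the mid-step pose, the timing matches the t % 8 < 4 check, except that the cycle always begins when movement begins. The cost is a table and three functions, which is exactly the kind of space the global clock avoids spending.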

Next I need to draw the tiles, but we’ve come to the catch I mentioned before. When I draw Star Anise flipped, I’m now drawing him as a bunch of separate tiles. If I drew them in the same left-to-right order, then his left side would be flipped, and his right side would be flipped, but the whole image wouldn’t be. Er, just look at this picture.

Star Anise's walk frames, flipped one tile at a time

See? The tiles are arranged the same way, but each one is individually flipped, and the result is… not what I want. I’ll need to also draw the columns in reverse order. And that’s exactly what I do:

    local y = py
    local x0 = px
    local dx = 8
    if left then
        dx = -8
        x0 += 8
    end

Here I’m determining the start point and how far apart the tiles are. The variable names are fairly terse, for a couple of reasons: one, the PICO-8 screen is not very wide, so long variable names make code much harder to read; but also, math code tends to be easier to follow with shorter names anyway. I’ve even taken the naming conventions from math — the initial state of a variable is often written with a subscript zero (\(x_0\)) and a change is written with the Greek letter delta (\(\Delta x\)), so I’ve used the ASCII equivalents of those, x0 and dx.

I’m starting from Star Anise’s position, of course, and then each tile is 8 pixels right of the previous one… if he’s not flipped. If he is flipped, I want to move left, which will draw the tiles in reverse order. But that would change where he draws from, so to compensate, I also start drawing 8 pixels right of where I usually would. (Try to convince yourself that this is correct; on a flipped Star Anise, tile number 1 should draw 8 pixels right of his upper-left corner.)

    local x = x0
    for i = 1, #pose do
        spr(pose[i], x, y, 1, 1, left)
        if i % 2 == 0 then
            x = x0
            y += 8
        else
            x += dx
        end
    end

All that’s left to do is the drawing itself. For each tile in the pose list, I draw that tile. Each row is two tiles wide, so after every second tile, I reset the horizontal “cursor” (x) back to where it started and move down by one row’s worth of pixels. For any other tile, I just move horizontally by dx.
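If you want to check the layout without squinting at pixels, one option is to run the same loop but record positions instead of drawing. This is only a sketch: tile_positions is a hypothetical helper that isn't in the cart, the 40, 40 position is arbitrary, and I've written x = x + dx rather than += so it also runs in stock Lua.

```lua
-- hypothetical helper: same cursor logic as the drawing loop, but it
-- returns where each tile would go instead of calling spr()
function tile_positions(pose, px, py, left)
    local y = py
    local x0 = px
    local dx = 8
    if left then
        -- flipped: start in the right column and step leftwards
        dx = -8
        x0 = x0 + 8
    end
    local out = {}
    local x = x0
    for i = 1, #pose do
        out[i] = {tile = pose[i], x = x, y = y}
        if i % 2 == 0 then
            -- end of a two-tile row: back to the start, down one row
            x = x0
            y = y + 8
        else
            x = x + dx
        end
    end
    return out
end

pose = {1, 2, 17, 18, 33, 34}
facing_right = tile_positions(pose, 40, 40, false)
facing_left = tile_positions(pose, 40, 40, true)
```

Facing right, tile 1 lands at x = 40 and tile 2 at x = 48; facing left, they swap columns (tile 1 at 48, tile 2 at 40) while the rows stay put. Either way the sprite covers the same 16-pixel-wide area.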

The results are basically magic.

Star Anise walking around the screen and turning to face the way he's moving

And that’s a good place to pause for now. Yes, I know, we didn’t get very far, but this is part zero! It’s mostly a test of this series and its tone for me, and a test of fortitude for you. I hope you could follow along with the minor mathematical hijinks above, because next time it gets much worse — before I can do anything else at all, I have to write collision detection. Oh boy! Stay tuned! And always feel free to ask questions, of me or anyone else!

Appendix: PICO-8 Lua extensions

Here are all the modifications PICO-8 has made to the language (based on Lua 5.2). If you’ve never used Lua, keep in mind that these won’t carry over if you try to write Lua anywhere else. Some of these are advanced features, so if you have no idea what something means, that’s probably fine.

Spoilers: it’s mostly that the standard library has changed.

  • Numbers are signed 15.16 fixed-point, rather than stock Lua’s 64-bit floating point. That means fractions can only be represented in increments of 0.0000152587890625 (= \(2^{-16}\), a cumbersome number I refer to as the “Planck size”), and numbers can’t exceed ±32768.

  • Compound assignment is supported: a += b works as in a = a + b in stock Lua, where + can be replaced with any binary operator.

  • != is allowed as an alias for ~=.

  • if (foo) bar = 1 is shorthand for if foo then bar = 1 end. The parentheses are required, and the condition ends at the end of the line. (I strongly advise against using this unless you’re very desperate for space; it scans poorly and doesn’t even save tokens.)

  • The new @, %, and $ unary prefix operators read 1, 2, or 4 bytes from a memory address. (PICO-8’s memory, not system RAM!)

  • The ? unary prefix operator is equivalent to print. (I’ve never used it, and it’s not even directly documented.)

  • The built-in functions collectgarbage, dofile, error, pcall, require, select, and xpcall are not available (though the lack of select might be a bug).

    The built-in variables _G and _VERSION are not available.

    load has been replaced with a function that loads PICO-8 carts from files.

    print has been replaced with a drawing function, which prints a single string at a position on screen.

    tonumber and tostring have been replaced with tonum and tostr, which behave slightly differently (but tostr does still respect the __tostring metatable field).

    (assert, getmetatable, ipairs, next, pairs, rawequal, rawget, rawlen, rawset, setmetatable, and type still exist and work as in stock Lua.)

  • The coroutine library is not available, but most of its contents are exposed directly as cocreate, coresume, costatus, and yield. There is no equivalent for coroutine.running or coroutine.wrap.

  • The require function and package library are not available, though the #include syntax can be used to textually substitute the contents of a Lua file.

  • The string library is not available. Replacement string functions are: chr, ord, split, and sub.

  • The table library is not available. Replacement table functions are: add, del, deli, count, all, foreach. There is no built-in way to concatenate or sort a list.

  • The math library is not available. Replacement math functions are: max, min, mid, flr, ceil, sin, cos, atan2, sqrt, abs, rnd, srand. There is also an integer division operator, \.

  • The bit32 library is not available, but bitwise operations are available as both functions — band, bor, bxor, bnot, shl, shr, lshr, rotl, rotr — and operators — &, |, ^^, ~, <<, >>, >>>, <<>, >><.

  • The io library is not available. Running PICO-8 cartridges have no notion of a filesystem.

  • The os library is not available. Running PICO-8 cartridges have no direct access to the underlying operating system. (Some facilities are exposed through the “syscall” function stat, such as accessing the current UTC or local time.)

  • The debug library is not available.

  • A number of other new functions were added, though I won’t list them all here; they’re generally for drawing, working with assets, or interacting with the PICO-8’s faux hardware.
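
To get a feel for the fixed-point limitation in the first item, here is a rough sketch in stock Lua (not PICO-8 code, since it leans on math.floor) that snaps a value down to the nearest multiple of \(2^{-16}\). The names PLANCK and to_fixed are made up, and rounding down is an assumption; this also doesn't model what happens when a number goes past the ±32768 limit.

```lua
-- stock lua sketch, not pico-8: emulate the 2^-16 step size
PLANCK = 1 / 65536  -- 0.0000152587890625

-- hypothetical helper: snap x down to a multiple of PLANCK
-- (rounding down is an assumption made for this sketch)
function to_fixed(x)
    return math.floor(x * 65536) / 65536
end

half = to_fixed(0.5)           -- exactly representable, unchanged
tenth = to_fixed(0.1)          -- not representable, becomes 6553/65536
tiny = to_fixed(PLANCK / 2)    -- smaller than one step, becomes 0
```

Anything that is already a multiple of the Planck size comes through untouched; everything else loses a little bit off the end.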

Lexy’s Labyrinth

Post Syndicated from Eevee original https://eev.ee/release/2020/09/26/lexys-labyrinth/

Screenshot of a small tile-based puzzle with a number of different elements, taken from CCLP1

🔗 Lexy’s Labyrinth
🔗 Source code on GitHub
🔗 itch.io later

Here is Lexy’s Labyrinth, a web-based Chip’s Challenge emulator.

It’s easy to get into and mostly speaks for itself, so here is a story.


Once upon a time, there was a puzzle game called Chip’s Challenge. It was created in 1989 for the Atari Lynx, an early handheld that is probably best known for… uh… Chip’s Challenge. It stood out as a curious blend of Sokoban head-scratching with real-time action, and it was one of the first computer puzzle games that had a whole pile of different mechanics and relied on exploiting the interesting interactions between them[citation needed].

The game found wider recognition with its inclusion in Microsoft Entertainment Pack 4, and later the Best of Windows Entertainment Pack (charmingly abbreviated “BOWEP”).


That in itself is a curious story — numerous features of the Atari Lynx version were lost in translation, most notably that the Lynx version has the player and monsters slide smoothly between grid cells, whereas the Microsoft port has everything instantly snap from one cell to the next. Also conspicuous is the presence of several typos in level passwords, which are exactly consistent with a set of notes a player took about the Lynx game, but which would be impossible in a straight port — the Lynx level passwords weren’t manually set, but were generated on the fly by a PRNG.

Screenshot of the Microsoft edition of Chip's Challenge, showing the first level, courtesy of the BBC wiki

The most obvious explanation is that the developer responsible for the Microsoft port didn’t have access to the Lynx source code, and in fact, had never played the original game at all. That would explain nearly every major gameplay difference between the Lynx and Microsoft versions, which are all things you’d never notice if you only had static screenshots and maps to work from. Given that restriction, hey, not a bad job.


I played the BOWEP edition of Chip’s Challenge as a kid and was completely enamoured. I suppose what got me the most was the same thing that I found so compelling about Doom: the ability to modify your environment, whether by using blocks to clear water or toggling green blocks or generating new monsters from a clone machine. Being able to affect my environment in (more or less) free-form ways felt curiously powerful.

Well, let’s not think about that too hard. I’ll save it for my therapist.

Some years later I discovered an incredible tool called The Internet, and with it I learned of the impending Chip’s Challenge 2, a sequel with way more tiles and possibilities! Fantastic!

Unfortunately, there was a complication. Epyx, the original publisher of Chip’s Challenge, had gone bankrupt (somehow!) and had sold most of its assets, including the Chip’s Challenge rights, to a company called Bridgestone Media (now Alpha Omega Productions), a Christian propaganda distributor.

You read that correctly.

Bridgestone, a company that generally dealt in movies, had some very peculiar ideas about the video game industry. Apparently they expected the assets they’d acquired to magically make them filthy rich — you know, just like Jesus would want — despite having acquired them from a company that had just evaporated. As such, they told the original developer, Chuck Somerville, that he could only release Chip’s Challenge 2 if he paid them one million dollars upfront.

He did not have one million dollars, and so Chip’s Challenge 2 languished forever.

(At this point, in hindsight, I wonder why Chuck didn’t simply change the story and tileset and release the game under a different name. Apparently he did start on something like this some years later, in the form of an open clone from scratch called Puzzle Studio, but it was eventually abandoned in favor of Chuck’s Challenge 3D. But I still wonder: why start a brand new thing, rather than rebrand and release the existing thing?)

We did have some descriptions of new Chip’s Challenge 2 mechanics, and so at the ripe old age of 15, with no idea what I was doing, I decided I would simply write my own version of Chip’s Challenge 2.

In QBasic.

Also I didn’t really understand how to handle the passage of time, so the game was turn-based and had no monsters.

But, given all that, it wasn’t that bad. I found the source code a few years ago and put it on GitHub along with a sample level and a description of all the tiles you can use in the plaintext level format. I’ve got a prebuilt binary for DOS (usable in DosBox) too, if you like — just have a levels.txt in the same directory, and be sure it uses DOS line endings. I used to have one or two actual levels, but they have tragically been lost to the sands of time.

Screenshot of my QBasic implementation of Chip's Challenge, using all character-based graphics

That would’ve been 2002.


Thirteen years later, in April 2015, a miracle occurred and defeated the Christians. Chip’s Challenge 2 was released on Steam.

It was fine. I don’t know. Over a decade of anticipation gets your hopes up, maybe. It’s a perfectly good puzzle game, and I don’t want to dunk on it, but sometimes I interact with it and I feel all life drain from my body.

Screenshot of CC2, with an overlaid hint saying: "This is Melinda.  Being female, she does some things differently than Chip."
Screenshot of CC2, with an overlaid hint saying: "She doesn't slide when she steps on ice.  But she needs hiking boots to walk on dirt or gravel."

I don’t even know whether to talk about this completely unreadable way of showing hints or the utterly baffling justification of “being female” for these properties.

But it’s fine. The game was Windows-only, but it was old Windows-only, so Wine handled it perfectly well. I played through a few dozen levels. Passwords were gone, so you were free to skip over levels you just didn’t feel like playing.

And then they patched a level editor into the game, and it completely broke under Wine. Completely. Like, would not even run. It’s only in recent years that it even tries to run, and now it can’t draw the window and crashes if you attempt to do anything.

The funny thing is, apparently it doesn’t draw for some people on Windows, either. It doesn’t for me in a Windows VM. The official sanctioned solution is to… install… wined3d, a Windows port of the Wine implementation of Direct3D.

I don’t know. I don’t know! I don’t know what the hell anything. This situation is utterly baffling. What even are computers.


I gave up on the game until recently, when something reminded me of it and I tried it again in Wine. No luck, obviously. I spent half a day squabbling with bleeding-edge versions and Proton patches and all manner of other crap, then resorted to the Bit Busters Club Discord, but they couldn’t help me either.

And then something stirred, deep inside of me. This game wasn’t that complicated, right? I actually know how to make video games now. I even know how to make art, sort of. And sound. And music. And…


And here I am, a month later, having replicated Chip’s Challenge in a web browser, fueled entirely by some new emotion I’ve discovered that lies halfway between spite and exhaustion. My real goal was to clone Chip’s Challenge 2 so I can actually fucking play this game I bought, but it is of course a more complex game. Still, CC2 support is something like 60% done; most of what remains is wiring, tracks, and ghost/rover behavior.

CC1 support is more interesting, anyway — there are far more custom CC1 levels around, and Lexy’s Labyrinth exposes almost 600 of them a mere click away. Given that the original Microsoft port was 16-bit and is now difficult to run (and impossible to buy), and the official (free!) Steam release is fairly awkward and unmaintained (the dev mostly makes vague statements about “old code”), and even the favored emulator Tile World has the aesthetics and usability of a 1991 Unix application, I’m hoping this will make the Chip’s Challenge experience a little more accessible. It has a partially working level editor, too, which lets you share levels you make by simply passing around a URL, and I think that is fucking fantastic.

LL cannot currently load level packs from the Steam release, but it’s a high priority. In the meantime, if you really want to play the original levels (even though CCLP1 is far better in my experience), it’ll load CHIPS.DAT if you’ve got it lying around. Also, it works on phones!


Probably the most time-consuming parts of this project were the assets. I had to draw a whole tileset from scratch, including all of the CC2 tiles which you don’t even get to see yet (and a few of which aren’t actually done). That probably took a week, spread out over the course of the entire last month. Sound effects took several days, though they got much easier once I decided to give up on doing them by wiring LFOs together in SunVox and just use a bunch of BeepBox presets. I spent a couple days on my own music track, and half a dozen other kind souls chipped in their own music — thank you so much, everyone!

And thank you to the Bit Busters Club, whose incredibly detailed knowledge made it possible to match the behavior of a lot of obscure-but-important interactions. The Steam version of CC1 comes with solution replays, and LL can even play a significant number of them back without ever desyncing.

I’ve been ignoring pretty much everything else for a month to get this in a usable state, so I’d like to take a break from it for now, but I’d really like to get all of CC2 working when I can, and of course make the level editor fully functional. I love accessible modding tools, you don’t see many of them in games any more, and with any luck maybe it’ll inspire some other kid to get into game development later.


…okay, I haven’t been ignoring everything else. I also reused the tiles I drew for a fox flux minigame in a similar style, except that you place a limited set of tiles in empty spaces and then let the game run by itself. Kind of like… Chip’s Challenge meets The Incredible Machine.

Recording of a minigame, showing a drone character interacting with moving floors and following instructions on the ground

(That arrow tile has since been updated to be more clear, but it means “when you hit something, turn around instead of stopping and ending the game.”)

I guess two little puzzle game engines isn’t too bad for not quite a month of work!

fox flux, three years later

Post Syndicated from Eevee original https://eev.ee/dev/2020/08/04/fox-flux-three-years-later/

I’m working on a video game! Like, a serious one.

The past

I wrote the original game (very slightly NSFW) for my own “horny” game jam, Strawberry Jam (more likely to be NSFW), way back in February 2017.

You play as Lexy, my shameless Floraverse self-insert, who owns an enchanted collar that (among other things) makes her basically indestructible and allows her to easily transform into… whatever, given some kind of sensible trigger. And then you do some puzzle-platforming to collect “strawberry hearts” and gain access to new areas, much of which (surprise!) involves getting turned into things.

For example, this chain-link fence blocks you:

Screenshot of the player being stuck on one side of a fence

But if you let that green blob in the grass turn you into slime, you can walk right through it.

Screenshot of the same area, but the player is now green slime and free to pass through the fence

There are also spikes, which you get stuck on if you land on them… but slime can walk right through them, glass can stand on top of them, and stone outright destroys them. And so on. As a jam game, it’s not very expansive, but many of the puzzle elements interact differently with many of the handful of Lexy variants, which provided enough potential to make eight levels.

Post-jam

The jam game was rough, but I really liked the concept and wanted to expand on it. I spent a good chunk of the summer of 2017 on it, but it was a struggle. I was still fairly new to pretty much every aspect of actually creating a game — I’d only been drawing for two years, I’d sometimes hit big gaps in the design with no idea how to fill them, and I wasn’t yet entirely comfortable with complex physics or shaders. The art in particular was a huge problem; it took me a long time to produce sprites that I was only passably happy with. My spouse Ash is an artist, and we’ve made several games together where they produced all the art, but this was my idea and I was determined to draw it myself.

Then 2018 hit, which was a whole entire mess, and I didn’t really touch fox flux at all for over a year. I made a couple of other games with Ash, some finished, some not, and kept drawing intermittently.

I returned to fox flux for the middle of 2019, and decided… I’m not sure what I decided, exactly. I guess I’d gotten better at all the things that had been difficult for me before, so I set about trying to improve every aspect of the game at once.

  • I realized the (many, many) improved sprites I’d drawn in 2017 were not actually very good, and drew a new Lexy design from scratch that absolutely blew me away… which meant throwing away all the existing art.

  • I’d come up with a few new things for Lexy to turn into, each of which altered her behavior pretty significantly, and her code was becoming a spaghetti disaster. So I spent some time completely refactoring actors into bags of components, which I was unsure about until very recently and which ended up breaking pretty much every single object in the game, sometimes in subtle ways.

  • I decided to add water, which unraveled into a whole pile of decisions and problems.

  • I tried to make consistent or interesting physics for pushing things (e.g. wooden crates), and that became a nightmare. I easily spent weeks on this, trapped in a cycle of finding some edge case that couldn’t be fixed without considerably expanding what I was simulating, struggling to do that expansion while keeping all the basic stuff working, and then finding a new and different edge case.

Did I mention that I tried to do all of these things at the same time, while also trying to nail down the design of a game that’s naturally prone to a combinatoric explosion of interactions?

At a certain point it just felt hopeless. I’d poured easily over a year into this game, and all I had to show for it was a jumbled pile of stuff that didn’t work, strewn about a couple test maps that didn’t even contain any puzzles.

The present

I don’t know what happened, exactly. I’d given up on the heavily-simulated push physics last year, at least, so that wasn’t so much of a concern any more. But I still had a mess. I’d long since written git status off as unusable.

Until this past month, when I sat down and just started powering through the mess. One by one, I fixed the serious breakages that the component refactor had caused. I dedicated a day or two just to figuring out water physics, put a little more thought into it, and ended up with something that looks and plays quite nicely. I finished redrawing basic Lexy, and even added frames I hadn’t had before.

I think the difference was… fear. I’d previously hesitated so much, both in the art and the gnarlier code. It was such a struggle to get something working at all that changing it in any way was terrifying — what if I broke it and couldn’t even get it back to how it’d been?

I don’t know how to describe exactly how this felt, and I also don’t know how to explain what changed. It was like a switch flipped. I think it started when I drew new dirt tiles, and it didn’t even take that long, and I loved them. I’ve always had a hard time drawing terrain, and for once I just sat down and did it and it came out well and it looked like mine, like my style, which was a thing I hadn’t even really grasped I have before. After that I just cranked out a mountain of new sprite art, faster and better than anything I’d done before. Like I’d been accumulating XP over the past few years and just now decided to spend it all on levelling up.

Over the past six weeks, I have:

  • Redesigned the terrain
  • Vastly improved the palette
  • Completely finished redrawing Lexy
  • Redesigned the HUD
  • Mocked up a new dialogue layout
  • Drawn a new font
  • Drawn and implemented new consistent level entrances
  • Animated a treasure chest opening cutscene
  • Animated getting a key
  • Added a completely new tally at the end of a level
  • Added transitions for entering and leaving levels
  • Added swimming behavior
  • Redrawn the old gecko as a much more visible bananalizard
  • Animated the hearts and several other pickups
  • Ported the original forest levels to use all the new stuff
  • I don’t even know there has been just so much

Just look at the style evolution! God damn.

Three versions of Lexy in dirt tiles; over time, the style becomes more colorful and relies on stronger shapes and silhouettes

Here’s that same level from above:

Slime Lexy once again passing freely through the fence, but using newer assets

A lot of the last few weeks went towards level transitions, which previously… kind of worked. They were always a hasty jam hack that I never liked; there was a quick screen fade when going through a door, there was barely any notion of being “in a level” vs not, and the game even counted the fucking hearts in a level on the fly the first time you entered it. It was all very silly.

But now (please pardon the occasional frame drops from my screen recorder):

GIF of Lexy entering a level with a transition, collecting candy, exiting with another transition, and seeing the level tally

I finally feel like I’m making some real progress. I finally feel like this could be something I take seriously, that it could be a real game, something more than half an hour long. At some point it just became an absolute joy to look at and run around in.

The idea

The basic concept is the same, but I want to add some structure to it. The jam game was four single-room levels you could tackle in any order without much guidance, then another set of the same. Which is fine, but doesn’t give me much wiggle room in the design.

In the full game, levels will contain not just hearts, but also a treasure (a la Wario Land 3), some amount of candy (usable at the shop to buy things of some description), and an explicit exit. The overworld will function a bit more like a world map, and though you’ll still need to collect N hearts to get to the next zone, there may sometimes be obstacles that can only be overcome by finding the right treasure in a level.

I also intend to give Lexy some active abilities, for example this blown kiss (recorded with older art) that can toggle pink objects between two states:

Lexy blows a kiss towards a pink brick wall, which changes it into a pink grating

I even have a plot in mind! The jam game had only a teeny tiny one.

The future

Ash is currently busy with their own game, so I think this is gonna be The Thing I Do for a while. To that end, I’m in the middle of setting up some infrastructure:

Also, I recently created a secret Discord channel on the same server, where I intend to do planning and design work that I’m not ready to make public yet! Spoilers will abound, but if you’re interested and okay with that, you can get in by pledging at least $4 on Patreon and letting me know to give you the role. (I don’t use Patreon’s native Discord integration because it does rude things like forcibly rejoin you to the server even if you manually leave.)

Specific priorities

I’d like to finish porting the old levels over to new artwork, the new level infrastructure, etc. It’d make for a nice little Patreon demo or something, it gives me a milestone with pretty clear goals, and it’ll leave me with at least a small palette of puzzle elements that I know work correctly.

I’d like to write about what I’m doing sometimes on this dang blog. I’ve found that structured writing is really, really, really hard when my head is a mess, and it has been extremely a mess for the last two and a half years (sorry), but jotting down what I’m already doing should be much easier than the more elaborate posts I’ve written, which need research and tooling and whatnot.

I have a good handful of puzzle elements — some of which even work — and a bunch of ideas for more, but I haven’t actually tried building levels since I made the original game! That’s kind of the important part, so I’d love to do some of it now that the dust is finally settling.

I still have some design decisions to make, though they’re getting trickier since I’ve already decided all the easy stuff. But I’ll save that for the generous folks who give me four dollars, I guess.

The elephant in the room

So. As I mentioned at the beginning, this game was originally made for a “horny” game jam. Given that it’s mostly platforming, you might be wondering why that is. I already feel like I’m crossing the streams somehow by even mentioning this on this blog, so I’ll try very hard not to get TMI here.

I have a foot in “TF” (transformation) kink circles, and one thing that’s always struck me about that subculture is how much of it is completely non-sexual. You can find no end of artwork of, say, someone being turned into one of those inflatable pooltoys — where both the artist and the audience are obviously having a good time with it — yet with no hint of sexual elements whatsoever. It’s a form of sexuality that doesn’t need to be sexual at all.

I started Strawberry Jam because I wanted to see some adult games that were more creative with their gameplay. Much of the genre consists of otherwise regular games that occasionally show you some explicit artwork, and while that’s a perfectly fine way to design a game, I felt that the medium surely had more potential. It turns out that a non-sexual fantasy kink works wonders as a gameplay element; rather than just giving you a picture, the game takes a concept and has you experience it yourself, even figure out by experimentation how it’s altered the way you interact with the world.

This puts me in a slightly awkward position. I do, genuinely and platonically, love these kinds of gameplay themes! I adore changes in how you perceive or interact with a world — the dark world in Metroid Prime 2, the time reversal in Braid, the “dimension” swapping in Quantum Conundrum, etc. I think this is a great concept that anyone can have a good time with, and I feel like this game is a love letter to the Wario Land series.

At the same time, I do also appreciate the kink inspiration. Even Lexy’s collar was originally conceived as a gimmick I could use for drawing adult artwork. The jam game contains a lot of suggestive dialogue, since Lexy herself also appreciates the kink aspect. And that was a lot of fun to write, and I’m sure it enhanced the experience for other folks with similar leanings.

But this is such a good concept that I want it to be playable as just a regular puzzle-platformer as well. I think it would have fairly broad appeal, and I don’t want to hamstring myself by totally fucking weirding people out when it dawns on them that “oh the dev is kinda Into This huh”. And yet I don’t want to completely sterilize the game, either, because… well, ultimately, it’s my game and I like the suggestive parts.

This is a tough line to draw, and I’m not yet sure how to do it. I’ve considered just making alternative dialogue that you can opt into when you start the game, but given that Lexy already speaks differently depending on what form she’s in, I have no idea how feasible that is.

I don’t know how to gauge this. I’ve always been up to my armpits in the side of the internet that just posts porn and talks about sexuality casually, whereas I’m dimly aware that most people see sexuality as this completely distinct part of life that you hide in a small box, far away from the eyes of polite society. But maybe I’m overestimating that? Does anyone actually care if the protagonist of a game comments “hey this is hot” about something weird but innocuous?

Or maybe that’s exactly where the line is. I remember Nier: Automata, a game that is all too happy to show off the protagonist’s immaculately-rendered ass, which is clearly meant for the enjoyment of both the creator and the players. But nobody comments on it within the game, which makes it seem incidental, somehow. I can’t explain why that is, and it feels slightly dishonest to me.

Am I overthinking this? If you’re not involved in any kind of kink circles and played the original jam game, I’m curious to hear how it read to you. Was it at all uncomfortable, like perhaps the game was expecting you to heavily empathize with a feeling you don’t share at all? Or does putting that feeling on a character, rather than aiming it at the human player, make it something you can easily shrug off? The full game will have more stuff going on, so there should be lots more dialogue that isn’t solely about Lexy’s feelings, if that helps.


Hm, I thought I would have more to say here! I have a lot of ideas, but only a handful of them are implemented yet, and I guess it’s hard to show what a game will be like before most of it works.

I hope this is enough to whet some appetites, at least! I haven’t been excited like this about anything in far too long.

Star Anise Chronicles: Oh No Wheres Twig??

Post Syndicated from Eevee original https://eev.ee/release/2020/05/10/star-anise-chronicles-oh-no-wheres-twig/

Title and logo for the game

🔗 Play it on itch.io
🔗 Play it on the PICO-8 BBS (where you can also download the cart and view the source code)

(I originally drafted this just after publishing the game, but then decided to start a whole series about its development and wasn’t sure what to do with this! But it’s solid and serves a different purpose, so here it is.)

It’s been a while, but I made another PICO-8 game! It’s a little platformer with light puzzling, where you help Star Anise find his best friend Branch Commander Twig. It’s only half an hour long at worst, and it’s even playable on a phone!

This is the one-and-a-halfth entry in the Star Anise Chronicles series, which, after several false starts, finally kicked off over Christmas with a… uh… interactive fiction game. Expect the series to continue with even more whiplash-inducing theme shifts.

More technical considerations will go in the “gamedev from scratch” series, but read on for some overall thoughts on the design. Both contain spoilers, of course, so I do urge you to play the game first.


The first attempt at a Star Anise game was two years ago, in early 2018. The idea was to make a Metroidvania where Star Anise had a bunch of guns that shot cat-themed projectiles, obtained a couple other cat-themed powers, and made a total mess of a serious plot happening in the background while he ran around collecting garbage.

After finishing up the Steam release of Cherry Kisses last month, we decided that our next game should be that one, which would now be Star Anise 2 (since I’d already released a Star Anise 1 some months ago). We have, uh, already altered these plans, but that’s the background.

I don’t really know why I started on this game. I guess there’s some element of stress to working on a project with someone, even if that someone is Ash (my spouse), and especially if I’m supposed to be driving it forward. I have to tell someone what to do, and then if I don’t like the result I have to ask them to fix it, and a lot of tiny design questions are out of my control anyway, and all of this is happening on someone else’s schedule, and I have to convey all the project state that’s in my head in a complicated non-verbal form, and… all of those things are a constant low-level source of stress.

So I guess we’d just finished a game that I’d designed, and it was looking like we were about to start a sizable project where I was the design lead again, and I wanted to make something I could finish by myself as an interlude.

And so I sat down with a teeny tiny tool to make a teeny tiny version of what I expected would be our next game.

Design

The basics were obvious: run, jump, land. I gave Star Anise little landing particles early on — they’re in the bigger prototype, I love landing puffs in general, and having them be stars adds so much silly personality.

I knew I wanted to have multiple abilities you collect, since that’s the heart of Metroidventures. I briefly considered giving Star Anise a gun, as in the prototype, but gave up on that pretty early. I would’ve had to sprite a gun, a projectile, a projectile explosion, enemies, enemy attacks, enemy death frames…

Don’t get me wrong; I have no problem with drawing all of that. The concern was that PICO-8 has a very limited amount of space for sprites — in the configuration I was using, 128 sprites of 8×8 pixels each. Star Anise himself takes up 9, even with some clever reuse for his walking animation. The star puff takes 4. The common world tile, plus versions for edges and corners, takes up 9. That’s 22 sprites already, more than 17% of the space I have, for absolutely nothing besides jumping around on solid ground. I would have to keep it simple.

That led me to the first two powers, both borrowed from the prototype:

  • AOWR starts conversation with NPCs and opens doors. I can’t really take any creative credit here, since these are both things Anise attempts to do with aowrs in real life.

  • Papping activates levers and knocks over glasses of liquid. Anise only does one of those in real life. (In the prototype, this is a gun — which shoots pawprint-shaped projectiles — but I’d already been thinking about making it a “melee” ability first.)

I adore both of these abilities. I think they both turn some common UI tropes on their heads. NPCs, doors, and levers are all things you usually interact with by pressing some generic “interact” button, but hitting a lever (and meowing at a door) adds some physicality to the action — you’re actually doing something, not just making it go.

And pressing A to talk to an NPC doesn’t really make any sense at all! Consider: almost universally, even in games where the player character speaks, pressing A to start a conversation leads off with the NPC talking. So what the hell did you actually do? What does pressing A represent actually doing that results in someone else casually starting a conversation with you, seemingly unprompted? I have no idea! It’s nonsense! But Anise meows at me all the time and I always respond to him, which is perfectly sensible.

The third power, telepawt, is a little newer. We’d conceived a cat teleporting power pretty recently, but it was more involved and required some big environmental props. I realized pretty quickly that I couldn’t possibly do much of interest on the tiny PICO-8 screen (16 × 16 tiles), but I do like teleporting abilities! I briefly considered ripping off Axiom Verge, but I’ve already done that in fox flux, and the physics are a little involved… and then, lo, inspiration! Combine the two ideas: teleport great distances, but in a controlled and predictable way, by teleporting to the point on the opposite side of the screen. It felt like a very 8-bit kind of power, and I could already imagine a few ways to hide stuff with it, so off I went.

And that seemed like a reasonable progression. A way to talk (and progress through doors), a way to interact with objects, and a way to move around. I decided about halfway through development to make jumping a faux powerup as well; it stretches out the progression a bit more by making you walk past potential jumps and then come back to them later, which is important when I don’t have much map space to work with.

I’d originally planned for items to be separate from abilities, but ran into a couple problems, the worst of which was that I really didn’t have much screen space for sprinkling more items around. I ended up turning items into abilities in their own right, which I think was an improvement overall; now you can crinkle the plastic bag wherever you want, for example.

The game deliberately doesn’t try to explain itself; PICO-8 only has six buttons, and four of them are a d-pad, so I figured button-mashing (as in ye olde NES days) would get everyone through. Still, several players were confused about how to jump (and possibly gave up before even acquiring jump?), and one didn’t realize you could switch abilities, despite the up/down arrowheads on the ability box. Not sure what to learn from this.

The map

I struggled a bit with the map. PICO-8 has a built-in map editor with enough space for 32 screen-sized rooms (arranged in an 8 × 4 grid), which it turns out is not very many. I also very much did not want the game space to be confined to exactly that size of rectangle, so I knew I’d have to do some funky stuff with room connections. (Armed with that power, I ended up making loops and other kinds of non-Euclidean geometry, but hey that’s plenty appropriate for an imaginary moon.)

The bigger problem was designing the rooms outside of the PICO-8 map editor. I tried sketching in Krita, and then on paper, but kept running into the same two problems: it was tedious to rearrange rooms, and I didn’t have a good sense of how much space was available per room.

I found a novel solution: I wrote a Python script to export the map to a PNG, opened it in Aseprite, and edited it there — with each pixel representing a tile and the grid size set to 16. Now I knew exactly how much space I had, and rearranging rooms was easy: double-clicking a cell selects it, and holding Alt while dragging a selection snaps it to the grid. Here’s the beginning part of the game, screenshotted directly from Aseprite at 400% zoom:

A very pixellated map, with bright pink lines to indicate odd connections

When it came time to pack it all back into a rectangle, I copied the whole map, rearranged the rooms, and numbered them all so I could keep track of connections. Surprisingly, it wasn’t that bad a workflow.

The non-Euclidean map connections came in handy for packing secrets in more efficiently; most of the secret stars are off-screen, making them harder to find, but I couldn’t really afford to have a dedicated treasure room for every single one. So I crammed two treasures into the same room a few times, even though the two routes you’d take to get there are generally nowhere near each other.

Doors helped stretch the map out, too. It’s probably obvious if you think about it in the slightest, but doors don’t lead to different rooms; they reuse the same room. But some tiles only appear in the overworld, some tiles only appear in cave world, and actors (besides doors) don’t spawn in caves. That seemingly small difference was enough to make rooms vastly different in the two worlds; the most extreme case is a “crossroads” room, which you traverse vertically in the overworld but horizontally in cave world. (Honestly, I wish I’d done a bit more of this, but it works best in rooms that only have two overworld exits, and there ended up not being too many of those. Also, caves are restricted to basically just platforming, so there’s only so much variety I can squeeze out of them.)

Designing caves was a little trickier than you might think, since the PICO-8 map has no layers! If something needed to occupy a tile in the overworld, then I could not put something in the same place in cave world. Along with the design nightmare that is telepawt, this gave me a couple headaches.

I do like the cave concept a lot, though. I love parallel versions of places in games, and I have an unfinished PICO-8 game that’s centered around that idea taken to extremes. It’s also kind of a nod to my LÖVE games, all the way back to Neon Phase, where going indoors didn’t load another map — rooms were just another layer.

Aesthetics

Originally, PICO-8 had a fixed palette of 16 colors. You could do palette swaps of various sorts, but you couldn’t actually change any of the colors.

But since I last used it, PICO-8 gained a “secret palette” — an extra 16 colors that you can request. You can’t have more than 16 colors on the screen at a time, but you can replace one of the existing colors with a “secret” color. There’s also an obscure way to tell PICO-8 to preserve the screen palette when the game finishes, which means I could effectively change the palette in the sprite editor. Hooray!

I didn’t want to completely change the palette, so I tried to keep the alterations minor. For the most part, I gave up reds and pinks for a better spread of greens, purples, and yellows. Here’s the core PICO-8 palette, the secret PICO-8 palette, and the game’s palette, respectively:

A very bright palette, a softer and warmer version of the same colors, and a mix of them

I think I did a decent job of preserving the overall color aesthetic while softening the harsh contrasts of the original palette, and the cool colors really helped the mood.

Note that I changed the background color (color 0 isn’t drawn when it appears in a sprite) to navy and promoted black to a foreground color, which helped black stand out more when used as an outline or whatever. Probably the best example of this is in the logo, traced from the vector logo I made for the first Star Anise game.

Hmm, what else. The tiles themselves felt almost forced, if that makes sense? Like I could only draw them one way. PICO-8 tiles are a rinky-dink 8 pixels, and boy that is not much to work with. If I had a lot of sprite space, I could make bigger metatiles, but… I don’t, so I couldn’t. I tried a lot of variations of tiles, and what I ended up with were pretty much the only things that worked.

I love how the emoting came out. I knew I didn’t have nearly enough room for facial expressions for everyone, but I wanted to give them some kind of visual way to express mood, and the tiny overlays kinda fell naturally out of that. I think they add a ton of personality, especially in how everyone uses them differently.

I’m pretty happy with the sound design, as well. I’m an extremely amateur composer, and I wrote 90% of the music in a few hours on the start of the last day, but I actually like how it came out and I like going back to listen to it. The sound effects are, with some mild exceptions, pretty much excellent — the aowr is incredible, it has fooled other folks in the house more than once, and I knew I had it right when I had a blast just running around mashing the meow button.

I’m also happy with the dialogue, and hope it conveys the lunekos’ personalities in just these few interactions.

While writing the ending, I had to stop in mid-draft to go cry. Then I cried again when I finished it a few days later. I’ll miss you forever, Branch Commander Twig.

If you did, thanks for playing.

Cheezball Rising: Collision detection, part 1

Post Syndicated from Eevee original https://eev.ee/blog/2018/11/28/cheezball-rising-collision-detection-part-1/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I bash my head against a rock. Sorry, I mean I bash Star Anise against a rock. It’s about collision detection.

Previously: I draw some text to the screen.
Next: more collision detection, and fixed-point arithmetic.

Recap

Last time I avoided doing collision detection by writing a little dialogue system instead. It was cute, and definitely something that needed doing, but something much more crucial still looms.

Animation of the text box sliding up and scrolling out the text

I’ve put it off as long as I can. If I want to get anywhere with actual gameplay, I’m going to need some collision detection.

Background and upfront decisions

Collision detection is hard. It’s a lot of math that happens a few pixels at a time. Small mistakes can have dramatic consequences, yet be obscure enough that you don’t even notice them. Even using an off-the-shelf physics engine often requires dealing with a mountain of subtle quirks. And did I mention I have to do it on a Game Boy?

Someday I’ll write an article about everything I’ve picked up about collision detection, but I haven’t yet, so you get the quick version. The problem is that an object is moving around, and it should be unable to move into solid objects. There are two basic schools of thought about the solution.


Discrete collision observes that an object moves in steps — a little chunk of movement every frame — and simply teleports the object to its new location, then checks whether it now overlaps anything.

Illustration of an object attempting to move into a wall

(Note that all of these diagrams show very exaggerated motion. In most games, objects are slow and frames are short, so nothing moves more than a pixel or two at a time. That’s another reason collision detection is hard: the steps are so small that it can be difficult to see what’s actually going on.)

If it does overlap, you might try to push it out of whatever it’s overlapping, or you might cancel the movement entirely and simply not move the object that frame.

Both approaches have drawbacks. Pushing an object out of an obstacle isn’t too difficult a problem, but it’s possible that the object will be pushed out into another obstacle, and now you have a complicated problem. (At this point, though, you could just give up and fall back to cancelling the movement.)

But cancelling the movement means that an object might get “stuck” a pixel or two away from a wall and never be able to butt up against it. The faster the object is trying to move, the bigger the risk that this might happen.
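
To make the "stuck a pixel or two away" case concrete, here is a minimal one-dimensional sketch in Python (not Game Boy assembly). The function names and the numbers are made up for illustration; it only models the cancel-the-move flavor of discrete collision.

```python
def overlaps(a_left, a_right, b_left, b_right):
    """True if two 1D spans share any interior (touching edges don't count)."""
    return a_left < b_right and b_left < a_right


def discrete_step(left, width, dx, walls):
    """Teleport by dx; if the new spot overlaps a wall, cancel the move."""
    new_left = left + dx
    for wall_left, wall_right in walls:
        if overlaps(new_left, new_left + width, wall_left, wall_right):
            return left  # cancelled: stay put this frame
    return new_left


walls = [(100, 108)]
left = 84
for _ in range(10):  # hold "right" for ten frames at 3 pixels per frame
    left = discrete_step(left, 8, 3, walls)
gap = 100 - (left + 8)
```

The object ends with its right edge at 98, leaving `gap == 2`: every further 3-pixel step would overlap the wall, so every step is cancelled and the last two pixels are never covered.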

That said, this is exactly how the original Doom engine handles collision, and it seems to work well enough there. On the other hand, Doom is first-person so you can’t easily tell if you’re butting right up against a wall; a pixel gap is far more obvious in a game like this. On the other other hand, Doom also has bugs where a fast monster can open a locked door from its other side, because the initial teleport briefly moves the monster far enough into the door that it’s touching the other (unlocked) side.

Sorry. I have very conflicting feelings about this thicket of drawbacks and possible workarounds.

Either way, discrete collision has one other big drawback: tunnelling. Since the movement is done by teleporting, a very fast object might teleport right past a thin barrier. Only the new position is checked for collisions, so the barrier is never noticed. (This is how you travel to parallel universes in Mario 64 — by building up enough speed that Mario teleports through walls without ever momentarily overlapping them.)

Illustration of an object passing through a wall or erroneously pushing into one
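
Tunnelling is easy to reproduce in the same kind of one-dimensional toy. This is a hedged Python sketch with invented names and numbers, checking only the destination as discrete collision does:

```python
def teleport_hits_wall(left, width, dx, wall):
    """Check only the destination span against the wall, as discrete collision does."""
    wall_left, wall_right = wall
    new_left = left + dx
    return new_left < wall_right and wall_left < new_left + width


thin_wall = (100, 102)  # a barrier only 2 pixels thick
slow = teleport_hits_wall(80, 8, 15, thin_wall)  # lands on 95..103, overlapping the wall
fast = teleport_hits_wall(80, 8, 30, thin_wall)  # lands on 110..118, past the wall
```

The slow move is caught (`slow` is `True`), but the fast one is not (`fast` is `False`), even though the object started on one side of the wall and ended up on the other.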

There are some other potential gotchas, though they’re rare enough that I’ve never seen anyone mention them. One that stands out to me is that you don’t know the order that an object collided with obstacles, which might make a difference if the obstacles have special behavior when collided with and the order of that behavior matters.


Continuous collision detection observes that game physics are trying to simulate continuous motion, like happens in the real world, and tries to apply that to movement as well. Instead of teleporting, objects slide until they hit something. Tunnelling is thus impossible, and there’s no need to handle collisions since they’re prevented in the first place.

Illustration of an object sliding towards a wall and stopping when it touches

This has some clear advantages, in that it eliminates all the pitfalls of discrete collision! It even functions as a superset — if you want some object to act discretely, you could simply teleport it and then attempt to “move” it along the zero vector.

That said, continuous collision introduces some of its own problems. The biggest (for my purposes, anyway) is that it’s definitely more complicated to implement. “Sliding” means figuring out which obstacle would be hit first. You can do raycasting in the direction of movement and see what the ray hits first, though that’s imprecise and opens you up to new kinds of edge cases. If you’re lucky, you’re using something like Unity and can cast the entire shape as a single unit. Otherwise, well, you have to do a bunch of math to find everything in the swept path, then sort them in the order they’d be hit.
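
As a rough illustration of "find everything in the swept path, then sort", here is a one-dimensional Python sketch. It is a simplification under stated assumptions: movement is rightwards only, obstacles are `(left, right)` spans, anything already overlapped is ignored, and `sweep_right` is a made-up name.

```python
def sweep_right(right_edge, dx, obstacles):
    """Slide a box's right edge by up to dx.

    Returns (distance actually moved, obstacle hit or None).
    """
    # Collect obstacles whose near side lies within the swept path,
    # keyed by how far away they are, then sort so the nearest comes first.
    hits = sorted(
        (left - right_edge, (left, right))
        for left, right in obstacles
        if right_edge <= left < right_edge + dx
    )
    if not hits:
        return dx, None
    return hits[0]


obstacles = [(40, 48), (25, 30), (90, 95)]
result = sweep_right(10, 50, obstacles)
```

Here `result` is `(15, (25, 30))`: two obstacles lie in the path, and the nearer one stops the movement after 15 pixels. In two dimensions the same idea needs real geometry to compute each hit distance, which is where the extra complexity comes from.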

The other big problem is that it’s more work at runtime. With discrete collision, you only need to check for collisions in the new location. That only costs more time when a lot of objects are bunched together in one place, which is unlikely. With continuous collision, everything along the swept path needs to be examined, and that means that the faster an object moves, the more expensive its movement becomes.

So, not quite a silver bullet for the tunnelling problem. But that’s not a surprise; the only way to prevent tunnelling is to check for objects between the start and end positions.


Which, then, do I want to implement here?

For platforms without floating point (including the PICO-8 and Game Boy), there’s a third, hybrid option. If everything’s expressed with integers (or fixed point), then the universe has a Planck length: a minimum distance that every other distance must be an integral multiple of. You can thus fake continuous collision by doing repeated steps of discrete collision, one Planck length at a time. Objects will be collided with in the correct order, and you can simply stop at the first overlap.

Of course, this eats up a lot of time, since it involves doing collision detection numerous times per object per frame. So unless your Planck length is really big, I’m not sure it’s worth it.
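
A minimal Python sketch of that hybrid, assuming a Planck length of one pixel and one-dimensional walls given as `(left, right)` spans. The name `step_move` is invented here:

```python
def step_move(left, width, dx, walls):
    """Move by dx one pixel at a time, stopping just before the first overlap."""
    step = 1 if dx > 0 else -1
    for _ in range(abs(dx)):
        candidate = left + step
        if any(candidate < wall_right and wall_left < candidate + width
               for wall_left, wall_right in walls):
            break  # the next pixel would overlap, so stop here
        left = candidate
    return left


thin_wall = [(100, 102)]
stopped_at = step_move(84, 8, 30, thin_wall)
```

The object stops at `left == 92`, with its right edge exactly touching the wall at 100: no tunnelling and no leftover gap. The cost is visible in the loop, which runs a full overlap check up to `abs(dx)` times per object per frame.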

Instead, I’m going to try for continuous collision. It’s closer to “correct” (whatever that means), and it’s what I did for all of my other games so far. It’s definitely harder, thornier, more complicated, and slower, but dammit I like it. It should also save me from encountering surprise bugs later on, which means I can write collision code once and then pretty much forget about it. Ideal.

Getting started

Star Anise is the only entity at the moment, so as a first pass, I’m only going to implement collision with the world.

World collision is much easier! Everything is laid out in a fixed grid, so I already know where the cells are. Finding potential overlaps is fairly simple, and best of all, I don’t need to sort anything to know what order the cells are in.

Right away, I find I have another decision to make. I would normally want to use vector math here — the motion is some distance in some direction, and hey, that’s a vector. But vectors take up twice as much space (read: twice as many registers), and a lot of vector operations rely on division or square roots which are non-trivial on this hardware.

With a great reluctant sigh, I thus commit to one more approximation, one made on 8-bit hardware since time immemorial. I won’t actually move in the direction of motion; instead, I’ll move along the x-axis, then move along the y-axis separately. Diagonal movement could theoretically cut across some corners (or be unable to fit through very tight gaps), but those are very minor and unlikely inconveniences. More importantly, this handwaving can’t allow any impossible motion.
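
The axis-at-a-time approximation can be sketched in a few lines of Python. To keep it short, this version treats the mover as a single point and walks one pixel at a time along each axis, which is not the continuous approach chosen above; `blocked` is a hypothetical callback saying whether a point is solid.

```python
def move_axis_separated(x, y, dx, dy, blocked):
    """Apply as much of dx as possible, then as much of dy as possible."""
    step = 1 if dx > 0 else -1
    for _ in range(abs(dx)):
        if blocked(x + step, y):
            break
        x += step
    step = 1 if dy > 0 else -1
    for _ in range(abs(dy)):
        if blocked(x, y + step):
            break
        y += step
    return x, y


# A wall filling everything at x >= 10: horizontal motion stops, vertical continues.
slid = move_axis_separated(8, 5, 4, 3, lambda x, y: x >= 10)

# A single solid point on the diagonal: moving x first walks around it.
cut = move_axis_separated(0, 0, 2, 2, lambda x, y: (x, y) == (1, 1))
```

`slid` is `(9, 8)`, so the mover slides down the wall instead of stopping dead. `cut` is `(2, 2)`, the corner-cutting case: a true diagonal from `(0, 0)` to `(2, 2)` would pass through the solid point, but the two-pass version never touches it. Neither pass ever steps into a blocked point, so no impossible position is reached.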

I’ve already taken for granted that entities will all be axis-aligned rectangles. I’m definitely not dealing with slopes on a goddamn Game Boy. That was hard enough to do from scratch on a modern computer.

But I’m getting ahead of myself. First things first: you may recall that Star Anise’s movement is a bit of a hack. Pressing a direction button only adds to or subtracts from the sprite coordinates in the OAM buffer; his position isn’t actually stored in RAM anywhere. In fact, thanks to my slightly nonlinear storytelling across these posts, his movement isn’t stored anywhere either! The input-reading code writes directly to the OAM buffer. Whoops. I intended to fix that later, and now it’s later, so here we go.

; Somewhere in RAM, before anise_facing etc
anise_x:
    db
anise_y:
    db

So far, so good. OAM is populated in two places (and I should fix that later, too): once during setup, and once in the main game loop. Both will need to be updated to use these values.

Setup needs to initialize them first, of course:

    ld a, 64
    ld [anise_x], a
    ld [anise_y], a
    ; ... initialize anise_facing, etc ...

And now the OAM setup can be fixed. But, surprise! I left myself another hardcoded knot to untangle: even the relative positions of the sprites are hardcoded. Okay, so, those need to be put somewhere too. Eventually I’m going to need some kinda entity structure, but since there’s only one entity, I’ll just slap it into a constant somewhere.

(I guess my programming philosophy is leaking out a bit here. Don’t worry about structure until you need it, and you don’t need it until you need it twice. Once code works for one thing, it’s relatively straightforward to make it work for n things, and you have fewer things to worry about while you’re just trying to make something work.)

; In ROM somewhere
ANISE_SPRITE_POSITIONS:
    db -2, -20
    db -8, -14
    db 0, -14

It’s not immediately obvious from looking at these numbers, but I’m taking Star Anise’s position to mean the point on the ground between his feet. That’s the best approximation of where he is, after all.

(Early in game development, it seems natural to treat position as the upper-left corner of the sprite, so you can simply draw the sprite at the entity’s position — but that tangles the world model up with the sprite you happen to have at the moment. Imagine the havoc it’d wreak if you changed the size of the sprite later!)

Okay, now I can finally—

What? How does the code know there are exactly 3 sprites, on this byte-level platform? Because I’m hardcoding it. Shut up already I’ll fix it later

    ; Load the x and y coordinates into the b and c registers
    ld hl, anise_x
    ld b, [hl]
    inc hl
    ld c, [hl]
    ; Leave hl pointing at the sprite positions, which are
    ; ordered so that hl+ will step through them correctly
    ld hl, ANISE_SPRITE_POSITIONS

    ; ANTENNA
    ; x-coord
    ; The x coordinate needs to be added to the sprite offset,
    ; AND the built-in OAM offset (8, 16).  Reading the sprite
    ; offset first allows me to use hl+.
    ld a, [hl+]
    add a, b
    add a, 8
    ; Previously, hl pointed into the OAM buffer and advanced
    ; throughout this code, but now I'm using hl for something
    ; else, so I use direct addresses of positions within the
    ; buffer.  Obviously this is a kludge and won't work once
    ; I stop hardcoding sprites' positions in OAM, but, you
    ; know, I'll fix it later.
    ld [oam_buffer + 1], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 0], a
    ; This stuff is still hardcoded.
    ; chr index
    xor a
    ld [oam_buffer + 2], a
    ; attributes
    ld [oam_buffer + 3], a

    ; The rest of this is not surprising.

    ; LEFT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 5], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 4], a
    ; chr index
    ld a, 2
    ld [oam_buffer + 6], a
    ; attributes
    ld a, %00000001
    ld [oam_buffer + 7], a

    ; RIGHT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 9], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 8], a
    ; chr index
    ld a, 4
    ld [oam_buffer + 10], a
    ; attributes
    ld a, %00000001
    ld [oam_buffer + 11], a
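
The arithmetic in that listing is the same three additions repeated for each sprite. Restated as a small Python sketch (not assembly, and with invented names), using the offsets from `ANISE_SPRITE_POSITIONS` and the hardware offset of (8, 16):

```python
ANISE_SPRITE_OFFSETS = [(-2, -20), (-8, -14), (0, -14)]  # antenna, left, right
OAM_X_OFFSET, OAM_Y_OFFSET = 8, 16


def oam_positions(x, y, offsets=ANISE_SPRITE_OFFSETS):
    """Return (y, x) byte pairs in OAM order for an entity standing at (x, y)."""
    return [
        # & 0xFF mimics the 8-bit wraparound of the a register
        ((y + oy + OAM_Y_OFFSET) & 0xFF, (x + ox + OAM_X_OFFSET) & 0xFF)
        for ox, oy in offsets
    ]


positions = oam_positions(64, 64)
```

With the starting position of (64, 64), `positions` is `[(60, 70), (66, 64), (66, 72)]`, which are the y and x bytes for the three sprites. A loop over a table like this is roughly what the unrolled code would turn into once the sprite count stops being hardcoded.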

Boot up the game, and… it looks the same! That’s going to be a running theme for a little bit here. Sorry, this isn’t a particularly screenshot-heavy post. It’s all gonna be math and code for a while.

Now I need to split apart the code that reads input and applies movement to OAM. Reading input gets much simpler, since it doesn’t have to do anything any more, just compute a dx and dy.

This code does still have looming questions, such as how to handle pressing two opposite directions (which is impossible on hardware but easy on an emulator), or whether diagonal movement should be fixed so that Anise doesn’t move at \(\sqrt{2}\) times his movement speed.

Later. Seriously the actual code has so many XXX and TODO and FIXME comments that I edit out of these posts.

    ; Anise update loop
    ; Stick dx and dy in the b and c registers.
    ld a, [buttons]
    ; b/c: dx/dy
    ld b, 0
    ld c, 0
    bit PADB_LEFT, a
    jr z, .skip_left
    dec b
.skip_left:
    bit PADB_RIGHT, a
    jr z, .skip_right
    inc b
.skip_right:
    bit PADB_UP, a
    jr z, .skip_up
    dec c
.skip_up:
    bit PADB_DOWN, a
    jr z, .skip_down
    inc c
.skip_down:

    ; For now just add b and c to Anise's coordinates.  This
    ; is where collision detection will go in a moment!
    ld a, [anise_x]
    add a, b
    ld [anise_x], a
    ld a, [anise_y]
    add a, c
    ld [anise_y], a
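
For anyone who finds the branching easier to follow in a higher-level language, here is the same input logic as a Python sketch. The bit masks are hypothetical stand-ins, not the real `PADB_*` values.

```python
import math

# Hypothetical bit masks for illustration only
LEFT, RIGHT, UP, DOWN = 1, 2, 4, 8


def read_direction(buttons):
    """Turn a button bitmask into (dx, dy), each in -1, 0, or 1."""
    dx = dy = 0
    if buttons & LEFT:
        dx -= 1
    if buttons & RIGHT:
        dx += 1
    if buttons & UP:
        dy -= 1
    if buttons & DOWN:
        dy += 1
    return dx, dy


opposites = read_direction(LEFT | RIGHT)
diagonal = read_direction(RIGHT | DOWN)
diagonal_speed = math.hypot(*diagonal)
```

Because each direction adjusts the same counter, opposite directions cancel out and `opposites` is `(0, 0)`. The diagonal case gives `(1, 1)`, whose length is the square root of 2, so diagonal movement is faster than movement along one axis until something normalizes it.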

All that’s left is to more explicitly update the OAM buffer!

This code ends up looking fairly similar to the setup code. So similar, in fact, that I wonder if these blocks should be merged, but I’ll do that later:

    ; Load x and y into b and c
    ld hl, anise_x
    ld b, [hl]
    inc hl
    ld c, [hl]
    ; Point hl at the sprite positions
    ld hl, ANISE_SPRITE_POSITIONS

    ; ANTENNA
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 1], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 0], a
    ; LEFT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 5], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 4], a
    ; RIGHT PART
    ; x-coord
    ld a, [hl+]
    add a, b
    add a, 8
    ld [oam_buffer + 9], a
    ; y-coord
    ld a, [hl+]
    add a, c
    add a, 16
    ld [oam_buffer + 8], a

Phew! And the game plays exactly the same as before. Programming is so rewarding.

On to the main course!

Collision detection, sort of

So. First pass. Star Anise can only collide with the map.

Ah, but first, what size is Star Anise himself? I’ve only given him a position, not a hitbox. I could use his sprite as the hitbox, but with his helmet being much bigger than his body, that’ll make it seem like he can’t get closer than a foot to anything else. I’d prefer if he had an explicit radius.

; in ROM somewhere
ANISE_RADIUS:
    db 3

Remember, Star Anise’s position is the point between his feet. This describes his hitbox as a square, centered at that point, with sides 6 pixels long. The top and bottom edges of his hitbox are thus at y - r and y + r, which makes for some pleasing symmetry.

(Making hitboxes square doesn’t save a lot of effort or anything, but switching to rectangles later on wouldn’t be especially difficult either.)
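
In code, the hitbox is just the position plus or minus the radius on each axis. A tiny Python sketch, with `hitbox` as a made-up helper name:

```python
def hitbox(x, y, radius=3):
    """Return (left, top, right, bottom) of the square hitbox centered on (x, y)."""
    return x - radius, y - radius, x + radius, y + radius


left, top, right, bottom = hitbox(64, 64)
```

For a position of (64, 64) this gives edges at 61 and 67 on both axes, so the box is 6 pixels on a side. Switching to rectangles would only mean passing separate horizontal and vertical radii.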

The plan

My plan for moving rightwards, which I came up with after a lot of very careful and very messy sketching, looks like this:

  1. Figure out which rows I’m spanning.

  2. Move right until the next grid line. No new obstacle can possibly be encountered until then, so there’s nothing to check.

    (Unless I’m somehow already overlapping an obstacle, of course, but then I’d rather be able to move out of the obstacle than stay stuck and possibly softlock the game.)

  3. In the next grid column, check every cell that’s in a spanned row. If any of those cells block us, stop here. Otherwise, move to the next grid line (8 pixels).

  4. Repeat until I run out of movement.

    (It’s very unlikely the previous step would happen more than once; an entity would have to move more than 8 pixels per frame, which is 3 entire screen widths per second.)

Here’s a diagram. In this case, step 3 checks two cells for each column, but it might check more or fewer depending on how the entity is positioned. (It’ll never need to check more than one cell beyond the entity’s height in cells, rounded up.)

Illustration of the above algorithm
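
Steps 2 through 4 of the plan can be sketched in Python as follows. This is an illustration under assumptions, not the assembly implementation: step 1 is left to the caller, who passes in the spanned `rows`; `is_solid(column, row)` is a hypothetical map lookup; and movement is rightwards only (`dx >= 0`).

```python
TILE = 8  # grid cells are 8 pixels wide


def move_right(right_edge, dx, rows, is_solid):
    """Return how far the entity can actually move right, from 0 to dx."""
    target = right_edge + dx
    # Step 2: the first grid line at or to the right of the entity's edge.
    next_line = -(-right_edge // TILE) * TILE
    # Steps 3 and 4: each grid line crossed exposes one new column to check.
    while next_line < target:
        column = next_line // TILE
        if any(is_solid(column, row) for row in rows):
            return next_line - right_edge  # stop flush against the column
        next_line += TILE
    return dx


def only_cell_2_1(column, row):
    return column == 2 and row == 1


blocked = move_right(5, 20, [0, 1], only_cell_2_1)
free = move_right(5, 20, [0], only_cell_2_1)
```

With a single solid cell at column 2, row 1, an entity spanning rows 0 and 1 is stopped after 11 pixels (`blocked == 11`), flush against x = 16. An entity spanning only row 0 passes by untouched (`free == 20`). Nothing is checked at all unless a grid line is actually crossed.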

Seems straightforward enough. But wait!

Edge case

I’ll save you a bunch of debugging anguish on my part and skip to the punchline: there’s an edge case.

I mean, literally, the case of when the entity’s edge is already against a grid line. That’ll happen fairly frequently — every time an entity collides with the map, it’ll naturally stop with its edge aligned to the grid.

The problem is all the way back in step 1. Remember, I said that to figure out which grid row or column a point belongs to, I need to divide by 8 (or shift right by 3). So the rows an entity spans must count from its top edge divided by 8, to its bottom edge divided by 8. Right?

Well…

Diagram showing division by 8 for several possible positions; when the bottom of the entity touches a grid line, it appears to be jutting into the row below

Everything’s fine until the entity’s bottom edge is exactly flush with the grid line, as in the last example. Then it seems to be jutting into the row below, even though no part of it is actually inside that row. If the entity tried to move rightwards from here, it might get blocked on something in row 1! Even worse, if row 1 were a solid wall that it had just run into, it wouldn’t be able to move left or right at all!

What happened here? There’s a hint in how I laid out the diagram.

There’s something akin to the fencepost problem here. I’ve been talking about rows and columns of the grid as if they were regions — “row 1” labels a rectangular strip of the world. But pixel coordinates don’t describe regions! They describe points. A pixel is a square area, but a pixel coordinate is the point at the upper left corner of that area.

In the incorrect example, the bottom of the entity is at y = 8, even though the row of pixels described by y = 8 doesn’t contain any part of the hitbox. I’m using the coordinate of the pixel’s top edge to describe a box’s bottom edge, and it falls apart when I try to reinterpret that coordinate as a region. In terms of area, y = 8 really names the first row of pixels that the entity doesn’t overlap.

To work around this, I need to adjust how I convert a coordinate to the corresponding grid cell, but only when that coordinate describes the right or bottom of a bounding box. Bottom pixel 8 should belong to row 0, but 9 should still end up in row 1.

As luck would have it, I’m using integers for coordinates, which means there’s a Planck length — a minimum distance of which all other distances are a multiple. That length is, of course, 1 pixel. If I subtract that length from a bottom coordinate, I get the next nearest coordinate going upwards. If the original coordinate was on a grid line, it’ll retreat back into the cell above; otherwise, it’ll stay in the same cell. You can check this with the diagram, if you need some convincing.

(This works for any fixed point system; integers are the special case of fixed point with zero fractional bits. It would not work so easily with floating point — subtracting the smallest possible float value will usually do nothing, because there’s not enough precision to express the difference. But then, if you have floating point, you probably have division and can write vector-based collision instead of taking grid-based shortcuts.)

All that is to say, I just need to subtract 1 before shifting. For clarity, I’ll write these as macros that convert a coordinate in the a register to a grid cell. I call the top or left conversion inclusive, because it includes the pixel the coordinate refers to; conversely, the bottom and right conversion is exclusive, like how a bottom of 8 actually excludes the pixels at y = 8.

; Given a point on the top or left of a box, convert it to the
; containing grid cell.
ToInclusiveCell: MACRO
    ; This is just floor division
    srl a
    srl a
    srl a
ENDM
; Given a point on the bottom or right of a box, convert it to
; the containing grid cell.
ToExclusiveCell: MACRO
    ; Deal with the exclusive edge by subtracting the planck
    ; length, then flooring
    dec a
    srl a
    srl a
    srl a
ENDM
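
To sanity-check the two conversions, here is a small Python model of the macros. It's only a sketch: the function names are invented for illustration, and the `& 0xFF` stands in for the 8-bit a register.

```python
CELL_SHIFT = 3  # 8-pixel grid cells, so dividing by 8 is a right shift by 3

def to_inclusive_cell(coord):
    """Model of ToInclusiveCell: floor-divide a top/left coordinate by 8."""
    return (coord & 0xFF) >> CELL_SHIFT

def to_exclusive_cell(coord):
    """Model of ToExclusiveCell: step back one pixel, then floor-divide."""
    return ((coord - 1) & 0xFF) >> CELL_SHIFT

# An entity whose top is at y = 0 and whose bottom is at y = 8
# should span only row 0.
rows = (to_inclusive_cell(0), to_exclusive_cell(8))
```

Note that in this model a bottom coordinate of 0 wraps around to cell 31, the same way `dec a` wraps an 8-bit register from 0 to 255.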

At last, I can write some damn code!

Some damn code

    ; Here, b and c contain dx and dy, the desired movement.

    ; First, figure out which columns we might collide with.
    ; The NEAREST is the first one to our right that we're not
    ; already overlapping, i.e. the one /after/ the one
    ; containing our right edge.  That's Exc(x + r) + 1.
    ; The FURTHEST is the column that /will/ contain our right
    ; edge.  That's Exc(x + r + dx).
    ld hl, ANISE_RADIUS
    ; Put the NEAREST column in d
    ld a, [anise_x]             ; a = x
    add a, [hl]                 ; a = x + r
    ld e, a                     ; e = x + r
    ToExclusiveCell
    inc a                       ; a = Exc(x + r) + 1
    ld d, a                     ; d = Exc(x + r) + 1
    ; Put the FURTHEST column in e
    ld a, e                     ; a = x + r
    add a, b                    ; a = x + r + dx
    ToExclusiveCell
    ld e, a                     ; e = Exc(x + r + dx)

    ; Loop over columns in [d, e].
    ; If d > e, this movement doesn't cross a grid line, so
    ; nothing can stop us and we can skip all this logic.
    ld a, e
    cp d
    jp c, .done_x
    ; We don't need dx for now, so stash bc for some work space
    push bc
.x_row_scan:
    ; For each column we might cross: check whether any of the
    ; rows we span will block us.
    ; Hm.  This code probably should've been outside the loop.
    ld a, [anise_y]
    ld hl, ANISE_RADIUS
    sub a, [hl]
    ToInclusiveCell
    ld b, a                     ; b = minimum y
    ld a, [anise_y]
    add a, [hl]
    ToExclusiveCell
    ld c, a                     ; c = maximum/current y

.x_column_scan:
    ; Put the cell's row and column in bc, and call a function
    ; to check its "map flags".  I'll define that in a moment,
    ; but for now I'll assume that if bit 0 is set, that means
    ; the cell is solid.
    ; This is also why the inner loop counts down with c, not
    ; up with b: get_cell_flags wants the y coord in c, and
    ; this way, it's already there!
    push bc
    ld b, d
    call get_cell_flags
    pop bc
    ; If this produces zero, we can skip ahead
    and a, $01
    jr z, .not_blocked

    ; We're blocked!  Stop here.  Set x so that we're butted
    ; against this cell, which means subtract our radius from
    ; its x coordinate.
    ; Note that this can't possibly move us further than dx,
    ; because dx was /supposed/ to move us INTO this cell.
    ld a, d
    ; This is a /left/ shift three times, for cell -> pixel
    sla a
    sla a
    sla a
    sub a, [hl]
    ld [anise_x], a
    ; Somewhat confusing pop, to restore dx and dy.
    pop bc
    jp .done_x

.not_blocked:
    ; Not blocked, so loop to the next cell in this column
    dec c
    ld a, c
    cp b
    jr nc, .x_column_scan

    ; Finished checking one column successfully, so continue on
    ; to the next one
    inc d
    ld a, e
    cp d
    jr nc, .x_row_scan

    ; Done, and we never hit anything!  Update our position to
    ; what was requested
    pop bc
    ld a, [anise_x]
    add a, b
    ld [anise_x], a
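
If the register juggling obscures the shape of the algorithm, the same rightward sweep can be modeled in a few lines of Python. This is a sketch under assumptions, not a translation: `move_right` and the `is_solid` callback are hypothetical names (the callback stands in for get_cell_flags), and it uses plain integers, so it ignores the 8-bit overflow and underflow the assembly can hit.

```python
def to_inclusive_cell(coord):
    return coord >> 3

def to_exclusive_cell(coord):
    return (coord - 1) >> 3

def move_right(x, y, radius, dx, is_solid):
    """Return the new x after trying to move right by dx pixels.

    is_solid(col, row) is a hypothetical callback that returns True
    when the given map cell blocks movement.
    """
    nearest = to_exclusive_cell(x + radius) + 1    # first column not yet touched
    furthest = to_exclusive_cell(x + radius + dx)  # column the right edge ends in
    top = to_inclusive_cell(y - radius)
    bottom = to_exclusive_cell(y + radius)
    for col in range(nearest, furthest + 1):       # empty if no grid line is crossed
        for row in range(bottom, top - 1, -1):     # counts down, like register c
            if is_solid(col, row):
                # Blocked: butt the right edge against the cell's left edge
                return col * 8 - radius
    return x + dx
```

With a single solid cell at column 3, row 1, an entity at x = 16 with a radius of 4 that asks to move 6 pixels right stops at x = 20, with its right edge flush against the cell at x = 24.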

I’ve also gotta implement get_cell_flags, which is slightly uglier than I anticipated.

; Fetches properties for the map cell at the given coordinates.
; In: bc = x/y coordinates
; Out: a = flags
get_cell_flags:
    push hl
    push de
    ; I have to figure out what char is at these coordinates,
    ; which means consulting the map, which means doing math.
    ; The map is currently 16 (big) tiles wide, or 32 chars,
    ; so the byte for the indicated char is at b + 32 * c.
    ld hl, TEST_MAP_1
    ; Add x coordinate.  hl is 16 bits, so extend b to 16 bits
    ; using the d and e registers separately, then add.
    ld d, 0
    ld e, b
    add hl, de
    ; Add y coordinate, with stride of 32, which we can do
    ; without multiplying by shifting left 5.  Alas, there are
    ; no 16-bit shifts, so I have to do this by hand.
    ; First get the 5 high bits by copying y into d, then
    ; shifting the 3 low bits off the right end.
    ld d, c
    srl d
    srl d
    srl d
    ; Then get the low 3 bits into the high 3 by swapping,
    ; shifting, and masking them off.
    ld a, c
    swap a
    sla a
    and a, $e0
    ld e, a
    ; Not sure that was really any faster than just shifting
    ; left through the carry flag 5 times.  Oh well.  Add.
    add hl, de

    ; At last, we know the char.  I don't have real flags at
    ; the moment, so I just hardcoded the four chars that make
    ; up the small rock tile.
    ld a, [hl]
    cp a, 2
    jr z, .blocking
    cp a, 3
    jr z, .blocking
    cp a, 12
    jr z, .blocking
    cp a, 13
    jr z, .blocking
    jr .not_blocking
    ; The rest should not be too surprising.
.blocking:
    ld a, 1
    jr .done
.not_blocking:
    xor a
.done:
    pop de
    pop hl
    ret
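
The swap-and-mask trick is easy to get wrong, so here is a Python model that checks it against the straightforward `col + 32 * row`. The function names and the `SOLID_CHARS` set are made up for this sketch; only the stride of 32 and the four char numbers come from the code above.

```python
MAP_STRIDE = 32  # chars per map row, per the comment in get_cell_flags

def swap_nibbles(value):
    """Model of the `swap` instruction on an 8-bit value."""
    return ((value << 4) | (value >> 4)) & 0xFF

def cell_offset(col, row):
    """Offset of a cell from the start of the map, built the way the
    assembly does it: x first, then 32 * y as separate high and low bytes."""
    high = row >> 3                                  # three `srl d`
    low = ((swap_nibbles(row) << 1) & 0xFF) & 0xE0   # swap, sla, and $e0
    return col + ((high << 8) | low)

SOLID_CHARS = {2, 3, 12, 13}  # the hardcoded small rock chars

def get_cell_flags(test_map, col, row):
    """Return 1 if the char at (col, row) is one of the rock chars, else 0."""
    return 1 if test_map[cell_offset(col, row)] in SOLID_CHARS else 0
```

Checking every row from 0 to 255 confirms that the high and low bytes together always equal `MAP_STRIDE * row`.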

And that’s it!

That’s not it

The code I wrote only applies when moving right. It doesn’t handle moving left at all.

And here I run into a downside of continuous collision, at least in this particular case. Because of the special behavior of right/bottom edges, I can’t simply flip a sign to make this code work for leftwards movement as well. For example, the set of columns I might cross going rightwards is calculated exclusively, because my right edge is the one in front… but if I’m moving leftwards, it’s calculated inclusively. Those columns are also in reverse order and thus need iterating over backwards, so an inc somewhere becomes a dec, and so on.

I have two uncomfortable options for handling this. One is to add all the required conditional tests and jumps, but that adds a decent CPU cost to code that’s fairly small and potentially very hot, and complicates code that’s a bit dense and delicate to begin with. The other option is to copy-paste the whole shebang and adjust it as needed to go leftwards.

Guess which I did!

    ld a, b
    cp a, $80
    jp nc, .negative_x
.positive_x:
    ; ... everything above ...
    jp .done_x
.negative_x:
    ; ... everything above, flipped ...
.done_x:
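
The `cp a, $80` test works on the assumption that dx is stored as a two's-complement byte, so any value with bit 7 set counts as negative. A quick Python model of that check, with helper names invented for illustration:

```python
def is_negative(byte):
    """Model of `cp a, $80` followed by `jp nc`: true when bit 7 is set."""
    return (byte & 0xFF) >= 0x80

def to_signed(byte):
    """Interpret an 8-bit value as a two's-complement number."""
    byte &= 0xFF
    return byte - 0x100 if byte >= 0x80 else byte
```

So a dx of $ff takes the `.negative_x` branch and means "move left by 1", while $7f is the largest rightward movement that can be expressed.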

Ugh. Don’t worry, though — it gets worse later on!

I could copy-paste for y movement too and give myself a total of four blocks of similar code, but I’ll hold off on that for now.

Ah.

You want the payoff, don’t you.

Well, I’m warning you now: the next post gets much hairier, and if I show you a GIF now, there won’t be any payoff next time.

You sure? Really?

No going back!

Star Anise walking around, but not through a rock!

I admit, this was pretty damn satisfying the first time it actually worked. Collision detection is a pain in the ass, but it’s the first step to making a game feel like a game. Games are about working within limitations, after all!

An aside: debugging

I’ve made this adventure seem much easier than it actually was by eliding all the mistakes. I made a lot of mistakes, and as I said upfront, it can be very difficult to notice heisenbugs or figure out exactly what’s causing them.

One thing that helped tremendously near the beginning was to hack Star Anise to have a fourth sprite: a solid black 6×6 square under his feet. That let me see where he was actually supposed to be able to stand. Highly recommend it. All I did was copy/paste everywhere that mentioned his sprites to add a fourth one, and position it centered under his feet.

(On any other system, I’d just draw collision rectangles everywhere, but the Game Boy is sprite-based so that’s not really gonna fly.)

I also had pretty good success with writing intermediate values to unused bytes in RAM, so I could inspect them in mGBA’s memory viewer even after the movement was finished. And of course, as an absolute last resort, bgb has an interactive graphical debugger. (Nothing against bgb per se; I just prefer not to rely on closed-source software running in Wine if I can at all get away with it.)

To be continued

Obviously, this isn’t anywhere near done. There’s no concept of collision with other entities, and before that’s even a possibility, I need a concept of other entities. I left myself a long trail of do-it-laters. There are even risks of overflow and underflow in a couple places, which I didn’t bother pointing out because I completely overhaul this code later.

But it’s a big step forward, and now I just need a few more big steps forward. (I say, four months later, long after all those steps are done.)

I already have some future ideas in mind, like: what if a map tile weren’t completely solid, but had its own radius? Could I implement corner cutting, where the game gently guides you if you get stuck on a corner by only a single pixel? What about having tiles that are 45° angles, just to cut down on the overt squareness of the map?

Well. Maybe, you know, later.

Anyway, that brings us up to commit da7478e. It’s all downhill from here.

Next time: more collision detection, and fixed-point arithmetic!

Cheezball Rising: Opening a dialogue

Post Syndicated from Eevee original https://eev.ee/blog/2018/10/09/cheezball-rising-opening-a-dialogue/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I draw some text!

Previously: I get a Game Boy to meow.
Next: collision detection, ohh nooo

Recap

The previous episode was a diversion (and left an open problem that I only solved after writing it), so the actual state of the game is unchanged.

Star Anise walking around a moon environment in-game, animated in all four directions

Where should I actually go from here? Collision detection is an obvious place, but that’s hard. Let’s start with something a little easier: displaying scrolling dialogue text. This is likely to be a dialogue-heavy game, so I might as well get started on that now.

Planning

On any other platform, I’d dive right into it: draw a box on the screen somewhere, fill it with text.

On the Game Boy, it’s not quite that simple. I can’t just write text to the screen; I can only place tiles and sprites.

Let’s look at how, say, Pokémon Yellow handles its menu.

Pokémon Yellow with several levels of menu open

This looks — feels — like it’s being drawn on top of the map, and that sub-menus open on top of other menus. But it’s all an illusion! There’s no “on top” here. This is a completely flat image made up of tiles, like anything else.

The same screenshot, scaled up, with a grid showing the edges of tiles

This is why Pokémon has such a conspicuously blocky font: all the glyphs are drawn to fit in a single 8×8 char, so “drawing” text is as simple as mapping letters to char indexes and drawing them onto the background. The map and the menu are all on the same layer, and the game simply redraws whatever was underneath when you close something. Part of the illusion is that the game is clever enough to hide any sprites that would overlap the menu — because sprites would draw on top! (The Game Boy Color has some twiddles for controlling this layering, but Yellow was originally designed for the monochrome Game Boy.)

A critical reason that this actually works is that in Pokémon, the camera is always aligned to the grid. It scrolls smoothly while you’re walking, but you can’t actually open the menu (or pick up an item, or talk to someone, or do anything else that might show text) until you’ve stopped moving. If you could, the menu would be misaligned, because it’s part of the same grid as the map!

This poses a slight problem for my game. Star Anise isn’t locked to the grid like the Pokémon protagonist is, and unlike Link’s Awakening, I do want to have areas larger than the screen that can scroll around freely.

I know offhand that there are a couple ways to do this. One is the window, an optional extra opaque layer that draws on top of the background, with its top-left corner anchored to any point on the screen. Another is to change some display registers in the middle of the screen redrawing. If you’re thinking of any games with a status bar at the bottom or right, chances are they use the window; games with a status bar at the top have to use display register tricks.

But I don’t want to worry about any of this right now, before I even have text drawing. I know it’s possible, so I’ll deal with it later. For now, drawing directly onto the background is good enough.

Font decisions

Let’s get back to the font itself. I’m not in love with the 8×8 aesthetic; what are my other options? I do like the text in Oracle of Ages, so let’s have a look at that:

Oracle of Ages, also scaled up with a grid, showing its taller text

Ah, this is the same approach again, except that letters are now allowed to peek up into the char above. So these are 8×16, but the letters all occupy a box that’s more like 6×9, offering much more familiar proportions. Oracle of Ages is designed for the Game Boy Color, which has twice as much char storage space, so it makes sense that they’d take advantage of it for text like this.

It’s not bad, but the space it affords is still fairly… limited. Only 16 letters will fit in a line, just as with Pokémon, and that means a lot of carefully wording things to be short and use mostly short words as well. That’s not gonna cut it for the amount of dialogue I expect to have.

(You may be wondering, as I did, how Oracle pulled off this grid-aligned textbox. In small buildings and the overworld, each room is exactly the size of the screen, so there’s no scrolling and no worry about misaligned text. But how does the game handle showing text inside a dungeon, where a room is bigger than the screen and can scroll freely? The answer is: it doesn’t! The textbox is just placed as close as possible to the position shown in this screenshot, so the edges might be misaligned by up to 4 pixels. In 20 years, I never noticed this until I thought to check how they were handling it. I’m sure there’s a lesson, here.)

What other options do I have? It seems like I’m limited to multiples of 8 here, surely. (The answer may be obvious to some of you, but shh, don’t read ahead.)

The answer lies in the very last game released for the Game Boy Color: Harry Potter and the Chamber of Secrets. Whatever deep secrets were learned during the Game Boy’s lifetime will surely be encapsulated within this, er, movie tie-in game.

Harry Potter and the Chamber of Secrets, also scaled up with a grid, showing its text isn't fixed to the grid

Hot damn. That is a ton of text in a relatively small amount of space! And it doesn’t fit the grid! How did they do that?

The answer is… exactly how you’d think!

Tile display for the above screenshot, showing that the text is simply written across consecutive tiles

With a fixed-width font like in Pokémon and Zelda games, the entire character set is stored in VRAM, and text is drawn by drawing a string of characters. With a variable-width font like in Harry Potter, a block of VRAM is reserved for text, and text is drawn into those chars, in software. Essentially, some chars are used like a canvas and have text rendered to them on the fly. The contents of the background layer might look like this in the two cases:

Illustration of fixed width versus variable width text

Some pros of this approach:

  • Since the number of chars required is constant and the font is never loaded directly into char memory, the font can have arbitrarily many glyphs in it. Multiple fonts could be used at the same time, even. (Of course, if you have more than 256 glyphs, you’ll have to come up with a multi-byte encoding for actually storing the text…)

  • A lot more text can fit in one line while still remaining readable.

  • It has the potential to look very cool. I definitely want to squeeze every last drop of fancy-pants graphical stuff that I can from this hardware.

And, cons:

  • It’s definitely more complicated! But I only have to write the code once, and since the game won’t be doing anything but drawing dialogue while the box is up, I don’t think I’ll be in danger of blowing my CPU budget.

  • Colored text becomes a bit trickier. But still possible, so, we can worry about that later.

  • Fixed text that doesn’t scroll, like on menus and whatnot, will be something of a problem — this whole idea relies on amortizing the text rendering across multiple frames. On the other hand, this game shouldn’t have too much of that, and this sounds like a good excuse to hand-draw fixed text (which can then be much more visually interesting). At worst, I could just render the fixed text ahead of time.

Well, I’m sold. Let’s give it a shot.

First pass

Well, I want to do something on a button press, so, let’s do that.

A lot of games (older ones especially) have bugs from switching “modes” in the same frame that something else happens. I don’t entirely understand why that’s so common and should probably ask some speedrunners, but I should be fine if I do mode-switching first thing in the frame, and then start over a new frame when switching back to “world” mode. Right? Sure.

    ; ... button reading code in main loop ...
    bit BUTTON_A, a
    jp nz, .do_show_dialogue

    ; ... main loop ...

    ; Loop again when done
    jp vblank_loop

.do_show_dialogue:
    call show_dialogue
    jp vblank_loop

The extra level of indirection added by .do_show_dialogue is just so the dialogue code itself isn’t responsible for knowing where the main loop point is; it can just ret.

Now to actually do something. This is a first pass, so I want to do as little as possible. I’ll definitely need a palette for drawing the text — and here I’m cutting into my 8-palette budget again, which I don’t love, but I can figure that out later. (Maybe with some shenanigans involving changing the palettes mid-redraw, even.)

PALETTE_TEXT:
    ; Black background, white text...  then gray shadow, maybe?
    dcolor $000000
    dcolor $ffffff
    dcolor $999999
    dcolor $666666

show_dialogue:
    ; Have to disable the LCD to do video work.  Later I can do
    ; a less jarring transition
    DisableLCD

    ; Copy the palette into slot 7 for now
    ld a, %10111000
    ld [rBCPS], a
    ld hl, PALETTE_TEXT
    REPT 8
    ld a, [hl+]
    ld [rBCPD], a
    ENDR
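
For anyone wondering where %10111000 comes from: the high bit of rBCPS turns on auto-increment, and the low six bits are a byte offset into palette memory, where each palette takes 8 bytes (4 colors of 2 bytes each). Here's a Python sketch of that arithmetic. The function names are invented, and the color conversion is an assumption about what a macro like dcolor would do (drop each 8-bit channel to 5 bits); its real definition isn't shown here.

```python
def bcps_value(palette, color=0, auto_increment=True):
    """Value to write to rBCPS to start at the given palette and color."""
    index = palette * 8 + color * 2          # 8 bytes per palette, 2 per color
    return (0x80 if auto_increment else 0) | index

def rgb24_to_gbc(rgb):
    """Assumed conversion: 24-bit RGB to the 15-bit little-endian format,
    with red in the low 5 bits, then green, then blue."""
    r = (rgb >> 16) & 0xFF
    g = (rgb >> 8) & 0xFF
    b = rgb & 0xFF
    word = (r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10)
    return bytes([word & 0xFF, word >> 8])

# Four colors at two bytes each is why the copy loop runs 8 times.
palette_bytes = b"".join(
    rgb24_to_gbc(c) for c in (0x000000, 0xFFFFFF, 0x999999, 0x666666)
)
```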

I also know ahead of time what chars will need to go where on the screen, so I can fill them in now.

Note that I really ought to blank them all out, especially since they may still contain text from some previous dialogue, but I don’t do that yet.

An obvious question is: which tiles? I think I said before that with 512 chars available, and ¾ of those still being enough to cover the entire screen in unique chars, I’m okay with dedicating a quarter of my space to UI stuff, including text. To keep that stuff “out of the way”, I’ll put them at the “end” — bank 1, starting from $80.

I’m thinking of having characters be about the same proportions as in the Oracle games. Those games use 5 rows of tiles, like this:

top of line 1
bottom of line 1
top of line 2
bottom of line 2
blank

Since the font is aligned to the bottom and only peeks a little bit into the top char, the very top row is mostly blank, and that serves as a top margin. The bottom row is explicitly blank for a bottom margin that’s nearly the same size. The space at the top of line 2 then works as line spacing.

I’m not fixed to the grid, so I can control line spacing a little more explicitly. But I’ll get to that later and do something really simple for now, where $ff is a blank tile:

+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|80|82|84|86|88|8a|8c|8e|90|92|94|96|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|81|83|85|87|89|8b|8d|8f|91|93|95|97|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+

This gives me a canvas for drawing a single line of text. The staggering means that the first letter will draw to adjacent chars $80 and $81, rather than distant cousins like $80 and $a0.

You may notice that the below code updates chars across the entire width of the grid, not merely the screen. There’s not really any good reason for that.

    ; Fill text rows with tiles (blank border, custom tiles)
    ; The screen has 144/8 = 18 rows, so skip the first 14 rows
    ld hl, $9800 + 32 * 14
    ; Top row, all tile 255
    ld a, 255
    ld c, 32
.loop1:
    ld [hl+], a
    dec c
    jr nz, .loop1

    ; Text row 1: 255 on the edges, then middle goes 128, 130, ...
    ld a, 255
    ld [hl+], a
    ld a, 128
    ld c, 30
.loop2:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop2
    ld a, 255
    ld [hl+], a

    ; Text row 2: same as above, but middle is 129, 131, ...
    ld a, 255
    ld [hl+], a
    ld a, 129
    ld c, 30
.loop3:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop3
    ld a, 255
    ld [hl+], a

    ; Bottom row, all tile 255
    ld a, 255
    ld c, 32
.loop4:
    ld [hl+], a
    dec c
    jr nz, .loop4
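
The four loops amount to building this little table. As a cross-check, here's a Python sketch that generates the same four rows; `dialogue_rows` is a made-up name, and the constants are the ones used in the assembly.

```python
BLANK = 0xFF      # the blank tile
MAP_WIDTH = 32    # the background grid is 32 tiles wide

def dialogue_rows(first_char=0x80, width=MAP_WIDTH):
    """Return the four tilemap rows of the dialogue box, top to bottom."""
    border = [BLANK] * width
    inner = width - 2  # one blank tile on each edge
    top = [BLANK] + [first_char + 2 * i for i in range(inner)] + [BLANK]
    bottom = [BLANK] + [first_char + 1 + 2 * i for i in range(inner)] + [BLANK]
    return [border, top, bottom, border]

# Address of the first tile written: 14 rows down from the start of the map
START_ADDRESS = 0x9800 + MAP_WIDTH * 14
```

Even chars land in the top text row and odd chars directly below them, so each column of the canvas is a pair of consecutive chars.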

Now I need to repeat all of that, but in bank 1, to specify the char bank (1) and palette (7) for the corresponding tiles. Those are the same for the entire dialogue box, though, so this part is easier.

    ; Switch to VRAM bank 1
    ld a, 1
    ldh [rVBK], a

    ld a, %00001111  ; bank 1, palette 7
    ld hl, $9800 + 32 * 14
    ld c, 32 * 4  ; 4 rows
.loop5:
    ld [hl+], a
    dec c
    jr nz, .loop5

    EnableLCD
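
The attribute byte packs several fields together: the palette in bits 0 to 2, the char bank in bit 3, horizontal and vertical flips in bits 5 and 6, and the priority flag in bit 7. A small Python helper (a hypothetical name, purely for illustration) shows how %00001111 falls out of "bank 1, palette 7":

```python
def bg_attributes(palette, bank=0, flip_x=False, flip_y=False, priority=False):
    """Compose a background attribute byte for VRAM bank 1."""
    value = palette & 0x07          # bits 0-2: palette number
    value |= (bank & 1) << 3        # bit 3: char bank
    value |= int(flip_x) << 5       # bit 5: horizontal flip
    value |= int(flip_y) << 6       # bit 6: vertical flip
    value |= int(priority) << 7     # bit 7: background-over-sprites priority
    return value
```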

Time to get some real work done. Which raises the question: how do I actually do this?

If you recall, each 8-pixel row of a char is stored in two bytes. The two-bit palette index for each pixel is split across the corresponding bit in each byte, with the low bit in the first byte and the high bit in the second. If the leftmost pixel is palette index 01, then bit 7 in the first byte will be 1, and bit 7 in the second byte will be 0.
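
To make the split layout concrete, here's a Python sketch that packs one row of eight palette indexes into its two bytes. The function name is invented; the layout (low bits in the first byte, high bits in the second, leftmost pixel in bit 7) is the hardware's standard tile format.

```python
def encode_row(pixels):
    """Pack eight 2-bit color indexes into the two bytes of one char row."""
    low = high = 0
    for index in pixels:                        # leftmost pixel first
        low = (low << 1) | (index & 1)          # low bit goes in the first byte
        high = (high << 1) | ((index >> 1) & 1) # high bit goes in the second
    return low, high
```

A row whose leftmost pixel is index 1 and whose other pixels are 0 encodes as the bytes $80 and $00.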

Now, a blank char is all zeroes. To write a (left-aligned) glyph into a blank char, all I need to do is… well, I could overwrite it, but I could just as well OR it. To write a second glyph into the unused space, all I need to do is shift it right by the width of the space used so far, and OR it on top. The unusual split layout of the palette data is actually handy here, because it means the size of the shift matches the number of pixels, and I don’t have to worry about overflow.

0 0 0 0 0 0 0 0  <- blank char

1 1 1 1 0 0 0 0  <- some byte from the first glyph
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 0 0 0  <- ORed together to display first character

          1 1 1 1 0 0 0 0  <- some byte from the second glyph,
                              shifted by 4 (plus a kerning pixel)
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 1 1 1  <- ORed together to display first two characters

The obvious question is, well, what happens to the bits from the second character that didn’t fit? I’ll worry about that a bit later.
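
Here's the shift-and-OR step from the diagram as a Python sketch, operating on one byte at a time. The function name is made up, and the second return value is only there to show which bits fall off the right edge; it isn't something the first pass does anything with.

```python
def draw_row(current, glyph_row, x_offset):
    """OR one byte of glyph data into a char row, x_offset pixels in.

    Returns (updated byte, bits that spilled past the right edge),
    with the spilled bits left-aligned as they'd appear in the next char.
    """
    shifted = glyph_row >> x_offset
    spill = (glyph_row << (8 - x_offset)) & 0xFF
    return current | shifted, spill
```

Drawing the second glyph from the diagram at an offset of 5 gives 11110111, and the one bit that didn't fit comes back as 10000000.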

Oh, and finally, I’ll need a font, plus some text to display. This is still just a proof of concept, so I’ll add in a couple glyphs by hand.

; somewhere in ROM
font:
; A
    ; First byte indicates the width of the glyph, which I need
    ; to know because the width varies!
    db 6
    dw `00000000
    dw `00000000
    dw `01110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11111000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
; B
    db 6
    dw `00000000
    dw `00000000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000

text:
    ; Shakespeare it ain't.
    ; Need to end with a NUL here so I know where the text
    ; ends.  This isn't C, there's no automatic termination!
    db "ABABAAA", 0

And here we go!

    ; ----------------------------------------------------------
    ; Setup done!  Real work begins here
    ; b: x-offset within current tile
    ; de: text cursor + current character tiles
    ; hl: current VRAM tile being drawn into
    ld b, 0
    ld de, text
    ld hl, $8800

    ; This loop waits for the next vblank, then draws a letter.
    ; Text thus displays at ~60 characters per second.
.next_letter:
    ; This is probably way more LCD disabling than is strictly
    ; necessary, but I don't want to worry about it yet
    EnableLCD
    call wait_for_vblank
    DisableLCD

    ld a, [de]                  ; get current character
    and a                       ; if NUL, we're done!
    jr z, .done
    inc de                      ; otherwise, increment

    ; Get the glyph from the font, which means computing
    ; font + 33 * a.
    ; A little register juggling.  hl points to the current
    ; char in VRAM being drawn to, but I can only do a 16-bit
    ; add into hl.  de I don't need until the next loop,
    ; since I already read from it.  So I'm going to push de
    ; AND hl, compute the glyph address in hl, put it in de,
    ; then restore hl.
    push de
    push hl
    ; The text is written in ASCII, but the glyphs start at 0
    sub a, 65
    ld hl, font
    ld de, 33                   ; 1 width byte + 16 rows * 2 bytes
    ; This could probably be faster with long multiplication
    and a
.letter_stride:
    jr z, .skip_letter_stride
    add hl, de
    dec a
    jr .letter_stride
.skip_letter_stride:
    ; Move the glyph address into de, and restore hl
    ld d, h
    ld e, l
    pop hl

    ; Read the first byte, which is the character width.  This
    ; overwrites the character, but I have the glyph address,
    ; so I don't need it any more
    ld a, [de]
    inc de

    ; Copy into current chars
    ; Part 1: Copy the left part into the current chars
    push af                     ; stash width
    ; A glyph is two chars or 32 bytes, so row_copy 32 times
    ld c, 32
    ; b is the next x position we're free to write to.
    ; Incrementing it here makes the inner loop simpler, since
    ; it can't be zero.  But it also means two jumps per loop,
    ; so, ultimately this was a pretty silly idea.
    inc b
.row_copy:
    ld a, [de]                  ; read next row of character

    ; Shift right by b places with an inner loop
    push bc                     ; preserve b while shifting
    dec b
.shift:                         ; shift right by b bits
    jr z, .done_shift
    srl a
    dec b
    jr .shift
.done_shift:
    pop bc

    ; Write the updated byte to VRAM
    or a, [hl]                  ; OR with current tile
    ld [hl+], a
    inc de
    dec c
    jr nz, .row_copy
    pop af                      ; restore width

    ; Part 2: Copy whatever's left into the next char
    ; TODO  :)

    ; Cleanup for next iteration
    ; Undo the b increment from way above
    dec b
    ; It's possible I overflowed into the next column, in which
    ; case I want to leave hl where it is: pointing at the next
    ; column.  Otherwise, I need to back it up to where it was.
    ; Of course, I also need to update b, the x offset.
    add a, b                    ; a <- new x offset
    ; If the new x offset is 8 or more, that's actually the next
    ; column
    cp a, 8
    jr nc, .wrap_to_next_tile
    ld bc, -32                  ; a < 8: back hl up
    add hl, bc
    jr .done_wrap
.wrap_to_next_tile:
    sub a, 8                    ; a >= 8: subtract tile width
    ld b, a
.done_wrap:
    ; Either way, store the new x offset into b
    ld b, a

    ; And loop!
    pop de                      ; pop text pointer
    jr .next_letter

.done:
    ; Undo any goofy stuff I did, and get outta here
    EnableLCD
    ; Remember to reset bank to 0!
    xor a
    ldh [rVBK], a
    ret

Phew! That was a lot, but hopefully it wasn’t too bad. I hit a few minor stumbling blocks, but as I recall, most of them were of the “I get the conditions backwards every single time I use cp augh” flavor. (In fact, if you look at the actual commit the above is based on, you may notice that I had the condition at the very end mixed up! It’s a miracle it managed to print part of the second letter at all.)
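
For anyone else who trips over cp: it computes a minus the operand, throws the result away, and keeps only the flags. Here is a rough Python model of the two flags that matter for the check at the end of the listing. The function name is made up for illustration and is not part of the game's code.

```python
def cp_flags(a: int, n: int) -> dict:
    """Model the Z and C flags after `cp a, n` on the Game Boy CPU."""
    a &= 0xFF
    n &= 0xFF
    return {
        "z": a == n,  # zero flag: set when the operands are equal
        "c": a < n,   # carry flag: set when the subtraction borrows, i.e. a < n
    }

# `jr nc, label` after `cp a, 8` therefore jumps when a >= 8.
for offset in (3, 7, 8, 11):
    flags = cp_flags(offset, 8)
    print(offset, "stays in this tile" if flags["c"] else "wraps to the next tile")
```

So "no carry" means "greater than or equal", which is the direction the wrap check needs.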

There are a lot of caveats in this first pass, including that there’s nothing to erase the dialogue box and reshow the map underneath it. (But I might end up using the window for this anyway, so there’s no need for that.)

As a proof of concept, though, it’s a great start!

Screenshot of Anise, with a black dialogue box that says: A|

That’s the letter A, followed by the first two pixels of the letter B. I didn’t implement the part where letters spill into the next column, yet.

Guess I’d better do that!

Second pass

One of the big problems with the first pass was that I had to turn the screen off to do the actual work safely. Shifting a bunch of bytes by some amount is a little slow, since I can only shift one bit at a time and have to do it within a loop, and vblank only lasts for about 6.5% of the entire duration of the frame. If I continued like this, the screen would constantly flicker on and off every time I drew a new letter. Yikes.
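
As a sanity check on that 6.5% figure, here is the arithmetic in Python. The constants are the standard LCD timings in machine cycles at normal speed (114 per scanline, 144 visible scanlines plus 10 of vblank); the function name is only for illustration.

```python
# Timing constants for the Game Boy LCD, in machine cycles
# (normal speed; one machine cycle is four clock ticks).
CYCLES_PER_SCANLINE = 114
VISIBLE_SCANLINES = 144
VBLANK_SCANLINES = 10

def vblank_budget():
    """Return (vblank cycles, frame cycles, vblank fraction of the frame)."""
    total_lines = VISIBLE_SCANLINES + VBLANK_SCANLINES
    vblank = VBLANK_SCANLINES * CYCLES_PER_SCANLINE
    frame = total_lines * CYCLES_PER_SCANLINE
    return vblank, frame, vblank / frame

vblank, frame, fraction = vblank_budget()
print(f"{vblank} of {frame} cycles per frame, {fraction:.1%}")
```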

I’ll solve this the same way I solve pretty much any other vblank problem: do the actual work into a buffer, then just copy that buffer during vblank. Since I intend to draw no more than one character per frame, and each character glyph is no wider than a single char column, I only need a buffer big enough to span two columns. Text covers two rows, also, so that’s four tiles total.

I also need to zero out the tile buffer when I first start drawing text — otherwise it may still have garbage left over from the last time text was displayed! — and this seems like a great opportunity to introduce a little fill function. Maybe then I’ll do the right damn thing and clear out other stuff on startup.

; Utility code section

; fill c bytes starting at hl with a
; NOTE: c must not be zero
fill:
    ld [hl+], a
    dec c
    jr nz, fill
    ret

; ...

; Stick this at a fixed nice address for now, just so it's easy
; for me to look at and debug
SECTION "Text buffer", WRAM0[$C200]
text_buffer:
    ; Text is up to 8x16 but may span two columns, so carve out
    ; enough space for four tiles
    ds $40

; ... back in the code section ...

show_dialogue:
    DisableLCD
    ; ... setup stuff ...
    EnableLCD

    ; Zero out the tile buffer
    xor a
    ld hl, text_buffer
    ld c, $40
    call fill

That first round of disabling and enabling the LCD is still necessary, because the setup work takes a little time, but I can get rid of that later too. For now, the priority is fixing the text scroll (and supporting text that spans more than one tile).

The code is the same up until I start copying the glyph into the tiles. Now it doesn’t go to VRAM, but into the buffer.

There’s another change here, too. Previously, I shifted the glyph right, letting bits fall off the right end and disappear. But the bits that drop off the end are exactly the bits that I need to draw to the next char. I could do a left shift to retrieve them, but I had a different idea: rotate the glyph instead.

Say I want to draw a glyph offset by 3 pixels. Then I want to do this:

abcdefgh  <- original glyph bits
fghabcde  <- rotate right 3
00011111  <- mask, which is just $ff shifted right 3

000abcde  <- rotated glyph AND mask gives the left part

11100000  <- mask, inverted
fgh00000  <- rotated glyph AND inverted mask gives the right part

The time and code savings aren’t huge, exactly, and nothing else is going on while text is rendering so it’s not like time is at a premium here. But hey this feels clever so let’s do it.

    ; Copy into current chars
    push af                     ; stash width
    ld c, 32                    ; 32 bytes per glyph
    ld hl, text_buffer          ; new!
    ; This is still silly.
    inc b
.row_copy:
    ld a, [de]                  ; read next row of character
    ; Rotate right by b - 1 pixels -- remember, b contains the
    ; x-offset within the current tile where to start drawing
    push bc                     ; preserve b while shifting
    ld c, $ff                   ; initialize the mask
    dec b
    jr z, .skip_rotate
.rotate:
    ; Rotate the glyph (a), but shift the mask (c), so that the
    ; left end of the mask fills up with zeroes
    rrca
    srl c
    dec b
    jr nz, .rotate
.skip_rotate:
    push af                     ; preserve glyph
    and a, c                    ; mask right pixels
    ; Draw to left half of text buffer
    or a, [hl]                  ; OR with current tile
    ld [hl+], a
    ; Write the remaining bits to right half
    ld a, c                     ; put mask in a...
    cpl                         ; ...to invert it
    ld c, a                     ; then put it back
    pop af                      ; restore unmasked glyph
    and a, c                    ; mask left pixels
    ld [hl+], a                 ; and store them!
    ; Clean up after myself, and loop to the next row
    inc de                      ; next row of glyph
    pop bc                      ; restore counter!
    dec c
    jr nz, .row_copy
    pop af                      ; restore width
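
To double-check the rotate-and-mask idea outside of assembly, here is a small Python model of what the inner loop does to one row of a glyph. The function names are hypothetical; the point is only that the rotated bits, split by the mask, match what separate right and left shifts would produce.

```python
def rotate_right(byte: int, count: int) -> int:
    """Rotate an 8-bit value right, like `rrca` repeated `count` times."""
    count %= 8
    return ((byte >> count) | (byte << (8 - count))) & 0xFF

def split_glyph_row(row: int, offset: int) -> tuple:
    """Split one 8-pixel glyph row drawn `offset` pixels into a tile.

    Returns (left, right): the bits that land in the current column
    and the bits that spill into the next one.
    """
    rotated = rotate_right(row, offset)
    mask = 0xFF >> offset            # $ff shifted right, as with `srl c`
    left = rotated & mask
    right = rotated & (~mask & 0xFF)
    return left, right

left, right = split_glyph_row(0b10110101, 3)
print(f"{left:08b} {right:08b}")
```

With an offset of zero the mask stays $ff, so the whole row lands in the left column and the right column gets nothing, which is the case the assembly skips the loop for.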

The use of the stack is a little confusing (and don’t worry, it only gets worse in later posts). Note for example that c is used as the loop counter, but since I don’t actually need its value within the body of the loop, I can push it right at the beginning and use c to hold the mask, then pop the loop counter back into place at the end.

This is where I first started to feel register pressure, especially when addresses eat up two of them. My options are pretty limited: I can store stuff on the stack, or store stuff in RAM. The stack is arguably harder to follow (and easier to fuck up, which I’ve done several times), but either way there’s the register ambiguity.

Which is shorter/faster? Well:

  • A push/pop pair takes 2 bytes and 7 cycles.

  • Immediate writing to RAM and immediate reading back from it takes 6 bytes and 8 cycles, and can only be done with a, so I’d probably have to copy into and out of some other register too.

  • Putting an address in hl, writing to it, then reading from it takes 5 bytes and 7 cycles, but requires that I can preserve hl. (On the other hand, if I can preserve the value of hl across a loop or something, then it’s amortized away and the read/write is only 2 bytes and 4 cycles. But if that’s the case, chances are that I’m not under enough register pressure to need RAM in the first place.)

  • Parts of high RAM ($ff80 and up) are available for program use, and they can be read or written with the same instructions that operate on the control knobs starting at $ff00. A high RAM read and write takes 4 bytes and 6 cycles, which isn’t too bad, but once again I have to go through the a register so I’ll probably need some other copies.

Stack it is, then.
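
The comparison is easier to eyeball as a table. This Python snippet just tabulates the byte and cycle figures from the list above; the instruction spellings used as labels are one reading of what each bullet describes, not something taken from the game's source.

```python
# (bytes, cycles) for saving one value and getting it back, using the
# figures from the list above.
OPTIONS = {
    "push / pop": (2, 7),
    "ld [nn], a / ld a, [nn]": (6, 8),
    "ld hl, nn + ld [hl], a / ld a, [hl]": (5, 7),
    "ldh [n], a / ldh a, [n]": (4, 6),
}

def cheapest(options, key_index):
    """Name of the option with the smallest size (0) or time (1)."""
    return min(options, key=lambda name: options[name][key_index])

print("smallest:", cheapest(OPTIONS, 0))
print("fastest: ", cheapest(OPTIONS, 1))
```

High RAM wins on raw cycles by one, but it only works through a, while push and pop are the smallest option and take any register pair.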

Anyway! Where were we. I need to now copy the buffer into VRAM.

You may have noticed that the buffer isn’t quite populated in char format. Instead, it’s populated like one big 16-pixel char, with the first 16 bits corresponding to the 16 pixels spanning both columns. VRAM, of course, expects to get all the pixels from the first column, then all the pixels from the second column. If that’s not clear, here’s what I have (where the bits are in order from left to right, top to bottom):

AAAAAAAA BBBBBBBB  <- high bits for first row of pixels
aaaaaaaa bbbbbbbb  <- low bits for first row of pixels
... other rows ...

And here’s what I need to put in VRAM:

AAAAAAAA  <- high bits for first row of left column of pixels
aaaaaaaa  <- low bits for first row of left column of pixels
... other rows of left column ...
BBBBBBBB  <- high bits for first row of right column of pixels
bbbbbbbb  <- low bits for first row of right column of pixels
... other rows of right column ...

I hope that makes sense! To fix this, I use two loops (one for each column), and in each loop I copy every other byte into VRAM. That deinterlaces the buffer.

    ; Draw the buffered tiles to vram
    ; The text buffer is treated like it's 16 pixels wide, but
    ; VRAM is of course only 8 pixels wide, so we need to do
    ; this in two iterations: the left two tiles, then the right
    pop hl                      ; restore hl (VRAM)
    push af                     ; stash width, again
    call wait_for_vblank        ; always wait before drawing
    push bc
    push de
    ; Draw the left two tiles
    ld c, $20
    ld de, text_buffer
.draw_left:
    ld a, [de]
    ; This double inc fixes the interlacing
    inc de
    inc de
    ld [hl+], a
    dec c
    jr nz, .draw_left
    ; Draw the right two tiles
    ld c, $20
    ; This time, start from the SECOND byte, which will grab
    ; all the bytes skipped by the previous loop
    ld de, text_buffer + 1
.draw_right:
    ld a, [de]
    inc de
    inc de
    ld [hl+], a
    dec c
    jr nz, .draw_right
    pop de
    pop bc
    pop af                      ; restore width, again
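
The same deinterlacing is a one-liner with slices in a higher-level language. This Python sketch models the two loops above on a plain byte string; the function name is made up, and nothing here touches real VRAM.

```python
def deinterlace(buffer: bytes) -> bytes:
    """Reorder a 16-pixel-wide buffer into two 8-pixel-wide columns.

    Even-indexed bytes belong to the left column and odd-indexed bytes
    to the right column, mirroring the two copy loops above.
    """
    left = buffer[0::2]    # like starting at text_buffer
    right = buffer[1::2]   # like starting at text_buffer + 1
    return left + right

# A 64-byte buffer where left-column bytes are $AA and right-column bytes are $BB
interleaved = bytes([0xAA, 0xBB] * 32)
vram = deinterlace(interleaved)
print(vram[:32] == bytes([0xAA] * 32), vram[32:] == bytes([0xBB] * 32))
```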

Just about done! There’s one last thing to do before looping to the next character. If this character did in fact span both columns, then the buffer needs to be moved to the left by one column. Here’s a simplified diagram, pretending chars are 5×5 and I just drew a B:

+-----+-----+.....+
| A  B|B    |     .
|A A B| B   |     .
|AAA B|B    |     .
|A A B| B   |     .
|A A B|B    |     .
+-----+-----+.....+

The left column is completely full, so I don’t need to buffer it any more. The next character wants to draw in the last partially full column, which here is the one containing the B; it’ll also want an empty right column to overflow into if necessary.

    ; Increment the pixel offset and deal with overflow
    add a, b                    ; a <- new x offset
    ; Regardless of whether this glyph overflowed, the VRAM
    ; pointer was left at the beginning of the next (empty)
    ; column, and it needs rewinding to the right column
    ld bc, -32                  ; move the VRAM pointer back...
    add hl, bc                  ; ...to the start of the char
    cp a, 8
    jr nc, .wrap_to_next_char
    ; The new offset is less than 8, so this character didn't
    ; actually draw anything in the right column.  Move the
    ; VRAM pointer back a second time, to the left column,
    ; which still has space left
    add hl, bc
    jr .done_wrap
.wrap_to_next_char:
    ; The new offset is 8 or more, so this character drew into
    ; the next char.  Subtract 8, but also shift the text buffer
    ; by copying all the "right" chars over the "left" chars
    sub a, 8                    ; a >= 8: subtract char width
    push hl
    push af
    ; The easy way to do this is to walk backwards through the
    ; buffer.  This leaves garbage in the right column, but
    ; that's okay -- it gets overwritten in the next loop,
    ; before the buffer is copied into VRAM.
    ld hl, text_buffer + $40 - 1
    ld c, $20
.shift_buffer:
    ld a, [hl-]
    ld [hl-], a
    dec c
    jr nz, .shift_buffer
    pop af
    pop hl
.done_wrap:
    ld b, a                     ; either way, store into b

    ; Loop
    pop de                      ; pop text pointer
    jp .next_letter
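
Here is a Python model of the two pieces of bookkeeping in that listing: advancing the x offset, and shifting the interleaved buffer when a glyph spills over. Both function names are hypothetical, and the model assumes glyphs are at most 8 pixels wide, as stated earlier.

```python
def shift_buffer_left(buffer: bytearray) -> None:
    """Copy every right-column byte over its left-column neighbour.

    The buffer is interleaved (left, right, left, right, ...), so this
    walks backwards two bytes at a time, like the `ld a, [hl-]` /
    `ld [hl-], a` pair.  The right column is left as stale data.
    """
    index = len(buffer) - 1
    while index > 0:
        buffer[index - 1] = buffer[index]
        index -= 2

def advance_offset(offset: int, width: int) -> tuple:
    """Return (new offset within a tile, whether the glyph wrapped)."""
    new_offset = offset + width
    if new_offset >= 8:
        return new_offset - 8, True
    return new_offset, False

buf = bytearray([0x11, 0x22] * 32)
shift_buffer_left(buf)
print(buf[:4].hex(), advance_offset(5, 6))
```

The stale right column is harmless for the same reason given in the assembly comments: the next glyph stores into it outright rather than ORing.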

And the test run:

Screenshot of Anise, with a black dialogue box that says: ABABAAA

Hey hey, success!

Quick diversion: Anise corruption

I didn’t mention it above because I didn’t actually use it yet, but while doing that second pass, I split the button-polling code out into its own function, read_input. I thought I might need it in dialogue as well (which has its own vblank loop and thus needs to do its own polling), but I didn’t get that far yet, so it’s still only called from the main loop.

While testing out the dialogue, I notice a teeny tiny problem.

A screenshot similar to the above, but with some mild graphical corruption on Anise

Well, yes, obviously there’s the problem of the textbox drawing underneath the player. Which is mostly a problem because the textbox doesn’t go away, ever. I’ll worry about that later.

The other problem is that Anise’s sprite is corrupt. Again. Argh!

A little investigation suggests that, once again, I’m blowing my vblank budget. But this time, it’s a little more reasonable. Remember, I’m overwriting Anise’s sprite after handling movement. That means I do a bunch of logic followed by writing to char data. No wonder there’s a problem. I must’ve just slightly overrun vblank when I split out read_input (or checked for the dialogue button press in the first place?), since call has a teeny tiny bit of overhead.

That approach is a little inconsistent, as well. Remember how I handle OAM: I write to a buffer, which is then copied to real OAM during the next vblank. But I’m updating the sprite immediately. That means when Anise turns, the sprite updates on the very next frame, but the movement isn’t visible until the frame after that. Whoops.

So, a buffer! I could make this into a more general mechanism later, but for now I only care about fixing Anise. I can revisit this when I have, uh, a second sprite.

; in ram somewhere

anise_sprites_address:
    dw

Now, Anise is composed of three objects, which is six chars, which is 96 bytes. The fastest way to copy bytes by hand is something like this:

    ld hl, source
    ld de, destination
    ld c, 96
.loop:
    ld a, [hl+]
    ld [de], a
    inc de
    dec c
    jr nz, .loop

Each iteration of the loop copies 1 byte and takes 7 cycles. (It’s possible to shave a couple cycles off in some specific cases, and unrolling would save some time, but let’s stay general for now.) That’s 672 cycles, plus 10 for the setup, minus one on the final jr, for 681 total. But vblank only lasts 1140 cycles! That’s more than half the budget blown for updating a single entity. This can’t possibly work.

Enter a feature exclusive to the Game Boy Color: GDMA, or general DMA. This is similar to OAM DMA, except that it can copy (nearly) anything to anywhere. Also (unlike OAM DMA), the CPU pauses while the copy is taking place, so there’s no need to carefully time a busy loop. It’s configured by writing to five control registers (which takes 5 cycles each), and then it copies two bytes per cycle, for a total of 73 cycles. That’s 9.3 times faster. Seems worth a try.
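
The numbers in the last two paragraphs can be reproduced with a few lines of Python. The per-byte, setup, and per-register costs are taken as quoted above rather than re-derived from an instruction table, and the function names are only illustrative.

```python
VBLANK_CYCLES = 1140

def manual_copy_cycles(length, per_byte=7, setup=10):
    """Cycles for the byte-at-a-time loop, using the figures quoted above."""
    # The final `jr nz` is not taken, which saves one cycle.
    return length * per_byte + setup - 1

def gdma_cycles(length, registers=5, per_register=5):
    """Cycles for general DMA: register writes, then two bytes per cycle."""
    return registers * per_register + length // 2

manual = manual_copy_cycles(96)
gdma = gdma_cycles(96)
print(manual, gdma, round(manual / gdma, 1), f"{manual / VBLANK_CYCLES:.0%} of vblank")
```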

(Note that I’m not using double-speed CPU mode yet, as an incentive to not blow my CPU budget early on. Turning that on would halve the time taken by the manual loop, but wouldn’t affect GDMA.)

GDMA has a couple restrictions: most notably, it can only copy multiples of 16 bytes, and only to/from addresses that are aligned to 16 bytes. But each char is 16 bytes, so that works out just fine.

The five GDMA registers are, alas, simply named 1 through 5. The first two are the source address; the next two are the destination address; the last is the amount to copy. Or, well, it’s the amount to copy, divided by 16, minus 1. (The high bit is reserved for turning on a different kind of DMA that copies a little at a time, during hblanks.) Writing to the last register triggers the copy.
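
The length encoding is the fiddly part, so here is a Python sketch of it along with the alignment rule. The helper names are invented for this example; the 7-bit limit follows from the high bit being reserved.

```python
def gdma_length_byte(length: int) -> int:
    """Encode a transfer length in bytes for the fifth DMA register.

    The register holds length / 16 - 1 in its low seven bits; the high
    bit stays clear to request a general (immediate) transfer.
    """
    if length <= 0 or length % 16 != 0:
        raise ValueError("length must be a positive multiple of 16")
    value = length // 16 - 1
    if value > 0x7F:
        raise ValueError("length too large for a single transfer")
    return value

def is_dma_aligned(address: int) -> bool:
    """Source and destination must sit on 16-byte boundaries."""
    return address % 16 == 0

print(gdma_length_byte(96), is_dma_aligned(0x8000))
```

For Anise's 96 bytes this gives 5, which matches the `(32 * 3) / 16 - 1` expression in the assembly.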

Plugging in this buffer is easy enough, then:

    ; Update Anise's current sprite.  Use DMA here because...
    ; well, geez, it's too slow otherwise.
    ld hl, anise_sprites_address
    ld a, [hl+]
    ld [rHDMA1], a
    ld a, [hl]
    ld [rHDMA2], a
    ; I want to write to $8000 which is where Anise's sprite is
    ; hardcoded to live, and the top three bits are ignored so
    ; that the destination is always in VRAM, so $0000 works too
    ld a, HIGH($0000)
    ld [rHDMA3], a
    ld a, LOW($0000)
    ld [rHDMA4], a
    ; And copy!
    ld a, (32 * 3) / 16 - 1
    ld [rHDMA5], a

Finally, instead of actually overwriting Anise’s sprite, I write the address of the new sprite into the buffer:

    ; Store the new sprite address, to be updated during vblank
    ld a, h
    ld [anise_sprites_address], a
    ld a, l
    ld [anise_sprites_address + 1], a

And done! Now I can walk around just fine. It looks basically like the screenshot from the previous section, so I don’t think you need a new one.

Note that this copy will always happen, since there’s no condition for skipping it when there’s nothing to do. That’s fine for now; later I’ll turn this into a list, and after copying everything I’ll simply clear the list.

Crisis averted, or at least deferred until later. Back to the dialogue!

Interlude: A font

Writing out the glyphs by hand is not going to cut it. It was fairly annoying for two letters, let alone an entire alphabet.

Nothing about this part was especially interesting. I used LÖVE’s font format, which puts all glyphs in a single horizontal strip. The color of the top-left pixel is used as a sentinel; any pixel in the top row that’s the same color indicates the start of a new glyph.

(I note that LÖVE actually recommends against using this format, but the alternatives are more complicated and require platform-specific software — whereas I can slop this format together in any image editor without much trouble.)

I then turned this into Game Boy tiles much the same way as with the sprite loader, except with the extra logic to split on the sentinel pixels and pad each glyph to eight pixels wide. I won’t reproduce the whole script here, but it’s on GitHub if you want to see it.
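
For a rough idea of the sentinel-splitting step, here is a Python sketch. It is not the script from the repository: the function names are made up, the input is a plain list standing in for the top row of pixels, and it assumes the sentinel column is a separator rather than part of the glyph.

```python
def glyph_spans(top_row):
    """Find glyph column ranges in a LÖVE-style font strip.

    `top_row` is the list of pixel colors along the top of the image.
    The color of the first pixel is the sentinel; each sentinel pixel
    marks the start of a new glyph.  Assumption: the sentinel column
    itself is a separator and not part of the glyph.

    Returns a list of (first column, width) pairs.
    """
    sentinel = top_row[0]
    starts = [x for x, color in enumerate(top_row) if color == sentinel]
    ends = starts[1:] + [len(top_row)]
    return [(start + 1, end - start - 1) for start, end in zip(starts, ends)]

def pad_row(bits, width=8):
    """Pad one row of glyph pixels (0/1 values) on the right to `width`."""
    return bits + [0] * (width - len(bits))

# "S" is the sentinel color, "." is anything else
row = list("S...S.....S..")
print(glyph_spans(row))
print(pad_row([1, 0, 1]))
```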

The font itself is, well, a font? I initially tried to give it a little personality, but that made some of the characters weirdly wide and was a bit hard to read, so I revisited it and ended up with this:

Pixel font covering all of ASCII

I like it, at least! The characters all have shadows built right in, and you can see at the end that I was starting to play with some non-ASCII characters. Because I can do that!

Third pass

One major obstacle remains: I can only have one line of text right now, when there’s plenty of space for two.

The obvious first thing I need to do is alter the dialogue box’s char map. It currently has a whole char’s worth of padding on every side. What a waste. I want this instead:

+--+--+--+--+--+--+--+--+--+--+--+--+---+
|80|82|84|86|88|8a|8c|8e|90|92|94|96|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+
|81|83|85|87|89|8b|8d|8f|91|93|95|97|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+
|a8|aa|ac|ae|b0|b2|b4|b6|b8|ba|bc|be|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+
|a9|ab|ad|af|b1|b3|b5|b7|b9|bb|bd|bf|...|
+--+--+--+--+--+--+--+--+--+--+--+--+---+

The second row begins with char $a8 because that’s $80 + 40.
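
The numbering pattern in that table can be generated rather than typed out. This Python sketch uses the screen width of 20 tiles and the starting char of $80 from above; the function name is hypothetical.

```python
SCREEN_WIDTH_TILES = 20
TEXT_START_TILE = 0x80

def dialogue_char_map(lines=2):
    """Return the tile rows for `lines` lines of 8x16 text.

    Each text line uses two tile rows; within a line, tiles are numbered
    column by column, so moving right adds 2 and moving down adds 1.
    """
    rows = []
    for line in range(lines):
        first = TEXT_START_TILE + line * SCREEN_WIDTH_TILES * 2
        for half in range(2):  # top half, then bottom half
            rows.append([first + half + 2 * col
                         for col in range(SCREEN_WIDTH_TILES)])
    return rows

for row in dialogue_char_map():
    print(" ".join(f"{tile:02x}" for tile in row[:4]), "...")
```

Numbering down each column first is what lets the renderer write a glyph's 32 bytes to consecutive chars.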

Obviously I’ll need to change the setup code to make the above pattern. But while I’m in here… remember, the setup code is the only remaining place that disables the LCD to do its work. Can I do everything within vblank instead?

I’m actually not sure, but there’s an easy way to reduce the CPU cost. Instead of setting up the whole dialogue box at once, I can do it one row at a time, starting from the bottom. That will cut the vblank pressure by a factor of four, and it’ll create a cool slide-up effect when the dialogue box opens!

Let’s give it a try. I’ll move the real code into a function, since it’ll run multiple times now. I’ll also introduce a few constants, since I’m getting tired of all the magic numbers everywhere.

SCREEN_WIDTH_TILES EQU 20
CANVAS_WIDTH_TILES EQU 32
SCREEN_HEIGHT_TILES EQU 18
CANVAS_HEIGHT_TILES EQU 32
BYTES_PER_TILE EQU 16
TEXT_START_TILE_1 EQU 128
TEXT_START_TILE_2 EQU TEXT_START_TILE_1 + SCREEN_WIDTH_TILES * 2

; Fill a row in the tilemap in a way that's helpful to dialogue.
; hl: where to start filling
; b: tile to start with
fill_tilemap_row:
    ; Populate bank 0, the tile proper
    xor a
    ldh [rVBK], a

    ld c, SCREEN_WIDTH_TILES
    ld a, b
.loop0:
    ld [hl+], a
    ; Each successive tile in a row increases by 2!
    add a, 2
    dec c
    jr nz, .loop0

    ; Populate bank 1, the bank and palette
    ld a, 1
    ldh [rVBK], a
    ld a, %00001111  ; bank 1, palette 7
    ld c, SCREEN_WIDTH_TILES
    dec hl
.loop1:
    ld [hl-], a
    dec c
    jr nz, .loop1

    ret

Now replace the setup code with four calls to this function, waiting for vblank between successive calls.

    ; Row 4
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 1)
    ld b, TEXT_START_TILE_2 + 1
    call fill_tilemap_row

    ; Row 3
    call wait_for_vblank
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 2)
    ld b, TEXT_START_TILE_2
    call fill_tilemap_row

    ; Row 2
    call wait_for_vblank
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 3)
    ld b, TEXT_START_TILE_1 + 1
    call fill_tilemap_row

    ; Row 1
    call wait_for_vblank
    ld hl, $9800 + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - 4)
    ld b, TEXT_START_TILE_1
    call fill_tilemap_row
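
If the address expressions look opaque, this Python sketch evaluates them, along with the attribute byte that fill_tilemap_row writes to bank 1. It assumes the background is scrolled to the top-left corner of the canvas, as the assembly above implicitly does; the function names are made up.

```python
TILEMAP_BASE = 0x9800
CANVAS_WIDTH_TILES = 32
SCREEN_HEIGHT_TILES = 18

def row_address(rows_from_bottom: int) -> int:
    """Tilemap address of the first tile in a row, counted from the
    bottom of the visible screen (1 = last visible row)."""
    return TILEMAP_BASE + CANVAS_WIDTH_TILES * (SCREEN_HEIGHT_TILES - rows_from_bottom)

def attribute_byte(palette: int, bank: int) -> int:
    """Build a background attribute byte: palette in bits 0-2, char bank in bit 3."""
    return ((bank & 1) << 3) | (palette & 0b111)

for n in range(1, 5):
    print(n, hex(row_address(n)))
print(f"{attribute_byte(palette=7, bank=1):08b}")
```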

Cool. I have a full font now, too, so I might as well try it out with some more interesting text.

SECTION "Font", ROMX
text:
    db "The quick brown fox jumps over the     lazy dog's back.  AOOWWRRR!!!!", 0

Now I just need to— oh, hang on.

Animation of the text box sliding up and scrolling out the text

Hey, it already works! Magic.

(I did also change the initial value for the x-offset to 4 rather than 0, so the text doesn’t start against the left edge of the screen.)

Well. Not really. The code I wrote doesn’t actually know when to stop writing, so it continues off the end of the first line and onto the second. You may notice the conspicuous number of extra spaces in the new text.

Still, it looks right, and this was a lot of effort already, and it’s not actually plugged into anything yet, so I called this a success and shelved it for now. Quit while you’re ahead, right?

Future work

Obviously this is still a bit rough.

That thing where the player can walk on top of the textbox is a bit of a problem, since the same thing happens if the textbox opens while the player is near the bottom of the screen. There are a couple solutions to this, and they’ll really depend on how I end up deciding to display the box.

I actually wanted the glyphs to be drawn a little lower than normal on the top line, to add half a char or so of padding around them, but I tried it and got a buffer overrun that I didn’t feel like investigating. That’s an obvious thing to fix next time I touch this code.

What about word wrapping? I’ve written about that before and clearly have strong opinions about it, but I really don’t want to do dynamic word wrapping with a variable-width font on a Game Boy. Instead, I’ll probably store dialogue in some other format and use another converter script to do the word-wrapping ahead of time. That’ll also save me from writing large amounts of dialogue in, um, assembly. And if/when I want any fancy-pants special effects within dialogue, I can describe them with a human-readable format and then convert that to more assembly-friendly bytecode instructions.
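
A converter script along those lines might wrap text by pixel width instead of character count. The Python below is only a sketch of that idea under assumed inputs: the function name, the width callback, and the glyph metrics in the example are all invented, and no such script exists in the project yet.

```python
def wrap_by_pixels(text, glyph_width, line_width):
    """Greedy word wrap measured in pixels rather than characters.

    `glyph_width` maps a character to its advance in pixels and
    `line_width` is the usable width of one line of the text box.
    """
    space = glyph_width(" ")
    lines, current, used = [], [], 0
    for word in text.split():
        width = sum(glyph_width(ch) for ch in word)
        needed = width if not current else used + space + width
        if current and needed > line_width:
            lines.append(" ".join(current))
            current, used = [word], width
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines

# Hypothetical metrics: narrow characters are 3 pixels, everything else 6
def example_width(ch):
    return 3 if ch in "il.,'! " else 6

for line in wrap_by_pixels("The quick brown fox jumps over the lazy dog", example_width, 152):
    print(line)
```

Doing this at build time means the ROM only ever sees text that already fits, with line breaks baked in.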

The dialogue box still doesn’t go away, partly because it draws right on top of the map, and I don’t have any easy way to repair the map right now. I’ll probably switch to one of those other mechanisms for showing the box later that won’t require clobbering the map, and then this problem will pretty much solve itself.

What about menus? Those will either have to go inside the dialogue box (which means the question being asked isn’t visible, oof), or they’ll have to go in a smaller box above it like in Pokémon. But the latter solution means I can’t use the window or display trickery — both of those only work reliably for horizontal splits. I’m not quite sure how to handle this, yet.

And then, what of portraits? Most games get away without them by having a silent protagonist, which makes it obvious who’s talking. But Anise is anything but silent, so I need a stronger indicator. I obviously can’t overlay a big transparent portrait on the background, like I do in my LÖVE games. I think I can reserve space for them in the status bar, which will go underneath the dialogue box. I’ll have to see how it works out. Maybe I could also use a different text color for every speaker?

After all that, I can start worrying about other frills like colored text and pauses and whatever. Phew.

To be continued

That brings us up to commit a173db, which is slightly beyond the second release (which includes a one-line textbox)! Also that was three months ago oh dear. I think I’ll be putting out a new release soon, stay tuned!

Next time: collision detection! I am doomed.

Cheezball Rising: Opening a dialogue

Post Syndicated from Eevee original https://eev.ee/blog/2018/09/08/cheezball-rising-opening-a-dialogue/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I draw some text!

Previously: I get a Game Boy to meow.
Next: collision detection, ohh nooo

Recap

The previous episode was a diversion (and left an open problem that I only solved after writing it), so the actual state of the game is unchanged.

Star Anise walking around a moon environment in-game, animated in all four directions

Where should I actually go from here? Collision detection is an obvious place, but that’s hard. Let’s start with something a little easier: displaying scrolling dialogue text. This is likely to be a dialogue-heavy game, so I might as well get started on that now.

Planning

On any other platform, I’d dive right into it: draw a box on the screen somewhere, fill it with text.

On the Game Boy, it’s not quite that simple. I can’t just write text to the screen; I can only place tiles and sprites.

Let’s look at how, say, Pokémon Yellow handles its menu.

Pokémon Yellow with several levels of menu open

This looks — feels — like it’s being drawn on top of the map, and that sub-menus open on top of other menus. But it’s all an illusion! There’s no “on top” here. This is a completely flat image made up of tiles, like anything else.

The same screenshot, scaled up, with a grid showing the edges of tiles

This is why Pokémon has such a conspicuously blocky font: all the glyphs are drawn to fit in a single 8×8 char, so “drawing” text is as simple as mapping letters to char indexes and drawing them onto the background. The map and the menu are all on the same layer, and the game simply redraws whatever was underneath when you close something. Part of the illusion is that the game is clever enough to hide any sprites that would overlap the menu — because sprites would draw on top! (The Game Boy Color has some twiddles for controlling this layering, but Yellow was originally designed for the monochrome Game Boy.)

A critical reason that this actually works is that in Pokémon, the camera is always aligned to the grid. It scrolls smoothly while you’re walking, but you can’t actually open the menu (or pick up an item, or talk to someone, or do anything else that might show text) until you’ve stopped moving. If you could, the menu would be misaligned, because it’s part of the same grid as the map!

This poses a slight problem for my game. Star Anise isn’t locked to the grid like the Pokémon protagonist is, and unlike Link’s Awakening, I do want to have areas larger than the screen that can scroll around freely.

I know offhand that there are a couple ways to do this. One is the window, an optional extra opaque layer that draws on top of the background, with its top-left corner anchored to any point on the screen. Another is to change some display registers in the middle of the screen redrawing. The Oracle games combine both features to have a status bar at the top of the screen but a scrolling map underneath.

But I don’t want to worry about any of this right now, before I even have text drawing. I know it’s possible, so I’ll deal with it later. For now, drawing directly onto the background is good enough.

Font decisions

Let’s get back to the font itself. I’m not in love with the 8×8 aesthetic; what are my other options? I do like the text in Oracle of Ages, so let’s have a look at that:

Oracle of Ages, also scaled up with a grid, showing its taller text

Ah, this is the same approach again, except that letters are now allowed to peek up into the char above. So these are 8×16, but the letters all occupy a box that’s more like 6×9, offering much more familiar proportions. Oracle of Ages is designed for the Game Boy Color, which has twice as much char storage space, so it makes sense that they’d take advantage of it for text like this.

It’s not bad, but the space it affords is still fairly… limited. Only 16 letters will fit in a line, just as with Pokémon, and that means a lot of carefully wording things to be short and use mostly short words as well. That’s not gonna cut it for the amount of dialogue I expect to have.

What other options do I have? It seems like I’m limited to multiples of 8 here, surely. (The answer may be obvious to some of you, but shh, don’t read ahead.)

The answer lies in the very last game released for the Game Boy Color: Harry Potter and the Chamber of Secrets. Whatever deep secrets were learned during the Game Boy’s lifetime will surely be encapsulated within this, er, movie tie-in game.

Harry Potter and the Chamber of Secrets, also scaled up with a grid, showing its text isn't fixed to the grid

Hot damn. That is a ton of text in a relatively small amount of space! And it doesn’t fit the grid! How did they do that?

The answer is… exactly how you’d think!

Tile display for the above screenshot, showing that the text is simply written across consecutive tiles

With a fixed-width font like in Pokémon and Zelda games, the entire character set is stored in VRAM, and text is drawn by drawing a string of characters. With a variable-width font like in Harry Potter, a block of VRAM is reserved for text, and text is drawn into those chars, in software. Essentially, some chars are used like a canvas and have text rendered to them on the fly. The contents of the background layer might look like this in the two cases:

Illustration of fixed width versus variable width text

Some pros of this approach:

  • Since the number of chars required is constant and the font is never loaded directly into char memory, the font can have arbitrarily many glyphs in it. Multiple fonts could be used at the same time, even. (Of course, if you have more than 256 glyphs, you’ll have to come up with a multi-byte encoding for actually storing the text…)

  • A lot more text can fit in one line while still remaining readable.

  • It has the potential to look extremely cool and maybe even vaguely technically impressive.

And, cons:

  • It’s definitely more complicated! But I only have to write the code once, and since the game won’t be doing anything but drawing dialogue while the box is up, I don’t think I’ll be in danger of blowing my CPU budget.

  • Colored text becomes a bit trickier. But still possible, so, we can worry about that later.

Well, I’m sold. Let’s give it a shot.

First pass

Well, I want to do something on a button press, so, let’s do that.

A lot of games (older ones especially) have bugs from switching “modes” in the same frame that something else happens. I don’t entirely understand why that’s so common and should probably ask some speedrunners, but I should be fine if I do mode-switching first thing in the frame, and then start over a new frame when switching back to “world” mode. Right? Sure.

    ; ... button reading code in main loop ...
    bit BUTTON_A, a
    jp nz, .do_show_dialogue

    ; ... main loop ...

    ; Loop again when done
    jp vblank_loop

.do_show_dialogue:
    call show_dialogue
    jp vblank_loop

The extra level of indirection added by .do_show_dialogue is just so the dialogue code itself isn’t responsible for knowing where the main loop point is; it can just ret.

Now to actually do something. This is a first pass, so I want to do as little as possible. I’ll definitely need a palette for drawing the text — and here I’m cutting into my 8-palette budget again, which I don’t love, but I can figure that out later. (Maybe with some shenanigans involving changing the palettes mid-redraw, even.)

PALETTE_TEXT:
    ; Black background, white text...  then gray shadow, maybe?
    dcolor $000000
    dcolor $ffffff
    dcolor $999999
    dcolor $666666

show_dialogue:
    ; Have to disable the LCD to do video work.  Later I can do
    ; a less jarring transition
    DisableLCD

    ; Copy the palette into slot 7 for now
    ld a, %10111000
    ld [rBCPS], a
    ld hl, PALETTE_TEXT
    REPT 8
    ld a, [hl+]
    ld [rBCPD], a
    ENDR

I also know ahead of time what chars will need to go where on the screen, so I can fill them in now.

Note that I really ought to blank them all out, especially since they may still contain text from some previous dialogue, but I don’t do that yet.

An obvious question is: which tiles? I think I said before that with 512 chars available, and ¾ of those still being enough to cover the entire screen in unique chars, I’m okay with dedicating a quarter of my space to UI stuff, including text. To keep that stuff “out of the way”, I’ll put them at the “end” — bank 1, starting from $80.

I’m thinking of having characters be about the same proportions as in the Oracle games. Those games use 5 rows of tiles, like this:

top of line 1
bottom of line 1
top of line 2
bottom of line 2
blank

Since the font is aligned to the bottom and only peeks a little bit into the top char, the very top row is mostly blank, and that serves as a top margin. The bottom row is explicitly blank for a bottom margin that’s nearly the same size. The space at the top of line 2 then works as line spacing.

I’m not fixed to the grid, so I can control line spacing a little more explicitly. But I’ll get to that later and do something really simple for now, where $ff is a blank tile:

+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|80|82|84|86|88|8a|8c|8e|90|92|94|96|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|81|83|85|87|89|8b|8d|8f|91|93|95|97|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+
|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|ff|...|
+--+--+--+--+--+--+--+--+--+--+--+--+--+---+

This gives me a canvas for drawing a single line of text. The staggering means that the first letter will draw to adjacent chars $80 and $81, rather than distant cousins like $80 and $a0.
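To make the staggered layout concrete, here is a minimal Python sketch that builds the same four rows of tile indices. It is only a model of the diagram above, not code from the game; the names `BLANK`, `FIRST_CHAR`, `MAP_WIDTH`, and `dialogue_rows` are made up for illustration.

```python
BLANK = 0xFF        # blank border tile, as in the diagram
FIRST_CHAR = 0x80   # first char reserved for text
MAP_WIDTH = 32      # the background map is 32 tiles wide


def dialogue_rows(width=MAP_WIDTH):
    """Return the four rows of tile indices that make up the dialogue box."""
    inner = width - 2  # one blank tile on each edge
    border = [BLANK] * width
    # Even chars go on the top text row, odd chars directly below them,
    # so each column of text uses two adjacent chars.
    top = [BLANK] + [(FIRST_CHAR + 2 * i) & 0xFF for i in range(inner)] + [BLANK]
    bottom = [BLANK] + [(FIRST_CHAR + 2 * i + 1) & 0xFF for i in range(inner)] + [BLANK]
    return [border, top, bottom, border]


rows = dialogue_rows()
for row in rows:
    # Show only the first few columns of each row
    print(" ".join(f"{tile:02x}" for tile in row[:8]), "...")
```

Each column of the text area ends up as a pair like ($80, $81), which is what lets a 16-pixel-tall glyph be written as one contiguous 32-byte run.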

You may notice that the below code updates chars across the entire width of the grid, not merely the screen. There’s not really any good reason for that.

    ; Fill text rows with tiles (blank border, custom tiles)
    ; The screen has 144/8 = 18 rows, so skip the first 14 rows
    ld hl, $9800 + 32 * 14
    ; Top row, all tile 255
    ld a, 255
    ld c, 32
.loop1:
    ld [hl+], a
    dec c
    jr nz, .loop1

    ; Text row 1: 255 on the edges, then middle goes 128, 130, ...
    ld a, 255
    ld [hl+], a
    ld a, 128
    ld c, 30
.loop2:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop2
    ld a, 255
    ld [hl+], a

    ; Text row 2: same as above, but middle is 129, 131, ...
    ld a, 255
    ld [hl+], a
    ld a, 129
    ld c, 30
.loop3:
    ld [hl+], a
    add a, 2
    dec c
    jr nz, .loop3
    ld a, 255
    ld [hl+], a

    ; Bottom row, all tile 255
    ld a, 255
    ld c, 32
.loop4:
    ld [hl+], a
    dec c
    jr nz, .loop4

Now I need to repeat all of that, but in bank 1, to specify the char bank (1) and palette (7) for the corresponding tiles. Those are the same for the entire dialogue box, though, so this part is easier.

    ; Switch to VRAM bank 1
    ld a, 1
    ldh [rVBK], a

    ld a, %00001111  ; bank 1, palette 7
    ld hl, $9800 + 32 * 14
    ld c, 32 * 4  ; 4 rows
.loop5:
    ld [hl+], a
    dec c
    jr nz, .loop5

    EnableLCD

Time to get some real work done. Which raises the question: how do I actually do this?

If you recall, each 8-pixel row of a char is stored in two bytes. The two-bit palette index for each pixel is split across the corresponding bit in each byte, with the low bit in the first byte and the high bit in the second. If the leftmost pixel is palette index 01, then bit 7 in the first byte will be 1, and bit 7 in the second byte will be 0.

Now, a blank char is all zeroes. To write a (left-aligned) glyph into a blank char, all I need to do is… well, I could overwrite it, but I could just as well OR it. To write a second glyph into the unused space, all I need to do is shift it right by the width of the space used so far, and OR it on top. The unusual split layout of the palette data is actually handy here, because it means the size of the shift matches the number of pixels, and I don’t have to worry about overflow.

0 0 0 0 0 0 0 0  <- blank glyph

1 1 1 1 0 0 0 0  <- some byte from the first glyph
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 0 0 0  <- ORed together to display first character

          1 1 1 1 0 0 0 0  <- some byte from the second glyph,
                              shifted by 4 (plus a kerning pixel)
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
1 1 1 1 0 1 1 1  <- ORed together to display first two characters
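The same shift-and-OR step can be sketched in a few lines of Python, working on a single byte (one bit plane of one row). The helper name `draw_row` is hypothetical; this mirrors the diagram, not the actual assembly.

```python
def draw_row(canvas_byte, glyph_byte, x_offset):
    """OR one row of a glyph into one row of a char, x_offset pixels in.

    Bits shifted past the right edge of the byte are simply lost.
    """
    return canvas_byte | (glyph_byte >> x_offset)


canvas = 0b00000000                        # blank char row
canvas = draw_row(canvas, 0b11110000, 0)   # first glyph, left-aligned
canvas = draw_row(canvas, 0b11110000, 5)   # second glyph: 4 px + 1 kerning px
print(f"{canvas:08b}")                     # 11110111
```

Because each byte holds one bit per pixel, shifting by one bit moves the row by exactly one pixel, and the same shift applies to both bytes of the row.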

The obvious question is, well, what happens to the bits from the second character that didn’t fit? I’ll worry about that a bit later.

Oh, and finally, I’ll need a font, plus some text to display. This is still just a proof of concept, so I’ll add in a couple glyphs by hand.

; somewhere in ROM
font:
; A
    ; First byte indicates the width of the glyph, which I need
    ; to know because the width varies!
    db 6
    dw `00000000
    dw `00000000
    dw `01110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11111000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
; B
    db 6
    dw `00000000
    dw `00000000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `10001000
    dw `10001000
    dw `10001000
    dw `11110000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000

text:
    ; Shakespeare it ain't.
    ; Need to end with a NUL here so I know where the text
    ; ends.  This isn't C, there's no automatic termination!
    db "ABABAAA", 0

And here we go!

    ; ----------------------------------------------------------
    ; Setup done!  Real work begins here
    ; b: x-offset within current tile
    ; de: text cursor + current character tiles
    ; hl: current VRAM tile being drawn into
    ld b, 0
    ld de, text
    ld hl, $8800

    ; This loop waits for the next vblank, then draws a letter.
    ; Text thus displays at ~60 characters per second.
.next_letter:
    ; This is probably way more LCD disabling than is strictly
    ; necessary, but I don't want to worry about it yet
    EnableLCD
    call wait_for_vblank
    DisableLCD

    ld a, [de]                  ; get current character
    and a                       ; if NUL, we're done!
    jr z, .done
    inc de                      ; otherwise, increment

    ; Get the glyph from the font, which means computing
    ; font + 33 * a.
    ; A little register juggling.  hl points to the current
    ; char in VRAM being drawn to, but I can only do a 16-bit
    ; add into hl.  de I don't need until the next loop,
    ; since I already read from it.  So I'm going to push de
    ; AND hl, compute the glyph address in hl, put it in de,
    ; then restore hl.
    push de
    push hl
    ; The text is written in ASCII, but the glyphs start at 0
    sub a, 65
    ld hl, font
    ld de, 33                   ; 1 width byte + 16 * 2 tiles
    ; This could probably be faster with long multiplication
    and a
.letter_stride:
    jr z, .skip_letter_stride
    add hl, de
    dec a
    jr .letter_stride
.skip_letter_stride:
    ; Move the glyph address into de, and restore hl
    ld d, h
    ld e, l
    pop hl

    ; Read the first byte, which is the character width.  This
    ; overwrites the character, but I have the glyph address,
    ; so I don't need it any more
    ld a, [de]
    inc de

    ; Copy into current chars
    ; Part 1: Copy the left part into the current chars
    push af                     ; stash width
    ; A glyph is two chars or 32 bytes, so row_copy 32 times
    ld c, 32
    ; b is the next x position we're free to write to.
    ; Incrementing it here makes the inner loop simpler, since
    ; it can't be zero.  But it also means two jumps per loop,
    ; so, ultimately this was a pretty silly idea.
    inc b
.row_copy:
    ld a, [de]                  ; read next row of character

    ; Shift right by b places with an inner loop
    push bc                     ; preserve b while shifting
    dec b
.shift:                         ; shift right by b bits
    jr z, .done_shift
    srl a
    dec b
    jr .shift
.done_shift:
    pop bc

    ; Write the updated byte to VRAM
    or a, [hl]                  ; OR with current tile
    ld [hl+], a
    inc de
    dec c
    jr nz, .row_copy
    pop af                      ; restore width

    ; Part 2: Copy whatever's left into the next char
    ; TODO  :)

    ; Cleanup for next iteration
    ; Undo the b increment from way above
    dec b
    ; It's possible I overflowed into the next column, in which
    ; case I want to leave hl where it is: pointing at the next
    ; column.  Otherwise, I need to back it up to where it was.
    ; Of course, I also need to update b, the x offset.
    add a, b                    ; a <- new x offset
    ; If the new x offset is 8 or more, that's actually the next
    ; column
    cp a, 8
    jr nc, .wrap_to_next_tile
    ld bc, -32                  ; a < 8: back hl up
    add hl, bc
    jr .done_wrap
.wrap_to_next_tile:
    sub a, 8                    ; a >= 8: subtract tile width
    ld b, a
.done_wrap:
    ; Either way, store the new x offset into b
    ld b, a

    ; And loop!
    pop de                      ; pop text pointer
    jr .next_letter

.done:
    ; Undo any goofy stuff I did, and get outta here
    EnableLCD
    ; Remember to reset bank to 0!
    xor a
    ldh [rVBK], a
    ret

Phew! That was a lot, but hopefully it wasn’t too bad. I hit a few minor stumbling blocks, but as I recall, most of them were of the “I get the conditions backwards every single time I use cp augh” flavor. (In fact, if you look at the actual commit the above is based on, you may notice that I had the condition at the very end mixed up! It’s a miracle it managed to print part of the second letter at all.)
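Since the wrap condition at the end is the part that is easy to get backwards, here is the intended bookkeeping as a small Python sketch. It assumes glyph widths never exceed 8 pixels, and the name `advance` is invented for this example.

```python
TILE_WIDTH = 8


def advance(x_offset, glyph_width):
    """Return (new_x_offset, moved_to_next_column) after drawing one glyph."""
    x = x_offset + glyph_width
    if x >= TILE_WIDTH:
        # The glyph reached or crossed the right edge of the column,
        # so the next glyph starts in the following column.
        return x - TILE_WIDTH, True
    return x, False


x = 0
for ch in "ABAB":
    x, wrapped = advance(x, 6)  # both hand-made glyphs are 6 pixels wide
    print(ch, x, wrapped)
```

In the assembly, the "moved to next column" case corresponds to leaving hl alone, and the other case corresponds to backing hl up by 32 bytes.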

There are a lot of caveats in this first pass, including that there’s nothing to erase the dialogue box and reshow the map underneath it. (But I might end up using the window for this anyway, so there’s no need for that.)

As a proof of concept, though, it’s a great start!

Screenshot of Anise, with a black dialogue box that says: A|

That’s the letter A, followed by the first two pixels of the letter B. I didn’t implement the part where letters spill into the next column yet.

Guess I’d better do that!

Second pass

One of the big problems with the first pass was that I had to turn the screen off to do the actual work safely. Shifting a bunch of bytes by some amount is a little slow, since I can only shift one bit at a time and have to do it within a loop, and vblank only lasts for about 6.5% of the entire duration of the frame.

SECTION "Text buffer", WRAM0[$C200]
text_buffer:
; Text is up to 8×16 but may span two columns, so carve out
; enough space for four tiles
ds $40

SECTION "Text rendering", ROM0
PALETTE_TEXT:
dcolor $000000
dcolor $ffffff
dcolor $999999
dcolor $666666

show_dialogue:
; TODO blank out the second half of bank 1 before all this, maybe on the fly to average out the cpu time

; TODO get rid of this with a slide-up effect
DisableLCD

; Set up palette
ld a, %10111000
ld [rBCPS], a
ld hl, PALETTE_TEXT
REPT 8
ld a, [hl+]
ld [rBCPD], a
ENDR

; Fill text rows with tiles (blank border, custom tiles)
ld hl, $9800 + 32 * 14
; Top row, all tile 255
ld a, 255
ld c, 32

.loop1:
ld [hl+], a
dec c
jr nz, .loop1
; Text row 1: 255 on the edges, then middle goes 128, 130, …
ld a, 255
ld [hl+], a
ld a, 128
ld c, 30
.loop2:
ld [hl+], a
add a, 2
dec c
jr nz, .loop2
ld a, 255
ld [hl+], a
; Text row 2: same as above, but middle is 129, 131, …
ld a, 255
ld [hl+], a
ld a, 129
ld c, 30
.loop3:
ld [hl+], a
add a, 2
dec c
jr nz, .loop3
ld a, 255
ld [hl+], a
; Bottom row, all tile 255
ld a, 255
ld c, 32
.loop4:
ld [hl+], a
dec c
jr nz, .loop4

; Repeat all of the above, but in bank 1, which specifies the character bank and palette.  Luckily, that's the same for everyone.
ld a, 1
ldh [rVBK], a
ld a, %00001111  ; bank 1, palette 7
ld hl, $9800 + 32 * 14
; Top row, all tile 255
ld c, 32 * 4  ; 4 rows

.loop5:
ld [hl+], a
dec c
jr nz, .loop5

EnableLCD

; Zero out the tile buffer
xor a
ld hl, text_buffer
ld c, $40
call fill

; ----------------------------------------------------------
; Setup done!  Real work begins here
; b: x-offset within current tile
; de: text cursor + current character tiles
; hl: current VRAM tile being drawn into + buffer pointer
ld b, 0
ld de, text
ld hl, $8800

; The basic problem here is to shift a byte and split it
; across two other bytes, like so:
;      yyyyy YYY
;   xxx00000 00000000
;           ↓
;   xxxyyyyy YYY00000
; To do this, we rotate the byte, mask the low bits, OR them
; with the first byte, restore it, mask the high bits, and
; then store that directly as the second byte (which should
; be all zeroes anyway).

.next_letter:
ld a, [de] ; get current character
and a ; if NUL, we’re done!
jp z, .done
inc de ; otherwise, increment

; Get the font character
push de                     ; from here, de is tiles
; Alas, I can only add to hl, so I need to compute the font
; character address in hl and /then/ put it in de.  But I
; already pushed de, so I can use that as scratch space.
push hl
sub a, 65   ; TODO temporary
ld hl, font
ld de, 33                   ; 1 width byte + 16 * 2 tiles
; TODO can we speed striding up with long mult?
and a

.letter_stride:
jr z, .skip_letter_stride
add hl, de
dec a
jr .letter_stride
.skip_letter_stride:
ld d, h ; move char tile addr to de
ld e, l

ld a, [de]                  ; read width
inc de

; Copy into current tiles
push af                     ; stash width
ld c, 32                    ; 32 bytes per glyph (two chars)
ld hl, text_buffer
inc b   ; FIXME? this makes the loop simpler since i only test after the dec, but it also is the 1px kerning between characters...

.row_copy:
ld a, [de] ; read next row of character
; Rotate right by b – 1 pixels
push bc ; preserve b while shifting
ld c, $ff ; create a mask
dec b
jr z, .skip_rotate
.rotate:
rrca
srl c
dec b
jr nz, .rotate
.skip_rotate:
push af
and a, c ; mask right pixels
; Draw to left half of text buffer
or a, [hl] ; OR with current tile
ld [hl+], a
; Write the remaining bits to right half
ld a, c ; put mask in a…
cpl ; …to invert it
ld c, a ; then put it back
pop af ; restore unmasked pixels
and a, c ; mask left pixels
ld [hl+], a ; and store them!
; Loop and cleanup
inc de ; next row of character
pop bc ; restore counter!
dec c
jr nz, .row_copy
pop af ; restore width

; Draw the buffered tiles to vram
; The text buffer is treated like it's 16 pixels wide, but
; VRAM is of course only 8 pixels wide, so we need to do
; this in two iterations: the left two tiles, then the right
; TODO explain this with a fucking diagram because i feel
; like i'm wrong about it anyway
pop hl                      ; restore hl (VRAM)
push af                     ; stash width, again
call wait_for_vblank        ; always wait before drawing
push bc
push de
; Draw the left two tiles
ld c, $20
ld de, text_buffer

.draw_left:
ld a, [de]
inc de
inc de
ld [hl+], a
dec c
jr nz, .draw_left
; Draw the right two tiles
ld c, $20
ld de, text_buffer + 1
.draw_right:
ld a, [de]
inc de
inc de
ld [hl+], a
dec c
jr nz, .draw_right
pop de
pop bc
pop af ; restore width, again

; Increment the pixel offset and deal with overflow
; TODO it's possible we're at 9 pixels wide, thanks to the
; kerning pixel, uh oh.  but that pixel would be empty,
; right?  wait, no, it comes /before/...  well fuck
; TODO actually that might make something weird happen due
; to the inc b above, maybe...?
add a, b                    ; a <- new x offset
ld bc, -32                  ; move the VRAM pointer back...
add hl, bc                  ; ...to the start of the tile
cp a, 8
jr nc, .wrap_to_next_tile
; The new offset is less than 8, so this character didn't
; draw into the next tile.  Move the VRAM pointer back
; another two tiles, to the column we started in
add hl, bc
jr .done_wrap

.wrap_to_next_tile:
; The new offset is 8 or more, so this character drew into
; the next tile. Subtract 8, but also shift the text buffer
; by copying all the “right” tiles over the “left” tiles
sub a, 8 ; a >= 8: subtract tile width
push hl
push af
ld hl, text_buffer + $40 - 1
ld c, $20
.shift_buffer:
ld a, [hl-]
ld [hl-], a
dec c
jr nz, .shift_buffer
pop af
pop hl
.done_wrap:
ld b, a ; either way, store into b

; Loop
pop de                      ; pop text pointer
jp .next_letter

.done:
EnableLCD ; TODO get rid of me with a buffer
; Remember to reset bank to 0!
xor a
ldh [rVBK], a
ret

wait_for_vblank:
xor a ; clear the vblank flag
ld [vblank_flag], a
.vblank_loop:
halt ; wait for interrupt
ld a, [vblank_flag] ; was it a vblank interrupt?
and a
jr z, .vblank_loop ; if not, keep waiting
ret
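The rotate-and-mask trick described in the comments of the listing above (rotate the glyph byte, keep the low bits for the left tile, keep the wrapped-around high bits for the right tile) can be modeled in Python like this. It is a sketch of the idea for a single byte, with the hypothetical name `split_row`; it is not a translation of the whole routine.

```python
def split_row(glyph_byte, shift):
    """Split one glyph byte across two adjacent tiles.

    Returns (left, right): the bits that stay in the current tile after
    shifting right by `shift` pixels, and the bits that spill into the
    next tile, already aligned to its left edge.
    """
    # 8-bit rotate right, like repeating rrca `shift` times
    rotated = ((glyph_byte >> shift) | (glyph_byte << (8 - shift))) & 0xFF
    # The mask starts as $ff and is shifted right once per rotation
    mask = 0xFF >> shift
    left = rotated & mask            # bits that fit in the current tile
    right = rotated & ~mask & 0xFF   # bits that wrapped around
    return left, right


left, right = split_row(0b11111000, 5)
print(f"{left:08b} {right:08b}")  # 00000111 11000000
```

The left half gets ORed into whatever is already in the buffer, while the right half can be stored directly, since nothing has been drawn there yet.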

  • future ideas: how will this work with a status bar, how do i do portraits, how do i hide sprites behind this, how do i handle the map not being aligned (contrast with pokemon which draws the entire menu on the background)

lingering problems
– note on word wrapping

  • alignment, window
  • prompts will probably have to go inside the text box? hmm. that’s tricky.
  • portraits!

content/2016-10-20-word-wrapping-dialogue.markdown
– the dialogue box does not actually go away. but i think the window will solve this

To be continued

This work doesn’t correspond to a commit at all; it exists only as a local stash. I’ll clean it up later, once I figure out what to actually do with it.

Next time: dialogue! With moderately less suffering along the way!

Cheezball Rising: Resounding failure

Post Syndicated from Eevee original https://eev.ee/blog/2018/09/06/cheezball-rising-resounding-failure/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I cannot get a goddamn Game Boy to meow at me.

Previously: maps and sprites.
Next: text!

Recap

With the power of Aseprite, Tiled, and some Python I slopped together, the game has evolved beyond Test Art and into Regular Art.

Star Anise walking around a moon environment in-game, animated in all four directions

I’ve got so much work to do on this, so it’s time to prioritize. What is absolutely crucial to this game?

The answer, of course, is to make Anise meow. Specifically, to make him AOOOWR.

Brief audio primer

What we perceive as sound is the vibration of our eardrums, caused by vibration of the air against them. Eardrums can only move along a single axis (in or out), so no matter what chaotic things the air is doing, what we hear at a given instant is flattened down to a single scalar number: how far the eardrum has displaced from its normal position.

(There’s also a bunch of stuff about tiny hairs in the back of your ear, but, close enough. Also it’s really two numbers since you have two ears, but stereo channels tend to be handled separately.)

Digital audio is nothing more than a sequence of those numbers. Of course, we can’t record the displacement at every single instant, because there are infinitely many instants; instead, we take measurements (samples) at regular intervals. The interval is usually a very small fraction of a second; the number of samples taken per second is called the sample rate, and is generally measured in Hertz/Hz (which just means “per second”). A very common sample rate is 44100 Hz, which means a measurement was taken every 0.0000227 seconds.

I say “measurement” but the same idea applies for generating sounds, which is what the Game Boy does. Want to make a square wave? Just generate a block of all the same positive sample, then another block of all the same negative sample, and alternate back and forth. That’s why it’s depicted as a square — that’s the graph of how the samples vary over time.
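As a rough illustration of "a block of positive samples, then a block of negative samples", here is a tiny Python generator. The function name and parameters are invented for this sketch, and it has nothing to do with how the Game Boy hardware produces its waves.

```python
def square_wave(freq_hz, sample_rate, n_samples, amplitude=1.0, duty=0.5):
    """Generate n_samples of a square wave as a list of floats."""
    period = sample_rate / freq_hz  # samples per full cycle
    samples = []
    for i in range(n_samples):
        phase = (i % period) / period  # position within the cycle, 0 to 1
        samples.append(amplitude if phase < duty else -amplitude)
    return samples


# One cycle of a 1 Hz wave sampled at 8 Hz: four high samples, four low
print(square_wave(1, 8, 8))
```

Plotting those values against time gives the familiar blocky shape.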

Okay! I hope that was enough because it’s like 80% of everything I know about audio. Let’s get to the Game Boy.

Game Boy audio

The Game Boy contains, within its mysterious depths, a teeny tiny synthesizer. It offers a vast array of four whole channels (instruments) to choose from: a square wave, also a square wave, a wavetable, and white noise. They can each be controlled with a handful of registers, and will continually produce whatever tone they’re configured for. By changing their parameters at regular intervals, you can create a pleasing sequence of varying tones, which you humans call “music”.

Making music is, I’m sure, going to be an absolute nightmare. What music authoring tools am I possibly going to dig up that exactly conform to the Game Boy hardware? I can’t even begin to imagine what this pipeline might look like.

Luckily, that’s not what this post is about, because I chickened out and tried something way easier instead.

Before I set out into the wilderness myself, I did want to get an emulator to create any kind of noise at all, just to give myself a starting point. There are an awful lot of audio twiddles, so I dug up a Game Boy sound tutorial.

I became a little skeptical when the author admitted they didn’t know what a square wave was, but they did provide a brief snippet of code at the end that’s claimed to produce a sound:

NR52_REG = 0x80;
NR51_REG = 0x11;
NR50_REG = 0x77;

NR10_REG = 0x1E;
NR11_REG = 0x10;
NR12_REG = 0xF3;
NR13_REG = 0x00;
NR14_REG = 0x87;

That’s C, written for the much-maligned GBDK, which for some reason uses regular assignment to write to a specific address? It’s easy enough to translate to rgbasm:

    ; Enable sound globally
    ld a, $80
    ldh [rAUDENA], a
    ; Enable channel 1 in stereo
    ld a, $11
    ldh [rAUDTERM], a
    ; Set volume
    ld a, $77
    ldh [rAUDVOL], a

    ; Configure channel 1.  See below
    ld a, $1e
    ldh [rAUD1SWEEP], a
    ld a, $10
    ldh [rAUD1LEN], a
    ld a, $f3
    ldh [rAUD1ENV], a
    ld a, $00
    ldh [rAUD1LOW], a
    ld a, $85
    ldh [rAUD1HIGH], a

It sounds like this.

Some explanation may be in order. This is a big ol’ mess and you could just as well read the wiki’s article on the sound controller, so feel free to skip ahead a bit.

First, the official names for all of the sound registers are terrible. They’re all named “NRxy” — “noise register” perhaps? — where x is the channel number (or 5 for master settings) and y is just whatever. Thankfully, hardware.inc provides some aliases that make a little more sense, and those are what I’ve used above.

The very first thing I have to do is set the high bit of AUDENA (NR52), which toggles sound on or off entirely. The sound system isn’t like the LCD, which I might turn off temporarily while doing a lot of graphics loading; when the high bit of AUDENA is off, all the other sound registers are wiped to zero and cannot be written until sound is enabled again.

The other important master registers are AUDVOL (NR50) and AUDTERM (NR51). Both of them are split into two identical nybbles, each controlling the left or right output channel. AUDVOL controls the master volume, from 0 to 7. (As I understand it, the high bit is used to enable audio output from extra synthesizer hardware on the cartridge, a feature I don’t believe any game ever actually used.) AUDTERM enables channels/instruments, one bit per channel. The above code turns on channel 1, the square wave, at max volume in stereo.

Then there’s just, you know, sound stuff.

AUD1HIGH (NR14) and AUD1LOW (NR13) are a bit of a clusterfuck, and one shared by all except the white noise channel. The high bit of AUD1HIGH is the “init” bit and triggers the sound to actually play (or restart), which is why it’s set last. The second highest bit, bit 6, controls timing: if it’s set, then the channel will only play for as long as a time given by AUD1LEN; if not, the channel will play indefinitely.

Finally, the interesting part: the lower three bits of AUD1HIGH and the entirety of AUD1LOW combine to make an 11-bit frequency. Or, rather, if those 11 bits are \(n\), then the frequency is \(\frac{131072}{2048-n}\). (Since their value appears in the denominator, they really express… inverse time, not frequency, but that’s neither here nor there.) The code above sets that 11-bit value to $500, for a frequency of 171 Hz, which in A440 is about an F3.
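The formula is easy to play with in Python. This sketch uses the register values from the rgbasm listing above; `freq_bits` and `tone_hz` are names made up for the example.

```python
def freq_bits(aud_high, aud_low):
    """Combine the low three bits of AUD1HIGH with AUD1LOW into 11 bits."""
    return ((aud_high & 0x07) << 8) | aud_low


def tone_hz(n):
    """Convert the 11-bit register value n to a frequency in Hz."""
    if not 0 <= n < 2048:
        raise ValueError("n must fit in 11 bits")
    return 131072 / (2048 - n)


n = freq_bits(0x85, 0x00)
print(hex(n), round(tone_hz(n)))  # 0x500 171
```

Larger values of n make the denominator smaller, so the pitch climbs faster and faster as n approaches 2047.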

AUD1SWEEP (NR10) can automatically slide the frequency over time. It distinguishes channel 1 from channel 2, which is otherwise identical but doesn’t have sweep functionality. The lower three bits are the magnitude of each change; bit 3 is a sign bit (0 for up, 1 for down), and bits 6–4 are a time that controls how often the frequency changes. (Setting the time to zero disables the sweep.) Given a magnitude of \(n\) and time \(t\), every \(\frac{t}{128}\) seconds, the frequency is multiplied by \(1 ± \frac{1}{2^n}\).

Note that when I say “frequency” here, I’m referring to the 11-bit “frequency” value, not the actual frequency in Hz. A “frequency” of $400 corresponds to 128 Hz, but halving it to $200 produces 85 Hz, a decrease of about a third. Doubling it is impossible, because $800 doesn’t fit in 11 bits. This setup seems, ah, interesting to make music with. Can’t wait!

The above code sets this register to $1e, so \(t = 1\), \(n = 6\), and the frequency is decreasing; thus every \(\frac{1}{128}\) seconds, the “frequency” drops by \(\frac{1}{64}\).
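Here is a small Python sketch of what that sweep does to the 11-bit value over a few steps. Using an integer right shift for the \(\frac{1}{2^n}\) factor is an assumption about how the rounding works, and `sweep_steps` is a hypothetical helper, so treat the exact numbers as approximate.

```python
def sweep_steps(value, shift, decreasing, steps):
    """Return the successive 11-bit "frequency" values produced by a sweep."""
    out = [value]
    for _ in range(steps):
        delta = value >> shift  # value / 2**shift, rounded down (assumed)
        value = value - delta if decreasing else value + delta
        if not 0 <= value < 2048:
            break  # the value no longer fits in 11 bits
        out.append(value)
    return out


# $1e: time 1, decreasing, magnitude 6, starting from $500
print(sweep_steps(0x500, 6, True, 4))
```

Each step takes \(\frac{1}{128}\) of a second with these settings, so the pitch slides down over the life of the sound.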

Next is AUD1LEN (NR11), so named because its lower six bits set how long the sound will play. Again we have inverse time: given a value \(t\) in the low six bits, the sound will play for \(\frac{64-t}{256}\) seconds. Here those six bits are $10 or 16, so the sound lasts for \(\frac{48}{256} = \frac{3}{16} = 0.1875\) seconds. Except… as mentioned above, this only applies if bit 6 of AUD1HIGH is set, which it isn’t, so this doesn’t apply at all and there’s no point in setting any of these bits. Hm.

The two high bits of AUD1LEN select the duty cycle, which is how long the square wave is high versus low. (A “normal” square wave thus has a duty of 50%.) Our value of 0 selects 12.5% high; the other values are 25% for 1, 50% for 2, or 75% for 3. I do wonder if the author of this code meant to use 50% duty and put the bit in the wrong place? If so, AUD1LEN should be $80, not $10.

Finally, AUD1ENV selects the volume envelope, which can increase or decrease over time. Curiously, the resolution is higher here than in AUDVOL — the entire high nybble is the value of the envelope. This value can be changed automatically over time in increments of 1: bit 3 controls the direction (0 to decrease, 1 to increase) and the low three bits control how often the value changes, counted in \(\frac{1}{64}\) seconds. For our value of $f3, the volume starts out at max and decreases every \(\frac{3}{64}\) seconds, so it’ll stop completely (or at least be muted?) after fifteen steps or \(\frac{45}{64} ≈ 0.7\) seconds.
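To double-check the arithmetic for the length, duty, and envelope fields, here is a Python sketch that decodes the two register values used above. The function names are invented for the example; the bit layouts are the ones described in the preceding paragraphs.

```python
DUTY = {0: 0.125, 1: 0.25, 2: 0.5, 3: 0.75}


def decode_len(reg):
    """Decode AUD1LEN into (duty cycle, length in seconds)."""
    duty = DUTY[reg >> 6]   # two high bits
    t = reg & 0x3F          # low six bits
    return duty, (64 - t) / 256


def decode_env(reg):
    """Decode AUD1ENV into (initial volume, increasing?, step period)."""
    return reg >> 4, bool(reg & 0x08), reg & 0x07


def fade_seconds(reg):
    """Seconds until a decreasing envelope reaches zero, else None."""
    volume, increasing, period = decode_env(reg)
    if increasing or period == 0:
        return None
    return volume * period / 64


print(decode_len(0x10))    # (0.125, 0.1875)
print(decode_env(0xF3))    # (15, False, 3)
print(fade_seconds(0xF3))  # 0.703125
```

Swapping in $80 for AUD1LEN gives a duty of 0.5, which matches the guess that the tutorial's author wanted a regular square wave.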

And hey, that’s all more or less what I see if I record mGBA’s output in Audacity!

Waveform of the above sound

Boy! What a horrible slog. Don’t worry; that’s a good 75% of everything there is to know about the sound registers. The second square wave is exactly the same except it can’t do a frequency sweep. The white noise channel is similar, except that instead of frequency, it has a few knobs for controlling how the noise is generated. And the waveform channel is what the rest of this post is about—

“Hang on!” I hear you cry. “That’s a mighty funny-looking ‘square’ wave.”

It sure is! The Game Boy has some mighty funny sound hardware. Don’t worry about it. I don’t have any explanation, anyway. I know the weird slope shapes are due to a high-pass filter capacitor that constantly degrades the signal gradually towards silence, but I don’t know why the waveform isn’t centered at zero. (Note that mGBA has a bug and currently generates audio inverted, which is hard to notice audibly but which means the above graph is upside-down.)

The thing I actually wanted to do

Right, back to the thing I actually wanted to do.

I have a sound. I want to play it on a Game Boy. I know this is possible, because Pokémon Yellow does it.

Channel 3 is a wavetable channel, which means I can define a completely arbitrary waveform (read: sound) and channel 3 will play it for me. The correct approach seems obvious: slice the sound into small chunks and ask channel 3 to play them in sequence.

How hard could this possibly be?

Channel 3

Channel 3 plays a waveform from waveform RAM, which is a block of 16 bytes in register space, from $FF30 through $FF3F. Each nybble is one sample, so I have 32 samples whose values can range from 0 to 15.
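As a sketch of what filling waveform RAM involves, this Python helper packs 32 four-bit samples into the 16 bytes described above. It assumes the first sample of each pair goes in the high nybble, and `pack_wave_ram` is a made-up name.

```python
def pack_wave_ram(samples):
    """Pack 32 samples (each 0 to 15) into 16 bytes, high nybble first."""
    if len(samples) != 32:
        raise ValueError("wave RAM holds exactly 32 samples")
    if any(not 0 <= s <= 15 for s in samples):
        raise ValueError("samples must be 4-bit values")
    return bytes((samples[i] << 4) | samples[i + 1] for i in range(0, 32, 2))


# A triangle-ish wave: ramp up from 0 to 15, then back down
ramp = list(range(16)) + list(range(15, -1, -1))
print(pack_wave_ram(ramp).hex())
```

The resulting 16 bytes are what would be copied to $FF30 through $FF3F.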

32 samples is not a whole lot; remember, a common audio rate is 44100 Hz. To keep that up, I’d need to fill the buffer almost 1400 times per second. I can use a lower sample rate, but what? I guess I’ll figure that out later.

First things first: I need to take my sound and cram it into this format, somehow. Here’s the sound I’m starting with.

The original recording was a bit quiet, so I popped it open in Audacity and stretched it to max volume. I only have 4-bit samples, remember, and trying to cram a quiet sound into a low bitrate will lose most of the detail.

(A very weird thing about sound is that samples are really just measurements of volume. Every feature of sound is nothing more than a change in volume.)

Now I need to turn this into a sequence of nybbles. From previous adventures, I know that Python has a handy wave module for reading sample data directly from a WAV file, and so I wrote a crappy resampler:

import wave

TARGET_RATE = 32768

with wave.open('aowr.wav') as w:
    nchannels, sample_width, framerate, nframes, _, _ = w.getparams()
    outdata = bytearray()
    gbdata = bytearray()

    frames_per_note = framerate // TARGET_RATE
    nybble = None
    while True:
        data = w.readframes(frames_per_note)
        if not data:
            break

        n = 0
        total = 0
        # Left and right channels are interleaved; this will pick up data from only channel 0
        for i in range(0, len(data), nchannels * sample_width):
            frame = int.from_bytes(data[i : i + sample_width], 'little', signed=True)
            n += 1
            total += frame

        # Crush the new sample to a nybble
        crushed_frame = int(total / n) >> (sample_width * 8 - 4)
        # Expand it back to the full sample size, to make a WAV simulating how it should sound
        encoded_crushed_frame = (crushed_frame << (sample_width * 8 - 4)).to_bytes(sample_width, 'little', signed=True)
        outdata.extend(encoded_crushed_frame * (nchannels * frames_per_note))

        # Combine every two nybbles together.  The manual shows that the high nybble plays first.
        # WAV data is signed (-8 to 7 after crushing), but Game Boy nybbles are not
        # (0 to 15), so add 8 to recenter; adding 7 would turn -8 into -1
        if nybble is None:
            nybble = crushed_frame + 8
        else:
            byte = (nybble << 4) | (crushed_frame + 8)
            gbdata.append(byte)
            nybble = None

    with wave.open('aowrcrush.wav', 'wb') as wout:
        wout.setparams(w.getparams())
        wout.writeframes(outdata)

with open('build/aowr.dat', 'wb') as f:
    f.write(gbdata)

This is incredibly bad. It integer-divides the original rate by the target rate, so if I try to resample 44100 to 32768, I’ll end up recreating the same sound again.

I don’t know why I started with 32768, either. The resulting data is too big to even fit in a section! Kicking it down to 8192 is a bit better (5 samples to 1, so the real final rate is 8820), but if I get any smaller, too many samples cancel each other out and I end up with silence! I have no idea what I am doing help.
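The rounding problem is easier to see with the numbers written out. This sketch mirrors the script's `framerate // TARGET_RATE` division; `effective_rate` is a name invented here, and 44100 Hz is assumed as the source rate.

```python
def effective_rate(source_rate, target_rate):
    """Return (frames averaged per output sample, actual output rate)."""
    frames_per_sample = source_rate // target_rate
    if frames_per_sample == 0:
        raise ValueError("target rate is higher than the source rate")
    return frames_per_sample, source_rate / frames_per_sample

for target in (32768, 8192, 4096):
    print(target, effective_rate(44100, target))
```

Asking for 32768 silently yields 44100 again, and asking for 8192 really produces 8820. A proper resampler would interpolate instead of averaging whole frames, but that is outside what the script attempts.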

The aowrcrush.wav file sounds a little atrocious, fair warning.

But it seems to be correct, if I open it alongside the original:

Waveforms of the original sound and its bitcrushed form; the latter is very blocky

Crushing it to four bits caused the graph to stay fixed to only 16 possible values, which is why it’s less smooth. Reducing the sample rate made each sample last longer, which is why it’s made up of short horizontal chunks. (I resampled it back to 44100 for this comparison, so really it’s made of short horizontal chunks because each sample appears five times; Audacity wouldn’t show an actual 8192 Hz file like this.)
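Here is a minimal sketch of just the bit-crushing step, assuming signed 16-bit samples. The `crush` and `expand` names are hypothetical; they correspond to the two shifts in the resampler.

```python
def crush(sample, sample_width=2):
    """Reduce a signed PCM sample to a signed 4-bit value (-8 to 7)."""
    return sample >> (sample_width * 8 - 4)

def expand(nybble, sample_width=2):
    """Scale a signed 4-bit value back up to the original sample width."""
    return nybble << (sample_width * 8 - 4)

for sample in (-32768, -1, 0, 4095, 4096, 32767):
    small = crush(sample)
    print(sample, small, expand(small))
```

Every input lands on one of 16 levels spaced 4096 apart, which is where the stair-step look in the graph comes from.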

It doesn’t sound great, but maybe it’ll be softened when played through a Game Boy. Worst case, I can try cleaning it up later. Let’s get to the good part: playing it!

Playing with channel 3

Here we go! First the global setup stuff I had before.

    ; Enable sound globally
    ld a, $80
    ldh [rAUDENA], a
    ; Map instruments to channels
    ld a, $44
    ldh [rAUDTERM], a
    ; Set volume
    ld a, $77
    ldh [rAUDVOL], a

Then some bits specific to channel 3.

    ld a, $80
    ldh [rAUD3ENA], a
    ld a, $ff
    ldh [rAUD3LEN], a
    ld a, $20
    ldh [rAUD3LEVEL], a
SAMPLE_RATE EQU 8192
CH3_FREQUENCY set 2048 - 65536/(SAMPLE_RATE / 32)
    ld a, LOW(CH3_FREQUENCY)
    ldh [rAUD3LOW], a
    ld a, $80 | HIGH(CH3_FREQUENCY)
    ldh [rAUD3HIGH], a

Channel 3 has its own bit for toggling it on or off in AUD3ENA (NR30); none of the other bits are used. The other new register is AUD3LEVEL (NR32), which is sort of a global volume control. The only bits used are 6 and 5, which make a two-bit selector. The options are:

  • 00: mute
  • 01: play nybbles as given
  • 10: play nybbles shifted right 1
  • 11: play nybbles shifted right 2

Three of those are obviously useless, so 01 it is! That’s where I get the $20.
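A quick sketch of how that selector maps to a register byte and to the output level, based only on the list above. The helper names are invented, and treating "mute" as an output of zero is an assumption made for illustration.

```python
LEVEL_SHIFTS = {0b00: None, 0b01: 0, 0b10: 1, 0b11: 2}  # None means muted

def level_register(selector):
    """Build the register byte: the selector lives in bits 6 and 5."""
    return (selector & 0b11) << 5

def apply_level(nybble, selector):
    """Return the nybble as the channel would output it for this selector."""
    shift = LEVEL_SHIFTS[selector]
    return 0 if shift is None else nybble >> shift

print(hex(level_register(0b01)))
print([apply_level(15, s) for s in (0b00, 0b01, 0b10, 0b11)])
```

Only selector 01 keeps all four bits of each sample, which is why the other settings throw away detail that is already scarce.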

Figuring out the frequency is a little more clumsy. I used some rgbasm features here to do it for me, and it took a bit of fiddling to get it right. For example, why am I using 65536 instead of 131072, the factor I said was used for the square wave?

The answer is that for the longest time I kept getting this absolutely horrible output, recorded directly from mGBA:

I had no idea what this was supposed to be. Turns out it’s, well, roughly what happens when you halve the Game Boy’s idea of frequency. I finally found out this coefficient was different from the gbdev wiki. I’m guessing the factor of 2 has something to do with there being two nybbles per byte?

Then there’s the division by 32, which neither the manual nor the gbdev wiki mentions. The frequency doesn’t describe how often one sample plays, but how often the entire buffer plays. Which does make some sense: the “normal” use for channel 3 is as a custom instrument, so you’d want to apply the frequency to the entire waveform to get the right notes out. This was even more of a nightmare to figure out, since it produced… well, mostly just garbage. I’ll leave it to your imagination.
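The same calculation that rgbasm does for `CH3_FREQUENCY` can be sketched in Python to see the actual register values. The function name is made up; the 65536 factor and the division by 32 come straight from the text above.

```python
def ch3_frequency(sample_rate, buffer_len=32):
    """Return the 11-bit frequency value for a given per-sample rate."""
    buffers_per_second = sample_rate // buffer_len
    value = 2048 - 65536 // buffers_per_second
    if not 0 <= value < 2048:
        raise ValueError("rate out of range for an 11-bit frequency")
    return value

freq = ch3_frequency(8192)
low = freq & 0xFF            # goes in AUD3LOW
high = 0x80 | (freq >> 8)    # bit 7 restarts the channel, as in the code above
print(freq, hex(low), hex(high))
```

For 8192 Hz the buffer plays 256 times per second, so the frequency value is 1792 ($700), split into $00 and $87.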

    ld a, 256 - 4096 / (SAMPLE_RATE / 32)
    ldh [rTMA], a
    ld a, 4
    ldh [rTAC], a

Oho! TMA and TAC are new.

The CPU has a timer register, TIMA, which counts up every… well, every so often. It’s only a single byte, and when it overflows, it generates a timer interrupt. It then resets to the value of TMA.

TAC is the timer controller. Bit 2 enables the timer, and the lower two bits select how fast the clock counts up.

Above, I’m using clock speed 00, which is 4096 Hz. The expression for TMA computes SAMPLE_RATE / 32, which is the number of times per second that the entire waveform should play, and then divides that into 4096 to get the number of timer ticks that the waveform plays for. Subtract that from 256, and I have the value TIMA should start with to ensure that it overflows at the right intervals.

I note that this will cause a timer interrupt 256 times per second, which sounds like a lot on a CPU-constrained system. It’s only 4 or 5 interrupts per frame, though, so maybe it won’t intrude too much. I’ll burn down that bridge when I come to it.
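Here is the TMA arithmetic as a small Python sketch. `timer_modulo` is a hypothetical name, and the 59.7 Hz refresh rate used for the per-frame estimate is an approximation.

```python
def timer_modulo(sample_rate, timer_hz=4096, buffer_len=32):
    """Return (TMA value, interrupts per second) for one interrupt per buffer."""
    buffers_per_second = sample_rate // buffer_len
    ticks_per_buffer = timer_hz // buffers_per_second
    return 256 - ticks_per_buffer, timer_hz / ticks_per_buffer

tma, interrupts = timer_modulo(8192)
# Assumption: the LCD refreshes roughly 59.7 times per second
print(tma, interrupts, round(interrupts / 59.7, 2))
```

TIMA restarts at 240 and overflows after 16 ticks, which works out to 256 interrupts per second, or a little over 4 per frame.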

Now I just need to enable timer interrupts:

start:
    ; Enable interrupts
    ld a, IEF_TIMER | IEF_VBLANK
    ldh [rIE], a

And of course do a call in the timer interrupt, which you may remember is a fixed place in the header:

SECTION "Timer overflow interrupt", ROM0[$0050]
    call update_aowr
    reti

One last gotcha: I discovered that timer interrupts can fire during OAM DMA, a time when most of the memory map is inaccessible. That’s pretty bad! So I also added di and ei around my DMA call.

Okay! I’m so close! All that’s left is the implementation of update_aowr.

Updating the waveform

aowr:
INCBIN "build/aowr.dat"
aowr_end:

; ...

update_aowr:
    push hl
    push bc
    push de
    push af

    ; The current play position is stored in music_offset, a
    ; word in RAM somewhere.  Load its value into de
    ld hl, music_offset
    ld d, [hl]
    inc hl
    ld e, [hl]

    ; Compare this to aowr_end.  If it's >=, we've reached the
    ; end of the sound, so stop here.  (Note that the timer
    ; interrupt will keep firing!  This code is a first pass.)
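    ld hl, aowr_end
    ld a, d
    cp a, h
    jr c, .continue     ; high byte is lower, so not at the end yet
    jr nz, .done        ; high byte is higher, so past the end
    ; High bytes are equal, so the low bytes decide
    ld a, e
    cp a, l
    jr nc, .done        ; low byte >= the end's low byte: done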
.continue:

    ; Copy the play position back into hl, and copy 16 bytes
    ; into waveform RAM.  This unrolled loop is as quick as
    ; possible, to keep the gap between chunks short.
    ld h, d
    ld l, e
_addr = _AUD3WAVERAM
    REPT 16
    ld a, [hl+]
    ldh [_addr], a
_addr = _addr + 1
    ENDR

    ; Write the new play position into music_offset
    ld d, h
    ld e, l
    ld hl, music_offset
    ld [hl], d
    inc hl
    ld [hl], e
.done:
    pop af
    pop de
    pop bc
    pop hl
    ret

Perfect! Let’s give it a try.

Hey, that’s not too bad! I can see wiring that up to a button and pressing it relentlessly. It’s a bit rough, but it’s not bad for this first attempt.

That was mGBA, though, and I’ve had surprising problems before because I was reading or writing when the actual hardware wouldn’t let me. I guess it wouldn’t hurt to try in bgb. (warning: very bad)

OH NO

What has happened.

Tragedy

A lot of fussing around, reading about obscure trivia, and being directed to SamplePlayer taught me a valuable lesson: you cannot write to waveform RAM while the wave channel is playing.

Okay. No problem. I’ll just turn it off, write to wave RAM, then turn it back on. Turning it off clears the frequency, but that’s fine, I can just write it again.

    ; Disable channel 3 to allow writing to wave RAM
    xor a
    ldh [rAUD3ENA], a

    ; ... do the copy ...

    ld a, $80
    ldh [rAUD3ENA], a
    ld a, LOW(CH3_FREQUENCY)
    ldh [rAUD3LOW], a
    ld a, $80 | HIGH(CH3_FREQUENCY)
    ldh [rAUD3HIGH], a

Okay! Perfect! I’m so ready for a meow!!!

why god why

This is what I get in mGBA and SameBoy. Ironically, it plays fine in bgb.

It seems I have come to an impasse.

Why

After a Herculean amount of debugging and discussion with people who actually know what they’re talking about, here’s what I understand to be happening.

When the wave channel first starts playing, it doesn’t correctly read the very first nybble; instead, it uses the high nybble of whatever was already in its own internal buffer.

Disabling the wave channel sets its internal buffer to all zeroes.

I disable the wave channel every time it plays. Effectively, every 32nd sample starting with the first is treated as zero, which is the most extreme negative value, which is why the playback looks like this (bearing in mind that mGBA’s audio is currently upside-down):

The above sound's waveform, which resembles the original, but with regularly spaced spikes

For whatever reason, bgb doesn’t emulate this spiking, so it plays fine. I’m told the spiking also happens on actual hardware, but the speakers are cheap so it’s harder to notice.
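The effect is simple enough to model. This sketch only imitates the symptom described above (the first sample of every 32-sample buffer reading as zero); it is not a model of the real hardware, and `simulate_spikes` is a name invented here.

```python
def simulate_spikes(nybbles, buffer_len=32):
    """Return what would be heard if each buffer's first sample reads as zero."""
    return [0 if i % buffer_len == 0 else n for i, n in enumerate(nybbles)]

steady = [8] * 96    # three buffers of a flat mid-level signal
heard = simulate_spikes(steady)
print([i for i, n in enumerate(heard) if n != steady[i]])
```

Even a perfectly flat signal picks up a spike at samples 0, 32, and 64, which at 8192 Hz is a click 256 times per second.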

SamplePlayer isn’t much help here, because it’s subject to the same problem.

A ray of hope, dashed

But wait! There’s one last thing I can try. Pokémon Yellow has freeform sounds in it, and it doesn’t have this spiking! There’s even a fan disassembly of it!

Alas. Pokémon Yellow doesn’t use channel 3 to play back sounds. It uses channel 1.

How, you ask? Remember when I said earlier that hearing is really just detecting changes in volume? Pokémon Yellow plays a constant square wave and simply toggles it on and off, very rapidly. Channel 3 is 4-bit; the sounds Pokémon Yellow plays are 1-bit, on or off. It’s baffling, but it does work.

I don’t think it’ll work for me, since that means 32 times as many interrupts. In fact, Pokémon Yellow uses a busy loop as a timer, so it effectively freezes the entire rest of the game anytime it plays a Pikachu sound. I’d rather not do that, but… I don’t seem to have a lot of options.
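To put numbers on that, here is an illustrative sketch. The thresholding in `to_one_bit` is a guess at how 4-bit samples could be reduced to on/off values; it is not taken from the Pokémon Yellow disassembly.

```python
def to_one_bit(nybbles, threshold=8):
    """Reduce 4-bit samples to on/off values (illustrative thresholding only)."""
    return [1 if n >= threshold else 0 for n in nybbles]

def interrupts_per_second(sample_rate, samples_per_interrupt):
    """How often the timer must fire to keep up with playback."""
    return sample_rate // samples_per_interrupt

print(to_one_bit([0, 7, 8, 15, 3, 12]))
print(interrupts_per_second(8192, 32), interrupts_per_second(8192, 1))
```

Feeding channel 3 a whole buffer at a time needs 256 interrupts per second at 8192 Hz, while toggling a channel once per sample would need 8192.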

And so I’ve reached a dead end. The spiking seems to be a fundamental bug with the Game Boy sound hardware. I’ve found evidence that it may even still exist in the GBA, which uses a superset of the same hardware. I can’t fix it, I don’t see how to work around it, and it sounds really incredibly bad.

After days of effort trying to get this to work, I had to shelve it.

The title of this post is a sort of pun, you see, a play on words—

To be continued

This work doesn’t correspond to a commit at all; it exists only as a local stash.

Next time: dialogue! And this time it works!

Cheezball Rising: Maps and sprites

Post Syndicated from Eevee original https://eev.ee/blog/2018/07/15/cheezball-rising-maps-and-sprites/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I get a little asset pipeline working and finally have a real map.

Previously: spring cleaning.
Next: resounding failure.

Recap

The last post only covered some minor problems (including, I grant you, being totally broken), so the current state of the game is basically unchanged from before.

A space cat roams around on a grassy background

That grass pattern, the grass sprite itself, and the color scheme are all hardcoded — written directly into the source code, by hand. If this game is going to get very far at all, I urgently need a better way to inject some art.

Constraints

The Game Boy imposes some fairly harsh constraints on the artwork — which is part of the charm! But now I have to figure out how to work within those constraints most effectively. Here’s what I’ve got to work with.

Bear in mind that I intend for the game to be based around 16×16, um, tiles. Okay, it’s extremely confusing that “tile” might refer either to the base size of the artwork or to the Game Boy’s native 8×8 tiles, so I’m going to call the art tiles and the Game Boy’s basic unit a character (which is what the manual does).

  • The background layer is a grid of 8×8 characters, each of which uses one of eight 4-color background palettes.

  • The object layer is a set of 8×16 character pairs, each of which uses one of eight 3-color object palettes. These palettes are 3-color because color 0 is always transparent.

  • No more than 40 objects can appear on screen at the same time. (There is a way to weasel past this limit, but it requires considerable trickery.)

  • No more than 10 objects can appear in the same row of pixels. (I believe this is a hard limit.)

  • There are three blocks of 256 chars each. I can divide this between the background and objects more or less however I want, though neither can have more than two blocks (= 512 chars).

I’m intending for the game to be based around a 16×16 grid, a fairly common size for the Game Boy. That makes me a little concerned about the per-row object limit — each entity will need to have two Game Boy objects side by side, so I’m really limited to only five entities sharing the same row of pixels. I can’t do much about that quite yet (and only have one entity anyway), but it’s likely to affect how I design maps and draw sprites.

The next biggest problem is colors. Each object palette can only have three colors, which in practice means a shadow/outline color, a highlight color, and a base color. This is why every NPC and overworld critter in Pokémon GSC and the Zeldas is basically monochromatic. They pull it off really well by making very effective use of the highlight and shadow colors.

Since 16×16 sprites are composed of multiple Game Boy objects, it’s possible to overcome this limit by giving each part of the sprite a different palette. Unfortunately, objects being 8×16 means the sprites are split vertically, when it would be most useful to have different colors for e.g. the head and body. I wish the Game Boy supported 16×8 objects! That’d help a ton with the per-row limit, too. Alas, a few decades too late to change it now.


As for the number of chars… well, let’s see. The whole screen is only 160×144, which is 20×18 or 360 chars, so I could allocate two blocks to the background and have 512 — more than enough to cover the entire screen in unique chars! (I expect one block to be more than enough for objects, since I can only show 80 object chars at once anyway.)

On the other hand, I’ll need to reserve some of that space for text and UI and whatnot, and each 16×16 tile is composed of four chars. If I very generously allocate a whole block to window dressing (enough for all of ISO-8859-1?), that leaves 256 chars, which is 64 tiles, which is a tileset that fits in an eight-by-eight square.
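The budget from the last two paragraphs, written out as a few lines of Python. The constant names are made up; the numbers are the ones from the text.

```python
import math

SCREEN_W, SCREEN_H = 160, 144   # pixels
CHAR_SIZE = 8                   # a char is 8x8 pixels
CHARS_PER_TILE = 4              # a 16x16 tile is 2x2 chars
BLOCK = 256                     # chars per block

screen_chars = (SCREEN_W // CHAR_SIZE) * (SCREEN_H // CHAR_SIZE)
tiles_in_one_block = BLOCK // CHARS_PER_TILE
square_side = math.isqrt(tiles_in_one_block)
print(screen_chars, tiles_in_one_block, square_side)
```

That is 360 chars to cover the screen, but only 64 tiles (an 8×8 square) if one block has to hold every map tile with no sharing.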

For comparison’s sake, even fox flux’s relatively limited tileset is a sixteen-tile square — four times as big. This feels a little dire.

But how can it be dire, when I have enough sprite space to fill the screen and then some?

Let’s see here. A pretty good chunk of the fox flux tileset is unused or outright blank. Some of these tiles are art for moving objects that happened to fit in the grid, and those wouldn’t be in the background tileset. And while all of the tiles are distinct, a lot of the basic terrain has some significant overlap:

A set of dirt tiles from fox flux, colored to indicate where different tiles have identical corners

All of the regions of the same color are identical. These 9 distinct tiles could fit into 20 chars if they shared the common parts, rather than the 36 required by naïvely cutting each one into four dedicated chars.

(The fox flux grid is 32×32, so everything is twice as big as it will be on the Game Boy, but you get the idea.)
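Counting shared corners is easy to automate. This is a sketch under the assumption that tiles are 16×16 grids of palette indices; the two sample tiles are made up, and the function names are hypothetical.

```python
def split_into_chars(tile):
    """Cut a 16x16 tile (list of 16 rows of 16 values) into four 8x8 chars."""
    chars = []
    for top in (0, 8):
        for left in (0, 8):
            chars.append(tuple(tuple(row[left:left + 8]) for row in tile[top:top + 8]))
    return chars

def count_chars(tiles):
    """Return (naive char count, char count with identical chars shared)."""
    all_chars = [char for tile in tiles for char in split_into_chars(tile)]
    return len(all_chars), len(set(all_chars))

# Two made-up tiles: both share a blank left half
blank = [[0] * 16 for _ in range(16)]
half = [[0] * 8 + [1] * 8 for _ in range(16)]
print(count_chars([blank, half]))
```

These two tiles need 8 chars when cut naïvely but only 2 once identical chars are shared, which is the same kind of saving as going from 36 chars to 20 for the dirt tiles.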

I’m feeling a little better about this, especially knowing I do have enough space to cover the whole screen. Worst case, I could draw the map as though it were a single bitmap. I don’t want to have to rely on that if I can get away with it, though — I suspect I’d need to constantly load chars on the fly, and copying stuff around eats into my CPU budget surprisingly quickly.

Research

That does get me wondering: what, exactly, do the Oracle games do? I haven’t done any precise measurements, but I’m pretty sure they have more than sixty-four distinct map tiles throughout their large connected worlds. Let’s have a look!

Oracle of Ages and its live tilemap, in the graveyard, showing the graveyard tileset

Here I am in the graveyard near the start of Oracle of Ages. The “creepy tree” here is distinct and doesn’t really appear anywhere else, so I found it in the tile viewer (lower right) and will be keeping an eye on it. Note that only the left half of the face is visible; the right half is using the same tiles, flipped horizontally. (The colors are different because the tile viewer shows the literal colors, whereas the game itself is being drawn with a shader.)

Let’s walk left one screen.

Oracle of Ages and its live tilemap, outside of the graveyard

Now, this is interesting. The creepy tree is still on the screen here, so its tiles are naturally still loaded. But a bunch of tiles on the left — parts of the dungeon entrance and other graveyard things — have been replaced by town tiles. I’m several screens away from the town!

The next screen up has no creepy trees, but its tiles remain. Of course, they’d have to, since the creepy tree is still visible during a transition. I have to go left from there before the tree disappears:

Oracle of Ages and its live tilemap, with tiles spelling SHOP clearly visible

Wow! At a glance, this looks like enough tiles to draw the entire town.

This is fascinating. The Oracle games have several transitions between major areas, marked by fade-outs or palette changes — the purple-tinted graveyard is an obvious example. But it looks like there are also minor transitions that update the tileset while I’m still several screens away from where those tiles are used. The screens around the transition only use common tiles like grass and regular trees, so I never notice anything is happening.

That’s cute, clever, and an easy way to make screen transitions work without having to figure out what tiles are becoming unused as they slide off the screen!

At this point I realize I may be getting ahead of myself. Screen transitions? I don’t have a map yet! Hell, I don’t even have a camera. Time to back up and make something I can build on.

Designing a tileset

I’m pretty tired of manually translating art into bits. It’s 2018, dammit. I want to use all the regular tools I would use for this, I want the Game Boy’s limitations to be expressed as simply as possible, and I want minimal friction between the source artwork and the game.

Here’s my idea. I know I only have 8 palettes to work with, so I’m decreeing that tilesets will be stored as paletted PNGs. The first four colors in the image palette will become the first Game Boy palette; the next four colors become the second Game Boy palette; and so on. If I then resize Aseprite’s palette panel to be four colors wide, I’ll have an instant view of all my available combinations of colors.

This already has some problems — for starters, if the same color appears in multiple palettes (which will almost certainly happen, for the sake of cohesion), I’m very likely to confuse the hell out of myself. I also have no idea how to extend this into multiple tilesets, but for now I’ll pretend the entire game world only uses a single tileset.

I could instead dynamically infer the palettes based on what combinations of colors are actually used, but after more than a couple tiles, it would be a nightmare for a human to keep track of what those combinations are. With this approach, all a human needs to do is color-drop a pixel from a particular tile and look at what row the color’s in.
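The mapping from an image palette index to a Game Boy palette is just a division with remainder. A tiny sketch, with `locate_color` as an invented name:

```python
def locate_color(image_index, colors_per_palette=4):
    """Map an image palette index to (Game Boy palette, color within it)."""
    return divmod(image_index, colors_per_palette)

for index in (0, 3, 4, 9, 31):
    print(index, locate_color(index))
```

The first number is the row in a four-wide palette panel and the second is the column, so index 9 is color 1 of palette 2.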

After a quick jaunt into the pixel mines, here are some tiles.

A small set of pastel yellow moon tiles

Or, as viewed in Aseprite:

The same set of tiles, as seen in an editor, with the four-color palette visible

That’s only one palette, but hopefully you can see what I’m going for here. It’s enough to get started.

At this point, I started writing a little Python script that used Pillow to inspect the colors and pixels and dump them out to rgbasm-flavored source code. The script itself is not especially interesting: run through each 8×8 block of pixels, look at each pixel’s palette index, mod 4 to get the index within the Game Boy palette, print out as backtick literals. (I could spit out raw binary data, but I wanted to be able to inspect the intermediate form easily. Maybe later.)
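The core of that conversion can be sketched without Pillow by treating a char as a plain 8×8 grid of palette indices. This is not the actual script, only an illustration of the "mod 4, print as backtick literal" step; the sample block is made up.

```python
def char_to_rgbasm(block):
    """Format an 8x8 block of image palette indices as rgbasm `dw` lines."""
    lines = []
    for row in block:
        digits = "".join(str(index % 4) for index in row)
        lines.append("    dw `" + digits)
    return "\n".join(lines)

# Made-up block using image palette indices 4-7, i.e. the second palette
block = [[4, 4, 4, 4, 5, 4, 4, 4]] + [[4] * 8 for _ in range(6)] + [[6, 4, 4, 4, 4, 4, 4, 6]]
print(char_to_rgbasm(block))
```

Which palette a char uses is lost in this step (every index is reduced mod 4), so a real script would have to record that separately for the background attributes.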

The results:

SECTION "Map dumping test", ROM0
TEST_PALETTES:
    dw %0101011110111101
    dw %0101011100011110
    dw %0100101010111100
    dw %0100011001111000
    ; ... enough zeroes to make eight palettes ...
; sorry, in the script I was calling them "tiles", not "chars"
TEST_TILES:
    ; tile 0 at 0, 0
    dw `00001000
    dw `00000000
    dw `00100000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `20000000
    dw `20000002
    ; ... etc ...

And hey, I already have code that can load palettes and chars, so all I have to do is swap out the old labels for these ones.
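Each palette entry above is a 15-bit Game Boy Color value with five bits per channel and red in the lowest bits. Here is a sketch of producing one from 8-bit RGB. The input color is an assumption, picked because it reproduces the first entry; the actual color in the source PNG isn't stated.

```python
def rgb_to_gbc(red, green, blue):
    """Convert 8-bit RGB to a 15-bit Game Boy Color value (red in the low bits)."""
    r5, g5, b5 = red >> 3, green >> 3, blue >> 3
    return (b5 << 10) | (g5 << 5) | r5

def to_dw(value):
    """Format a 16-bit value as an rgbasm binary `dw` line."""
    return "    dw %" + format(value, "016b")

print(to_dw(rgb_to_gbc(0xEF, 0xEF, 0xAD)))
```

Dropping the low three bits of each channel is the simplest possible conversion; it ignores any difference between how the console's screen and a monitor display the same color.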

Now I have a tileset I can load into the game, which is very exciting, except that I can’t see any of them because I still don’t have a map. I could draw a test map by hand, I suppose, but the whole point of this exercise was to avoid ever doing that again.

Drawing a map

In keeping with the “it’s 2018 dammit” approach, I elect to use Tiled for drawing the maps. I’ve used it for several LÖVE games, and while its general-purposeness makes it a little clumsy at times, it’s flexible enough to express basically anything.

I make a tileset and create a map. I choose 256×256 pixels (16×16 tiles), the same size as the Game Boy screen buffer, and fill it with arbitrary terrain. In retrospect, I probably should’ve made it the size of the screen, since I still don’t have a camera. Oh, well.

Here, I hit a minor roadblock. I want to do as much work as possible upfront, so I want to store the map in the ROM as chars, not tiles. That means I need to know what chars make up each tile, which is determined by the script that converts the image to char data. Multiple maps might use the same tileset, and a map might use multiple tilesets, so it seems like I’ll need some intermediate build assets with this information…

(In retrospect again, I realize that the game may need to know about tiles rather than just chars, since there’ll surely be at least a few map tiles that act like entities — switches and the like — and those need to function as single units. I guess I’ll work that out later.)

This is all looking like an awful lot of messing around (and a lot of potential points of failure) before I can get anything on the dang screen. I waffle for a bit, then decide to start with a single step that simultaneously dumps the tiles and the map. I can split it up when I actually have more than one of either.
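Turning a map of tiles into a map of chars is mostly bookkeeping. This sketch assumes each tile id maps to four char ids in top-left, top-right, bottom-left, bottom-right order; the function name and sample data are invented.

```python
def tiles_to_chars(tile_map, tile_chars):
    """Expand a grid of tile ids into a grid of char ids, row by row.

    tile_chars maps a tile id to its four chars in the order
    (top-left, top-right, bottom-left, bottom-right).
    """
    char_rows = []
    for tile_row in tile_map:
        top, bottom = [], []
        for tile in tile_row:
            tl, tr, bl, br = tile_chars[tile]
            top += [tl, tr]
            bottom += [bl, br]
        char_rows += [top, bottom]
    return char_rows

# Made-up data: two tiles side by side in a single-row map
tile_chars = {0: (0, 1, 2, 3), 1: (4, 5, 6, 7)}
print(tiles_to_chars([[0, 1]], tile_chars))
```

Each row of tiles becomes two rows of chars, so a 16×16 tile map expands to the 32×32 chars of the screen buffer.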

You can check out the resulting script if you like, but again, I don’t think it’s particularly interesting. It enforces a few more constraints than before, and adds a TEST_MAP_1 label containing all the char data, row by row. Loading that into VRAM is almost comically simple:

    ; Read from the test map
    ld hl, $9800
    ld de, TEST_MAP_1
    ld bc, 1024
    call copy16

The screen buffer is 32×32 chars, or 1024 bytes. As you may suspect, copy16 is like copy, but it takes a 16-bit count in bc.

; copy bc bytes from de to hl
; NOTE: bc must not be zero
copy16:
    ld a, [de]
    inc de
    ld [hl+], a
    dec bc
    ; dec bc doesn't set flags, so gotta check by hand
    ld a, b
    or a, c
    jr nz, copy16
    ret

Hm. It’s a little harder to justify the bc = 0 case as a feature here, since that would try to overwrite every single byte in the entire address space. Don’t do that, then.
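The wraparound is easy to confirm by imitating the loop's counter in Python. This only models the count and the zero check, not the memory copy itself.

```python
def copy16_iterations(bc):
    """Count loop iterations of copy16 for a starting 16-bit count."""
    iterations = 0
    while True:
        iterations += 1            # one byte copied per pass
        bc = (bc - 1) & 0xFFFF     # dec bc wraps around at zero
        if bc == 0:                # the `ld a, b` / `or a, c` check
            return iterations

print(copy16_iterations(1024), copy16_iterations(1), copy16_iterations(0))
```

Because the check happens after the decrement, a count of zero wraps to $FFFF and the loop runs 65536 times.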

Anise, in-game, walking on the moon tiles

Now, at long long last, I have a background with some actual art! It’s starting to feel like something! I’ve even got something resembling a workflow.

My desktop, showing the moon tiles in an image editor, the map put together in Tiled, and the game running in mGBA

All in a day’s work. Good time to call it, right?

Except

I just wrote this char loading code…

And there’s one thing still hardcoded…

I wonder if I could do something about that…?

Sprites

Above, I conspicuously did not mention how I integrated the Python script into the build system. And, well, I didn’t do that. I ran it manually and put it somewhere and committed it all as-is. You currently (still!) can’t actually build the game without repeating my steps. You can’t even just put the output in the right place, because you also have to delete some debug output from the middle of the file.

It gets worse! Here’s how.

I have some Anise walking sprites, too, drawn in Aseprite. They’re pretty cute and I’d love to have them in the game, now that I have some Real Art™ for the background.

Star Anise, walking forwards

Why not throw these at the same script and hack them into animating?

Unfortunately, this introduces a bit of manual work, as animation often does. (My kingdom for a way to embed a small simple animation in a larger spritesheet in Aseprite!) I’ve typically animated every critter in its own Aseprite file — or stacked several vertically in the same file when their animations are similar enough — and then exported as a sheet with the frames running off horizontally. You can see this at work in fox flux, e.g. on its critter sheet.

But Star Anise introduces a wrinkle that prevents even that slightly clumsy workflow from working.

You may have noticed that the walking sprite above blows the color budget considerably, using a whopping five colors. The secret is that Anise himself fits in a 16×16 square, and then his antenna is a third 8×16 sprite drawn on top. I can’t simply export him as a spritesheet, because the antenna needs to be separate, and it’s not even aligned to the grid. It doesn’t even stay in the same place consistently!

I could maybe hack something together that would automatically pull the incompatible pixels into a separate sprite. I might need to, since — spoiler alert — there are an awful lot of Lunekos in this game. For now, though, I did the dumbest thing that works and copied his frames to their own sheet by hand.

Star Anise's walking frames laid out in a spritesheet

The background is actually cyan, not transparent. I had to do this because my setup expects multiple sets of four colors — the first color in an object palette is still there, even if it’s ignored — and only one color in an indexed PNG can be transparent. (Don’t @ me about PNG pixel formats.) I could’ve adjusted it to work with sets of three colors and put the transparent one at the end so the palette column trick still worked, but… this was easier.

Here’s the best part: I took the main function from my tile loading script, copy-pasted it within the same file, and edited the copy to dump these sprites sans map. So now not only is there no build system, but half of the loading script is inaccessible! Sorry. We’re getting into experiment territory and I am going to start making a lot of messes while I figure out what I actually want.

Using these within the game was just as easy as before — replace some labels with new ones — and the only real change was to use a third OAM slot for the antenna. (The antenna has to appear first; when sprites overlap, the one with the lowest index appears on top.)

That did make updating OAM a little clumsy; you may recall that before, I loaded the x and y positions into b and c, updated them, then wrote them back into OAM:

    ; set b/c to the y/x coordinates
    ld hl, oam_buffer
    ld b, [hl]
    inc hl
    ld c, [hl]
    bit BUTTON_LEFT, a
    jr z, .skip_left
    dec c
.skip_left:
    bit BUTTON_RIGHT, a
    jr z, .skip_right
    inc c
.skip_right:
    bit BUTTON_UP, a
    jr z, .skip_up
    dec b
.skip_up:
    bit BUTTON_DOWN, a
    jr z, .skip_down
    inc b
.skip_down:
    ld [hl], c
    dec hl
    ld [hl], b
    ld a, c
    add a, 8
    ld hl, oam_buffer + 5
    ld [hl], a
    dec hl
    ld [hl], b

The above approach required that I hardcode the 8-pixel offset between the left and right halves. With the antenna in the mix, I would’ve had to hardcode another more convoluted offset, and I didn’t like the sound of that. So I changed it to inc and dec the OAM coordinates directly and immediately:

    ; Anise update loop
    ; bc is the size of one OAM entry, for stepping between objects
    ld bc, 4
    bit BUTTON_LEFT, a
    jr z, .skip_left
    ld hl, oam_buffer + 1
    dec [hl]
    add hl, bc
    dec [hl]
    add hl, bc
    dec [hl]
.skip_left:
    ; ... etc ...

Eventually I should stop doing this and have an actual canonical x/y position for Anise somewhere. But I didn’t do that yet.
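The stepping-by-4 trick relies on each OAM entry being four bytes: y, x, char number, attributes. A Python sketch of the same update, with made-up starting positions and a hypothetical function name:

```python
OAM_ENTRY_SIZE = 4    # each object is y, x, char number, attributes
X_OFFSET = 1

def move_objects_x(oam, count, delta):
    """Shift the x coordinate of the first `count` objects in place."""
    for obj in range(count):
        index = obj * OAM_ENTRY_SIZE + X_OFFSET
        oam[index] = (oam[index] + delta) & 0xFF
    return oam

# Made-up positions for the antenna and the two halves of the body
oam = bytearray([40, 52, 0, 0,   48, 48, 2, 0,   48, 56, 4, 0])
move_objects_x(oam, 3, -1)
print(list(oam))
```

The objects keep whatever offsets they started with, which is the point: nothing has to know that the right half sits 8 pixels over or where the antenna goes.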

I did also take this opportunity to change my LCDC flags so that background chars start counting from zero at $9000, fixing the misunderstanding I had before. That’s nice.

Anyway, tada, Star Anise can slide around, but now with his antenna.

Not good enough.

Animating

It’s time to animate something. And this time around, all I’ve got are bytes to work with. Oh, boy!

Right out of the gate, I have two options. I could load all of Anise’s sprites into VRAM upfront and change the char numbers in OAM to animate him, or I could reserve some specific chars and overwrite them to animate him.

The first choice makes sense for an entity that might exist multiple times at once, like enemies or… virtually anything in the game world, really. But there’s only ever one player, and he’s likely to have a whole lot of spritework, which I would prefer not to have clogging up my char space for the entire duration of the game. So while I might use the first approach for most other things, I’m going to animate Anise by overwriting the actual graphics. Every frame.

First things first. I’m going to need some state, which I’ve been avoiding by relying on OAM. At the very least, I need to know which way Anise is facing — which isn’t necessarily the direction he’s moving, because he should keep his facing when he stops. I also need to know which animation frame he’s on, and how many LCD frames are left until he should advance to the next one.

Let’s refer to the time between vblanks as a “tic” for now, to avoid the ambiguity of a “frame” when talking about animation.

A good start, then, would be some constants.

FACING_DOWN   EQU 0
FACING_UP     EQU 1
FACING_RIGHT  EQU 2
FACING_LEFT   EQU 3

ANIMATION_LENGTH EQU 5

ANIMATION_LENGTH is the length of every frame. I don’t especially want to give every frame its own distinct duration if I can avoid it; this will be complicated enough as it is. I fiddled with the frame duration in Aseprite for a bit and landed on 83ms as a nice speed, and that’s 5 tics.
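
As a sanity check on that conversion, here is a small sketch of the arithmetic in Python. It assumes the commonly cited Game Boy refresh rate (a 4194304 Hz clock with 70224 clocks per refresh, so roughly 59.73 Hz); the function and constant names are made up for illustration and don't appear in the game's code.

```python
# Convert an animation frame duration in milliseconds to whole tics.
# Assumption: one tic is one LCD refresh, at 4194304 / 70224 Hz (about 59.73).
CLOCK_HZ = 4_194_304
CLOCKS_PER_REFRESH = 70_224
REFRESH_HZ = CLOCK_HZ / CLOCKS_PER_REFRESH


def ms_to_tics(duration_ms):
    """Round a duration in milliseconds to the nearest number of tics."""
    return round(duration_ms / 1000 * REFRESH_HZ)


print(ms_to_tics(83))
```

83 ms works out to a hair under 5 refreshes, so rounding lands on the same 5 tics used for ANIMATION_LENGTH.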

I also need a place for this state, so I add some more stuff to my RAM block.

anise_facing:
    db
anise_frame:
    db
anise_frame_countdown:
    db

And initialize it in setup.

    ld a, FACING_DOWN
    ld [anise_facing], a
    ld a, ANIMATION_LENGTH
    ld [anise_frame_countdown], a

Presumably, one day, I’ll have multiple entities, and they’ll all share a similar structure, which I’ll have to traverse manually. For now, it’s easier to follow the code if I give every field its own label.

I have four levels of hierarchy here: the spriteset (which for now is always Anise’s), the pose (I only have one: walking), the facing, and the frame. I need to traverse all four, but luckily I can ignore the first two for now.

I don’t want to animate Anise when he’s not moving, so I changed the OAM updating code to also ld d, 1 if there’s any movement at all, and skip over all the animation stuff if d is still zero.

    ; ... read input ...

    ; This was before I knew the 'or a' trick; these two ops
    ; could be replaced with 'xor a; or d'
    ld a, d
    cp a, 0
    jp z, .no_movement

    ; ... all the animation code will go here ...

.no_movement:
    ; and after this we repeat the main loop

This does have the side effect that Anise will simply freeze in mid-walk when stopped, rather than returning to his standing pose. I still haven’t fixed that; I could special-case it, but I usually treat “standing” as its own one-frame animation, so it feels like something that ought to come when I implement poses.

Next I decrement the countdown, which is the number of tics left until the frame ought to change. If this is nonzero, I don’t need to do anything.

    ld a, [anise_frame_countdown]
    dec a
    ld [anise_frame_countdown], a
    jp nz, .no_movement
    ld a, ANIMATION_LENGTH
    ld [anise_frame_countdown], a

Again, this isn’t actually right. If Anise’s state changes, such as between standing and walking, then this should be ignored because he’s switching to a new animation. But this is a pose thing again, so I’m deferring it until later.

Next I need to advance the current frame. I don’t have modulo on hand and even simple ifs are kind of annoying, so I was naughty here and used bitops to roll from frame 3 to frame 0. This would obviously not work if the number of frames were not a power of two.

    ld a, [anise_frame]
    inc a
    and a, 4 - 1
    ld [anise_frame], a
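
To see why the mask trick only works for power-of-two frame counts, here is a rough Python model of it next to a real modulo. The helper names are hypothetical; only the inc/and pair comes from the code above.

```python
def advance_masked(frame, frame_count):
    """Mirror of 'inc a' followed by 'and a, frame_count - 1'."""
    return (frame + 1) & (frame_count - 1)


def advance_modulo(frame, frame_count):
    """What the animation actually wants: wrap after the last frame."""
    return (frame + 1) % frame_count


def cycle(advance, frame_count, steps):
    """Frames visited when starting from 0 and advancing `steps` times."""
    frames = [0]
    for _ in range(steps):
        frames.append(advance(frames[-1], frame_count))
    return frames


print(cycle(advance_masked, 4, 5))   # agrees with modulo: 4 is a power of two
print(cycle(advance_masked, 3, 5))   # never leaves frame 0: 3 is not
print(cycle(advance_modulo, 3, 5))   # the intended behaviour for 3 frames
```

With four frames the mask is %11 and the two versions agree; with three frames the mask is %10, which throws away the low bit and leaves the animation stuck on frame 0.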

Yet again, if Anise changes direction, the frame should be reset to zero… but it ain’t.

Now, let’s think for a second. I know what frame I want. I have a label for the upper-left corner of the spritesheet, and I want to get to the upper-left corner of the appropriate frame. Each frame has 3 objects; each object has 2 chars; each char is 16 bytes.

    ld hl, ANISE_TEST_TILES
    ; Skip ahead 3 sprites * the current frame
    ld bc, 3 * 2 * 16
    ; Remember, zero iterations is also possible
    or a
    jr z, .skip_advancing_frame
.advance_frame:    
    add hl, bc
    dec a
    jr nz, .advance_frame
.skip_advancing_frame:
    ; Copy the sprites into VRAM
    ; They're consecutive in both the data and VRAM, so only
    ; one copy is necessary.  And bc is already right!
    ld d, h
    ld e, l
    ld hl, $8000
    call copy16
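
The zero check before the loop matters more than it looks. A quick Python model of the loop, treating a as an 8-bit register, shows what would happen without it. This is only a sketch: the function name and the guard flag are invented for illustration.

```python
def frame_offset(frame, stride=3 * 2 * 16, guard=True):
    """Model of the 'add hl, bc' loop; returns (offset, iterations).

    a is an 8-bit register, so 'dec a' wraps from 0 around to 255.
    """
    a = frame & 0xFF
    offset = 0
    iterations = 0
    if guard and a == 0:            # or a / jr z, .skip_advancing_frame
        return offset, iterations
    while True:
        offset += stride            # add hl, bc
        iterations += 1
        a = (a - 1) & 0xFF          # dec a
        if a == 0:                  # jr nz, .advance_frame falls through
            break
    return offset, iterations


print(frame_offset(3))                  # three strides of 96 bytes
print(frame_offset(0))                  # guarded: no iterations at all
print(frame_offset(0, guard=False))     # unguarded: wraps and runs 256 times
```

Since the check only happens after the decrement, frame 0 without the guard would skip 256 frames' worth of data instead of none.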

Hey, look at that!

Star Anise walking around in-game, now animated

Only one small problem: I forgot about facing, so Anise will always face forwards no matter how he moves. Whoops!

Facing

I need to actually track which way Anise is facing, which is a surprisingly subtle question. He might even be facing away from his own direction of movement, if for example he was thrown backwards by some external force.

A decent first approximation is to use the last button that was pressed. (That’s still not quite right — if you hold down, hold down+right, and then release right, he should obviously face down. But it’s a start.)

I don’t yet track which buttons were pressed this frame, but it’s easy enough to add. While I’m at it, I might as well track which buttons were released, too. I amend the input reading code thusly, based on the straightforward insight that a button was pressed this frame iff it is currently 1 and was previously 0.

    ; a now contains the current buttons
    ld hl, buttons
    ld b, [hl]                  ; b <- previous buttons
    ld [hl], a                  ; a -> current buttons
    cpl
    and a, b
    ld [buttons_released], a    ; a = ~new & old, i.e. released
    ld a, [hl]                  ; a <- current buttons
    cpl
    or a, b
    cpl
    ld [buttons_pressed], a     ; a = ~(~new | old), i.e. pressed

I like that cute trick for getting the pressed buttons. I need a & ~b, but cpl only works on a, so I would’ve had to juggle a bunch of registers. But applying De Morgan’s law produces ~(~a | b), which only requires complementing a. (Full disclosure: I didn’t actually try register juggling, and for all I know it could end up shorter somehow.)
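
Since I didn't prove that to myself at the time, here is a brute-force check in Python that follows the same steps as the assembly and compares them against the plain definitions for every pair of byte values. The function names are mine; the masking with 0xFF stands in for the registers being 8 bits wide.

```python
def edges(previous, current):
    """Return (pressed, released), step for step like the assembly."""
    released = (~current & 0xFF) & previous             # cpl / and a, b
    pressed = ~((~current & 0xFF) | previous) & 0xFF    # cpl / or a, b / cpl
    return pressed, released


def check_all_bytes():
    """Compare against 'new & ~old' and 'old & ~new' for all 65536 pairs."""
    for previous in range(256):
        for current in range(256):
            pressed, released = edges(previous, current)
            if pressed != current & ~previous & 0xFF:
                return False
            if released != previous & ~current & 0xFF:
                return False
    return True


print(check_all_bytes())
```

A bit that is 1 now and was 0 before shows up only in pressed; the reverse shows up only in released; held and idle bits show up in neither.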

Next I check the just-pressed buttons and update facing accordingly. It looks a lot like the code for checking the currently-held buttons, except that I only use the first button I find.

    ld hl, anise_facing
    ld a, [buttons_pressed]
    bit BUTTON_LEFT, a
    jr z, .skip_left2
    ld [hl], FACING_LEFT
    jr .skip_down2
.skip_left2:
    ; ... you get the idea ...

And finally, amend the sprite choosing code to pick the right facing, too.

    ld hl, ANISE_TEST_TILES

    ; Skip ahead a number of /rows/, corresponding to facing
    ld a, [anise_facing]
    and a, %11                      ; clamp to 0-3, just in case
    jr z, .skip_stride_row
    ; This is like before, but times 4 frames
    ld bc, 4 * 3 * 2 * 16
.stride_row:
    add hl, bc
    dec a
    jr nz, .stride_row
.skip_stride_row:

    ; Bumping the frame here is convenient, since it leaves the
    ; frame in a for the next part
    ld a, [anise_frame]
    inc a
    and a, 4 - 1
    ld [anise_frame], a

    ; ... continue on with picking the frame ...

Hardcoding the number of frames here is… unfortunate. I should probably flip the spritesheet so the frames go down and each column is a facing; then there’ll always be a fixed number of columns to skip over.
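
For comparison, here is a sketch in Python of the offset arithmetic for both layouts. It assumes the tile data is stored one whole frame after another in reading order, which is what the strides in the assembly imply; the function names are hypothetical.

```python
CHAR_BYTES = 16
CHARS_PER_OBJECT = 2
OBJECTS_PER_FRAME = 3
FRAME_BYTES = OBJECTS_PER_FRAME * CHARS_PER_OBJECT * CHAR_BYTES  # 96


def offset_rows_are_facings(facing, frame, frames_per_row=4):
    """Current layout: each row is a facing, frames run left to right.
    The row stride depends on how many frames the animation has."""
    return (facing * frames_per_row + frame) * FRAME_BYTES


def offset_columns_are_facings(facing, frame, facing_count=4):
    """Flipped layout: each column is a facing, frames run downward.
    The row stride depends only on the number of facings."""
    return (frame * facing_count + facing) * FRAME_BYTES


FACING_LEFT = 3
print(offset_rows_are_facings(FACING_LEFT, 2))
print(offset_columns_are_facings(FACING_LEFT, 2))
```

In the flipped version the only hardcoded count is the number of facings, which is always four, so animations of different lengths could share the same lookup.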

But who cares about that? Look at Anise go! Yeah!

Star Anise walking around in-game, now animated in all four directions

Well, yes, there is one final problem, which is that the antenna is misaligned when walking left or right… because its positioning is different than when walking up or down, and I don’t have any easy way to encode that at the moment. It’s still like that, in fact. I’m sure I’ll fix it eventually.

More vblank woes

I didn’t run into this problem until a little while later, but I might as well mention it now. The above code writes into VRAM in the middle of updating entities — updating them very simply, perhaps, but updating nonetheless. If that updating takes longer than vblank, the write will fail.

I expected this, though not quite so soon. It’s a disadvantage of swapping the char data rather than the char references: 32× more writing to do, which will take 32× longer. The solution is similar to what I do for OAM: defer the write until the next vblank. I’m already doing that with Anise’s position, anyway, and it makes no sense to have his position and animation updated on different frames.

I ended up special-casing this for Anise, though it wouldn’t be too hard to extend this into a queue of tiles to copy. It’s nothing too world-shaking; I just store the address of Anise’s current sprite in RAM, then copy it over during vblank, just after the OAM DMA.

I did try doing this with one of the Game Boy Color’s new features, general-purpose DMA, which can copy from basically anywhere in ROM or RAM to basically anywhere in VRAM. It involves five registers: you write the source address in the first two, the destination in the next two, and the length in the fifth, which triggers the copy. The CPU simply freezes until the copy is done, so there are no goofy timing issues here.

    ld hl, anise_sprites_address
    ld a, [hl+]
    ld [rHDMA1], a
    ld a, [hl]
    ld [rHDMA2], a
    ld a, HIGH($0000)
    ld [rHDMA3], a
    ld a, LOW($0000)
    ld [rHDMA4], a
    ; To copy X bytes, write X / 16 - 1 to this register
    ld a, (32 * 3) / 16 - 1
    ld [rHDMA5], a

General-purpose DMA can copy 16 bytes every 8 cycles, or ½ cycle per byte. The fastest possible manual copy would be an unrolled series of ld a, [hl+]; ld [bc], a; inc bc which takes a whopping 6 cycles per byte — twelve times slower! This is a neat feature.
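
The length encoding and the speed comparison are easy to get wrong by one, so here is the arithmetic as a small Python sketch. The range check reflects my understanding that the length field is 7 bits wide (so at most 2048 bytes per transfer); the names are illustrative, not from any real toolchain.

```python
def hdma5_length_field(byte_count):
    """Value to write to HDMA5 for a general-purpose copy of byte_count bytes.

    Assumption: lengths are multiples of 16, from 16 up to 2048 bytes.
    """
    if byte_count % 16 != 0 or not 16 <= byte_count <= 0x800:
        raise ValueError("length must be a multiple of 16, from 16 to 2048")
    return byte_count // 16 - 1


# Cycle figures as quoted in the text above
GDMA_CYCLES_PER_BYTE = 8 / 16
MANUAL_CYCLES_PER_BYTE = 2 + 2 + 2   # ld a, [hl+] / ld [bc], a / inc bc

print(hdma5_length_field(32 * 3))
print(MANUAL_CYCLES_PER_BYTE / GDMA_CYCLES_PER_BYTE)
```

For Anise's 96 bytes of sprite data the field comes out as 5, matching the constant expression in the assembly.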

FYI, it’s also possible to have a copy done piecemeal during hblanks, though that sounds a bit fragile to me.

Future work

I’ve laid some very basic groundwork here, and there’s plenty more to do, which I will get back to later! It’s just me hacking all this together, after all, and I like flitting between different systems.

I will definitely need to figure out how the heck multiple tilesets work and when they get switched out. How do I even use multiple tilesets, each with its own set of palettes? What’s the workflow if I want to use the same tiles with several different palettes, like how the graveyard in Oracle of Ages is tinted purple? And I didn’t even implement character de-duplication yet… which will require some metadata for each tile… aw, geez.

And I still haven’t fixed the build system! Maybe you can understand why I’m hesitant to impose more structure on this idea quite yet.

To be continued

That brings us to commit 59ff18. Except for a commit about the build that I skipped. Whatever. This post has been a little more draining to write, perhaps because it forced me to confront and explain a bunch of hokey decisions.

Next time: resounding failure!

Cheezball Rising: Spring cleaning

Post Syndicated from Eevee original https://eev.ee/blog/2018/07/13/cheezball-rising-spring-cleaning/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I tidy up some of the gigantic mess I’ve made thus far.

Previously: writing a main loop, and finally getting something game-like.
Next: sprite and map loading.

Recap

After only a few long, winding posts’ worth of effort, I finally have a game, if you define “game” loosely as a thing that reacts when you press buttons.

A space cat roams around on a grassy background

Beautiful. But to make an omelette, you need to break a few eggs, and if it’s your first omelette then you might break some glassware too. As tiny as this game is, a couple things could use improvement.

Also, for narrative purposes, it’s much more interesting to put all these miscellaneous fixes together, rather than interrupting other posts with them. I didn’t actually do all this work in one lump in this order. Apologies to the die-hard non-fiction crowd.

It’s totally broken

Ah, the elephant in the room. The end of the previous post aligned with the first demo build, but if you downloaded it and tried to play it, you may have seen something that looks more like this:

Similar to the previous image, but with obvious graphical corruption

I said in the beginning that I liked mGBA and would be developing against it. That’s still true — it’s open source (and I’ve actually read some of it), it’s cross-platform, and it has some debug tools built in.

I also said that emulators are primarily designed to accept correct games, not necessarily to reject incorrect games. And that’s still very true.

I discovered this problem myself a little later (after the events of the next post), while shopping around a bit for emulators explicitly focused on accuracy. The one I keep being told to use is bgb, but it’s for Windows and Wine is kind of annoying, so I was exploring my other options; I found SameBoy (primarily for Mac, but with Linux and Windows builds sans debug features) and Gambatte (cross-platform, and the core for RetroArch’s Game Boy emulation). All three of them looked like the screenshot above.

Something was going very wrong when writing to VRAM. You can’t write to VRAM while the LCD is redrawing, so the most obvious cause is that… well… maybe the LCD is redrawing during my setup code.

Remember, on an actual Game Boy, the system doesn’t immediately start running what’s on the cartridge — it scrolls in the Nintendo logo first (or on a Color, does a fancier logo with a cool fanfare). That’s done by a tiny internal program called the boot ROM, and the state of the LCD when the boot ROM hands over control is undefined. I’m sure it’s consistent, but it’s not anything in particular, and for all I know it might be when the LCD is halfway through a redraw.

(Side note: I am violating Nintendo’s game submission requirements by consistently referring to it as a “cartridge” when in fact it is properly called a Game Pak. My bad.)

So what we’re seeing above is the result of VRAM becoming locked and unlocked as the LCD draws (remember, after every row is an hblank, during which time VRAM is accessible), while I’m trying to copy blocks of data there. In fact, every emulator I’ve tried shows a slightly different form of corruption, since this problem is very sensitive to timing accuracy. Super interesting!

I could wait for vblank and try to squeeze in all my setup code there, maybe even split across several vblanks. But since this is setup code and doesn’t run during gameplay, there’s a much easier solution: turn the screen off. That’s done with a bit in the LCDC register, which I currently configure at the end of my setup code; all I need to do is move that to the beginning and clear the appropriate bit instead.

    ld a, %00010111  ; $91 plus bits 1 and 2, minus bit 7
    ld [$ff40], a

Then, of course, set it again once I’m done. I did this with a couple macros, since it’s only a few instructions and it seems like the kind of thing I might need again later.

DisableLCD: MACRO
    ld a, [$ff40]
    and a, %01111111
    ld [$ff40], a
ENDM

EnableLCD: MACRO
    ld a, [$ff40]
    or a, %10000000
    ld [$ff40], a
ENDM

; and, of course, stick an EnableLCD at the end of setup code

Note that when the screen is off, it’s off, and there are no vblank interrupts or anything else that might be triggered by the screen’s behavior. So, you know, don’t wait for vblank while the screen’s off. When the screen turns back on, it immediately starts redrawing from the first row, so don’t try to use VRAM right away either. Finally, on the original Game Boy, do not turn off the screen when it’s not in vblank, or you might physically damage the screen. It’s fine on the Game Boy Color, but… hell, I’m gonna edit this to wait for vblank anyway. Feels kinda inappropriate to abruptly turn off the screen halfway through drawing.

Anyway, that solves my goofy corruption problems, and now the game looks the same on all of these emulators! I also reported this misbehavior, and it’s since been fixed, so recent dev builds of mGBA also correctly render garbage for the first release. See, by not targeting the most accurate emulators, I’ve caused another emulator to become more accurate!

hardware.inc

I mentioned last time that I’d adopted hardware.inc. That’s in large part because I keep producing monstrosities like the previous snippet. Here are those macros with some symbolic constants:

DisableLCD: MACRO
    ld a, [rLCDC]
    and a, $ff & ~LCDCF_ON
    ld [rLCDC], a
ENDM

EnableLCD: MACRO
    ld a, [rLCDC]
    or a, LCDCF_ON
    ld [rLCDC], a
ENDM

A breath of fresh air!

The $ff & is necessary because the argument needs to fit in a byte, but rgbasm’s integral preprocessor type is wider than a byte. I suppose I could also use LOW() here, or maybe there’s some other more straightforward solution.
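
Here is the same masking worked through in Python, as a sketch. It assumes the assembler evaluates expressions as 32-bit integers, which is my understanding of rgbasm; LCDCF_ON is bit 7, and the helper name is made up.

```python
LCDCF_ON = 0b10000000   # bit 7 of LCDC, as named in hardware.inc


def complement_32bit(value):
    """Bitwise NOT as a 32-bit assembler integer would compute it."""
    return ~value & 0xFFFFFFFF


wide = complement_32bit(LCDCF_ON)   # every bit set except bit 7
mask = 0xFF & wide                  # trimmed down to a single byte

print(hex(wide))
print(hex(mask))
print(bin(0b10010111 & mask))       # an LCDC value with the screen bit cleared
```

The complement on its own has all the upper bits set, which is far too big for an 8-bit immediate; masking with $ff keeps only the byte that the and instruction can actually use.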

Rearranging the buttons

In the previous post, I read the button states and crammed them into a single byte. I had a choice of whether to put the dpad low or the buttons low, but it didn’t seem to matter, so I picked arbitrarily: buttons high, dpad low.

It turns out I chose wrong! Also, it turns out there’s a “wrong” here! I’ve heard two compelling reasons to do it the other way. For one, hardware.inc contains constants for the bit offsets of the buttons, and it assumes the dpad is high. Why is this arbitrary data layout decision embedded in a list of hardware constants? Possibly for the second reason: on the GBA, input is available as a single word, and the lowest byte contains bits for all the buttons on the Game Boy — in the same order, with the dpad high.

So I’m switching this around and using hardware.inc’s constants. Easy change.

Fixing vblank

My original approach to waiting for vblank seemed simple enough: loop until vblank_flag is set, clear it, then continue on.

I’ve made a slight oversight here: what if the main loop does take longer than a frame? Then a vblank interrupt will fire in the middle of it and harmlessly set vblank_flag. But when the loop finally finishes and goes to wait for vblank again, the flag will already be set, and it’ll continue on immediately — regardless of the state of the screen! Whoops.

Again, the fix is simple: clear the flag before beginning to wait.

And while I’m at it, I see other uses for waiting for vblank in the near future, so I may as well pull this out into a function.

; idle until next vblank
wait_for_vblank:
    xor a                       ; clear the vblank flag
    ld [vblank_flag], a
.vblank_loop:
    halt                        ; wait for interrupt
    ld a, [vblank_flag]         ; was it a vblank interrupt?
    and a
    jr z, .vblank_loop          ; if not, keep waiting
    ret
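
To convince myself the stale flag really is the problem, here is a toy timeline in Python. It is a deliberately crude model, not an emulation: time is in arbitrary units, vblank fires every PERIOD units, and each loop iteration starts at a vblank with the flag clear. All names are hypothetical.

```python
PERIOD = 100  # time between vblanks, in arbitrary illustrative units


def next_vblank(t):
    """Time of the first vblank strictly after t."""
    return (t // PERIOD + 1) * PERIOD


def resume_time(work, clear_flag_first):
    """When the main loop resumes, if an iteration starts at a vblank
    (t = 0, flag clear) and then spends `work` units updating."""
    finished = work
    flag_already_set = next_vblank(0) <= finished   # vblank fired mid-update
    if flag_already_set and not clear_flag_first:
        return finished              # stale flag: the wait returns at once
    return next_vblank(finished)     # genuinely waits for the next vblank


print(resume_time(50, clear_flag_first=False))    # fast frame, old wait
print(resume_time(150, clear_flag_first=False))   # slow frame, old wait
print(resume_time(150, clear_flag_first=True))    # slow frame, fixed wait
```

With the old wait, a slow iteration resumes at time 150, in the middle of a redraw; clearing the flag first pushes it to the vblank at 200, at the cost of dropping a frame.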

Copy function

So far, I’ve done an awful lot of runtime copying by using the preprocessor. Consider the code for copying the DMA routine into HRAM:

    ; Copy the little DMA routine into high RAM
    ld bc, dma_copy
    ld hl, $ff80
    REPT dma_copy_end - dma_copy
    ld a, [bc]
    inc bc
    ld [hl+], a
    ENDR

This will repeat the ld/inc/ld dance 13 times in the built ROM. Which is fine, except that I’m about to have places where I do much more copying, and there’s only so much space in the ROM, and this is kind of ridiculous. So I guess I will finally write a copy function.

I’m calling it copy, not memcpy. What else am I going to copy, if not memory?

Attempt number 1 looked like this:

; copy d bytes from bc to hl
copy:
    ld a, [bc]
    inc bc
    ld [hl+], a
    dec d
    jr z, copy
    ret

I was then informed that it’s more idiomatic to use de as the source address and c as the count, possibly for some reason relating to the NES or SNES? I don’t remember. I’m totally on board for using c to mean a count, though, and started doing that elsewhere.

I went to change that, and actually make use of this function, and lo! I discovered a colossal bug. The final jump, jr z, copy, loops only if d was just decremented to zero. So this function will only ever copy one byte, unless you asked to copy only one byte, in which case it copies two.

This is not the first time I’ve gotten a condition backwards. I’ll get used to it eventually, I’m sure.

Oh, one other minor problem: if you ask to copy zero bytes, you’ll actually copy 256, since the zero check only comes after the decrement. (This is a recurring annoyance, actually, and makes while loops surprisingly clumsy to express.) So far I’ve only ever needed to copy a constant amount, so this hasn’t been a problem, but… I’ll just leave a comment pretending it’s a feature.

; copy c bytes from de to hl
; NOTE: c = 0 means to copy 256 bytes!
copy:
    ld a, [de]
    inc de
    ld [hl+], a
    dec c
    jr nz, copy
    ret
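
Both the backwards condition and the zero-count quirk fall out of a tiny Python model of the loop. The counter is wrapped to 8 bits to imitate the register; the function name and flag are invented for illustration.

```python
def bytes_copied(count, jump_if_zero):
    """Count how many bytes the copy loop moves before returning.

    jump_if_zero=True models the buggy 'jr z, copy';
    jump_if_zero=False models the corrected 'jr nz, copy'.
    """
    counter = count & 0xFF
    copied = 0
    while True:
        copied += 1                        # ld a, [de] / inc de / ld [hl+], a
        counter = (counter - 1) & 0xFF     # dec c, wrapping from 0 to 255
        if (counter == 0) != jump_if_zero:
            return copied                  # jump not taken, fall into ret


print(bytes_copied(13, jump_if_zero=True))    # buggy: stops after one byte
print(bytes_copied(1, jump_if_zero=True))     # buggy: copies two
print(bytes_copied(13, jump_if_zero=False))   # fixed: copies all thirteen
print(bytes_copied(0, jump_if_zero=False))    # fixed: zero means 256
```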

And here it is in action:

    ; Copy the little DMA routine into high RAM
    ld de, dma_copy
    ld hl, $FF80
    ld c, dma_copy_end - dma_copy
    call copy

Cool.

Of course, this is now significantly slower than the original unrolled version. The original took 13 × (2 + 2 + 2) = 78 cycles; the function adds 6 cycles for the call, 4 cycles for the ret, and 13 × (1 + 3) = 52 for the counting and jumping. As c goes to infinity, the function takes about ⅔ longer than unrolling.
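
Here is that cycle counting as a Python sketch, using the usual machine-cycle costs for these instructions (my assumption: 6 for call, 4 for ret, 2 for each load or 16-bit inc, 1 for dec c, 3 for a taken jr and 2 for an untaken one). Counting the final, untaken jr at 2 cycles gives a total one cycle lower than the estimate above.

```python
CALL, RET = 6, 4
LOAD, INC16, STORE = 2, 2, 2           # ld a, [de] / inc de / ld [hl+], a
DEC8, JR_TAKEN, JR_NOT_TAKEN = 1, 3, 2


def unrolled_cycles(n):
    """n copies of the load/inc/store sequence, no loop overhead."""
    return n * (LOAD + INC16 + STORE)


def looped_cycles(n):
    """call + n loop bodies + ret; only the last jr falls through (n >= 1)."""
    body = LOAD + INC16 + STORE + DEC8
    return CALL + n * body + (n - 1) * JR_TAKEN + JR_NOT_TAKEN + RET


print(unrolled_cycles(13), looped_cycles(13))
print(looped_cycles(10**6) / unrolled_cycles(10**6))
```

For large counts the ratio settles at 10/6, which is where the "about ⅔ longer" figure comes from. A real c can't exceed 256, so the big number is only there to show the limit.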

If I feel like it, I could mitigate this somewhat by partially unrolling. First I’d mask off some lower bits of c — say, the lowest two — and copy that many bytes. Now the amount of copying left is a multiple of four, so I could shift c right twice and have another loop that copies four bytes at a time, amortizing the cost of the decrement and jump.

It’s not urgent enough for me to want to bother yet, and it’ll make relatively little difference for small copies like this DMA one, but I’m strongly considering it for copying a 16-bit amount.
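
The shape of that idea, sketched in Python rather than assembly: peel off the remainder one byte at a time, then copy the rest in blocks of four. This only models the control flow (how many times the decrement and jump would run); it isn't the function I'd actually ship, and the name is hypothetical.

```python
def copy_partially_unrolled(source, count):
    """Copy count bytes; returns (copied_bytes, loop_iterations)."""
    dest = []
    iterations = 0
    pos = 0
    # Remainder: the lowest two bits of the count, one byte per iteration
    for _ in range(count & 0b11):
        dest.append(source[pos])
        pos += 1
        iterations += 1
    # Blocks: count shifted right twice, four bytes per iteration
    for _ in range(count >> 2):
        dest.extend(source[pos:pos + 4])
        pos += 4
        iterations += 1
    return bytes(dest), iterations


data = bytes(range(13))
copied, iterations = copy_partially_unrolled(data, 13)
print(copied == data, iterations)
```

Thirteen bytes take one remainder iteration plus three block iterations, so the loop overhead is paid 4 times instead of 13.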

Reset vectors

Now I have a couple utility functions like copy and wait_for_vblank. I don’t really care where they go, so I put them in their own SECTION and let the linker figure it out.

It took a while for me to notice where, exactly, the linker had put them: at $0000! These functions are small, and I have nothing explicitly placed before the interrupt handlers (which begin at $0040), so rgblink saw some empty space and filled it.

The thing is, the Game Boy has eight instructions of the form rst $xx that act as fast calls — each one jumps to a fixed low address (a “reset vector”), using less time and space than a call would. And those fixed $xx addresses are… $00, and every eight bytes afterwards.

I don’t have any immediate use for these — eight bytes isn’t a lot, though I guess copy could fit in there — but I probably don’t want arbitrary code ending up where they go, so for now I’ll stub them out like I stubbed out the interrupt handlers.

(I have been advised of one very good use for reset vectors: putting a crash handler at $38. Why? Because rst $38 is encoded as $ff, which is a fairly common byte to encounter if you accidentally jump into garbage. A lot of the Game Boy’s RAM is even initialized to $ff at startup.)
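
The $ff coincidence is easy to check from the opcode layout. As a sketch in Python (the function name is mine): the eight rst opcodes follow the bit pattern 11ttt111, where ttt is the target address divided by eight.

```python
RST_TARGETS = [n * 8 for n in range(8)]   # $00, $08, ... up to $38


def rst_opcode(target):
    """Single-byte encoding of 'rst target' (bit pattern 11ttt111)."""
    if target not in RST_TARGETS:
        raise ValueError("rst only reaches $00 through $38 in steps of 8")
    return 0xC7 | target


for target in RST_TARGETS:
    print(f"rst ${target:02x} -> ${rst_opcode(target):02x}")
```

The last line of output is rst $38 encoding to $ff, so stray $ff bytes executed as code all funnel into whatever lives at $0038.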

Idioms

I’m still discovering what’s considered idiomatic, but here are a couple tidbits.

The set of instructions is a little scattershot as far as arguments go. Several times early on, I wrote stuff like this:

  ld hl, some_address
  ld a, 133
  ld [hl], a

But I overlooked that there are instructions for both ld [hl], n8 and ld [n16], a, so the above can be reduced to two lines. There’s no such thing as ld [n16], n8, though.

A surprising number of instructions can use [hl] directly as an operand — even inc and dec, combining fetch/mutate/store into a single instruction.

xor a is half the size of ld a, 0 and twice as fast. I mean, we’re talking about a single byte and single cycle here, but no reason not to.

(xor a really means xor a, a, but since every boolean op instruction takes a as the first argument anyway, it can be omitted. I don’t like to omit it in most cases, since xor b doesn’t mention a at all and that seems misleading, but it feels appropriate when combining a with itself.)

or a (equivalently, and a) is a quick way to test whether a is zero, since boolean ops set the zero flag.

Color

This is neither here nor there, but since this post began with emulator differences, here’s another one.

The screen you’re reading this on is almost certainly backlit, but the original Game Boy Color screen was not. A fully white pixel on a Game Boy Color is turned off — it’s the color of the screen itself, in which you can probably see your own reflection.

Which raises a tricky question: what color is that? The game thinks it’s pure white, but the screen was a sort of pale yellow. So how should it be rendered in an emulator, on a modern backlit LCD monitor?

Compounding this problem is that Game Boy Color games can also run on the Game Boy Advance, which showed the colors slightly differently yet again. And, of course, even monitors may be calibrated differently, in which case it all goes out the window.

It’s interesting to see different emulators’ opinions of how to render color:

The same screenshot, seen in several different emulators with different color schemes

This is exactly the same ROM. The top left is mGBA out of the box, which shows colors completely unaltered — usually fairly saturated. The top right is mGBA with its “gba-colors” shader enabled, which is supposed to replicate how colors appear on a GBA screen, but seems passingly similar to a GBC too. Then on the bottom are two emulators renowned for their accuracy, here wildly disagreeing with each other.

My Game Boy Color is currently in a box somewhere, and until I can find it, I can’t be sure who’s closer. All of these are perfectly fine interpretations of the same art, though.

I may or may not use the “gba-colors” shader, and may or may not fiddle with mGBA’s color settings over time. If the colors vary a bit in future screenshots, that’s probably why.

To be continued

This post doesn’t really correspond to a particular commit very well, since it’s all little stuff I did here and there. I hope you’ve enjoyed the breather, because it’s all downhill from here. In a good way, I mean. Like a rollercoaster.

Next time: map and sprite loading, which will explain how I got from grass to the moon texture in the screenshots above!

Cheezball Rising: Main loop, input, and a game

Post Syndicated from Eevee original https://eev.ee/blog/2018/07/05/cheezball-rising-main-loop-input-and-a-game/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

GitHub has intermittent prebuilt ROMs, or you can get them a week early on Patreon if you pledge $4. More details in the README!


In this issue, I fill in the remaining bits necessary to have something that looks like a game.

Previously: drawing a sprite.

Recap

So far, I have this.

A very gaudy striped background with half a cat on top

It took unfathomable amounts of effort, but it’s something! Now to improve this from a static image to something a bit more game-like.

Quick note: I’ve been advised to use the de facto standard hardware.inc file, which gives symbolic names to all the registers and some of the flags they use. I hadn’t introduced it yet while doing the work described in this post, but for the sake of readability, I’m going to pretend I did and use that file’s constants in the code snippets here.

Interrupts

To get much further, I need to deal with interrupts. And to explain interrupts, I need to briefly explain calls.

Assembly doesn’t really have functions, only addresses and jumps. That said, the Game Boy does have call and ret instructions. A call will push the PC register (program counter, which by then points at the instruction after the call) onto the stack and perform a jump; a ret will pop into the PC register, effectively jumping back to just after the call.

There are no arguments, return values, or scoping; input and output must be mediated by each function, usually via registers. Of course, since registers are global, a “function” might trample over their values in the course of whatever work it does. A function can manually push and pop 16-bit register pairs to preserve their values, or leave it up to the caller for speed/space reasons. All the conventions are free for me to invent or ignore. A “function” can even jump directly to another function and piggyback on the second function’s ret, kind of like Perl’s goto &sub… which I realize is probably less common knowledge than how call/return work in assembly.

Interrupts, then, are calls that can happen at any time. When one of a handful of conditions occurs, the CPU can immediately (or, rather, just before the next instruction) call an interrupt handler, regardless of what it was already doing. When the handler returns, execution resumes in the interrupted code.

Of course, since they might be called anywhere, interrupt handlers need to be very careful about preserving the CPU state. Pushing af is especially important (and this is the one place where af is used as a pair), because a is necessary for getting almost anything done, and f holds the flags which most instructions will invisibly trample.

Naturally, I completely forgot about this the first time around.

The Game Boy has five interrupts, each with a handler at a fixed address very low in ROM. Each handler only has room for eight bytes’ worth of instructions, which is enough to do a very tiny amount of work — or to just jump elsewhere.

A good start is to populate each one with only the reti instruction, which returns as usual and re-enables interrupts. The CPU disables interrupts when it calls an interrupt handler (so they thankfully can’t interrupt themselves), and returning with only ret will leave them disabled.

Naturally, I completely forgot about this the first time around.

; Interrupt handlers
SECTION "Vblank interrupt", ROM0[$0040]
    ; Fires when the screen finishes drawing the last physical
    ; row of pixels
    reti

SECTION "LCD controller status interrupt", ROM0[$0048]
    ; Fires on a handful of selectable LCD conditions, e.g.
    ; after repainting a specific row on the screen
    reti

SECTION "Timer overflow interrupt", ROM0[$0050]
    ; Fires at a configurable fixed interval
    reti

SECTION "Serial transfer completion interrupt", ROM0[$0058]
    ; Fires when a serial transfer completes
    reti

SECTION "P10-P13 signal low edge interrupt", ROM0[$0060]
    ; Fires when a button is pressed (a selected input line goes low)
    reti

These will do nothing. I mean, obviously, but they’ll do even less than nothing until I enable them. Interrupts are enabled by the dedicated ei instruction, which enables any interrupts whose corresponding bit is set in the IE register ($ffff).
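
The handler addresses and the IE bits line up in a regular pattern, which this Python sketch spells out. The list order is the bit order in IE (bit 0 first); the names and helpers are illustrative, not hardware.inc's.

```python
INTERRUPTS = ["vblank", "LCD status", "timer", "serial", "joypad"]


def handler_address(bit):
    """ROM address of the handler for the interrupt on the given IE bit."""
    return 0x0040 + 8 * bit


def ie_mask(*names):
    """Byte to write to IE so that only the named interrupts are enabled."""
    mask = 0
    for name in names:
        mask |= 1 << INTERRUPTS.index(name)
    return mask


for bit, name in enumerate(INTERRUPTS):
    print(f"bit {bit}: {name:<10} handler at ${handler_address(bit):04x}")
print(bin(ie_mask("vblank", "timer")))
```

Each handler sits eight bytes after the previous one, which is where the eight-byte limit on handler code comes from.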

So… which one do I want?

Game loop

To have a game, I need a game loop. The basic structure of pretty much any loop looks like:

  1. Load stuff.
  2. Check for input.
  3. Update the game state.
  4. Draw the game state.
  5. GOTO 2

(If you’ve never seen a real game loop written out before, LÖVE’s default loop is a good example, though even a huge system like Unity follows the same basic structure.)

The Game Boy seems to introduce a wrinkle here. I don’t actually draw anything myself; rather, the hardware does the drawing, and I tell it what to draw by using the palette registers, OAM, and VRAM.

But in fact, this isn’t too far off from how LÖVE (or Unity) works! All the drawing I do is applied to a buffer, not the screen; once the drawing is complete, the main loop calls present(), which waits until vblank and then draws the buffer to the screen. So what you see on the screen is delayed by up to a frame, and the loop really has an extra “wait for vsync” step at 3½. Or, with a little rearrangement:

  1. Load stuff.
  2. Wait for vblank.
  3. Draw the game state.
  4. Check for input.
  5. Update the game state.
  6. GOTO 2

This is approaching something I can implement! It works out especially well because it does all the drawing as early as possible during vblank. That’s good, because the LCD operation looks something like this:

LCD redrawing...
LCD redrawing...
LCD redrawing...
LCD redrawing...
VBLANK
LCD idle
LCD idle

While the LCD is refreshing, I can’t (easily) update anything it might read from. I only have free control over VRAM et al. during a short interval after vblank, so I need to do all my drawing work right then to ensure it happens before the LCD starts refreshing again. Then I’m free to update the world while the LCD is busy.

First, right at the entry point, I enable the vblank interrupt. It’s bit 0 of the IE register, but hardware.inc has me covered.

main:
    ; Enable interrupts
    ld a, IEF_VBLANK
    ldh [rIE], a
    ei

Next I need to make the handler actually do something. The obvious approach is for the handler to call one iteration of the game loop, but there are a couple problems with that. For one, interrupts are disabled when a handler is called, so I would never get any other interrupts. I could explicitly re-enable interrupts, but that raises a bigger question: what happens if the game lags, and updating the world takes longer than a frame? With this approach, the game loop would interrupt itself and then either return back into itself somewhere and cause untold chaos, or take too long again and eventually overflow the stack. Neither is appealing.

An alternative approach, which I found in gb-template but only truly appreciated after some thought, is for the vblank handler to set a flag and immediately return. The game loop can then wait until the flag is set before each iteration, just like LÖVE does. If an update takes longer than a frame, no problem: the loop will always wait until the next vblank, and the game will simply run more slowly.

SECTION "Vblank interrupt", ROM0[$0040]
    push hl
    ld hl, vblank_flag
    ld [hl], 1
    pop hl
    reti

...

SECTION "Important twiddles", WRAM0[$C000]
; Reserve a byte in working RAM to use as the vblank flag
vblank_flag:
    db

The handler fits in eight bytes — the linker would yell at me if it didn’t, since another section starts at $0048! — and leaves all the registers in their previous states. As I mentioned before, I originally neglected to preserve registers, and some zany things started to happen as a and f were abruptly altered in the middle of other code. Whoops!

Now the main loop can look like this:

main:
    ; ... bunch of setup code ...

vblank_loop:
    ; Main loop: halt, wait for a vblank, then do stuff

    ; The halt instruction stops all CPU activity until the
    ; next interrupt, which saves on battery, or at least on
    ; CPU cycles on an emulator's host system.
    halt
    ; The Game Boy has some obscure hardware bug where the
    ; instruction after a halt is occasionally skipped over,
    ; so every halt should be followed by a nop.  This is so
    ; ubiquitous that rgbasm automatically adds a nop after
    ; every halt, so I don't even really need this here!
    nop

    ; Check to see whether that was a vblank interrupt (since
    ; I might later use one of the other interrupts, all of
    ; which would also cancel the halt).
    ld a, [vblank_flag]
    ; This sets the zero flag iff a is zero
    and a
    jr z, vblank_loop
    ; This always sets a to zero, and is shorter (and thus
    ; faster) than ld a, 0
    xor a, a
    ld [vblank_flag], a

    ; Use DMA to update object attribute memory.
    ; Do this FIRST to ensure that it happens before the screen starts to update again.
    call $FF80

    ; ... update everything ...

    jp vblank_loop

It’s looking all the more convenient that I have my own copy of OAM — I can update it whenever I want during this loop! I might need similar facilities later on for editing VRAM or changing palettes.
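To convince myself that the flag approach degrades gracefully when an update runs long, here is a toy Python model of the loop. It is a sketch under some big assumptions: time is counted in frames, vblank is the only interrupt, and iteration_starts is a name I invented for this.

```python
import math


def iteration_starts(update_costs):
    """Return the vblank number at which each loop iteration begins.

    Time is measured in frames, vblanks fire at t = 1, 2, 3, ..., and
    each entry of update_costs is how long that iteration's work takes.
    This is a simplified model: it ignores every interrupt except vblank.
    """
    starts = []
    t = 0.0
    for cost in update_costs:
        # halt sleeps until the next vblank strictly after "now"
        t = math.floor(t) + 1
        starts.append(t)
        # then the drawing and update work burns some time
        t += cost
    return starts


# Half-frame updates keep pace; a 1.5-frame update costs one extra frame.
print(iteration_starts([0.5, 0.5, 1.5, 0.5]))
```

The slow iteration pushes the next one back by a whole frame, but nothing ever re-enters the loop, which is the whole point.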

Doing something and reading input

I have a loop, but since nothing’s happening, that’s not especially obvious. Input would take a little effort, so I’ll try something simpler first: making Anise move around.

I don’t actually track Anise’s position anywhere right now, except for in the OAM buffer. Good enough. In my main loop, I add:

    ld hl, oam_buffer + 1
    ld a, [hl]
    inc a
    ld [hl], a

The second byte in each OAM entry is the x-coordinate, and indeed, this causes Anise’s torso to glide rightwards across the screen at 60ish pixels per second. Eventually the x-coordinate overflows, but that’s fine; it wraps back to zero and moves the sprite back on-screen from the left.

The half-cat is now sliding across the screen

Excellent. I mean, sorry, this is extremely hard to look at, but bear with me a second.

This would be a bit more game-like if I could control it with the buttons, so let’s read from them.

There are eight buttons: up, down, left, right, A, B, start, select. There are also eight bits in a byte. You might suspect that I can simply read an I/O register to get the current state of all eight buttons at once.

Ha, ha! You naïve fool. Of course it’s more convoluted than that. That single byte thing is a pretty good idea, though, so what I’ll do is read the input at the start of the frame and coax it into a byte that I can consult more easily later.

Turns out I pretty much have to do that, because button access is slightly flaky. Even the official manual advises reading the buttons several times to get a reliable result. Yikes.

Here’s how to do it. The buttons are wired in two groups of four: the dpad and everything else. Reading them is thus also done in two groups of four. I need to use the P1 register, which I assume is short for “player 1” and is so named because the people who designed this hardware had also designed the two-player NES?

Bits 4 and 5 of P1 determine which set of four buttons I want to read, and then the lower nybble contains the state of those buttons. Note that each bit is set to 1 if the button is released, and the two select bits are inverted the same way: clearing a select bit is what picks that group. I think this is a quirk of how they’re wired, and what I’m doing is extremely direct hardware access. Exciting! (Also very confusing on my first try, where Anise’s movement was inverted.)

The code, which is very similar to an example in the official manual, thus looks like this:

    ; Poll input
    ; The direct hardware access is nonsense and unreliable, so
    ; just read once per frame and stick all the button states
    ; in a byte

    ; $20 sets bit 5 and clears bit 4, which selects the dpad
    ld a, $20
    ldh [rP1], a
    ; But it's unreliable, so do it twice
    ld a, [rP1]
    ld a, [rP1]
    ; This is 'complement', and flips all the bits in a, so now
    ; set bits will mean a button is held down
    cpl
    ; Store the lower four bits in b
    and a, $0f
    ld b, a

    ; $10 sets bit 4 and clears bit 5, which selects the buttons
    ld a, $10
    ldh [rP1], a
    ; Apparently this is even more unreliable??  No, really, the
    ; manual does this: two reads, then six reads
    ld a, [rP1]
    ld a, [rP1]
    ld a, [rP1]
    ld a, [rP1]
    ld a, [rP1]
    ld a, [rP1]
    ; Again, complement and mask off the lower four bits
    cpl
    and a, $0f
    ; b already contains four bits, so I need to shift something
    ; left by four...  but the shift instructions only go one
    ; bit at a time, ugh!  Luckily there's swap, which swaps the
    ; high and low nybbles in any register
    swap a
    ; Combine b's lower nybble with a's high nybble
    or a, b
    ; And finally store it in RAM
    ld [buttons], a

...

SECTION "Important twiddles", WRAM0[$C000]
vblank_flag:
    db
buttons:
    db

Phew. That was a bit of a journey, but now I have the button state as a single byte. To help with reading the buttons, I’ll also define a few constants labeling the individual bits. (There are instructions for reading a particular bit by number, so I don’t need to mask a single bit out.)

; Constants
BUTTON_RIGHT  EQU 0
BUTTON_LEFT   EQU 1
BUTTON_UP     EQU 2
BUTTON_DOWN   EQU 3
BUTTON_A      EQU 4
BUTTON_B      EQU 5
BUTTON_SELECT EQU 6
BUTTON_START  EQU 7
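The complement, mask, swap, and combine dance is easier to check away from the hardware, so here is a small Python model of it. The function names pack_buttons and is_held are mine, and the raw values stand in for whatever a read of P1 would return with each group selected.

```python
def pack_buttons(dpad_raw, buttons_raw):
    """Mimic the polling code: two raw P1 reads in, one byte out.

    Each raw value is what a read of P1 returns with that group selected;
    a 0 in the low nybble means "pressed".
    """
    dpad = ~dpad_raw & 0x0F        # cpl, then and a, $0f
    buttons = ~buttons_raw & 0x0F  # same again for the second group
    return (buttons << 4) | dpad   # swap a, then or a, b


def is_held(state, bit):
    """Equivalent of the bit instruction: test one bit by number."""
    return bool(state & (1 << bit))


# Right (bit 0 of the dpad group) and the first button of the other
# group (A) held, everything else released. The upper bits of a raw
# read are junk here, so set them to prove that they get masked off.
state = pack_buttons(0b1110_1110, 0b1101_1110)
print(f"{state:08b}")
```

With this layout, is_held(state, 0) is right and is_held(state, 4) is A, matching the first constant in each group.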

Now to adjust the sprite position based on what directions are held down. Delete the old code and replace it with:

    ; Set b/c to the y/x coordinates
    ld hl, oam_buffer
    ld b, [hl]
    inc hl
    ld c, [hl]

    ; This sets the z flag to match a particular bit in a
    bit BUTTON_LEFT, a
    ; If z, the bit is zero, so left isn't held down
    jr z, .skip_left
    ; Otherwise, left is held down, so decrement x
    dec c
.skip_left:

    ; The other three directions work the same way
    bit BUTTON_RIGHT, a
    jr z, .skip_right
    inc c
.skip_right:
    bit BUTTON_UP, a
    jr z, .skip_up
    dec b
.skip_up:
    bit BUTTON_DOWN, a
    jr z, .skip_down
    inc b
.skip_down:

    ; Finally, write the new coordinates back to the OAM
    ; buffer, which hl is still pointing into
    ld [hl], c
    dec hl
    ld [hl], b

Miraculously, Anise’s torso now moves around on command!

The half-cat is now moving according to button presses

Neat! But this still looks really, really, incredibly bad.

Aesthetics

It’s time to do something about this artwork.

First things first: I’m really tired of writing out colors by hand, in binary, so let’s fix that. In reality, I did this bit after adding better art, but doing it first is better for everyone.

I think I’ve mentioned before that rgbasm has (very, very rudimentary) support for macros, and this seems like a perfect use case for one. I’d like to be able to write colors out in typical rrggbb hex fashion, so I need to convert a 24-bit color to a 16-bit one.

dcolor: MACRO  ; $rrggbb -> gbc representation
_r = ((\1) & $ff0000) >> 16 >> 3
_g = ((\1) & $00ff00) >> 8  >> 3
_b = ((\1) & $0000ff) >> 0  >> 3
    dw (_r << 0) | (_g << 5) | (_b << 10)
    ENDM

This is going to need a whole paragraph of caveats.

A macro is contained between MACRO and ENDM. The assembler has a curious sort of universal assignment syntax, where even ephemeral constructs like macros are introduced by labels. Macros can take arguments, but they aren’t declared; they’re passed more like arguments to shell scripts, where the first argument is \1 and so forth. (There’s even a SHIFT command for accessing arguments beyond the ninth.) Also, passing strings to a macro is some kind of byzantine nightmare where you have to slap backslashes in just the right places and I will probably avoid doing it altogether if I can at all help it.

Oh, one other caveat: compile-time assignments like I have above must start in the first column. I believe this is because assignments are also labels, and labels have to start in the first column. It’s a bit weird and apparently rgbasm’s lexer is horrifying, but I’ll take it over writing my own assembler and stretching this project out any further.

Anyway, all of that lets me write dcolor $ff0044 somewhere and have it translated at compile time to the appropriate 16-bit value. (I used dcolor to parallel db and friends, but I’m strongly considering using CamelCase exclusively for macros? Guess it depends how heavily I use them.)
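For checking the macro's output without assembling anything, the same arithmetic ports directly to Python. This is just a sketch: dw_bytes is a hypothetical helper I added to show the byte order that dw emits (low byte first), and note that the shifts truncate each channel rather than rounding it.

```python
def dcolor(rrggbb):
    """Same arithmetic as the dcolor macro: keep the top 5 bits per channel."""
    r = ((rrggbb & 0xFF0000) >> 16) >> 3
    g = ((rrggbb & 0x00FF00) >> 8) >> 3
    b = ((rrggbb & 0x0000FF) >> 0) >> 3
    return (r << 0) | (g << 5) | (b << 10)


def dw_bytes(word):
    """The two bytes dw emits for a 16-bit value: low byte first."""
    return bytes([word & 0xFF, word >> 8])


word = dcolor(0xFF0044)
print(f"${word:04x}", dw_bytes(word).hex())
```

Those two bytes are in the same order that the palette data register expects them, which is why the colors can be copied out of ROM one byte at a time.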

With that on hand, I can now doodle some little sprites in Aseprite and copy them in. This part is not especially interesting and involves a lot of squinting at zoomed-in sprites.

SECTION "Sprites", ROM0
PALETTE_BG0:
    dcolor $80c870  ; light green
    dcolor $48b038  ; darker green
    dcolor $000000  ; unused
    dcolor $000000  ; unused
PALETTE_ANISE:
    dcolor $000000  ; TODO
    dcolor $204048
    dcolor $20b0b0
    dcolor $f8f8f8
GRASS_SPRITE:
    dw `00000000
    dw `00000000
    dw `01000100
    dw `01010100
    dw `00010000
    dw `00000000
    dw `00000000
    dw `00000000
EMPTY_SPRITE:
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
    dw `00000000
ANISE_SPRITE:
    ; ... I'll revisit this momentarily

Gorgeous. You may notice that I put the colors as data instead of inlining them in code, which incidentally makes the code for setting the palette vastly shorter as well:

    ; Start setting the first color, and advance the internal
    ; pointer on every write
    ld a, %10000000
    ; BCPS = Background Color Palette Specification
    ldh [rBCPS], a

    ld hl, PALETTE_BG0
    REPT 8
    ld a, [hl+]
    ; Same, but Data
    ld [rBCPD], a
    ENDR

Loading sprites into VRAM also becomes a bit less of a mess:

    ; Load some basic tiles
    ld hl, $8000

    ; Read the 16-byte empty sprite into tile 0
    ld bc, EMPTY_SPRITE
    REPT 16
    ld a, [bc]
    inc bc
    ld [hl+], a
    ENDR

    ; Read the grass sprite into tile 1, which immediately
    ; follows tile 0, so hl is already in the right place
    ld bc, GRASS_SPRITE
    REPT 16
    ld a, [bc]
    inc bc
    ld [hl+], a
    ENDR

Someday I should write an actual copy function, since at the moment, I’m using an alarming amount of space for pointlessly unrolled loops. Maybe later.

You may notice I now have two tiles, whereas before I was relying on filling the entire screen with one tile, tile 0. I want to dot the landscape with tile 1, which means writing a bit more to the actual background grid, which begins at $9800 and has one byte per tile.

    ; Fill the screen buffer with a pattern of grass tiles,
    ; where every 2x2 block has a single grass at the top left.
    ; Note that the buffer is 32x32 tiles, and it ends at $9c00
    ld hl, $9800
.screen_fill_loop:
    ; Use tile 1 for every other tile in this row.  Note that
    ; REPTed part increments hl /twice/, thus skipping a tile
    ld a, $01
    REPT 16
    ld [hl+], a
    inc hl
    ENDR
    ; Skip an entire row of 32 tiles, which will remain empty.
    ; There is almost certainly a better way to do this, but I
    ; didn't do it.  (Hint: it's ld bc, $20; add hl, bc)
    REPT 32
    inc hl
    ENDR
    ; If we haven't reached $9c00 yet, continue looping
    ld a, h
    cp a, $9C
    jr c, .screen_fill_loop
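If the pointer arithmetic in that loop is hard to picture, here is the same walk replayed in Python against a pretend tile map. It's only a model: fill_background and the dictionary standing in for VRAM are inventions for this sketch.

```python
def fill_background():
    """Replay the fill loop against a blank 32x32 tile map at $9800."""
    vram = {}  # address -> tile number; anything missing is tile 0
    hl = 0x9800
    while True:
        # REPT 16: write tile 1, then step over the next tile
        for _ in range(16):
            vram[hl] = 1
            hl += 2
        # REPT 32: skip a whole row
        hl += 32
        # cp a, $9C / jr c: loop while the high byte is below $9c
        if (hl >> 8) >= 0x9C:
            break
    return [[vram.get(0x9800 + row * 32 + col, 0) for col in range(32)]
            for row in range(32)]


grid = fill_background()
for row in grid[:4]:
    print("".join(".#"[tile] for tile in row))
```

Each pass through the loop covers 64 bytes, so 16 passes land exactly on $9c00 and the loop stops without writing past the end of the map.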

Sorry for all these big blocks of code, but check out this payoff!

A very simple grassy background

POW! Gorgeous.

And hey, why stop there? With a little more pixel arting against a very reduced palette…

SPRITE_ANISE_FRONT_1:
    dw `00000111
    dw `00001222
    dw `00012222
    dw `00121222
    dw `00121122
    dw `00121111
    dw `00121122
    dw `00121312
    dw `00121313
    dw `00012132
    dw `00001211
    dw `00000123
    dw `00100123
    dw `00011133
    dw `00000131
    dw `00000010
SPRITE_ANISE_FRONT_2:
    dw `11100000
    dw `22210000
    dw `22221000
    dw `22212100
    dw `22112100
    dw `11112100
    dw `22112100
    dw `21312100
    dw `31312100
    dw `23121000
    dw `11210000
    dw `32100000
    dw `32100000
    dw `33100000
    dw `13100000
    dw `01000000

Yes, I am having trouble deciding on a naming convention.

This is now a 16×16 sprite, made out of two 8×16 parts. This post has enough code blocks as it is, and the changes to make this work are relatively minor copy/paste work, so the quick version is:

  1. Set the LCDC flag (bit 2, or LCDCF_OBJ16) that makes objects be 8×16. This mode uses pairs of tiles, so an object that uses either tile 0 or 1 will draw both of them, with tile 0 on top of tile 1.
  2. Extend the code that loads object tiles to load four instead.
  3. Define a second sprite that’s 8 pixels to the right of the first one.
  4. Remove the hard-coded object palette, and instead load the PALETTE_ANISE that I sneakily included above. This time the registers are called rOCPS and rOCPD.

Finally, extend the code that moves the sprite to also move the second half:

    ; Finally, write the new coordinates back to the OAM
    ; buffer, which hl is still pointing into
    ld [hl], c
    dec hl
    ld [hl], b
    ; This bit is new: copy the x-coord into a so I can add 8
    ; to it, then store both coords into the second sprite's
    ; OAM data
    ld a, c
    add a, 8
    ; I could've written this the other way around, but I did
    ; not, I guess because this structure mirrors the above?
    ld hl, oam_buffer + 5
    ld [hl], a
    dec hl
    ld [hl], b

Cross my fingers, and…

A little cat sprite atop the grassy background

Hey hey hey! That finally looks like something!

To be continued

It was a surprisingly long journey, but this brings us more or less up to commit 313a3e, which happens to be the first commit I made a release of! It’s been more than a week, so you can grab it on Patreon or GitHub. I strongly recommend playing it with a release of mGBA prior to 0.7, for… reasons that will become clear next time.

Next time: I’ll take a breather and clean up a few things.

Cheezball Rising: Drawing a sprite

Post Syndicated from Eevee original https://eev.ee/blog/2018/06/21/cheezball-rising-drawing-a-sprite/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

source code • prebuilt ROMs (a week early for $4) • works best with mGBA

In this issue, I figure out how to draw a sprite. This part was hard.

Previously: figuring out how to put literally anything on the goddamn screen.

Recap

Welcome back! I’ve started cobbling together a Pygments lexer for RGBDS’s assembly flavor, so hopefully the code blocks are more readable, and will become more so over time.

When I left off last time, I had… um… this.

Vertical stripes of red, green, blue, and white

This is all on the background layer, which I mentioned before is a fixed grid of 8×8 tiles.

For anything that moves around freely, like the player, I need to use the object layer. So that’s an obvious place to go next.

Now, if you remember, I can define tiles by just writing to video RAM, and I define palettes with a goofy system involving writing them one byte at a time to the same magic address. You might expect defining objects to do some third completely different thing, and you’d be right!

Defining an object

Objects are defined in their own little chunk of RAM called OAM, for object attribute memory. They’re also made up of tiles, but each tile can be positioned at an arbitrary point on the screen.

OAM starts at $fe00 and each object takes four bytes (the y-coordinate, the x-coordinate, the tile number, and some flags), so the 40 objects come to a total of 160 bytes. There are some curiosities, like how the top left of the screen is (8, 16) rather than (0, 0), but I’ll figure out what’s up with that later. (I suppose if zeroes meant the upper left corner, there’d be a whole stack of tile 0 there all the time.)

Here’s the fun part: I can’t write directly to OAM? I guess??? Come to think of it, I don’t think the manual explicitly says I can’t, but it’s strongly implied. Hmm. I’ll look into that. But I didn’t at the time, so I’ll continue under the assumption that the following nonsense is necessary.

Because I “can’t” write directly, I need to use some shenanigans. First, I need something to write! This is an Anise game, so let’s go for Anise.

I’m on my laptop at this point without access to the source code for the LÖVE Anise game I started, so I have to rustle up a screenshot I took.

Cropped screenshot of Star Anise and some critters, all pixel art

Wait a second.

Even on the Game Boy Color, tiles are defined with two bits per pixel. That means an 8×8 tile has a maximum of four colors. For objects, the first color is transparent, so I really have three colors — which is exactly why most Game Boy Color protagonists have a main color, an outline/shadow color, and a highlight color.

Let’s check out that Anise in more detail.

Star Anise at 8×

Hm yes okay that’s more than three colors. I guess I’m going to need to draw some new sprites from scratch, somehow.

In the meantime, I optimistically notice that Star Anise’s body only uses three colors, and it’s 8×7! I could make a tile out of that! I painstakingly copy the pixels into a block of those backticks, which you can kinda see is his body if you squint a bit:

SECTION "Sprites", ROM0
ANISE_SPRITE:
    dw `00000000
    dw `00001333
    dw `00001323
    dw `10001233
    dw `01001333
    dw `00113332
    dw `00003002
    dw `00003002

The dw notation isn’t an opcode; it tells the assembler to put two literal bytes of data in the final ROM. A word of data. (Each row of a tile is two bytes, remember.)
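To make "two bytes per row" concrete, here is a Python sketch of how eight two-bit pixels turn into those two bytes, as I understand the Game Boy's tile format: one byte gathers the low bit of every pixel and the next gathers the high bit. The name encode_row is made up for this example.

```python
def encode_row(pixels):
    """Turn eight pixel digits (each 0-3) into the two bytes of a tile row.

    The first byte collects the low bit of every pixel and the second
    collects the high bit, with the leftmost pixel in bit 7.
    """
    assert len(pixels) == 8
    low = high = 0
    for digit in pixels:
        value = int(digit)
        low = (low << 1) | (value & 1)
        high = (high << 1) | (value >> 1)
    return bytes([low, high])


# Second row of the sprite above
print(encode_row("00001333").hex())
```

Eight rows of two bytes each is where the 16 bytes per tile comes from.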

If you think about this too hard, you start to realize that both the data and code are just bytes, everything is arbitrary, and true meaning is found only in the way we perceive things rather than in the things themselves.

Note I didn’t specify an exact address for this section, so the linker will figure out somewhere to put it and make sure all the labels are right at the end.

Now I load this into tilespace, back in my main code:

    ; Define an object
    ld hl, $8800
    ld bc, ANISE_SPRITE
    REPT 16
    ld a, [bc]
    ld [hl+], a
    inc bc
    ENDR

This copies 16 bytes, starting from the ANISE_SPRITE label, to $8800.


Why $8800, not $8000? I’m so glad you asked!

There are actually three blocks of tile space, each with enough room for 128 tiles: one at $8000, one at $8800, and one at $9000. Object tiles always use the $8000 block followed by the $8800 block, whereas background tiles can use either $8000 + $8800 or $9000 + $8800. By default, background tiles use $8000 + $8800.

All of which is to say that I got very confused reading the manual (which spends like five pages explaining the above paragraph) and put the object tiles in the wrong place. Whoops. It’s fine; this just ends up being tile 128.

In my partial defense, looking at it now, I see the manual is wrong! Bit 4 of the LCD controller register ($ff40) controls whether the background uses tiles from $8000 + $8800 (1) or $9000 + $8800 (0). The manual says that this register defaults to $83, which has bit 4 off, suggesting that background tiles use $9000 + $8800 (i.e. start at $8800), but disassembly of the boot ROM shows that it actually defaults to $91, which has bit 4 on. Thanks a lot, Nintendo!

That was quite a diversion. Here’s a chart of where the dang tiles live. Note that the block at $8800 is always shared between objects and background tiles. Oh, and on the Game Boy Color, all three blocks are twice as big thanks to the magic of banking. I’ll get to banking… much later.

                            bit 4 ON (default)  bit 4 OFF
                            ------------------  ---------
$8000   obj tiles 0-127     bg tiles 0-127
$8800   obj tiles 128-255   bg tiles 128-255    bg tiles 128-255
$9000                                           bg tiles 0-127
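The chart boils down to a little address arithmetic, sketched here in Python. The function names are hypothetical, and lcdc_bit4 is just a boolean standing in for bit 4 of the LCD controller register.

```python
TILE_SIZE = 16  # bytes per 8x8 tile


def obj_tile_address(index):
    """Objects always count up from $8000."""
    return 0x8000 + index * TILE_SIZE


def bg_tile_address(index, lcdc_bit4):
    """Background tiles depend on bit 4 of the LCD controller register."""
    if lcdc_bit4:
        return 0x8000 + index * TILE_SIZE
    # Bit 4 off: tiles 0-127 live at $9000, and 128-255 wrap around to
    # $8800, which is the same as treating the index as a signed byte.
    signed = index - 256 if index >= 128 else index
    return 0x9000 + signed * TILE_SIZE


print(hex(obj_tile_address(128)))        # the tile I loaded at $8800
print(hex(bg_tile_address(0, False)))
print(hex(bg_tile_address(128, False)))
```

Either way, indexes 128 through 255 land in the shared block at $8800.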

Hokay. What else? I’m going to need a palette for this, and I don’t want to use that gaudy background palette. Actually, I can’t — the background and object layers have two completely separate sets of palettes.

Writing an object palette is exactly the same as writing a background palette, except with different registers.

    ; This should look pretty familiar
    ld a, %10000000
    ld [$ff6a], a

    ld bc, %0000000000000000  ; transparent
    ld a, c
    ld [$ff6b], a
    ld a, b
    ld [$ff6b], a
    ld bc, %0010110100100101  ; dark
    ld a, c
    ld [$ff6b], a
    ld a, b
    ld [$ff6b], a
    ld bc, %0100000111001101  ; med
    ld a, c
    ld [$ff6b], a
    ld a, b
    ld [$ff6b], a
    ld bc, %0100001000010001  ; white
    ld a, c
    ld [$ff6b], a
    ld a, b
    ld [$ff6b], a

Riveting!

I wrote out those colors by hand. The original dark color, for example, was #264a59. That uses eight bits per channel, but the Game Boy Color only supports five (a factor of 8 difference), so first I rounded each channel to the nearest 8 and got #284858. Swap the channels to get 58 48 28 and convert to binary (sans the trailing zeroes) to get 01011 01001 00101.
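Here is that by-hand recipe as a few lines of Python, mostly so the arithmetic can be double-checked. The helper names are invented, and the cap at $f8 is my own addition so that very bright channels don't round up past five bits.

```python
def round_channel(value):
    """Round an 8-bit channel to the nearest multiple of 8, capped at $f8."""
    return min((value + 4) // 8 * 8, 0xF8)


def hand_convert(r, g, b):
    """Follow the by-hand recipe and return the 15-bit color."""
    r5, g5, b5 = (round_channel(c) >> 3 for c in (r, g, b))
    # Blue ends up in the top bits, then green, then red
    return (b5 << 10) | (g5 << 5) | r5


rounded = [round_channel(c) for c in (0x26, 0x4A, 0x59)]
print([f"{c:02x}" for c in rounded])
print(f"{hand_convert(0x26, 0x4A, 0x59):015b}")
```

The second line of output is the same 15 bits as the dark entry in the palette code, just without the unused top bit.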

Note to self: probably write a macro or whatever so I can define colors like a goddamn human being. Also why am I not putting the colors in a ROM section too?

Almost there. I still need to write out those four bytes that specify the tile and where it goes. I can’t actually write them to OAM yet, so I need some scratch space in working RAM.

SECTION "OAM Buffer", WRAM0[$C100]
oam_buffer:
    ds 4 * 40

The ds notation is another “data” variant, except it can take a size and reserves space for a whole string of data. Note that I didn’t put any actual data here — this section is in RAM, which only exists while the game is running, so there’d be nowhere to put data.

Also note that I gave an explicit address this time. The buffer has to start at an address ending in 00, for reasons that will become clear momentarily. The space from $c000 to $dfff is available as working RAM, and I chose $c100 for… reasons that will also become clear momentarily.

Now to write four bytes to it at runtime:

    ; Put an object on the screen
    ld hl, oam_buffer
    ; y-coord
    ld a, 64
    ld [hl+], a
    ; x-coord
    ld [hl+], a
    ; tile index
    ld a, 128
    ld [hl+], a
    ; attributes, including palette, which are all zero
    ld a, %00000000
    ld [hl+], a

(I tried writing directly to OAM on my first attempt. Nothing happened! Very exciting.)

But how to get this into OAM so it’ll actually show on-screen? For that, I need to do a DMA transfer.

DMA

DMA, or direct memory access, is one of those things the Game Boy programming manual seems to think everyone is already familiar with. It refers generally to features that allow some other hardware to access memory, without going through the CPU. In the case of the Game Boy, it’s used to copy data from working RAM to OAM. Only to OAM. It’s very specific.

Performing a DMA transfer is super easy! I write the high byte of the source address to the DMA register ($ff46), and then some magic happens, and 160 bytes from the source address appear in OAM. In other words:

    ld a, $c1       ; copy from $c100
    ld [$ff46], a   ; perform DMA transfer
    ; now $c100 through $c19f have been copied into OAM!
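As a mental model of what the hardware does with that one byte, here is a tiny Python sketch. It treats memory as a flat 64 KiB bytearray and ignores timing entirely; dma_transfer is a name I made up.

```python
OAM_START = 0xFE00
OAM_SIZE = 160  # 40 objects, 4 bytes each


def dma_transfer(memory, high_byte):
    """Copy 160 bytes from (high_byte << 8) into OAM, as the hardware does.

    memory is a bytearray standing in for the whole 64 KiB address space.
    """
    source = high_byte << 8
    memory[OAM_START:OAM_START + OAM_SIZE] = memory[source:source + OAM_SIZE]


memory = bytearray(0x10000)
# One object in the buffer at $c100: y, x, tile, attributes
memory[0xC100:0xC104] = bytes([64, 64, 128, 0])
dma_transfer(memory, 0xC1)
print(list(memory[OAM_START:OAM_START + 4]))
```

Because only the high byte is written, the low byte of the source is always 00, which is why the buffer has to start at an address ending in 00.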

It’s almost too good to be true! And it is. There are some wrinkles.

First, the transfer takes some time, during which I almost certainly don’t want to be doing anything else.

Second, during the transfer, the CPU can only read from “high RAM” — $ff80 and higher. Wait, uh oh.

The usual workaround here is to copy a very short function into high RAM to perform the actual transfer and wait for it to finish, then call that instead of starting a transfer directly. Well, that sounds like a pain, so I break my rule of accounting for every byte and find someone else who’s done it. Conveniently enough, that post is by the author of the small template project I’ve been glancing at.

I end up with something like the following.

    ; Copy the little DMA routine into high RAM
    ld bc, DMA_BYTECODE
    ld hl, $ff80
    ; DMA routine is 13 bytes long
    REPT 13
    ld a, [bc]
    inc bc
    ld [hl+], a
    ENDR

; ...

SECTION "DMA Bytecode", ROM0
DMA_BYTECODE:
    db $F5, $3E, $C1, $EA, $46, $FF, $3E, $28, $3D, $20, $FD, $F1, $D9

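That’s compiled assembly, written inline as bytes. Oh boy. Apart from a push af/pop af wrapped around it and a reti instead of a ret at the end (which is how it comes to 13 bytes), the original code looks like: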

    ; start the transfer, as shown above
    ld a, $c1
    ld [$ff46], a

    ; wait 160 cycles/microseconds, the time it takes for the
    ; transfer to finish; this works because 'dec' is 1 cycle
    ; and 'jr' is 3, for 4 cycles done 40 times
    ld      a, 40
loop:
    dec     a
    jr      nz, loop

    ; return
    ret

Now you can see why I used $c100 for my OAM buffer: because it’s the address this person used.

(Hm, the opcode reference I usually use seems to have all the timings multiplied by a factor of 4 without comment? Odd. The rgbds reference is correct.)

(Also, here’s a fun fact: the stack starts at $fffe and grows backwards. If it grows too big, the very first thing it’ll overwrite is this DMA routine! I bet that’ll have some fun effects.)

At this point I have a thought. (Okay, I had the thought a bit later, but it works better narratively if I have it now.) I’ve already demonstrated that the line between code and data is a bit fuzzy here. So why does this code need to be pre-assembled?

And a similar thought: why is the length hardcoded? Surely, we can do a little better. What if we shuffle things around a bit…

SECTION "init", ROM0[$0100]
    nop
    ; Jump to a named label instead of an address
    jp main

SECTION "main", ROM0[$0150]
; DMA copy routine, copied into high RAM at startup.
; Never actually called where it is.
dma_copy:
    ld a, $c1
    ld [$ff46], a
    ld a, 40
.loop:
    dec a
    jr nz, .loop
    ret
dma_copy_end:
    nop

main:
    ; ... all previous code is here now ...

    ; Copy the little DMA routine into high RAM
    ld bc, dma_copy
    ld hl, $ff80
    ; The length is computed from the two labels at assembly time
    REPT dma_copy_end - dma_copy
    ld a, [bc]
    inc bc
    ld [hl+], a
    ENDR

This is very similar to what I just had, except that the code is left as code, and its length is computed by having another label at the end — so I’m free to edit it later if I want to. It all ends up as bytes in the ROM, so the code ends up exactly the same as writing out the bytes with db. Come to think of it, I don’t even need to hardcode the $c1 there; I could replace it with oam_buffer >> 8 and avoid repeating myself.

(I put the code at $0150 because rgbasm is very picky about subtracting labels, and will only do it if they both have fixed positions. These two labels would be the same distance apart no matter where I put the section, but I guess rgbasm isn’t smart enough to realize that.)

I’m actually surprised that the author of the above post didn’t think to do this? Maybe it’s dirty even by assembly standards.

Timing, vblank, and some cool trickery

Okay, so, as I was writing that last section, I got really curious about whether and when I’m actually allowed to write to OAM. Or tile RAM, for that matter.

I consulted the Game Boy dev wiki, and the rules match what’s in the manual, albeit with a chart that makes things a little more clear.

My understanding is as follows. The LCD draws the screen one row of pixels at a time, and each row has the following steps:

  1. Look through OAM to see if any sprites are on this row. OAM is inaccessible to the CPU.

  2. Draw the row. OAM, VRAM, and palettes are all inaccessible.

  3. Finish the row and continue on to the beginning of the next row. This takes a nonzero amount of time, called the horizontal blanking period, during which the CPU can access everything freely.

Once the LCD reaches the bottom, it continues to “draw” a number of faux rows below the bottom of the visible screen (vertical blanking), and the CPU can again do whatever it wants. Eventually it returns to the top-left corner to draw again, concluding a single frame. The entire process happens 59.7 times per second.
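
As a back-of-the-envelope check on that 59.7 figure, here is the arithmetic in Python (used purely as a calculator; nothing here runs on the Game Boy). The constants are assumptions, not something stated above: they are the commonly cited normal-speed numbers of a 4,194,304 Hz clock, 456 clock ticks per row, 144 visible rows, and 10 vblank rows.

```python
# Assumed hardware constants (commonly cited; not stated in the post itself)
CLOCK_HZ = 4_194_304       # clock ticks per second at normal speed
TICKS_PER_ROW = 456        # OAM search + drawing + hblank for one row
VISIBLE_ROWS = 144
VBLANK_ROWS = 10           # the "faux rows" below the visible screen


def ticks_per_frame():
    """Clock ticks the LCD spends on one complete frame."""
    return TICKS_PER_ROW * (VISIBLE_ROWS + VBLANK_ROWS)


def frames_per_second():
    return CLOCK_HZ / ticks_per_frame()


def vblank_ticks():
    """Clock ticks available during vertical blanking."""
    return TICKS_PER_ROW * VBLANK_ROWS


print(ticks_per_frame())              # 70224
print(round(frames_per_second(), 2))  # 59.73
print(vblank_ticks())                 # 4560
```

If those constants are right, vblank is only about 6% of each frame, which is a fairly small window for anything that has to touch OAM or VRAM.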

There’s one exception: DMA transfers can happen any time, but the LCD will simply not draw sprites during the transfer.

So I probably shouldn’t be writing to tiles and palettes willy-nilly. I suspect I got away with it because it happened in that first OAM-searching stage… and/or because I did it on emulators which are a bit more flexible than the original hardware.

In fact…

Same screenshot as above, but the first row of pixels is corrupt

I took this screenshot by loading the ROM I have so far, pausing it, resetting it, and then advancing a single frame. This is the very first frame my game shows. If you look closely at the first row of pixels, you can see they’re actually corrupt — they’re being drawn before I’ve set up the palette! You can even see each palette entry taking effect along the row.

This is very cool. It also means my current code would not work at all on actual hardware. I should probably just turn the screen off while I’m doing setup like this.

It’s interesting that only OAM gets a special workaround in the form of a DMA transfer — I imagine because sprites move around much more often than the tileset changes — but having the LCD stop drawing sprites in the meantime is quite a limitation. Surely, you’d only want to do a DMA transfer during vblank anyway? It is much faster than copying by hand, so I’ll still take it.

All of this is to say: I’m gonna need to care about vblanks.


Incidentally, the presence of hblank is very cool and can be used for a number of neat effects, especially when combined with the Game Boy’s ability to call back into user code when the LCD reaches a specific row:

  • The GBC Zelda games use it for map scrolling. The status bar at the top is in one of the two background maps, and as soon as that finishes drawing, the game switches to the other one, which contains the world.

  • Those same games also use it for a horizontal wavy effect, both when warping around and when underwater — all they need to do is change the background layer’s x offset during each hblank!

  • The wiki points out that OAM could be written to in the middle of a screen update, thus bypassing the 40-object restriction: draw 40 objects on the top half of the screen, swap out OAM midway, and then the LCD will draw a different 40 on the bottom half!

  • I imagine you could also change palettes midway through a redraw and exceed the usual limit of 56 colors on screen at a time! No telling whether this sort of trick would work on an emulator, though.

I am very excited at the prospects here.

I’m also slightly terrified. I have a fixed amount of time between frames, and with the LCD as separate hardware, there’s no such thing as a slow frame. If I don’t finish, things go bad. And that time is measured in instructions — an ld always takes the same number of cycles! There’s no faster computer or reducing GC pressure. There’s just me. Yikes.

Back to drawing a sprite

I haven’t had a single new screenshot this entire post! This is ridiculous. All I want is to draw a thing to the screen.

I have some data in my OAM buffer. I have DMA set up. All I should need to do now is start a transfer.

    call $ff80

And… nothing. mGBA’s memory viewer confirms everything’s in the right place, but nothing’s on the screen.

Whoops! Remember that LCD controller register, and how it defaults to $91? Well, bit 1 is whether to show objects at all, and it defaults to off. So let’s fix that.

    ld a, %10010011  ; $91 plus bit 1
    ld [$ff40], a
The same gaudy background, but now with a partial Anise sprite on top

SUCCESS!

It doesn’t look like much, but it took a lot of flailing to get here, and I was overjoyed when I first saw it. The rest should be a breeze! Right?
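
For anyone squinting at the binary in that LCD controller fix, here is the same bit arithmetic as a tiny Python sketch. The names `LCDC_DEFAULT`, `OBJ_ENABLE_BIT`, and `enable_objects` are made up for illustration; only the $91 default and the meaning of bit 1 come from the text above.

```python
LCDC_DEFAULT = 0x91      # the default value mentioned above
OBJ_ENABLE_BIT = 1       # bit 1: whether to show objects at all


def enable_objects(lcdc):
    """Return the register value with the object-enable bit switched on."""
    return lcdc | (1 << OBJ_ENABLE_BIT)


value = enable_objects(LCDC_DEFAULT)
print(f"{value:08b}")    # 10010011
print(hex(value))        # 0x93
```

Using an OR rather than writing a fresh constant leaves every other bit of the register as it was.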

To be continued

That doesn’t even get us all the way through commit 1b17c7, but this is already more than enough.

Next time: input, and moderately less eye-searing art!

Cheezball Rising: A new Game Boy Color game

Post Syndicated from Eevee original https://eev.ee/blog/2018/06/19/cheezball-rising-a-new-game-boy-color-game/

This is a series about Star Anise Chronicles: Cheezball Rising, an expansive adventure game about my cat for the Game Boy Color. Follow along as I struggle to make something with this bleeding-edge console!

source code • prebuilt ROMs (a week early for $4) • works best with mGBA

In this issue, I figure out how to put literally anything on the goddamn screen, then add a splash of color.

The plan

I’m making a Game Boy Color game!

I have no— okay, not much idea what I’m doing, so I’m going to document my progress as I try to forge a 90s handheld game out of nothing.

I do usually try to keep tech stuff accessible, but this is going to get so arcane that that might be a fool’s errand. Think of this as less of an extended tutorial, more of a long-form Twitter.

Also, I’ll be posting regular builds on Patreon for $4 supporters, which will be available a week later for everyone else. I imagine they’ll generally stay in lockstep with the posts, unless I fall behind on the writing part. But when has that ever happened?

Your very own gamedev legend is about to unfold! A world of dreams and adventures with gbz80 assembly awaits! Let’s go!

Prerequisites

First things first. I have a teeny bit of experience with Game Boy hacking, so I know I need:

  • An emulator. I have no way to run arbitrary code on an actual Game Boy Color, after all. I like mGBA, which strives for accuracy and has some debug tools built in.

    There’s already a serious pitfall here: emulators are generally designed to run games that would work correctly on the actual hardware, but they won’t necessarily reject games that wouldn’t work on actual hardware. In other words, something that works in an emulator might still not work on a real GBC. I would of course prefer that this game work on the actual console it’s built for, but I’ll worry about that later.

  • An assembler, which can build Game Boy assembly code into a ROM. I pretty much wrote one of these myself already for the Pokémon shenanigans, but let’s go with something a little more robust here. I’m using RGBDS, which has a couple nice features like macros and a separate linking step. It compiles super easily, too.

    I also hunted down a vim syntax file, uh, somewhere. I can’t remember which one it was now, and it’s kind of glitchy anyway.

  • Some documentation. I don’t know exactly how this surfaced, but the actual official Game Boy programming manual is on archive.org. It glosses over some things and assumes some existing low-level knowledge, but for the most part it’s a very solid reference.

For everything else, there’s Google, and also the curated awesome-gbdev list of resources.

That list includes several skeleton projects for getting started, but I’m not going to use them. I want to be able to account for every byte of whatever I create. I will, however, refer to them if I get stuck early on. (Spoilers: I get stuck early on.)

And that’s it! The rest is up to me.

Making nothing from nothing

Might as well start with a Makefile. The rgbds root documentation leads me to the following incantation:

all:
        rgbasm -o main.o main.rgbasm
        rgblink -o gamegirl.gb main.o
        rgbfix -v -p 0 gamegirl.gb

(I, uh, named this project “gamegirl” before I figured out what it was going to be. It’s a sort of witticism, you see.)

This works basically like every C compiler under the sun, as you might expect: every source file compiles to an object file, then a linker bundles all the object files into a ROM. If I only change one source file, I only have to rebuild one object file.

Of course, this Makefile is terrible garbage and will rebuild the entire project unconditionally every time, but at the moment that takes a fraction of a second so I don’t care.

The extra rgbfix step is new, though — it adds the Nintendo logo (the one you see when you start up a Game Boy) to the header at the beginning of the ROM. Without this, the console will assume the cartridge is dirty or missing or otherwise unreadable, and will refuse to do anything at all. (I could also bake the logo into the source itself, but given that it’s just a fixed block of bytes and rgbfix is bundled with the assembler, I see no reason to bother with that.)

All I need now is a source file, main.rgbasm, which I populate with:

Nothing! I don’t know what I expect from this, but I’m curious to see what comes out. And what comes out is a working ROM!

A completely blank screen

Maybe “working” is a strong choice of word, given that it doesn’t actually do anything.

Doing something

It would be fantastic to put something on the screen. This turned out to be harder than expected.

First attempt. I know that the Game Boy starts running code at $0150, immediately after the end of the header. So I’ll put some code there.

A brief Game Boy graphics primer: there are two layers, the background and objects. (There’s also a third layer, the window, which I don’t entirely understand yet.) The background is a grid of 8×8 tiles, two bits per pixel, for a total of four shades of gray. Objects can move around freely, but they lose color 0 to transparency, so they can only use three colors.

There are lots more interesting details and restrictions, which I will think about more later.

Drawing objects is complicated, and all I want to do right now is get something. I’m pretty sure the background defaults to showing all tile 0, so I’ll try replacing tile 0 with a gradient and see what happens.

Tiles are 8×8 and two bits per pixel, which means each row takes two bytes, and the whole tile is 16 bytes. Tiles are defined in one big contiguous block starting at $8000 — or, maybe $8800, sometimes — so all I need to do is:

SECTION "main", ROM0[$0150]
    ld hl, $8000
    ld a, %00011011
    REPT 16
    ld [hl+], a
    ENDR

_halt:
    ; Do nothing, forever
    halt
    nop
    jr _halt

If you are not familiar with assembly, this series is going to be a wild ride. But here’s a very very brief primer.

Assembly language — really, an assembly language — is little more than a set of human-readable names for the primitive operations a CPU knows how to do. And those operations, by and large, consist of moving bytes around. The names tend to be very short, because you end up typing them a lot.

Most of the work is done in registers, which are a handful of spaces for storing bytes right on the CPU. At this level, RAM is relatively slow — it’s further away, outside the chip — so you want to do as much work as possible in registers. Indeed, most operations can only be done on registers, so there’s a lot of fetching stuff from RAM and operating on it and then putting it back in RAM.

The Game Boy CPU, a modified Z80, has eight byte-sized registers. They’re often referred to in pairs, because they can be paired up to make 16-bit values (giving you access to a full 64KB address space). And they are: af, bc, de, hl.

The af pair is special. The f register is used for flags, such as whether the last instruction caused an overflow, so it’s not generally touched directly. The a register is called the accumulator and is most commonly used for math operations — in fact, a lot of math operations can only be done on a. The hl register is most often used for addresses, and there are a couple instructions specific to hl that are convenient for memory access. (The h and l even refer to the high and low byte of an address.) The other two pairs aren’t especially noteworthy.

Also! Not every address is actually RAM; the address space ($0000 through $ffff) is carved into several distinct areas, which we will see as I go along. $8000 is the beginning of display RAM, which the screen reads from asynchronously. Also, a lot of addresses above $ff00 (also called “registers”) are special and control hardware in some way, or even perform some action when written to.

With that in mind, here’s the above code with explanatory comments:

; This is a directive for the assembler to put the following
; code at $0150 in the final ROM.
SECTION "main", ROM0[$0150]
    ; Put the hex value $8000 into registers hl.  Really, that
    ; means put $80 into h and $00 into l.
    ld hl, $8000

    ; Put this binary value into register a.
    ; It's just 0 1 2 3, a color gradient.
    ld a, %00011011

    ; This is actually a macro this particular assembler
    ; understands, which will repeat the following code 16
    ; times, exactly as if I'd copy-pasted it.
    REPT 16

    ; The brackets (sometimes written as parens) mean to use hl
    ; as a position in RAM, rather than operating on hl itself.
    ; So this copies a into the position in RAM given by
    ; hl (initially $8000), and the + adds 1 to hl afterwards.
    ; This is one reason hl is nice for storing addresses: the +
    ; variant is handy for writing a sequence of bytes to RAM,
    ; and it only exists for hl.
    ld [hl+], a

    ; End the REPT block
    ENDR

; This is a label, used to refer to some position in the code.
; It only exists in the source file.
_halt:
    ; Stop all CPU activity until there's an interrupt.  I
    ; haven't turned any interrupts on, so this stops forever.
    halt

    ; The Game Boy hardware has a bug where, under certain
    ; conditions, the byte after a halt is read twice instead of
    ; once.  So every halt should be followed by a nop, "no
    ; operation", which is harmless to repeat.
    nop

    ; This jumps back up to the label.  It's short for "jump
    ; relative", and will end up as an instruction saying
    ; something like "jump backwards five bytes", or however far
    ; back _halt is.  (Different instructions can be different
    ; lengths.)
    jr _halt

Okay! Glad you’re all caught up. The rgbds documentation includes a list of all the available operations (as well as assembler syntax), and once you get used to the short names, I also like this very compact chart of all the instructions and how they compile to machine code. (Note that that chart spells [hl+] as (HLI), for “increment” — the human-readable names are somewhat arbitrary and can sometimes vary between assemblers.)

Now, let’s see what this does!


A completely blank screen, still

Wow! It’s… still nothing. Hang on.

If I open the debugger and hit Break, I find out that the CPU is at address $0120 — before my code — and is on an instruction DD. What’s DD? Well, according to this convenient chart, it’s… nothing. That’s not an instruction.

Hmm.

Problem solving

Maybe it’s time to look at one of those skeleton projects after all. I crack open the smallest one, gb-template, and it seems to be doing the same thing: its code starts at $0150.

It takes me a bit to realize my mistake here. Practically every Game Boy game starts its code at $0150, but that’s not what the actual hardware specifies. The real start point is $0100, which is immediately before the header! There are only four bytes before the header, just enough for… a jump instruction.

Okay! No problem.

SECTION "entry point", ROM0[$0100]
    nop
    jp $0150

Why the nop? I have no idea, but all of these boilerplate projects do it.
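
To see why a nop plus a jump fills the gap exactly, here is a rough Python sketch that hand-assembles those two instructions. The opcode bytes ($00 for nop, $C3 for an absolute jp with a little-endian address) are an assumption taken from the usual instruction charts rather than from the text above, and `assemble_entry` is a made-up helper name.

```python
ENTRY_POINT = 0x0100
HEADER_START = 0x0104   # "only four bytes before the header"


def assemble_entry(target):
    """Hand-assemble `nop` followed by `jp target`."""
    nop = bytes([0x00])                              # nop (assumed opcode)
    jp = bytes([0xC3, target & 0xFF, target >> 8])   # jp nn, low byte first
    return nop + jp


code = assemble_entry(0x0150)
print(code.hex(" "))                              # 00 c3 50 01
print(len(code) == HEADER_START - ENTRY_POINT)    # True
```

One byte for the nop and three for the jump come to exactly four, so there is no room for anything else before the header.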

Black screen with repeating columns of white

Uhh.

Well, that’s weird. Not only is the result black and white when I definitely used all four shades, but the whites aren’t even next to each other. (I also had a strange effect where the screen reverted to all white after a few seconds, but can’t reproduce it now; it was fixed by the same steps, though, so it may have been a quirk of a particular mGBA build.)

I’ll save you my head-scratching. I made two mistakes here. Arguably, three!

First: believe it or not, I have to specify the palette. Even in original uncolored Game Boy mode! I can see how that’s nice for doing simple fade effects or flashing colors, but I didn’t suspect it would be necessary. The monochrome palette lives at $ff47 (one of those special high addresses), so I do this before anything else:

    ld a, %11100100         ; 3 2 1 0
    ld [$ff47], a
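
Here is a small Python sketch of how that palette byte can be read. It assumes the layout is two bits per color index with index 0 in the lowest bits, which is what the "3 2 1 0" comment suggests when read left to right; `decode_bgp` is a made-up name.

```python
def decode_bgp(value):
    """Return the shade (0-3) assigned to color indexes 0, 1, 2, 3."""
    return [(value >> (2 * index)) & 0b11 for index in range(4)]


print(decode_bgp(0b11100100))  # [0, 1, 2, 3]
print(decode_bgp(0b00011011))  # [3, 2, 1, 0]
```

The second call shows what the reversed byte would do under the same assumption: every index maps to the opposite shade, which is the kind of trick that makes fades and flashes cheap.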

I should really give names to some of these special addresses, but for now I’m more interested in something that works than something that’s nice to read.

Second: I specified the colors wrong. I assumed that eight pixels would fit into two bytes as AaBbCcDd EeFfGgHh, perhaps with some rearrangement, but a closer look at Nintendo’s manual reveals that they need to be ABCDEFGH abcdefgh, with the two bits for each pixel split across each byte! Wild.

Handily, rgbds has syntax for writing out pixel values directly: a backtick followed by eight of 0, 1, 2, and 3. I just have to change my code a bit to write two bytes, eight times each. By putting a 16-bit value in a register pair like bc, I can read its high and low bytes out individually via the b and c registers.

    ld hl, $8000
    ld bc, `00112233
    REPT 8
    ld a, b
    ld [hl+], a
    ld a, c
    ld [hl+], a
    ENDR
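
The "ABCDEFGH abcdefgh" split is easier to believe with the bits written out. This Python sketch only demonstrates the bit shuffling for one row of eight pixels; it makes no claim about which byte the hardware wants first, and `split_row` is a made-up name.

```python
def split_row(pixels):
    """Split eight 2-bit pixel values into (high_bits, low_bits) bytes."""
    assert len(pixels) == 8 and all(0 <= p <= 3 for p in pixels)
    high = low = 0
    for p in pixels:
        high = (high << 1) | (p >> 1)   # A, B, C, ... from each pixel
        low = (low << 1) | (p & 1)      # a, b, c, ... from each pixel
    return high, low


high, low = split_row([0, 0, 1, 1, 2, 2, 3, 3])
print(f"{high:08b} {low:08b}")  # 00001111 00110011
```

So the row `00112233` becomes one byte holding every pixel's upper bit and another holding every pixel's lower bit, rather than four pixels packed per byte.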

Third: strictly speaking, I don’t think I should be writing to $8000 while the screen is on, because the screen may be trying to read from it at the same time. It does happen to work in this emulator, but I have no idea whether it would work on actual hardware. I’m not going to worry too much about this test code; most likely, tile loading will happen all in one place in the real game, and I can figure out any issues then.

This is one of those places where the manual is oddly vague. It dedicates two whole pages to diagrams of how sprites are drawn when they overlap, yet when I can write to display RAM is left implicit.

Well, whatever. It works on my machine.

Stripes of varying shades of gray

Success! I made a thing for the Game Boy.

Ah, but what I wanted was a thing for the Game Boy Color. That shouldn’t be too much harder.

Now in Technicolor

First I update my Makefile to pass the -C flag to rgbfix. That tells it to set a flag in the ROM header to indicate that this game is only intended for the Game Boy Color, and won’t work on the original Game Boy. (In order to pass Nintendo certification, I’ll need an error screen when the game is run on a non-Color Game Boy, but that can come later. Also, I don’t actually know how to do that.)

Oh, and I’ll change the file extension from .gb to .gbc. And while I’m in here, I might as well repeat myself slightly less in this bad, bad Makefile.

TARGET := gamegirl.gbc

all: $(TARGET)

$(TARGET):
        rgbasm -o main.o main.rgbasm
        rgblink -o $(TARGET) main.o
        rgbfix -C -v -p 0 $(TARGET)

I think := is the one I want, right? Christ, who can remember how this syntax works.

Next I need to define a palette. Again, everything defaults to palette zero, so I’ll update that and not have to worry about specifying a palette for every tile.

This part is a bit weird. Unlike tiles, there’s not a block of addresses somewhere that contains all the palettes. Instead, I have to write the palette to a single address one byte at a time, and the CPU will put it… um… somewhere.

(I think this is because the entire address space was already carved up for the original Game Boy, and they just didn’t have room to expose palettes, but they still had a few spare high addresses they could use for new registers.)

Two registers are involved here. The first, $ff68, specifies which palette I’m writing to. It has a bunch of parts, but since I’m writing to the first color of palette zero, I can leave it all zeroes. The one exception is the high bit, which I’ll explain in just a moment.

    ld a, %10000000
    ld [$ff68], a

The other, $ff69, does the actual writing. Each color in a palette is two bytes, and a palette contains four colors, so I need to write eight bytes to this same address. The high bit in $ff68 is helpful here: it means that every time I write to $ff69, it should increment its internal position by one. This is kind of like the [hl+] I used above: after every write, the address increases, so I can just write all the data in sequence.
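
A toy Python model may make the two-register dance clearer. This is a sketch under assumptions, not a description of the real hardware: it assumes 64 bytes of palette storage (eight palettes of four two-byte colors) and a six-bit index, and `PalettePort` with its methods is entirely made up.

```python
class PalettePort:
    """Toy model of the $ff68 (index) / $ff69 (data) register pair."""

    def __init__(self):
        self.memory = bytearray(64)   # assumed: 8 palettes x 4 colors x 2 bytes
        self.index = 0
        self.auto_increment = False

    def write_index(self, value):
        """Model a write to $ff68."""
        self.auto_increment = bool(value & 0b10000000)  # the high bit
        self.index = value & 0b00111111                 # assumed 6-bit position

    def write_data(self, value):
        """Model a write to $ff69."""
        self.memory[self.index] = value
        if self.auto_increment:
            self.index = (self.index + 1) & 0b00111111


port = PalettePort()
port.write_index(0b10000000)          # position 0, auto-increment on
for byte in (0x00, 0x7C, 0xE0, 0x03):
    port.write_data(byte)
print(port.memory[:4].hex(" "))       # 00 7c e0 03
print(port.index)                     # 4
```

Without the high bit set, every write would land on the same position and overwrite the previous byte.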

But first I need some colors! Game Boy Color colors are RGB555, which means each channel is five bits (0–31) and a full color fits in two bytes: 0bbbbbgg gggrrrrr.

(I got this backwards initially and thought the left bits were red and the right bits were blue.)

Thus, I present, palette loading by hand. Like before, I put the 16-bit color in bc and then write out the contents of b and c. (Before, the backtick syntax put the bytes in the right order; colors are little-endian, hence why I write c before b.)

    ld bc, %0111110000000000  ; blue
    ld a, c
    ld [$ff69], a
    ld a, b
    ld [$ff69], a
    ld bc, %0000001111100000  ; green
    ld a, c
    ld [$ff69], a
    ld a, b
    ld [$ff69], a
    ld bc, %0000000000011111  ; red
    ld a, c
    ld [$ff69], a
    ld a, b
    ld [$ff69], a
    ld bc, %0111111111111111  ; white
    ld a, c
    ld [$ff69], a
    ld a, b
    ld [$ff69], a
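
The same packing can be checked off the console. This Python sketch follows the 0bbbbbgg gggrrrrr layout and the low-byte-first write order described above; `pack_rgb555` and `palette_bytes` are made-up helper names, not part of any toolchain.

```python
def pack_rgb555(red, green, blue):
    """Pack three 5-bit channels as 0bbbbbgg gggrrrrr."""
    for channel in (red, green, blue):
        assert 0 <= channel <= 31
    return (blue << 10) | (green << 5) | red


def palette_bytes(colors):
    """Flatten (r, g, b) colors into the bytes to write, low byte first."""
    out = bytearray()
    for red, green, blue in colors:
        value = pack_rgb555(red, green, blue)
        out.append(value & 0xFF)   # what ends up in register c
        out.append(value >> 8)     # what ends up in register b
    return bytes(out)


# blue, green, red, white: the same four colors as the assembly above
colors = [(0, 0, 31), (0, 31, 0), (31, 0, 0), (31, 31, 31)]
print(palette_bytes(colors).hex(" "))  # 00 7c e0 03 1f 00 ff 7f
```

Those eight bytes are the sequence the assembly writes to $ff69, one at a time.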

Rebuild, and:

Same as before, but now the stripes are colored

What a glorious eyesore!

To be continued

That brings us up to commit 212344 and works as a good stopping point.

Next time: sprites! Maybe even some real art?

GAMES MADE QUICK??? 2.0

Post Syndicated from Eevee original https://eev.ee/release/2018/01/23/games-made-quick-2-0/

🔗 GAMES MADE QUICK??? 2.0 on itch

I realize, with all the cognitive speed and grace of a cat falling out of a chair, that I have my own website where I can announce things that I am doing.

Here is a thing that I am having done: it’s GAMES MADE QUICK??? 2.0, a game jam that runs concurrently with Games Done Quick. The inspiration was that I once spent the entire week of AGDQ doing nothing but watching the stream, which completely ruined my momentum and cost me the following week as well while I struggled to get back up to speed. What a catastrophe!

So my solution was to spend the week making a game instead, which prompted someone to suggest that I make a jam out of it, and so I did. The results were NEON PHASE and also the original GAMES MADE QUICK???.

It’s a bit late to join now, but look forward to the jam during SGDQ, which runs the last week of June! In the meantime, perhaps peruse the fruits of this season’s labor, or at least glance over my thoughts on some of them.

Physics cheats

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/06/physics-cheats/

Anonymous asks:

something about how we tweak physics to “work” better in games?

Ho ho! Work. Get it? Like in physics…?

Hitboxes

“Hitbox” is perhaps not the most accurate term, since the shape used for colliding with the environment and the shape used for detecting damage might be totally different. They’re usually the same in simple platformers, though, and that’s what most of my games have been.

The hitbox is the biggest physics fudge by far, and it exists because of a single massive approximation that (most) games make: you’re controlling a single entity in the abstract, not a physical body in great detail.

That is: when you walk with your real-world meat shell, you perform a complex dance of putting one foot in front of the other, a motion you spent years perfecting. When you walk in a video game, you press a single “walk” button. Your avatar may play an animation that moves its legs back and forth, but since you’re not actually controlling the legs independently (and since simulating them is way harder), the game just treats you like a simple shape. Fairly often, this is a box, or something very box-like.

An Eevee sprite standing on faux ground; the size of the underlying image and the hitbox are outlined

Since the player has no direct control over the exact placement of their limbs, it would be slightly frustrating to have them collide with the world. This is especially true in cases like the above, where the tail and left ear protrude significantly out from the main body. If that Eevee wanted to stand against a real-world wall, she would simply tilt her ear or tail out of the way, so there’s no reason for the ear to block her from standing against a game wall. To compensate for this, the ear and tail are left out of the collision box entirely and will simply jut into a wall if necessary — a goofy affordance that’s so common it doesn’t even register as unusual. As a bonus (assuming this same box is used for combat), she won’t take damage from projectiles that merely graze past an ear.

(One extra consideration for sprite games in particular: the hitbox ought to be horizontally symmetric around the sprite’s pivot — i.e. the point where the entity is truly considered to be standing — so that the hitbox doesn’t abruptly move when the entity turns around!)
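
The symmetry point can be sketched in a few lines of Python. This is a hypothetical illustration, not code from any of the games discussed: `Hitbox` stores its left and right edges as offsets from the pivot, and `flipped` is what happens when the sprite turns around.

```python
from dataclasses import dataclass


@dataclass
class Hitbox:
    left: float    # offset of the left edge from the pivot (negative = left of it)
    right: float   # offset of the right edge from the pivot

    def flipped(self):
        """The box after the sprite turns to face the other way."""
        return Hitbox(left=-self.right, right=-self.left)


symmetric = Hitbox(left=-6, right=6)
lopsided = Hitbox(left=-4, right=8)

print(symmetric.flipped() == symmetric)  # True
print(lopsided.flipped())                # Hitbox(left=-8, right=4)
```

The lopsided box jumps four units to the left on a turn, which is enough to shove it into a wall it was previously just touching.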

Corners

Treating the player (and indeed most objects) as a box has one annoying side effect: boxes have corners. Corners can catch on other corners, even by a single pixel. Real-world bodies tend to be a bit rounder and squishier and thus can tolerate grazing a corner; even real-world boxes will simply rotate a bit.

Ah, but in our faux physics world, we generally don’t want conscious actors (such as the player) to rotate, even with a realistic physics simulator! Real-world bodies are made of parts that will generally try to keep you upright, after all; you don’t tilt back and forth much.

One way to handle corners is to simply remove them from conscious actors. A hitbox doesn’t have to be a literal box, after all. A popular alternative — especially in Unity where it’s a standard asset — is the pill-shaped capsule, which has semicircles/hemispheres on the top and bottom and a cylindrical body in 3D. No corners, no problem.

Of course, that introduces a new problem: now the player can’t balance precariously on edges without their rounded bottom sliding them off. Alas.

If you’re stuck with corners, then, you may want to use a corner bump, a term I just made up. If the player would collide with a corner, but the collision is only by a few pixels, just nudge them to the side a bit and carry on.

An Eevee sprite trying to move sideways into a shallow ledge; the game bumps her upwards slightly, so she steps onto it instead

When the corner is horizontal, this creates stairs! This is, more or less kinda, how steps work in Doom: when the player tries to cross from one sector into another, if the height difference is 24 units or less, the game simply bumps them upwards to the height of the new floor and lets them continue on.
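
The Doom-style rule boils down to a single comparison. Here is a minimal Python sketch of it, under heavy simplification: it ignores ceilings and everything else a real engine checks, and `try_cross` is a made-up name rather than anything from Doom's source.

```python
MAX_STEP_HEIGHT = 24   # the threshold mentioned above


def try_cross(current_floor, next_floor, max_step=MAX_STEP_HEIGHT):
    """Return the new floor height, or None if the ledge blocks movement."""
    if next_floor - current_floor <= max_step:
        return next_floor   # bump up (or drop down) and carry on
    return None             # too tall: treat it as a wall


print(try_cross(0, 16))   # 16
print(try_cross(0, 24))   # 24
print(try_cross(0, 25))   # None
print(try_cross(32, 0))   # 0
```

A staircase is then just a run of small height differences, each one crossed by the same bump.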

Implementing this in a game without Doom’s notion of sectors is a little trickier. In fact, I still haven’t done it. Collision detection based on rejection gets it for free, kinda, but it’s not very deterministic and it breaks other things. But that’s a whole other post.

Gravity

Gravity is pretty easy. Everything accelerates downwards all the time. What’s interesting are the exceptions.

Jumping

Jumping is a giant hack.

Think about how actual jumping works: you tense your legs, which generally involves bending your knees first, and then spring upwards. In a platformer, you can just leap whenever you feel like it, which is nonsense. Also you go like twenty feet into the air?

Worse, most platformers allow variable-height jumping, where your jump is lower if you let go of the jump button while you’re in the air. Normally, one would expect to have to decide how much force to put into the jump beforehand.

But of course this is about convenience of controls: when jumping is your primary action, you want to be able to do it immediately, without any windup for how high you want to jump.
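
One common way to get variable-height jumps is to clip the upward speed as soon as the button is released. The Python sketch below shows that idea only; it is not taken from any particular game, and every constant in it is invented for the example.

```python
GRAVITY = 0.5        # made-up units per frame squared; y grows downward
JUMP_SPEED = -8.0    # initial upward velocity
RELEASE_CAP = -3.0   # fastest upward speed allowed once the button is released


def jump_height(frames_held):
    """Simulate one jump and return the peak height reached."""
    y, vy, peak = 0.0, JUMP_SPEED, 0.0
    frame = 0
    while y <= 0.0:
        if frame >= frames_held and vy < RELEASE_CAP:
            vy = RELEASE_CAP   # button released: clip the ascent
        y += vy
        vy += GRAVITY
        peak = min(peak, y)
        frame += 1
    return -peak


print(jump_height(2), jump_height(6), jump_height(60))  # 26.0 51.0 68.0
```

The decision about how high to jump is made in midair, after the fact, which is exactly the physically absurd part.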

(And then there’s double jumping? Come on.)

Air control is a similar phenomenon: usually you’d jump in a particular direction by controlling how you push off the ground with your feet, but in a video game, you don’t have feet! You only have the box. The compromise is to let you control your horizontal movement to a limited degree in midair, even though that doesn’t make any sense. (It’s way more fun, though, and overall gives you more movement options, which are good to have in an interactive medium.)

Air control also exposes an obvious place that game physics collide with the realistic model of serious physics engines. I’ve mentioned this before, but: if you use Real Physics™ and air control yourself into a wall, you might find that you’ll simply stick to the wall until you let go of the movement buttons. Why? Remember, player movement acts as though an external force were pushing you around (and from the perspective of a Real™ physics engine, this is exactly how you’d implement it) — so air-controlling into a wall is equivalent to pushing a book against a wall with your hand, and the friction with the wall holds you in place. Oops.

Ground sticking

Another place game physics conflict with physics engines is with running to the top of a slope. On a real hill, of course, you land on top of the slope and are probably glad of it; slopes are hard to climb!

An Eevee moves to the top of a slope, and rather than step onto the flat top, she goes flying off into the air

In a video game, you go flying. Because you’re a box. With momentum. So you hit the peak and keep going in the same direction. Which is diagonally upwards.

Projectiles

To make them more predictable, projectiles generally aren’t subject to gravity, at least as far as I’ve seen. The real world does not have such an exemption. The real world imposes gravity even on sniper rifles, which in a video game are often implemented as an instant trace unaffected by anything in the world because the bullet never actually exists in the world.

Resistance

Ah. Welcome to hell.

Water

Water is an interesting case, and offhand I don’t know the gritty details of how games implement it. In the real world, water applies a resistant drag force to movement — and that force is proportional to the square of velocity, which I’d completely forgotten until right now. I am almost positive that no game handles that correctly. But then, in real-world water, you can push against the water itself for movement, and games don’t simulate that either. What’s the rough equivalent?

The Sonic Physics Guide suggests that Sonic handles it by basically halving everything: acceleration, max speed, friction, etc. When Sonic enters water, his speed is cut; when Sonic exits water, his speed is increased.

That last bit feels validating — I could swear Metroid Prime did the same thing, and built my own solution around it, but couldn’t remember for sure. It makes no sense, of course, for a jump to become faster just because you happened to break the surface of the water, but it feels fantastic.

The thing I did was similar, except that I didn’t want to add a multiplier in a dozen places when you happen to be underwater (and remember which ones need it to be squared, etc.). So instead, I calculate everything completely as normal, so velocity is exactly the same as it would be on dry land — but the distance you would move gets halved. The effect seems to be pretty similar to most platformers with water, at least as far as I can tell. It hasn’t shown up in a published game and I only added this fairly recently, so I might be overlooking some reason this is a bad idea.

(One reason that comes to mind is that velocity is now a little white lie while underwater, so anything relying on velocity for interesting effects might be thrown off. Or maybe that’s correct, because velocity thresholds should be halved underwater too? Hm!)
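As a rough sketch of that idea, in Python rather than anything from an actual game: velocity is integrated exactly as on dry land, and only the displacement is scaled while underwater. The `Actor` class, the `step` function, and the 0.5 constant are all hypothetical names and values picked for illustration.

```python
from dataclasses import dataclass

# Assumed constant, taken from "the distance you would move gets halved".
WATER_DISTANCE_SCALE = 0.5


@dataclass
class Actor:
    x: float = 0.0
    y: float = 0.0
    vx: float = 0.0
    vy: float = 0.0
    underwater: bool = False


def step(actor, ax, ay, dt):
    """Advance one update; only the distance moved depends on water."""
    # Velocity is computed exactly as it would be on dry land.
    actor.vx += ax * dt
    actor.vy += ay * dt
    # The resulting displacement is what gets scaled underwater.
    scale = WATER_DISTANCE_SCALE if actor.underwater else 1.0
    actor.x += actor.vx * dt * scale
    actor.y += actor.vy * dt * scale
```

The upside is that there is a single place where water matters; the downside is the white lie about velocity mentioned above.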

Notably, air is also a fluid, so it should behave the same way (just with different constants). I definitely don’t think any games apply air drag that’s proportional to the square of velocity.

Friction

Friction is, in my experience, a little handwaved. Probably because real-world friction is so darn complicated.

Consider that in the real world, we want very high friction on the surfaces we walk on — shoes and tires are explicitly designed to increase it, even. We move by bracing a back foot against the ground and using that to push ourselves forward, so we want the ground to resist our push as much as possible.

In a game world, we are a box. We move by being pushed by some invisible outside force, so if the friction between ourselves and the ground is too high, we won’t be able to move at all! That’s complete nonsense physically, but it turns out to be handy in some cases — for example, highish friction can simulate walking through deep mud, which should be difficult due to fluid drag and low friction.

But the best-known example of the fakeness of game friction is video game ice. Walking on real-world ice is difficult because the low friction means low grip; your feet are likely to slip out from under you, and you’ll simply fall down and have trouble moving at all. In a video game, you can’t fall down, so you have the opposite experience: you spend most of your time sliding around uncontrollably. Yet ice is so common in video games (and perhaps so uncommon in places I’ve lived) that I, at least, had never really thought about this disparity until an hour or so ago.

Game friction vs real-world friction

Real-world friction is a force. It’s the normal force (the force the surface exerts on the object, perpendicular to the surface) times some constant that depends on how the two materials interact.

Force is mass times acceleration, and platformers often ignore mass, so friction ought to be an acceleration — applied against the object’s movement, but never enough to push it backwards.

I haven’t made any games where variable friction plays a significant role, but my gut instinct is that low friction should mean the player accelerates more slowly but has a higher max speed, and high friction should mean the opposite. I see from my own source code that I didn’t even do what I just said, so let’s defer to some better-made and well-documented games: Sonic and Doom.

In Sonic, friction is a fixed value subtracted from the player’s velocity (regardless of direction) each tic. Sonic has a fixed framerate, so the units are really pixels per tic squared (i.e. acceleration), multiplied by an implicit 1 tic per tic. So far, so good.

But Sonic’s friction only applies if the player isn’t pressing left or right. Hang on, that isn’t friction at all; that’s just deceleration! That’s equivalent to jogging to a stop. If friction were lower, Sonic would take longer to stop, but otherwise this is only tangentially related to friction.

(In fairness, this approach would decently emulate friction for non-conscious sliding objects, which are never going to be pressing movement buttons. Also, we don’t have the Sonic source code, and the name “friction” is a fan invention; the Sonic Physics Guide already uses “deceleration” to describe the player’s acceleration when turning around.)

Okay, let’s try Doom. In Doom, the default friction is 90.625%.

Hang on, what?

Yes, in Doom, friction is a multiplier applied every tic. Doom runs at 35 tics per second, so this works out to a multiplier of about 0.032 per second (0.90625 raised to the 35th power). Yikes!

This isn’t anything remotely like real friction, but it’s much easier to implement. With friction as acceleration, the game has to know both the direction of movement (so it can apply friction in the opposite direction) and the magnitude (so it doesn’t overshoot and launch the object in the other direction). That means taking a semi-costly square root and also writing extra code to cap the amount of friction. With a multiplier, neither is necessary; just multiply the whole velocity vector and you’re done.
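To make the comparison concrete, here is a minimal Python sketch of both approaches. The function names are made up for illustration, and neither function is taken from Sonic or Doom; they only show why the acceleration version needs a square root and a cap while the multiplier version needs neither.

```python
import math


def friction_as_acceleration(vx, vy, friction, dt):
    """Slow down by friction * dt, opposite to the direction of movement."""
    speed = math.hypot(vx, vy)  # the square root mentioned above
    if speed == 0.0:
        return 0.0, 0.0
    # Cap the reduction so friction can stop the object but never reverse it.
    drop = min(friction * dt, speed)
    scale = (speed - drop) / speed
    return vx * scale, vy * scale


def friction_as_multiplier(vx, vy, multiplier):
    """Doom-style: scale the whole velocity vector and be done."""
    return vx * multiplier, vy * multiplier
```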

There are some downsides. One is that objects will never actually stop, since multiplying by 3% repeatedly will never produce a result of zero — though eventually the speed will become small enough to either slip below a “minimum speed” threshold or simply no longer fit in a float representation. Another is that the units are fairly meaningless: with Doom’s default friction of 90.625%, about how long does it take for the player to stop? I have no idea, partly because “stop” is ambiguous here! If friction were an acceleration, I could divide it into the player’s max speed to get a time.

All that aside, what are the actual effects of changing Doom’s friction? What an excellent question that’s surprisingly tricky to answer. (Note that friction can’t be changed in original Doom, only in the Boom port and its derivatives.) Here’s what I’ve pieced together.

Doom’s “friction” is really two values. “Friction” itself is a multiplier applied to moving objects on every tic, but there’s also a move factor which defaults to \(\frac{1}{32} = 0.03125\) and is derived from friction for custom values.

Every tic, the player’s velocity is multiplied by friction, and then increased by their speed times the move factor.

$$
v(n) = v(n - 1) \times friction + speed \times move factor
$$

Eventually, the reduction from friction will balance out the speed boost. That happens when \(v(n) = v(n - 1)\), so we can rearrange it to find the player’s effective max speed:

$$
v = v \times friction + speed \times move factor \\
v - v \times friction = speed \times move factor \\
v = speed \times \frac{move factor}{1 - friction}
$$

For vanilla Doom’s move factor of 0.03125 and friction of 0.90625, that becomes:

$$
v = speed \times \frac{\frac{1}{32}}{1 - \frac{29}{32}} = speed \times \frac{\frac{1}{32}}{\frac{3}{32}} = \frac{1}{3} \times speed
$$

Curiously, “speed” is three times the maximum speed an actor can actually move. Doomguy’s run speed is 50, so in practice he moves a third of that, or 16⅔ units per tic. (Of course, this isn’t counting SR40, a bug that lets Doomguy run ~40% faster than intended diagonally.)
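If you'd rather check that numerically than trust the algebra, here is a small Python sketch that iterates the per-tic update and compares it with the closed-form stable velocity. The function names are invented for illustration; the constants are the vanilla values quoted above.

```python
FRICTION = 29 / 32      # 0.90625, the default friction quoted above
MOVE_FACTOR = 1 / 32    # 0.03125, the default move factor


def simulate(speed, tics, friction=FRICTION, move_factor=MOVE_FACTOR, v0=0.0):
    """Apply v(n) = v(n - 1) * friction + speed * move factor, tic by tic."""
    v = v0
    for _ in range(tics):
        v = v * friction + speed * move_factor
    return v


def stable_velocity(speed, friction=FRICTION, move_factor=MOVE_FACTOR):
    """The fixed point of the update: speed * move factor / (1 - friction)."""
    return speed * move_factor / (1 - friction)
```

With a speed of 50, `stable_velocity(50)` comes out to 50/3, and `simulate(50, 500)` lands on the same value to within floating-point error.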

So now, what if you change friction? Even more curiously, the move factor is calculated completely differently depending on whether friction is higher or lower than the default Doom amount:

$$
move factor = \begin{cases}
\frac{133 - 128 \times friction}{544} &\approx 0.244 - 0.235 \times friction & \text{ if } friction \ge \frac{29}{32} \\
\frac{81920 \times friction - 70145}{1048576} &\approx 0.078 \times friction - 0.067 & \text{ otherwise }
\end{cases}
$$

That’s pretty weird? Complicating things further is that low friction (which means muddy terrain, remember) has an extra multiplier on its move factor, depending on how fast you’re already going — the idea is apparently that you have a hard time getting going, but it gets easier as you find your footing. The extra multiplier maxes out at 8, which makes the two halves of that function meet at the vanilla Doom value.

A graph of the relationship between friction and move factor

That very top point corresponds to the move factor from the original game. So no matter what you do to friction, the move factor becomes lower. At 0.85 and change, you can no longer move at all; below that, you move backwards.
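Here is the piecewise formula as a Python sketch, transcribed from the equation above rather than from any port's source code. It leaves out the speed-dependent extra multiplier for low friction, so the low branch is the raw value before that multiplier (up to 8) is applied. The function name is made up.

```python
DEFAULT_FRICTION = 29 / 32  # 0.90625


def move_factor(friction):
    """Move factor as a function of friction, per the formula above."""
    if friction >= DEFAULT_FRICTION:
        # Slippery side: friction at or above the default.
        return (133 - 128 * friction) / 544
    # Muddy side, before the extra "getting going" multiplier.
    return (81920 * friction - 70145) / 1048576
```

The low branch crosses zero at 70145/81920, which is the "0.85 and change" where movement stops working.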

From the formula above, it’s easy to see what changes to friction and move factor will do to Doomguy’s stable velocity. Move factor is in the numerator, so increasing it will increase stable velocity — but it can’t increase, so stable velocity can only ever decrease. Friction is in the denominator, but it’s subtracted from 1, so increasing friction will make the denominator a smaller value less than 1, i.e. increase stable velocity. Combined, we get this relationship between friction and stable velocity.

A graph showing stable velocity shooting up dramatically as friction increases

As friction approaches 1, stable velocity grows without bound. This makes sense, given the definition of \(v(n)\) — if friction is 1, the velocity from the previous tic isn’t reduced at all, so we just keep accelerating freely.

All of this is why I’m wary of using multipliers.

Anyway, this leaves me with one last question about the effects of Doom’s friction: how long does it take to reach stable velocity? Barring precision errors, we’ll never truly reach stable velocity, but let’s say within 5%. First we need a closed formula for the velocity after some number of tics. This is a simple recurrence relation, and you can write a few terms out yourself if you want to be sure this is right.

$$
v(n) = v_0 \times friction^n + speed \times move factor \times \frac{friction^n - 1}{friction - 1}
$$

Our initial velocity is zero, so the first term disappears. Set this equal to the stable formula and solve for n:

$$
speed \times move factor \times \frac{friction^n - 1}{friction - 1} = (1 - 5\%) \times speed \times \frac{move factor}{1 - friction} \\
friction^n - 1 = -(1 - 5\%) \\
friction^n = 5\% \\
n = \frac{\ln 5\%}{\ln friction}
$$

“Speed” and move factor disappear entirely, which makes sense, and this is purely a function of friction (and how close we want to get). For vanilla Doom, that comes out to about 30.4 tics, which is a little less than a second. For other values of friction:

A graph of time to stability which leaps upwards dramatically towards the right

As friction increases (which in Doom terms means the surface is more slippery), it takes longer and longer to reach stable speed, which is in turn greater and greater. For lesser friction (i.e. mud), stable speed is lower, but reached fairly quickly. (Of course, the extra “getting going” multiplier while in mud adds some extra time here, but including that in the graph is a bit more complicated.)
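The time-to-stability formula is short enough to sanity-check in a few lines of Python. This sketch uses invented function names and ignores the extra mud multiplier, just like the graph does. It computes n from the logarithm formula and also counts tics by brute force.

```python
import math


def tics_to_reach(fraction, friction):
    """Closed form: n = ln(1 - fraction) / ln(friction)."""
    return math.log(1 - fraction) / math.log(friction)


def tics_by_simulation(fraction, friction, speed=1.0, move_factor=1 / 32):
    """Count whole tics until velocity reaches the given fraction of stable."""
    target = fraction * speed * move_factor / (1 - friction)
    v, n = 0.0, 0
    while v < target:
        v = v * friction + speed * move_factor
        n += 1
    return n
```

For the default friction of 29/32 and a 95% target, the closed form gives roughly 30.4 and the brute-force count is the next whole tic up.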

I think this matches with my instincts above. How fascinating!

What’s that? This is way too much math and you hate it? Then don’t use multipliers in game physics.

Uh

That was a hell of a diversion!

I guess the goofiest stuff in basic game physics is really just about mapping player controls to in-game actions like jumping and deceleration; the rest consists of hacks to compensate for representing everything as a box.

Coaxing 2D platforming out of Unity

Post Syndicated from Eevee original https://eev.ee/blog/2017/10/13/coaxing-2d-platforming-out-of-unity/

An anonymous donor asked a question that I can’t even begin to figure out how to answer, but they also said anything else is fine, so here’s anything else.

I’ve been avoiding writing about game physics, since I want to save it for ✨ the book I’m writing ✨, but that book will almost certainly not touch on Unity. Here, then, is a brief run through some of the brick walls I ran into while trying to convince Unity to do 2D platforming.

This is fairly high-level — there are no blocks of code or helpful diagrams. I’m just getting this out of my head because it’s interesting. If you want more gritty details, I guess you’ll have to wait for ✨ the book ✨.

The setup

I hadn’t used Unity before. I hadn’t even used a “real” physics engine before. My games so far have mostly used LÖVE, a Lua-based engine. LÖVE includes box2d bindings, but for various reasons (not all of them good), I opted to avoid them and instead write my own physics completely from scratch. (How, you ask? ✨ Book ✨!)

I was invited to work on a Unity project, Chaos Composer, that someone else had already started. It had basic movement already implemented; I taught myself Unity’s physics system by hacking on it. It’s entirely possible that none of this is actually the best way to do anything, since I was really trying to reproduce my own homegrown stuff in Unity, but it’s the best I’ve managed to come up with.

Two recurring snags were that you can’t ask Unity to do multiple physics updates in a row, and sometimes getting the information I wanted was difficult. Working with my own code spoiled me a little, since I could invoke it at any time and ask it anything I wanted; Unity, on the other hand, is someone else’s black box with a rigid interface on top.

Also, wow, Googling for a lot of this was not quite as helpful as expected. A lot of what’s out there is just the first thing that works, and often that’s pretty hacky and imposes severe limits on the game design (e.g., “this won’t work with slopes”). Basic movement and collision are the first thing you do, which seems to me like the worst time to be locking yourself out of a lot of design options. I tried very (very, very, very) hard to minimize those kinds of constraints.

Problem 1: Movement

When I showed up, movement was already working. Problem solved!

Like any good programmer, I immediately set out to un-solve it. Given a “real” physics engine like Unity prominently features, you have two options: ⓐ treat the player as a physics object, or ⓑ don’t. The existing code went with option ⓑ, like I’d done myself with LÖVE, and like I’d seen countless people advise. Using a physics sim makes for bad platforming.

But… why? I believed it, but I couldn’t concretely defend it. I had to know for myself. So I started a blank project, drew some physics boxes, and wrote a dozen-line player controller.

Ah! Immediate enlightenment.

If the player was sliding down a wall, and I tried to move them into the wall, they would simply freeze in midair until I let go of the movement key. The trouble is that the physics sim works in terms of forces: moving the player involves giving them a nudge in some direction, like a giant invisible hand pushing them around the level. Surprise! If you press a real object against a real wall with your real hand, you’ll see the same effect: friction will cancel out gravity, and the object will stay in midair.

Platformer movement, as it turns out, doesn’t make any goddamn physical sense. What is air control? What are you pushing against? Nothing, really; we just have it because it’s nice to play with, because not having it is a nightmare.

I looked to see if there were any common solutions to this, and I only really found one: make all your walls frictionless.

Game development is full of hacks like this, and I… don’t like them. I can accept that minor hacks are necessary sometimes, but this one makes an early and widespread change to a fundamental system to “fix” something that was wrong in the first place. It also imposes an “invisible” requirement, something I try to avoid at all costs — if you forget to make a particular wall frictionless, you’ll never know unless you happen to try sliding down it.

And so, I swiftly returned to the existing code. It wasn’t too different from what I’d come up with for LÖVE: it applied gravity by hand, tracked the player’s velocity, computed the intended movement each frame, and moved by that amount. The interesting thing was that it used MovePosition, which schedules a movement for the next physics update and stops the movement if the player hits something solid.

It’s kind of a nice hybrid approach, actually; all the “physics” for conscious actors is done by hand, but the physics engine is still used for collision detection. It’s also used for collision rejection — if the player manages to wedge themselves several pixels into a solid object, for example, the physics engine will try to gently nudge them back out of it with no extra effort required on my part. I still haven’t figured out how to get that to work with my homegrown stuff, which is built to prevent overlap rather than to jiggle things out of it.

But wait, what about…

Our player is a dynamic body with rotation lock and no gravity. Why not just use a kinematic body?

I must be missing something, because I do not understand the point of kinematic bodies. I ran into this with Godot, too, which documented them the same way: as intended for use as players and other manually-moved objects. But by default, they don’t even collide with other kinematic bodies or static geometry. What? There’s a checkbox to turn this on, which I enabled, but then I found out that MovePosition doesn’t stop kinematic bodies when they hit something, so I would’ve had to cast along the intended path of movement to figure out when to stop, thus duplicating the same work the physics engine was about to do.

But that’s impossible anyway! Static geometry generally wants to be made of edge colliders, right? They don’t care about concave/convex. Imagine the player is standing on the ground near a wall and tries to move towards the wall. Both the ground and the wall are different edges from the same edge collider.

If you try to cast the player’s hitbox horizontally, parallel to the ground, you’ll only get one collision: the existing collision with the ground. Casting doesn’t distinguish between touching and hitting. And because Unity only reports one collision per collider, and because the ground will always show up first, you will never find out about the impending wall collision.

So you’re forced to either use raycasts for collision detection or decomposed polygons for world geometry, both of which are slightly worse tools for no real gain.

I ended up sticking with a dynamic body.


Oh, one other thing that doesn’t really fit anywhere else: keep track of units! If you’re adding something called “velocity” directly to something called “position”, something has gone very wrong. Acceleration is distance per time squared; velocity is distance per time; position is distance. You must multiply or divide by time to convert between them.

I never even, say, add a constant directly to position every frame; I always phrase it as velocity and multiply by Δt. It keeps the units consistent: time is always in seconds, not in tics.
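A tiny Python sketch of what that looks like in practice, with made-up function names: every conversion between acceleration, velocity, and position goes through a multiplication by the timestep, so the same velocity covers the same distance at any frame rate.

```python
def step(position, velocity, acceleration, dt):
    """One update with consistent units (dt in seconds)."""
    velocity += acceleration * dt  # (distance / time^2) * time -> distance / time
    position += velocity * dt      # (distance / time) * time -> distance
    return position, velocity


def run(seconds, fps, velocity):
    """Move at a constant velocity for some seconds at a given frame rate."""
    position = 0.0
    for _ in range(int(seconds * fps)):
        position, velocity = step(position, velocity, 0.0, 1.0 / fps)
    return position
```

Running for one second at 2 units per second ends up 2 units away whether the loop ticks 30 or 60 times.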

Problem 2: Slopes

Ah, now we start to get off in the weeds.

A sort of pre-problem here was detecting whether we’re on a slope, which means detecting the ground. The codebase originally used a manual physics query of the area around the player’s feet to check for the ground, which seems to be somewhat common, but that can’t tell me the angle of the detected ground. (It’s also kind of error-prone, since “around the player’s feet” has to be specified by hand and may not stay correct through animations or changes in the hitbox.)

I replaced that with what I’d eventually settled on in LÖVE: detect the ground by detecting collisions, and looking at the normal of the collision. A normal is a vector that points straight out from a surface, so if you’re standing on the ground, the normal points straight up; if you’re on a 10° incline, the normal points 10° away from straight up.

Not all collisions are with the ground, of course, so I assumed something is ground if the normal pointed away from gravity. (I like this definition more than “points upwards”, because it avoids assuming anything about the direction of gravity, which leaves some interesting doors open for later on.) That’s easily detected by taking the dot product — if it’s negative, the collision was with the ground, and I now have the normal of the ground.
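The check itself is tiny. Here it is as a Python sketch with vectors as plain `(x, y)` tuples and a hypothetical `is_ground` helper; in Unity the same test would be a dot product between the contact normal and the gravity vector.

```python
def dot(a, b):
    """2D dot product of two (x, y) tuples."""
    return a[0] * b[0] + a[1] * b[1]


def is_ground(normal, gravity):
    """A surface counts as ground if its normal points away from gravity."""
    return dot(normal, gravity) < 0
```

Because nothing here mentions "up", the same test keeps working if gravity is ever turned sideways.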

Actually doing this in practice was slightly tricky. With my LÖVE engine, I could cram this right into the middle of collision resolution. With Unity, not quite so much. I went through a couple iterations before I really grasped Unity’s execution order, which I guess I will have to briefly recap for this to make sense.

Unity essentially has two update cycles. It performs physics updates at fixed intervals for consistency, and updates everything else just before rendering. Within a single frame, Unity does as many fixed physics updates as it needs to catch up with the time that has elapsed (which might be zero, one, or more), then does a regular update, then renders. User code can implement either or both of Update, which runs during a regular update, and FixedUpdate, which runs just before Unity does a physics pass.

So my solution was:

  • At the very end of FixedUpdate, clear the actor’s “on ground” flag and ground normal.

  • During OnCollisionEnter2D and OnCollisionStay2D (which are called from within a physics pass), if there’s a collision that looks like it’s with the ground, set the “on ground” flag and ground normal. (If there are multiple ground collisions, well, good luck figuring out the best way to resolve that! At the moment I’m just taking the first and hoping for the best.)

That means there’s a brief window between the end of FixedUpdate and Unity’s physics pass during which a grounded actor might mistakenly believe it’s not on the ground, which is a bit of a shame, but there are very few good reasons for anything to be happening in that window.
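The two steps above can be sketched as a small Python class. This is a stand-in for the Unity script, not the real thing: `fixed_update` and `on_collision` are hypothetical method names playing the roles of FixedUpdate and the OnCollisionEnter2D / OnCollisionStay2D callbacks, and the "take the first ground contact" rule is the same hopeful one described above.

```python
class Actor:
    def __init__(self, gravity=(0.0, -1.0)):
        self.gravity = gravity
        self.on_ground = False
        self.ground_normal = None

    def fixed_update(self):
        # ... movement code would read on_ground / ground_normal here ...
        # At the very end, forget the ground; the physics pass that
        # follows is expected to report it again if it is still there.
        self.on_ground = False
        self.ground_normal = None

    def on_collision(self, normal):
        # Called from within the physics pass, once per contact.
        gx, gy = self.gravity
        ground_like = normal[0] * gx + normal[1] * gy < 0
        if ground_like and not self.on_ground:
            # Keep only the first ground-like contact.
            self.on_ground = True
            self.ground_normal = normal
```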

Okay! Now we can do slopes.

Just kidding! First we have to do sliding.

When I first looked at this code, it didn’t apply gravity while the player was on the ground. I think I may have had some problems with detecting the ground as a result, since the player was no longer pushing down against it? Either way, it seemed like a silly special case, so I made gravity always apply.

Lo! I was a fool. The player could no longer move.

Why? Because MovePosition does exactly what it promises. If the player collides with something, they’ll stop moving. Applying gravity means that the player is trying to move diagonally downwards into the ground, and so MovePosition stops them immediately.

Hence, sliding. I don’t want the player to actually try to move into the ground. I want them to move the unblocked part of that movement. For flat ground, that means the horizontal part, which is pretty much the same as discarding gravity. For sloped ground, it’s a bit more complicated!

Okay but actually it’s less complicated than you’d think. It can be done with some cross products fairly easily, but Unity makes it even easier with a couple of type casts. There’s a Vector3.ProjectOnPlane function that projects an arbitrary vector on a plane given by its normal, which is exactly the thing I want! So I apply that to the attempted movement before passing it along to MovePosition. I do the same thing with the current velocity, to prevent the player from accelerating infinitely downwards while standing on flat ground.

One other thing: I don’t actually use the detected ground normal for this. The player might be touching two ground surfaces at the same time, and I’d want to project on both of them. Instead, I use the player body’s GetContacts method, which returns contact points (and normals!) for everything the player is currently touching. I believe those contact points are tracked by the physics engine anyway, so asking for them doesn’t require any actual physics work.

(Looking at the code I have, I notice that I still only perform the slide for surfaces facing upwards — but I’d want to slide against sloped ceilings, too. Why did I do this? Maybe I should remove that.)

(Also, I’m pretty sure projecting a vector on a plane is non-commutative, which raises the question of which order the projections should happen in and what difference it makes. I don’t have a good answer.)
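For the curious, here is roughly what the projection does, as a Python sketch on 2D tuples. It assumes unit-length normals and uses made-up names (`project_on_plane`, `slide`); it is not Unity's implementation, just the standard "subtract the part along the normal" formula. It also makes the order question easy to poke at.

```python
def project_on_plane(v, n):
    """Remove from v the component along the unit normal n."""
    d = v[0] * n[0] + v[1] * n[1]
    return (v[0] - d * n[0], v[1] - d * n[1])


def slide(v, normals):
    """Project v against each contact normal in turn; order matters."""
    for n in normals:
        v = project_on_plane(v, n)
    return v


move = (1.0, -1.0)   # walking right, plus gravity pulling down
flat = (0.0, 1.0)    # flat ground
ramp = (0.6, 0.8)    # a hypothetical second contact, a slope

on_flat = slide(move, [flat])         # gravity is discarded
flat_first = slide(move, [flat, ramp])
ramp_first = slide(move, [ramp, flat])
```

On flat ground alone the result is pure horizontal movement. With both contacts, `flat_first` and `ramp_first` come out as different vectors, which is the order dependence in question.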

(I note that my LÖVE setup does something slightly different: it just tries whatever the movement ought to be, and if there’s a collision, then it projects — and tries again with the remaining movement. But I can’t ask Unity to do multiple moves in one physics update, alas.)

Okay! Now, slopes. But actually, with the above work done, slopes are most of the way there already.

One obvious problem is that the player tries to move horizontally even when on a slope, and the easy fix is to change their movement from speed * Vector2.right to speed * new Vector2(ground.y, -ground.x) while on the ground. That’s the ground normal rotated a quarter-turn clockwise, so for flat ground it still points to the right, and in general it points rightwards along the ground. (Note that it assumes the ground normal is a unit vector, but as far as I’m aware, that’s true for all the normals Unity gives you.)
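The quarter-turn is easy to check by hand. Here it is as a Python sketch with hypothetical helper names, assuming a unit-length ground normal stored as an `(x, y)` tuple.

```python
def ground_tangent(normal):
    """Rotate the ground normal a quarter-turn clockwise: (x, y) -> (y, -x)."""
    return (normal[1], -normal[0])


def walk_velocity(speed, normal):
    """Walking velocity pointing rightwards along the ground."""
    tx, ty = ground_tangent(normal)
    return (speed * tx, speed * ty)
```

For flat ground with normal (0, 1) this gives plain rightward movement; for a slope rising to the right with normal (-0.6, 0.8), it gives (0.8, 0.6) times the speed, which points up along the slope.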

Another issue is that if the player stands motionless on a slope, gravity will cause them to slowly slide down it — because the movement from gravity will be projected onto the slope, and unlike flat ground, the result is no longer zero. For conscious actors only, I counter this by adding the opposite factor to the player’s velocity as part of adding in their walking speed. This matches how the real world works, to some extent: when you’re standing on a hill, you’re exerting some small amount of effort just to stay in place.

(Note that slope resistance is not the same as friction. Okay, yes, in the real world, virtually all resistance to movement happens as a result of friction, but bracing yourself against the ground isn’t the same as being passively resisted.)
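One way to read "adding the opposite factor" is: work out the part of gravity that survives projection onto the slope, and add its negation for conscious actors. The Python sketch below does exactly that under that assumption; the names are invented and the real code may well differ in the details.

```python
def along_slope(v, normal):
    """The part of v that survives projection onto the slope (unit normal)."""
    d = v[0] * normal[0] + v[1] * normal[1]
    return (v[0] - d * normal[0], v[1] - d * normal[1])


def slope_resistance(gravity, normal):
    """What a standing actor adds to cancel gravity's pull along the slope."""
    gx, gy = along_slope(gravity, normal)
    return (-gx, -gy)
```

On flat ground the resistance is zero, so nothing changes there; on a slope it exactly cancels the sliding part of gravity and nothing more.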

From here there are a lot of things you can do, depending on how you think slopes should be handled. You could make the player unable to walk up slopes that are too steep. You could make walking down a slope faster than walking up it. You could make jumping go along the ground normal, rather than straight up. You could raise the player’s max allowed speed while running downhill. Whatever you want, really. Armed with a normal and awareness of dot products, you can do whatever you want.

But first you might want to fix a few aggravating side effects.

Problem 3: Ground adherence

I don’t know if there’s a better name for this. I rarely even see anyone talk about it, which surprises me; it seems like it should be a very common problem.

The problem is: if the player runs up a slope which then abruptly changes to flat ground, their momentum will carry them into the air. For very fast players going off the top of very steep slopes, this makes sense, but it becomes visible even for relatively gentle slopes. It was a mild nightmare in the original release of our game Lunar Depot 38, which has very “rough” ground made up of lots of shallow slopes — so the player is very frequently slightly off the ground, which meant they couldn’t jump, for seemingly no reason. (I even had code to fix this, but I disabled it because of a silly visual side effect that I never got around to fixing.)

Anyway! The reason this is a problem is that game protagonists are generally not boxes sliding around — they have legs. We don’t go flying off the top of real-world hilltops because we put our foot down until it touches the ground.

Simulating this footfall is surprisingly fiddly to get right, especially with someone else’s physics engine. It’s made somewhat easier by Cast, which casts the entire hitbox — no matter what shape it is — in a particular direction, as if it had moved, and tells you all the hypothetical collisions in order.

So I cast the player in the direction of gravity by some distance. If the cast hits something solid with a ground-like collision normal, then the player must be close to the ground, and I move them down to touch it (and set that ground as the new ground normal).

There are some wrinkles.

Wrinkle 1: I only want to do this if the player is off the ground now, but was on the ground last frame, and is not deliberately moving upwards. That latter condition means I want to skip this logic if the player jumps, for example, but also if the player is thrust upwards by a spring or abducted by a UFO or whatever. As long as external code goes through some interface and doesn’t mess with the player’s velocity directly, that shouldn’t be too hard to track.

Wrinkle 2: When does this logic run? It needs to happen after the player moves, which means after a Unity physics pass… but there’s no callback for that point in time. I ended up running it at the beginning of FixedUpdate and the beginning of Update — since I definitely want to do it before rendering happens! That means it’ll sometimes happen twice between physics updates. (I could carefully juggle a flag to skip the second run, but I… didn’t do that. Yet?)

Wrinkle 3: I can’t move the player with MovePosition! Remember, MovePosition schedules a movement, it doesn’t actually perform one; that means if it’s called twice before the physics pass, the first call is effectively ignored. I can’t easily combine the drop with the player’s regular movement, for various fiddly reasons. I ended up doing it “by hand” using transform.Translate, which I think was the “old way” to do manual movement before MovePosition existed. I’m not totally sure if it activates triggers? For that matter, I’m not sure it even notices collisions — but since I did a full-body Cast, there shouldn’t be any anyway.

Wrinkle 4: What, exactly, is “some distance”? I’ve yet to find a satisfying answer for this. It seems like it ought to be based on the player’s current speed and the slope of the ground they’re moving along, but every time I’ve done that math, I’ve gotten totally ludicrous answers that sometimes exceed the size of a tile. But maybe that’s not wrong? Play around, I guess, and think about when the effect should “break” and the player should go flying off the top of a hill.

Wrinkle 5: It’s possible that the player will launch off a slope, hit something, and then be adhered to the ground where they wouldn’t have hit it. I don’t much like this edge case, but I don’t see a way around it either.

This problem is surprisingly awkward for how simple it sounds, and the solution isn’t entirely satisfying. Oh, well; the results are much nicer than the solution. As an added bonus, this also fixes occasional problems with running down a hill and becoming detached from the ground due to precision issues or whathaveyou.
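Stripped of the Unity plumbing, the decision from wrinkle 1 plus the snap itself might look like this Python sketch. Everything here is hypothetical: `hit_distance` stands in for the result of casting the hitbox along gravity (None if nothing ground-like was within the cast distance), `launched` stands in for "deliberately moving upwards", and gravity is assumed to point towards negative y.

```python
def adhere(y, on_ground, was_on_ground, launched, hit_distance):
    """Return (new_y, new_on_ground) after trying to stick to the ground."""
    # Only act if we just left the ground and did not mean to.
    if on_ground or not was_on_ground or launched:
        return y, on_ground
    # Nothing ground-like within reach: the actor really is airborne.
    if hit_distance is None:
        return y, False
    # Drop down onto the ground the cast found.
    return y - hit_distance, True
```

How far the cast should reach is still wrinkle 4's problem; this sketch just takes whatever the cast reports.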

Problem 4: One-way platforms

Ah, what a nightmare.

It took me ages just to figure out how to define one-way platforms. Only block when the player is moving downwards? Nope. Only block when the player is above the platform? Nuh-uh.

Well, okay, yes, those approaches might work for convex players and flat platforms. But what about… sloped, one-way platforms? There’s no reason you shouldn’t be able to have those. If Super Mario World can do it, surely Unity can do it almost 30 years later.

The trick is, again, to look at the collision normal. If it faces away from gravity, the player is hitting a ground-like surface, so the platform should block them. Otherwise (or if the player overlaps the platform), it shouldn’t.

Here’s the catch: Unity doesn’t have conditional collision. I can’t decide, on the fly, whether a collision should block or not. In fact, I think that by the time I get a callback like OnCollisionEnter2D, the physics pass is already over.

I could go the other way and use triggers (which are non-blocking), but then I have the opposite problem: I can’t stop the player on the fly. I could move them back to where they hit the trigger, but I envision all kinds of problems as a result. What if they were moving fast enough to activate something on the other side of the platform? What if something else moved to where I’m trying to shove them back to in the meantime? How does this interact with ground detection and listing contacts, which would rightly ignore a trigger as non-blocking?

I beat my head against this for a while, but the inability to respond to collision conditionally was a huge roadblock. It’s all the more infuriating a problem, because Unity ships with a one-way platform modifier thing. Unfortunately, it seems to have been implemented by someone who has never played a platformer. It’s literally one-way — the player is only allowed to move straight upwards through it, not in from the sides. It also tries to block the player if they’re moving downwards while inside the platform, which invokes clumsy rejection behavior. And this all seems to be built into the physics engine itself somehow, so I can’t simply copy whatever they did.

Eventually, I settled on the following. After calculating attempted movement (including sliding), just at the end of FixedUpdate, I do a Cast along the movement vector. I’m not thrilled about having to duplicate the physics engine’s own work, but I do filter to only things on a “one-way platform” physics layer, which should at least help. For each object the cast hits, I use Physics2D.IgnoreCollision to either ignore or un-ignore the collision between the player and the platform, depending on whether the collision was ground-like or not.

(A lot of people suggested turning off collision between layers, but that can’t possibly work — the player might be standing on one platform while inside another, and anyway, this should work for all actors!)

Again, wrinkles! But fewer this time. Actually, maybe just one: handling the case where the player already overlaps the platform. I can’t just check for that with e.g. OverlapCollider, because that doesn’t distinguish between overlapping and merely touching.

I came up with a fairly simple fix: if I was going to un-ignore the collision (i.e. make the platform block), and the cast distance is reported as zero (either already touching or overlapping), I simply do nothing instead. If I’m standing on the platform, I must have already set it blocking when I was approaching it from the top anyway; if I’m overlapping it, I must have already set it non-blocking to get here in the first place.

I can imagine a few cases where this might go wrong. Moving platforms, especially, are going to cause some interesting issues. But this is the best I can do with what I know, and it seems to work well enough so far.

Oh, and our player can deliberately drop down through platforms, which was easy enough to implement; I just decide the platform is always passable while some button is held down.
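
Putting the pieces together, the per-platform decision boils down to something like the following. This is only a sketch in Python, not the actual Unity code; `Action` and `decide_platform_collision` are hypothetical names, and in the real thing the "ignore" and "block" outcomes would turn into calls to Physics2D.IgnoreCollision.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"      # let the actor pass through the platform
    BLOCK = "block"        # make the platform solid for this actor
    LEAVE_ALONE = "leave"  # keep whatever an earlier frame decided


def decide_platform_collision(ground_like, cast_distance, drop_held=False):
    """Decide what to do about one platform hit by the movement cast.

    ground_like: True if the hit looks like landing on top of the platform.
    cast_distance: how far the cast got before the hit; zero means the
        actor is already touching or overlapping the platform.
    drop_held: True while the (hypothetical) drop-through button is down.
    """
    if drop_held or not ground_like:
        return Action.IGNORE
    if cast_distance == 0:
        # Touching or overlapping: trust the decision made on the way here.
        return Action.LEAVE_ALONE
    return Action.BLOCK
```

The zero-distance case is the only subtle one: it never changes anything, so standing on a platform keeps it solid and overlapping one keeps it passable.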

Problem 5: Pushers and carriers

I haven’t gotten to this yet! Oh boy, can’t wait. I implemented it in LÖVE, but my way was hilariously invasive; I’m hoping that having a physics engine that supports a handwaved “this pushes that” will help. Of course, you also have to worry about sticking to platforms, for which the recommended solution is apparently to parent the cargo to the platform, which sounds goofy to me? I guess I’ll find out when I throw myself at it later.

Overall result

I ended up with a fairly pleasant-feeling system that supports slopes and one-way platforms and whatnot, with all the same pieces as I came up with for LÖVE. The code somehow ended up as less of a mess, too, but it probably helps that I’ve been down this rabbit hole once before and kinda knew what I was aiming for this time.

Animation of a character running smoothly along the top of an irregular dinosaur skeleton

Sorry that I don’t have a big block of code for you to copy-paste into your project. I don’t think there are nearly enough narrative discussions of these fundamentals, though, so hopefully this is useful to someone. If not, well, look forward to ✨ my book, that I am writing ✨!

Weekly roundup: Games, mostly

Post Syndicated from Eevee original https://eev.ee/dev/2017/08/22/weekly-roundup-games-mostly/

  • cc: I fixed an obscure timing issue and… well, that’s all, how exciting.

    I should really talk about this game more, but it’s big and I’m not the one designing it and I don’t have a good sense of how much we want to keep under wraps yet?

  • blog: I wrote a stream of consciousness about how Nazis are bad.

  • potluck: What? Yes! I worked on potluck a bit, believe it or not. I’ve decided to try procedurally generating the whole game — something I’ve wanted to do for a while, and a decision that has piqued my interest in potluck considerably. Step one was to clean up all my map code, which was entangled with parsing Tiled’s JSON format, to make it actually possible to generate a map. I finally did that and made an extremely basic proof of concept that just varies the floor height.

  • fox flux: The usual brief work on player sprites. The game has a lot of them.

  • gamedev: I made a video game with glip again! It was a birthday present for two of our friends, and it’s extremely specific to them and basically incomprehensible to anyone else, so I haven’t decided yet whether it’d be appropriate to release publicly. But we made something pretty coherent on a whim in two and a half days and that’s nice.

I’m currently working on veekun, which has finally progressed to the point that it has data appearing within the website! Hallelujah. I expect there’ll be plenty of stuff to clean up, but this is a tremendous leap forwards. I’ll be so glad to have this off my plate at last, argh.

A few tidbits on networking in games

Post Syndicated from Eevee original https://eev.ee/blog/2017/05/22/a-few-tidbits-on-networking-in-games/

Nova Dasterin asks, via Patreon:

How about do something on networking code, for some kind of realtime game (platformer or MMORPG or something). 😀

Ah, I see. You’re hoping for my usual detailed exploration of everything I know about networking code in games.

Well, joke’s on you! I don’t know anything about networking.

Wait… wait… maybe I know one thing.

Doom

Surprise! The thing I know is, roughly, how multiplayer Doom works.

Doom is 100% deterministic. Its random number generator is really a list of shuffled values; each request for a random number produces the next value in the list. There is no seed, either; a game always begins at the first value in the list. Thus, if you play the game twice with exactly identical input, you’ll see exactly the same playthrough: same damage, same monster behavior, and so on.
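
To make that concrete, here is a rough sketch of a table-driven generator in Python. The table values are made up for illustration and are not Doom's, and both the `TableRNG` name and the wrap-around at the end of the list are assumptions of this sketch.

```python
# Made-up values for illustration only; not Doom's real table.
TABLE = [37, 201, 4, 150, 88, 243, 19, 112]


class TableRNG:
    """A "random" generator that just walks a fixed list of values."""

    def __init__(self):
        # No seed: every new game starts at the first entry.
        self.index = 0

    def next(self):
        value = TABLE[self.index]
        # Assumption: wrap back to the start after the last entry.
        self.index = (self.index + 1) % len(TABLE)
        return value
```

Two generators created this way always produce the same sequence, which is exactly what makes identical input lead to an identical playthrough.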

And that’s exactly what a Doom demo is: a file containing a recording of player input. To play back a demo, Doom runs the game as normal, except that it reads input from a file rather than the keyboard.

Multiplayer works the same way. Rather than passing around the entirety of the world state, Doom sends the player’s input to all the other players. Once a node has received input from every connected player, it advances the world by one tic. There’s no client or server; every peer talks to every other peer.
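
A toy version of that loop might look like this in Python. It is a sketch of the general lockstep idea rather than Doom's actual code: the `LockstepPeer` class is hypothetical, and the "world" is just one number per player that each input nudges.

```python
class LockstepPeer:
    """One peer that only advances once it has everyone's input for a tic."""

    def __init__(self, player_ids):
        self.player_ids = list(player_ids)
        self.tic = 0
        # tic number -> {player id -> input for that tic}
        self.pending = {}
        # Stand-in world state: a single position per player.
        self.positions = {pid: 0 for pid in self.player_ids}

    def receive_input(self, tic, player_id, move):
        """Queue one player's input, then advance as far as possible."""
        self.pending.setdefault(tic, {})[player_id] = move
        self._advance_while_ready()

    def _advance_while_ready(self):
        # Step the world only when every player's input for this tic is in.
        while len(self.pending.get(self.tic, {})) == len(self.player_ids):
            inputs = self.pending.pop(self.tic)
            for pid in self.player_ids:
                self.positions[pid] += inputs[pid]
            self.tic += 1
```

Because every peer applies the same inputs in the same tic order, the order in which packets arrive doesn't matter; each peer ends up with the same world.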

You can read the code if you want to, but at a glance, I don’t think there’s anything too surprising here. Only sending input means there’s not that much to send, and the receiving end just has to queue up packets from every peer and then play them back once it’s heard from everyone. The underlying transport was pluggable (this being the days before we’d even standardized on IP), which complicated things a bit, but the Unix port that’s on GitHub just uses UDP. The Doom Wiki has some further detail.

This approach is very clever and has a few significant advantages. Bandwidth requirements are fairly low, which is important if it happens to be 1993. Bandwidth and processing requirements are also completely unaffected by the size of the map, since map state never touches the network.

Unfortunately, it has some drawbacks as well. The biggest is that, well, sometimes you want to get the world state back in sync. What if a player drops and wants to reconnect? Everyone has to quit and reconnect to one another. What if an extra player wants to join in? It’s possible to load a saved game in multiplayer, but because the saved game won’t have an actor for the new player, you can’t really load it; you’d have to start fresh from the beginning of a map.

It’s fairly fundamental that Doom allows you to save your game at any moment… but there’s no way to load in the middle of a network game. Everyone has to quit and restart the game, loading the right save file from the command line. And if some players load the wrong save file… I’m not actually sure what happens! I’ve seen ZDoom detect the inconsistency and refuse to start the game, but I suspect that in vanilla Doom, players would have mismatched world states and their movements would look like nonsense when played back in each other’s worlds.

Ah, yes. Having the entire game state be generated independently by each peer leads to another big problem.

Cheating

Maybe this wasn’t as big a deal with Doom, where you’d probably be playing with friends or acquaintances (or coworkers). Modern games have matchmaking that pits you against strangers, and the trouble with strangers is that a nontrivial number of them are assholes.

Doom is a very moddable game, and it doesn’t check that everyone is using exactly the same game data. As long as you don’t change anything that would alter the shape of the world or change the number of RNG rolls (since those would completely desynchronize you from other players), you can modify your own game however you like, and no one will be the wiser. For example, you might change the light level in a dark map, so you can see more easily than the other players. Lighting doesn’t affect the game, only how it’s drawn, and it doesn’t go over the network, so the other players would never know.

Or you could alter the executable itself! It knows everything about the game state, including the health and loadout of the other players; altering it to show you this information would give you an advantage. Also, all that’s sent is input; no one said the input had to come from a human. The game knows where all the other players are, so you could modify it to generate the right input to automatically aim at them. Congratulations; you’ve invented the aimbot.

I don’t know how you can reliably fix these issues. There seems to be an entire underground ecosystem built around playing cat and mouse with game developers. Perhaps the most infamous example is World of Warcraft, where people farm in-game gold as automatically as possible to sell to other players for real-world cash.

Egregious cheating in multiplayer really gets on my nerves; I couldn’t bear knowing that it was rampant in a game I’d made. So I will probably not be working on anything with random matchmaking anytime soon.

Starbound

Let’s jump to something a little more concrete and modern.

Starbound is a procedurally generated universe exploration game — like Terraria in space. Or, if you prefer, like Minecraft in space and also flat. Notably, it supports multiplayer, using the more familiar client/server approach. The server uses the same data files as single-player, but it runs as a separate process; if you want to run a server on your own machine, you run the server and then connect to localhost with the client.

I’ve run a server before, but that doesn’t tell me anything about how it works. Starbound is an interesting example because of the existence of StarryPy — a proxy server that can add some interesting extra behavior by intercepting packets going to and from the real server.

That means StarryPy necessarily knows what the protocol looks like, and perhaps we can glean some insights by poking around in it. Right off the bat there’s a list of all the packet types and rough shapes of their data.

I modded StarryPy to print out every single decoded packet it received (from either the client or the server), then connected and immediately disconnected. (Note that these aren’t necessarily TCP packets; they’re just single messages in the Starbound protocol.) Here is my quick interpretation of what happens:

  1. The client and server briefly negotiate a connection. The password, if any, is sent with a challenge and response.

  2. The client sends a full description of its “ship world” — the player’s ship, which they take with them to other servers. The server sends a partial description of the planet the player is either on, or orbiting.

  3. From here, the server and client mostly communicate world state in the form of small delta updates. StarryPy doesn’t delve into the exact format here, unfortunately. The world basically freezes around you during a multiplayer lag spike, though, so it’s safe to assume that the vast bulk of game simulation happens server-side, and the effects are broadcast to clients.
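
Since StarryPy doesn't show the real format, the following is only a generic illustration of what a "delta update" means, not Starbound's protocol. The world here is a plain dict of tile positions, and `make_delta` and `apply_delta` are hypothetical helpers.

```python
def make_delta(old, new):
    """Describe how to turn the old world into the new one."""
    changed = {
        pos: tile
        for pos, tile in new.items()
        if pos not in old or old[pos] != tile
    }
    removed = [pos for pos in old if pos not in new]
    return {"changed": changed, "removed": removed}


def apply_delta(world, delta):
    """Return a copy of the world with the delta applied."""
    result = dict(world)
    result.update(delta["changed"])
    for pos in delta["removed"]:
        result.pop(pos, None)
    return result
```

The server would send only the output of something like `make_delta`, which stays small as long as most of the world didn't change since the last update.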

The protocol has specific message types for various player actions: damaging tiles, dropping items, connecting wires, collecting liquids, moving your ship, and so on. So the basic model is that the player can attempt to do stuff with the chunk of the world they’re looking at, and they’ll get a reaction whenever the server gets back to them.

(I’m dimly aware that some subset of object interactions can happen client-side, but I don’t know exactly which ones. The implications for custom scripted objects are… interesting. Actually, those are slightly hellish in general; Starbound is very moddable, but last I checked it has no way to send mods from the server to the client or anything similar, and by default the server doesn’t even enforce that everyone’s using the same set of mods… so it’s possible that you’ll have an object on your ship that’s only provided by a mod you have but the server lacks, and then who knows what happens.)

IRC

Hang on, this isn’t a video game at all.

Starbound’s “fire and forget” approach reminds me a lot of IRC — a protocol I’ve even implemented, a little bit, kinda. IRC doesn’t have any way to match the messages you send to the responses you get back, and success is silent for some kinds of messages, so it’s impossible (in the general case) to know what caused an error. The most obvious fix for this would be to attach a message id to messages sent out by the client, and include the same id on responses from the server.
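
As a sketch of that fix, here is a tiny client in Python that tags each outgoing message and uses the tag to pair up responses. The message shape (a dict with `id`, `command`, and `status` keys) is invented for this example and is not part of the IRC protocol described here.

```python
import itertools


class Client:
    """Tracks outgoing messages so responses can be matched to them."""

    def __init__(self):
        self._ids = itertools.count(1)
        # message id -> the command we sent with that id
        self.in_flight = {}

    def send(self, command):
        """Build an outgoing message carrying a fresh id."""
        msg_id = next(self._ids)
        self.in_flight[msg_id] = command
        return {"id": msg_id, "command": command}

    def handle_response(self, response):
        """Return (original command, status); command is None if unknown."""
        command = self.in_flight.pop(response["id"], None)
        return command, response["status"]
```

With the id echoed back, an error can be pinned on the exact command that caused it, even if responses arrive out of order.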

It doesn’t look like Starbound has message ids or any other solution to this problem — though StarryPy doesn’t document the protocol well enough for me to be sure. The server just sends a stream of stuff it thinks is important, and when it gets a request from the client, it queues up a response to that as well. It’s TCP, so the client should get all the right messages, eventually. Some of them might be slightly out of order depending on the order the client does stuff, but that’s not a big deal; anyway, the server knows the canonical state.

Some thoughts

I bring up IRC because I’m kind of at the limit of things that I know. But one of those things is that IRC is simultaneously very rickety and wildly successful: it’s a decade older than Google and still in use. (Some recent offerings are starting to eat its lunch, but that’s really because clients are inaccessible to new users and the protocol hasn’t evolved much. The problems with the fundamental design of the protocol are only obvious to server and client authors.)

Doom’s cheery assumption that the game will play out the same way for every player feels similarly rickety. Obviously it works — well enough that you can go play multiplayer Doom with exactly the same approach right now, 24 years later — but for something as complex as an FPS it really doesn’t feel like it should.

So while I don’t have enough experience writing multiplayer games to give you a run-down of how to do it, I think the lesson here is that you can get pretty far with simple ideas. Maybe your game isn’t deterministic like Doom — although there’s no reason it couldn’t be — but you probably still have to save the game, or at least restore the state of the world on death/loss/restart, right? There you go: you already have a fragment of a concept of entity state outside the actual entities. Codify that, stick it on the network, and see what happens.
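
For instance, a save/restore pair already gives you something you could push over a socket. This is a minimal sketch in Python with an invented `Entity` type and JSON as the wire format, both of which are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Entity:
    name: str
    x: float
    y: float
    health: int


def save_world(entities):
    """Turn the live entities into a string of plain state."""
    return json.dumps([asdict(entity) for entity in entities])


def load_world(blob):
    """Rebuild entities from state produced by save_world."""
    return [Entity(**fields) for fields in json.loads(blob)]
```

The string from `save_world` works equally well written to a save file or sent to another player, who can call `load_world` on it to get the same world.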

I don’t know if I’ll be doing any significant multiplayer development myself; I don’t even play many multiplayer games. But I’d always assumed it would be a nigh-impossible feat of architectural engineering, and I’m starting to think that maybe it’s no more difficult than anything else in game dev. Easy to fudge, hard to do well, impossible to truly get right so give up that train of thought right now.

Also now I am definitely thinking about how a multiplayer puzzle-platformer would work.

Weekly roundup: Business as usual

Post Syndicated from Eevee original https://eev.ee/dev/2017/04/26/weekly-roundup-business-as-usual/

  • art: I bought a tablet — like, one with a screen, not a graphics tablet. But it’s also a graphics tablet, with Wacom pen pressure and everything. So now I can draw in bed, and have been doing some of that. Results are mixed; drawing on a screen is pretty weird, and undo is not as readily accessible.

    I continued touching up my Lexy sprite from Isaac’s Descent — working on the walk cycle now. Not the most exciting work, but it looks much better.

    I also drew a snake sipping coffee, for reasons.

  • book: I finally got through collision detection and some physics. This thing is really coming along now, and it’s cool to see it develop! I doubt I’ll finish the first chapter this month, alas — turns out there is a whole lot of groundwork to lay even for my tiny baby first game.

  • blog: Finalized a Sitepoint article. Did some more work on a new homepage, which is finally nearing completion, I hope.

  • spline: Wow, haven’t touched this in a while. Started implementing editing things, because glip has needed that for forever. Didn’t finish.

  • lunar depot 38: I participated in Ludum Dare 38! This time I teamed up with glip to do the Jam, so we spent three days scrambling to make a game together. (Spoilers for next week’s roundup: the game is Lunar Depot 38.)