Lately, I’ve been working on a multiplayer game protocol for a game I’m writing in Odin. It’s in its early stages, as I’m going step by step through Glenn Fiedler’s excellent Building a Game Network Protocol blog series. Currently, I have a working implementation that covers the Serialization Strategies post, and I’ve already encountered several interesting topics. As the title of this post suggests, we’re going to take a look at bit packing here. Many of the points I’m making come from that blog series, and my implementation is mostly an Odin port of what Glenn shows in C++. The main purpose of this blog post is to dig deeper into some of the code from that blog series. You can find the current implementation of my project here: game-networking.
Just to be clear, I am not an expert in networking, game networking protocols, or bit packing. I’m simply trying to explain what I know and have learned through this process. If you discover any errors in my explanations or anywhere else, then please let me know!
Motivation
You might be wondering what bit packing has to do with building a multiplayer game protocol. The short answer is latency, but let’s dive deeper to understand the connection.
For networking in games and many other real-time applications, only the latest state of the “world” or “simulation” is relevant. This is inherently different from information transferred over an HTTP request, where it’s critical that every bit of information is received by the receiver in the correct order.
HTTP runs on top of another protocol called TCP, which is designed to guarantee that every bit of the request you send will be received on the other end in the intended order. Imagine sending a TCP packet every second for one minute, where the payload contains an incrementing number. The receiving side should receive 60 packets with the same incrementing numbers in order. TCP guarantees this even if the network between the sender and receiver is unstable or drops out. While this is great for reliably downloading files, browsing the web and so on, it’s not ideal for games because it introduces unwanted latency.
Why UDP?
So, what’s the alternative? UDP (User Datagram Protocol). In contrast to TCP, UDP provides no guarantees about whether a packet will be received or in what order it will arrive. It does offer some basic error detection through a checksum, similar to TCP, but it’s not very strong. Many developers append their own error detection codes, such as a CRC32. Why is this better than TCP? While it’s true that we have to handle more ourselves, UDP gives us the flexibility we need to minimize latency as much as possible.
The Importance of Latency
Every gamer knows that those precious milliseconds matter, especially in highly competitive games and for the overall smoothness and quality of the multiplayer experience. Few things ruin a multiplayer game as quickly as bad network lag and desynchronization.
To reduce latency, there are two core principles: send as little data as possible, and send the freshest data possible. TCP suffers most in the “freshness” department. To uphold its guarantees, TCP might have to retransmit old packets containing information about game states that occurred many frames, or even seconds ago if the network is unstable. This is far from ideal for a multiplayer gaming experience.
That said, some games have successfully implemented multiplayer network protocols over TCP, so it’s not entirely inapplicable. However, it’s important to understand the trade-offs involved.
Bit Packing
Now, let’s address the “sending as little data as possible” part. This is where bit packing comes into play. Sending less data is important for several reasons. It enables us to send fewer packets over the network, reducing the chances of packet loss. It also puts less pressure on the network bandwidth required between the sender and receiver.
By bit packing, we can ensure that we pack the information we need to send and receive as tightly as possible, with no wasted bits. To begin outlining the bit packing, we need to consider its main purpose and our requirements. For game networking, we know that we’ll need to write bytes onto a UDP socket. We also know that we want those bytes to be as packed with information as possible, meaning we want as few dead bits (useless 0s) as we can manage. I highly recommend reading the first two posts in the Building a Game Network Protocol series before continuing. They go into more detail in some areas that I won’t cover, while my focus here will be to deeply explain the details of bit packing and serialization code, with the twist that it’s written in Odin instead of C++.
So, what do we need to write something onto a UDP socket? The first thing to look at is the socket API. If we check the man page for socket, we’ll see that it mentions different socket types and the send syscall. Looking at the send man page, we can see that there are send and sendto syscalls, both of which take a void buf[len] as one of the arguments. This is essentially a byte buffer. There are more details to this, but this is what we need to know for now.
Since we’re writing this bit packer in Odin, we won’t directly call syscalls or use C for this. Let’s take a look at the Odin net package instead. The relevant procedures in the net package are:
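The two we care about are send_udp and recv_udp. Paraphrased from memory, so the exact parameter names and error types may differ from the current core:net, they look roughly like this:

```odin
// Paraphrased from core:net; exact names and error types may differ.
send_udp :: proc(skt: UDP_Socket, buf: []u8, to: Endpoint) -> (bytes_written: int, err: Network_Error)
recv_udp :: proc(skt: UDP_Socket, buf: []u8) -> (bytes_read: int, remote_endpoint: Endpoint, err: Network_Error)
```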
As we can see, both the send and receive UDP procedures take a byte slice []u8 as an argument. You can think of the send procedure as the writing side, and the recv procedure as the reading side. These are exactly the two sides our bit packing will have as well: we’ll have a byte buffer that we can write to and read from.
One important thing to mention about the bit packing implementation we’re writing is that we’re sending and receiving packets and data to and from potentially malicious actors. This is crucial context to keep in mind, as it places restrictions on what we can do in our implementation. In many cases, the trivial solution would be to just do a memcpy and call it a day. But in our case, where we can’t necessarily trust the bytes we would be copying, that’s a bad idea. We need to ensure that when reading, the data format is what we expect and is legal. If it’s not, we need to handle those cases as they arise. One last thing before we start implementing stuff: we’re gonna split the bit packing into two parts, a Bit Writer and a Bit Reader.
The Bit Writer
Finally we’re at what this blog post was supposed to be about. I hope it hasn’t been too bad so far, but I think the context is important. Let’s take a look at our Bit Writer struct then.
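Here’s a sketch of the Bit_Writer struct based on the fields we’ll discuss below; the exact field types are my assumption, and the real implementation in the repo may differ slightly.

```odin
Bit_Writer :: struct {
	buffer:       []u32, // the words we write into
	scratch:      u64,   // temporary storage for bits before a full word is flushed
	scratch_bits: u32,   // how many bits are currently sitting in scratch
	word_index:   u32,   // index of the word in buffer we're currently writing to
	num_bits:     u32,   // total capacity of the buffer in bits, len(buffer) * 32
	bits_written: u32,   // total number of bits written so far
}
```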
Alright, not too bad, but there are some weird things going on here. The buffer field makes sense, but why is it a slice of type []u32? We’re supposed to write bytes, not unsigned integers!
The main idea behind this is that modern CPUs like it when things are 4-byte / 32-bit (word) aligned. So when we write to the buffer, we’ll write 4 bytes / 32 bits at a time instead of a single byte at a time.
So we’re doing this in the hope of better performance, but we’ll pay for it with some added complexity. Now, I’ve done no benchmarking of this solution compared to one that just writes a byte at a time, so I am relying on Glenn Fiedler’s word here that it’s a good idea. It makes for an interesting implementation at least! Okay, we’re assuming it’s a good idea to write a word at a time, and we can always transmute the []u32 into a []u8 when we need to hand it to the socket, so that should be fine.
But how do we actually do the write one word at a time part? Let’s look at some different basic types and think a bit about it.
- u32 - This is the trivial case for us; it fits perfectly into the u32 values we’re writing.
- i32 - This is a signed integer, but that doesn’t really matter, since we’re not interested in the difference between the types, just that we can pack the bits tightly, which we can easily do in this case.
- u8 - Now this becomes a bit more problematic. There’s room for four u8s in a u32; if we only write one, then we’re only utilizing 25% of the word, not exactly fulfilling our goal of having as few dead bits as possible.
- u64 - Well, this doesn’t fit, does it? We’ll have to split it up into two separate u32 values.
- A Monster struct (sketched just below), which not only doesn’t fit into one u32, but also doesn’t align with a multiple of a u32. And we can’t just write the first 32 bits, then the last 8 bits, and then waste 24 bits.
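Since the Monster struct shows up throughout the rest of this post, here’s an illustrative sketch of it. The field names match what I use later; the exact definition is just for this example.

```odin
// 16 + 8 + 8 + 8 = 40 bits of payload: doesn't fit in one u32
// and doesn't line up with a 32-bit boundary.
Monster :: struct {
	id:      u16,
	health:  u8,
	stamina: u8,
	size:    u8,
}
```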
Alright, obviously the other fields in the Bit_Writer struct have something to do with this. The two fields that stick out the most are probably scratch and scratch_bits. The others can make sense without knowing a lot more about how the Bit_Writer works. The word_index is exactly what it sounds like: the index of the current word we’re writing to, important for keeping track of where we are. num_bits is the total number of bits in the buffer, essentially len(buffer) * 32, but for convenience we’ll keep it as a field. And at the end there is bits_written, which is used to keep track of how many bits we’ve written; this will be used in a later post when ensuring alignment and so on. We’ll see how the scratch and scratch_bits fields are used through examples.
We need to think a bit more about what it means to pack things as tightly as we can. This essentially means that we want to write only the bits that we need to represent the value we want. Think of the number 14; in binary that’s 1110, which means that we need 4 bits to represent it. Note that it cannot be represented in 3 bits, even though the 0th bit is 0, in the same way that we need the 0 in the decimal number 10 (ten) to represent it correctly.
So for packing purposes we only want to write these 4 bits, no more, since that’s all that is needed. This gives us a hint about how the write procedure for the Bit_Writer should look. It needs to take the value it should write, and how many bits it should write. Here’s the signature of the write_bits procedure for the Bit_Writer.
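The parameter names here are my choice, but the shape follows what’s described above.

```odin
write_bits :: proc(writer: ^Bit_Writer, value: u32, bits: u32) -> bool
```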
For those of you that are unfamiliar with Odin, the ^ before the Bit_Writer means that writer is a pointer to a Bit_Writer. Also, we’re returning a bool here as part of the error handling.
Alright, this looks good: we can give it a u32 value, specify how many bits to use, and it returns a bool indicating whether it succeeded. Seems good.
Now, how do we deal with the cases we looked at earlier? Let’s take the case of writing a u8 to begin with.
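A minimal sketch of how that could look, assuming a hypothetical helper named write_u8 that simply delegates to write_bits:

```odin
write_u8 :: proc(writer: ^Bit_Writer, value: u8) -> bool {
	// Upcasting to u32 is always safe, and a u8 needs at most 8 bits.
	return write_bits(writer, u32(value), 8)
}
```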
This turns out to be pretty easy. Remember, we always know which type we’re writing, so in the case of a u8 we can just cast it to a u32, which is safe since we’re upcasting, and then we just write 8 bits. Pretty nice. But what if the value of the u8 does not need all 8 bits? Well, that requires a bit more finicky handling that we’ll deal with in a later post; for now we need to get the foundations down. In general, though, we’ll follow this method: anything that’s smaller than a u32 we’ll cast to a u32 and write with the bits that are needed. If it’s larger than 32 bits, we’ll have to break it up into chunks that can each fit into a u32. Also, it’s not that we’ll always be perfectly efficient and never waste a single bit; that can be pretty difficult sometimes.
Let’s take a look back at the case of the Monster struct. If we wanted to write that using the write_bits procedure, this is how I’d like it to look.
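A sketch of that, with the bit counts following the Monster fields from earlier; failures from write_bits are simply propagated.

```odin
write_monster :: proc(writer: ^Bit_Writer, monster: Monster) -> bool {
	if !write_bits(writer, u32(monster.id), 16)     { return false }
	if !write_bits(writer, u32(monster.health), 8)  { return false }
	if !write_bits(writer, u32(monster.stamina), 8) { return false }
	if !write_bits(writer, u32(monster.size), 8)    { return false }
	return true
}
```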
I can just pass each of the fields as the value parameter, since they can each fit into a u32. The problem is that together they do not fit into a single u32 in the buffer. The thing that makes sense is to write the first 32 bits of the struct into one word in the buffer, then the last 8 bits into the next word in the buffer. But it’s not as easy as it sounds: we need to keep track of the bits we’ve written, and if we’ve gone past the 32-bit boundary of a word we need to write those bits into the next word. Keeping track of how many bits we’ve written is the purpose of the scratch_bits field, and the scratch is meant to be temporary storage where we can collect the bits before we write them all together into a word in the buffer. In addition, we’re propagating the boolean value from write_bits so that whatever called the write_monster procedure knows whether we were able to write it successfully or not.
This part includes some bit masking, so if you don’t know what that is, I suggest quickly reading up on it. It’s a pretty simple case for us here, and I’ll try to explain it as well as I can.
The first issue we have when implementing the write_bits procedure is: how can we actually get the n bits from the value that’s passed into the procedure? In the example of the monster_id, we’re passing it as a u32 but it’s only a u16, and we’re specifying that we’re writing 16 bits. Which means we want to get the 16 lowest bits of the 32-bit value. We can do this with something called a bitmask, and to create the mask we’re gonna do a bit shift trick. We’re working on a little-endian system, and we’ll write binary numbers with the Least Significant Bit furthest to the right. The trick we’ll do to create the mask is the following.
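A sketch of the trick; I’m computing the mask in u64 here so that the bits == 32 case also works without relying on shift-overflow behaviour, which is an assumption on my part about how to keep the example safe.

```odin
mask := u32((u64(1) << u64(bits)) - 1) // e.g. bits = 16 gives 0x0000_FFFF
masked_value := value & mask           // keep only the lowest `bits` bits
```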
Let’s start with the inner 1 << bits part. This means that we’re shifting the value 1 left bits number of times; in our case of writing the monster_id, bits = 16. The value 1 can be represented in binary as the 32-bit value:
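```
0b0000_0000_0000_0000_0000_0000_0000_0001
```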
Let’s then left shift 1 by 16 bits:
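```
1 << 16 = 0b0000_0000_0000_0001_0000_0000_0000_0000
```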
Now this alone does not help us with getting the 16 lowest bits. For that we would like the bitmask, so that we can easily bitwise AND (&) it with the monster_id value to get the lowest 16 bits. This is where subtracting 1 comes in.
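```
(1 << 16) - 1 = 0b0000_0000_0000_0000_1111_1111_1111_1111
```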
If you’re not familiar with binary this might not make sense, but think of it like having the value 1000 in decimal, subtracting 1, and getting 999; in binary it’s all ones instead of nines. This binary value is stored in the mask variable. To actually get the 16 bits we want, we need to & the value variable that’s passed into write_bits with the mask.
The nice thing about what we did here is that it will work for all values of the bits parameter as long as it’s between 0 and 32, something we’ll have to ensure is checked later.
Our implementation of the write_bits procedure now looks like this.
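A sketch of where we are so far; the details are my own choices and the real code in the repo may differ slightly.

```odin
write_bits :: proc(writer: ^Bit_Writer, value: u32, bits: u32) -> bool {
	// Keep only the lowest `bits` bits of the value.
	mask := u32((u64(1) << u64(bits)) - 1)
	masked_value := value & mask

	_ = masked_value // masked_value will actually be used in the next step

	return true
}
```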
The next step is to ensure that we can accumulate values up until we have 32 bits ready to be written into the buffer. To make this happen we’ll finally start using our scratch and scratch_bits fields.
Again, the idea of the scratch is precisely to accumulate these values until we’re ready to write them to the buffer, so let’s see how we are going to do that.
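Extending the sketch with the scratch accumulation:

```odin
write_bits :: proc(writer: ^Bit_Writer, value: u32, bits: u32) -> bool {
	mask := u32((u64(1) << u64(bits)) - 1)
	masked_value := value & mask

	// OR the masked bits into the scratch, above whatever has been written so far.
	writer.scratch |= u64(masked_value) << writer.scratch_bits

	return true
}
```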
Here we’ve added the line writer.scratch |= u64(masked_value) << writer.scratch_bits, and there are a couple of things happening. The |= operator can be thought of as +=, except that we bitwise OR the value on the left with the value on the right and then store the result in the value on the left. The purpose of doing this bitwise OR is that we want to write the bits set in u64(masked_value) << writer.scratch_bits into the writer.scratch field.
This looks kind of like what we want. We could maybe think that all we need to do is writer.scratch |= u64(masked_value), but that’s not the case. The reason we need to left-shift the masked_value by writer.scratch_bits is that that’s how many bits we’ve written into the scratch so far. Think about the case of our Monster struct and the write_monster procedure we came up with earlier; let’s take a look at it here again for clarity.
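```odin
// The write_monster sketch from earlier, repeated for reference.
write_monster :: proc(writer: ^Bit_Writer, monster: Monster) -> bool {
	if !write_bits(writer, u32(monster.id), 16)     { return false }
	if !write_bits(writer, u32(monster.health), 8)  { return false }
	if !write_bits(writer, u32(monster.stamina), 8) { return false }
	if !write_bits(writer, u32(monster.size), 8)    { return false }
	return true
}
```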
If we start out with the first write_bits call here, we’d write the monster.id value and 16 bits. For the next call to write_bits we’d write the monster.health value and 8 bits. If we did not left-shift by the number of bits we’ve written so far, we’d just overwrite however many bits in the scratch we’re specifying in the write_bits call. This shows that we now need to keep track of those bits written in scratch_bits, so we extend the procedure.
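Extending the sketch to track the bits:

```odin
write_bits :: proc(writer: ^Bit_Writer, value: u32, bits: u32) -> bool {
	mask := u32((u64(1) << u64(bits)) - 1)
	masked_value := value & mask

	writer.scratch |= u64(masked_value) << writer.scratch_bits

	// Track both how many bits sit in the scratch and how many we've written in total.
	writer.scratch_bits += bits
	writer.bits_written += bits

	return true
}
```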
So far this makes sense, except: why do we need both the scratch_bits and the bits_written fields? Well, it’s because we need to do some resetting of scratch_bits for every word, while bits_written will keep track of the total number of bits we’ve written so far. We’ll use bits_written in a later post.
So let’s think through this, again with the write_monster procedure as reference. We write the monster.id and 16 bits, then the scratch should keep whatever value that was in its lower 16 bits, and scratch_bits and bits_written should both be 16. Next we write the monster.health value. No problem: the scratch will contain the monster.health value in bits [16, 23], since we start at bit 0. The scratch_bits will have the value 16 + 8 = 24, and bits_written will have the value 16 + 8 = 24. Alright, let’s do the stamina pretty quickly. It’s stored in bits [24, 31] of the scratch, scratch_bits = 32 and bits_written = 32.
Now we’re perfectly aligned with a word (32-bit) boundary, so we should write this to the buffer. So let’s do exactly that before we continue writing the last monster.size field in the write_monster procedure.
At a high level, what we need to do is check whether we’ve reached the 32-bit boundary, and then somehow write the bits in the scratch into the buffer.
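Here’s the sketch with the word flush added; the check and the 0xFFFF_FFFF mask are what the next paragraphs walk through.

```odin
write_bits :: proc(writer: ^Bit_Writer, value: u32, bits: u32) -> bool {
	mask := u32((u64(1) << u64(bits)) - 1)
	masked_value := value & mask

	writer.scratch |= u64(masked_value) << writer.scratch_bits
	writer.scratch_bits += bits
	writer.bits_written += bits

	if writer.scratch_bits >= 32 {
		// Take the lower 32 bits of the scratch and store them in the current word.
		writer.buffer[writer.word_index] = u32(writer.scratch & 0xFFFF_FFFF)
	}

	return true
}
```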
In this implementation we check if the scratch_bits is equal to or larger than 32, which means that we’re either perfectly on the boundary, or we have written past it, which obviously can happen. Think about a hypothetical case where the scratch_bits is 28 and we write another 8 bits, making scratch_bits = 28 + 8 = 36. This is essentially 4 bits into the next word, so that’s something our implementation also needs to deal with, but the newly added lines don’t totally cover it yet. We can see that we take the lower 32 bits of the scratch by ANDing it with the hex mask 0xFFFF_FFFF, which is the same as 0b1111_1111_1111_1111_1111_1111_1111_1111. This is actually the reason for my 4-spaced formatting of binary numbers: one hex digit is 4 bits, which makes it easy to see. Then we take those lower 32 bits and write them into the buffer at the index writer.word_index. This is the field that keeps track of which word we’re supposed to write to.
Okay, we’re almost at the end of implementing write_bits. The last things that remain are to ensure that we reset the state of the scratch and scratch_bits, and of course to increment the word_index. This is how we’ll do it.
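The sketch with the reset added:

```odin
write_bits :: proc(writer: ^Bit_Writer, value: u32, bits: u32) -> bool {
	mask := u32((u64(1) << u64(bits)) - 1)
	masked_value := value & mask

	writer.scratch |= u64(masked_value) << writer.scratch_bits
	writer.scratch_bits += bits
	writer.bits_written += bits

	if writer.scratch_bits >= 32 {
		writer.buffer[writer.word_index] = u32(writer.scratch & 0xFFFF_FFFF)
		writer.word_index += 1
		// Keep any bits that spilled past the 32-bit boundary for the next word.
		writer.scratch >>= 32
		writer.scratch_bits -= 32
	}

	return true
}
```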
This addition is not too complicated. Like I said, we increment the word_index, then we right-shift the scratch by 32 bits. There is a pretty subtle but important detail about that: we can’t just set the scratch to 0, because remember, if we write past the 32-bit boundary, we need to keep those bits around for the next word. That is what this does. In the case we looked at, where we had written 36 bits before the if writer.scratch_bits >= 32 would trigger, this makes sure that we keep those 4 bits. This is the same reason we’re subtracting 32 from scratch_bits instead of setting it to 0: we need to know that we have written 4 bits into the next word. And that’s pretty much it! Well, except error handling of course, let’s add that.
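And finally, a sketch with the checks added; the exact conditions in my repo may be stricter, but these follow the description below.

```odin
write_bits :: proc(writer: ^Bit_Writer, value: u32, bits: u32) -> bool {
	// We can never write more than 32 bits at a time.
	if bits > 32 {
		return false
	}
	// Writing 0 bits is always allowed and does nothing.
	if bits == 0 {
		return true
	}
	// Don't write beyond the number of bits the buffer can hold.
	if writer.bits_written + bits > writer.num_bits {
		return false
	}

	mask := u32((u64(1) << u64(bits)) - 1)
	masked_value := value & mask

	writer.scratch |= u64(masked_value) << writer.scratch_bits
	writer.scratch_bits += bits
	writer.bits_written += bits

	if writer.scratch_bits >= 32 {
		writer.buffer[writer.word_index] = u32(writer.scratch & 0xFFFF_FFFF)
		writer.word_index += 1
		writer.scratch >>= 32
		writer.scratch_bits -= 32
	}

	return true
}
```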
These checks mostly verify that we’re not trying to write an illegal number of bits. If we’re writing 0 bits, we can just return true immediately; the idea is that we should always be allowed to write 0 bits. It doesn’t really make sense, but returning false isn’t really correct either. The last check is the most complicated one: it essentially checks that we’re not gonna write beyond the number of bits that the buffer can hold.
This is it for the write_bits procedure; we’ll probably not touch this for a really long time now, if at all. It’s also a really important piece to understand, since it’s a major workhorse in the bit packing and eventually the game network protocol. It’s worth noting that we should write a lot of tests for this, which I have in my implementation, but that’s too tedious to deal with here. Let’s continue with the read_bits procedure; we should be able to do that one pretty quickly now, since it’s basically doing the reverse of writing. At the end we’ll look at the flush_bits procedure, something that’s needed to make sure that we’ve written all the bits that might be left in the scratch to the buffer.
The Bit Reader
The Bit Reader is very similar to the Bit Writer. In fact, it has essentially the same fields in its structure, and it’s used pretty much the same way, except we’re reading from the buffer instead of writing into it.
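A sketch of the Bit_Reader struct; I’m assuming a bits_read counter in place of the writer’s bits_written, matching how it’s used below.

```odin
Bit_Reader :: struct {
	buffer:       []u32, // the words we read from
	scratch:      u64,   // bits pulled in from the buffer, not yet handed out
	scratch_bits: u32,   // how many unread bits are left in scratch
	word_index:   u32,   // index of the next word in buffer to read from
	num_bits:     u32,   // total capacity of the buffer in bits
	bits_read:    u32,   // total number of bits read so far
}
```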
Let’s take a look at the implementation, which is also quite similar but has some important distinctions, naturally.
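A sketch of read_bits, following the walkthrough below; as with the writer, the parameter names and the exact checks are my own choices.

```odin
read_bits :: proc(reader: ^Bit_Reader, bits: u32) -> (value: u32, ok: bool) {
	// The same kind of sanity checks as the writer.
	if bits > 32 {
		return 0, false
	}
	if bits == 0 {
		return 0, true
	}
	if reader.bits_read + bits > reader.num_bits {
		return 0, false
	}

	// Not enough unread bits in the scratch: pull in the next word from the buffer.
	if reader.scratch_bits < bits {
		reader.scratch |= u64(reader.buffer[reader.word_index]) << reader.scratch_bits
		reader.scratch_bits += 32
		reader.word_index += 1
	}

	// Read out the lowest `bits` bits, then shift them out of the scratch.
	mask := (u64(1) << u64(bits)) - 1
	value = u32(reader.scratch & mask)
	reader.scratch >>= bits
	reader.scratch_bits -= bits
	reader.bits_read += bits

	return value, true
}
```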
The signature looks very similar, except now we’re returning multiple values: the value we’re reading from the buffer and whether that read was successful. The bits here is now how many bits we’re gonna read from the buffer.
We have a lot of the same checks to ensure that the Bit_Reader is in a state where it makes sense to read from. For the details, it’s easier if we use a concrete case like we did with the Monster struct and write_monster for the Bit_Writer. So let’s make a read_monster procedure now, and think through how it would go.
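Here’s a sketch of read_monster, mirroring write_monster; the helper shape and struct literal are my own choices.

```odin
read_monster :: proc(reader: ^Bit_Reader) -> (monster: Monster, ok: bool) {
	id,      id_ok      := read_bits(reader, 16)
	health,  health_ok  := read_bits(reader, 8)
	stamina, stamina_ok := read_bits(reader, 8)
	size,    size_ok    := read_bits(reader, 8)

	if !(id_ok && health_ok && stamina_ok && size_ok) {
		return Monster{}, false
	}

	monster = Monster{
		id      = u16(id),
		health  = u8(health),
		stamina = u8(stamina),
		size    = u8(size),
	}
	return monster, true
}
```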
When we read the id field in the read_monster procedure, our scratch_bits, scratch and word_index are all initialized to 0. So scratch_bits is definitely smaller than bits, and we immediately go into that part of the code. There we fill the scratch with the current word from the buffer, left-shifted by the amount of scratch_bits; for the case of the id field, this shift will be 0. We’ll understand better why this left-shift is needed soon.
Next, we add 32 to the scratch_bits, so in our case scratch_bits now holds the value 32. Then we increment the word_index by 1, since we have read a whole word into the scratch, so that makes sense.
After the if section we create the bit mask, the same way we did for the writer, but now we use it to retrieve the lowest bits (the number we asked for) from the scratch instead of writing into it.
Then we right-shift the scratch by bits and store that into the scratch again. This is done so that the lower bits we’ve already read are now overwritten by the remaining upper bits; this is important.
Next we subtract the number of bits we’ve read from the scratch_bits, so in our case of reading the id field, scratch_bits will now hold the value 16, since 32 - 16 = 16. For the case of the reader, we can think of scratch_bits as a value that tells us how many bits we have left to read before we reach the next word, where 0 means that we have reached the next word boundary. Finally we increment bits_read, and return the value and the boolean true for success.
Let’s quickly do a couple more iterations to get to another interesting case. The next field in the Monster struct is the health field, which is a u8, so we need to read 8 bits. This time scratch_bits is larger than bits, so we skip the if and just get the value out as normal, subtract the bits, increment bits_read, and return true. Next up is the stamina field, also a u8, so we read 8 bits. This time scratch_bits and bits are exactly equal, so we skip the if block this time around as well; it’s also worth noting that this is exactly at the next word boundary.
Finally we read the size field, which is also a u8, but now we’re back at scratch_bits being smaller than bits, so we need to do that whole thing again. We read the word from the current word_index, and this is important to note: the word_index points at the next word to be read from. At the beginning it was 0, then we read all of that, now it’s 1, and we’ll increment it to 2 at the end of this if block, and add 32 to the scratch_bits. Then we do the usual end-of-procedure stuff.
The final thing we still don’t fully understand is what happens when things don’t align so well. Imagine a situation where scratch_bits is 8, so we’re 8 bits away from a word boundary, but we then read 16 bits: 8 bits are in the current scratch from the word we’ve been processing, and the next 8 bits are in the next word. In that situation we read in the whole word at the boundary and left-shift it by 8 bits. The purpose of the left-shift now is to place the incoming word above the lower scratch_bits bits we haven’t processed yet, so that they don’t get overwritten. Also, it’s worth noting that the scratch_bits in this case will hold the value 32 + 8 = 40. Then we read out the value using the mask, which will take the lower 16 bits. Then we right-shift, and we subtract bits from the scratch_bits again, making sure that we keep track of how many bits are left until the next word boundary, which is 40 - 16 = 24. Which makes sense, right? We were 8 bits from the boundary before beginning and we read 8 bits into the next word, so we should be 24 bits away from the word after that.
That’s it, that’s all we’ll say about the Bit_Reader, and it’s quite a lot for a small procedure, but as you can see, there is a lot going on with those bit manipulation tricks. I hope I have explained it well. Naturally this procedure also requires a lot of testing to build up trust in its correctness, which we’ll leave out from here.
Flushing
The last thing we have to do before both the writing and reading of bits is complete is to ensure that all the bits in the writer have been flushed. This is only a problem for the Bit_Writer and not the Bit_Reader. The reason is that what’s left in the scratch in the case of the Bit_Writer is bits we’ve specifically written into it, while for the Bit_Reader those bits are just whatever is left of the final word we’ve read from. Luckily for us, the flush_bits procedure is pretty simple.
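A sketch of flush_bits, assuming the same kind of legal-state check as write_bits; the real procedure in the repo may differ slightly.

```odin
flush_bits :: proc(writer: ^Bit_Writer) -> bool {
	if writer.scratch_bits > 0 {
		// Make sure there is still a word left in the buffer to flush into.
		if int(writer.word_index) >= len(writer.buffer) {
			return false
		}
		// Write whatever is left in the scratch into the current word.
		writer.buffer[writer.word_index] = u32(writer.scratch & 0xFFFF_FFFF)
		writer.word_index += 1
	}

	writer.scratch = 0
	writer.scratch_bits = 0
	return true
}
```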
Let’s briefly discuss this before packing things up (hehe). Firstly, we check if the writer is in a legal state, as usual. Then we check if scratch_bits is larger than 0; if it is, it means that we have something left in the scratch that we haven’t written to the buffer. If there’s something left, we mask out the lower 32 bits of the scratch and write them into the word in the buffer we’re currently writing to, and then increment the word_index. Lastly we set the scratch and scratch_bits to 0 and return true.
Now, I am a bit torn about the necessity of incrementing the word_index and resetting the scratch and scratch_bits, because after we have flushed we have introduced a synchronization issue between the Bit_Writer and the Bit_Reader. This is an important thing to note: we must always make sure that the writer side and the reader side expect the exact same number of bits in the same order. Which means that if only the Bit_Writer flushes, then we’ve broken that synchronization. Just imagine if the writer wrote the first 8 bits as the health, then the next 16 as the id, and the reader did the opposite. That’s gonna be chaos, but write_bits and read_bits won’t complain. This is why we’ll only use these as primitive procedures and build more advanced serialization procedures on top of them, which we can then write good tests for to ensure everything works as expected. A little foreshadowing to the next blog post maybe…