Getting Bounced for a Bad ID

A cautionary tale about choosing the wrong identifier.

Alex Strand

October 31, 2024 6 minute read

The gate attendant at Delta terminal in Oakland made the announcement that the flight was now boarding just as the flood of PagerDuty alarms started blowing up my phone.

I scrambled to open my laptop, sitting on the floor with the computer balanced on my knee as I quieted my phone. Slack notifications flew into action and confirmed that the entire production system had ground to a halt. Everything was broken. The engineer on-call pinged the incident channel with a summary that one of our most critical services, a service that nearly every other service depended upon, was failing write requests to the MySQL table that backed it.

“Every write is failing because we can’t generate new IDs.”

Damn it.

I’m All Outta ints

The ghost of one of the earliest design decisions for this system had come back to haunt the team many years later. If you’ve had even a passing familiarity with MySQL you’ve likely seen a getting started guide where the instructions tell you to create a table with a primary key column that looks something like:

CREATE TABLE books (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(255) NOT NULL
);

In particular the part of this CREATE statement we want to focus on is:

id INT AUTO_INCREMENT PRIMARY KEY

Let’s tease apart what this is doing:

Create a field named id
The id field should be an INT type. The INT type in MySQL defaults to 4 bytes and unless otherwise specified is signed. The maximum value for a signed 32-bit integer is 2,147,483,647.
The AUTO_INCREMENT clause means the id field will automatically increase by 1 on every insert.
In summary: id is a 32-bit integer which uses a monotonically increasing sequence to increment on each insert.

If you’re sensing that this MySQL table is going to be really unhappy when you try to create more than 2,147,483,647 rows: trust your intuition.

It’s easy in hindsight to see that overflowing the primary key for the table was going to be a problem, but if you hadn’t experienced that type of scale before you probably wouldn’t think of it. Many applications never reach this level of scale. Until they do. And it’s hard to cast judgement on the developers who use these defaults. At the time of writing there are over 88,000 files on GitHub matching the exact statement above, and that’s just public code!

Prepare for Takeoff

Choosing the right identifier for a critical table isn’t on the top of anyone’s list of biggest concerns when they start writing an application. That said, this seemingly minor design decision can be tricky to unwind months or years later. Many relational databases require taking an exclusive lock on the table while altering a column type, which means blocking reads and writes and thus taking an outage. And very large tables can take minutes or hours to update. Ouch.

One of the design principles we have for StatelyDB is to choose the best operational defaults so you can operate at scale without losing sleep. We believe that all of the collective lessons we’ve learned should be embedded into the database so you can benefit from other people’s mistakes without having to learn them yourself at the worst possible time.

We’ve personally lived a collection of horrors related to operating services where someone had picked an identifier that seemed reasonable at the time but later turned out to be not so great. And that makes a lot of sense because often times a new system starts with a developer designing a data model that solves the problems they had at the early stages of a project. And then that data model gets cemented forever, because it’s one of the most painful things to change.

In StatelyDB we offer a few options for generating identifiers for you. We call these “initial values” and are designed to be a smart default when you want the database to choose for you.

Here’s what the MySQL table above would look in StatelyDB as an Elastic Schema definition:

import { itemType, uuid, string } from "@stately-cloud/schema";

itemType("Book", {
  keyPath: "/books-:id",
  fields: {
    id: {
      type: uuid,
      initialValue: "uuid",
    },
    title: { type: string },
    author: { type: string }
  },
});

Unpacking this a bit:

id is a UUID type, with an initialValue of UUID. This means we want StatelyDB to generate a new UUID for each Item when it is created, unless one is manually provided. The UUID type is a custom data type that is backed by a list of 16 bytes.
The keyPath for this Item includes the id field and an example Key Path would look like /books-GvpCWqiQTa6qpfBHR-ktZw. Our UUIDs are encoded as binary in key paths to save space.

We like UUIDs for their global uniqueness, readability and adding entropy to the partition space. You’ve got several options for initial value:

sequence gives you the classic monotonically-increasing number you’re familiar with when coming from the relational database world, but it’s unique only within its parent path. Pairs well with the uint64 type.
uuid provides a globally unique, 128-bit UUIDv4. Pairs only with the uuid or bytes types.
rand53 provides a random 53-bit numeric value that pairs well with uint64. Why 53 bits? Developers using Javascript will appreciate staying within the maximum-safe Number limit. Like sequence, this one is only guaranteed unique within its parent path.

Takeaway

Look, picking the right identifier might seem like a tiny detail when you’re starting a new project. But that innocent-looking INT type might be the right choice today, but the wrong choice months from now. The structure of your data model can be really painful to change later in the life of an application, especially if you’re using a relational database. And when it breaks, it’s gonna hurt.

That’s exactly why we’re building StatelyDB to be flexible. Our Elastic Schema lets you change your mind later, so you don’t paint yourself into a corner with bad identifiers. And since we’ve made mistakes too, we’ve made it easy in StatelyDB to easily use UUIDs or unsigned 64-bit integers that will provide the room your application needs to grow. But whether you’re using our database or something else, take it from someone who learned this the hard way: spend a few extra minutes thinking through your identifier strategy. Your future self will appreciate not getting paged during boarding call.

Share Share Post Copy

Getting Bounced for a Bad ID

I’m All Outta ints

Prepare for Takeoff

Takeaway

Latest from our blog

Using DynamoDB today? Want to try StatelyDB?