Encoding Sortable Binary Database Keys

When building applications that need to store hierarchical data efficiently, developers often reach for patterns like DynamoDB’s single-table design, where you organize data into grouped, sortable lists that can be queried in a single request.

Consider a social media application where you store users, posts, comments and reactions in one DynamoDB table. In a single-table design, you structure your keys so that you can query all the posts and comments for a single user at once, perfect for displaying a user’s timeline in your app.

All the data is partitioned together by user, and there’s a natural hierarchy. Users have Posts which have Comments which have Reactions.

Your keys in DynamoDB would look like:

Partition key (all types): USER#4c9d36e5-6b19-4e6a-828c-226ed667458a
Sort keys:
- Post: POST#1234
- Comment: POST#1234#COMMENT#1678901234
- Reaction: POST#1234#COMMENT#1678901234#REACT#42

All the objects share the same user ID partition key. The structure of the sort keys means you can get everything for a user, or use prefix queries to get a single post and all of its comments and reactions in one query.

Unfortunately, this isn’t as simple as it might look. Notice that we have different data types stuffed into the same keys: strings, numbers (integers), even a UUID. Stuffing them all into a single string reveals a problem especially with the integers: they don’t sort correctly as strings. POST#10 sorts before POST#2 because DynamoDB uses lexicographic sorting, which cannot be changed. When you query a range of posts based on their IDs, you’ll get them out of order, and some of them will be missing entirely.

And because DynamoDB charges for items by the kilobyte (rounded up), we also want to keep the encoded key as small as possible.

At StatelyDB we solved this problem by building a special binary encoding that preserves the correct sorting, even when you mix different data types in a single key.

‍

StatelyDB Key Paths‍

StatelyDB key paths are easier to work with while following the same general idea as the keys you might create manually for a single-table design. They’re meant to feel a bit more like URLs or file system paths:

DynamoDB: [PK: USER#4c9d36e5-6b19-4e6a-828c-226ed667458a, SK: POST#1234#COMMENT#1678901234#REACT#42]
StatelyDB: /user-4c9d36e5-6b19-4e6a-828c-226ed667458a/post-1234/comment-1678901234/react-42

Key paths can contain three types of data:

Strings: "user", "post", "comment"
Integers: 1678901234, 42, -17
Binary data: UUIDs (16 bytes), arbitrary byte sequences

They express both the partition key and sort key when using the default DynamoDB backend. For /user-4c9d36e5-6b19-4e6a-828c-226ed667458a/post-1234/comment-1678901234, the first segment (/user-4c9d36e5-6b19-4e6a-828c-226ed667458a) becomes the partition key, and the remaining segments (/post-1234/comment-1678901234) become the sort key.

‍

Binary Key Encoding

StatelyDB’s DynamoDB backend always uses tables with binary key types, and we encode our key paths into binary. This approach handles all data types uniformly.

The encoded data has two parts. The actual key segments (user, 4c9d36e5-6b19-4e6a-828c-226ed667458a, post, comment, etc.) come first, with each segment separated by a 0x01 byte (taking the place of / or - in our string key path). A 0x00 byte separates the list of segments from the encoding metadata that describes each segment’s type:

{seg1}(0x01){seg2}(0x01)...{segN}(0x00){EncodingInfo1}...{EncodingInfoN}

The encoding metadata consists of protobuf-style varints (variable length integers, gotta save space) corresponding to each segment. Each varint uses three bits to indicate the type (string, binary, or integer) and the rest of the bits tell us the segment’s length in bytes. This metadata appears at the end of the encoded key so it doesn’t affect sorting. Otherwise, the length value would cause shorter strings to always sort before longer strings (e.g. “b” before “aaa”), which would be incorrect.

Within the encoded key, strings and binary data are trivially sortable lexicographically. UUIDs are stored as 16 bytes in binary. Why treat UUIDs as binary? We care about the size of keys, and a UUID string takes 36 bytes "550e8400-e29b-41d4-a716-446655440000", while the same UUID as binary takes only 16 bytes.

But what about integers?

‍

The Tricky Part: Sortable Integers

Consider these integers as strings: "1", "2", "10", "11". They sort lexicographically as "1", "10", "11", "2", which is not how you want integers to be sorted. We need an encoding where the binary representation of integers maintains numerical order when compared byte-by-byte.

We could represent all integers as full 8-byte unsigned integers, shifting signed integers into the unsigned space since two’s complement encoding also wouldn’t sort correctly in binary. However, we really want to minimize wasted space to reduce costs. We can do better by taking inspiration from varints, where smaller integers take less space. A number that fits in one byte should not waste 7 additional bytes of storage.

But standard varints don’t sort lexicographically either: you can’t compare the first byte of a long number with the first byte of a short number. For example, 129 (0x81 0x01 as varint) should sort before 256 (0x80 0x02), but the first bytes suggest the opposite order.

The key insight is to prefix the encoding with a “sort byte” that expresses both the number’s length in bytes and ensures correct lexicographic sorting:

Sort byte values for different number ranges
    Number Range     | Bytes | Sort Byte
  -2^64+1 to -2^56   |   8   |     0
  -2^48 to -2^56-1   |   7   |     1
       ...           |  ...  |   ...
   -255 to 0         |   1   |     7
      1 to 255       |   1   |     8
    256 to 65535     |   2   |     9
       ...           |  ...  |   ...
   2^56 to 2^64-1    |   8   |    15

The sort byte only uses 4 bits to encode the length and sign of the number. The other 4 bits are left at zero to make sure integers always sort before strings. After that we just have the nonzero big-endian bytes of the number. This creates perfect numerical ordering where negative integers sort before positive integers, and smaller integers (requiring fewer bytes) sort first. Let’s walk through some concrete examples:

Simple cases:

1 → 0x08 0x01 (sort byte 8 = “1-byte positive”, then value 1)
6 → 0x08 0x06
255 → 0x08 0xFF (largest 1-byte positive)

Multi-byte integers:

256 → 0x09 0x01 0x00 (sort byte 9 = “2-byte positive”, big-endian 256)
1234 → 0x09 0x04 0xD2 (1234 = 0x04 0xD2 in hex)

Negative integers require bit inversion:

-1 → 0x07 0xFE (~1 = 0xFE)
-2 → 0x07 0xFD (~2 = 0xFD)

Why invert the bits for negative integers? Without inversion, -1 would encode as 0x07 0x01 and -2 as 0x07 0x02, making -2 sort after -1. With bit inversion, -2 becomes 0x07 0xFD and -1 becomes 0x07 0xFE, so -2 correctly sorts first. We also treat zero as a negative number (encoding it as 0x07 0xFF) to ensure it doesn’t conflict with our 0x00 delimiter that separates segments from metadata.

Here are some more examples that make it clear how these both sort correctly and save space:

-(1<<64 - 1): (0000|0000) 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
-(1<<64 - 2): (0000|0000) 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x01
-(1<<56 + 1): (0000|0000) 0xFE 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFE
-(1<<56)    : (0000|0000) 0xFE 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
-(1<<56 - 1): (0000|0001) 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
-(1<<48 - 1): (0000|0010) 0x00 0x00 0x00 0x00 0x00 0x00
-(1<<40 - 1): (0000|0011) 0x00 0x00 0x00 0x00 0x00
-(1<<32 - 1): (0000|0100) 0x00 0x00 0x00 0x00
-257        : (0000|0110) 0xFE 0xFF
-17         : (0000|0111) 0xEE
-16         : (0000|0111) 0xEF
-2          : (0000|0111) 0xFD
-1          : (0000|0111) 0xFE
0           : (0000|0111) 0xFF
1           : (0000|1000) 0x01
2           : (0000|1000) 0x02
3           : (0000|1000) 0x03
16          : (0000|1000) 0x10
17          : (0000|1000) 0x11
256         : (0000|1001) 0x01 0x00
(1<<32 - 1) : (0000|1011) 0xFF 0xFF 0xFF 0xFF
(1<<40 - 1) : (0000|1100) 0xFF 0xFF 0xFF 0xFF 0xFF
(1<<48 - 1) : (0000|1101) 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
(1<<56 - 1) : (0000|1110) 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF
(1<<56)     : (0000|1111) 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(1<<56 + 1) : (0000|1111) 0x01 0x00 0x00 0x00 0x00 0x00 0x00 0x01
(1<<64 - 2) : (0000|1111) 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFE
(1<<64 - 1) : (0000|1111) 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF

The result is that all integers sort in numerical order when compared as binary data, even in the middle of a longer string that also mixes in other data types.

You might have also noticed that we can encode 65 bits of information here: a full 64 bits plus a sign. This means we can properly sort signed and unsigned integers together.

‍

Putting It All Together

Here is how we encode the full key path from our earlier example, which contains mixed data types:

Input: /user-4c9d36e5-6b19-4e6a-828c-226ed667458a/post-1234/comment-1678901234/react-42

Segments section:

"user"     0x75, 0x73, 0x65, 0x72
-          0x01
<uuid>     0x4c9d36e56b194e6a828c226ed667458a
/          0x01
"post"     0x70, 0x6F, 0x73, 0x74
-          0x01
1234       0x09, 0x04, 0xd2 (3-byte encoded integer)
/          0x01
"comment"  0x63, 0x6F, 0x6D, 0x6D, 0x65, 0x6E, 0x74
-          0x01
1678901234 0x0b, 0x64, 0x11, 0xff, 0xf2 (5-byte encoded integer)
/          0x01
"react"    0x72, 0x65, 0x61, 0x63, 0x74
-          0x01
42         0x08, 0x2A (1-byte encoded integer)
separator: 0x00
metadata:  8 encoding bytes

Comments now sort chronologically by timestamp, and post 42 correctly comes after post 9 but before post 100. And the key is 63 bytes in total (including the encoding metadata) vs. 80 bytes as a string.

‍

Conclusion

With some careful design, StatelyDB manages to provide a human-readable API layer with key paths that still sort intuitively in the database, avoiding a common pitfall of string key paths. Our philosophy is to absorb complexity like this ourselves rather than making every developer have to think about (and work around) how string sorting affects their query patterns.

StatelyDB Key Paths‍

Binary Key Encoding

The Tricky Part: Sortable Integers

Putting It All Together

Conclusion

Ship Your Next Feature This Week

Turn Your Data Into Your Competitive Edge

Encoding Sortable Binary Database Keys

StatelyDB Key Paths‍

Binary Key Encoding

The Tricky Part: Sortable Integers

Putting It All Together

Conclusion

Ship Your Next Feature This Week

Turn Your Data Into Your Competitive Edge

Get StatelyDB Updates

Ship Your Next Feature This Week