Declaring Better Database Migrations
Moving away from file ordering and towards dependencies
Software has a very odd quirk. We’re very comfortable with the concept of dependencies for certain aspects of software development, such as the libraries and dependency trees for the code we run on our servers (or on our customers’ machines). We even manage dependencies inside the code we write. Files depend on other files, modules depend on other modules. Function A calls function B, and Class Foo inherits from Class Bar. We develop tooling to visualize, manage, build, analyze and run these dependencies. It’s part of many developers’ daily lives - and most wouldn’t know how to write code without them. Dependencies form the critical backbone of most modern software.
And yet, we avoid any sort of dependency tooling for database migrations. We somehow say or excuse that dependencies don’t really exist for databases. Or, that databases are somehow only linearly dependent. It goes A → B → C → D and that’s it. No other changes.
This system also aggravates merge conflicts - especially “hidden” conflicts. Version control shows conflicts between edits to the same file - but not separate files. Yet this system relies on filename ordering. Meaning that we can have two different migration files create a conflict in the order - but since version control sees two distinct files we end up with no conflict when we go to merge. The conflict only happens when we try to migrate. This sucks.
And this is before we even get to database integrity verification. No off-the-shelf tool I’ve used provides any means to validate that the final database is what you expect. Nothing has checks for logic in functions or stored procedures. No sort of ability to run the selects or inserts that your code would run just to make sure it’s all working right. It’s sad, really, that we don’t have that. Which means that these migration scripts must be manually verified - a task that is non-trivial when onboarding a new engineer or setting up a new release environment.
It’s not uncommon for developers to be frustrated with SQL databases. I am too. But SQL is very powerful, the databases are capable, and once they’re properly running they’re amazing to use. It’s just that the actual setup of the database - especially the “tools” we use to do the setup - is lacking.
So, I have an idea, and I’m building it. I want to address the above problems. The way I’m doing it is not by having a folder of only SQL files, but rather by having a folder of JSON or TOML files that describe how the SQL files relate to each other and how to run them. These files describe the migration plan, and let us do things other tools can’t.
The Migration Metadata
SQL is great for expressing relations inside a database. It’s bad at expressing relations between items not in a database. For instance, with SQL I can declare foreign keys between tables in a database, create and query views, make stored procedures, etc. I cannot, however, say that create_session_table.up.sql depends on create_user_table.up.sql. That, unfortunately, is not available in SQL files.
We could try extending SQL, or adding it to a comment, or create a “meta” SQL file to describe SQL files in some kind of virtual table or in-memory database. But, those are more complicated than what we need. All we need is a way to define the following data:
A migration name or identifier
Dependencies between migration names and identifiers
The role of SQL files in that migration
In traditional migration systems, these are all defined implicitly, which is to say by convention. The file name typically defines the migration name, role, and dependencies all in one. The name is usually broken up into three parts, version number (defines order which is to say “this migration depends on all previous migrations”), name which is a human-readable name of the migration, and either “.up” or “.down” for a forward or rollback migration respectively. This means we end up with a directory like the following:
migrations/
├─ v1__create_user_table.up.sql
├─ v1__create_user_table.down.sql
├─ v2__create_session_table.up.sql
├─ v2__create_session_table.down.sql
├─ v3__create_company_table.up.sql
├─ v3__create_company_table.down.sql
├─ v4__alter_session_table.up.sql
├─ v4__alter_session_table.down.sqlThat looks okay at first, but it quickly leads to a folder of thousands of files once projects get more mature. The solution would be folders - except I have not seen tools that cleanly allow folders. The only third-party tool I’ve used which allows folders required all migrations (and their order) to be declared in a single XML file. This file was a pain to edit, and had a ton of merge conflicts since every migration added across the entire company had to be put in that one XML file.
<!-- recreation of what this sort-of looked like -->
<migrations>
<migration>
<name>create_user_table</name>
<up>up/create_user_table.sql</up>
<down>down/create_user_table.sql</down>
</migration>
<migration>
<name>create_session_table</name>
<up>up/create_session_table.sql</up>
<down>down/create_session_table.sql</down>
</migration>
<migration>
<name>create_company_table</name>
<up>up/create_company_table.sql</up>
<down>down/create_company_table.sql</down>
</migration>
<migration>
<name>alter_session_table</name>
<up>up/alter_session_table.sql</up>
<down>down/alter_session_table.sql</down>
</migration>
</migrations>The plus was that you didn’t have version control claim there was no conflict. That was a huge plus. The downside is that version control always claimed there was a conflict - even if the sql files didn’t conflict.
Let’s build on this by splitting from one file to multiple files. Each JSON file would have the metadata for that migration. Also, we’ll switch from XML to JSON and/or TOML. Those include type details rather than treat everything as a string, so as we expand our metadata we can have type encoding as part of the file format. And, since metadata will be JSON or TOML, we can just recursively search to find all of them, meaning we can start nesting files in directories. Here’s what it would look like on the file system:
migrations/
├─ companies/
│ ├─ up/
│ │ ├─ table.sql
│ ├─ down/
│ │ ├─ table.sql
│ ├─ create_table.json
├─ sessions/
│ ├─ up/
│ │ ├─ table.sql
│ │ ├─ add_company_col.sql
│ ├─ down/
│ │ ├─ add_company_col.sql
│ │ ├─ table.sql
│ ├─ add_company_column.json
│ ├─ create_table.json
├─ users/
│ ├─ up/
│ │ ├─ table.sql
│ ├─ down/
│ │ ├─ table.sql
│ ├─ create_table.jsonWe can now organize our code into directories. Now instead of scrolling through a thousand files to find which ones touch the users tables, we can go to the users folder directly! But, a good file structure is meaningless if our system can’t use the metadata. So, let’s build up that metadata file. We’ll start by recreating the XML alter_session_table config in a JSON file, and then see what’s lacking.
{
"name": "alter_session_table",
"up": "sessions/up/add_company_col.sql",
"down": "sessions/down/add_company_col.sql"
}That is a good start, but it’s not enough. The system doesn’t know how this migration relates to the other migrations. We could try adding in a ordering number or order field, but that’s going to be a pain to manage - especially as our org grows.
However, we don’t need to define a global order. We only need to define dependencies. To make this concrete, let’s look at the SQL:
ALTER TABLE "sessions" ADD COLUMN company_id UUID REFERENCES companies(id);This statement only relies on the sessions and companies tables. We don’t need to have the users tables to run the above statement. So, let’s declare that we depend on those two migrations in our JSON file.
{
"name": "alter_session_table",
"up": "sessions/up/add_company_col.sql",
"down": "sessions/down/add_company_col.sql",
"deps": ["create_session_table", "create_company_table"]
}This works great! Now our system knows that it needs to run both “create_session_table” and “create_company_table” before it runs “alter_session_table.” With this, our system can now determine an order based on dependencies rather than based on convention or file system ordering. This means instead of a line:
create_users_table
create_sessions_table
create_companies_table
alter_sessions_tablewe now get a tree:
┌──────────────────┐ ┌──────────────────────┐
│create_users_table│ │create_companies_table│
└────────┬─────────┘ └─────────┬────────────┘
│ │
▼ │
┌─────────────────────┐ │
│create_sessions_table├──┐ │
└─────────────────────┘ │ │
│ │
▼ ▼
┌────────────────────┐
│alter_sessions_table│
└────────────────────┘ We now have a way to create new migrations, avoid merge conflicts, and define metadata around our migrations. It also means we can now run our migrations in any order so long as our dependencies are respected. In other words, it does not matter if we run create_users_table before create_companies_table or vice versa. Either order works.
Which means that if two developers add new, independent migrations then it doesn’t matter what order they are ran - or more importantly, it does not matter which order the PRs are merged. It also means that one developer does not need to rename their migration file and get another PR approval simply because another developer merged a migration first! This is a huge time (and bug) saver. I’ve had many botched PRs because someone else merged their migration before I did. I’ve also had so many botched deploys because multiple migrations were merged around the same time and neither developer rebased before merging.
Now, it doesn’t matter. If they’re independent then we have a system where it’s valid to run those independent migrations in any order and we still get the same final database! That’s exciting!
Making things leaner
We have a great system now. However, we’re running into a “fun” little issue. Each simple, one-liner migration gets a lot of files on disk (three to be exact). This is okay for large files, but our example is one line up and one line down. Two lines gets three files. That’s, not great.
Let’s make things leaner by adding in an option to embed those one-liners.
We’ll do that by adding in a “file” and “embed” object for our up/down. Our JSON now looks like this:
{
"name": "alter_session_table",
"up": {
"file": "sessions/up/add_company_col.sql"
},
"down": {
"file": "sessions/down/add_company_col.sql"
},
"deps": ["create_session_table", "create_company_table"]
}Now that we know it’s a file, we can switch to have things be raw SQL.
{
"name": "alter_session_table",
"up": {
"raw": "ALTER TABLE \"sessions\" ADD COLUMN company_id UUID REFERENCES companies(id);"
},
"down": {
"raw": "ALTER TABLE \"sessions\" DROP COLUMN company_id;"
},
"deps": ["create_session_table", "create_company_table"]
}We just saved two files for every one-liner! For two liners, we can just use an array. But instead of having our “raw” be an array, we can have our up/down be an array. That way, for our large migrations, we can split it into multiple files and still have it count as “one” migration. Here’s the final output:
{
"name": "alter_session_table",
"up": [{
"raw": "ALTER TABLE \"sessions\" ADD COLUMN company_id UUID REFERENCES companies(id);"
}],
"down": [{
"raw": "ALTER TABLE \"sessions\" DROP COLUMN company_id;"
}],
"deps": ["create_session_table", "create_company_table"]
}Not bad.
There’s still some very important improvements we need to do, but this post is getting long so I’ll do it in a follow-up post.
If you’re wanting to try out this system, I do have a work-in-progress migration tool written in Go. It currently only supports Postgres and SQLite since I’m still running the foundational features through their paces.

