What Changes Once Your ClickHouse® Schema Is Code
If you build on Postgres or MySQL, you probably haven't hand-written a migration in years. You change a schema file, a tool works out the diff, generates the SQL, and fails your CI if production has quietly drifted from what your repo says. That's the boring, solved version of schema management. Boring is the highest compliment you can pay to the thing that mutates your database.
Now open your ClickHouse® project. You're back to writing DDL by hand, reading diffs with your eyes, and finding out about drift when a query breaks in production.
It's not that ClickHouse has no migration tooling, but it's never been a default, the one obvious thing a team reaches for the way the relational world reaches for its schema tools. Most of what exists are SQL runners: they apply the files you write and track which ones ran, but you are still the diff engine. You still decide what changed, and you still hope dev and prod haven't drifted apart.
We got tired of being the diff engine. So we built CHKit, and we're releasing it open source. This post isn't a feature tour. It's about what actually changes on your keyboard once your ClickHouse schema lives in code.
Why we built it
We ran ClickHouse in our previous company, Numia, at close to petabyte scale, powering real-time APIs over blockchain data. The engine itself was rarely the problem. The problem was everything around it, and schema management was near the top of the list.
Every change was hand-written DDL, reviewed by eye. The real shape of a table was whatever the pile of migration files happened to produce. A manual ALTER run during an incident never made it back into version control, and nothing told us our code and our database now disagreed. We wrote a lot of tooling to paper over this: health checks, scripts, guardrails.
But the schema problem stayed, being the main root cause not having a schema-as-code story the way Postgres does. So we built one. If you want the longer version of why, and why we decided to open source it, there's a companion post on the CHKit blog. Here, I want to stay on the practical side.
Your ClickHouse schema as code
chkit defines your ClickHouse tables, views, and materialized views as TypeScript (Python coming soon). Here is a real table, with the things you actually set in production:
import { schema, table } from '@chkit/core'
const events = table({
database: 'analytics',
name: 'events',
columns: [
{ name: 'id', type: 'UInt64' },
{ name: 'org_id', type: 'String' },
{ name: 'event', type: 'LowCardinality(String)' },
{ name: 'received_at', type: 'DateTime64(3)' },
{ name: 'payload', type: 'String', codec: [{ kind: 'ZSTD', level: 3 }] },
],
engine: 'MergeTree()',
orderBy: ['org_id', 'received_at'],
primaryKey: ['org_id', 'received_at'],
partitionBy: 'toYYYYMM(received_at)',
ttl: 'received_at + INTERVAL 90 DAY',
})
export default schema(events)
That's the desired state of your database, type-checked and reviewable in a pull request. The loop around it is three commands. chkit generate compares your definitions against the last snapshot and writes a migration SQL file. chkit migrate --apply runs the pending migrations. chkit drift introspects the live database and tells you where it differs from your code. Two more earn their place in CI: chkit check fails the build on pending migrations, drift, or an edited migration, and chkit codegen generates TypeScript row types from the same definitions.
It's not an ORM. You still write your own SQL for queries. CHKit owns the schema, the migrations, and the guardrails, nothing else.
One thing to be upfront: it's beta. The CLI and the schema DSL are stable and run our own production workloads, but we may still make small breaking changes before 1.0. The deeper reference lives in the chkit docs.
What changes once the schema is code
Here's the part that matters, and the reason this is worth a blog post.
You stop hand-writing migrations
You change the TypeScript and run chkit generate. It diffs the new state against the old one, works out the ordered set of operations, and writes the SQL for you. If nothing changed, it writes nothing.
The migration file stops being a thing you author and starts being a thing you review. The cognitive job shifts from "write the correct ALTER for this change" to "read what CHKit proposes and approve it." On a busy schema that is the difference between a careful afternoon and a thirty-second pull request.
Drift becomes a CI gate instead of a 3am surprise
This is the one that pays for the whole tool.
The failure is always the same. Someone fixes an incident at 3am by running an ALTER straight against production. It works. The incident closes. The change never makes it into a migration file, and now your repo and your database disagree, silently, until weeks later when a deploy does something nobody can explain.
chkit drift reads the live database and compares it, column by column, against your code. It catches the missing column, the changed TTL, and the sorting key that isn't what your repo thinks it is. Put chkit check in CI, and that comparison runs on every pull request, so the 3am ALTER shows up the next morning as a named, concrete difference instead of a vague feeling that something is off. The point isn't that people stop touching production. They won't. The point is that the database stops being able to keep a secret from you.
Destructive changes can't fire by accident, and structural ones announce themselves
Two different problems hide inside "schema change," and ClickHouse treats them very differently.
Some changes are cheap metadata edits: adding a column, changing a TTL, adding an index. Others rewrite the table. Changing the engine, the ORDER BY, the partitioning, or the primary key isn't an in-place edit at all. ClickHouse has no ALTER for those. The only path is to create a new table, copy the data across, and swap them. If you've ever changed a sorting key on a large table and watched it turn into an afternoon of INSERT SELECT, you know exactly what I mean. (We dug into why sorting-key order matters this much in our query optimization post.)
CHKit tags every operation in a plan as safe, caution, or danger, and it knows which changes are structural rewrites versus in-place edits. A DROP COLUMN, a DROP TABLE, anything that destroys data, is tagged as danger and blocked. In an interactive terminal, you get a prompt. In CI it exits non-zero and refuses to run unless you explicitly pass --allow-destructive. That check also scans hand-written SQL, not just the migrations CHKit generated itself, so you can't sneak a raw DROP past it.
Your application types stop drifting from your schema
chkit codegen generates TypeScript row types from the same definitions that your tables come from. Run chkit codegen --check in CI, and the build fails when your application types and your database schema fall out of sync. It's the same drift problem as before, one layer up, closed the same way.
What CHKit does not do?
A launch post that only lists wins is marketing, so here's where the edges are.
CHKit manages schema, which means DDL. It does not move your data. There's an optional backfill plugin for time-windowed copies with checkpoints, but it's something you opt into, not part of the core migrate loop, and it doesn't make the hard part easy. Reshaping a large table is still a real data migration you have to plan. Everyone in this space struggles with that one.
It also emits single-node DDL on the default path. If you run a replicated cluster with Keeper, you manage that replication yourself (we are planning to include this feature in the near future). And a couple of conveniences, like the managed uniqueKey, only apply to the ObsessionDB engine. CHKit works against any ClickHouse, but the Cloud-only features are exactly that.
Last, CHKit cannot undo a ClickHouse mutation. What it does is tell you, before you run anything, that the small-looking change in your diff is actually a full-table rewrite. The data still moves. You just stop finding out on a Friday afternoon.
The rule worth saving
If you take one thing from this: schema-as-code earns its keep the moment more than one person, or more than one environment, can change your tables. One engineer on one cluster can get away with a migrations folder and good habits. A team with a staging and prod split, and a CI pipeline that can run an ALTER, cannot. That's the point where drift stops being hypothetical.
And when a diff comes back tagged danger or structural, stop and read it. That's CHKit telling you this change moves data. I'd plan the backfill before I approved it, every time, rather than discover the rewrite in production.
The relational world stopped handwriting migrations a long time ago. ClickHouse doesn't have to be the exception.
Try it
CHKit is MIT licensed and works with any ClickHouse, including Cloud, Altinity, self-hosted, and ObsessionDB. You can scaffold a project from a working example in one command:
npm create chkit@latest
The code is on GitHub. If you run ClickHouse and you're still managing schema by hand, we built this so you don't have to. And if you want managed ClickHouse without the operational tax that started this whole story, that's what we do at ObsessionDB.
We'd genuinely like to know how you handle ClickHouse schema changes today, especially drift and destructive ALTERs. Open an issue and tell us what's missing.
Continue Reading
Originally written for obsessionDB. Read the original article here.
ClickHouse is a registered trademark of ClickHouse, Inc. https://clickhouse.com