db-migration-safe
Make SQL schema migrations safe. This skill detects locking/blocking hazards in a migration, rewrites unsafe DDL into zero-downtime multi-step migrations, generates a rollback, and gates the actual run behind a plan → validate → execute workflow.
Postgres is the strong, deterministic path. MySQL is an honest hybrid (see
references/mysql-catalog.md) — materially weaker, and labeled as such.
This skill shells out to external linters (squawk, eugene) and OSC tools
(gh-ost/pt-osc). It does not bundle them. If they are missing, install per
references/tool-setup.md. Static analysis (squawk + eugene lint) needs no DB;
eugene trace and EXECUTE need a Postgres.
When to use / when NOT to use
Use for any DDL: ALTER TABLE, CREATE INDEX, ADD/DROP COLUMN, type changes,
constraints (NOT NULL, CHECK, UNIQUE, foreign keys), renames, and backfills —
whether hand-written SQL or generated by an ORM. Do not use for pure SELECT/DML
reporting queries, or for declarative schema engines that own their own apply step
(e.g. Atlas) — those have their own safety path.
Safety contract (read first)
- PLAN and VALIDATE never touch the user's database. They are analysis only and may run freely.
eugene traceruns the migration against an ephemeral throwaway Postgres (or a disposable clone) and rolls back by default. It never touches production.- EXECUTE applies real DDL to a real database. Never run EXECUTE unless the user
explicitly asks to apply / run / ship the migration. Always dry-run first, always
wrap in
lock_timeout, never runCONCURRENTLYinside a transaction. EXECUTE is procedural guidance the user drives — there is no auto-execute script.
Note on auto-invocation: this is a single skill, so its frontmatter governs the whole file. PLAN/VALIDATE are safe to auto-trigger, so model-invocation is left enabled. The EXECUTE phase is gated behaviorally by the rule above. If you would rather make the entire skill user-triggered only, add
disable-model-invocation: trueto the frontmatter — it will require explicit invocation for analysis too.
PLAN
Run the static analyzer on the migration file. No database is required.
python3 scripts/analyze.py path/to/migration.sql --dialect postgres
# add --pg-version 13.0 to gate version-specific rules
# add --json for machine-readable output (CI)
analyze.py runs squawk and eugene lint, normalizes both into one verdict, and
exits nonzero if any error-level finding is present (so it doubles as a CI gate).
For --dialect mysql it applies the built-in InnoDB Online DDL heuristics instead (no
free static MySQL lock-linter exists; see references/mysql-catalog.md).
Read the merged verdict:
- Findings — each has a
source(squawk/eugene/heuristic), a rule id, alevel(error/warning), the line, the message, and a workaround. - Interpretation — map rule ids to plain meaning via
references/squawk-rules.mdandreferences/eugene-hints.md. - Table size — static linters cannot tell 100 rows from 100k. Probe real size with
scripts/table_size.sql(needs DB access) before deciding whether a "slow" op is actually a problem:psql "$DATABASE_URL" -v tablename=public.orders -f scripts/table_size.sql - Escalate to trace when the verdict involves a possible table/index rewrite (E5,
E10), an FK without a covering index (E15), or any lock whose real duration depends on
row count.
traceobserves the actual lock mode and how long it is held:python3 scripts/trace.py path/to/migration.sql # ephemeral temp server python3 scripts/trace.py path/to/migration.sql --host db.staging --port 5432 \ --user app --database app # disposable clone
Then produce three artifacts for the user:
- a RISK REPORT (the verdict, in plain language),
- a SAFE REWRITE (next section), and
- a generated ROLLBACK (
scripts/gen_rollback.py).
Safe rewrites
For every error/warning finding, replace the unsafe statement with the cataloged
zero-downtime rewrite. The full table of 11 operations → rewrites is in
references/postgres-catalog.md (MySQL: references/mysql-catalog.md). The two Postgres
primitives behind almost every rewrite:
CONCURRENTLY— build/drop indexes without anACCESS EXCLUSIVEwrite lock (must run outside a transaction).NOT VALID→VALIDATE CONSTRAINT— add a constraint in two phases so the expensive full-table validation takes a weakSHARE UPDATE EXCLUSIVElock instead of blocking writes.
Quick reference (see catalog for the exact multi-step SQL and PG-version nuances):
| Unsafe | Safe rewrite |
|---|---|
CREATE INDEX | CREATE INDEX CONCURRENTLY (outside txn) |
ADD COLUMN ... NOT NULL [DEFAULT] | nullable col → set default → batch backfill → NOT NULL via CHECK |
SET NOT NULL | ADD CHECK (c IS NOT NULL) NOT VALID → VALIDATE → SET NOT NULL → drop check |
ADD FOREIGN KEY | ... NOT VALID → VALIDATE CONSTRAINT |
ADD CHECK | ... CHECK (...) NOT VALID → VALIDATE CONSTRAINT |
ADD UNIQUE | CREATE UNIQUE INDEX CONCURRENTLY → ADD CONSTRAINT ... USING INDEX |
ALTER COLUMN ... TYPE | expand-contract (new col → dual-write → backfill → swap → drop) |
| rename column/table | expand-contract, or compatibility VIEW |
backfill UPDATE | bounded batches in short txns, throttled |
Write the rewritten statements to a new file (e.g. migration.safe.sql) so it can be
re-validated.
Rollback
Generate the reverse migration from the (rewritten) DDL:
python3 scripts/gen_rollback.py migration.safe.sql > migration.rollback.sql
gen_rollback.py emits the inverse of every recognized cataloged op
(ADD COLUMN→DROP COLUMN, CREATE INDEX CONCURRENTLY→DROP INDEX CONCURRENTLY, …).
Review the output:
- Irreversible ops (
DROP COLUMN,DROP TABLE, type narrowing) cannot be restored from DDL alone. The generator marks them and emits the backup-table SQL that the forward migration must run first (<table>_<col>_backup_<date>) plus a best-effort restore. It never emits a silently-wrong inverse. - Unknown statement shapes produce an explicit
manual rollback required: <reason>comment, not a guess.
VALIDATE
This is the gate. Re-run the analyzer on the rewritten migration:
python3 scripts/analyze.py migration.safe.sql --dialect postgres
Only proceed when it passes with zero error-level findings (warnings are
configurable via .squawk.toml / -- eugene: ignore). If it still flags errors, return
to Safe rewrites and loop. Re-run trace if the rewrite changed which locks are taken.
EXECUTE (gated — user-triggered only)
Do not enter this phase unless the user explicitly asked to apply the migration.
Apply the validated migration with locking guardrails. Dry-run first.
- Bound every lock. Wrap blocking statements so they fail fast instead of queueing
behind a long transaction and stalling all traffic:
SET lock_timeout = '5s'; SET statement_timeout = '0'; -- or a sane cap for the specific op CONCURRENTLYruns outside a transaction. Do not putCREATE/DROP INDEX CONCURRENTLYin aBEGIN/COMMITblock or a transactional migration runner. After a failed concurrent build, drop the leftoverINVALIDindex before retrying.- Two-phase constraints. Run the
NOT VALIDstep, let it commit, thenVALIDATE CONSTRAINTin a separate statement. - Batched backfills. Update in bounded chunks in separate short transactions, with a
short sleep between batches to let replication and autovacuum keep up:
-- repeat until 0 rows affected UPDATE t SET col = expr WHERE id IN (SELECT id FROM t WHERE col IS NULL ORDER BY id LIMIT 5000) - Dry-run the whole sequence against the ephemeral/clone via
scripts/trace.pyone last time, confirmpassed_all_checks, then apply to the target.
For MySQL, force ALGORITHM=…, LOCK=… so the server errors instead of silently falling
back to a COPY rebuild, and delegate large rewrites to gh-ost/pt-osc — see
references/mysql-catalog.md.
Reference index
references/postgres-catalog.md— the 11 unsafe ops → safe rewrites (core knowledge).references/mysql-catalog.md— InnoDB Online DDL matrix + OSC delegation (weaker path).references/squawk-rules.md— squawk rule ids and what each catches.references/eugene-hints.md— eugene E1–E15 / W12–W14 and their workarounds.references/tool-setup.md— installing squawk / eugene / gh-ost; ephemeral Postgres.
Scripts
scripts/analyze.py— orchestrates squawk + eugene lint (or MySQL heuristics) → one verdict. Exits nonzero on error-level findings.scripts/trace.py— runseugene trace(ephemeral temp server or clone) → real lock report.scripts/gen_rollback.py— generates the reverse migration; flags irreversible ops.scripts/table_size.sql—pg_class.reltuplessize probe.
All scripts are Python 3 standard library only — no pip install.