db-migration-safe

Make SQL schema migrations safe. This skill detects locking/blocking hazards in a migration, rewrites unsafe DDL into zero-downtime multi-step migrations, generates a rollback, and gates the actual run behind a plan → validate → execute workflow.

Postgres is the strong, deterministic path. MySQL is an honest hybrid (see references/mysql-catalog.md) — materially weaker, and labeled as such.

This skill shells out to external linters (squawk, eugene) and OSC tools (gh-ost/pt-osc). It does not bundle them. If they are missing, install per references/tool-setup.md. Static analysis (squawk + eugene lint) needs no DB; eugene trace and EXECUTE need a Postgres.

When to use / when NOT to use

Use for any DDL: ALTER TABLE, CREATE INDEX, ADD/DROP COLUMN, type changes, constraints (NOT NULL, CHECK, UNIQUE, foreign keys), renames, and backfills — whether hand-written SQL or generated by an ORM. Do not use for pure SELECT/DML reporting queries, or for declarative schema engines that own their own apply step (e.g. Atlas) — those have their own safety path.

Safety contract (read first)

PLAN and VALIDATE never touch the user's database. They are analysis only and may run freely.
eugene trace runs the migration against an ephemeral throwaway Postgres (or a disposable clone) and rolls back by default. It never touches production.
EXECUTE applies real DDL to a real database. Never run EXECUTE unless the user explicitly asks to apply / run / ship the migration. Always dry-run first, always wrap in lock_timeout, never run CONCURRENTLY inside a transaction. EXECUTE is procedural guidance the user drives — there is no auto-execute script.

Note on auto-invocation: this is a single skill, so its frontmatter governs the whole file. PLAN/VALIDATE are safe to auto-trigger, so model-invocation is left enabled. The EXECUTE phase is gated behaviorally by the rule above. If you would rather make the entire skill user-triggered only, add disable-model-invocation: true to the frontmatter — it will require explicit invocation for analysis too.

PLAN

Run the static analyzer on the migration file. No database is required.

python3 scripts/analyze.py path/to/migration.sql --dialect postgres
# add --pg-version 13.0 to gate version-specific rules
# add --json for machine-readable output (CI)

analyze.py runs squawk and eugene lint, normalizes both into one verdict, and exits nonzero if any error-level finding is present (so it doubles as a CI gate). For --dialect mysql it applies the built-in InnoDB Online DDL heuristics instead (no free static MySQL lock-linter exists; see references/mysql-catalog.md).

Read the merged verdict:

Findings — each has a source (squawk/eugene/heuristic), a rule id, a level (error/warning), the line, the message, and a workaround.
Interpretation — map rule ids to plain meaning via references/squawk-rules.md and references/eugene-hints.md.
Table size — static linters cannot tell 100 rows from 100k. Probe real size with scripts/table_size.sql (needs DB access) before deciding whether a "slow" op is actually a problem:
```
psql "$DATABASE_URL" -v tablename=public.orders -f scripts/table_size.sql
```

Escalate to trace when the verdict involves a possible table/index rewrite (E5, E10), an FK without a covering index (E15), or any lock whose real duration depends on row count. trace observes the actual lock mode and how long it is held:

python3 scripts/trace.py path/to/migration.sql            # ephemeral temp server
python3 scripts/trace.py path/to/migration.sql --host db.staging --port 5432 \
    --user app --database app                              # disposable clone

Then produce three artifacts for the user:

a RISK REPORT (the verdict, in plain language),
a SAFE REWRITE (next section), and
a generated ROLLBACK (scripts/gen_rollback.py).

Safe rewrites

For every error/warning finding, replace the unsafe statement with the cataloged zero-downtime rewrite. The full table of 11 operations → rewrites is in references/postgres-catalog.md (MySQL: references/mysql-catalog.md). The two Postgres primitives behind almost every rewrite:

CONCURRENTLY — build/drop indexes without an ACCESS EXCLUSIVE write lock (must run outside a transaction).
NOT VALID → VALIDATE CONSTRAINT — add a constraint in two phases so the expensive full-table validation takes a weak SHARE UPDATE EXCLUSIVE lock instead of blocking writes.

Quick reference (see catalog for the exact multi-step SQL and PG-version nuances):

Unsafe	Safe rewrite
`CREATE INDEX`	`CREATE INDEX CONCURRENTLY` (outside txn)
`ADD COLUMN ... NOT NULL [DEFAULT]`	nullable col → set default → batch backfill → NOT NULL via CHECK
`SET NOT NULL`	`ADD CHECK (c IS NOT NULL) NOT VALID` → `VALIDATE` → `SET NOT NULL` → drop check
`ADD FOREIGN KEY`	`... NOT VALID` → `VALIDATE CONSTRAINT`
`ADD CHECK`	`... CHECK (...) NOT VALID` → `VALIDATE CONSTRAINT`
`ADD UNIQUE`	`CREATE UNIQUE INDEX CONCURRENTLY` → `ADD CONSTRAINT ... USING INDEX`
`ALTER COLUMN ... TYPE`	expand-contract (new col → dual-write → backfill → swap → drop)
rename column/table	expand-contract, or compatibility `VIEW`
backfill `UPDATE`	bounded batches in short txns, throttled

Write the rewritten statements to a new file (e.g. migration.safe.sql) so it can be re-validated.

Rollback

Generate the reverse migration from the (rewritten) DDL:

python3 scripts/gen_rollback.py migration.safe.sql > migration.rollback.sql

gen_rollback.py emits the inverse of every recognized cataloged op (ADD COLUMN→DROP COLUMN, CREATE INDEX CONCURRENTLY→DROP INDEX CONCURRENTLY, …). Review the output:

Irreversible ops (DROP COLUMN, DROP TABLE, type narrowing) cannot be restored from DDL alone. The generator marks them and emits the backup-table SQL that the forward migration must run first (<table>_<col>_backup_<date>) plus a best-effort restore. It never emits a silently-wrong inverse.
Unknown statement shapes produce an explicit manual rollback required: <reason> comment, not a guess.

VALIDATE

This is the gate. Re-run the analyzer on the rewritten migration:

python3 scripts/analyze.py migration.safe.sql --dialect postgres

Only proceed when it passes with zero error-level findings (warnings are configurable via .squawk.toml / -- eugene: ignore). If it still flags errors, return to Safe rewrites and loop. Re-run trace if the rewrite changed which locks are taken.

EXECUTE (gated — user-triggered only)

Do not enter this phase unless the user explicitly asked to apply the migration.

Apply the validated migration with locking guardrails. Dry-run first.

Bound every lock. Wrap blocking statements so they fail fast instead of queueing behind a long transaction and stalling all traffic:
```
SET lock_timeout = '5s';
SET statement_timeout = '0';  -- or a sane cap for the specific op
```
CONCURRENTLY runs outside a transaction. Do not put CREATE/DROP INDEX CONCURRENTLY in a BEGIN/COMMIT block or a transactional migration runner. After a failed concurrent build, drop the leftover INVALID index before retrying.
Two-phase constraints. Run the NOT VALID step, let it commit, then VALIDATE CONSTRAINT in a separate statement.
Batched backfills. Update in bounded chunks in separate short transactions, with a short sleep between batches to let replication and autovacuum keep up:
```
-- repeat until 0 rows affected
UPDATE t SET col = expr
WHERE id IN (SELECT id FROM t WHERE col IS NULL ORDER BY id LIMIT 5000)
```
Dry-run the whole sequence against the ephemeral/clone via scripts/trace.py one last time, confirm passed_all_checks, then apply to the target.

For MySQL, force ALGORITHM=…, LOCK=… so the server errors instead of silently falling back to a COPY rebuild, and delegate large rewrites to gh-ost/pt-osc — see references/mysql-catalog.md.

Reference index

references/postgres-catalog.md — the 11 unsafe ops → safe rewrites (core knowledge).
references/mysql-catalog.md — InnoDB Online DDL matrix + OSC delegation (weaker path).
references/squawk-rules.md — squawk rule ids and what each catches.
references/eugene-hints.md — eugene E1–E15 / W12–W14 and their workarounds.
references/tool-setup.md — installing squawk / eugene / gh-ost; ephemeral Postgres.

Scripts

scripts/analyze.py — orchestrates squawk + eugene lint (or MySQL heuristics) → one verdict. Exits nonzero on error-level findings.
scripts/trace.py — runs eugene trace (ephemeral temp server or clone) → real lock report.
scripts/gen_rollback.py — generates the reverse migration; flags irreversible ops.
scripts/table_size.sql — pg_class.reltuples size probe.

All scripts are Python 3 standard library only — no pip install.

alexbobkovv/db-migration-safe

Ask in your favorite AI

Dokumentation

db-migration-safe

When to use / when NOT to use

Safety contract (read first)

PLAN

Safe rewrites

Rollback

VALIDATE

EXECUTE (gated — user-triggered only)

Reference index

Scripts

Verwandte Skills

MustaphaSteph/vorec-plugins

kostja94/privacy-page-generator

fleurytian/awesome-claude-skills

kostja94/copywriting

joaocarlos/writing-context-rtfm

yaklang/dangling-markup-injection