CommunitySchreiben & Editierengithub.com

alexbobkovv/db-migration-safe

Zero-downtime SQL migration safety as an Agent Skill: detect lock hazards, rewrite unsafe DDL, auto-generate rollbacks, gate execution (Postgres-first, wraps squawk + eugene).

Funktioniert mit~Claude Code~Codex CLI~Cursor
npx skills add alexbobkovv/db-migration-safe

Ask in your favorite AI

Open a new chat with this agent skill pre-loaded.

Dokumentation

db-migration-safe

Make SQL schema migrations safe. This skill detects locking/blocking hazards in a migration, rewrites unsafe DDL into zero-downtime multi-step migrations, generates a rollback, and gates the actual run behind a plan → validate → execute workflow.

Postgres is the strong, deterministic path. MySQL is an honest hybrid (see references/mysql-catalog.md) — materially weaker, and labeled as such.

This skill shells out to external linters (squawk, eugene) and OSC tools (gh-ost/pt-osc). It does not bundle them. If they are missing, install per references/tool-setup.md. Static analysis (squawk + eugene lint) needs no DB; eugene trace and EXECUTE need a Postgres.

When to use / when NOT to use

Use for any DDL: ALTER TABLE, CREATE INDEX, ADD/DROP COLUMN, type changes, constraints (NOT NULL, CHECK, UNIQUE, foreign keys), renames, and backfills — whether hand-written SQL or generated by an ORM. Do not use for pure SELECT/DML reporting queries, or for declarative schema engines that own their own apply step (e.g. Atlas) — those have their own safety path.

Safety contract (read first)

  • PLAN and VALIDATE never touch the user's database. They are analysis only and may run freely.
  • eugene trace runs the migration against an ephemeral throwaway Postgres (or a disposable clone) and rolls back by default. It never touches production.
  • EXECUTE applies real DDL to a real database. Never run EXECUTE unless the user explicitly asks to apply / run / ship the migration. Always dry-run first, always wrap in lock_timeout, never run CONCURRENTLY inside a transaction. EXECUTE is procedural guidance the user drives — there is no auto-execute script.

Note on auto-invocation: this is a single skill, so its frontmatter governs the whole file. PLAN/VALIDATE are safe to auto-trigger, so model-invocation is left enabled. The EXECUTE phase is gated behaviorally by the rule above. If you would rather make the entire skill user-triggered only, add disable-model-invocation: true to the frontmatter — it will require explicit invocation for analysis too.


PLAN

Run the static analyzer on the migration file. No database is required.

python3 scripts/analyze.py path/to/migration.sql --dialect postgres
# add --pg-version 13.0 to gate version-specific rules
# add --json for machine-readable output (CI)

analyze.py runs squawk and eugene lint, normalizes both into one verdict, and exits nonzero if any error-level finding is present (so it doubles as a CI gate). For --dialect mysql it applies the built-in InnoDB Online DDL heuristics instead (no free static MySQL lock-linter exists; see references/mysql-catalog.md).

Read the merged verdict:

  1. Findings — each has a source (squawk/eugene/heuristic), a rule id, a level (error/warning), the line, the message, and a workaround.
  2. Interpretation — map rule ids to plain meaning via references/squawk-rules.md and references/eugene-hints.md.
  3. Table size — static linters cannot tell 100 rows from 100k. Probe real size with scripts/table_size.sql (needs DB access) before deciding whether a "slow" op is actually a problem:
    psql "$DATABASE_URL" -v tablename=public.orders -f scripts/table_size.sql
    
  4. Escalate to trace when the verdict involves a possible table/index rewrite (E5, E10), an FK without a covering index (E15), or any lock whose real duration depends on row count. trace observes the actual lock mode and how long it is held:
    python3 scripts/trace.py path/to/migration.sql            # ephemeral temp server
    python3 scripts/trace.py path/to/migration.sql --host db.staging --port 5432 \
        --user app --database app                              # disposable clone
    

Then produce three artifacts for the user:

  • a RISK REPORT (the verdict, in plain language),
  • a SAFE REWRITE (next section), and
  • a generated ROLLBACK (scripts/gen_rollback.py).

Safe rewrites

For every error/warning finding, replace the unsafe statement with the cataloged zero-downtime rewrite. The full table of 11 operations → rewrites is in references/postgres-catalog.md (MySQL: references/mysql-catalog.md). The two Postgres primitives behind almost every rewrite:

  • CONCURRENTLY — build/drop indexes without an ACCESS EXCLUSIVE write lock (must run outside a transaction).
  • NOT VALIDVALIDATE CONSTRAINT — add a constraint in two phases so the expensive full-table validation takes a weak SHARE UPDATE EXCLUSIVE lock instead of blocking writes.

Quick reference (see catalog for the exact multi-step SQL and PG-version nuances):

UnsafeSafe rewrite
CREATE INDEXCREATE INDEX CONCURRENTLY (outside txn)
ADD COLUMN ... NOT NULL [DEFAULT]nullable col → set default → batch backfill → NOT NULL via CHECK
SET NOT NULLADD CHECK (c IS NOT NULL) NOT VALIDVALIDATESET NOT NULL → drop check
ADD FOREIGN KEY... NOT VALIDVALIDATE CONSTRAINT
ADD CHECK... CHECK (...) NOT VALIDVALIDATE CONSTRAINT
ADD UNIQUECREATE UNIQUE INDEX CONCURRENTLYADD CONSTRAINT ... USING INDEX
ALTER COLUMN ... TYPEexpand-contract (new col → dual-write → backfill → swap → drop)
rename column/tableexpand-contract, or compatibility VIEW
backfill UPDATEbounded batches in short txns, throttled

Write the rewritten statements to a new file (e.g. migration.safe.sql) so it can be re-validated.

Rollback

Generate the reverse migration from the (rewritten) DDL:

python3 scripts/gen_rollback.py migration.safe.sql > migration.rollback.sql

gen_rollback.py emits the inverse of every recognized cataloged op (ADD COLUMNDROP COLUMN, CREATE INDEX CONCURRENTLYDROP INDEX CONCURRENTLY, …). Review the output:

  • Irreversible ops (DROP COLUMN, DROP TABLE, type narrowing) cannot be restored from DDL alone. The generator marks them and emits the backup-table SQL that the forward migration must run first (<table>_<col>_backup_<date>) plus a best-effort restore. It never emits a silently-wrong inverse.
  • Unknown statement shapes produce an explicit manual rollback required: <reason> comment, not a guess.

VALIDATE

This is the gate. Re-run the analyzer on the rewritten migration:

python3 scripts/analyze.py migration.safe.sql --dialect postgres

Only proceed when it passes with zero error-level findings (warnings are configurable via .squawk.toml / -- eugene: ignore). If it still flags errors, return to Safe rewrites and loop. Re-run trace if the rewrite changed which locks are taken.

EXECUTE (gated — user-triggered only)

Do not enter this phase unless the user explicitly asked to apply the migration.

Apply the validated migration with locking guardrails. Dry-run first.

  1. Bound every lock. Wrap blocking statements so they fail fast instead of queueing behind a long transaction and stalling all traffic:
    SET lock_timeout = '5s';
    SET statement_timeout = '0';  -- or a sane cap for the specific op
    
  2. CONCURRENTLY runs outside a transaction. Do not put CREATE/DROP INDEX CONCURRENTLY in a BEGIN/COMMIT block or a transactional migration runner. After a failed concurrent build, drop the leftover INVALID index before retrying.
  3. Two-phase constraints. Run the NOT VALID step, let it commit, then VALIDATE CONSTRAINT in a separate statement.
  4. Batched backfills. Update in bounded chunks in separate short transactions, with a short sleep between batches to let replication and autovacuum keep up:
    -- repeat until 0 rows affected
    UPDATE t SET col = expr
    WHERE id IN (SELECT id FROM t WHERE col IS NULL ORDER BY id LIMIT 5000)
    
  5. Dry-run the whole sequence against the ephemeral/clone via scripts/trace.py one last time, confirm passed_all_checks, then apply to the target.

For MySQL, force ALGORITHM=…, LOCK=… so the server errors instead of silently falling back to a COPY rebuild, and delegate large rewrites to gh-ost/pt-osc — see references/mysql-catalog.md.


Reference index

  • references/postgres-catalog.md — the 11 unsafe ops → safe rewrites (core knowledge).
  • references/mysql-catalog.md — InnoDB Online DDL matrix + OSC delegation (weaker path).
  • references/squawk-rules.md — squawk rule ids and what each catches.
  • references/eugene-hints.md — eugene E1–E15 / W12–W14 and their workarounds.
  • references/tool-setup.md — installing squawk / eugene / gh-ost; ephemeral Postgres.

Scripts

  • scripts/analyze.py — orchestrates squawk + eugene lint (or MySQL heuristics) → one verdict. Exits nonzero on error-level findings.
  • scripts/trace.py — runs eugene trace (ephemeral temp server or clone) → real lock report.
  • scripts/gen_rollback.py — generates the reverse migration; flags irreversible ops.
  • scripts/table_size.sqlpg_class.reltuples size probe.

All scripts are Python 3 standard library only — no pip install.

Verwandte Skills

MustaphaSteph/vorec-plugins

Claude Code plugin for screen recording with AI narration. Record tutorials, product demos, and walkthroughs from any codebase. AI voice-over, zoom effects, subtitles, 4K export.

community

kostja94/privacy-page-generator

When the user wants to create, optimize, or structure Privacy Policy page. Also use when the user mentions "privacy policy," "privacy page," "data protection," "GDPR compliance," "privacy notice," "data privacy," "CCPA," "cookie policy," or "personal data." For legal overview page, use legal-page-generator.

community

fleurytian/awesome-claude-skills

Claude Skills developed for brain-workers. Dev by Fleury, an Ex-McKinsey and now AI product manager. 小红书/RedNote@如宝|AI&Anlalytics. Email me @[email protected]

community

kostja94/copywriting

When the user wants to write or optimize short-form marketing copy—headlines, CTAs, ad copy, landing page copy, email copy. Also use when the user mentions "copywriting," "headline," "CTA copy," "ad copy," "landing page copy," "sales copy," "conversion copy," "PAS," "AIDA," "BAB," "copy formula," or "differentiation." For long-form article bodies (blog posts, guides), use article-content.

community

joaocarlos/writing-context-rtfm

A MCP extension built on top of rtfm-ai that it acts as a gatekeeper by taking the agent's task, queries the underlying RTFM index, filters out background noise, and packs only the essential and supporting source chunks into a tight contextualized prompt.

community

yaklang/dangling-markup-injection

Dangling markup injection playbook. Use when HTML injection is possible but JavaScript execution is blocked (CSP, sanitizer strips event handlers, WAF blocks script tags) — exfiltrate CSRF tokens, session data, and page content by injecting unclosed HTML tags that capture subsequent page content.

community