
autoreview

Auto Review closeout. Codex review is the default when no engine is set and is the recommended reviewer.

Supported: Claude Code, Codex CLI, Cursor
npx skills add https://github.com/clawdbot/clawdbot/tree/main/.agents/skills/autoreview

Documentation

Auto Review

Run the bundled structured review helper as a closeout check. This is code review, not Guardian auto_review approval routing.

Codex review is the default when no engine is set. It usually delivers the best review results and should remain the normal final closeout engine.

Use when:

  • user asks for Codex review / Claude review / autoreview / second-model review
  • after non-trivial code edits, before final/commit/ship
  • reviewing a local branch or PR branch after fixes

Contract

  • Treat review output as advisory. Never blindly apply it.
  • Verify every finding by reading the real code path and adjacent files.
  • Read dependency docs/source/types when the finding depends on external behavior.
  • Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase.
  • Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
  • When an accepted finding shows a bug class or repeated pattern, inspect the current PR scope for sibling instances before fixing.
  • Fix the scoped bug class at once when practical; stop at touched surfaces, owner boundaries, and clear follow-up territory.
  • Keep going until structured review returns no accepted/actionable findings, but only while the work remains inside the original task scope.
  • If a review-triggered fix changes code, rerun focused tests and rerun the structured review helper.
  • For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
  • Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
  • Be patient with large bundles. Structured review can take up to 30 minutes while the model call is active, especially with Codex tools or web search.
  • Treat heartbeat lines like review still running: ... elapsed=... pid=... as healthy progress, not a hang. Let the helper continue while heartbeats are advancing. Pass --stream-engine-output when live engine text is useful; Codex and Claude filter tool/file chatter, other engines pass raw output through.
  • Do not kill a review just because it has been quiet for 2-5 minutes, or because it is still running under the 30-minute window. Inspect the process only after missing multiple expected heartbeats, after 30 minutes, or after an obviously failed subprocess; prefer letting the same helper command finish.
  • Tools are useful in review mode. The helper allows read-only inspection tools and web search by default so reviewers can check dependency contracts, upstream docs, and current behavior.
  • Security perspective is always included, but it should not cripple legitimate functionality. Report security findings only when the change creates a concrete, actionable risk or removes an important safety check.
  • For regression provenance, if no blamed PR is traceable, use the blamed commit as the provenance: commit SHA, date, and author username. Do not guess a merger or frame missing PR metadata as a separate finding.
  • Do not invoke built-in codex review, nested reviewers, or reviewer panels from inside the review. The helper builds one bundle, calls one selected engine, validates one structured result, and stops.
  • Stop as soon as the helper exits 0 with no accepted/actionable findings. Do not run an extra review just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
  • Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
  • Multi-reviewer panels are opt-in only. Use them when explicitly requested or when risk justifies the extra spend; the main agent still verifies every accepted finding before fixing.
  • If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
  • If gh/Gitcrawl reports database disk image is malformed, run gitcrawl doctor --json once to let the portable cache repair before retrying review; do not bypass the shim unless repair fails and freshness requires live GitHub.
  • If Gitcrawl reports a portable manifest mismatch, source/runtime DB health error, or stale portable-store checkout, run gitcrawl doctor --json and inspect source_db_health, runtime_db_health, and portable_store_status before falling back to live GitHub.
  • Do not push just to review. Push only when the user requested push/ship/PR update.
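
The patience rules above can be sketched as a small heartbeat check. This is a minimal illustration, not the helper's own code: the line format is the one the helper is described as printing to stderr (`review still running: <engine> elapsed=<seconds>s pid=<pid>`), while `parse_heartbeat`, `should_inspect`, the integer seconds, and the threshold of two missed heartbeats for "multiple" are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional

# Heartbeat format as described in this document; integer seconds are assumed.
HEARTBEAT_RE = re.compile(
    r"^review still running: (?P<engine>\S+) "
    r"elapsed=(?P<elapsed>\d+)s pid=(?P<pid>\d+)$"
)

REVIEW_WINDOW_SECONDS = 30 * 60  # the 30-minute window named in the contract
MISSED_HEARTBEAT_LIMIT = 2       # assumption: "multiple" read as two or more


class Heartbeat(NamedTuple):
    engine: str
    elapsed: int
    pid: int


def parse_heartbeat(line: str) -> Optional[Heartbeat]:
    """Return the parsed heartbeat, or None if the line is not a heartbeat."""
    match = HEARTBEAT_RE.match(line.strip())
    if match is None:
        return None
    return Heartbeat(match["engine"], int(match["elapsed"]), int(match["pid"]))


def should_inspect(
    last: Optional[Heartbeat], missed_heartbeats: int, subprocess_failed: bool
) -> bool:
    """Inspect the process only under the conditions the contract lists."""
    if subprocess_failed:
        return True
    if missed_heartbeats >= MISSED_HEARTBEAT_LIMIT:
        return True
    return last is not None and last.elapsed > REVIEW_WINDOW_SECONDS
```

A quiet stretch with advancing heartbeats returns `False` here, which matches the rule that silence alone is not a reason to kill the review.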

Scope Governor

Autoreview is a closeout gate, not permission to rewrite the task.

Before the first review, freeze a scope baseline: original request or issue, target branch, intended behavior, owner boundary, changed files, and non-test LOC. For inherited or already-bloated branches, use the intended PR diff as the baseline rather than accepting all existing branch drift.

Before patching a finding, classify it:

  • In-scope blocker: the finding is introduced by the current diff, affects the same owner boundary, and can be fixed without changing the task's contract.
  • Follow-up: the finding is real but belongs to an adjacent bug class, sibling surface, cleanup, or broader hardening track.
  • Stop-and-escalate: the finding requires a new protocol/config/storage/public API contract, a different owner boundary, a release-process change, or a design choice outside the original request.
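
The three classes above can be written down as a small decision function. This is a sketch under stated assumptions: the `Finding` fields and the `classify` name are hypothetical, the finding is assumed to be already verified as real, and checking the escalation triggers first is one reasonable ordering rather than something the text prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    IN_SCOPE_BLOCKER = "in-scope blocker"
    FOLLOW_UP = "follow-up"
    STOP_AND_ESCALATE = "stop-and-escalate"


@dataclass(frozen=True)
class Finding:
    # Hypothetical fields; each mirrors one condition from the list above.
    introduced_by_diff: bool
    same_owner_boundary: bool
    keeps_task_contract: bool
    # True when the fix needs a new protocol/config/storage/public API
    # contract, a different owner boundary, a release-process change, or a
    # design choice outside the original request.
    needs_escalation: bool


def classify(finding: Finding) -> Verdict:
    """Classify a verified finding before any patch is written."""
    if finding.needs_escalation:
        return Verdict.STOP_AND_ESCALATE
    if (
        finding.introduced_by_diff
        and finding.same_owner_boundary
        and finding.keeps_task_contract
    ):
        return Verdict.IN_SCOPE_BLOCKER
    # Real, but adjacent: sibling surface, cleanup, or broader hardening.
    return Verdict.FOLLOW_UP
```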

Stop patching and report the scope break instead of continuing when:

  • a narrow PR turns into an architecture change, protocol change, migration, or release-process change;
  • the diff grows past 2x the original files or non-test LOC without explicit approval to expand scope;
  • two review-triggered patch cycles have not converged; pause and reclassify every remaining finding before another edit;
  • the best fix is "define the canonical contract first" rather than another local inference layer;
  • fixing the accepted finding would make the PR no longer describe the same behavior, issue, or owner boundary.
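
Two of the stop conditions above are measurable and can be checked mechanically against the frozen baseline. The sketch below covers only those two (2x growth and two unconverged patch cycles); the remaining conditions need human judgment. `ScopeBaseline` and `scope_break_reasons` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List

GROWTH_LIMIT = 2            # "past 2x the original files or non-test LOC"
MAX_UNCONVERGED_CYCLES = 2  # pause and reclassify after two patch cycles


@dataclass(frozen=True)
class ScopeBaseline:
    changed_files: int
    non_test_loc: int


def scope_break_reasons(
    baseline: ScopeBaseline,
    current_files: int,
    current_loc: int,
    patch_cycles: int,
    converged: bool,
    expansion_approved: bool = False,
) -> List[str]:
    """Return the measurable reasons to stop patching; empty means continue."""
    reasons: List[str] = []
    if not expansion_approved:
        if current_files > GROWTH_LIMIT * baseline.changed_files:
            reasons.append("changed files grew past 2x the baseline")
        if current_loc > GROWTH_LIMIT * baseline.non_test_loc:
            reasons.append("non-test LOC grew past 2x the baseline")
    if patch_cycles >= MAX_UNCONVERGED_CYCLES and not converged:
        reasons.append("two review-triggered patch cycles did not converge")
    return reasons
```

For example, a baseline of 4 files and 120 non-test LOC tolerates a diff of 8 files and 240 LOC, but 9 files would be reported as a scope break unless expansion was explicitly approved.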

After the two-cycle pause, continue only when every remaining accepted finding is still an in-scope blocker. Otherwise preserve the useful analysis, identify the smallest safe landed subset if one exists, and open or request a follow-up for the larger fix. Do not keep committing speculative fixes just to satisfy the reviewer.

Do not stack or push review-triggered fix commits while scope classification or focused proof is unresolved. Keep exploratory edits local until the cycle is proven in scope; if scope breaks, remove them from the landing lane instead of preserving them as branch history.

Critical exceptions must be explicit: active data loss, crash, broken install/upgrade, release blocker, or concrete security exposure. If the exception is not one of those, it is not critical enough to blow up scope.

Release Branches And Release Process

On release, beta, stable, hotfix, signing, notarization, appcast, package-publish, or release-check work, use freeze discipline even when the branch name is not release-like:

  • Fix only release blockers, failed release infrastructure, exact backports, install/upgrade breakage, data loss, crashes, or concrete security exposure.
  • Treat non-blocking autoreview findings as follow-ups for main, not reasons to broaden the release branch.
  • Do not introduce new product behavior, config surface, protocol shape, migration, plugin ownership, docs narrative, or process policy unless it directly unblocks the release.
  • Keep proof tied to the release target: exact branch/ref, failing check or shipped-risk reason, smallest command/proof, and whether the fix must also forward-port to main.
  • If review discovers a real but non-critical design problem during release closeout, stop with a follow-up issue/PR plan; do not use the release branch as the refactor lane.

Pick Target

Dirty local work:

<autoreview-helper> --mode local

Use this only when the patch is actually unstaged/staged/untracked in the current checkout. --mode uncommitted is accepted as an alias for --mode local. For committed, pushed, or PR work, point the helper at the commit or branch diff instead; do not force dirty modes just because the helper docs mention dirty work first. A clean local review only proves there is no local patch.

Branch/PR work:

<autoreview-helper> --mode branch --base origin/main

Optional review context is first-class:

<autoreview-helper> --mode branch --base origin/main --prompt-file /tmp/review-notes.md --dataset /tmp/evidence.json

If an open PR exists, use its actual base:

base=$(gh pr view --json baseRefName --jq .baseRefName)
<autoreview-helper> --mode branch --base "origin/$base"

Committed single change:

<autoreview-helper> --mode commit --commit HEAD

or with the agent-scripts helper invoked by its absolute path:

/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode commit --commit HEAD

Use commit review for already-landed or already-pushed work on main. Reviewing clean main against origin/main is usually an empty diff after push. For a small stack, review each commit explicitly or review the branch before merging with --base.

Parallel Closeout

Format first if formatting can change line locations. Then it is OK to run tests and review in parallel:

scripts/autoreview --parallel-tests "<focused test command>"

On Windows, the default --parallel-tests shell preserves the platform cmd.exe semantics used by Python shell=True. Use --parallel-tests-shell powershell or --parallel-tests-shell pwsh when the focused test command is PowerShell-specific.

Tradeoff: tests may force code changes that make the review stale. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation.

Review Panels

Run multiple reviewers against one frozen bundle:

<autoreview-helper> --reviewers codex,claude

--panel is shorthand for Codex plus Claude unless --engine changes the first reviewer:

<autoreview-helper> --panel

Set reviewer models and thinking/effort explicitly:

<autoreview-helper> --reviewers codex,claude --model codex=gpt-5.1 --thinking codex=high --model claude=sonnet --thinking claude=max

Inline syntax is also supported:

<autoreview-helper> --reviewers codex:gpt-5.1:high,claude:sonnet:max

Codex maps thinking to model_reasoning_effort and accepts low, medium, high, or xhigh. Claude maps thinking to --effort and also accepts max. Engines without a real thinking knob reject --thinking.
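
To make the inline `engine:model:thinking` syntax and the thinking rules concrete, here is a minimal parsing sketch. It is not the helper's parser: `parse_reviewers` and `ReviewerSpec` are hypothetical names, and reading "also accepts max" as "the Codex levels plus max" is an assumption. Engines absent from the table are treated as having no thinking knob.

```python
from typing import List, NamedTuple, Optional

CODEX_LEVELS = ("low", "medium", "high", "xhigh")

# Assumption: Claude accepts the same levels as Codex, plus "max".
THINKING_LEVELS = {
    "codex": CODEX_LEVELS,
    "claude": CODEX_LEVELS + ("max",),
}


class ReviewerSpec(NamedTuple):
    engine: str
    model: Optional[str]
    thinking: Optional[str]


def parse_reviewers(value: str) -> List[ReviewerSpec]:
    """Parse 'codex:gpt-5.1:high,claude:sonnet:max' into reviewer specs."""
    specs: List[ReviewerSpec] = []
    for item in value.split(","):
        parts = item.strip().split(":")
        if not parts[0] or len(parts) > 3:
            raise ValueError(f"invalid reviewer spec: {item!r}")
        engine = parts[0]
        model = parts[1] if len(parts) > 1 and parts[1] else None
        thinking = parts[2] if len(parts) > 2 and parts[2] else None
        if thinking is not None:
            allowed = THINKING_LEVELS.get(engine)
            if allowed is None:
                # Engines without a real thinking knob reject a thinking value.
                raise ValueError(f"{engine} does not support thinking")
            if thinking not in allowed:
                raise ValueError(f"{engine} does not accept thinking={thinking}")
        specs.append(ReviewerSpec(engine, model, thinking))
    return specs
```

Under these assumptions, `codex:gpt-5.1:max` is rejected because `max` is accepted only for Claude, while a bare `codex,claude` leaves model and thinking unset.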

Context Efficiency

Run the helper directly so target selection, engine choice, structured validation, and exit status all stay in one path. If output is noisy, summarize the completed helper output after it returns; do not ask another agent or reviewer to rerun the review.

Helper

OpenClaw repo-local helper:

.agents/skills/autoreview/scripts/autoreview --help

On native Windows, invoke the extensionless Python helper through Python:

python .agents\skills\autoreview\scripts\autoreview --help

The smoke harness has thin shell wrappers over a shared Python implementation:

.agents/skills/autoreview/scripts/test-review-harness --fixture benign --engine codex
.agents\skills\autoreview\scripts\test-review-harness.ps1 -Fixture benign -Engine codex

agent-scripts checkout helper:

skills/autoreview/scripts/autoreview --help

Global helper from agent-scripts:

~/.codex/skills/agent-scripts/autoreview/scripts/autoreview --help

If installed from agent-scripts, the path is:

/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --help

The helper:

  • chooses dirty local changes first
  • accepts --mode uncommitted as an alias for --mode local
  • otherwise uses current PR base if gh pr view works
  • otherwise uses origin/main for non-main branches
  • supports --engine codex, claude, droid, and copilot; default is AUTOREVIEW_ENGINE or codex; Codex should remain the default when nothing is set
  • resolves bare git, gh, reviewer, and PowerShell shell commands from absolute PATH entries only, never from the reviewed checkout; explicit relative --*-bin paths are resolved from the reviewed repository root
  • expects --mode commit --commit <ref> for already-committed work, especially clean main after landing
  • should be left in --mode auto or forced to --mode branch for PR/branch work; do not force --mode local after committing
  • writes only to stdout unless --output, --json-output, or live streamed engine stderr is set
  • supports --dry-run, --parallel-tests, --parallel-tests-shell, --prompt, --prompt-file, --dataset, --no-tools, --no-web-search, and commit refs
  • supports --stream-engine-output or AUTOREVIEW_STREAM_ENGINE_OUTPUT=1 for live engine text while preserving structured validation; Codex and Claude hide tool/file event details, emit compact activity summaries, and report usage at turn completion
  • supports opt-in review panels with --panel / --reviewers, plus per-engine --model and --thinking
  • allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through codex exec with read-only sandbox and structured output
  • prints review still running: <engine> elapsed=<seconds>s pid=<pid> to stderr at long-running intervals while waiting for the selected review engine, unless streamed output or compact Codex activity has been visible recently
  • prints autoreview clean: no accepted/actionable findings reported when the selected review command exits 0
  • exits nonzero when accepted/actionable findings are present
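
The target-selection order in the list above (dirty local changes, then PR base, then origin/main for non-main branches) can be sketched as a pure function. This is an illustration of the described order, not the helper's implementation; `pick_target` and its parameters are hypothetical. A clean main checkout returns `None` because the document asks for an explicit `--mode commit --commit <ref>` in that case rather than an automatic choice.

```python
from typing import List, Optional


def pick_target(
    has_dirty_changes: bool, pr_base: Optional[str], branch: str
) -> Optional[List[str]]:
    """Return helper arguments for the review target, or None if unspecified.

    pr_base is the PR's base branch name (for example from
    `gh pr view --json baseRefName`), or None when no open PR was found.
    """
    if has_dirty_changes:
        # Dirty local work is chosen first.
        return ["--mode", "local"]
    if pr_base:
        # Otherwise review against the PR's actual base.
        return ["--mode", "branch", "--base", f"origin/{pr_base}"]
    if branch != "main":
        # Otherwise fall back to origin/main for non-main branches.
        return ["--mode", "branch", "--base", "origin/main"]
    # Clean main: the caller should pass --mode commit --commit <ref>.
    return None
```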

Final Report

Include:

  • review command used
  • tests/proof run
  • findings accepted/rejected, briefly why
  • the clean review result from the final helper/review run, or why a remaining finding was consciously rejected

Do not run another review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.

Individual skills in this repo

This repo contains 20 individual skills — each has its own dedicated page.

1password

Set up and use 1Password CLI for sign-in, desktop integration, and reading or injecting secrets.

acp-router

Route plain-language requests for Claude Code, Cursor, Copilot, OpenClaw ACP, OpenCode, Gemini CLI, Qwen, Kiro, Kimi, iFlow, Factory Droid, Kilocode, or explicit ACP harness work into either OpenClaw ACP runtime sessions or direct acpx-driven sessions ("telephone game" flow). For coding-agent thread requests, read this skill first, then use only `sessions_spawn` for thread creation. Codex chat binding defaults to the native Codex app-server plugin unless ACP is explicit or background spawn needs ACP.

agent-transcript

Add a redacted agent transcript section to GitHub PR or issue bodies during OpenClaw agent-created PR/issue workflows.

apple-notes

Create, view, edit, delete, search, move, or export Apple Notes via the memo CLI on macOS.

apple-reminders

List, add, edit, complete, or delete Apple Reminders and reminder lists via remindctl.

bear-notes

Create, search, and manage Bear notes via grizzly CLI.

blacksmith-testbox

Run Blacksmith Testbox for CI-parity checks, secrets, hosted services, migrations, or builds local cannot reproduce.

blogwatcher

Monitor blogs and RSS/Atom feeds for updates using the blogwatcher CLI.

blucli

BluOS CLI (blu) for discovery, playback, grouping, and volume.

bluebubbles

Send and manage iMessages via BlueBubbles, including attachments, tapbacks, edits, replies, and groups.

browser-automation

Use when controlling web pages with the OpenClaw browser tool, especially multi-step flows, login checks, tab management, or recovery from stale refs/timeouts.

camsnap

Capture frames or clips from RTSP/ONVIF cameras.

canvas

Present HTML on connected OpenClaw node canvases, navigate/eval/snapshot, and debug canvas host URLs.

channel-message-flows

Use when running QA Lab channel message flow evidence.

clawdtributor

Use for OpenClaw clawtributors PR/issue triage: Discrawl discovery, live-open rechecks, deep review, topic grouping, and compact @handle/LOC/type/blast/verification summaries.

clawhub

Search, install, update, sync, or publish agent skills with the ClawHub CLI and registry.

claw-score

Audit or refresh OpenClaw maturity scorecard docs from root taxonomy, maturity scores, and QA evidence artifacts without using maintainer discrawl data or committed inventory reports.

clawsweeper

Use for all ClawSweeper work: OpenClaw issue/PR sweep reports, commit-review reports, repair jobs, cloud fix PRs, @clawsweeper maintainer mention commands, trusted ClawSweeper-reviewed autofix/automerge, GitHub Actions monitoring, permissions, gates, and manual backfills.

clownfish-cloud-pr

Use when launching Clownfish in GitHub Actions to create or update one guarded GitHub implementation PR from issue/PR refs, a ClawSweeper report, a custom maintainer prompt, or to opt an existing Clownfish PR into ClawSweeper-reviewed cloud automerge.

codex-review

Codex code review closeout: local dirty changes, PR branch vs main, parallel tests.
