Loop Engineering

Help the user replace themselves as the thing that prompts a coding agent. The deliverable is a small system that discovers work, hands it to a maker, verifies the result with a separate checker, writes down what is done, decides what is next, and re-runs until a goal is met, then stops for an honest reason.

Your job in this skill is not to write the loop for them in one shot. It is to interview, design the contract, then scaffold so they end with a loop they can actually run and trust.

What a loop is, and is not

A loop is not a cron job. A cron job repeats blindly. A loop discovers work, verifies it with a separate checker, persists state, decides what is next, and stops for a reason. If the thing you are building cannot stop on its own, it is a bug, not a loop.

Build stop conditions first. They are what let the user walk away.

Step 0: decide whether a loop is even the right tool

Before designing anything, apply this test. A loop pays off only when both are true:

There is a clear, machine-checkable success criterion (a test passes, a build is green, a schema validates, a rubric returns PASS).
Reaching it involves tedious trial and error the user would otherwise do by hand, turn after turn.

If the user cannot state how a machine would know the work is done, stop and say so. A loop with no verifiable done-condition does not converge; it thrashes and burns tokens. Help them find a checkable proxy first, or recommend they keep prompting by hand. Do not build a loop to avoid understanding the work.
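One way to test whether a done-condition is machine-checkable is to try writing it as a command whose exit code answers the question. The sketch below assumes exactly that convention (exit code 0 means done); `check_done` and its parameters are hypothetical names, not part of any tool mentioned here.

```python
import subprocess


def check_done(command, timeout_s=600):
    """Run the done-condition command. Returns (done, evidence).

    Assumption: the command follows the usual convention that
    exit code 0 means success and anything else means not done.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError:
        return False, f"command not found: {command[0]}"
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    # Keep only the tail of the output as evidence for the memory file.
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output[-2000:]
```

For a Python project the call might look like `check_done(["pytest", "-q"])`. If no such command can be written down, that is a sign the work is not yet a loop candidate.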

Good loop candidates: fixing failing CI, triaging an issue inbox, dependency upgrades against a solid test suite, performance tuning with a benchmark, flaky test hunts, doc builds that must stay green. Poor candidates: open-ended design, anything where "good" is a matter of taste with no proxy, one-off tasks.

Step 1: interview the user (fill the loop contract)

Ask these questions. Lead with the done-condition question (the second one below); it decides whether the loop is viable. Do not ask all eleven at once; ask in small batches, and infer what you safely can from the repo.

1. Objective. What is the recurring work you keep doing by hand? What should the loop make true?
2. Done-condition (ask first, ask hard). How would a machine know it is done, with no human looking? Name the exact command or check.
3. Trigger. What starts a run: a schedule (every N minutes/nightly) or an event (push, new issue, webhook, CI failure)?
4. Discover / intake. How does a run pick what to work on? (read CI status, query the issue tracker, read a fix_plan.md, scan alerts)
5. Workspace. Where does it act, and what is strictly off-limits? (a git worktree or sandbox; never force-push, touch secrets, change deps, hit prod)
6. Context. What must each run read to not re-derive the project from zero? (a SKILL.md / AGENTS.md, the spec, the memory file)
7. Delegation. Which agent does the work (the maker), and which separate one checks it (the checker)? They must not be the same call.
8. Verification. What gates must pass for "done" to be true? (unit tests, lint, types, an LLM-as-judge rubric returning PASS/FAIL with evidence)
9. Memory / state. Where is progress stored so a run can resume after a restart? (a markdown file, an issue tracker, a state.json on disk)
10. Budget. The hard ceilings: max iterations, max runtime, max tokens, and a dollar cap if a credential can spend money.
11. Hand-off. When does it escalate to a human instead of pressing on? (ambiguous, high-risk, or the same failure twice with no new evidence)

Record answers into templates/loop-contract.template.md. If the user cannot answer #2 with a concrete check, return to Step 0.

Step 2: write the loop contract

Fill the template into a real loop-contract.md in the user's repo. This is the design artifact the loop is built from and reviewed against. Keep it short and concrete. Every field should name a command, a path, or a number, not a vibe.
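As a rough illustration of "a command, a path, or a number, not a vibe", the contract can be mirrored in a small typed record and linted before anything is built. The field names below are hypothetical and are not taken from templates/loop-contract.template.md. The checks are a sketch and catch only the crudest gaps: blank or placeholder answers, a shared maker and checker, and non-positive budgets.

```python
from dataclasses import dataclass, fields

# Answers that mean "nobody has decided this yet".
PLACEHOLDERS = {"", "tbd", "todo", "?"}


@dataclass
class LoopContract:
    """Hypothetical shape: one field per interview answer."""
    objective: str
    done_condition: str   # exact command or check, e.g. "pytest -q"
    trigger: str
    intake: str
    workspace: str
    context: str
    maker: str
    checker: str
    verification: str
    memory: str           # path, e.g. "fix_plan.md"
    max_iterations: int
    max_runtime_s: int
    handoff: str


def contract_problems(c: LoopContract) -> list[str]:
    """Return a list of human-readable gaps; empty means nothing obvious."""
    problems = []
    for f in fields(c):
        value = getattr(c, f.name)
        if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
            problems.append(f"{f.name}: not filled in")
    if c.maker.strip() and c.maker.strip() == c.checker.strip():
        problems.append("maker and checker must be separate")
    if c.max_iterations <= 0:
        problems.append("max_iterations must be a positive number")
    if c.max_runtime_s <= 0:
        problems.append("max_runtime_s must be a positive number")
    return problems
```

A lint like this cannot tell a concrete answer from a vague one; it only flags answers that are plainly missing, so the contract still needs a human read.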

Step 3: choose the smallest shape that fits the tool

Map the contract onto the user's agent. Pick the simplest mechanism that satisfies it; do not reach for multi-agent orchestration when one /goal will do. See reference/tool-mapping.md for specifics.

Claude Code: /goal for the verified stop condition, sub-agents in .claude/agents/ for maker/checker separation, hooks or a scheduled task for the trigger, claude -p ... --output-format json for headless runs.
Codex: the Automations tab for the trigger, codex exec --json for headless runs, codex exec resume --last to continue, sub-agents for the checker.
Greenfield / from scratch: a Ralph-style bash loop, while :; do cat PROMPT.md | agent; done, with one task per turn and a fix_plan.md. Best when there is no codebase to break yet; pair it with a gate and a budget so it can stop. See reference/ralph.md.
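The Ralph-style shape, paired with a gate and a budget, could be sketched as below. This is an illustration under stated assumptions, not the contents of reference/ralph.md: `run_ralph_loop`, `agent_cmd`, and `gate_cmd` are hypothetical names, the agent is assumed to read its prompt on stdin, and the gate is assumed to exit 0 when the goal is met.

```python
import subprocess
import time
from pathlib import Path


def run_ralph_loop(agent_cmd, gate_cmd, prompt_path="PROMPT.md",
                   max_iterations=10, max_seconds=3600):
    """Re-run the agent until the gate passes or a budget trips.

    Returns (reason, iterations_used), where reason is
    "goal met" or "budget spent".
    """
    deadline = time.monotonic() + max_seconds
    iterations = 0
    while True:
        # Gate first: if the goal is already met, do nothing.
        if subprocess.run(gate_cmd).returncode == 0:
            return "goal met", iterations
        # Budget second: stop on the iteration or time ceiling.
        if iterations >= max_iterations or time.monotonic() >= deadline:
            return "budget spent", iterations
        # One task per turn, the equivalent of `cat PROMPT.md | agent`.
        # The prompt is re-read each turn so edits to it take effect.
        prompt = Path(prompt_path).read_text()
        subprocess.run(agent_cmd, input=prompt, text=True)
        iterations += 1
```

The gate runs before the agent on every turn, so a run that starts with the goal already met costs nothing. The sketch covers only two of the stop reasons; stall detection and escalation would need the checker and the memory file.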

Step 4: scaffold the files

Generate concrete, runnable artifacts (adapt names to the user's stack):

The maker prompt with the non-negotiable rules baked in (see laws below).
The verifier: either a shell gate (the exact test/lint command) or a checker prompt that returns PASS/FAIL plus evidence and may ESCALATE. Start from reference/verifier-rubric.md.
The memory file (fix_plan.md or state.json) that a run reads first and writes last.
The runner: the while loop, cron line, GitHub Action, or automation that fires it, including the gate ("if the build is already green, do nothing") and the budget cap.
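When the verifier is a checker prompt rather than a shell gate, the runner has to turn its reply into a decision. The sketch below assumes the checker is asked to reply with a JSON object of the form `{"verdict": ..., "evidence": ...}`. That shape is an assumption for illustration and is not taken from reference/verifier-rubric.md. The parser fails closed: anything malformed, and any PASS without evidence, is treated as FAIL.

```python
import json
from dataclasses import dataclass

VERDICTS = {"PASS", "FAIL", "ESCALATE"}


@dataclass(frozen=True)
class Verdict:
    status: str    # one of VERDICTS
    evidence: str  # what the checker cited, or why parsing failed


def parse_verdict(raw: str) -> Verdict:
    """Parse the checker's reply, treating anything unclear as FAIL."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict("FAIL", "checker output was not valid JSON")
    if not isinstance(data, dict):
        return Verdict("FAIL", "checker output was not a JSON object")
    status = str(data.get("verdict") or "").upper()
    evidence = str(data.get("evidence") or "").strip()
    if status not in VERDICTS:
        return Verdict("FAIL", f"unknown verdict: {status!r}")
    if status == "PASS" and not evidence:
        # "Done" is a claim, not a proof: no evidence, no PASS.
        return Verdict("FAIL", "PASS claimed without evidence")
    return Verdict(status, evidence)
```

Failing closed means a chatty or confused checker reply can cost an extra iteration, but it can never be mistaken for a PASS.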

Step 5: harden before letting it run

Walk the user through reference/hardening-checklist.md before the first unattended run. Always dry-run once with a tight budget and a sandboxed credential before scaling up.

The four honest stop conditions

Every run must end for exactly one of these reasons, and report which:

goal met: the separate verifier confirms the done-condition.
budget spent: an iteration, token, time, or dollar ceiling tripped.
stalled: the same failure twice with no new evidence (stop thrashing).
needs a human: high-risk or ambiguous, so it escalates. This is a success state, not a failure.
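The four conditions can be wired as one small decision function that the runner calls after every turn. This is a minimal sketch with hypothetical names (`StopReason`, `decide_stop`, the `spent` and `caps` dictionaries). The order in which the conditions are checked is an assumption, and a "failure signature" is assumed to be any string that identifies a failure, such as the name of the failing test plus its error message.

```python
from enum import Enum


class StopReason(Enum):
    GOAL_MET = "goal met"
    BUDGET_SPENT = "budget spent"
    STALLED = "stalled"
    NEEDS_HUMAN = "needs a human"


def decide_stop(verdict, failures, spent, caps):
    """Return a StopReason, or None to run another iteration.

    verdict:  "PASS", "FAIL", or "ESCALATE" from the separate checker
    failures: failure signatures so far, oldest first, this turn included
    spent:    usage so far, e.g. {"iterations": 3, "usd": 0.4}
    caps:     hard ceilings keyed the same way as `spent`
    """
    if verdict == "PASS":
        return StopReason.GOAL_MET
    if verdict == "ESCALATE":
        return StopReason.NEEDS_HUMAN
    # Same failure twice in a row with nothing new: stop thrashing.
    if len(failures) >= 2 and failures[-1] == failures[-2]:
        return StopReason.STALLED
    # Any single ceiling tripping is enough to stop.
    if any(spent.get(name, 0) >= cap for name, cap in caps.items()):
        return StopReason.BUDGET_SPENT
    return None
```

Returning a single enum value gives the loop one reason to write into its memory file on exit, so a human reading it later can see why the run ended.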

The non-negotiable laws

Bake rules 1–5 into every maker's instructions. Rules 6–7 cannot be baked into a maker; they are the engineer's to own, so make sure the user keeps them.

1. The maker never grades its own work. A separate checker decides "done". The model that wrote the code is far too generous marking its own homework.
2. Never weaken or delete a test, or narrow a check, to make it pass. Fix the cause. If the test is wrong, escalate; do not silently edit it.
3. A loop that cannot stop is a bug. Wire the four stop conditions before the first run.
4. Memory lives on disk, not in the context. Read it first each turn, write it last. The agent forgets between runs; the repo does not.
5. Fix only the cause; do not widen scope. Smallest change that could be right.
6. Verification stays the engineer's responsibility. "Done" is a claim, not a proof. The verifier is the asset the user owns; the maker is a commodity that improves for free with every model release.
7. The fleet scales to the user's review rate, not the tool's lane count. The human is the serial bottleneck. Usually the right number of parallel agents is a low single digit.
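The rule that memory lives on disk, read first and written last, might look like this for a state.json file. The function names and the default state shape (`iteration`, `done`, `next`) are hypothetical. The write goes through a temporary file and `os.replace` so that a crash in the middle of a write leaves the previous state intact.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path):
    """Read the memory file first; start fresh if it does not exist yet."""
    p = Path(path)
    if not p.exists():
        return {"iteration": 0, "done": [], "next": []}
    return json.loads(p.read_text())


def save_state(path, state):
    """Write the memory file last, atomically.

    The temp file is created in the same directory so that
    os.replace stays on one filesystem.
    """
    p = Path(path)
    fd, tmp = tempfile.mkstemp(dir=str(p.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, p)
```

A turn would call `load_state` before doing anything else and `save_state` as its final step, so that a restart resumes from the last completed turn.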

What the loop never does for you

Say this plainly to the user so they keep the right job:

Verification. A loop running unattended is also a loop making mistakes unattended. The separate checker makes "done" mean something; it does not make it certain.
Comprehension. The faster the loop ships, the faster understanding-debt grows. Schedule time to read what it built.
Intent. Why this work matters, and what "good" means, has to come from a human. The loop does not know the difference between using it to move faster on work you understand and using it to avoid understanding the work. You do.

Build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.

invincible04/awesome-loop-engineering
