name: testing-workflows
version: "0.1.0"
megalos_version: ">=0.4.0"
Testing megálos workflows
1. Scope and non-goals
This skill teaches an AI coding agent how to verify the behaviour of a
megálos workflow YAML file before it ships. The primary surface is the
dry-run CLI (python -m megalos_server.dryrun), which walks a workflow
through the production runtime with a mock input source in place of a real
LLM. Dry-run answers the question "given these mock responses, does this
workflow step, branch, gate, descend, and terminate the way the author
intended?" It is fast, offline, and deterministic; it makes no LLM calls
and no MCP round-trips. A live run against a real megálos deployment is
available as a bounded secondary surface for the cases dry-run cannot
reach (real-LLM output branches and real-registry mcp_tool_call
digressions); it is covered in §6.
In scope. The interactive dry-run recipe (§3). Scripted-responses
fixtures (§4) for regression replay. Load-time cross-workflow errors —
unknown_call_target and call_cycle_detected — and how dry-run surfaces
them via create_app() bootstrap (§5). Exit-code and error-banner
contracts the agent must read.
Out of scope. Per-file workflow validation — covered by the
companion skill validating-workflows. Workflow authoring — covered by
authoring-workflows. Domain-repo packaging, Horizon deployment, and
registry setup — covered by deploying-workflows. Framework-level tests
and workflow-fixture contributions to the megálos repository itself:
pytest/conftest is not a surface this skill teaches. The
testing-workflows skill verifies workflow behaviour with dry-run;
framework test authoring is a distinct activity for contributors to
megálos, and is explicitly out of scope here. Real LLM invocation — dry-run never calls an LLM;
authors who need a real LLM in the loop reach for the live-run surface
(§6) or a deployed instance.
Audience. The reader is an AI coding agent acting on behalf of a
workflow author. The dry-run design explicitly frames this audience: authors
are not test engineers. A workflow author needs to see the directive
template rendered, the gates listed, and the schema feedback at each
step — not write a pytest case. The interactive and scripted dry-run
recipes below are shaped around that audience.
2. Dry-run as primary mode
python -m megalos_server.dryrun loads a workflow and drives it through
the production execution path, reading each step's mock "LLM response"
from stdin (or from a scripted YAML file — see §4) instead of calling a
model. The dry-run is not a parallel simulator; it is the production
runtime with a mock input source. The same start_workflow,
submit_step, classification, output_schema validation, retry
accounting, branch selection, and sub-workflow descent that a live server
runs are what dry-run runs. Anything dry-run accepts, the server accepts;
anything dry-run rejects at bootstrap, the server rejects at startup.
What dry-run exercises. Step rendering (banner, precondition, gates,
directive). output_schema validation and the retry surface (validation
hints, remaining-retry counts, budget exhaustion). Branch selection and
branch-default resolution. Precondition skip detection — when a step's
precondition: is unmet at runtime, dry-run prints Skipped: <step_id>
on stdout and advances (see §3 for the surface contract; §9 for the
common-mistake shape). Sub-workflow descent and parent resume.
create_app() load-time checks over the full workflow set in the target's
parent directory, including unknown_call_target and call_cycle_detected
(§5).
What dry-run does not exercise. Real LLM generation quality. Real MCP
registry round-trips (mcp_tool_call steps are covered by schema and
structural checks at load time, but a real registry call is a live-run
concern). Production-scale session persistence, latency, or concurrency.
When the author needs to verify any of these, the live-run surface is the
tool (§6).
Exit-code contract. Dry-run exits 0 on workflow_complete and 1
on any other terminal status — the production _TERMINAL_STATUSES
frozenset is {"workflow_complete", "error", "session_escalated", "workflow_changed"}. A non-zero exit always pairs with a decoded error
banner on stderr. Scripting around dry-run (CI smoke-tests, pre-commit
checks) should treat exit 0 as the sole green.
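The exit-code contract above can be wrapped for CI along these lines. This is a minimal sketch, not part of megálos: dryrun_command and run_is_green are hypothetical helper names, and the only facts taken from this skill are the module path, the --responses-file flag (§4), and the rule that exit 0 is the sole green.

```python
import subprocess
import sys


def dryrun_command(workflow_path, responses_file=None):
    """Build the argv for a dry-run. Hypothetical helper, not a megalos API."""
    cmd = [sys.executable, "-m", "megalos_server.dryrun", workflow_path]
    if responses_file is not None:
        # Scripted mode: the flag is described in section 4 of this skill.
        cmd += ["--responses-file", responses_file]
    return cmd


def run_is_green(cmd, stdin_text=""):
    """Run cmd and return (green, stderr). Only exit code 0 counts as green."""
    proc = subprocess.run(cmd, input=stdin_text, capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr
```

A CI step would call run_is_green(dryrun_command(target, fixture)) and, on a red result, print the captured stderr so the decoded error banner stays visible in the job log.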
3. Interactive dry-run recipe
The interactive mode is the common case: the agent (or its human operator) types a mock LLM response at each step and watches the workflow unfold. Invocation:
python -m megalos_server.dryrun workflows/my_workflow.yaml
Dry-run loads every *.yaml file in the target's parent directory
(required so sub-workflow call: targets can resolve — see §5), then
bootstraps the workflow and enters a REPL. At each step, the runtime
renders a banner, the optional precondition line, any gates, and the
step's directive template, then prompts with > . The agent types a mock
LLM response and presses enter. The workflow advances.
A minimal three-step linear-workflow session looks like this:
$ python -m megalos_server.dryrun workflows/example.yaml
=== Step: alpha — First step ===
<directive template rendered here>
> ok
=== Step: bravo — Second step ===
<directive template rendered here>
> ok
=== Step: charlie — Third step ===
<directive template rendered here>
> ok
Workflow complete
Exit code 0. Three mock responses drove three steps to
workflow_complete.
Prompts the agent may encounter.
- Step response prompt (>). The default prompt at every step. Whatever the agent types is fed to the runtime as if an LLM had produced it. For steps with an output_schema, the response must be valid JSON matching the schema; an invalid JSON payload triggers a validation error (with a Retries remaining: N line on stderr) and re-prompts at the same step. Exhausting the retry budget terminates the run with a Max retries (N) exceeded banner and exit 1.
- Branch selection prompt. When a step with a branches: block is reached, dry-run first consumes a step response, then prints:
    Branches:
    1. first_branch_target
    2. second_branch_target [default]
    Choose branch [1-2, empty = default]:
  The [default] tag marks the branch chosen when the selector evaluates against the step response. An empty input resolves to the default.
- Precondition line. Steps with a precondition: render it at entry as Precondition: <ref> == "<value>" or Precondition: <ref> is present, before the directive.
- Skip surface (Skipped: <step_id>). When a step's precondition is unmet at runtime, dry-run does not render that step's banner or prompt; instead it prints Skipped: <step_id> on stdout and advances to the next step. The earlier-rendered Precondition: line on the prior step is the implicit explanation. Skip detection only flags precondition-bearing steps; alternate branches that simply weren't taken are not "skipped", they are an unreached path. A run that unintentionally skips a step still terminates with workflow_complete on exit 0, so the only signal of an unintended skip is the Skipped: line in stdout. Read every Skipped: line and confirm the precondition that suppressed the step was meant to suppress it.
- EOF (Ctrl-D). Closing stdin mid-workflow aborts with Dry-run aborted by user (EOF) on stderr and exit 1.
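Because an unintended skip leaves the exit code at 0, a script that captures dry-run's stdout can list the skipped steps for review. The following is a minimal sketch under two assumptions: the line format is exactly the Skipped: <step_id> shape described above, and leading whitespace (for example indentation during sub-workflow descent) may be present and is tolerated. The function names are hypothetical.

```python
def skipped_steps(stdout_text):
    """Return the step ids named on 'Skipped: <step_id>' lines, in order."""
    prefix = "Skipped: "
    skipped = []
    for line in stdout_text.splitlines():
        stripped = line.strip()  # assumption: tolerate indented output
        if stripped.startswith(prefix):
            skipped.append(stripped[len(prefix):])
    return skipped


def unexpected_skips(stdout_text, allowed=()):
    """Return skipped step ids that the author did not list as expected."""
    allowed_set = set(allowed)
    return [step for step in skipped_steps(stdout_text) if step not in allowed_set]
```

Passing the step ids the author intends to be skipped as allowed turns any other Skipped: line into a visible finding even though the run itself exits 0.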
Sub-workflow descent. When a step's call: target resolves to a
sibling workflow, dry-run prints Entering sub-workflow <name> and
indents the child's banners by two spaces per nesting level. On the
child's workflow_complete, dry-run prints Returned from sub-workflow
and resumes the parent at the next step.
Clean-directory discipline. Because dry-run loads every *.yaml in
the parent directory, a broken sibling YAML blocks the target from
loading. If the bootstrap-time error banner names a file other than the
target, a sibling is at fault — fix it, or move the target into a
directory that contains only it and its call: targets. The framing
paragraph dry-run prints on load failure is:
Failed to load workflows from <parent_dir>: <exception>
Note: dry-run loads all *.yaml files in the parent directory (required
for sub-workflow 'call' target resolution). If the error above names a
file other than <target.yaml>, a sibling workflow has a problem — fix
it, or move <target.yaml> to a directory containing only it and its
call targets.
4. Scripted-responses fixtures
For regression replay and CI smoke-tests, dry-run accepts a
--responses-file pointing at a YAML fixture that drives the workflow
non-interactively:
python -m megalos_server.dryrun workflows/my_workflow.yaml \
--responses-file tests/responses/my_workflow_happy.yaml
The responses file has a fixed shape: a top-level mapping with a required
version: 1 key and a required entries: list. Each entry is a mapping
with a required step_id: and exactly one of response: (for step
content) or branch: (for branch-selection prompts) — never both, never
neither. A minimal three-step happy-path fixture:
version: 1
entries:
  - step_id: alpha
    response: ok
  - step_id: bravo
    response: ok
  - step_id: charlie
    response: ok
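The shape rules above (required version: 1, required entries list, required step_id, and exactly one of response or branch per entry) can be pre-checked before handing a fixture to dry-run. This is a sketch only: check_responses_shape is a hypothetical helper, the problem codes are this sketch's own labels rather than the banners dry-run prints, and it expects an already-parsed mapping (for example the result of yaml.safe_load from PyYAML, which is an assumed dependency and not something this skill prescribes).

```python
def check_responses_shape(doc):
    """Return a list of problem codes for a parsed responses file.

    The codes are illustrative labels, not dry-run's verbatim banners.
    """
    if not isinstance(doc, dict):
        return ["not_a_mapping"]
    problems = []
    if "version" not in doc:
        problems.append("missing_version")
    elif doc["version"] != 1:
        problems.append("unknown_version")
    entries = doc.get("entries")
    if not isinstance(entries, list):
        problems.append("missing_entries")
        return problems
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict) or "step_id" not in entry:
            problems.append(f"entry_{index}_missing_step_id")
            continue
        has_response = "response" in entry
        has_branch = "branch" in entry
        if has_response and has_branch:
            problems.append(f"entry_{index}_both_response_and_branch")
        elif not has_response and not has_branch:
            problems.append(f"entry_{index}_neither_response_nor_branch")
    return problems
```

An empty list means the fixture satisfies the shape rules stated here; it says nothing about whether the step ids match the workflow, which only a dry-run can confirm.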
Ordered consumption. At the root workflow frame (descent depth 0),
entries are consumed in order: the first entry must match the first
step's step_id, the second entry the second step, and so on. A spelling
or order drift aborts the run. During sub-workflow descent (depth > 0),
dry-run walks past non-matching entries until a match is found, so
parent-frame and child-frame entries may be interleaved in file order —
but each frame's own entries must appear in that frame's execution order.
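The two consumption rules can be modelled in a few lines. This is an illustrative model, not dry-run's implementation: next_entry_index is a hypothetical name, and the sketch assumes that entries walked past during descent stay available for a later frame to consume, which is one reading of the interleaving rule above.

```python
def next_entry_index(entries, consumed, step_id, depth):
    """Pick the index of the entry to consume for step_id, or None.

    entries  -- list of dicts with a "step_id" key, in file order
    consumed -- set of indexes already used
    depth    -- 0 for the root frame, > 0 during sub-workflow descent
    """
    remaining = [i for i in range(len(entries)) if i not in consumed]
    if depth == 0:
        # Root frame is strict: the next unused entry must match.
        if remaining and entries[remaining[0]]["step_id"] == step_id:
            return remaining[0]
        return None
    # Descent: walk past non-matching entries until a match is found.
    # Assumption: walked-past entries remain unused for later frames.
    for i in remaining:
        if entries[i]["step_id"] == step_id:
            return i
    return None
```

A None result at depth 0 corresponds to either step-id drift or exhaustion; the model does not distinguish the two, whereas dry-run reports them with separate banners.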
Drift-detection banners. The parser and scripted-consume layer print
verbatim banners on stderr before exiting 1. Authors will see exactly
these strings when a fixture is wrong:
- Missing version field:
  Responses file missing required 'version' field. Expected: version: 1
- Unknown version:
  Unknown responses-file version: <n>. Supported: [1]
- Response/branch mutex (both present):
  ... must have exactly one of 'response' or 'branch', not both.
- Response/branch mutex (neither present):
  ... must have exactly one of 'response' or 'branch'.
- Step-id drift at root depth:
  Script entry expected step_id=<entry_step>, REPL at step_id=<repl_step>
- Entry-type mismatch (step expected response, script gave branch):
  At step <repl_step>, expected step response but script provided branch selection
  (or, symmetrically, expected branch selection but script provided response)
- Exhaustion (too few entries):
  Responses file exhausted at step <repl_step> (expecting: <expected_label>)
- Unused entries at completion (too many entries):
  Responses file had <N> unused entries after workflow completion
The unused-entries guard runs on workflow_complete immediately before
the run exits; a longer-than-needed fixture is a scripting error, not a
silent pass. Every drift case exits 1, so CI that treats exit 0 as
green will fail loudly on any of the above.
Coverage parity with interactive mode. Scripted mode drives the same
surface the interactive mode drives: output_schema validation and
retry banners (an invalid response: triggers the same validation-error
re-prompt), branch selection (via branch: entries), and sub-workflow
descent (with interleaved parent/child entries). A scripted fixture that
threads a workflow end-to-end is the canonical regression artifact for
that workflow.
5. Load-time cross-workflow errors — unknown_call_target and call_cycle_detected
Two cross-workflow errors surface at workflow-set load time, not under per-file validation. Both are emitted from the same pass the validating-workflows skill flagged as a forward reference:
A nuance worth internalising: these two errors surface at workflow-set load time, not under a per-file python -m megalos_server.validate <file> run. The per-file validator call covers only the outer workflow's own structural and semantic rules; it cannot know about the other workflows the server will eventually load. The call-target and cycle checks run inside the server's create_app() after every workflow YAML in the workflow directory has been loaded individually, and they run before the MCP app is constructed. In practice this means: an agent that authors or modifies a workflow with a call: step must either load the full workflow set into the server to surface unknown_call_target and call_cycle_detected, or use the test surface covered in testing-workflows. A green per-file validate run is a necessary but not sufficient condition for a loadable server.
This section is that test surface. The verbatim f-string shapes the loader emits are:
Workflow '<parent_name>' step '<step_id>' calls unknown workflow '<target>' (code: unknown_call_target)
call cycle detected: <wf_a> -> <wf_b> -> ... -> <wf_a> (code: call_cycle_detected)
How dry-run surfaces them. Dry-run's bootstrap calls
create_app(workflow_dir=<target's parent directory>). create_app()
loads every *.yaml in that directory, then runs the cross-workflow
call-resolution pass before the MCP app is constructed. If any
call: target does not resolve to a loaded workflow, or if the call
graph contains a cycle, the load fails with the error code above and
dry-run exits 1 at bootstrap, before the REPL starts. The failure is
wrapped in the framing paragraph shown in §3 ("dry-run loads all
*.yaml files in the parent directory …") so the author can see which
file is implicated.
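To make the two checks concrete, here is a small model of what a cross-workflow pass has to decide. It is a sketch under stated assumptions, not the loader's code: the calls mapping (workflow name to a list of (step_id, target) pairs) and both function names are hypothetical, and the real pass lives inside create_app().

```python
def find_unknown_call_targets(calls, loaded):
    """Return (parent, step_id, target) triples whose target is not loaded."""
    return [
        (parent, step_id, target)
        for parent, steps in calls.items()
        for step_id, target in steps
        if target not in loaded
    ]


def find_call_cycle(calls):
    """Return one cycle as a list of names (first == last), or None."""
    graph = {name: [target for _, target in steps] for name, steps in calls.items()}
    unvisited, in_progress, done = 0, 1, 2
    state = {name: unvisited for name in graph}
    stack = []

    def visit(name):
        state[name] = in_progress
        stack.append(name)
        for target in graph[name]:
            # Targets that are not loaded are ignored here; they are the
            # business of find_unknown_call_targets.
            target_state = state.get(target, done)
            if target_state == in_progress:
                return stack[stack.index(target):] + [target]
            if target_state == unvisited:
                cycle = visit(target)
                if cycle:
                    return cycle
        stack.pop()
        state[name] = done
        return None

    for name in graph:
        if state[name] == unvisited:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

Joining a returned cycle with " -> " gives a chain in the same wf_a -> wf_b -> ... -> wf_a order as the call_cycle_detected message shape quoted above. Neither check can run on a single file, which is why both belong to the load of the whole directory.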
The practical implication. A per-file python -m megalos_server.validate <file> run cannot surface
unknown_call_target or call_cycle_detected: it sees only the one
file on the command line, not the workflow set. Dry-run on a single
target does surface them, because it loads the target's entire
parent directory. For a workflow with call: digressions, the agent's
verdict loop is:
- Run per-file validate on each workflow individually. This catches structural and semantic errors local to each file.
- Run python -m megalos_server.dryrun <target.yaml> on any workflow that has call: steps, or on one workflow that transitively reaches them. This catches unknown_call_target and call_cycle_detected at bootstrap across the full directory.
- For an even stronger signal against the actual server's loader, run python main.py (domain-repo entrypoint) or fastmcp inspect main.py:mcp; both exercise create_app() the same way dry-run does. These are the deployment-side verdict tools (deploying-workflows §8) and surface exactly the same cross-workflow errors.
A green dry-run at the target confirms per-workflow structural correctness and cross-workflow graph correctness for the workflow set in that target's parent directory. That is the gap the per-file validator leaves open, and closing it is the job of this section and of §§6+.
For the verbatim error messages, what each one means, and the fix for
each, see references/load-time-errors.md.
6. Live-run as bounded secondary
Dry-run covers roughly 80% of the teach-value for a workflow author:
the production execution path, schema validation, branch
resolution, sub-workflow descent, and the load-time cross-workflow
errors of §5 all surface in dry-run without an LLM, without an MCP
round-trip, and without network cost. The remaining cases — where the
agent needs real-LLM output to drive a branch the scripted file
cannot fairly mock, or where an mcp_tool_call step must actually
round-trip to a registered server — are what the live-run surface is
for. Live-run is deliberately a secondary surface and is deliberately
thin in this skill: it is a pointer, not a tutorial.
When to reach for live-run. Two cases only:
- LLM-output-dependent branches. The workflow has a branches: block whose selector depends on the content of a real LLM response in a way the agent cannot fairly script. In dry-run the agent is typing the mock response and then the branch selection, so it is grading its own homework. A live run against a real model is the honest test.
- Real-registry mcp_tool_call digressions. The workflow has an mcp_tool_call step and the agent needs to confirm the call actually reaches the registered server, the auth env var resolves, and the response shape matches what the workflow expects downstream. Dry-run's structural checks on mcp_tool_call are real, but the network round-trip and env-var resolution are not exercised.
The recipe. Stand up a local megálos server from the domain repo
that holds the workflow, then point any MCP-compatible client at it.
The server side is the deploying-workflows skill's territory; this
skill only names the shape:
$ python main.py
or, for the FastMCP CLI form,
$ fastmcp run main.py:mcp
Either command binds the HTTP transport on FASTMCP_HOST:FASTMCP_PORT
and keeps the process up. From there, point an MCP-compatible client
at the local endpoint (any MCP client works) and drive the workflow
through the client the same way an
end user would. Server-start mechanics, env-var wiring, and the
deploy.sh pre-flight live in deploying-workflows §§3–6; client
configuration and connection-add UX belong to whichever client the
operator chose and are out of scope here.
What live-run does not earn. Client authoring, SDK selection, client-library tutorials, LLM-integration patterns, and real-LLM cost/quality discussion are all out of scope for this skill. Live-run is a verification surface, not a development topic. If a workflow passes dry-run — interactive and scripted — and passes the per-file validator, a live run is confirmation, not discovery. Dry-run stays primary; live-run closes the real-LLM and real-network gap and nothing wider.
7. Relationship to sibling skills
This skill sits third in the four-skill author's loop — after authoring and validating, before deploying. The boundaries are sharp on purpose.
authoring-workflows. You write the workflow YAML against the
exported JSON Schema and the authoring guidance there. When you are
ready to check that it behaves the way you intended under the runtime,
you hand it to this skill. Authoring produces the artifact;
testing-workflows verifies its behaviour before it ships.
validating-workflows. Validation is the fast, offline,
per-file gate: structural and semantic rules, JSON Schema conformance,
step-reference resolution, single-file registry cross-check. It
answers "is this one workflow loadable in isolation?" — nothing about
behaviour, nothing about siblings. The testing-workflows surface picks
up where validation stops: behavioural checks under the production
runtime (output_schema retries, branch resolution, sub-workflow
descent) and the cross-workflow load-time surface.
Closing the forward-reference. The validating-workflows
skill flags unknown_call_target and call_cycle_detected as errors
that surface at workflow-set load time, not under per-file validate,
and forwards to this skill for the surfacing mechanism. §5 above is
that forwarding target. Dry-run's create_app() bootstrap runs the
cross-workflow resolution pass over the target's parent directory, so
running python -m megalos_server.dryrun <target.yaml> on a workflow
with call: digressions surfaces both error classes before the REPL
starts. A reader arriving here from the validating-workflows
Anti-scope paragraph should land on §5 for the verbatim error shapes
and on references/load-time-errors.md for the catalogue.
deploying-workflows. Deployment is the ship gate: domain-repo
layout, entrypoint contract, deploy.sh pre-flight, Horizon flow. A
green dry-run is a pre-flight for ship, not a replacement for
deploy.sh: the authoritative pre-flight still runs the
python main.py + fastmcp inspect main.py:mcp verdict triple
against the real domain repo. Dry-run catches the same load-time
cross-workflow errors those commands catch (they share create_app()),
but deploy.sh also checks domain-repo shape, which dry-run has no
opinion about. Treat dry-run as the daily inner loop and the deploy
verdict triple as the outer gate.
The author's day-to-day rhythm: edit in authoring, gate in validating, behave-check in testing, ship-gate in deploying. Each skill answers one question the others cannot.
8. Worked example — output_schema retry and a mis-defaulted branch
This example walks a narrower slice than the skill's empirical test:
two small workflows that exercise two of the five stress classes —
a mis-defaulted branch and an output_schema violation caught
inside a scripted run. The worked example deliberately does not
cover unknown_call_target (that is the empirical test's job) so the
reader walks away having learned the shape without having copy-pasted
the harder case.
8a. Starting conditions
Two workflows on disk in a clean directory:
- collect.yaml: a two-step collect-and-summarise workflow, modelled on the megálos demo_validation fixture. Step 1 has an output_schema requiring title (string, 3+ chars), goals (array of 3+ strings), and confirmed (boolean true). Step 2 summarises what step 1 collected.
- route.yaml: a single-step workflow with a branches: block routing to one of two terminal banners depending on the step response. The [default] tag is on the wrong branch (the author intended short as default but tagged long).
Both workflows pass per-file validation individually.
8b. Class 4 — output_schema violation inside a scripted run
The agent writes a scripted-responses file to regress the
collect-and-summarise flow. First attempt — collect_happy.yaml:
version: 1
entries:
  - step_id: collect_info
    response: '{"title": "xy", "goals": ["only one"]}'
  - step_id: summarize
    response: summary line
Running it:
$ python -m megalos_server.dryrun workflows/collect.yaml \
--responses-file tests/responses/collect_happy.yaml
=== Step: collect_info — Collect Project Information ===
<directive rendered>
Validation failed:
- title: string shorter than 3 characters
- goals: array has fewer than 3 items
- confirmed: required property missing
Hint: Submit JSON with title (string, 3+ chars), goals (array of 3+
strings), and confirmed (must be boolean true).
Retries remaining: 2
The scripted response: payload does not satisfy the step's
output_schema, so dry-run emits the validation banner and
re-prompts at the same step. In scripted mode, the next entry in the
file must drive the same step again. The first attempt only had one
entry per step, so the run fails with a scripted-exhaustion banner:
Responses file exhausted at step collect_info (expecting: step response)
Exit code 1. The fix is a scripted file that threads the retry
explicitly — one invalid submission, then a valid one, then the
summariser:
version: 1
entries:
  - step_id: collect_info
    response: '{"title": "xy", "goals": ["only one"]}'
  - step_id: collect_info
    response: '{"title": "Project X", "goals": ["a", "b", "c"], "confirmed": true}'
  - step_id: summarize
    response: summary line
Re-running:
$ python -m megalos_server.dryrun workflows/collect.yaml \
--responses-file tests/responses/collect_happy.yaml
=== Step: collect_info — Collect Project Information ===
<directive rendered>
Validation failed:
- title: string shorter than 3 characters
...
Retries remaining: 2
=== Step: collect_info — Collect Project Information ===
<directive rendered>
=== Step: summarize — Summarize the Project ===
<directive rendered>
Workflow complete
Exit code 0. The regression artifact captures both the failure and
the recovery.
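When drafting mock payloads for a step like collect_info, it can help to check them against the three constraints before running the fixture. The sketch below hand-codes only the constraints named in this example (title as a string of 3+ characters, goals as an array of 3+ strings, confirmed as boolean true); it is not the runtime's output_schema validator, and check_collect_payload is a hypothetical helper.

```python
import json


def check_collect_payload(raw):
    """Return the names of fields that violate the example's constraints."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(payload, dict):
        return ["not a JSON object"]
    problems = []
    title = payload.get("title")
    if not isinstance(title, str) or len(title) < 3:
        problems.append("title")
    goals = payload.get("goals")
    if (
        not isinstance(goals, list)
        or len(goals) < 3
        or not all(isinstance(goal, str) for goal in goals)
    ):
        problems.append("goals")
    # The example requires the literal boolean true, not any truthy value.
    if payload.get("confirmed") is not True:
        problems.append("confirmed")
    return problems
```

Run against the two response: strings in the fixture, it flags title, goals, and confirmed for the first and nothing for the second, which matches the validation banner and the recovery shown in the transcripts.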
8c. Class 1 — mis-defaulted branch
The route.yaml workflow's author believes typing an empty branch
selection resolves to short, because that was the intended default.
In interactive dry-run:
$ python -m megalos_server.dryrun workflows/route.yaml
=== Step: route — Route Request ===
<directive rendered>
> pick
Branches:
1. short
2. long [default]
Choose branch [1-2, empty = default]:
Pressing enter resolves to branch 2 (long) — the [default] tag is
on long, not short. Dry-run surfaces the actual default the
runtime will take, not the one the author intended. The fix is to
swap the default: key on the branches: block in route.yaml and
re-run — the [default] tag now sits on short, and empty input
resolves there.
8d. What this example does not exercise
No call: step, therefore no unknown_call_target and no
call_cycle_detected. No mcp_tool_call step. No sub-workflow
descent and no precondition. The empirical test for this skill
combines those axes to probe transfer; the worked example stays
narrower so the reader walks through two clean failure classes without
conflating five.
9. Common mistakes
Each item below names the symptom the author sees, the cause, and the
fix. Most surface through stderr banners the dry-run prints verbatim
before exiting 1.
- Symptom: Responses file missing required 'version' field. Expected: version: 1 on stderr, exit 1.
  Cause: The scripted-responses YAML has no top-level version key.
  Fix: Add version: 1 as the top-level mapping's first key. The only supported value is 1.
- Symptom: Unknown responses-file version: <n>. Supported: [1].
  Cause: version: is present but holds a value other than 1.
  Fix: Set version: 1. The supported set is literally [1].
- Symptom: ... must have exactly one of 'response' or 'branch', not both. (or ... must have exactly one of 'response' or 'branch'. when neither is present).
  Cause: A responses-file entry supplies both response: and branch: on the same mapping, or neither.
  Fix: Every entry is either a content response (response: <str>) or a branch selection (branch: <target>). Branch entries are only valid at the branch-selection prompt that follows a branching step's content response.
- Symptom: Script entry expected step_id=<entry>, REPL at step_id=<repl>.
  Cause: At root depth, the scripted entries are out of order or the step_id: is misspelled against the workflow's actual step ids.
  Fix: At depth 0, entries must appear in the workflow's step order. Check the spelling of step_id: against the workflow YAML. During sub-workflow descent (depth > 0) dry-run walks past non-matching entries, but the root frame is strict.
- Symptom: At step <id>, expected step response but script provided branch selection (or the reverse).
  Cause: A branching step's scripted entries are shaped wrong. A branching step needs a content response: entry first (the mock LLM output), then a branch: entry at the selection prompt. A plain (non-branching) step needs only a response: entry.
  Fix: For a branching step, write two entries: one response: and one branch:. For a plain step, write one response:.
- Symptom: Responses file exhausted at step <id> (expecting: step response).
  Cause: The scripted file has too few entries to reach workflow_complete, typically because a retry was not accounted for.
  Fix: Count the steps the workflow will actually take, including any retries the author is deliberately driving, and provide one entry per prompt.
- Symptom: Responses file had <N> unused entries after workflow completion.
  Cause: The scripted file has leftover entries after the workflow reached workflow_complete; a silent pass would mask a script-authoring error.
  Fix: Trim the file to exactly the entries the workflow consumes. The unused-entries guard runs immediately before exit, so a longer file always exits 1.
- Symptom: Failed to load workflows from <parent_dir>: ... and the error implicates a sibling YAML, not the target.
  Cause: Dry-run loads every *.yaml in the target's parent directory (required for call: target resolution). A broken sibling blocks the target from loading.
  Fix: Fix the sibling, or move the target into a directory that contains only it and its call: targets. See §3's clean-directory discipline.
- Symptom: Skipped: <step_id> appears on stdout, exit 0, but the author intended that step to run.
  Cause: The step has a precondition: whose runtime resolution is false (the referenced prior-step output was missing or did not match). Skip is silent on stderr and does not flip the exit code: a run with unintended skips still terminates workflow_complete, so the Skipped: line on stdout is the only signal.
  Fix: Re-read the prior step's output_schema and the precondition's <ref> clause. Confirm the referenced field is populated by some live path. If the precondition is correct but the prior mock response did not satisfy it, supply a different mock response in the next dry-run; if the precondition itself is wrong, fix it in the workflow YAML.
- Symptom: Author treats the > prompt as a user-input prompt and types what a human end-user might say.
  Cause: Misreading the prompt's contract. The > prompt is the mock LLM response at the current step: the string a live model would have produced, not the string a human user would have typed into a client.
  Fix: Type what the LLM would have emitted. For steps with an output_schema, that is a JSON payload matching the schema. For free-form steps, it is whatever the runtime would read from an LLM's text output.
- Symptom: Author assumes a green dry-run proves the workflow is correct under a real LLM.
  Cause: Dry-run never calls an LLM. It verifies the runtime's structural behaviour (transitions, validation, retries, branch resolution, descent) given mock responses the author chose. It does not prove that a real LLM will emit responses that drive the same path, and terminal envelopes prove only that the runtime's finish state is well-formed, not that the artifact is semantically right.
  Fix: Use dry-run for structural verdict; use the live-run surface (§6) when real-LLM behaviour or real-registry mcp_tool_call round-trip is the question.
10. References
- references/scripted-responses-template.md: YAML shape, annotated example, and the verbatim drift-detection banners emitted by the responses-file parser and scripted-consume layer, each with Means and Fix.
- references/load-time-errors.md: the two cross-workflow load-time errors (unknown_call_target and call_cycle_detected) with verbatim message shapes, a preamble pinning the create_app() boundary, and Means/Fix entries.
These references are consulted by message — when dry-run prints one of the banners they catalogue, the agent opens the matching file and looks up the entry.