reduce-hallucination
Turn the verified techniques interrogation science uses to get a knowledgeable witness to tell the truth into a standard audit for LLM agents. Each LLM node is a witness. The handoff between nodes is the transcription of testimony. User corrections and task records are physical evidence.
Theory and evidence live in references/. Read references/research-map.md for the citations behind each claim.
Three iron rules
- No accusation without evidence (Wellman's rule). Every finding must quote the actual prompt, schema, or output it is about. A suspicion without a quote goes in a "to verify" list, not in the findings.
- Surface signals do not prove truth. Fluent, confident, detailed text is not more correct than terse text. People detect deception from behavior at about chance level. Judge content only: grounding, consistency, whether the cited span exists.
- Self-correction needs a new external signal. "Are you sure? Check again" with no new evidence degrades accuracy. Every re-check must inject retrieved results or validator output.
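The third rule can be enforced where the retry prompt is built. The following is a minimal sketch, not part of this skill: `build_recheck_prompt` and its parameters are hypothetical names, and the prompt wording is only one possible phrasing.

```python
def build_recheck_prompt(question: str, previous_answer: str,
                         new_evidence: list[str]) -> str:
    """Build a re-check prompt, refusing to retry without new evidence."""
    # Drop empty or whitespace-only entries: they carry no signal.
    evidence = [e.strip() for e in new_evidence if e and e.strip()]
    if not evidence:
        raise ValueError("re-check needs retrieved results or validator output")
    bullet_list = "\n".join(f"- {e}" for e in evidence)
    return (
        f"Question: {question}\n"
        f"Previous answer: {previous_answer}\n"
        f"New evidence since that answer:\n{bullet_list}\n"
        "Revise the answer only where the new evidence requires it; "
        "otherwise keep it unchanged."
    )
```

Raising on an empty evidence list makes a bare "check again" impossible to send by accident.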
The three pillars of a good citation
Citation quality decomposes into three checks, in ascending order of difficulty and value:
| Pillar | Question | Status today |
|---|---|---|
| Faithful | Is the quote verbatim-real and the claim non-fabricated? | Largely solvable: verbatim string-match + abstention. |
| Relevant | Is the source on topic for the question? | Largely solvable: retrieval quality + consistency probing. |
| Supportive | Does the source support the claim, vs contradict it, vs stay neutral? | The frontier. Hard, and the rare "contradict" class is where systems are weakest. |
A citation that resolves to a real, on-topic passage but does not actually support the claim ("misgrounding") is worse than no citation, because it manufactures trust in a wrong claim and disarms the reviewer's only working defense (checking the content). Showing a citation without verifying it on the backend makes hallucination more dangerous, not less.
Audit protocol
Phase 1 — interrogate each witness (prompt + schema red flags)
For each LLM node, run the red-flag list. Each flag is one interrogation violation.
| # | Red flag (what to look for) | Why it matters | Harm |
|---|---|---|---|
| R1 | An enum field with no null / unknown option; a one-way "output a confirmation of X" instruction. | Forcing an answer produces compliance (false-confession analogue). | Forced guessing, confident fabrication. |
| R2 | Prose grants an abstention ("use null if missing") but the schema has no null. | Schema wins, the abstention is void. | Silent fabrication. |
| R3 | The prompt asks for evidence/citation, but nothing downstream checks it. | Telling without verifying teaches the model to fabricate more convincing citations. | Fake anchors. |
| R4 | Directional instructions: "be conservative", "assume", "infer from", "default to". | One-way presumption; combined with R1 it produces systematic bias. | Skewed output in one direction. |
| R5 | An aggregate field with no defined unit ("total years", "overall score"). | A measurement needs its unit declared before it can be locked down. | Definition mismatch read as an extraction error. |
| R6 | Few-shot examples that are off-task or that model citation-free reasoning. | Examples set the expectation for a "good" answer. | Teaches the wrong output style. |
| R7 | The prompt claims the node can do something it has no tool for. | False premise. | Fabricated process narration. |
| R8 | A "fact" handed in by the user or an upstream node, used as truth with no source tag. | Only confront with verified evidence. | Premise pollution. |
| R9 | A template with a mandatory slot and no empty branch. | The template version of forcing an answer. | A blank gets filled with the nearest value. |
| R10 | Re-check / retry written aggressively ("you were wrong, fix it") with no new evidence. | Sycophancy flips correct answers. | Right answers overturned. |
Record each as: node | R# | quoted text | one-line consequence.
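Part of the R1 check can be mechanized. The sketch below assumes node schemas are JSON-Schema-style dicts and that an abstention option is spelled `null` or `"unknown"`; `Finding`, `find_forced_enums`, and `ABSTAIN_VALUES` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass

# Assumption: any of these values counts as an abstention option.
ABSTAIN_VALUES = (None, "unknown", "null")


@dataclass
class Finding:
    node: str
    flag: str
    quoted_text: str
    consequence: str


def find_forced_enums(node: str, schema: dict) -> list[Finding]:
    """Flag enum fields that offer no null/unknown option (R1)."""
    findings = []
    for name, spec in schema.get("properties", {}).items():
        enum = spec.get("enum")
        if enum is None:
            continue  # not an enum field
        if not any(v in ABSTAIN_VALUES for v in enum):
            findings.append(Finding(
                node=node,
                flag="R1",
                # Quote the schema fragment itself: no accusation without evidence.
                quoted_text=f'"{name}": {{"enum": {enum!r}}}',
                consequence="Model must guess when the source is silent.",
            ))
    return findings
```

Each `Finding` maps onto the record format above: node, R#, quoted text, one-line consequence. Flags that live in prose (R4, R10) still need a human read.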
Phase 2 — audit the transcription (node boundaries / data flow)
This is where most pipeline hallucination happens. Draw the data flow and check each edge:
- Does uncertainty cross the boundary? Upstream `confidence` / `missing_fields` / `null` / notes: does anything downstream consume them, or does a template flatten them into an assertion? Look for "dead-letter" uncertainty: produced but never read.
- Is provenance lost? A value the upstream reasoning admits is "assumed/inferred" becomes an unmarked fact downstream. The schema should carry `stated | inferred | absent`.
- Do fact categories cross wires? Configuration or requirement data rendered as if the user/document asserted it.
- Does an aggregate change unit? "total medical years" upstream becomes "years as title X" downstream.
- Is the LLM doing deterministic computation? Date math, sums, counts in a prompt should move to a code step.
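The dead-letter check in the first bullet reduces to a set difference once the data flow is written down. This sketch assumes field names are unique across the pipeline and that you can list, per node, which fields it emits and which it reads; `dead_letter_fields` and its inputs are hypothetical.

```python
def dead_letter_fields(produced: dict[str, set[str]],
                       consumed: dict[str, set[str]],
                       uncertainty_fields: set[str]) -> dict[str, set[str]]:
    """Return, per node, the uncertainty fields it emits that no node reads."""
    # Everything read by any node or template in the pipeline.
    read_anywhere = set().union(*consumed.values())
    result = {}
    for node, fields in produced.items():
        dead = (fields & uncertainty_fields) - read_anywhere
        if dead:
            result[node] = dead
    return result
```

A field that shows up here is produced but never read, so whatever doubt it carried is gone by the time the user sees the output.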
Phase 3 — match physical evidence (when task records exist)
- Pull intermediate node outputs for a sample of tasks (prefer ones with user corrections or failures).
- Compare final output vs each node's intermediate output vs the raw input, field by field; locate which boundary first lost the truth.
- Find prior inconsistent statements: the same field contradicting itself across nodes.
- If corrections are directional (systematically one way), re-check Phase-1 R4 directional instructions; random error points more to input quality.
- Classify honestly: "information that was never in the source" (a process gap, not a hallucination) vs "information that was in the input but stated wrong" (a real hallucination).
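The field-by-field comparison and the final classification can be sketched as below. It assumes each stage's output is available as a flat dict and that a user correction supplies the true value; `first_divergence` and `classify` are hypothetical helpers, and exact equality is a simplification of how "in the source" would be judged on real text.

```python
def first_divergence(stages: list[tuple[str, dict]], field: str):
    """Return (previous_stage, stage) where `field` first changes, else None.

    `stages` is ordered: raw input first, final output last.
    """
    for (prev_name, prev), (name, cur) in zip(stages, stages[1:]):
        if prev.get(field) != cur.get(field):
            return prev_name, name
    return None


def classify(raw_value, final_value, corrected_value) -> str:
    """Separate process gaps from real hallucinations."""
    if final_value == corrected_value:
        return "correct"
    if raw_value != corrected_value:
        return "process gap"    # the truth was never in the source
    return "hallucination"      # the source had it; the pipeline stated it wrong
```

The pair returned by `first_divergence` names the boundary to inspect first; it does not say which side of that boundary is at fault.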
Phase 4 — ranked fixes (by ROI)
Layer 1 — prompt / schema (zero structural change)
- Each R1/R2 → add a `null`/`unknown` enum + one de-stigmatizing line ("X is a correct, expected answer").
- Each R9 → add an empty branch to the template (assertion becomes an open question).
- Each R5 → split the unit, or output a range `{floor, ceiling}`.
- Each R3 → add a `{value, source, evidence_quote}` triple to load-bearing fields.
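One way to express the triple, with abstention as a first-class value, is a small typed record. This is a sketch under assumptions: `GroundedField` and `Provenance` are hypothetical names, and the provenance tag borrows the `stated | inferred | absent` idea from Phase 2 rather than anything this layer prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(str, Enum):
    STATED = "stated"
    INFERRED = "inferred"
    ABSENT = "absent"


@dataclass
class GroundedField:
    value: Optional[str]            # None is a correct, expected answer
    source: Optional[str]           # identifier of the cited document
    evidence_quote: Optional[str]   # verbatim span backing the value
    provenance: Provenance

    def __post_init__(self):
        if self.value is None and self.provenance is not Provenance.ABSENT:
            raise ValueError("a null value must be tagged 'absent'")
        if self.value is not None and self.provenance is Provenance.ABSENT:
            raise ValueError("a filled value cannot be tagged 'absent'")
        if self.provenance is Provenance.STATED and not (
                self.source and self.evidence_quote):
            raise ValueError("a stated value needs a source and a quote")
```

For example, `GroundedField(None, None, None, Provenance.ABSENT)` is valid, while a `STATED` value with no quote is rejected at construction time.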
Layer 2 — a deterministic code validator (zero LLM cost)
| Validator | Checks |
|---|---|
| Enum whitelist | Field value is in the allowed set. |
| Quote string-match | evidence_quote exists verbatim in the cited source; if not, reject or downgrade. |
| Cross-field consistency | Totals reconcile; missing_fields does not contradict filled values. |
| Unit computation | Date diffs / sums done in code; the LLM only does semantic extraction. |
| Exit contract | Every user-visible fact slot is non-empty and comes from an allowed source field. |
Place it before any node that produces user-visible content. On failure, write a fallback flag and route to an open question or a human.
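Three of the validators in the table fit in a few lines of plain Python. The sketch below is illustrative: the record keys `status` and `evidence_quote`, the function names, and the whitespace-collapsing step are assumptions, not a format this skill defines.

```python
from datetime import date


def _collapse(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return " ".join(text.split())


def quote_is_verbatim(evidence_quote: str, source_text: str) -> bool:
    """Quote string-match: the quote must appear verbatim in the source."""
    quote = _collapse(evidence_quote)
    return bool(quote) and quote in _collapse(source_text)


def whole_years_between(start: date, end: date) -> int:
    """Unit computation done in code instead of in the prompt."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1  # the anniversary has not been reached yet
    return years


def validate(record: dict, source_text: str, allowed_status: set) -> list[str]:
    """Return the names of the validators that failed (empty list = pass)."""
    errors = []
    if record.get("status") not in allowed_status:
        errors.append("enum_whitelist")
    if not quote_is_verbatim(record.get("evidence_quote") or "", source_text):
        errors.append("quote_string_match")
    return errors
```

A non-empty result from `validate` is the trigger for the fallback flag: route the record to an open question or a human instead of rendering it.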
Layer 3 — an LLM cross-examiner node (LM-vs-LM)
- A separate node that gets only the output under review plus the source document (not the polluting context).
- Phrase the task to refute: "Try to refute: does every claim in this output trace to the source document? List the claims that do not."
- Output `{verdict, offending_claims[]}`; for high-stakes flows, re-ask reordered and take a consistency vote.
- Cost: +1 LLM call per task, so gate it to user-visible exits or high-stakes decisions only.
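The re-ask-and-vote step might look like the sketch below. `ask` is a hypothetical callable standing in for a call to a separate model that returns `"pass"` or `"fail"`; the template layout and the rotation scheme are assumptions, and `offending_claims` is left out to keep the example short.

```python
from collections import Counter
from typing import Callable

REFUTE_TEMPLATE = (
    "Try to refute: does every claim in this output trace to the source "
    "document? List the claims that do not.\n\n"
    "SOURCE:\n{source}\n\nOUTPUT UNDER REVIEW:\n{output}"
)


def cross_examine(ask: Callable[[str], str], source: str,
                  claims: list[str], rounds: int = 3) -> dict:
    """Re-ask with the claims rotated each round and take a majority vote."""
    if not claims:
        raise ValueError("nothing to cross-examine")
    votes = []
    for i in range(rounds):
        shift = i % len(claims)
        rotated = claims[shift:] + claims[:shift]  # reorder to probe consistency
        prompt = REFUTE_TEMPLATE.format(source=source, output="\n".join(rotated))
        votes.append(ask(prompt))
    verdict, count = Counter(votes).most_common(1)[0]
    return {"verdict": verdict, "agreement": count / rounds, "votes": votes}
```

The examiner sees only the source and the claims, never the upstream context. An `agreement` below 1.0 is itself a signal worth surfacing, since the verdict changed with ordering alone.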
Phase 5 — the report
# Hallucination Audit — {agent}
## One-line verdict
## Evidence overview (nodes / version audited / task sample / correction sample + direction)
## Findings (each: location | R# | quoted text | consequence | fix #)
## Data-flow map and dead-letter channels
## Fix recommendations (Layer 1/2/3, each tagged with the findings it kills)
## To verify (suspicions without enough evidence + how to get it)
## Honest boundary (which corrections are process gaps, not hallucinations; what this audit did not cover)
Supportiveness as the frontier (optional extension)
Faithful + relevant are the floor. The open problem is supportive: does the cited source support, contradict, or stay neutral on the claim? Two notes from the research:
- Model it as stance plus degree. Three-way stance (entail / contradict / neutral) misses partial or conditional support, which is most of regulatory and legal text. Add a "partial/conditional" flag that routes to human review.
- Supportiveness is relative to intent. The same passage supports "we comply" and contradicts "we have a gap", depending on what the user is arguing. When stance or intent is uncertain, surface both a supporting and a contradicting source and let the user choose. Present direction as a flag inviting review, not a verdict.
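The two notes suggest a judgment record that carries stance, a partial/conditional flag, and some measure of certainty, plus a router that never turns an uncertain judgment into a verdict. In this sketch the class names, the route labels, and the `0.8` threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    SUPPORT = "support"
    CONTRADICT = "contradict"
    NEUTRAL = "neutral"


@dataclass
class StanceJudgment:
    stance: Stance
    partial: bool = False     # partial or conditional support
    confidence: float = 1.0   # hypothetical score in [0, 1]


def route(judgment: StanceJudgment, min_confidence: float = 0.8) -> str:
    """Decide how to present a stance judgment to the user."""
    if judgment.partial:
        return "human_review"        # conditional support needs a person
    if judgment.confidence < min_confidence:
        return "show_both_sides"     # surface supporting and contradicting sources
    return f"flag_{judgment.stance.value}"  # a flag inviting review, not a verdict
```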
Reminders
- If a prompt is "shockingly bad", first check whether it is auto-generated boilerplate (repeated Role/Objective/Task is the tell). "A systematic problem with this generator" beats listing the same fault per agent.
- Contrast within one agent: which fields are clean and which fail. The difference is usually the answer (fields with an `unknown` option tend to be clean).
- Run the report through a plain-language editing pass: a finding must be legible to someone who has not read the theory.
Related skills
Part of a small family of open agent skills for trustworthy, clear communication:
- reduce-hallucination — audit LLM agents; ground, cite, abstain.
- better-doc — Classic Style + Smart Brevity.
- numbers — make numbers meaningful, and show them cleanly.