reduce-hallucination

Turn the verified techniques interrogation science uses to get a knowledgeable witness to tell the truth into a standard audit for LLM agents. Each LLM node is a witness. The handoff between nodes is the transcription of testimony. User corrections and task records are physical evidence.

Theory and evidence live in references/. Read references/research-map.md for the citations behind each claim.

Three iron rules

No accusation without evidence (Wellman's rule). Every finding must quote the actual prompt, schema, or output it is about. A suspicion without a quote goes in a "to verify" list, not in the findings.
Surface signals do not prove truth. Fluent, confident, detailed text is not more correct than terse text. People read deception from behavior at about chance. Judge content only: grounding, consistency, whether the cited span exists.
Self-correction needs a new external signal. "Are you sure? Check again" with no new evidence degrades accuracy. Every re-check must inject retrieved results or validator output.

The three pillars of a good citation

Citation quality decomposes into three checks, in ascending order of difficulty and value:

Pillar	Question	Status today
Faithful	Is the quote verbatim-real and the claim non-fabricated?	Largely solvable: verbatim string-match + abstention.
Relevant	Is the source on topic for the question?	Largely solvable: retrieval quality + consistency probing.
Supportive	Does the source support the claim, vs contradict it, vs stay neutral?	The frontier. Hard, and the rare "contradict" class is where systems are weakest.

A citation that resolves to a real, on-topic passage but does not actually support the claim ("misgrounding") is worse than no citation, because it manufactures trust over a wrong claim and disarms the reviewer's only working defense (checking the content). Showing a citation without verifying it backend makes hallucination more dangerous, not less.

Audit protocol

Phase 1 — interrogate each witness (prompt + schema red flags)

For each LLM node, run the red-flag list. Each flag is one interrogation violation.

#	Red flag (what to look for)	Why it matters	Harm
R1	An enum field with no `null` / `unknown` option; a one-way "output a confirmation of X" instruction.	Forcing an answer produces compliance (false-confession analogue).	Forced guessing, confident fabrication.
R2	Prose grants an abstention ("use null if missing") but the schema has no null.	Schema wins, the abstention is void.	Silent fabrication.
R3	The prompt asks for evidence/citation, but nothing downstream checks it.	Telling without verifying teaches the model to fabricate more convincing citations.	Fake anchors.
R4	Directional instructions: "be conservative", "assume", "infer from", "default to".	One-way presumption; combined with R1 it produces systematic bias.	Skewed output in one direction.
R5	An aggregate field with no defined unit ("total years", "overall score").	A measurement needs its unit declared before it can be locked down.	Definition mismatch read as an extraction error.
R6	Few-shot examples that are off-task or that model citation-free reasoning.	Examples set the expectation for a "good" answer.	Teaches the wrong output style.
R7	The prompt claims the node can do something it has no tool for.	False premise.	Fabricated process narration.
R8	A "fact" handed in by the user or an upstream node, used as truth with no source tag.	Only confront with verified evidence.	Premise pollution.
R9	A template with a mandatory slot and no empty branch.	The template version of forcing an answer.	A blank gets filled with the nearest value.
R10	Re-check / retry written aggressively ("you were wrong, fix it") with no new evidence.	Sycophancy flips correct answers.	Right answers overturned.

Record each as: node | R# | quoted text | one-line consequence.

Phase 2 — audit the transcription (node boundaries / data flow)

This is where most pipeline hallucination happens. Draw the data flow and check each edge:

Does uncertainty cross the boundary? Upstream confidence / missing_fields / null / notes — does anything downstream consume them, or does a template flatten them into an assertion? Look for "dead-letter" uncertainty: produced but never read.
Is provenance lost? A value the upstream reasoning admits is "assumed/inferred" becomes an unmarked fact downstream. The schema should carry stated | inferred | absent.
Do fact categories cross wires? Configuration or requirement data rendered as if the user/document asserted it.
Does an aggregate change unit? "total medical years" upstream becomes "years as title X" downstream.
Is the LLM doing deterministic computation? Date math, sums, counts in a prompt should move to a code step.

Phase 3 — match physical evidence (when task records exist)

Pull intermediate node outputs for a sample of tasks (prefer ones with user corrections or failures).
Compare final output vs each node's intermediate output vs the raw input, field by field; locate which boundary first lost the truth.
Find prior inconsistent statements: the same field contradicting itself across nodes.
If corrections are directional (systematically one way), re-check Phase-1 R4 directional instructions; random error points more to input quality.
Classify honestly: "information that was never in the source" (a process gap, not a hallucination) vs "information that was in the input but stated wrong" (a real hallucination).

Phase 4 — ranked fixes (by ROI)

Layer 1 — prompt / schema (zero structural change)

Each R1/R2 → add a null/unknown enum + one de-stigmatizing line ("X is a correct, expected answer").
Each R9 → add an empty branch to the template (assertion becomes an open question).
Each R5 → split the unit, or output a range {floor, ceiling}.
Each R3 → add a {value, source, evidence_quote} triple to load-bearing fields.

Layer 2 — a deterministic code validator (zero LLM cost)

Validator	Checks
Enum whitelist	Field value is in the allowed set.
Quote string-match	`evidence_quote` exists verbatim in the cited source; if not, reject or downgrade.
Cross-field consistency	Totals reconcile; `missing_fields` does not contradict filled values.
Unit computation	Date diffs / sums done in code; the LLM only does semantic extraction.
Exit contract	Every user-visible fact slot is non-empty and comes from an allowed source field.

Place it before any node that produces user-visible content. On failure, write a fallback flag and route to an open question or a human.

Layer 3 — an LLM cross-examiner node (LM-vs-LM)

A separate node that gets only the output under review plus the source document (not the polluting context).
Phrase the task to refute: "Try to refute: does every claim in this output trace to the source document? List the claims that do not."
Output {verdict, offending_claims[]}; for high-stakes flows, re-ask reordered and take a consistency vote.
Cost: +1 LLM call per task — gate it to user-visible exits or high-stakes decisions only.

Phase 5 — the report

# Hallucination Audit — {agent}
## One-line verdict
## Evidence overview (nodes / version audited / task sample / correction sample + direction)
## Findings (each: location | R# | quoted text | consequence | fix #)
## Data-flow map and dead-letter channels
## Fix recommendations (Layer 1/2/3, each tagged with the findings it kills)
## To verify (suspicions without enough evidence + how to get it)
## Honest boundary (which corrections are process gaps, not hallucinations; what this audit did not cover)

Supportiveness as the frontier (optional extension)

Faithful + relevant are the floor. The open problem is supportive: does the cited source support, contradict, or stay neutral on the claim? Two notes from the research:

Model it as stance plus degree. Three-way stance (entail / contradict / neutral) misses partial or conditional support, which is most of regulatory and legal text. Add a "partial/conditional" flag that routes to human review.
Supportiveness is relative to intent. The same passage supports "we comply" and contradicts "we have a gap", depending on what the user is arguing. When stance or intent is uncertain, surface both a supporting and a contradicting source and let the user choose. Present direction as a flag inviting review, not a verdict.

Reminders

If a prompt is "shockingly bad", first check whether it is auto-generated boilerplate (repeated Role/Objective/Task is the tell). "A systematic problem with this generator" beats listing the same fault per agent.
Contrast within one agent: which fields are clean and which fail. The difference is usually the answer (fields with an unknown option tend to be clean).
Run the report through a plain-language editing pass: a finding must be legible to someone who has not read the theory.

Related skills

Part of a small family of open agent skills for trustworthy, clear communication:

reduce-hallucination — audit LLM agents; ground, cite, abstain.
better-doc — Classic Style + Smart Brevity.
numbers — make numbers meaningful, and show them cleanly.

zhichao1208/reduce-hallucination

Ask in your favorite AI

Documentation

reduce-hallucination

Three iron rules

The three pillars of a good citation

Audit protocol

Phase 1 — interrogate each witness (prompt + schema red flags)

Phase 2 — audit the transcription (node boundaries / data flow)

Phase 3 — match physical evidence (when task records exist)

Phase 4 — ranked fixes (by ROI)

Phase 5 — the report

Supportiveness as the frontier (optional extension)

Reminders

Related skills

Related Skills

steipete/notion

affaan-m/seo

affaan-m/brand-voice

affaan-m/crosspost

affaan-m/x-api

affaan-m/content-engine