Finance Audit — Personal CFO Report
Turn a folder of raw statement files into a Personal CFO Report: verified data plus a narrative layer that answers What happened? Why? What should I do next?
Accuracy-first: amounts are read verbatim and reconciled against the running balance — nothing is invented.
Bank-agnostic. Works with any bank or account export, given a transaction list. Columns, header rows and date order (DMY/MDY/YMD) are auto-detected per file; the merchant/counterparty cleaner is rail-agnostic (UPI, POS, ACH, SEPA, card, wire, …). Nothing is hardcoded to a specific bank — you confirm the auto-detected mapping and tune the category keywords.
Pipeline (5 stages)
ingest -> parse -> categorize (loop to >=99%) -> analyze -> report (serve + hot reload)
The scripts live in this skill. Run them against the user's audit folder (the folder containing the statement files). Set two variables first:
SKILL="<absolute path to this skill directory>" # the folder containing this SKILL.md
ROOT="<absolute path to the audit folder>" # where the statement files are
Privacy — never commit real data
All statement files, balances, narrations, and generated output stay local
in ROOT/. The skill repo only ships templates, example config, and dummy
fixtures (assets/fixtures/sample_analysis.json).
Gitignored (never push): output/ (including report.html, analysis.json,
audit.db), statement files, and live config/mapping.json / overrides.
Optional labels in config/mapping.json: account_label or account_last4 for
the sidebar footer (defaults to "Primary account"). Business display name in
config/business.json → display_name (defaults to "Business").
Assume statement files are already downloaded into ROOT (PDF / XLSX / XLS / CSV / TXT).
Pick the runtime (the user prefers Node.js):
- Use Node when
nodeis available AND no input is a PDF. - Use Python when any input is a PDF (Node path can't parse PDF) or
nodeis unavailable.
Detect and install dependencies:
# Node path (preferred)
command -v node && node --version # needs Node >= 22 for built-in sqlite
( cd "$ROOT" && npm install xlsx ) # ONLY for .xls/.xlsx; CSV/TXT need nothing
# Python path (fallback / required for PDF)
python3 --version
pip install openpyxl "xlrd==2.0.1" pdfplumber pandas # + pandas for analyze stage
Define how you'll invoke a stage ($STAGE = ingest|parse|categorize|analyze|report):
# Node: node "$SKILL/scripts/node/$STAGE.mjs" --root "$ROOT" [args]
# Python: python3 "$SKILL/scripts/python/$STAGE.py" --root "$ROOT" [args]
Step 1 — Ingest & detect
node "$SKILL/scripts/node/ingest.mjs" --root "$ROOT" # or python3 .../ingest.py --root "$ROOT"
This previews every file and writes config/mapping.json with an auto-detected
header row, column roles, and an explicit date_format. Read the preview and
verify mapping.json — banks differ. If an entry has a _date_note, the date
order was ambiguous (e.g. 03/04/2024); confirm date_format with the user.
Each file entry needs: header_row, data_start_row, date_format (a strptime
pattern like %d/%m/%Y, or auto), columns (role → 0-based index), and
amount_mode:
debit_credit— separate withdrawal + deposit columns (most common)signed— oneamountcolumn whose sign indicates directionamount_type— anamountcolumn plus atypecolumn holding DR/CR
Fix any wrong indices before parsing.
Step 2 — Parse, normalize, dedupe, validate
node "$SKILL/scripts/node/parse.mjs" --root "$ROOT" # or .../parse.py
Writes output/audit.db (SQLite). The debit/credit split is derived from the
signed amount so it always matches the balance chain (correctly handling refunds
encoded as negative withdrawals). Require the integrity check to PASS (or
n/a when there is no balance column). If it FAILs:
- Re-check
mapping.jsoncolumns /data_start_row/date_format. - Re-read the ingest stdout preview (re-run ingest if needed); confirm you didn't include header/footer rows.
- Per-file
chain_issues > 0points to the exact rows that mis-parsed.
Do not proceed to categorization until parsing reconciles.
Step 3 — Categorize (loop until >= 99% by value)
cp "$SKILL/config/categories.example.json" "$ROOT/config/categories.json" # first run only
# For a business/freelancer account, also:
cp "$SKILL/config/business.example.json" "$ROOT/config/business.json" # then edit it
node "$SKILL/scripts/node/categorize.mjs" --root "$ROOT" # or .../categorize.py
Rule precedence: overrides.json > business.json > categories.json; highest
priority wins. Categorization is script-first; reach 99% like this:
- Run categorize. Read the coverage summary and
output/uncategorized.csv(merchants grouped, largest value first). - For merchants whose identity is obvious from the narration, add rules to
config/overrides.jsonyourself (see format below) — no need to ask. - For ambiguous merchants (unclear who/what, or personal-vs-business unclear), ask the user a concise question, then add the rule.
- Re-run categorize. Repeat until value coverage ≥ 99%.
config/overrides.json format:
{ "rules": [
{ "keywords": ["ACME", "ACME CORP"], "primary": "Income", "sub": "Acme retainer", "when": "credit" },
{ "keywords": ["BIGRENT"], "primary": "Housing", "sub": "Rent", "is_business": 0 }
] }
Aim for every transaction to carry a primary_cat and sub_cat. The factual
P2P-counterparty fallback labels transfers but counts as "needs review" — convert
the important ones into real rules.
Step 4 — Analyze (build output/analysis.json)
python3 "$SKILL/scripts/python/analyze.py" --root "$ROOT"
Computes FY slices, category reports, CFO narrative, income engine, burn summary,
problem areas, and business post-mortem from output/audit.db. Writes
output/analysis.json (and optional findings.md locally — never commit).
Requires pandas. Tune config/business.json → display_name for the business
section label (default "Business").
Step 5 — Generate the report (serve + hot reload + auto-open)
python3 "$SKILL/scripts/python/report.py" --root "$ROOT" --serve
Builds output/report.html from analysis.json + ledger rows embedded inline.
With --serve, opens http://127.0.0.1:8765/ and hot-reloads when db/config changes
(re-runs analyze + re-renders). Use --refresh-analysis to force re-analyze on a
one-shot build.
Report UI (output/report.html) — Money.Audit hybrid dashboard
Shell: rounded app frame, light sidebar with FY list + section nav (scroll-spy), full-width main column. Title: What matters about your money.
Look: Fraunces serif + Spline Sans UI; warm cream shell (#f7f6f3), gold
accent. Not generic gray SaaS.
Sections (sidebar): Snapshot, Income engine, Monthly burn, Problem areas, Business post-mortem, Cash & movement, Action plan, All categories, Ledger (search + filters).
Data embedded: const ANALYSIS = … and const LEDGER = … — no external fetch.
Sidebar footer shows account_label from mapping (never hardcode account numbers in
the template).
Motion: emil-design-eng — :active scale(0.98), bar width transitions, reduced-motion
safe. Style guide: references/ui_style_guide.md.
Template: assets/report_template.html. Dummy fixture for docs:
assets/fixtures/sample_analysis.json.
When changing report UI: edit the template + style guide; never commit real
output/report.html. Regenerate locally via analyze → report.
Report content
- Snapshot — health score, metrics, three things CFO quote.
- Income / burn / problems — decision-first sections with drill-down to categories.
- Business review — when
config/business.jsonis present. - Ledger — searchable transaction evidence (first 350 matches shown).
Flags: --port N, --no-open, --refresh-analysis. Without --serve, writes
output/report.html once (runs analyze first if analysis.json is missing).
Methodology: references/financial_frameworks.md.
Note: The hybrid Money.Audit UI uses the Python analyze + report path. Node
report.mjs still renders the legacy CFO template for basic audits without analyze.
Layout produced in the audit folder
ROOT/
├── <statement files> # input (PDF/XLSX/XLS/CSV/TXT)
├── config/
│ ├── mapping.json # column mapping (verify after ingest)
│ ├── categories.json # category rules (from the example)
│ ├── business.json # optional: business income/payroll/infra
│ └── overrides.json # your labels added during the loop
└── output/
├── audit.db # SQLite source of truth
├── analysis.json # analyze stage (local only — PII)
├── uncategorized.csv # review queue (regenerated each categorize run)
└── report.html # Money.Audit dashboard (local only — PII)
The skill only writes what's needed: editable config, the SQLite db that carries
state between stages, the review queue, and the final report. Intermediate dumps
(the ingest preview, the report JSON) are printed/embedded, not written as extra
files. mapping.json and audit.db must persist on disk because each stage runs
as a separate process.
Notes
- The same
mapping.json/config/ SQLite schema work across both runtimes; you can switch runtimes mid-audit. - Keep raw statement files out of version control (they contain PII).
- For full schema, validation math, and config reference, see reference.md.
- For analysis frameworks (savings rate, 50/30/20, benchmarks, recommendation logic), see references/financial_frameworks.md.
Common requests this skill handles
- "Audit / categorize my bank statements and make a report" — full pipeline.
- "What's my Personal CFO report for FY2024-25?" — use period selector after report.
- "What's my savings rate and how do I improve it?" — executive summary + action plan.
- "Where am I overspending?" — spending intelligence + counterfactuals.
- "What should I stop/start/continue?" — action plan section.
- "Ask my CFO about runway / debt / investments" — use Ask your CFO prompts with your agent.
- "Improve the report UI / make it full width / fix the layout" — edit
assets/report_template.htmlper references/ui_style_guide.md; regenerate locally with analyze → report. Never commitoutput/report.html.