Paper Format Agent
Use this skill for local-first academic document formatting work. It is designed for .docx papers and a separate format specification in .docx, .doc, or .txt.
Safety Defaults
- Never upload private papers or format guides to external services unless the user explicitly asks for it.
- Treat student papers, manuscripts, reviewer comments, and school templates as sensitive data.
- Do not change academic content unless the user explicitly opts in. Default to formatting-only changes.
- Keep reports and generated documents in a user-provided output directory.
Standard Workflow
- Confirm the input files:
--format-file: format guide or template text.--paper-file: source paper, currently.docx.--out-dir: output directory for reports and generated documents.
- Run the CLI in formatting-only mode:
python -m paper_format_agent.cli \
--format-file "format_guide.docx" \
--paper-file "paper.docx" \
--out-dir "./output" \
--engine auto \
--strict-required-sections
- Inspect generated artifacts:
formatted_paper_v3.docx: repaired document.format_rules.json: extracted rules.format_report.json: machine-readable score and checks.format_report.html: human-readable report.modify_log.json: formatting operations.engine_report.json: post-processing engine result.
- Check the content guard fields in
format_report.json:content_changedshould normally befalse.content_guard_enforcedshould normally betrue.
- If a rule is wrong, update rule extraction or scoring logic and add a minimal test case.
Validation
Run these before handing work back:
python tools/validate_skill.py
python -m unittest discover -s tests -p "test_*.py"
python tools/compile_check.py
python tools/release_audit.py
When Adding Template Support
- Add the smallest representative rule text needed to reproduce the behavior.
- Prefer deterministic rule extraction over LLM-only interpretation.
- Add tests for extracted margins, font size, line spacing, required sections, headings, captions, or references.
- Do not commit real student documents. Use synthetic text or anonymized fixtures.
When Reviewing Output
Read format_report.json first, then inspect format_report.html if the user needs a human-readable summary. Prioritize issues in this order:
- Content changed unexpectedly.
- Required sections or headings were misclassified.
- Page setup, margins, fonts, or line spacing are wrong.
- Captions, references, tables, or numbering look wrong.
- Report wording or score explanation is unclear.
Contribution-Friendly Tasks
Good first PRs usually fit one of these buckets:
- Add a synthetic test for a school, journal, or conference formatting rule.
- Improve a rule extractor with a narrowly scoped regex or parser.
- Improve report clarity without changing scoring semantics.
- Add a regression case to
docs/regression_manifest.sample.json. - Improve this skill workflow or
agents/openai.yamlmetadata.