EdgeVox — Claude Code Rules
Offline voice agent framework for robots. Pure-Python package, no cloud dependencies, runs on CPU/CUDA/Metal.
Project layout
edgevox/
├── edgevox/ # Source package
│ ├── audio/ # VAD, mic capture, playback
│ ├── stt/ # STT backends (faster-whisper, sherpa-onnx)
│ ├── llm/ # llama.cpp / Gemma integration
│ ├── tts/ # TTS backends (Kokoro, Piper, Supertonic, PyThaiTTS)
│ ├── core/ # Pipeline orchestration
│ ├── cli/ # CLI entrypoints
│ ├── ui/ # TUI widgets
│ ├── integrations/ # ROS2 bridge, etc.
│ ├── server/ # FastAPI web UI + WebSocket server
│ ├── tui.py # Main TUI app
│ └── setup_models.py # Model downloader
├── webui/ # React frontend (Vite + Tailwind)
├── scripts/ # Utility scripts (model upload, etc.)
├── voices/ # Voice config files
├── docs/ # Project docs
├── website/ # VitePress site
└── pyproject.toml
Entrypoints (see pyproject.toml):
- `edgevox` → `edgevox.tui:main` (TUI default, `--web-ui` for web, `--simple-ui` for CLI)
- `edgevox-cli` → `edgevox.cli.main:main`
- `edgevox-setup` → `edgevox.setup_models:main`
Supported languages & backends
| Language | STT | TTS |
|---|---|---|
| English, French, Spanish, etc. | faster-whisper | Kokoro |
| Vietnamese | sherpa-onnx (zipformer) | Piper |
| German, Russian, Arabic, Indonesian | faster-whisper | Piper |
| Korean | faster-whisper | Supertonic |
| Thai | faster-whisper | PyThaiTTS |
Models are hosted on nrl-ai/edgevox-models (HuggingFace) with fallback to upstream repos.
Architecture principles
- Plug-and-play, customizable by default. Every component — STT backend, TTS backend, LLM, VAD, agent loop behavior, pipeline stage, tool, skill, hook — must be swappable without editing core code. Prefer Protocols, registries, and decorators over hard-coded paths. New behavior lands as a new plugin/hook/backend, not as a patch to an existing module. If you find yourself adding a conditional to core for a specific use case, step back and extract it into an injection point instead.
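As a rough illustration of the "Protocols, registries, and decorators" preference, here is a minimal sketch of a name-keyed backend registry. Every name in it (`TTSBackend`, `register_tts`, `create_tts`, `SilenceTTS`) is hypothetical and chosen for the example; it does not describe EdgeVox's actual registry API.

```python
from typing import Callable, Protocol


class TTSBackend(Protocol):
    """Hypothetical interface: any class with a matching synthesize() qualifies."""

    def synthesize(self, text: str) -> bytes: ...


# Hypothetical registry mapping a backend name to a zero-argument factory.
_TTS_REGISTRY: dict[str, Callable[[], TTSBackend]] = {}


def register_tts(name: str):
    """Decorator that registers a backend class under a name."""

    def decorator(cls):
        if name in _TTS_REGISTRY:
            raise ValueError(f"TTS backend already registered: {name}")
        _TTS_REGISTRY[name] = cls
        return cls

    return decorator


def create_tts(name: str) -> TTSBackend:
    """Core code resolves backends by name and never branches on a specific one."""
    try:
        factory = _TTS_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_TTS_REGISTRY)) or "none"
        raise ValueError(f"Unknown TTS backend {name!r} (registered: {known})") from None
    return factory()


@register_tts("silence")
class SilenceTTS:
    """Toy plugin: returns int16 silence, one sample per input character."""

    def synthesize(self, text: str) -> bytes:
        return b"\x00\x00" * len(text)
```

With this shape, adding a backend means adding a decorated class in its own module; `create_tts` itself does not change.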
Agent harness architecture
The agent harness (edgevox/agents/ + edgevox/llm/hooks_slm.py + edgevox/llm/tool_parsers/) is fully documented under docs/documentation/:
- `agent-loop.md`: the six-fire-point loop, parallel dispatch, handoff short-circuit.
- `hooks.md`: hook authoring contract, built-ins, ordering rules.
- `memory.md`: `MemoryStore` / `SessionStore` / `NotesFile` / `Compactor`.
- `interrupt.md`: barge-in signals + cancel-token plumbing.
- `multiagent.md`: Blackboard, BackgroundAgent, AgentPool.
- `tool-calling.md`: parser chain + grammar-constrained decoding roadmap.
Harness rules
- Typed `AgentContext` fields (`ctx.tool_registry`, `ctx.llm`, `ctx.interrupt`, `ctx.memory`, `ctx.artifacts`, `ctx.blackboard`) are the public plumbing surface. `ctx.state` is user-only scratch; framework code must not write magic keys there.
- Hook-owned state lives under `ctx.hook_state[id(self)]`. Keying by `id(self)` is what guarantees two instances of the same hook class don't share state.
- Barge-in is enforceable, not advisory. Every `LLM.complete` call threads `ctx.interrupt.cancel_token` via `stop_event=…` so llama-cpp's `stopping_criteria` actually halts generation within one decode step.
- Tokenizer-exact token counts. `estimate_tokens(messages, llm)` and `LLM.count_tokens` replace the `chars // 4` heuristic when an LLM is available. Required for correct context-window decisions on CJK / Vietnamese / Thai.
- Tool-call parsing runs raw-first. `parse_tool_calls_from_content` tries detectors against the raw content before stripping `<think>` blocks, because Qwen3 emits tool calls inside reasoning blocks (see llama.cpp#20837).
- Preset parsers are validated at load. `resolve_preset(slug)` asserts every name in `tool_call_parsers=(...)` is a registered detector; a typo fails loudly rather than silently disabling detection.
- Model-emitted tool-call ids round-trip. Mistral's `[TOOL_CALLS]` format carries a 9-char id that the follow-up `role="tool"` message must reuse. `ToolCallItem.id` plumbs this through the parser chain and the agent loop.
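The `id(self)` keying rule above can be sketched as follows. `FakeContext` is a stand-in that models only a `hook_state` mapping, and `CallCounterHook` with its `on_fire` method is a hypothetical hook; neither reflects the real `AgentContext` class or the hook contract documented in `hooks.md`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeContext:
    """Stand-in for AgentContext; only the hook_state mapping is modelled."""

    hook_state: dict[int, dict[str, Any]] = field(default_factory=dict)


class CallCounterHook:
    """Hypothetical hook that counts how many times it has fired."""

    def on_fire(self, ctx: FakeContext) -> int:
        # id(self) differs between live instances, so each hook gets its own slot
        # even though both instances share the same class.
        state = ctx.hook_state.setdefault(id(self), {"calls": 0})
        state["calls"] += 1
        return state["calls"]


ctx = FakeContext()
first, second = CallCounterHook(), CallCounterHook()
first.on_fire(ctx)
first.on_fire(ctx)
second.on_fire(ctx)
```

After these calls `ctx.hook_state` holds two separate entries, one per instance, with counts of 2 and 1.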
Preferred import surfaces
- Agent framework: `from edgevox.agents import LLMAgent, AgentContext, Session, Handoff, ...`
- Built-in hooks: `from edgevox.agents.hooks_builtin import MemoryInjectionHook, TokenBudgetHook, ...`
- SLM hardening: `from edgevox.llm.hooks_slm import default_slm_hooks`
- Memory: `from edgevox.agents.memory import JSONMemoryStore, NotesFile, Compactor, estimate_tokens`
- Multi-agent: `from edgevox.agents.multiagent import Blackboard, BackgroundAgent, AgentPool`
- Interrupt: `from edgevox.agents.interrupt import InterruptController, InterruptPolicy, EnergyBargeInWatcher`
Avoid reaching into private modules or _agent_harness.py directly.
Coding rules
- Python ≥ 3.10. Use modern syntax (`X | Y` unions, `match`, `dict[str, int]`).
- Format and lint with ruff. Line length 120. Run `ruff format` then `ruff check --fix`.
- No trailing summaries in code comments. Comment the why, not the what.
- Type hints on public functions. Internal helpers may skip them when obvious.
- No prints in library code. Use `rich` / `textual` for user-facing output, `logging` for diagnostics.
- Imports go at the top of the file. Only push an import inside a function when something concrete forces it: a circular-import break, an optional/heavy dependency behind a capability check, or a lazy load to shave CLI startup latency. Convenience or "it's only used in one place" is not a reason; move it up.
- No new top-level dependencies without reason. Prefer the stdlib. If you must add one, update `pyproject.toml`.
- Hardware-aware code paths must degrade gracefully: CUDA/Metal/CPU fallbacks, never crash on missing accelerator.
- Never commit model files (`.gguf`, `.onnx`, `.bin`, weights). They live under `models/`, which is gitignored.
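One way to read the graceful-degradation rule is as an ordered probe with CPU as the unconditional last resort. The sketch below assumes a hypothetical `pick_device` helper that receives its capability probes as plain callables; how EdgeVox actually detects CUDA or Metal is not shown in this file.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)

# Assumed preference order for this sketch: fastest accelerator first.
PREFERENCE = ("cuda", "metal", "cpu")


def pick_device(probes: dict[str, Callable[[], bool]]) -> str:
    """Return the first device whose probe succeeds, falling back to CPU."""
    for name in PREFERENCE:
        probe = probes.get(name)
        if probe is None:
            continue
        try:
            if probe():
                return name
        except Exception:
            # A broken driver or missing optional package must not crash startup.
            logger.warning("Probe for %s failed; trying the next device", name, exc_info=True)
    return "cpu"
```

Passing the probes in keeps the function free of heavy imports and easy to test; a caller would supply whatever real capability checks the backend offers.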
Audio / model conventions
- Sample rate: 16 kHz mono int16 for capture and STT input.
- TTS output: resample to device rate via `sounddevice`.
- VAD frame size: 32 ms (512 samples @ 16 kHz).
- Latency budget: STT < 0.5 s, LLM first token < 0.4 s, TTS first chunk < 0.1 s on RTX 3080.
- Treat the streaming pipeline as the contract: do not introduce blocking calls that hold the event loop.
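The frame-size figure above follows directly from the sample rate: 16 000 samples/s × 0.032 s = 512 samples, or 1024 bytes of int16 mono. A small sketch of that arithmetic, with a hypothetical `split_frames` helper (not part of EdgeVox) that cuts a PCM buffer into whole frames:

```python
SAMPLE_RATE_HZ = 16_000
FRAME_MS = 32
BYTES_PER_SAMPLE = 2  # int16 mono

FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 512 samples
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE  # 1024 bytes


def split_frames(pcm: bytes) -> tuple[list[bytes], bytes]:
    """Split mono int16 PCM into whole VAD frames plus the leftover tail."""
    whole = len(pcm) - len(pcm) % FRAME_BYTES
    frames = [pcm[i : i + FRAME_BYTES] for i in range(0, whole, FRAME_BYTES)]
    return frames, pcm[whole:]
```

The tail is returned rather than padded so the caller can prepend it to the next capture chunk.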
Tooling
- uv for package management. Use `uv pip install` / `uv venv` instead of bare `pip` / `python -m venv`. See https://docs.astral.sh/uv/.
- pre-commit runs ruff (lint + format), gitleaks, and standard hygiene hooks. Install once with `pre-commit install`.
- gitleaks scans for secrets on every commit. If a finding is a false positive, allowlist it in `.gitleaks.toml` with a comment explaining why; do not delete the finding.
- pytest for tests (`pytest`, asyncio mode = auto). Tests live under `tests/`.
Workflow expectations
- Read files before editing them. Don't propose changes to code you haven't looked at.
- Run `ruff format` + `ruff check --fix` before declaring a task done.
- Don't bypass hooks (`--no-verify`); fix the underlying issue.
- Don't add yourself or Claude as a commit author / co-author. Specifically: no `Co-Authored-By: Claude …` trailer, no `🤖 Generated with Claude Code` footer. Commit messages end after the body, nothing else.
- Prefer editing existing files over creating new ones; don't create README/docs files unless asked.
- If a change touches the streaming pipeline, manually note the latency impact in the PR description.
- Prefer Mermaid diagrams over ASCII art in any markdown doc (`docs/`, `website/`, `README.md`, PR descriptions). GitHub and VitePress render `mermaid` fenced blocks natively; hand-drawn box-and-line ASCII is harder to read, impossible to edit cleanly, and breaks under monospace-font changes. The only acceptable ASCII diagrams are directory trees (`├──` / `└──`); those stay as-is.
Writing docs (docs/, README.md, website/)
- Always quote Mermaid labels that contain `(`, `)`, `@`, `:`, `,`, `&`, `<br/>`, or punctuation other than letters, digits, `_`, and `-`. Use `["foo(args)"]` for nodes and `-->|"@tool call"|` for edge labels. Unquoted parentheses inside `[...]` or `|...|` blow up the flowchart parser silently: the page still loads but the diagram disappears. After editing any mermaid block, open the page locally (`npm run docs:dev`) and confirm zero `Parse error on line N` entries in the browser console. The breakage is invisible in source review.
- Test docs in the dev server before declaring done. `npm run docs:dev` (VitePress on `:5173`) catches mermaid parse errors, broken links, and dead anchors that the markdown source hides. For visual changes (new diagrams, layout, hero, sidebar), capture a Playwright screenshot of the affected page so the reviewer can spot regressions without spinning up the server.
- Cross-page links are root-absolute, no extension: `/documentation/hooks`, never `./hooks.md` or `hooks` (the latter resolves wrong under `cleanUrls: true`). Mermaid node-click targets follow the same convention.
- Sidebar registration is mandatory. A new page under `docs/documentation/` is invisible until it's added to the `sidebar` block in `docs/.vitepress/config.ts`. Update both at the same time.
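For diagrams generated from code, the quoting rule can be applied mechanically. The helper below is a sketch under stated assumptions: the function names are hypothetical, spaces are treated as safe (the rule does not mention them), and non-ASCII letters are quoted conservatively. Embedded double quotes are replaced with Mermaid's `#quot;` entity code.

```python
import re

# Labels made only of ASCII letters, digits, "_", "-", and spaces are left bare.
# Treating spaces as safe is an assumption; anything else gets quoted.
_SAFE_LABEL = re.compile(r"[A-Za-z0-9_\- ]+")


def quote_label(label: str) -> str:
    """Wrap a Mermaid label in double quotes unless it is plainly safe."""
    if _SAFE_LABEL.fullmatch(label):
        return label
    # A raw double quote would end the quoted label early, so use the entity code.
    return '"' + label.replace('"', "#quot;") + '"'


def node(node_id: str, label: str) -> str:
    """Render a rectangular flowchart node."""
    return f"{node_id}[{quote_label(label)}]"


def edge(src: str, dst: str, label: str) -> str:
    """Render a labelled flowchart edge."""
    return f"{src} -->|{quote_label(label)}| {dst}"
```

For instance, `node("A", "foo(args)")` yields `A["foo(args)"]` and `edge("A", "B", "@tool call")` yields `A -->|"@tool call"| B`, matching the two forms given above.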
What NOT to do
- Don't add cloud API calls or telemetry. EdgeVox is offline-first.
- Don't introduce GPL-licensed dependencies (project is MIT).
- License verification is mandatory before adding any new dependency. Check the package's actual license (PyPI/GitHub/documentation, not just one guess) and record the result inline in `pyproject.toml` next to the dep (e.g. `# MIT` / `# LGPL-3 — dynamic-linked, compatible`). Copyleft licenses to refuse: GPL-2/3, AGPL, SSPL. LGPL is acceptable only for pure dynamic-link libraries (PySide6, Qt Multimedia, rlottie). CC-BY-SA is acceptable for assets (SVG piece sets etc.), never for source-level dependencies. When in doubt, pick a permissive alternative or flag for discussion.
- Don't commit `dist/`, `build/`, `*.egg-info/`, model weights, or recordings.
- Don't add speculative abstractions or "future-proofing" beyond what the task requires.