EurecaMoment/BenchClaw
BenchClaw is a Codex/OpenCode skill workflow for benchmark construction, evaluation, and maintenance. It standardizes the full pipeline—from idea drafting and data generation to evaluation, reporting, failure diagnosis, and skill refinement—so agents can build reproducible, auditable benchmarks with clear quality gates, lineage, and rollback.