
EurecaMoment/BenchClaw

BenchClaw is a Codex/OpenCode skill workflow for benchmark construction, evaluation, and maintenance. It standardizes the full pipeline, from idea drafting and data generation through evaluation, reporting, failure diagnosis, and skill refinement, so that agents can build reproducible, auditable benchmarks with clear quality gates, lineage, and rollback.

Works with: Claude Code, Codex CLI, Cursor, OpenCode
npx add-skill EurecaMoment/BenchClaw
