bettyguo/agent_eval

An open-source benchmark for Claude Code skill bundles (.claude/skills/) and CLAUDE.md configs. It measures pass@k, cost, and reliability, publishes results to a content-addressed leaderboard, and runs on Anthropic, OpenAI, and Google.
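This listing does not say how agent_eval computes pass@k. As an illustration only, the sketch below uses the standard unbiased estimator from the HumanEval paper (Chen et al., 2021). It assumes each task is run n times and c of those runs pass, and it estimates the probability that at least one of k sampled runs passes. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not part of agent_eval's API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n runs, c of which passed."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # If there are fewer than k failing runs, every size-k subset
    # contains at least one pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn runs fail), drawing without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given (n, c) pairs (hypothetical aggregation)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    print(pass_at_k(10, 3, 1))  # 3 of 10 runs passed, so pass@1 is 0.3
    print(mean_pass_at_k([(10, 3), (10, 8)], k=5))
```

Using combinations avoids the upward bias of simply computing 1 - (1 - c/n)^k. For a skill bundle that passes 3 of 10 runs, pass@1 is 0.3, while pass@5 is 1 - C(7,5)/C(10,5) ≈ 0.917.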

Works with: Claude Code, Codex CLI, ~Cursor
npx add-skill bettyguo/agent_eval