
bettyguo/agent_eval

An open-source benchmark for Claude Code skill bundles (.claude/skills/) and CLAUDE.md configs. Pass@k + cost + reliability, content-addressed leaderboard, runs on Anthropic / OpenAI / Google.

Supported agents: Claude Code, Codex CLI, Cursor
npx skills add bettyguo/agent_eval
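
The description lists pass@k as a headline metric but does not say how it is computed. As a rough illustration only, the sketch below uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of attempts per task and c the number that passed. The function name `pass_at_k` and its parameters are hypothetical and are not taken from agent_eval; the project may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts on a task, c of which passed.

    Hypothetical helper for illustration; not part of agent_eval.
    Uses the standard unbiased estimator 1 - C(n - c, k) / C(n, k):
    the probability that a random subset of k attempts contains
    at least one passing attempt.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures: every subset of size k includes a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 attempts, 3 passed: estimate pass@1 and pass@5.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

With k = 1 the estimate reduces to the plain success rate c / n, which makes it a convenient sanity check when comparing against a leaderboard entry.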

