CommunityRedacción y edicióngithub.com

bettyguo/agent_eval

An open-source benchmark for Claude Code skill bundles (.claude/skills/) and CLAUDE.md configs. Pass@k + cost + reliability, content-addressed leaderboard, runs on Anthropic / OpenAI / Google.

Compatible conClaude CodeCodex CLI~Cursor
npx add-skill bettyguo/agent_eval

bettyguo/agent_eval

An open-source benchmark for Claude Code skill bundles (.claude/skills/) and CLAUDE.md configs. Pass@k + cost + reliability, content-addressed leaderboard, runs on Anthropic / OpenAI / Google.

Skills relacionados