bettyguo/agent_eval

An open-source benchmark for Claude Code skill bundles (.claude/skills/) and CLAUDE.md configs. Pass@k + cost + reliability, content-addressed leaderboard, runs on Anthropic / OpenAI / Google.

Works with: Claude Code, Codex CLI, ~Cursor
npx skills add bettyguo/agent_eval
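
For readers unfamiliar with the Pass@k metric named in the description, here is a minimal sketch of one common way to compute it: the unbiased estimator popularized by code-generation benchmarks. Whether agent_eval uses this exact estimator is an assumption; the function name `pass_at_k` and its parameters are hypothetical and are not taken from the project.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n recorded attempts, c of which passed.

    Returns the probability that at least one of k attempts, drawn
    without replacement from the n recorded attempts, is a pass.
    Hypothetical helper for illustration; not agent_eval's API.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failures exist, so any draw of k contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn attempts are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 attempts on one task, 3 of them passed.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

Averaging this value over all tasks in a suite gives a suite-level pass@k, which can then be reported next to cost and reliability figures.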
