derberg/eval-bench

Benchmark Claude Code plugins/skills/agents/MCPs by A/B comparing versions with LLM-judged evaluation prompts

Works with Claude Code, Codex CLI, Cursor
npx skills add derberg/eval-bench

Documentation

Related Skills