derberg/eval-bench

Benchmark Claude Code plugins/skills/agents/MCPs by A/B comparing versions with LLM-judged evaluation prompts

Supported on: Claude Code, Codex CLI, Cursor
npx skills add derberg/eval-bench
