Community · Programming and Development · github.com

derberg/eval-bench

Benchmark Claude Code plugins, skills, agents, and MCPs by A/B-comparing versions on LLM-judged evaluation prompts.

Compatible platforms: Claude Code, Codex CLI, Cursor
npx add-skill derberg/eval-bench
