jeremylongshore/j-rig-skill-binary-eval

Binary-criteria evaluation harness for Claude skills, with planned extension to plugins, agents, and MCP servers. Every change is scored yes/no across seven layers: package integrity, trigger quality, functional quality, regression protection, baseline value, model variance, and rollout safety. Scores are never graded on a scale.

Supports: Claude Code, Codex CLI, Cursor
npx skills add jeremylongshore/j-rig-skill-binary-eval
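
To make the yes/no scoring across the seven layers concrete, here is a minimal Python sketch. Only the layer names come from the description above. The `Criterion` class, the `summarize` and `change_passes` functions, the sample questions, and the rule that a change passes only when every layer passes are hypothetical illustrations, not the harness's actual API or behavior.

```python
from dataclasses import dataclass

# Layer names are taken from the skill description. Everything else in this
# sketch (class names, fields, aggregation rule) is an assumption.
LAYERS = (
    "package integrity",
    "trigger quality",
    "functional quality",
    "regression protection",
    "baseline value",
    "model variance",
    "rollout safety",
)


@dataclass(frozen=True)
class Criterion:
    layer: str      # one of LAYERS
    question: str   # a question that can only be answered yes or no
    passed: bool    # strictly True/False, never a graded score


def summarize(criteria):
    """Return {layer: bool}; a layer is True only if all its criteria passed."""
    for c in criteria:
        if c.layer not in LAYERS:
            raise ValueError(f"unknown layer: {c.layer!r}")
        if not isinstance(c.passed, bool):
            raise TypeError("criteria must be answered yes/no (bool)")
    result = {}
    for layer in LAYERS:
        answers = [c.passed for c in criteria if c.layer == layer]
        # Assumption: a layer with no criteria counts as not passed.
        result[layer] = bool(answers) and all(answers)
    return result


def change_passes(criteria):
    """Assumed rule: a change passes only if every layer passes."""
    return all(summarize(criteria).values())


# Hypothetical usage: one criterion per layer, all answered "yes".
checks = [
    Criterion(layer, f"Does the change meet the {layer} check?", True)
    for layer in LAYERS
]
print(change_passes(checks))  # True

# A single "no" in any layer fails the whole change under the assumed rule.
checks[3] = Criterion("regression protection", "Do all prior test cases still pass?", False)
print(change_passes(checks))  # False
```

Rejecting non-boolean answers in `summarize` is one way to enforce the "no graded scores" rule at the data level, so a value such as `0.5` cannot enter the results.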
