Community · Research and Data Analysis · github.com

jeremylongshore/j-rig-skill-binary-eval

Binary-criteria evaluation harness for Claude skills, with planned extension to plugins, agents, and MCP servers. Score every change yes/no across 7 layers: package integrity, trigger quality, functional quality, regression protection, baseline value, model variance, and rollout safety. Never gradients.
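As a rough illustration of what yes/no scoring across the seven layers could look like, here is a minimal Python sketch. Only the layer names come from the description above; the `Criterion` class, the `summarize` and `change_accepted` functions, and the rule that a layer with no criteria counts as failing are all assumptions made for this example, not the harness's actual API.

```python
from dataclasses import dataclass

# Layer names are taken from the description; everything else is hypothetical.
LAYERS = (
    "package integrity",
    "trigger quality",
    "functional quality",
    "regression protection",
    "baseline value",
    "model variance",
    "rollout safety",
)


@dataclass(frozen=True)
class Criterion:
    layer: str      # one of LAYERS
    question: str   # a yes/no question about the change
    passed: bool    # strictly True or False, never a partial score


def summarize(results):
    """Return {layer: bool}; a layer passes only if every criterion in it passes.

    Assumption: a layer with no recorded criteria is treated as failing.
    """
    summary = {}
    for layer in LAYERS:
        checks = [c.passed for c in results if c.layer == layer]
        summary[layer] = bool(checks) and all(checks)
    return summary


def change_accepted(results):
    """A change is accepted only when all seven layers pass."""
    return all(summarize(results).values())
```

Under these assumptions, a single failed criterion in any layer rejects the change, which is one way to read "never gradients": there is no weighted average to hide a failure.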

Compatible platforms: Claude Code, Codex CLI, Cursor
npx add-skill jeremylongshore/j-rig-skill-binary-eval

