azrabano23/evalkit
Evaluate LLMs the right way — confidence intervals, unbiased pass@k, significance testing, bias-controlled LLM-as-judge, contamination checks. A drop-in agent skill with a numpy stats core validated against ground truth.
npx skills add azrabano23/evalkit
Interactive literature citation wizard, a Claude Code Skill
Triage and resolve CentOS issues using RHEL-compatible tooling, SELinux-aware practices, and firewalld.
A rigorous scientific validation skill for Claude Code that evaluates claims against high-impact literature using a weighted scoring matrix.
Portable Claude Code setup: skills, agents, commands, and a safe installer
Create robust, accessible web forms with best practices for HTML structure, CSS styling, JavaScript interactivity, form validation, and server-side processing. Use when asked to "create a form", "build a web form", "add a contact form", "make a signup form", or when building any HTML form with data handling. Covers PHP and Python backends, MySQL database integration, REST APIs, XML data exchange, accessibility (ARIA), and progressive web apps.
Score your repo's AI-agent harness — CLAUDE.md, AGENTS.md, skills, rules — then remediate the findings. Install: npx skills add paniolo-ai/scan