Community · Coding & Development · github.com

azrabano23/evalkit

Evaluate LLMs rigorously: confidence intervals, unbiased pass@k, significance testing, bias-controlled LLM-as-judge, and contamination checks. A drop-in agent skill with a numpy stats core validated against ground truth.

Works with: Claude Code, Codex CLI, Cursor

npx skills add azrabano23/evalkit
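
The description lists unbiased pass@k among the skill's features. As a rough illustration of what such an estimator usually looks like, here is a minimal sketch of the standard combinatorial form, 1 - C(n-c, k) / C(n, k), using only the Python standard library. It is not taken from evalkit itself; the function name `pass_at_k` and its parameters are assumptions made for this example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed (hypothetical helper).

    Computes 1 - C(n - c, k) / C(n, k): the probability that a random
    size-k subset of the n samples contains at least one passing sample.
    """
    if n < 1 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require n >= 1, 0 <= c <= n, and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is 1.0 whenever
    # every size-k subset must include a passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 samples, 3 passed: report the estimate for k = 1 and k = 5.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

For k = 1 the estimate reduces to the plain pass rate c / n. For larger k it differs from the naive 1 - (1 - c/n)^k, which is a biased estimate when computed from a finite number of samples.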

Related Skills

Tubo2333/zotero-auto-cite

Interactive literature citation wizard: a Claude Code skill

community

github/centos-linux-triage

Triage and resolve CentOS issues using RHEL-compatible tooling, SELinux-aware practices, and firewalld.

community

tunabirgun/scival-plugin

A rigorous scientific validation skill for Claude Code that evaluates claims against high-impact literature using a weighted scoring matrix.

community

dtsong/my-claude-setup

Portable Claude Code setup: skills, agents, commands, and a safe installer

community

github/create-web-form

Create robust, accessible web forms with best practices for HTML structure, CSS styling, JavaScript interactivity, form validation, and server-side processing. Use when asked to "create a form", "build a web form", "add a contact form", "make a signup form", or when building any HTML form with data handling. Covers PHP and Python backends, MySQL database integration, REST APIs, XML data exchange, accessibility (ARIA), and progressive web apps.

community

paniolo-ai/scan

Score your repo's AI-agent harness — CLAUDE.md, AGENTS.md, skills, rules — then remediate the findings. Install: npx skills add paniolo-ai/scan

community
