azrabano23/evalkit

Evaluate LLMs the right way: confidence intervals, unbiased pass@k, significance testing, bias-controlled LLM-as-judge, and contamination checks. A drop-in agent skill with a numpy stats core validated against ground truth.

Supported agents: Claude Code, Codex CLI, Cursor

npx skills add azrabano23/evalkit
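
For readers unfamiliar with the "unbiased pass@k" named in the description, here is a minimal sketch of the commonly used estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn per problem and c is how many of them passed. The function name `pass_at_k` and its signature are assumptions made for illustration; they are not taken from evalkit, whose actual API is not shown on this page.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c passed (hypothetical helper).

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random size-k subset of the n samples contains no passing sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 samples, 3 passing: pass@1 reduces to the plain pass rate c / n.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

Averaging this per-problem value over a benchmark gives the reported score. Simply taking the best of the first k samples would give a noisier estimate from the same n samples.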

Documentation

azrabano23/evalkit

Related skills

s-kisaragi/claude-skills

Agent skill repository for AI-assisted workflows.

community

RickYangzz/corex-codex-skills

Agent skill repository for RickYangzz/corex-codex-skills.

community

jiangxidong/agent-skill-skill-learner

An agent skill that analyzes agent skill repos and generates structured learning documentation

community

lgtm-hq/ai-skills

Canonical Agent Skills library for Claude Code, Cursor, Codex, and other agents.

community

CherryYang05/myskills

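Personal Agent Skills repository supporting Claude Code, OpenCode, Codex, and other agents. The skill-sync skill supports two-way synchronization between the local machine and the repository.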

community

veeexx-1/Y-Nav

🧭 Explore smart navigation with Y-Nav, your privacy-focused, AI-powered dashboard for seamless multi-device syncing.

community
