azrabano23/evalkit
Evaluate LLMs the right way — confidence intervals, unbiased pass@k, significance testing, bias-controlled LLM-as-judge, contamination checks. A drop-in agent skill with a numpy stats core validated against ground truth.
npx skills add azrabano23/evalkit
Agent skill repository for AI-assisted workflows.
Agent skill repository for RickYangzz/corex-codex-skills.
An agent skill that analyzes agent skill repos and generates structured learning documentation.
Canonical Agent Skills library for Claude Code, Cursor, Codex, and other agents.
Personal Agent Skills repository supporting Claude Code, OpenCode, Codex, and other agents. The skill-sync skill supports two-way synchronization between the local copy and the repository.
🧭 Explore smart navigation with Y-Nav, your privacy-focused, AI-powered dashboard for seamless multi-device syncing.