azrabano23/evalkit

Evaluate LLMs the right way: confidence intervals, unbiased pass@k, significance testing, bias-controlled LLM-as-judge, and contamination checks. A drop-in agent skill with a NumPy statistics core validated against ground truth.

Works with: Claude Code, Codex CLI, Cursor

npx skills add azrabano23/evalkit
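
The description above names several standard evaluation statistics. Purely as an illustration, and not evalkit's own code or API, here is a minimal NumPy sketch of three of them. pass_at_k is the standard unbiased estimator 1 - C(n-c, k) / C(n, k), written in its numerically stable product form. For confidence intervals and significance testing, the sketch assumes two common choices: a percentile bootstrap and a paired sign-flip permutation test. evalkit may use different methods, and all function names here are hypothetical.

```python
import numpy as np

# Illustrative sketch only; these function names are hypothetical, not evalkit's API.


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: samples generated per problem, c: samples that passed, k: sample budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # C(n-c, k) / C(n, k) equals the product of (1 - k / i) for i in n-c+1 .. n,
    # which avoids overflow from large binomial coefficients.
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))


def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-item score (assumed method)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(scores, dtype=float)
    # Resample items with replacement, n_boot times.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    means = x[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(x.mean()), float(lo), float(hi)


def paired_permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided paired sign-flip permutation test on per-item differences (assumed method)."""
    rng = np.random.default_rng(seed)
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    # Under the null, each paired difference is equally likely to have either sign.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(d)))
    null = np.abs((signs * d).mean(axis=1))
    # Add-one correction keeps the p-value strictly positive.
    return float((np.sum(null >= observed) + 1) / (n_perm + 1))
```

For example, with 20 samples per problem of which 3 pass, `pass_at_k(20, 3, 5)` estimates the probability that at least one of 5 samples drawn without replacement passes. Pairing items in the permutation test matters because two models scored on the same benchmark items are not independent samples.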

Related Skills

bitflight-devops/hallucination-detector

Zero-dependency Claude Code plugin that catches speculation, invented causality, and fake citations before they pollute your context. Install in one command, works offline, no API keys needed.

community

xiaoran8210/case-report-ppt

Sanitized Codex skill for medical case report web decks

community

adhi-jp/agent-skills

Agent skills and eval prompts for vibe-coding plans, review loops, commit messages, prose, and Minecraft modding.

community

grammy-jiang/subagent-factory

Local Claude Code factory that turns source documents into validated, provenance-tracked subagent packages.

community

ivaavimusic/x402-Layer-Clawhub-Skill

An openclaw skill for using the x402 Singularity layer, enabling your agents to consume and create x402-enabled endpoints.

community

inematds/mdd

MDD (Mestre de Direção Dinâmica, "Dynamic Direction Master"): a Claude Code skill that turns any subject into a complete video-direction package (storyboard + prompts).

community