azrabano23/evalkit
Evaluate LLMs the right way — confidence intervals, unbiased pass@k, significance testing, bias-controlled LLM-as-judge, contamination checks. A drop-in agent skill with a numpy stats core validated against ground truth.
npx add-skill azrabano23/evalkit
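To illustrate what "unbiased pass@k" refers to, here is a minimal sketch of the standard estimator 1 - C(n-c, k) / C(n, k), where n samples were drawn per problem and c of them passed. The function name `pass_at_k` and its signature are assumptions made for this illustration; they are not taken from evalkit and may not match its API.

```python
import numpy as np


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem (hypothetical helper).

    n: total samples generated, c: samples that passed, k: budget.
    Computes 1 - C(n-c, k) / C(n, k) as a running product so that
    large binomial coefficients are never formed explicitly.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    # C(n-c, k) / C(n, k) == prod_{i=n-c+1}^{n} (1 - k / i)
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))


if __name__ == "__main__":
    # With k=1 the estimate reduces to the plain pass rate c / n.
    print(pass_at_k(n=10, c=3, k=1))
```

Averaging this per-problem value over a benchmark gives the overall pass@k. The naive alternative, 1 - (1 - c/n)^k, is biased for finite n, which is why the combinatorial form is generally preferred.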
AI agent skills for Calimero Network development — Rust SDK, JS/Python clients, registry publishing, Desktop SSO, and CLI tooling
TeamCity from your terminal – or your AI's. Builds, logs, agents, agent terminals, queues.
Semantic memory for AI agents: capture the tacit engineering knowledge that survives turnover, and recall it through MCP. Built in Rust on Postgres and pgvector.
🤖 Build intelligent agents with pydantic-ai-backend, a simple framework for creating, managing, and deploying AI-powered applications efficiently.
Automate Conveyor tasks via Rube MCP (Composio). Always search tools first for current schemas.
ACE Step 1.5 XL music generation with dynamics-preserving mastering. Part of the AEON Media Production family.