iamalimaybe/llm-evaluation-registry

LLM Evaluation Registry is a backend-led quality layer for AI workflows. It tracks prompts, models, reusable test cases, evaluation runs, validation results, regressions, and human review notes so AI behavior can be measured instead of guessed.
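To illustrate the idea of measuring behavior across evaluation runs, here is a minimal sketch of regression detection. It is not the registry's actual schema or API: `EvalCase`, `EvaluationRun`, `validate`, `find_regressions`, and the substring-based validation rule are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """A reusable test case (hypothetical shape, not the registry's schema)."""
    case_id: str
    prompt: str
    expected_substring: str  # assumed validation rule: output must contain this


@dataclass(frozen=True)
class EvaluationRun:
    """One evaluation run of a model over a set of cases (hypothetical shape)."""
    run_id: str
    model: str
    outputs: dict[str, str]  # case_id -> raw model output


def validate(run: EvaluationRun, cases: list[EvalCase]) -> dict[str, bool]:
    """Return a pass/fail result per test case for one run."""
    return {
        c.case_id: c.expected_substring.lower() in run.outputs.get(c.case_id, "").lower()
        for c in cases
    }


def find_regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """List cases that passed in the baseline run but fail in the candidate run."""
    return sorted(
        case_id for case_id, passed in baseline.items()
        if passed and not candidate.get(case_id, False)
    )


cases = [
    EvalCase("capital", "What is the capital of France?", "Paris"),
    EvalCase("sum", "What is 2 + 2?", "4"),
]
baseline = EvaluationRun("run-1", "model-a", {"capital": "Paris.", "sum": "2 + 2 = 4"})
candidate = EvaluationRun("run-2", "model-b", {"capital": "It is Paris.", "sum": "5"})

regressions = find_regressions(validate(baseline, cases), validate(candidate, cases))
print(regressions)  # ['sum']
```

Storing pass/fail results per case and per run, rather than a single aggregate score, is what makes it possible to point at the specific cases that regressed between two models or prompt versions.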

Works with: Claude Code, Codex CLI, Cursor
npx skills add iamalimaybe/llm-evaluation-registry

Related Skills

mapbox/mapbox-search-integration

Complete workflow for implementing Mapbox search in applications - from discovery questions to production-ready integration with best practices

community

christopherlouet/claude-base

Opinionated Claude Code foundation — Explore → TDD → Audit workflow, auto-detected stack presets (nextjs, fastapi, astro, ...), curl | bash install. MIT.

community

open-agent-craft/awesome-agent-skills

A curated list of high-quality Agent Skills, AI coding skills, tool-use recipes, MCP workflows, and reusable agent instructions. Find, compare, and reuse practical skills for Codex, Claude Code, Cursor, GitHub Copilot, and other AI agents.

community

ncih/skills-and-tools

Claude Code skills, scripts, and tools for an AI-powered development workflow

community

arf-io/context-brief

Token-budgeted project briefs for AI agent tools

community

okx/okx-security

Use this skill for security scanning: check transaction safety, is this transaction safe, pre-execution check, security scan, token risk scanning, honeypot detection, DApp/URL phishing detection, message signature safety, malicious transaction detection, approval safety checks, token approval management. Triggers: 'is this token safe', 'check token security', 'honeypot check', 'scan this tx', 'scan this swap tx', 'tx risk check', 'is this URL a scam', 'check if this dapp is safe', 'phishing site check', 'is this signature safe', 'check this signing request', 'check my approvals', 'show risky approvals', 'revoke approval', 'check if this approve is safe', token authorization, ERC20 allowance, Permit2. Covers token-scan, dapp-scan, tx-scan (EVM+Solana pre-execution), sig-scan (EIP-712/personal_sign), approvals (ERC-20/Permit2). Chinese: 安全扫描, 代币安全, 蜜罐检测, 貔貅盘, 钓鱼网站, 交易安全, 签名安全, 代币风险, 授权管理, 授权查询, 风险授权, 代币授权. Do NOT use for wallet balance/send/history — use okx-agentic-wallet.

community