ebarti/skills

📚 Agent skills distilled from technical books — AI Engineering, Context Engineering, Designing Data-Intensive Applications, and more. Agent-agnostic, plain Markdown. Give your AI agent a bookshelf.

Supported agents: Claude Code, Codex CLI, Cursor

npx skills add ebarti/skills

Documentation

AI Evaluation

Knowledge from "AI Engineering" by Chip Huyen (Chapters 3-4). Practical methods for evaluating foundation models and AI systems built on top of them.

Quick Start

1. Check `guidelines.md` to find which files to load for your task.
2. Load only the relevant files (each topic has `knowledge.md`, `rules.md`, and `examples.md`).
3. Apply the guidance to your work.
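As a rough illustration of the "load only relevant files" step, the sketch below gathers the three per-topic files for a chosen set of topics. The directory layout (`<skill_root>/<topic>/<file>`) and the helper names `topic_paths` and `load_topics` are assumptions made for this example; the actual locations are whatever `guidelines.md` points to.

```python
from pathlib import Path

# The three per-topic files named in the Quick Start.
TOPIC_FILES = ("knowledge.md", "rules.md", "examples.md")


def topic_paths(skill_root, topic):
    """Return the paths of the three files for one topic.

    Assumes a hypothetical layout of <skill_root>/<topic>/<file>;
    adjust to match the paths listed in guidelines.md.
    """
    return [Path(skill_root) / topic / name for name in TOPIC_FILES]


def load_topics(skill_root, topics):
    """Read only the files for the chosen topics, skipping any that are missing."""
    loaded = {}
    for topic in topics:
        for path in topic_paths(skill_root, topic):
            if path.is_file():
                loaded[str(path)] = path.read_text(encoding="utf-8")
    return loaded
```

For example, `load_topics("ai-evaluation", ["ai-as-judge"])` would read at most three files and leave the other topics out of the agent's context.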

References

| Category | Purpose |
| --- | --- |
| `language-modeling-metrics` | Entropy, cross-entropy, perplexity, bits-per-character |
| `exact-evaluation` | Functional correctness, exact match, lexical/semantic similarity, embeddings |
| `ai-as-judge` | When to use AI judges, how to prompt them, limitations and biases |
| `comparative-evaluation` | Ranking models with pairwise comparisons, Bradley-Terry, scalability challenges |
| `evaluation-criteria` | Domain capability, generation (factual, safety), instruction-following, cost/latency |
| `model-selection` | Selection workflow, open source vs API, navigating public benchmarks |
| `evaluation-pipeline` | End-to-end pipeline design, scoring rubrics, evaluation methods |

Workflows

| Task | Workflow |
| --- | --- |
| Choose a model (build vs buy, open source vs API) | `workflows/select-model.md` |
| Design an end-to-end evaluation pipeline | `workflows/design-eval-pipeline.md` |

Guidelines

See guidelines.md for task-based file selection.
