google/google-agents-cli-eval
This skill should be used when the user wants to "run an evaluation", "evaluate my ADK agent", "write an eval dataset", "analyze eval failures", "compare eval results", "optimize agent", or needs guidance on the Agent Platform eval methodology and the Quality Flywheel. Covers eval metrics, dataset schema, LLM-as-judge scoring, and common failure causes. Do NOT use for API code patterns (use google-agents-cli-adk-code), deployment (use google-agents-cli-deploy), or project scaffolding (use google-agents-cli-scaffold).