wshobson/llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Supported: Claude Code, Codex CLI, Cursor

npx skills add https://github.com/wshobson/agents/tree/main/skills/llm-evaluation

Documentation

Individual skills in this repo

This repo contains 20 individual skills, each with its own dedicated page.

wshobson/accessibility-compliance

Implement WCAG 2.2 compliant interfaces with mobile accessibility, inclusive design patterns, and assistive technology support. Use when auditing accessibility, implementing ARIA patterns, building for screen readers, or ensuring inclusive user experiences.

wshobson/airflow-dag-patterns

Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.

wshobson/angular-migration

Migrate from AngularJS to Angular using hybrid mode, incremental component rewriting, and dependency injection updates. Use when upgrading AngularJS applications, planning framework migrations, or modernizing legacy Angular code.

wshobson/anti-reversing-techniques

Understand anti-reversing, obfuscation, and protection techniques encountered during software analysis. Use this skill when analyzing malware evasion techniques, when implementing anti-debugging protections for CTF challenges, when reverse engineering packed binaries, or when building security research tools that need to detect virtualized environments.

wshobson/api-design-principles

Master REST and GraphQL API design principles to build intuitive, scalable, and maintainable APIs that delight developers. Use when designing new APIs, reviewing API specifications, or establishing API design standards.

wshobson/architecture-decision-records

Write and maintain Architecture Decision Records (ADRs) following best practices for technical decision documentation. Use when documenting significant technical decisions, reviewing past architectural choices, or establishing decision processes.

wshobson/architecture-patterns

Implement proven backend architecture patterns including Clean Architecture, Hexagonal Architecture, and Domain-Driven Design. Use this skill when designing clean architecture for a new microservice, when refactoring a monolith to use bounded contexts, when implementing hexagonal or onion architecture patterns, or when debugging dependency cycles between application layers.

wshobson/async-python-patterns

Master Python asyncio, concurrent programming, and async/await patterns for high-performance applications. Use when building async APIs, concurrent systems, or I/O-bound applications requiring non-blocking operations.

wshobson/attack-tree-construction

Build comprehensive attack trees to visualize threat paths. Use when mapping attack scenarios, identifying defense gaps, or communicating security risks to stakeholders.

wshobson/auth-implementation-patterns

Master authentication and authorization patterns including JWT, OAuth2, session management, and RBAC to build secure, scalable access control systems. Use when implementing auth systems, securing APIs, or debugging security issues.

wshobson/backtesting-frameworks

Build robust backtesting systems for trading strategies with proper handling of look-ahead bias, survivorship bias, and transaction costs. Use when developing trading algorithms, validating strategies, or building backtesting infrastructure.

wshobson/bash-defensive-patterns

Master defensive Bash programming techniques for production-grade scripts. Use when writing robust shell scripts, CI/CD pipelines, or system utilities requiring fault tolerance and safety.

wshobson/bats-testing-patterns

Master Bash Automated Testing System (Bats) for comprehensive shell script testing. Use when writing tests for shell scripts or CI/CD pipelines, or when applying test-driven development to shell utilities.

wshobson/bazel-build-optimization

Optimize Bazel builds for large-scale monorepos. Use when configuring Bazel, implementing remote execution, or optimizing build performance for enterprise codebases.

wshobson/billing-automation

Build automated billing systems for recurring payments, invoicing, subscription lifecycle, and dunning management. Use when implementing subscription billing, automating invoicing, or managing recurring payment systems.

wshobson/binary-analysis-patterns

Master binary analysis patterns including disassembly, decompilation, control flow analysis, and code pattern recognition. Use when analyzing executables, understanding compiled code, or performing static analysis on binaries.

wshobson/block-no-verify-hook

Configure a PreToolUse hook to prevent AI agents from skipping git pre-commit hooks with --no-verify and other bypass flags. Use when setting up Claude Code projects that enforce commit quality gates.

wshobson/brand-landingpage

Brand-first landing page designer: runs a brand-identity interview (colors, typography, shape language), then generates and iterates on a polished landing page via Stitch with deployment-ready HTML. Use when the user asks to create, design, or build a landing page, homepage, or marketing page and has no established visual direction. Skip when the user has a design mockup, needs a dashboard or app UI, is working at component level, is building a multi-page app, or is restyling with known design tokens; use frontend-design instead.

wshobson/changelog-automation

Automate changelog generation from commits, PRs, and releases following Keep a Changelog format. Use when setting up release workflows, generating release notes, or standardizing commit conventions.

wshobson/code-review-excellence

Master effective code review practices to provide constructive feedback, catch bugs early, and foster knowledge sharing while maintaining team morale. Use when reviewing pull requests, establishing review standards, or mentoring developers.

Related skills

Calliopeperpendicular906/openclaw-skills

twostraws/swiftui-pro

iiRoshdy/skills

Antune-L/skillzer

Rocketbullet/skills-expand-your-team-with-copilot

packit/ai-workflows