trajectoryRL/trajrl-bench
TrajRL-Bench: AI agent skills benchmark. SSH sandbox with mock services, LLM judge scoring, split-half delta evaluation. Leaderboard at trajrl.com/bench
npx add-skill trajectoryRL/trajrl-bench
A specification framework for agentic development. Agents build from complete specs - not guesses.
Mindful skills and agents for hybrid human/AI intelligence
The capability harness for AI agents. Skills over SDKs.
Standalone Codex skill export: define-product-strategy
Reusable Agent Skill for client delivery guardrails in Codex and Claude Code
MCP server that gives any LLM its own computer — managed Docker workspaces with live browser, terminal, code execution, document skills, and autonomous sub-agents. Self-hosted and open-source.