kaushikpaul90/ai-driven-incident-management

An AI-powered system for autonomous incident detection, diagnosis, and remediation in high-performance computing (HPC) environments. Leverages agentic workflows, RAG-based knowledge retrieval, and ML on log data to reduce downtime and enhance reliability. Features runbooks for HPC issues like disk failures, memory errors, and network timeouts.

Compatible with: Claude Code, Codex CLI, Cursor, Antigravity, Gemini CLI
npx add-skill kaushikpaul90/ai-driven-incident-management

Source: https://github.com/kaushikpaul90/ai-driven-incident-management

Discovered from GitHub repositories pushed in the last 24 hours for agent skills, Claude/Codex/Gemini workflows, MCP tooling, and adjacent AI-agent automation.

Related skills