sre-agent is an autonomous multi-agent system designed to automate Incident Response in Kubernetes environments. By leveraging Large Language Models (LLMs) and a Divide & Conquer strategy, it significantly reduces the Mean Time to Resolution (MTTR) for complex microservice faults.
This system integrates with AIOpsLab for realistic fault injection and uses a custom Model Context Protocol (MCP) server to interface with observability tools (Prometheus, Jaeger, Kubernetes API) securely and efficiently.
SRE-agent/
├── sre-agent/ # 🧠 Main Multi-Agent System implementation (LangGraph)
├── MCP-server/ # 🔌 Custom Model Context Protocol server for observability tools
├── notebooks/ # 📓 Jupyter notebooks for analysis and development
├── Results/ # 📊 Experiment outputs, logs, and reports
├── archive/ # 📦 Archive of previous project iterations
└── assets/ # 🖼️ Diagrams and static assets
The agent implements a parallel multi-agent workflow to diagnose faults efficiently:

🔍 Triage Agent (Hybrid)
📋 Planner Agent (Topology-Aware)
🔬 RCA Workers (Parallel Execution)
👔 Supervisor Agent
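The four roles above form a fan-out/fan-in pattern: triage narrows the suspects, the planner emits independent tasks, workers run concurrently, and the supervisor merges their findings. The sketch below illustrates only that shape using the Python standard library. It is not the repository's LangGraph implementation, and every function name, threshold, and service name in it is a hypothetical stand-in.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four roles. The real agents call LLMs and
# MCP tools; these functions only show how data moves between the stages.

def triage(symptoms):
    """Keep only services whose error rate exceeds an assumed 5% threshold."""
    return [svc for svc, err_rate in symptoms.items() if err_rate > 0.05]

def plan(suspects):
    """Turn each suspect service into one independent investigation task."""
    return [{"service": svc, "check": "logs+traces"} for svc in suspects]

def rca_worker(task):
    """Investigate a single task; here it returns a placeholder finding."""
    return {"service": task["service"], "finding": f"inspected {task['check']}"}

def supervise(findings):
    """Merge worker findings into one report, sorted for stable output."""
    return sorted(findings, key=lambda f: f["service"])

def run_pipeline(symptoms, max_workers=4):
    suspects = triage(symptoms)
    tasks = plan(suspects)
    if not tasks:
        return []
    # Fan out: each task runs in its own worker thread.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        findings = list(pool.map(rca_worker, tasks))
    # Fan in: a single step aggregates all results.
    return supervise(findings)

# Made-up error rates for three example services.
report = run_pipeline({"frontend": 0.01, "cart": 0.20, "geo": 0.12})
```

Threads suit this sketch because real workers would spend most of their time waiting on network calls to observability backends.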
The repository includes a robust pipeline for automated experimentation and benchmarking.
automated_experiment.py orchestrates end-to-end batch runs: Cluster Setup → Fault Injection → Agent Execution → Evaluation → Cleanup.
The system is evaluated on:
# Clone the repository
git clone https://github.com/martinimarcello00/SRE-agent.git
cd SRE-agent
# Install dependencies
poetry install
# Configure environment
cp .env.example .env
# Edit .env and add your API keys:
# nano .env
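After copying the template, it can help to confirm that no key was left blank. The following is a minimal sketch, assuming both files use plain KEY=VALUE lines. It is not part of the repository, the helper names are hypothetical, and it does not assume any particular key names.

```python
from pathlib import Path

def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(example_text, env_text):
    """Return keys listed in the example that are absent or empty in .env."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]

if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    # Only report when both files are present in the current directory.
    if example.exists() and env.exists():
        for key in missing_keys(example.read_text(), env.read_text()):
            print(f"missing or empty: {key}")
```

Run it from the repository root after editing .env; it prints nothing when every key from the template has a value.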
You can run the agent interactively via LangGraph Studio or as a script.
Option A: LangGraph Studio (Recommended for Dev)
cd sre-agent
poetry run langgraph dev
Option B: Python Script
# Run a specific experiment scenario
python sre-agent/sre-agent.py
To execute a batch of experiments defined in your configuration:
# Ensure your .env file is configured
# python automated_experiment.py
This script will sequentially provision the cluster, inject faults, run the agent, and save the results in Results/.
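The stage sequence described above can be sketched as a loop that runs each stage in order and attempts cleanup even when an earlier stage fails. This is an illustrative outline, not the contents of automated_experiment.py: the stage callables, the scenario names, and the cleanup-on-failure behavior are all assumptions.

```python
STAGES = ("setup", "inject", "run_agent", "evaluate")

def run_experiment(scenario, steps, log):
    """Run one scenario through the stages; always attempt cleanup."""
    result = {"scenario": scenario, "status": "ok"}
    try:
        for name in STAGES:
            log.append(f"{scenario}:{name}")
            steps[name](scenario)
    except Exception as exc:
        # Record the failure so the rest of the batch can continue.
        result["status"] = f"failed: {exc}"
    finally:
        # Cleanup runs whether or not the stages succeeded.
        log.append(f"{scenario}:cleanup")
        steps["cleanup"](scenario)
    return result

def run_batch(scenarios, steps):
    """Run scenarios one after another, collecting results and a stage log."""
    log = []
    results = [run_experiment(s, steps, log) for s in scenarios]
    return results, log

# Hypothetical stage implementations for demonstration only.
def noop(scenario):
    pass

def flaky_inject(scenario):
    if scenario == "bad":
        raise RuntimeError("injection failed")

steps = {name: noop for name in STAGES + ("cleanup",)}
steps["inject"] = flaky_inject
results, log = run_batch(["good", "bad"], steps)
```

Isolating each scenario this way keeps one failed fault injection from leaving a dirty cluster behind or aborting the remaining runs.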