Cloud Operations

AKS AI Ops - Self-Healing Kubernetes with AI Agents

Hands-on lab notes from LAB517-R1: AKS AI Ops.

Session: LAB517-R1
Date: Thursday, Nov 20, 2025
Time: 4:30 PM PST - 5:45 PM PST
Location: Moscone West, Level 3, Room 3001

Coming Soon

This article will be published during/after Microsoft Ignite 2025 (Nov 18-21). Full lab walkthrough of AKS AI Ops coming soon.


Lab Overview

What We're Building: Build confidence in managing AKS at scale with next-gen ops tools. In this hands-on lab:

  • Simulate a production service hit by traffic spikes
  • Discover how AI-driven alerts surface hidden bottlenecks
  • Deploy agents that self-heal nodes automatically
  • Use open-source tools and the aks-mcp server
  • Automate cluster scaling, patch management, and real-time troubleshooting
  • Let AI orchestrate Kubernetes and Azure resources with natural-language commands

Technologies:

  • Azure Kubernetes Service (AKS)
  • AI-driven monitoring and alerting
  • Self-healing agent patterns
  • aks-mcp server
  • Natural-language cluster orchestration
  • Pre-built MCP integrations

Key Learning Goals

  1. AI-Driven Ops - How do AI agents identify and resolve AKS issues?
  2. Self-Healing - What patterns enable automated node recovery?
  3. aks-mcp Server - How does MCP enable natural-language cluster management?
  4. Scale Management - How do agents handle traffic spikes and scaling?
  5. Production Readiness - What operational patterns work at enterprise scale?

Stay Tuned

Full lab walkthrough, code samples, and operational playbooks coming soon.

Session: LAB517-R1 | Nov 21, 2025 | Moscone West, Level 3, Room 3001

Previous
AI Fleet Operations (Foundry)
Built: Dec 16, 2025, 05:16 AM PST
b8041d3