Multi-Agent AI Systems

Agent-Framework Unified Platform — LAB513 Series

Microsoft's LAB513 was the session where the unified agent-framework stopped being a conference announcement and became something you could type into a terminal. The hands-on lab walked participants through building A2A-compatible agents using the merged Semantic Kernel and AutoGen platform, integrating MCP tools, and deploying multi-agent orchestrations using the Magentic-One pattern. The lab also demonstrated GitHub Copilot as a software engineering agent — not as a code completion tool, but as an autonomous agent that writes, tests, and submits code.

Sessions: LAB513 + LAB513-R1/R2/R3 | Dates: Various times, Nov 19-21, 2025 | Location: Moscone West


Why this lab matters more than the keynote demos

Keynotes show what is possible. Labs show what is practical. The difference is the distance between a polished demo running on a presenter's machine and code that a room full of developers must get working in 75 minutes.

LAB513 tested the unified agent-framework against that reality. Could developers who had never used the merged platform build working multi-agent systems with A2A communication and MCP tool integration in a single session? The answer was mostly yes, with caveats that reveal where the framework is production-ready and where it is still sharp-edged.

The lab ran four sessions (the original plus three repeats) across three days, which itself tells you something about demand. The first session was standing-room-only. By the third repeat, word had spread that this was the most practically useful agent lab at Ignite, and seats were gone within minutes of the doors opening.


The unified agent-framework: What the merger actually produced

The history in 30 seconds: Semantic Kernel was Microsoft's production-grade framework for single-agent orchestration — model routing, tool calling, prompt management, context handling. AutoGen was the research-born framework for multi-agent coordination — conversation patterns, group chat, agent-to-agent communication. Developers building multi-agent systems had to choose one and bolt on the other's capabilities manually, or maintain two sets of abstractions.

What agent-framework delivers: A single framework where building an agent, giving it tools, connecting it to other agents, and deploying the whole system uses one consistent API surface.

The practical difference in code:

Before (Semantic Kernel + AutoGen bolted together):

# Semantic Kernel for individual agent capabilities
from semantic_kernel import Kernel
kernel = Kernel()
kernel.add_plugin(MyToolPlugin())
agent_a = kernel.create_agent(instructions="...")

# AutoGen for multi-agent coordination (a different API entirely);
# wrap_sk_agent is custom adapter glue you had to write and maintain yourself
from autogen import GroupChat, GroupChatManager
group_chat = GroupChat(
    agents=[wrap_sk_agent(agent_a), autogen_agent_b],
    messages=[]
)
manager = GroupChatManager(groupchat=group_chat)

After (unified agent-framework):

from agent_framework import Agent, GroupChat, MCPTool

# Single API for agent creation, tools, and coordination
agent_a = Agent(
    name="researcher",
    instructions="Research the topic thoroughly",
    tools=[MCPTool("web_search"), MCPTool("document_store")]
)

agent_b = Agent(
    name="writer",
    instructions="Write clear, concise content",
    tools=[MCPTool("editor")]
)

# Multi-agent coordination is native, not bolted on
chat = GroupChat(
    agents=[agent_a, agent_b],
    strategy="magentic_one"
)

result = await chat.run("Write an analysis of Q3 results")  # called from an async context

The reduction in cognitive overhead is significant. One set of concepts, one import hierarchy, one debugging model. This matters more than it sounds — framework sprawl is the number one reason agent projects stall after the prototype phase.

What the lab did not show: The migration path from existing Semantic Kernel or AutoGen codebases. There was a brief mention that migration tooling exists, but participants were building from scratch with agent-framework rather than porting existing code. For enterprises with production Semantic Kernel deployments, the migration question remains the elephant in the room.


Magentic-One: The orchestration pattern the lab builds on

The lab centred its multi-agent exercises around the Magentic-One pattern, which is the flagship orchestration strategy from the AutoGen lineage, now native in agent-framework.

What Magentic-One is: An orchestration pattern where a lead agent dynamically selects which specialised agent should handle the next step of a task. Unlike a static coordinator that routes based on rules, the lead agent uses the current conversation state and each agent's declared capabilities to make routing decisions.

How it works step by step:

  1. Task arrives at the lead agent
  2. Lead agent analyses the task and current state
  3. Lead agent selects the best agent for the next step (based on capability descriptions, not hardcoded routing)
  4. Selected agent executes its step, updates shared state
  5. Lead agent evaluates progress and determines next step
  6. Repeat until task is complete or escalation required
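
The loop above is a few lines of control flow once the model call is abstracted away. A minimal sketch, with the lead agent's routing decision stubbed by a keyword match so the flow is visible (none of these names are the agent-framework API; `select_agent` stands in for a real model invocation):

```python
# Minimal sketch of a Magentic-One-style routing loop.
# The real lead agent makes a model call to pick the next agent;
# a keyword heuristic stands in here so the control flow is visible.

def select_agent(state, agents):
    """Stand-in for the lead agent's model-driven routing decision."""
    for agent in agents:
        if any(kw in state["pending"] for kw in agent["capabilities"]):
            return agent
    return None

def run_task(task, agents, max_steps=10):
    state = {"task": task, "pending": task, "history": []}
    for _ in range(max_steps):
        agent = select_agent(state, agents)
        if agent is None:                       # no capable agent: escalate
            state["history"].append("escalated")
            break
        result = agent["handler"](state)        # selected agent executes a step
        state["history"].append((agent["name"], result))
        state["pending"] = result["remaining"]  # lead agent re-evaluates progress
        if not state["pending"]:                # task complete
            break
    return state

agents = [
    {"name": "researcher", "capabilities": ["research"],
     "handler": lambda s: {"remaining": s["pending"].replace("research", "").strip()}},
    {"name": "writer", "capabilities": ["write"],
     "handler": lambda s: {"remaining": s["pending"].replace("write", "").strip()}},
]

final = run_task("research write", agents)
```

The `max_steps` bound matters: model-driven routing can loop, so a production orchestrator needs an explicit step budget and an escalation path.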

Why this is different from a static coordinator:

A static coordinator uses if-then routing: "If the task is about venues, route to venue agent." This breaks when tasks do not fit neatly into predefined categories.

Magentic-One uses model-driven routing: the lead agent reads the task, understands the current state, and chooses the most capable agent dynamically. This handles ambiguous tasks, multi-step workflows where the optimal sequence depends on intermediate results, and novel task types that the system designer did not anticipate.

The trade-off: Model-driven routing is more flexible but less predictable. You cannot look at a routing table and know which agent will handle a given task. Debugging requires tracing the lead agent's reasoning, which is only possible with good observability — hence the OpenTelemetry integration being so important.

The cost trade-off nobody mentioned: Every routing decision is a model call. The lead agent must evaluate the current state, read capability descriptions, and select the next agent. In a ten-step workflow, that is ten additional model invocations just for routing. At GPT-4o pricing, this is measurable. At GPT-4 pricing, it is significant. The lab conveniently used pre-provisioned Azure credits and never surfaced this cost to participants.
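
Back of the envelope, the routing overhead compounds with workflow length. A sketch of the arithmetic (the prices and token counts here are placeholders, not current Azure OpenAI rates):

```python
# Rough routing-overhead estimate for a Magentic-One workflow.
# All numbers are illustrative placeholders, not real pricing.

def routing_overhead(steps, tokens_per_decision, price_per_1k_tokens):
    """Cost of the extra model calls the lead agent makes for routing alone."""
    return steps * tokens_per_decision / 1000 * price_per_1k_tokens

# A ten-step workflow where each routing decision reads ~2,000 tokens
# of state and capability descriptions, at a placeholder $0.01 / 1K tokens:
cost = routing_overhead(steps=10, tokens_per_decision=2000, price_per_1k_tokens=0.01)
# 10 decisions * 2K tokens * $0.01/1K = $0.20 per run, before any agent does real work
```

Multiply that by requests per day and the routing tax is a line item worth modelling before, not after, the prototype ships.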

Lab exercise: Participants built a three-agent system using Magentic-One — a research agent, an analysis agent, and a writing agent. The lead agent determined the optimal sequence based on the specific query, sometimes sending research before analysis, sometimes requesting analysis of existing data without research.


MCP tools in practice: What the lab revealed

The lab exercises went beyond the "here is what MCP is" explanations from keynote sessions and had participants actually wiring up MCP tool servers to their agents.

What participants built:

Exercise 1: Local MCP server integration

Each lab machine ran a local MCP server exposing a simulated enterprise API (inventory lookup, customer records). Agents connected to this server using the agent-framework's MCP client.

# Connecting an agent to an MCP tool server
from agent_framework import Agent, MCPToolServer

# MCP server exposes tools with self-describing schemas
inventory_server = MCPToolServer(
    endpoint="http://localhost:8080/mcp",
    auth={"type": "bearer", "token": lab_token}
)

# Agent discovers available tools automatically
agent = Agent(
    name="inventory_checker",
    instructions="Check product availability and pricing",
    mcp_servers=[inventory_server]
)

# Agent can now call any tool the MCP server exposes
# without the developer defining tool schemas manually

The key insight from this exercise: Agents discover MCP tools at runtime. The developer does not need to define tool schemas in the agent code. The MCP server describes its own capabilities, and the agent framework presents them to the model as callable functions. This means adding a new tool to the MCP server immediately makes it available to all connected agents without redeployment.
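
Conceptually, discovery means the server's self-description becomes the model's function list. A sketch of that translation, with an illustrative descriptor shape (the real MCP tool schema is richer than shown here):

```python
# Sketch: turning tools advertised by an MCP server into the
# function-calling schema handed to the model. Field names are
# illustrative, not the exact MCP wire format.

def advertised_tools():
    """What a server might return from a tool-listing call."""
    return [
        {"name": "lookup_inventory",
         "description": "Check stock for a SKU",
         "input_schema": {"type": "object",
                          "properties": {"sku": {"type": "string"}},
                          "required": ["sku"]}},
    ]

def to_model_functions(tools):
    """No hand-written schemas: the server's self-description is the contract."""
    return [{"type": "function",
             "function": {"name": t["name"],
                          "description": t["description"],
                          "parameters": t["input_schema"]}}
            for t in tools]

functions = to_model_functions(advertised_tools())
```

Because the mapping is mechanical, a tool added on the server side shows up in the model's callable set on the next discovery pass, with no agent redeployment.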

The less discussed implication: Runtime discovery means runtime surprises. If someone adds a dangerous tool to an MCP server, every connected agent gains access to it immediately. The lab did not cover MCP tool approval workflows or capability gating — both of which are essential for production deployments.

Exercise 2: Chaining MCP tools across agents

The more advanced exercise had multiple agents sharing access to the same MCP servers but using different tools from each server based on their role. The research agent used the "search" and "retrieve" tools. The analysis agent used "query" and "aggregate" tools. Same server, different capability subsets.

# Multiple agents, same MCP server, different capability subsets
research_agent = Agent(
    name="researcher",
    instructions="Use search and retrieve tools only",
    mcp_servers=[data_server],
    allowed_tools=["search", "retrieve"]
)

analysis_agent = Agent(
    name="analyst",
    instructions="Use query and aggregate tools for analysis",
    mcp_servers=[data_server],
    allowed_tools=["query", "aggregate"]
)

What this demonstrated about MCP security: MCP servers can expose tools selectively based on the calling agent's identity. Not every agent sees every tool. This is how you implement least-privilege access in a multi-agent system — the MCP server acts as the policy enforcement point.
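
The enforcement-point idea can be sketched in a few lines. This is a hypothetical server-side filter (the policy table, tool names, and function shapes are illustrative, not agent-framework or MCP APIs):

```python
# Sketch: an MCP server filtering its tool list by caller identity,
# so each agent only ever sees its permitted capability subset.
# Policy table and tool names are illustrative.

TOOL_POLICY = {
    "researcher": {"search", "retrieve"},
    "analyst": {"query", "aggregate"},
}

ALL_TOOLS = {
    "search": lambda q: f"results for {q}",
    "retrieve": lambda doc_id: f"document {doc_id}",
    "query": lambda sql: f"rows for {sql}",
    "aggregate": lambda col: f"sum of {col}",
}

def list_tools(agent_identity):
    """What the server advertises during discovery, per caller."""
    allowed = TOOL_POLICY.get(agent_identity, set())
    return sorted(ALL_TOOLS.keys() & allowed)

def invoke(agent_identity, tool, *args):
    """Enforce the same policy at invocation time, not just at discovery."""
    if tool not in TOOL_POLICY.get(agent_identity, set()):
        raise PermissionError(f"{agent_identity} may not call {tool}")
    return ALL_TOOLS[tool](*args)
```

Note the double check: filtering the discovery response alone is not enough, because a caller can guess tool names, so `invoke` re-applies the policy.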

What it did not demonstrate: Audit logging of tool invocations, rate limiting per agent, or what happens when an MCP server goes down mid-workflow. These are production concerns the lab had no time to address.


GitHub Copilot as a software engineering agent

The most forward-looking part of the lab demonstrated GitHub Copilot not as a code completion assistant but as an autonomous software engineering (SWE) agent within a multi-agent system.

What a SWE agent does differently from code completion:

Code completion (what most people know Copilot as):

  • You type code, Copilot suggests the next line
  • You accept or reject suggestions
  • Human drives, Copilot assists

SWE agent (what the lab demonstrated):

  • You describe what needs to be built
  • Agent reads the existing codebase
  • Agent writes code across multiple files
  • Agent creates tests for the code it wrote
  • Agent runs the tests
  • Agent fixes failures
  • Agent submits a pull request
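
The inner loop behind that list is write, test, fix, retry. A sketch with stand-ins for the model and the test runner (both hypothetical; a real SWE agent shells out to an actual test runner and a code model):

```python
# Sketch of the SWE-agent inner loop: generate, test, fix, retry.
# run_tests and generate_patch are stand-ins; a real agent would
# invoke a test runner and a model here.

def run_tests(code):
    """Stand-in test runner: 'passes' once the off-by-one bug is gone."""
    return "n + 1" not in code

def generate_patch(code, failure_seen):
    """Stand-in for the model proposing a fix after a failing run."""
    return code.replace("n + 1", "n") if failure_seen else code

def swe_loop(code, max_attempts=3):
    for attempt in range(max_attempts):
        if run_tests(code):
            return code, attempt           # green: ready to open a PR
        code = generate_patch(code, failure_seen=True)
    raise RuntimeError("could not reach green; escalate to a human")

fixed, attempts = swe_loop("def count(xs): return len(xs) if xs else n + 1")
```

The bounded retry and the explicit escalation are the important parts: an SWE agent that loops forever on a red test suite is worse than one that hands the failure back.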

The multi-agent integration: In the lab's final exercise, the SWE agent was one agent in a multi-agent system. A planning agent determined what code changes were needed. The SWE agent implemented them. A review agent checked the implementation against requirements.

# SWE agent as part of multi-agent system
planning_agent = Agent(
    name="planner",
    instructions="Analyse requirements and create implementation plan"
)

swe_agent = CopilotSWEAgent(
    name="developer",
    repo="org/project",
    instructions="Implement code changes per the plan"
)

review_agent = Agent(
    name="reviewer",
    instructions="Review implementation against requirements"
)

orchestration = GroupChat(
    agents=[planning_agent, swe_agent, review_agent],
    strategy="sequential"
)

The honest assessment of SWE agents: The lab demonstrated SWE agents working on well-scoped, clearly specified tasks — adding a new API endpoint, writing a data transformation function, creating unit tests. These are tasks where the problem is well-defined and the solution pattern is well-known. SWE agents on ambiguous requirements, architectural decisions, or legacy codebase modifications were not demonstrated, and for good reason — those tasks require human engineering judgement that models cannot reliably replicate.

What this means for engineering teams: SWE agents accelerate the mechanical parts of software development. The parts that experienced engineers find tedious — boilerplate, standard CRUD endpoints, test scaffolding, configuration files — are exactly where SWE agents deliver value. The parts that require design thinking, trade-off analysis, and domain expertise remain human territory.

The uncomfortable question the lab avoided: If SWE agents handle the mechanical work, what happens to junior engineers who currently do that mechanical work as part of their learning path? The lab did not engage with this, and Microsoft's positioning of SWE agents as "productivity tools" sidesteps the workforce development implications.


A2A protocol: Building agents that talk to agents you did not build

The lab's A2A exercises were the most architecturally interesting and the most practically challenging.

The setup: Each lab participant's agents were configured to be A2A-discoverable on a local network. The exercise: make your agent delegate a task to another participant's agent, receive the result, and incorporate it into your workflow.

What A2A discovery looks like:

{
  "agent_card": {
    "name": "expense-analyser",
    "description": "Analyses expense reports for policy compliance",
    "capabilities": [
      "expense_review",
      "policy_check",
      "anomaly_detection"
    ],
    "protocols": ["a2a/v1"],
    "endpoint": "https://agent.example.com/a2a"
  }
}

How delegation works:

  1. Your agent queries the A2A registry for agents with specific capabilities
  2. A2A returns matching agent cards with capability descriptions
  3. Your agent sends a task to the selected agent via the A2A protocol
  4. The remote agent executes the task within its own security context
  5. Results are returned to your agent via A2A
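
The five steps can be sketched against an in-memory registry. Real A2A uses HTTP endpoints and signed agent cards; everything here (the registry shape, field names, the `send` callable) is illustrative:

```python
# Sketch of the A2A delegation flow above, with an in-memory registry
# standing in for a real A2A discovery service.

REGISTRY = [
    {"name": "expense-analyser",
     "capabilities": ["expense_review", "policy_check", "anomaly_detection"],
     "endpoint": "https://agent.example.com/a2a"},
    {"name": "travel-booker",
     "capabilities": ["flight_search"],
     "endpoint": "https://travel.example.com/a2a"},
]

def discover(capability):
    """Steps 1-2: query the registry, get matching agent cards back."""
    return [card for card in REGISTRY if capability in card["capabilities"]]

def delegate(card, task, send):
    """Steps 3-5: send the task to the remote agent and collect its result.
    `send` abstracts the network call so trust and timeout policy stay local."""
    return send(card["endpoint"], task)

matches = discover("policy_check")
result = delegate(matches[0], {"task": "review Q3 expenses"},
                  send=lambda endpoint, task: {"status": "done", "from": endpoint})
```

Keeping the transport behind a `send` seam is deliberate: it is the natural place to attach the trust checks and timeouts discussed below, which the protocol itself does not impose.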

What the lab revealed about A2A in practice:

Discovery works well. Agents finding each other by capability description is genuinely useful. You do not need to know the specific agent's name or endpoint — you describe what you need and A2A finds agents that can do it.

Trust is the hard problem. The lab ran on a trusted local network. In production, delegating a task to an external agent means trusting it with potentially sensitive data, trusting that it will execute faithfully, and trusting that it will not take longer than expected. The A2A protocol includes authentication and capability negotiation, but trust policies — who can your agent delegate to, what data can it share, what timeout is acceptable — are configuration decisions the protocol does not make for you.

Latency is real. Agent-to-agent communication across network boundaries adds latency. If Agent A calls Agent B which calls Agent C, each hop adds round-trip time plus execution time. The lab's local network masked this, but production A2A across the internet or even across cloud regions will have measurable performance impact.

A2A versus just calling an API: The sceptic's question is fair — why not just call another service's API directly instead of going through the A2A protocol? The answer is capability negotiation. With A2A, your agent can discover what another agent can do at runtime. With direct API calls, you need to know the API contract at build time. For dynamic, evolving multi-agent ecosystems, this difference matters. For two known services communicating in a predictable pattern, A2A is over-engineering.


What the lab got right

Progressive complexity: Exercises started with single-agent tool calling, progressed to multi-agent Magentic-One orchestration, then added A2A cross-boundary communication. Each exercise built on the previous, and participants who fell behind on one exercise could still start the next with provided baseline code.

Working infrastructure: Unlike some Ignite labs where participants spent half the session troubleshooting environment setup, LAB513's infrastructure mostly worked. Pre-provisioned Azure resources, pre-configured MCP servers, and clear setup instructions meant most participants were writing agent code within the first 10 minutes.

Real code, not pseudocode: The exercises produced working multi-agent systems, not architecture diagrams. Participants left with code they could adapt, not just concepts they understood.

Error handling emphasis: The lab explicitly covered what happens when things go wrong — MCP server unreachable, agent timeout, model returning unexpected output. Production systems fail, and the lab did not pretend otherwise.


What the lab got wrong

Time pressure on A2A exercises: The A2A cross-boundary exercises were the most interesting but received the least time. Most participants reached the A2A section in the final 15 minutes, which was not enough to properly explore trust configuration, error handling, and the discovery mechanism. This content warranted its own dedicated lab.

SWE agent exercises were prescriptive: The Copilot SWE agent exercises had participants follow exact steps rather than experimenting with their own code generation tasks. This made the exercise completable in the time available but did not expose the SWE agent's limitations. Participants left with an inflated sense of SWE agent capability.

No cost discussion: The lab consumed Azure resources throughout but never discussed the cost implications of multi-agent orchestrations. Each model call, each MCP tool invocation, each A2A delegation has cost. Without this context, participants might prototype systems with cost profiles that are unacceptable in production.

Single-language focus: The lab was Python-only. The agent-framework's .NET and upcoming TypeScript/JavaScript implementations were not exercised. For teams working in .NET (common in enterprise), the lab provided concepts but not directly portable code.

No observability walkthrough: The lab mentioned that OpenTelemetry is integrated into agent-framework but never had participants look at traces, spans, or metrics from their multi-agent orchestrations. For production systems, observability is not optional — it is the difference between debugging a multi-agent failure and staring at error logs hoping for insight.
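
To make the observability point concrete, here is the shape of the trace data you would want from a multi-agent run, sketched with a stdlib stand-in rather than the real OpenTelemetry SDK (the span names and attributes are illustrative):

```python
# Sketch: per-step spans for a multi-agent run. agent-framework emits
# real OpenTelemetry spans; this stdlib stand-in only shows why nested,
# attributed spans beat flat error logs for debugging routing decisions.

import time
from contextlib import contextmanager

SPANS = []  # inner spans close (and append) before outer ones

@contextmanager
def span(name, **attrs):
    start = time.monotonic()
    try:
        yield
    finally:
        SPANS.append({"name": name, "attrs": attrs,
                      "duration_s": time.monotonic() - start})

with span("orchestration", task="Q3 analysis"):
    with span("route", chosen_agent="researcher"):
        pass                      # the lead agent's routing decision
    with span("agent.researcher", tool="web_search"):
        pass                      # the selected agent's step
```

With spans like these, "which agent did the lead pick, and why did step three take eight seconds" is a query, not an archaeology project.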


The strategic picture: Why Microsoft unified the frameworks

The tactical benefit of agent-framework is clear — one API surface instead of two. But the strategic motivation runs deeper.

Control the framework, control the ecosystem. If agent-framework becomes the dominant way enterprises build agents, Microsoft controls the default patterns for model selection (Azure OpenAI), tool integration (MCP with Azure services), state management (Cosmos DB, Durable Functions), and deployment (Azure Container Apps, Foundry). Each default integration drives Azure consumption.

Google built A2A as an open standard, but Microsoft implemented it first. The A2A protocol emerged from Google, but LAB513 demonstrated it working in Microsoft's framework before Google shipped equivalent developer tooling. By shipping first, Microsoft sets developer expectations for how A2A should work in practice.

The Semantic Kernel installed base is the migration opportunity. Thousands of enterprises have production Semantic Kernel deployments. Agent-framework is positioned as the natural upgrade path, bringing those enterprises into the unified framework and its associated Azure defaults.

This is not criticism — it is how platform companies operate. But developers should understand the strategic context when evaluating framework choices.


The honest assessment

What this lab demonstrated convincingly:

The unified framework works. Building multi-agent systems with a single API surface is genuinely easier than bolting Semantic Kernel and AutoGen together. The cognitive overhead reduction is real and matters for team adoption.

MCP tool integration is practical. Runtime tool discovery, shared MCP servers across agents, and capability-based tool access work in practice, not just in architecture diagrams.

Magentic-One is the right default orchestration pattern. Model-driven routing handles the ambiguity of real-world tasks better than static coordinators, with acceptable predictability trade-offs when paired with OpenTelemetry observability.

What remains unproven:

A2A at scale. The lab demonstrated A2A on a trusted local network with known agents. Production A2A across organisational boundaries, with trust negotiation, latency management, and failure handling, is a materially harder problem.

SWE agent reliability. The lab's prescriptive exercises demonstrated SWE agents succeeding on well-defined tasks. Whether SWE agents reliably handle the ambiguous, messy work that constitutes most real software engineering is not yet demonstrated.

Framework stability over time. The merger happened recently. Whether agent-framework maintains API stability, avoids breaking changes, and does not require another migration in 18 months is an open question. The existence of migration tooling is reassuring and concerning in equal measure — reassuring because Microsoft recognises the need, concerning because the need exists.


The verdict

LAB513 is the most practically useful agent session from Ignite 2025. Not the most visionary, not the most impressive demo, but the session most likely to change how participants build agents when they return to their teams.

The unified agent-framework solves the real problem of framework fragmentation. MCP tools solve the real problem of tool integration at scale. Magentic-One solves the real problem of dynamic task routing. A2A addresses the real problem of cross-boundary agent communication, even if the solution is nascent.

The gaps — A2A maturity, SWE agent reliability, framework stability, and cost management — are real but addressable. They are engineering problems, not fundamental architectural flaws.

If you are starting a multi-agent project on Microsoft's stack, agent-framework is the correct foundation. If you have an existing Semantic Kernel codebase, migration is worth evaluating but not rushing. If you are betting your architecture on A2A interoperability with external agents, wait for the ecosystem to mature.


What to watch

Agent-framework release cadence: Is the framework shipping incremental improvements or periodic large changes? Incremental updates suggest stability. Large, infrequent releases suggest the platform is still finding its shape.

A2A adoption outside Microsoft: Watch for Google Cloud, AWS, and independent framework builders implementing A2A. Protocol viability depends on ecosystem breadth.

SWE agent benchmarks: Independent evaluations of SWE agent performance on real-world (not lab-crafted) software engineering tasks. The SWE-bench leaderboard is a useful proxy.

MCP server ecosystem: Track which enterprise platforms (ServiceNow, Salesforce, SAP, Workday) ship native MCP servers. MCP's value scales with the number of available tool servers.

Migration success rates: As enterprises migrate from Semantic Kernel and AutoGen to agent-framework, watch for reports of migration friction, broken patterns, and stability issues.

