# Introduction
The Agent Framework Dev Project is a community-driven initiative that offers practical, hands-on training for developers building AI agents with modern frameworks and tools. It is highlighted by events like the Agent Framework Dev Day, hosted by the Boston Azure AI Group and supported by Microsoft. The Microsoft Agent Framework, launched in October 2025, brings together Semantic Kernel and AutoGen into a unified approach to creating production-ready agentic systems. When the framework is paired with the Microsoft Foundry platform, developers gain observability, safety settings, and enterprise-level operational controls built atop the core framework. Delving into the framework's Python-based materials exposes four interconnected technical areas, each building logically on the previous one and all rooted in patterns applicable to real-world production systems.
# Treating Safety as an Empirical Measurement Problem
Many agentic tutorials barely touch on safety. A far better approach is to put safety front and center from the very first line, asking developers to see and quantify risk before writing any agent logic—giving everyone a grounded understanding of what unprotected models are actually capable of.
The key instrument here is a dual-model comparison runner. You send the same prompt simultaneously to two gpt-4.1-mini deployments: one protected by Microsoft Foundry's full safety guardrails, and another with those protections dialed down. The terminal then displays both results side by side, complete with latency data and response text for each, so the behavioral difference is impossible to ignore or dismiss as merely theoretical.
The default prompt is intentionally provocative: a request for instructions on how to make a homemade explosive. Typically, the guarded model will refuse outright, while the unguarded version may comply. Seeing both responses surface in the same interface, on the same machine, at the same time transforms what could be an abstract warning into a concrete, immediate lesson in safety.
From this starting point, you can explore three input categories worth testing:
- Profanity—filterable using Microsoft Foundry’s curated blocklists
- Government identifiers like Social Security Numbers (SSNs)
- Other personally identifiable information (PII)
Each represents a real-world enterprise compliance concern, and each produces measurable differences between the two deployments, giving developers a clear understanding of where safety guardrails kick in and where vulnerabilities remain.
Latency matters just as much as response content. Safety guardrails introduce additional processing time, and that cost should be anticipated rather than ignored. Including a third configuration—models running with standard default settings between the two extremes—reinforces the reality that safety is a tunable continuum, not an on/off switch. Engineers actively adjust these balances based on the specific needs of each application.
On the technical side, the implementation uses the framework's AzureAIClient to spin up ephemeral agents for each model variant, runs both concurrently via asyncio.gather, and reports token usage alongside performance metrics. The architecture is deliberately minimal; the focus throughout is on the comparison itself, not the surrounding scaffolding.
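A minimal sketch of that comparison pattern follows, under a few stated assumptions: the plain Azure OpenAI Python SDK stands in for the framework's AzureAIClient, and the endpoint, key, and deployment names (gpt-4.1-mini-guarded, gpt-4.1-mini-unguarded) are placeholders for illustration, not names from the project:

```python
import asyncio
import time

from openai import AsyncAzureOpenAI

# Placeholder credentials; the real runner builds ephemeral agents
# through the framework's AzureAIClient instead.
client = AsyncAzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-10-21",
)

async def timed_call(deployment: str, prompt: str):
    """Send one prompt to one deployment; capture latency and token usage."""
    start = time.perf_counter()
    resp = await client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": prompt}],
    )
    latency = time.perf_counter() - start
    return deployment, latency, resp.usage.total_tokens, resp.choices[0].message.content

async def compare(prompt: str) -> None:
    # Query both deployments concurrently so the side-by-side
    # comparison reflects the same moment in time.
    results = await asyncio.gather(
        timed_call("gpt-4.1-mini-guarded", prompt),    # full Foundry guardrails
        timed_call("gpt-4.1-mini-unguarded", prompt),  # protections dialed down
    )
    for name, latency, tokens, text in results:
        print(f"[{name}] {latency:.2f}s / {tokens} tokens\n{text}\n")

asyncio.run(compare("Your test prompt here"))
```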
The core takeaway: finishing a task is not the same as finishing it responsibly under genuine real-world conditions. Recognizing this difference early in development influences every architectural choice that follows.
# Connecting Agents to the World with the Model Context Protocol
The Model Context Protocol (MCP) acts as a universal connector, enabling AI agents to integrate with data sources and tools through a standardized interface—no changes required on the agent side even when underlying services evolve. This makes MCP a practical foundation for developing agents that need to work with changing enterprise systems.
The overall architecture involves three main components: a host application (the AI agent), an MCP client, and one or more MCP servers, each server exposing tools, resources, and prompts relevant to its domain. Whether servers are running locally or remotely, the agent's client code stays unchanged, keeping the agent layer cleanly decoupled from infrastructure and deployment decisions.
Two primary transport mechanisms address the most common deployment models:
// STDIO Transport
STDIO transport launches the MCP server as a subprocess and communicates via standard input and output channels. This setup suits local tools and command-line integrations where low latency and tight process-level communication are priorities.
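With the official MCP Python SDK, a STDIO client takes only a few lines. This sketch assumes the local server is exposed by a script named mcp_local_server.py, in line with the component described below:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the MCP server as a subprocess and speak the protocol
    # over its stdin/stdout.
    params = StdioServerParameters(command="python", args=["mcp_local_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```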
// HTTP/SSE Transport
HTTP/SSE transport runs the server as a web service and communicates over HTTP with Server-Sent Events (SSE). This approach is better suited for cloud-native services and shared utilities that multiple distributed agents need to access simultaneously.
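The client side changes only at the connection step; the session code is identical. A sketch assuming a hypothetical SSE endpoint at http://localhost:5070/sse (matching the bridge described below) and illustrative tool arguments:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # Same session API as the STDIO example; only the transport differs.
    async with sse_client("http://localhost:5070/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("GetTicket", {"ticket_id": "T-1001"})
            print(result.content)

asyncio.run(main())
```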
A concrete four-component implementation built around a support ticket domain makes these patterns tangible. The mcp_local_server exposes four tools via STDIO: GetConfig, UpdateConfig, GetTicket, and UpdateTicket. The mcp_remote_server is a FastAPI REST API on port 5060 that manages the same ticket data as a proper, persistent service layer. A bridge component, mcp_bridge, runs on port 5070 and translates between the HTTP/SSE protocol and plain HTTP calls to the backend API. Finally, the mcp_agent_client consumes all of these simultaneously, dynamically discovering tools from each server and converting them into the function-calling schema that Azure OpenAI expects, all within a single agent session.
The most powerful enterprise advantage here: adding an MCP wrapper around an existing REST API requires absolutely zero changes on the backend itself. Any service already accessible via HTTP endpoints immediately becomes reachable by an AI agent without touching a single line of its source code. This dramatically lowers integration costs for organizations with sprawling existing API ecosystems.
The complete agentic loop demonstrated here covers runtime tool discovery, dynamic function conversion, model invocation, tool dispatch, and result re-ingestion into context—all implemented from scratch using the MCP SDK and Azure OpenAI. This gives developers and architects a transparent view of how every component connects and works together.
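The dynamic discovery step boils down to a small translation: each MCP tool already carries a name, a description, and a JSON Schema for its inputs, which map directly onto the function-calling format Azure OpenAI expects. A sketch:

```python
def mcp_tools_to_openai(tools: list) -> list[dict]:
    """Convert MCP tool definitions (as returned by session.list_tools())
    into the function-calling schema Azure OpenAI expects."""
    return [
        {
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description or "",
                "parameters": tool.inputSchema,  # already a JSON Schema
            },
        }
        for tool in tools
    ]
```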
# Orchestrating Workflow Patterns: Sequential, Concurrent, and Human-in-the-Loop
Workflow orchestration is the point where individual agents become coordinated systems capable of tackling problems far beyond what any single model call could resolve in isolation.
All three patterns operate over the same SupportTicket data model, carrying attributes such as ticket ID, customer name, subject line, description, and priority. Using an identical domain across all three orchestration styles is a deliberate pedagogical choice: the goal is to observe the same data flowing through fundamentally different processing architectures and notice how the output, the latency, and the control surface available to the human operator each change.
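A minimal version of that shared model might look like the following; the exact field names are assumptions inferred from the attributes listed above:

```python
from dataclasses import dataclass

@dataclass
class SupportTicket:
    ticket_id: str
    customer_name: str
    subject: str
    description: str
    priority: str  # e.g. "low", "medium", "high"
```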
// Sequential Workflow
A high-priority customer ticket about login issues following a password reset flows through an automated pipeline: first intake, then an AI categorization stage that classifies and summarizes the problem in structured JSON, and finally a response generation stage. The end result is a polished, customer-facing reply that recognizes the urgency, provides clear next steps, and references the ticket number. No human is involved at any point, and the output of each stage is fully visible before it advances to the next, making every transformation in the pipeline transparent and easy to inspect.
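Sketched as plain async Python, reusing the SupportTicket model above and parameterizing the model call (the run_agent callable is a stand-in for the framework's actual agent invocation, not its real API):

```python
from typing import Awaitable, Callable

# (instructions, input_text) -> reply
AgentFn = Callable[[str, str], Awaitable[str]]

async def sequential_pipeline(ticket: SupportTicket, run_agent: AgentFn) -> str:
    # Stage 1: intake - flatten the ticket into a single text blob.
    intake = f"{ticket.subject}\n{ticket.description}\nPriority: {ticket.priority}"

    # Stage 2: categorization - classify and summarize as structured JSON.
    analysis = await run_agent(
        "Classify this support ticket and summarize it as JSON.", intake
    )

    # Stage 3: response generation - draft the customer-facing reply,
    # referencing the ticket number. Each stage's output is visible
    # here before it flows into the next.
    return await run_agent(
        f"Write a reply for ticket {ticket.ticket_id} based on this analysis.",
        analysis,
    )
```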
// Concurrent Workflow
When a customer reports both a duplicate charge and an app crash in the same message, the shortcomings of a single-agent sequential pipeline become obvious. Billing and technical problems demand different expertise, and forcing both through one agent leads to a weaker response than routing each to a specialist who can think deeply within a focused area.
The concurrent approach splits the query, sending it simultaneously to a billing expert agent and a technical expert agent. The billing agent handles the duplicate charge and suggests a refund process. The technical agent concentrates on clearing the cache and reinstalling the app to fix the crashes. Neither agent tries to cover both areas. The combined result delivers the customer a thorough answer that no single specialist could have produced on their own, and the total response time is determined by the slower agent rather than the sum of both.
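Continuing the same sketch (with SupportTicket and AgentFn as defined earlier), the fan-out/fan-in step is a direct use of asyncio.gather:

```python
import asyncio

async def concurrent_workflow(ticket: SupportTicket, run_agent: AgentFn) -> str:
    question = f"{ticket.subject}\n{ticket.description}"

    # Fan out: both specialists work at the same time, so total latency
    # is bounded by the slower agent, not the sum of both.
    billing, technical = await asyncio.gather(
        run_agent("You are a billing specialist.", question),
        run_agent("You are a technical support specialist.", question),
    )

    # Fan in: merge the two focused answers into one reply.
    return await run_agent(
        "Combine these specialist answers into one customer reply.",
        f"Billing:\n{billing}\n\nTechnical:\n{technical}",
    )
```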
// Human-in-the-Loop Workflow
The most critical scenario involves a customer requesting a full refund on an annual premium subscription bought just one week earlier. The AI drafts a response that correctly references the 14-day money-back guarantee policy and offers to process the cancellation right away. Then the process halts, and control is explicitly handed to a human reviewer before anything goes out.
The supervisor sees the full draft along with three clear options: approve and send as-is, make edits before sending, or escalate to management. Once approved, the system records the action, marks the ticket as resolved, and logs that the response was sent without changes, producing a complete audit trail of the decision.
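In sketch form, the checkpoint reduces to an explicit stop between drafting and sending. Here input() stands in for however the real workflow surfaces the decision to a supervisor; a production system would persist state and resume on approval rather than block a terminal:

```python
async def human_in_the_loop(ticket: SupportTicket, run_agent: AgentFn) -> None:
    draft = await run_agent(
        f"Draft a reply for ticket {ticket.ticket_id}.", ticket.description
    )

    # Explicit checkpoint: nothing goes out until a human decides.
    print(draft)
    choice = input("[a]pprove and send / [e]dit / e[s]calate: ").strip().lower()

    if choice == "a":
        print(f"Ticket {ticket.ticket_id}: resolved, sent unchanged (audit logged).")
    elif choice == "e":
        edited = input("Edited reply: ")
        print(f"Ticket {ticket.ticket_id}: resolved, edited reply sent (audit logged):\n{edited}")
    else:
        print(f"Ticket {ticket.ticket_id}: escalated to management.")
```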
What this pattern makes tangible is something workflow diagrams often gloss over: the human-in-the-loop pause is not a failure or an exception. It is a deliberate, first-class checkpoint built into the workflow. The system waits for it without polling or timing out. This is the pattern that makes AI-assisted processes auditable and defensible in regulated or high-stakes settings, and it should be regarded as an equal counterpart to fully automated approaches rather than a last-resort fallback.
Expanding each pattern deepens the understanding considerably. Natural extensions include:

- Inserting a sentiment analysis agent before categorization in the sequential pipeline
- Adding a security or account specialist to the concurrent fan-out
- Introducing new supervisor actions like "Request More Info" to the human-in-the-loop step
- Combining sequential and concurrent patterns into a single hybrid workflow

All of these require a solid grasp of how the executor classes, shared client factory, and data models interconnect across the entire system.
# Moving from RAG to Agentic RAG
Standard retrieval-augmented generation (RAG) applications are easy to set up, but basic retrieval struggles with certain question types, and those shortcomings surface quickly once real users begin interacting with the system. Yes/no questions, counting queries, and multi-hop reasoning all push against the assumptions of a single embedding-lookup pipeline.
The path through this challenge unfolds across four stages: ingestion, simple RAG, advanced RAG, and agentic RAG. The order is deliberate. Running into the limits of naive retrieval first makes the shift to agentic retrieval feel concrete rather than theoretical, because the weaknesses of the simpler approach are already evident before the solution is presented.
The solution leverages the Microsoft Agent Framework with a Handoff workflow orchestration pattern, building specialized agents that carry out specific search operations backed by Azure AI Search. The Handoff pattern directs each query to the most suitable specialist agent rather than routing every question through one retrieval pipeline, meaning each agent can be tuned to the query type it is built to handle. The implementation spans four steps: initial setup, a yes/no search agent, a count search agent, and the remaining specialist agents, each adding a new retrieval capability to the overall system.
The architectural departure from standard RAG is significant and worth stating plainly. Instead of a single retrieval pipeline trying to handle every query type with the same approach, an orchestrator dispatches to agents specialized for different retrieval strategies, with Azure AI Search serving as the shared knowledge backbone that all specialist agents draw from. The outcome is a system able to answer the full spectrum of question types that standard RAG applications struggle with, including questions that demand reasoning over retrieved results rather than simply returning them.
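Stripped to its core, the Handoff dispatch is a classify-then-route step. This sketch assumes a classifier agent that returns a label like "yes_no" or "count" and a dictionary of specialist agents, each wrapping its own Azure AI Search strategy; none of these names come from the project itself:

```python
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]  # query -> answer

async def handoff(
    query: str,
    classify: Handler,
    specialists: dict[str, Handler],
) -> str:
    # Step 1: an orchestrator agent labels the query type.
    kind = (await classify(query)).strip().lower()

    # Step 2: hand the query off to the matching specialist, falling
    # back to a default retrieval agent for unrecognized labels.
    agent = specialists.get(kind, specialists["default"])
    return await agent(query)
```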
# Understanding Why These Four Topics Belong Together
The progression reflects a coherent picture of what production-ready agentic development actually demands, and the order of the topics is not arbitrary. Safety comes first because it reframes what working code means in an agentic context, establishing from the start that capability and responsible behavior are distinct properties that must be evaluated independently. MCP defines how agents communicate with external tools and services in a standardized, interoperable manner — including the key insight that existing APIs can be bridged without any backend changes, making it practical to connect agents to real enterprise systems rather than custom-built toy backends. Workflow patterns define how multiple agents coordinate and, crucially, when to pause for a human, introducing the control structures that make agentic systems reliable enough to deploy in high-stakes environments. Agentic RAG shows how knowledge retrieval scales beyond simple lookup to handle the full variety of questions real users ask, completing the picture of what a production knowledge system built on this framework looks like.
Viewed as a whole, the four domains move from observing behavior to building architecture to operating systems. That progression is what separates a working prototype from a deployable system, and understanding each layer makes the next one significantly easier to reason about.
Rachel Kuznetsov holds a Master’s in Business Analytics and thrives on solving complex data puzzles and seeking out new challenges. She is passionate about making intricate data science concepts accessible and is exploring the many ways AI impacts our lives. On her ongoing journey to learn and grow, she documents her experiences so others can learn alongside her. You can find her on LinkedIn.



