I’ve been discussing the inevitable progression that LLM companies are taking toward agentic AI capabilities for some time now on my blog and social media.
My Model Context Protocol series explored how Claude (and any AI product) can go from a mere chatbot to an AI agent capable of taking actions on your behalf.
OpenAI has also been on this path since launching ChatGPT. They’ve been adding tools like web search, Code Interpreter, Operator, and Deep Research to build out ChatGPT’s agentic capabilities.
This week, on March 11, 2025, they took the next step with the release of their Agents SDK, an open-source toolkit designed to make building sophisticated AI agents accessible to developers of all skill levels.
What is the Agents SDK?
The OpenAI Agents SDK is a lightweight, Python-based framework for constructing multi-agent workflows. Evolved from their experimental “Swarm” project, this SDK provides a comprehensive solution for developers looking to create AI agents that can reason, use tools, and collaborate with other agents to accomplish complex tasks.
At its core, the SDK offers a simplified architecture with a few key primitives:
- Agents: LLMs equipped with instructions and tools
- Handoffs: A system allowing agents to delegate specific tasks to other specialized agents
- Guardrails: Safety mechanisms that run parallel to agents, validating inputs and outputs
- Function Tools: Utilities to transform any Python function into a tool with automatic schema generation
- Tracing: Built-in capabilities for visualizing, debugging, and monitoring agent workflows
Unlike some competing frameworks that require learning new abstractions, the Agents SDK embraces a Python-first approach. This allows developers to leverage familiar language features for orchestrating and chaining agents, significantly flattening the learning curve.
Why It Matters
The Agents SDK addresses many of the practical challenges developers face when building AI agents. It standardizes patterns for agent communication, state management, and collaboration, reducing the complexity barrier for creating useful AI applications.
The SDK isn’t revolutionary—it’s evolutionary, building on existing concepts while providing a more accessible framework. It handles much of the orchestration complexity while giving developers precise control over agent behavior.
What makes it valuable? Three core concepts:
- Agents that think AND act – Not just LLMs spitting out text, but AI assistants that can make decisions and execute functions
- Seamless teamwork through handoffs – Specialized agents working together, passing the baton when needed
- Safety through guardrails – Because nobody wants their AI going rogue after reading too many YouTube comments
How It Works
The mechanics of the Agents SDK are pretty straightforward. Let’s break down the basic workflow:
1. Agent Configuration
Agents are defined by providing a name, model, instructions, and tools:
- Give them a name (“Customer Support Agent”)
- Provide instructions (“Help users without saying ‘have you tried turning it off and on again?’ more than once per conversation”)
- Choose their “brain” (from quick-and-simple to deep-thinking models)
- Equip them with tools (the digital equivalent of giving someone access to the supply closet)
from agents import Agent

support_agent = Agent(
    name="Customer Support Agent",
    model="gpt-4o",
    instructions="Help users without saying 'have you tried turning it off and on again?' more than once per conversation.",
    tools=[web_search, document_retrieval]  # function tools defined elsewhere
)
2. Agent Loop
When your agent runs, it enters the “agent loop”, a fancy way of saying it thinks, acts, and repeats until the job is done. The SDK handles the agent loop automatically, managing tool calling, result processing, and iteration:
- Agent gets input (like “I need help with my subscription”)
- Agent decides if they need more info or can respond directly
- If they need info, they use a tool and get results
- This continues until they reach a final answer
It’s basically the digital version of how I approach cooking: assess situation, realize I need more information, google recipe, realize I’m missing ingredients, order takeout, problem solved.
from agents import Runner

result = Runner.run_sync(support_agent, "I need help with my subscription")
print(result.final_output)
Tools: Extending Your Agent’s Capabilities
Without tools, agents would just be fancy chatbots. Tools are what let your AI reach out into the world and actually do stuff.
Creating a tool is as simple as decorating a Python function:
from agents import function_tool

@function_tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for a query."""
    # Your code to search a database
    return "Here's what I found about " + query
There are two main types:
- Hosted tools: Pre-built capabilities like web search (the tools already in your shed)
- Function tools: Turn ANY Python function into an agent tool (like going to Home Depot and buying whatever you need)
The beauty is in how naturally the agent decides when to use these tools – it’s not pre-programmed, but rather a decision the LLM makes based on the task at hand.
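For example, a single agent can mix both kinds, pairing the SDK’s built-in WebSearchTool with the custom function tool defined above. A minimal sketch:

from agents import Agent, WebSearchTool

research_assistant = Agent(
    name="Research Assistant",
    instructions="Answer questions using web results and the internal knowledge base.",
    tools=[WebSearchTool(), search_knowledge_base]  # hosted tool + function tool
)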
Context: Keeping State Between Steps
For complex applications, you often need to maintain state across multiple interactions. The SDK lets you create a context object:
from agents import RunContextWrapper, function_tool

class UserSession:
    def __init__(self, user_id):
        self.user_id = user_id
        self.preferences = {}

@function_tool
def update_preference(ctx: RunContextWrapper[UserSession], category: str, preference: str) -> str:
    ctx.context.preferences[category] = preference
    return f"Updated {category} preference to {preference}"
This lets your tools access shared state, store progress information, or maintain user session data – incredibly useful for multi-step interactions.
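You pass the context object in when you run the agent, and every tool call in that run sees the same instance. A quick sketch, where preferences_agent is a hypothetical agent equipped with the update_preference tool:

from agents import Runner

session = UserSession(user_id="user_123")
# preferences_agent is a hypothetical Agent wired up with update_preference
result = Runner.run_sync(preferences_agent, "Set my language preference to French", context=session)
print(session.preferences)  # e.g. {"language": "French"}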
Output Types: Getting Structured Results
By default, agents return free-form text, but you can enforce structured outputs using Pydantic models:
from pydantic import BaseModel
from agents import Agent

class ProductRecommendation(BaseModel):
    product_id: str
    product_name: str
    price: float
    reasoning: str

agent = Agent(
    name="product_recommender",
    instructions="Recommend one product that best fits the user's request.",
    output_type=ProductRecommendation
)
This guarantees that you get properly structured data that your application can easily process, not just random text blobs.
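At run time, final_output then comes back as an instance of your model rather than a raw string. A quick sketch:

result = Runner.run_sync(agent, "Recommend a budget-friendly mechanical keyboard")
rec = result.final_output  # a ProductRecommendation instance, not free-form text
print(rec.product_name, rec.price)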
Tracing
The built-in tracing system captures every step of the agent’s thinking and actions:
- What the agent was thinking
- Which tools it called and why
- What inputs it used
- What outputs it received
This means when something goes wrong (and we all know something always goes wrong), you can actually figure out why.
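Traces show up in the OpenAI dashboard automatically, and you can group several related runs under one trace with the trace() context manager. A minimal sketch, reusing the support_agent from earlier:

import asyncio
from agents import Runner, trace

async def main():
    # Both runs below are grouped under a single trace in the dashboard
    with trace("Support workflow"):
        first = await Runner.run(support_agent, "I need help with my subscription")
        followup = await Runner.run(support_agent, f"Summarize this for a teammate: {first.final_output}")

asyncio.run(main())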
3. Multi-Agent Collaboration and Handoffs
One of the most powerful features is the ability to create handoffs between specialized agents and let them collaborate:
- Simple task? Use the fast, lightweight model
- Billing questions? Send them to the “Money Person” agent
- Technical problems? That’s for the “Have You Tried Restarting It?” agent
- Complex reasoning needed? Bring in the heavyweight model
support_agent = Agent(name="support", instructions="You handle customer questions.")
technical_agent = Agent(name="technical", instructions="You solve technical issues.")
billing_agent = Agent(name="billing", instructions="You handle billing inquiries.")

triage_agent = Agent(
    name="triage",
    instructions="Route customer inquiries to the appropriate specialized agent.",
    handoffs=[support_agent, technical_agent, billing_agent]
)
This creates a workflow where agents can delegate subtasks, forming a collaborative system greater than the sum of its parts.
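Running the triage agent then routes each request automatically, and the result tells you which agent ended up answering. A minimal sketch:

result = Runner.run_sync(triage_agent, "My last invoice charged me twice")
print(result.last_agent.name)  # likely "billing"
print(result.final_output)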
4. Safety Guardrails
Guardrails are the bouncers of your application, validating inputs before they reach your main agent. Want to prevent users from asking for the recipe to digital disaster? A guardrail can check inputs with a fast model first, saving your premium model for legitimate requests.
Developers can implement safety measures that run in parallel with agent execution:
from agents import Agent, GuardrailFunctionOutput, input_guardrail

@input_guardrail
async def no_swearing_guardrail(ctx, agent, user_input) -> GuardrailFunctionOutput:
    # The input may be a plain string or a list of message items
    if isinstance(user_input, str):
        content = user_input
    else:
        content = " ".join(m.get("content", "") for m in user_input if isinstance(m, dict))
    # Tripping the wire halts the run before the main agent executes
    return GuardrailFunctionOutput(
        output_info=None,
        tripwire_triggered="badword" in content.lower()
    )

agent = Agent(
    name="my_agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[no_swearing_guardrail]
)
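When a guardrail trips, the SDK raises an exception you can catch, so blocked requests never reach your main model. A minimal sketch:

from agents import InputGuardrailTripwireTriggered, Runner

try:
    result = Runner.run_sync(agent, "a badword-laden request")
    print(result.final_output)
except InputGuardrailTripwireTriggered:
    # The guardrail tripped, so the main agent never ran
    print("Request blocked by input guardrail.")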
Hands-On Example: Building a Multi-Agent Research System
To demonstrate the power and flexibility of OpenAI’s Agents SDK, I’ve created a practical example that showcases how multiple specialized agents can collaborate to accomplish complex tasks. This Research Agent System represents the kind of real-world application that the SDK enables developers to build quickly and efficiently.
The Research Agent System Architecture
This system consists of four specialized agents that work together to produce comprehensive research content:
- Triage Agent: Coordinates the overall research process, delegating tasks to specialized agents
- Researcher Agent: Gathers information from various sources on a given topic
- Fact Checker Agent: Verifies statements for accuracy and proper sourcing
- Writer Agent: Synthesizes verified research into coherent, well-structured content
Each agent is designed with specific instructions, tools, and capabilities that allow it to excel at its particular role. The system demonstrates several key features of the OpenAI Agents SDK:
- Handoffs: Agents delegate tasks to more specialized agents
- Context sharing: All agents work with a shared research context
- Guardrails: Ensures content remains fact-based and properly sourced
- Structured outputs: Final content follows a consistent, well-organized format
- Function tools: Agents leverage specialized tools for searching, verifying, and saving content
The Code
Each agent described above performs a specific task and returns its result as output. We want that output to follow a defined structure, so that when one agent hands off to the next, the receiving agent can consume the result directly and build on it.
from typing import List, Optional
from pydantic import BaseModel

class ResearchFinding(BaseModel):
    """A single research finding with source information."""
    statement: str
    source: str
    confidence: float  # 0.0 to 1.0

class VerifiedResearch(BaseModel):
    """Collection of verified research findings."""
    findings: List[ResearchFinding]
    verified: bool
    notes: Optional[str] = None

class FinalContent(BaseModel):
    """Final output content with structured sections."""
    title: str
    introduction: str
    key_points: List[str]
    body: str
    conclusion: str
    sources: List[str]
We also want to give each agent some tools to do their work. The Research Agent, for example, will need a tool to search the internet as well as save the retrieved content into a file. The fact-checker agent would need a tool to verify that content, and so on.
I am not going to write all the tools here, but here’s what the web search tool might look like, using the Exa Search API.
import aiohttp
from agents import RunContextWrapper, function_tool

@function_tool
async def search_web(context: RunContextWrapper[ResearchContext], query: str) -> str:
    """
    Search the web for information about a topic using the Exa Search API.

    Args:
        query: The search query text

    Returns:
        Search results as formatted text with citations
    """
    topic = context.context.topic
    # Combine the specific query with the general topic for better results
    full_query = f"{query} about {topic}"
    try:
        # Make a request to the Exa Search API
        async with aiohttp.ClientSession() as session:
            async with session.post(
                "https://api.exa.ai/search",
                headers={
                    "Content-Type": "application/json",
                    "x-api-key": "YOUR_EXA_API_KEY"  # Replace with your actual API key
                },
                json={
                    "query": full_query,
                    "numResults": 5,
                    "useAutoprompt": True,
                    "type": "keyword"
                }
            ) as response:
                if response.status != 200:
                    error_text = await response.text()
                    return f"Error searching: {response.status} - {error_text}"
                search_results = await response.json()

        # Process the results
        formatted_results = f"Search results for '{query}' about {topic}:\n\n"
        if not search_results.get("results"):
            return f"No results found for '{query}' about {topic}."

        # Format each result with its title, content, and URL
        for i, result in enumerate(search_results.get("results", []), 1):
            title = result.get("title", "No title")
            url = result.get("url", "No URL")
            content = result.get("text", "").strip()
            # Limit content length for readability
            if len(content) > 500:
                content = content[:500] + "..."
            formatted_results += f"{i}. **{title}**\n"
            formatted_results += f"   {content}\n"
            formatted_results += f"   Source: {url}\n\n"

        # Add a summary if available
        if search_results.get("autopromptString"):
            formatted_results += f"Summary: {search_results.get('autopromptString')}\n\n"

        return formatted_results
    except Exception as e:
        # Provide a useful error message
        error_message = f"Error while searching for '{query}': {str(e)}"
        # Add fallback information if the search fails
        fallback_info = (
            f"\n\nFallback information about {topic}:\n"
            f"1. {topic} has been studied in recent publications.\n"
            f"2. Current research suggests growing interest in {topic}.\n"
            f"3. Common challenges in {topic} include implementation complexity and adoption barriers."
        )
        return error_message + fallback_info
You’ll notice this tool uses a ResearchContext object to share data with the other tools. Let’s define that as well:
class ResearchContext:
    def __init__(self, topic: str):
        self.topic = topic
        self.findings = []
        self.verified_findings = []
        self.draft_content = ""
        self.history = []

    def add_finding(self, finding: ResearchFinding):
        self.findings.append(finding)
        self.history.append(f"Added finding: {finding.statement}")

    def add_verified_findings(self, verified: VerifiedResearch):
        self.verified_findings.extend(verified.findings)
        self.history.append(f"Added {len(verified.findings)} verified findings")

    def set_draft(self, draft: str):
        self.draft_content = draft
        self.history.append("Updated draft content")
You may also want to add some guardrails, for example checking that the research content stays unbiased. A very simple hard-coded check might count how many opinion phrases appear in the output, like so:
from agents import GuardrailFunctionOutput, output_guardrail

@output_guardrail
async def fact_based_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    """Check if the output appears to be fact-based and not opinion-heavy."""
    text = str(output)
    opinion_phrases = ["i believe", "i think", "in my opinion", "probably", "might be", "could be"]
    # Count opinion phrases (compare in lowercase so "I believe" is caught)
    opinion_count = sum(text.lower().count(phrase) for phrase in opinion_phrases)
    # Allow some opinion phrases, but not too many
    return GuardrailFunctionOutput(
        output_info={"opinion_count": opinion_count},
        tripwire_triggered=opinion_count >= 3
    )
You can create something far more sophisticated, but this simple example highlights how the SDK checks outputs against your guardrails.
Finally, we’ll create our Agents and give them the tools, context, and guardrails. Here’s what the Fact Checker Agent might look like:
fact_checker_agent = Agent[ResearchContext](
    name="fact_checker_agent",
    model="gpt-4o",
    instructions="""You are a meticulous fact-checking agent. Your job is to:
    1. Review the research findings in the shared context
    2. Verify each statement using the verify_statement tool
    3. Consolidate verified findings using save_verified_research
    4. Be skeptical and thorough - only approve statements with sufficient evidence

    For each finding, check if the source is credible and if the statement contains verifiable
    facts rather than opinions or generalizations.
    """,
    tools=[verify_statement, save_verified_research],
    output_type=str,
    output_guardrails=[fact_based_guardrail],
    handoff_description="Verifies research findings for accuracy and proper sourcing"
)
Our Triage Agent, which manages the whole process, also has handoffs defined in its parameters:
from agents import handoff

triage_agent = Agent[ResearchContext](
    name="triage_agent",
    model="gpt-3.5-turbo",
    instructions="""You are a research coordinator who manages the research process.
    For any research query:
    1. First, hand off to the researcher_agent to gather information
    2. Then, hand off to the fact_checker_agent to verify the findings
    3. Finally, hand off to the writer_agent to create the final content

    Monitor the process and ensure each specialized agent completes their task.
    """,
    handoffs=[
        handoff(researcher_agent),
        handoff(fact_checker_agent),
        handoff(writer_agent)
    ],
    output_type=FinalContent,
    handoff_description="Coordinates the research process across specialized agents"
)
And finally, we write the main function to run the whole process:
from agents import Runner, RunConfig

async def run_research_system(topic: str) -> FinalContent:
    """Run the multi-agent research system on a given topic."""
    # Create the shared context
    context = ResearchContext(topic=topic)

    # Configure the run with tracing enabled
    config = RunConfig(
        workflow_name=f"research_{topic.replace(' ', '_')}",
        tracing_disabled=False
    )

    # Run the triage agent with the initial query
    result = await Runner.run(
        triage_agent,
        f"Research the following topic thoroughly: {topic}",
        context=context,
        run_config=config
    )
    return result.final_output
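To kick the whole thing off, call the async function from a regular entry point. A minimal sketch, with an arbitrary example topic:

import asyncio

if __name__ == "__main__":
    content = asyncio.run(run_research_system("quantum computing in drug discovery"))
    print(content.title)
    print(content.introduction)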
Try It Yourself
If you’re eager to explore the Agents SDK yourself, the process is straightforward:
- Install the SDK via pip:
pip install openai-agents
- Check out the official documentation
- Explore the GitHub repository for examples and contributions
The documentation is comprehensive and includes numerous examples to help you understand the SDK’s capabilities and implementation patterns.
Your Next Steps
As we venture further into the age of agentic AI, tools like the Agents SDK will become increasingly valuable. Whether you’re looking to automate complex workflows, create specialized assistants, or explore the frontiers of AI capability, this toolkit provides an excellent foundation.
I encourage you to dive in and experiment with the Agents SDK for your projects. If you’re working on something interesting or need guidance on implementation, don’t hesitate to reach out. I’m particularly interested in hearing about novel applications and creative uses of multi-agent systems.
Want to build your own AI agents?
Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.