Yesterday OpenAI rolled out o3, the first reasoning model that is also agentic. Reasoning models have been around for a while, and o3 itself has been available in its mini version.
However, the full release yesterday showed us a model that not only reasons, but can also browse, run Python, and look at your images across multiple thought loops. It behaves differently from the reasoning models we’ve seen so far, and that makes it unique.
OpenAI even hinted it “approaches AGI—with caveats.” Of course, OpenAI has been saying this for four years with every new model release, so take it with a pinch of salt. That said, I wanted to test it out and compare it to the current top model (Gemini 2.5 Pro) to see if it’s better.
What the experts and the numbers say
Before we get into the 4 tests I ran both models through, let’s look at the benchmarks and a snapshot of what o3 can do.
Benchmarks: a 22.8% jump on SWE-Bench Verified coding tasks and only one missed question on AIME 2024 math.
Vision reasoning: rotates, crops, zooms, and then reasons over the edited view. It can “think with images.”
Full-stack tool use: seamlessly chains browsing, Python, image generation, and file analysis (no plug-in wrangling required).
Access & price: live for Plus, Pro, and Team; o3-mini even shows up in the free tier with light rate limits.
Field‑testing o3 against Gemini 2.5 Pro
Benchmarks are great, but I’ve stopped paying much attention to them recently. What really counts is whether a model can do what I want it to do.
Below are four experiments I ran, pitting o3 against Google’s best reasoning model in areas like research, vision, coding, and data science.
Deep‑dive research
I started with a basic research and reasoning test. I asked both models the same prompt: “What are people saying about ChatGPT o3? Find everything you can and interesting things it can do.”
Gemini started by thinking about the question, formulating a search plan, and executing against it. Because o3 is a brand new model, it’s not in Gemini’s training data, so it wasn’t sure if I meant o3 or ChatGPT-3 or 4o (yeah OpenAI’s naming confuses even the smartest AI models).
So to cover all bases, Gemini came up with 4 search queries and ran them in parallel. When the answers came back, it combined them all and gave me a final response.
Gemini’s thought process
o3, on the other hand, took the Sherlock route – search, read, reason, search again, fill a gap, repeat. The final response stitched together press reactions, Reddit hot takes, and early benchmark chatter.
o3’s thought process
This is where o3’s agentic behaviour shines. As o3 found answers to its initial searches, it reasoned further and ran new searches to plug gaps in the response. The final answer was well-rounded and solved my initial query.
Gemini only reasoned initially, and then after running the searches it combined everything into an answer. The problem is, because it wasn’t sure what o3 was when it first reasoned, one of the search queries was “what can ChatGPT do” instead of “what can o3 do”. So when it gave me the final answer, it didn’t quite solve my initial query.
Takeaway: Research isn’t a single pull‑request; it’s a feedback loop. o3 bakes that loop into the core model instead of outsourcing it to external agents or browser plug‑ins. When the question is fuzzy and context keeps shifting, that matters.
Image sleuthing
Now, if you’ve used AI as much as I have, you might be thinking that o3’s research works almost like Deep Research, a feature Gemini also has. And you’re right, it does.
But search isn’t the only tool o3 has in its arsenal. It can also use Python, and work with images, files, and more.
So my next test was to see if it could analyze and manipulate images. I tossed both models a picture of me taken in the Japan Pavilion at EPCOT, Disney World, thinking the Japanese backdrop might trip them up.
Ninety seconds later o3 not only pinned the location but pointed out a pin‑sized glimpse of Spaceship Earth peeking over the trees far in the background, something I’d missed entirely.
I was surprised it noticed that, so I asked it to point it out to me. Using Python, it identified the object, calculated its coordinates, and put a red circle right where the dome is! It was able to do this because it went through multiple steps of reasoning and tool use, showcasing its agentic capabilities.
Gemini also got the location right, but it only identified the pagoda and torii gate, not Spaceship Earth. When I asked it to mark the torii gate, it could only describe its position in the image, but it couldn’t edit and send me back the image.
Takeaway: o3’s “vision ↔ code ↔ vision” loop unlocks practical image tasks like quality‑control checks, UI audits, or subtle landmark tagging. Any workflow that mixes text, numbers, code, and images can hand the grunt work to o3 while the human focuses on decision‑making.
Coding with bleeding‑edge libraries
Next up, I wanted to see how well it does with coding. Reasoning models by their nature are good at this, and Gemini has been my go-to recently.
I asked them both to “Build a tiny web app. One button starts a real‑time voice AI conversation and returns the transcript.”
The reason I chose this specific prompt is that voice AI has improved a lot in recent weeks, and we’ve had some new libraries and SDKs come out around it. A lot of the newer stuff is beyond the cutoff date of these models.
So I wanted to see how well it does with gathering newer documentation and using that in its code versus what it already knows in its training data.
o3 researched the latest streaming speech API that dropped after its training cutoff, generated starter code, and offered the older text‑to‑speech fallback.
Gemini defaulted to last year’s speech‑to‑text loop and Google Cloud calls.
While both were technically correct and their code does work, o3 came back with the more up-to-date answer. Now, I could have pointed Gemini in the right direction and it would have coded something better, but that’s still an extra step that o3 eliminated out of the box.
Takeaway: o3’s autonomous web search makes it less likely to hand you stale SDK calls or older documentation.
Data analysis + forecasting
Finally, I wanted to put all the tools together into one test. I asked both models: “Chart how Canadian tourism to the U.S. is trending this year vs. last, then forecast to July 1.”
This combines search, image analysis, data analysis, python, and chart creation. o3’s agentic loop served it well again. It searched, found data, identified gaps, searched more, until it gave me a bar chart.
Initially, it only found data for January 2025, so it only plotted that. When I asked it for data on February and March, it reasoned a lot longer, ran multiple searches, found various data, and eventually computed an answer.
o3’s thought process
Gemini found numbers for January and March, but nothing for February, and since it doesn’t have that agentic loop, it didn’t explore further and try to estimate the numbers from other sources like o3 did.
The most impressive part though was when I asked both to forecast the numbers into summer. Gemini couldn’t find data and couldn’t make the forecast. o3 on the other hand did more research, looked at broader trends like the tariffs and border issues, school breaks, airline discount season, even the NBA finals, and made assumptions around how that would impact travel going into summer.
Takeaway: o3 feels like a junior quant who refuses to stop until every cell in the spreadsheet is filled (or at least justified). This search, reason, and data-analysis loop is invaluable for fields like investing, economics, finance, accounting, or anything to do with data.
Strengths, quirks, and when to reach for o3
Where it shines
Multi‑step STEM problems, data wrangling, and “find the blind spot” research.
Vision workflows that need both explanation and a marked‑up return image.
Rapid prototyping with APIs newer than the model’s cutoff.
Where it still lags
Creative long‑form prose: I still think Claude 3.7 is the better novelist but that’s personal preference.
Sheer response latency: the deliberative pass can stretch beyond a minute.
Token thrift: the reasoning trace costs compute; budget accordingly.
Final thoughts
I’d love to keep testing o3 for coding and see if it can replace Gemini 2.5 Pro, but I do think it is already stronger at research and reasoning. It’s the employee who keeps researching after everyone heads to lunch, circles details no one else spotted, and checks the changelog before committing code.
If your work involves any mix of data, code, images, or the open web (and whose work doesn’t) you’ll want that kind of persistence on tap. Today, that persistence is spelled o‑3.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
The Agent Development Kit (ADK) is a new open-source framework released by Google that simplifies the end-to-end development of intelligent agent systems.
Do we really need another agent framework? Probably not. But hey, Google’s been on a roll and Gemini 2.5 Pro is my new favourite model (we’ll see if this changes next month), so if they’re offering something that makes it easy to build complex agentic systems, I’m all ears.
In this mammoth guide, I’ll explore all that the Agent Development Kit has to offer, starting from its capabilities and primitives, all the way to building a complex multi-agent system with all the bells and whistles.
Key Features and Capabilities
ADK offers a rich set of features designed to address the entire agent development lifecycle:
Multi-Agent Architecture: create modular, scalable applications where different agents handle specific tasks, working in concert to achieve complex goals
Model Flexibility: use Gemini models directly, access models available via Vertex AI Model Garden, or leverage LiteLLM integration to work with models from providers like Anthropic, Meta, Mistral AI, and AI21 Labs.
Rich Tool Ecosystem: use pre-built tools (like Search and Code Execution), create custom tools, implement Model Context Protocol (MCP) tools, integrate third-party libraries (such as LangChain and LlamaIndex), or even use other agents as tools.
Built-in Streaming: native bidirectional audio and video streaming capabilities, enabling natural, human-like interactions beyond just text.
Flexible Orchestration: structured workflows using specialized workflow agents (Sequential, Parallel, Loop) for predictable execution patterns, and dynamic, LLM-driven routing for more adaptive behavior.
Integrated Developer Experience: powerful CLI and visual Web UI for local development, testing, and debugging.
Built-in Evaluation: systematically assess agent performance, evaluating both final response quality and step-by-step execution trajectories against predefined test cases.
Deployment Options: Agents built with ADK can be containerized and deployed anywhere, including integration with Google Cloud services for production environments.
The Architecture of ADK
At a high level, ADK’s architecture is designed around several key components that work together to create functional agent systems:
Core Components:
Agents: The central entities that make decisions and take actions. ADK supports various types of agents, including LLM-powered agents and workflow agents that orchestrate others.
Tools: Functions or capabilities that agents can use to perform specific actions, such as searching the web, executing code, or retrieving information from databases.
Runners: Components that manage the execution flow of agents, handling the orchestration of messages, events, and state management.
Sessions: Maintain the context and state of conversations, allowing agents to persist information across interactions.
Events: The communication mechanism between components in the system, representing steps in agent execution.
Architectural Patterns:
ADK is built around a flexible, event-driven architecture that enables:
Modular Design: Components can be combined and reconfigured to create different agent behaviors
Extensibility: The system can be extended with new tools, models, and agent types
Separation of Concerns: Clear boundaries between reasoning (agents), capabilities (tools), execution (runners), and state management (sessions)
This architecture allows developers to focus on defining what their agents should do, while ADK handles the complex orchestration of execution, communication, and state management.
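To make these pieces concrete, here’s a minimal sketch of how an agent, a session service, and a runner fit together. Treat it as an approximation: the Runner constructor and run() signature shown here are my reading of the ADK quickstart and may differ slightly between versions.
Python
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

# Agent: the reasoning component
assistant = Agent(
    name="assistant",
    model="gemini-2.0-flash-exp",
    instruction="Answer questions concisely."
)

# Session service: stores conversation state between turns
session_service = InMemorySessionService()
session = session_service.create_session(
    app_name="demo_app", user_id="user_1", session_id="session_1"
)

# Runner: orchestrates execution and yields Events as the agent works
runner = Runner(agent=assistant, app_name="demo_app", session_service=session_service)

for event in runner.run(
    user_id="user_1",
    session_id="session_1",
    new_message=types.Content(role="user", parts=[types.Part(text="Hello!")])
):
    if event.is_final_response():
        print(event.content.parts[0].text)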
Want to build your own AI agents?
Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.
Getting Started with ADK
Getting started with the Agent Development Kit is straightforward, requiring just a few steps to set up your development environment. ADK is designed to work with Python 3.9 or later, and it’s recommended to use a virtual environment to manage dependencies.
Basic Installation
To install ADK, you’ll need to have Python installed on your system. Then, you can use pip to install the package:
Bash
# Create a virtual environment (recommended)
python -m venv .venv

# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows (CMD):
.venv\Scripts\activate.bat
# On Windows (PowerShell):
.venv\Scripts\Activate.ps1

# Install ADK
pip install google-adk
This installs the core ADK package, which includes all the necessary components to build and run agents locally. You’ll need to add your GOOGLE_API_KEY in a .env file.
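For reference, a minimal .env file sitting next to your agent code could look like this (the value is a placeholder, not a real key):
Plaintext
# .env
GOOGLE_API_KEY=your-google-api-key-here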
Creating Your First Basic Agent
Let’s create a simple agent that can tell you the weather and time for a specific city. This example will demonstrate the basic structure of an ADK project.
Python
import datetime
from zoneinfo import ZoneInfo

from google.adk.agents import Agent


def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Args:
        city (str): The name of the city for which to retrieve the weather report.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of 25 degrees"
                " Celsius (77 degrees Fahrenheit)."
            ),
        }
    else:
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available.",
        }


def get_current_time(city: str) -> dict:
    """Returns the current time in a specified city.

    Args:
        city (str): The name of the city for which to retrieve the current time.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        tz_identifier = "America/New_York"
    else:
        return {
            "status": "error",
            "error_message": (
                f"Sorry, I don't have timezone information for {city}."
            ),
        }

    tz = ZoneInfo(tz_identifier)
    now = datetime.datetime.now(tz)
    report = (
        f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'
    )
    return {"status": "success", "report": report}


weather_time_agent = Agent(
    name="weather_time_agent",
    model="gemini-2.0-flash-exp",
    description=(
        "Agent to answer questions about the time and weather in a city."
    ),
    instruction=(
        "I can answer your questions about the time and weather in a city."
    ),
    tools=[get_weather, get_current_time],
)
Finally, add your API keys to the .env file. You can use Gemini directly, but if you want to use other models, like Anthropic or OpenAI, you’ll need to run ‘pip install litellm’ first.
Once that’s done, you can run the agent with ‘adk run’.
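To make that concrete, here’s roughly what the project layout and commands look like, based on the standard ADK quickstart; the folder name is just an example, and if I remember correctly the CLI expects the agent module to expose a root_agent variable:
Bash
# Example layout (names are illustrative)
# parent_folder/
#   weather_time_agent/
#     __init__.py   # contains: from . import agent
#     agent.py      # the code above, exposed as root_agent
#     .env          # GOOGLE_API_KEY=...

cd parent_folder
adk run weather_time_agent   # chat with the agent in the terminal
adk web                      # or launch the local dev UI in the browser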
Of course, this is a really basic agent and doesn’t need a framework. Let’s dive deeper into the core components of the ADK and build a more complex agent.
Building Agents: The Foundation
ADK provides several agent types to address different needs and use cases:
LLM Agent
The LlmAgent (often simply referred to as Agent) is the most commonly used agent type. It leverages a Large Language Model to understand user requests, make decisions, and generate responses. This is the “thinking” component of your application.
Python
from google.adk.agents import Agent  # This is actually an LlmAgent

my_agent = Agent(
    name="my_first_agent",
    model="gemini-2.0-flash-exp",
    description="A helpful assistant that answers general questions.",
    instruction="You are a friendly AI assistant. Be concise and helpful.",
    tools=[]  # Optional tools
)
The LlmAgent is non-deterministic – its behaviour depends on the LLM’s interpretation of instructions and context. It can use tools, transfer to other agents, or directly respond to users based on its reasoning.
Workflow Agents
Workflow agents provide deterministic orchestration for sub-agents. Unlike LLM agents, they follow predefined execution patterns:
SequentialAgent: Executes sub-agents one after another, in order:
Python
from google.adk.agents import Agent, SequentialAgent

step1 = Agent(name="data_collector", model="gemini-2.0-flash-exp")
step2 = Agent(name="data_analyzer", model="gemini-2.0-flash-exp")

pipeline = SequentialAgent(
    name="analysis_pipeline",
    sub_agents=[step1, step2]  # Will execute in this order
)
ParallelAgent: Executes sub-agents concurrently:
Python
from google.adk.agents import Agent, ParallelAgent

fetch_weather = Agent(name="weather_fetcher", model="gemini-2.0-flash-exp")
fetch_news = Agent(name="news_fetcher", model="gemini-2.0-flash-exp")

parallel_agent = ParallelAgent(
    name="information_gatherer",
    sub_agents=[fetch_weather, fetch_news]  # Will execute in parallel
)
LoopAgent: Repeatedly executes sub-agents until a condition is met:
Python
from google.adk.agents import Agent, LoopAgent

process_step = Agent(name="process_item", model="gemini-2.0-flash-exp")
check_condition = Agent(name="check_complete", model="gemini-2.0-flash-exp")

loop_agent = LoopAgent(
    name="processing_loop",
    sub_agents=[process_step, check_condition],
    max_iterations=5  # Optional maximum iterations
)
Custom Agents
For specialized needs, you can create custom agents by extending the BaseAgent class:
Python
from typing import AsyncGenerator

from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event


class MyCustomAgent(BaseAgent):
    name: str = "custom_agent"
    description: str = "A specialized agent with custom behavior"

    async def _run_async_impl(
        self, context: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        # Custom implementation logic here
        # You must yield at least one Event
        yield Event(author=self.name, content=...)
Custom agents are useful when you need deterministic behavior that doesn’t fit into the existing workflow agent patterns, or when you want to integrate with external systems in custom ways.
Configuring an Agent: Models, Instructions, Descriptions
The behaviour of an agent is largely determined by its configuration parameters:
Model Selection
The model parameter specifies which LLM powers your agent’s reasoning (for LlmAgent). This choice affects the agent’s capabilities, cost, and performance characteristics:
Python
# Using a Gemini model directly
agent = Agent(
    name="gemini_agent",
    model="gemini-2.0-flash-exp",  # Choose model variant based on needs
    # Other parameters...
)
Setting Instructions
The instruction parameter provides guidance to the agent on how it should behave. This is one of the most important parameters for shaping agent behaviour:
Python
agent = Agent(name="customer_support",model="gemini-2.0-flash-exp",instruction=""" You are a customer support agent for TechGadgets Inc. When helping customers: 1. Greet them politely and introduce yourself 2. Ask clarifying questions if the issue isn't clear 3. Provide step-by-step troubleshooting when appropriate 4. For billing issues, use the check_account_status tool 5. For technical problems, use the diagnostic_tool 6. Always end by asking if there's anything else you can help with Never share internal company information or promise specific refund amounts. """)
Best practices for effective instructions:
Be specific about the agent’s role and persona
Include clear guidelines for when and how to use available tools
Use formatting (headers, numbered lists) for readability
Provide examples of good and bad responses
Specify any constraints or boundaries
Defining Descriptions
The description parameter provides a concise summary of the agent’s purpose:
Python
agent = Agent(name="billing_specialist",description="Handles customer billing inquiries and invoice issues.",# Other parameters...)
While the description is optional for standalone agents, it becomes critical in multi-agent systems. Other agents use this description to determine when to delegate tasks to this agent. A good description should:
Clearly state the agent’s specific domain of expertise
Be concise (usually 1-2 sentences)
Differentiate the agent from others in the system
Setting Output Key
The optional output_key parameter allows an agent to automatically save its response to the session state:
Python
recommendation_agent = Agent(
    name="product_recommender",
    # Other parameters...
    output_key="product_recommendation"
)
This is particularly useful in multi-agent workflows, as it allows subsequent agents to access the output without additional code.
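As a quick illustration, here’s a sketch of a two-step flow where a second agent picks up the saved output; the agent names and instructions are my own example, and it follows the same state-in-instructions pattern used later in this guide:
Python
from google.adk.agents import Agent, SequentialAgent

recommendation_agent = Agent(
    name="product_recommender",
    model="gemini-2.0-flash-exp",
    instruction="Suggest one product that matches the user's request.",
    output_key="product_recommendation"  # response is saved to session state
)

email_writer = Agent(
    name="email_writer",
    model="gemini-2.0-flash-exp",
    instruction=(
        "Write a short follow-up email to the customer. "
        "The recommended product is stored in state['product_recommendation']; "
        "mention it by name."
    ),
)

# Running them in sequence lets the second agent build on the first one's output
recommendation_flow = SequentialAgent(
    name="recommendation_flow",
    sub_agents=[recommendation_agent, email_writer]
)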
Working with Multiple LLM Providers
One of ADK’s powerful features is its ability to work with different LLM providers through LiteLLM integration. This gives you flexibility to choose the right model for each agent in your system.
First, install the LiteLLM package: pip install litellm
Then, configure your API keys for the models you want to use:
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
# Add others as needed
Use the LiteLlm wrapper when defining your agent:
Python
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# Using OpenAI's GPT-4o
gpt_agent = Agent(
    name="gpt_agent",
    model=LiteLlm(model="openai/gpt-4o"),
    description="A GPT-powered agent",
    # Other parameters...
)

# Using Anthropic's Claude Sonnet
claude_agent = Agent(
    name="claude_agent",
    model=LiteLlm(model="anthropic/claude-3-sonnet-20240229"),
    description="A Claude-powered agent",
    # Other parameters...
)

# Using Mistral AI's model
mistral_agent = Agent(
    name="mistral_agent",
    model=LiteLlm(model="mistral/mistral-medium"),
    description="A Mistral-powered agent",
    # Other parameters...
)
This approach allows you to:
Match models to specific tasks based on their strengths
Build resilience by having alternatives if one provider has issues
Optimize for cost by using less expensive models for simpler tasks
In the next section, we’ll explore how to extend your agent’s capabilities using tools.
Tools: Extending Agent Capabilities
Tools extend an agent’s capabilities beyond the core language model’s reasoning abilities. While an LLM can generate text and make decisions, tools allow agents to take concrete actions in the world: fetching real-time data, performing calculations, calling external APIs, executing code, and more.
The agent’s language model decides when to use tools, with which parameters, and how to incorporate the results into its reasoning, but the tools themselves execute the agent’s intentions in predictable ways.
Creating Custom Function Tools
The most common way to create tools in ADK is by defining Python functions. These functions can then be passed to an agent, which will be able to call them when appropriate based on its reasoning.
Basic Tool Definition
Here’s a simple example of defining a function tool:
Python
def calculate_mortgage_payment(principal: float, annual_interest_rate: float, years: int) -> dict:
    """Calculates the monthly payment for a mortgage loan.

    Use this tool to determine monthly payments for a home loan based on
    principal amount, interest rate, and loan term.

    Args:
        principal: The initial loan amount in dollars.
        annual_interest_rate: The annual interest rate as a percentage (e.g., 5.5 for 5.5%).
        years: The loan term in years.

    Returns:
        dict: A dictionary containing the status ("success" or "error") and
        either the monthly payment or an error message.
    """
    try:
        # Convert annual interest rate to monthly decimal rate
        monthly_rate = (annual_interest_rate / 100) / 12

        # Calculate number of monthly payments
        num_payments = years * 12

        # Guard against invalid inputs
        if monthly_rate < 0 or principal <= 0 or num_payments <= 0:
            return {
                "status": "error",
                "error_message": "Principal and loan term must be positive, and the interest rate cannot be negative."
            }

        # Calculate monthly payment using the mortgage formula
        if monthly_rate == 0:
            monthly_payment = principal / num_payments
        else:
            monthly_payment = principal * (monthly_rate * (1 + monthly_rate) ** num_payments) / ((1 + monthly_rate) ** num_payments - 1)

        return {
            "status": "success",
            "monthly_payment": round(monthly_payment, 2),
            "total_payments": round(monthly_payment * num_payments, 2),
            "total_interest": round((monthly_payment * num_payments) - principal, 2)
        }
    except Exception as e:
        return {
            "status": "error",
            "error_message": f"Failed to calculate mortgage payment: {str(e)}"
        }


# Add this tool to an agent
from google.adk.agents import Agent

mortgage_advisor = Agent(
    name="mortgage_advisor",
    model="gemini-2.0-flash-exp",
    description="Helps calculate and explain mortgage payments.",
    instruction="You are a mortgage advisor that helps users understand their potential mortgage payments. When asked about payments, use the calculate_mortgage_payment tool.",
    tools=[calculate_mortgage_payment]  # Simply include the function in the tools list
)
Tool Context and State Management
For more advanced tools that need to access or modify the conversation state, ADK provides the ToolContext object. By adding this parameter to your function, you gain access to the session state and can influence the agent’s subsequent actions.
Accessing and Modifying State
Python
from google.adk.tools.tool_context import ToolContext


def update_user_preference(category: str, preference: str, tool_context: ToolContext) -> dict:
    """Updates a user's preference for a specific category.

    Args:
        category: The category for which to set a preference (e.g., "theme", "notifications").
        preference: The preference value to set.
        tool_context: Automatically provided by ADK, do not specify when calling.

    Returns:
        dict: Status of the preference update operation.
    """
    # Access current preferences or initialize if none exist
    user_prefs_key = "user:preferences"  # Using user: prefix makes this persistent across sessions
    preferences = tool_context.state.get(user_prefs_key, {})

    # Update the preferences
    preferences[category] = preference

    # Save back to state
    tool_context.state[user_prefs_key] = preferences

    print(f"Tool: Updated user preference '{category}' to '{preference}'")
    return {
        "status": "success",
        "message": f"Your {category} preference has been set to {preference}"
    }
Controlling Agent Flow
The ToolContext also allows tools to influence the agent’s execution flow through the actions attribute:
Python
import datetime

from google.adk.tools.tool_context import ToolContext


def escalate_to_support(issue_type: str, severity: int, tool_context: ToolContext) -> dict:
    """Escalates an issue to a human support agent.

    Args:
        issue_type: The type of issue being escalated.
        severity: The severity level (1-5, where 5 is most severe).
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the escalation.
    """
    # Record the escalation details in state
    tool_context.state["escalation_details"] = {
        "issue_type": issue_type,
        "severity": severity,
        "timestamp": datetime.datetime.now().isoformat()
    }

    # For high severity issues, transfer to the support agent
    if severity >= 4:
        tool_context.actions.transfer_to_agent = "human_support_agent"
        return {
            "status": "success",
            "message": "This is a high-severity issue. Transferring you to a human support specialist."
        }

    # For medium severity, just note it but don't transfer
    return {
        "status": "success",
        "message": f"Your {issue_type} issue has been logged with severity {severity}."
    }
Handling Tool Results
When an agent uses a tool, it needs to interpret the results correctly. This is why returning structured data with clear status indicators is important. Here’s how to guide your agent to handle tool results:
Python
weather_agent = Agent(
    name="weather_assistant",
    model="gemini-2.0-flash-exp",
    instruction="""
    You help users get weather information.

    When using the get_weather tool:
    1. Check the "status" field of the result.
    2. If status is "success", present the "report" information in a friendly way.
    3. If status is "error", apologize and share the "error_message" with the user.
    4. Always thank the user for their query.
    """,
    tools=[get_weather]
)
Built-in Tools and Integrations
ADK provides several built-in tools that you can use without having to implement them yourself:
Google Search
Python
from google.adk.tools import google_search

search_agent = Agent(
    name="research_assistant",
    model="gemini-2.0-flash-exp",
    instruction="You help users research topics. When asked, use the google_search tool to find up-to-date information.",
    tools=[google_search]
)
Code Execution
Python
from google.adk.tools import code_interpreter

coding_assistant = Agent(
    name="coding_assistant",
    model="gemini-2.0-flash-exp",
    instruction="You help users with coding tasks. When appropriate, use the code_interpreter to execute Python code and demonstrate solutions.",
    tools=[code_interpreter]
)
Retrieval-Augmented Generation (RAG)
Python
from google.adk.tools import rag_tool

# Configure RAG with your documents
my_rag_tool = rag_tool.configure(
    document_store="your-document-source",
    embedding_model="your-embedding-model"
)

documentation_assistant = Agent(
    name="docs_assistant",
    model="gemini-2.0-flash-exp",
    instruction="You help users find information in the company documentation. Use the RAG tool to retrieve relevant information.",
    tools=[my_rag_tool]
)
Third-Party Integrations
ADK supports integration with popular tools from other frameworks, such as LangChain and LlamaIndex.
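As an illustration, wrapping a LangChain community tool might look roughly like this; the LangchainTool import path is my assumption and worth double-checking against the current ADK release, and the Wikipedia tool requires the langchain-community and wikipedia packages:
Python
from google.adk.agents import Agent
from google.adk.tools.langchain_tool import LangchainTool  # assumed import path
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

# Wrap an existing LangChain tool so an ADK agent can call it
wikipedia_tool = LangchainTool(
    tool=WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
)

research_agent = Agent(
    name="wiki_researcher",
    model="gemini-2.0-flash-exp",
    instruction="Answer questions, using the Wikipedia tool when you need background facts.",
    tools=[wikipedia_tool]
)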
Beyond the built-in and third-party options, creating effective tools of your own is crucial for agent performance. Here are some expanded best practices:
1. Function Naming and Signature
Verb-Noun Names: Use descriptive names that clearly indicate action (e.g., fetch_stock_price is better than get_stock or simply stocks).
Parameter Naming: Use clear, self-documenting parameter names (city is better than c).
Default Values: Avoid setting default values for parameters. The LLM should decide all parameter values based on context.
Type Consistency: Ensure parameters have consistent types throughout your application.
2. Error Handling and Result Structure
Comprehensive Error Handling: Catch all possible exceptions within your tool.
Informative Error Messages: Return error messages that help both the agent and user understand what went wrong.
Consistent Result Structure: Use a consistent pattern across all tools:
Python
# Success case
return {"status": "success", "data": result_data}

# Error case
return {"status": "error", "error_message": "Detailed explanation of what went wrong"}
3. Documentation and Clarity
Rich Docstrings: Include comprehensive documentation explaining the tool’s purpose, parameters, return values, and usage guidelines.
Usage Examples: Consider including examples in the docstring for complex tools.
Logging: Add logging statements within tools to aid debugging.
4. Tool Design Principles
Single Responsibility: Each tool should do one thing well.
Granularity Balance: Not too specific, not too general; find the right level of abstraction.
Idempotent When Possible: Tools should be safe to call multiple times when appropriate.
Input Validation: Validate inputs early to prevent cascading errors.
5. Performance Considerations
Asynchronous Operations: For time-consuming operations, consider using async functions (see the sketch after this list).
Timeout Handling: Implement timeouts for external API calls.
Caching: Consider caching results for frequently used, unchanging data.
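For example, an async tool with a timeout could look like the sketch below. I’m assuming ADK accepts async functions as tools (worth verifying against the docs), and the actual API call is stubbed out:
Python
import asyncio


async def fetch_exchange_rate(base: str, target: str) -> dict:
    """Fetches the current exchange rate between two currencies."""
    try:
        # Wrap the slow external call in a timeout so the agent isn't left hanging
        rate = await asyncio.wait_for(_call_rates_api(base, target), timeout=5.0)
        return {"status": "success", "rate": rate}
    except asyncio.TimeoutError:
        return {"status": "error", "error_message": "The exchange-rate service timed out."}


async def _call_rates_api(base: str, target: str) -> float:
    # Placeholder for a real HTTP call (e.g., with httpx.AsyncClient)
    await asyncio.sleep(0.1)
    return 1.36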
Example of a Well-Designed Tool
Python
import datetime

from google.adk.tools.tool_context import ToolContext


def search_product_catalog(
    query: str,
    category: str = None,
    price_max: float = None,
    sort_by: str = None,
    tool_context: ToolContext = None
) -> dict:
    """Searches the product catalog for items matching the query and filters.

    Use this tool to find products in our inventory based on customer requests.

    Args:
        query: The search term entered by the customer (required).
        category: Optional category to filter results (e.g., "electronics", "clothing").
        price_max: Optional maximum price filter.
        sort_by: Optional sorting method ("price_low", "price_high", "popularity", "rating").
        tool_context: Automatically provided by ADK.

    Returns:
        dict: A dictionary containing:
        - "status": "success" or "error"
        - If success: "products" list of matching products (up to 5 items)
        - If error: "error_message" explaining what went wrong

    Example success:
        {"status": "success", "products": [{"name": "42-inch TV", "price": 299.99, ...}, ...]}
    Example error:
        {"status": "error", "error_message": "No products found matching 'flying car'"}
    """
    try:
        # Log the tool execution for debugging
        print(f"Tool: search_product_catalog called with query='{query}', category='{category}', price_max={price_max}")

        # Track the search in user history if tool_context is available
        if tool_context:
            search_history = tool_context.state.get("user:search_history", [])
            search_history.append({
                "query": query,
                "timestamp": datetime.datetime.now().isoformat()
            })
            # Keep only last 10 searches
            if len(search_history) > 10:
                search_history = search_history[-10:]
            tool_context.state["user:search_history"] = search_history

        # ... actual catalog search implementation ...
        # (For demo, we'll return mock data)
        mock_products = [
            {"name": "42-inch Smart TV", "price": 299.99, "category": "electronics", "rating": 4.5},
            {"name": "Wireless Headphones", "price": 89.99, "category": "electronics", "rating": 4.2},
        ]

        # Apply filters if provided
        filtered_products = mock_products
        if category:
            filtered_products = [p for p in filtered_products if p["category"].lower() == category.lower()]
        if price_max:
            filtered_products = [p for p in filtered_products if p["price"] <= price_max]

        # Apply sorting if requested
        if sort_by == "price_low":
            filtered_products = sorted(filtered_products, key=lambda p: p["price"])
        elif sort_by == "price_high":
            filtered_products = sorted(filtered_products, key=lambda p: p["price"], reverse=True)
        elif sort_by == "rating":
            filtered_products = sorted(filtered_products, key=lambda p: p["rating"], reverse=True)

        # Return formatted response
        if filtered_products:
            return {
                "status": "success",
                "products": filtered_products[:5],  # Limit to 5 results
                "total_matches": len(filtered_products)
            }
        else:
            return {
                "status": "error",
                "error_message": f"No products found matching '{query}' with the specified filters."
            }
    except Exception as e:
        print(f"Tool Error: search_product_catalog failed: {str(e)}")
        return {
            "status": "error",
            "error_message": f"Failed to search catalog: {str(e)}"
        }
Tools are the primary way to extend your agents’ capabilities beyond just language generation. You can now create agents that interact effectively with the world and provide genuinely useful services to users.
State and Memory: Creating Context-Aware Agents
In ADK, “state” refers to the persistent data associated with a conversation that allows agents to remember information across multiple interactions. Unlike the conversation history (which records the sequence of messages), state is a structured key-value store that agents can read from and write to, enabling them to track user preferences, remember previous decisions, maintain contextual information, and build personalized experiences.
The Role of Session State
Session state serves several critical functions in agent applications:
Contextual Memory: Allows agents to remember information from earlier in the conversation
Preference Storage: Maintains user preferences across interactions
Workflow Tracking: Keeps track of where users are in multi-step processes
Data Persistence: Stores data that needs to be accessible between different agents or across multiple turns
Configuration Management: Maintains settings that affect agent behavior
State Structure and Scope
ADK’s state management system is designed with different scopes to address various persistence needs:
Plaintext
session.state = {
    # Session-specific state (default scope)
    "last_query": "What's the weather in London?",
    "current_step": 3,

    # User-specific state (persists across sessions)
    "user:preferred_temperature_unit": "Celsius",
    "user:name": "Alex",

    # Application-wide state (shared across all users)
    "app:version": "1.2.3",
    "app:maintenance_mode": False,

    # Temporary state (not persisted beyond current execution)
    "temp:calculation_result": 42
}
The prefixes determine the scope:
No prefix: Session-specific, persists only for the current session
user:: User-specific, persists across all sessions for a particular user
app:: Application-wide, shared across all users and sessions
temp:: Temporary, exists only during the current execution cycle
Implementing Memory with State Management
Let’s explore how to implement memory capabilities using session state:
Basic State Access
The most straightforward way to access state is through the session object:
Python
# Getting a session
from google.adk.sessions import InMemorySessionService

session_service = InMemorySessionService()

APP_NAME = "my_application"
USER_ID = "user_123"
SESSION_ID = "session_456"

# Create or retrieve a session
session = session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)

# Reading from state
last_city = session.state.get("last_city", "New York")  # Default if key doesn't exist

# Writing to state
session.state["last_city"] = "London"
However, in real agent applications, you’ll often access state through more integrated methods.
Accessing State in Tools
Tools can access and modify state through the ToolContext parameter:
Python
import datetime

from google.adk.tools.tool_context import ToolContext


def remember_favorite_city(city: str, tool_context: ToolContext) -> dict:
    """Remembers the user's favorite city.

    Args:
        city: The city to remember as favorite.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    # Store at user scope so it persists across sessions
    tool_context.state["user:favorite_city"] = city

    # Also store when this preference was set
    tool_context.state["user:favorite_city_set_at"] = datetime.datetime.now().isoformat()

    return {
        "status": "success",
        "message": f"I've remembered that your favorite city is {city}."
    }
Using output_key for Automatic State Updates
The output_key parameter of Agent provides a convenient way to automatically save an agent’s response to state:
Python
weather_reporter = Agent(
    name="weather_reporter",
    model="gemini-2.0-flash-exp",
    instruction="You provide weather reports for cities. Be concise but informative.",
    tools=[get_weather],
    output_key="last_weather_report"  # Automatically saves response to this state key
)
When the agent responds, its final text output will be stored in session.state["last_weather_report"] automatically.
State in Agent Instructions
To make agents state-aware, include instructions on how to use state:
Python
personalized_agent = Agent(
    name="personalized_assistant",
    model="gemini-2.0-flash-exp",
    instruction="""
    You are a personalized assistant.

    CHECK THESE STATE VALUES AT THE START OF EACH INTERACTION:
    - If state["user:name"] exists, greet the user by name.
    - If state["user:favorite_city"] exists, personalize weather or travel recommendations.
    - If state["current_workflow"] exists, continue that workflow where you left off.

    MAINTAIN THESE STATE VALUES:
    - When the user mentions their name, use the remember_name tool to store it.
    - When discussing a city positively, use the remember_favorite_city tool.
    - When starting a multi-step workflow, set state["current_workflow"] and state["current_step"].
    """
)
Persisting Information Across Conversation Turns
To create truly context-aware agents, you need to implement patterns that effectively use state across conversation turns.
Pattern 1: Preference Tracking
This pattern stores user preferences discovered through conversation:
Python
def set_preference(category: str, value: str, tool_context: ToolContext) -> dict:
    """Stores a user preference.

    Args:
        category: The preference category (e.g., "language", "theme").
        value: The preference value.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    preferences = tool_context.state.get("user:preferences", {})
    preferences[category] = value
    tool_context.state["user:preferences"] = preferences
    return {"status": "success", "message": f"Preference set: {category} = {value}"}


def get_preferences(tool_context: ToolContext) -> dict:
    """Retrieves all user preferences.

    Args:
        tool_context: Automatically provided by ADK.

    Returns:
        dict: The user's stored preferences.
    """
    preferences = tool_context.state.get("user:preferences", {})
    return {"status": "success", "preferences": preferences}


preference_agent = Agent(
    name="preference_aware_agent",
    model="gemini-2.0-flash-exp",
    instruction="""
    You help users and remember their preferences.

    At the start of each conversation:
    1. Use the get_preferences tool to check stored preferences.
    2. Adapt your responses based on these preferences.

    During conversations:
    1. When a user expresses a preference, use set_preference to store it.
    2. Acknowledge when you've saved a preference.

    Examples of preferences to track:
    - Language preferences
    - Communication style (brief/detailed)
    - Topic interests
    """,
    tools=[set_preference, get_preferences]
)
Pattern 2: Workflow State Tracking
This pattern manages progress through multi-step processes:
Python
def start_workflow(workflow_name: str, tool_context: ToolContext) -> dict:
    """Starts a new workflow and tracks it in state.

    Args:
        workflow_name: The name of the workflow to start.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status and the initial workflow state.
    """
    workflow = {
        "name": workflow_name,
        "current_step": 1,
        "started_at": datetime.datetime.now().isoformat(),
        "data": {}
    }
    tool_context.state["current_workflow"] = workflow
    return {"status": "success", "workflow": workflow}


def update_workflow_step(step: int, data: dict, tool_context: ToolContext) -> dict:
    """Updates the current workflow step and associated data.

    Args:
        step: The new step number.
        data: Data to associate with this step.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status and the updated workflow state.
    """
    workflow = tool_context.state.get("current_workflow", {})
    if not workflow:
        return {"status": "error", "message": "No active workflow found."}

    workflow["current_step"] = step
    workflow["last_updated"] = datetime.datetime.now().isoformat()
    workflow["data"].update(data)
    tool_context.state["current_workflow"] = workflow
    return {"status": "success", "workflow": workflow}


workflow_agent = Agent(
    name="workflow_agent",
    model="gemini-2.0-flash-exp",
    instruction="""
    You guide users through structured workflows.

    At the start of each interaction:
    1. Check if state["current_workflow"] exists.
    2. If it exists, continue from the current_step.
    3. If not, determine if the user wants to start a workflow.

    Available workflows:
    - "account_setup": A 3-step process to set up a new account
    - "support_request": A 4-step process to file a support ticket

    Use start_workflow and update_workflow_step to track progress.
    """,
    tools=[start_workflow, update_workflow_step]
)
Pattern 3: Conversation History Summarization
This pattern maintains condensed summaries of conversation context:
Python
def update_conversation_summary(new_insight: str, tool_context: ToolContext) -> dict:
    """Updates the running summary of the conversation with a new insight.

    Args:
        new_insight: New information to add to the summary.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status and the updated summary.
    """
    summary = tool_context.state.get("conversation_summary", "")
    if summary:
        summary += "\n- " + new_insight
    else:
        summary = "Conversation Summary:\n- " + new_insight

    tool_context.state["conversation_summary"] = summary
    return {"status": "success", "summary": summary}


summarizing_agent = Agent(
    name="summarizing_agent",
    model="gemini-2.0-flash-exp",
    instruction="""
    You help users while maintaining a summary of key points.

    At the start of each interaction:
    1. Check state["conversation_summary"] to recall context.

    During conversations:
    1. When you learn important information (preferences, goals, constraints), use update_conversation_summary to store it.
    2. Focus on facts and insights, not general chat.

    Keep your internal summary up-to-date to provide consistent, contextual help.
    """,
    tools=[update_conversation_summary]
)
Personalizing Responses with State
By effectively using state, you can create deeply personalized agent experiences. Here’s an example of a comprehensive personalization approach:
Python
from google.adk.agents import Agent, SequentialAgent
from google.adk.tools.tool_context import ToolContext

# --- Tools for personalization ---

def get_user_profile(tool_context: ToolContext) -> dict:
    """Retrieves the user's stored profile information.

    Args:
        tool_context: Automatically provided by ADK.

    Returns:
        dict: The user's profile data.
    """
    profile = tool_context.state.get("user:profile", {})
    return {
        "status": "success",
        "profile": profile,
        "is_returning_user": bool(profile)
    }


def update_user_profile(field: str, value: str, tool_context: ToolContext) -> dict:
    """Updates a specific field in the user's profile.

    Args:
        field: The profile field to update (e.g., "name", "occupation").
        value: The value to store.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    profile = tool_context.state.get("user:profile", {})
    profile[field] = value
    tool_context.state["user:profile"] = profile
    return {"status": "success", "field": field, "value": value}


def log_user_interest(topic: str, score: float, tool_context: ToolContext) -> dict:
    """Records a user's interest in a topic with a relevance score.

    Args:
        topic: The topic of interest.
        score: Relevance score (0.0-1.0, higher means more interested).
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    interests = tool_context.state.get("user:interests", {})
    interests[topic] = max(interests.get(topic, 0), score)  # Take highest score
    tool_context.state["user:interests"] = interests
    return {"status": "success", "topic": topic, "score": score}


def get_personalization_strategy(tool_context: ToolContext) -> dict:
    """Analyzes user data and returns a personalization strategy.

    Args:
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Personalization recommendations based on user data.
    """
    profile = tool_context.state.get("user:profile", {})
    interests = tool_context.state.get("user:interests", {})
    interaction_count = tool_context.state.get("user:interaction_count", 0)

    # Increment interaction count
    tool_context.state["user:interaction_count"] = interaction_count + 1

    # Determine name usage style
    name_style = "formal"
    if interaction_count > 5 and "name" in profile:
        name_style = "casual"

    # Identify top interests
    top_interests = sorted(
        [(topic, score) for topic, score in interests.items()],
        key=lambda x: x[1],
        reverse=True
    )[:3]

    return {
        "status": "success",
        "strategy": {
            "name_usage": {
                "style": name_style,
                "name": profile.get("name", ""),
                "use_name": "name" in profile
            },
            "experience_level": "new" if interaction_count < 3 else "returning",
            "top_interests": top_interests,
            "verbosity": profile.get("preferred_verbosity", "balanced")
        }
    }


# --- Creating a personalized agent ---

personalization_agent = Agent(
    name="profile_manager",
    model="gemini-2.0-flash-exp",
    instruction="""
    You manage user profile information and personalization strategy.
    Your job is to extract and store relevant user information, then provide
    personalization guidance to other agents.

    YOU MUST:
    1. Use get_user_profile at the start of conversation to check existing data.
    2. During conversation, identify personal details and preferences.
    3. Use update_user_profile to store name, age, occupation, etc.
    4. Use log_user_interest when the user shows interest in topics.
    5. Use get_personalization_strategy to generate guidance for personalization.

    Do not explicitly tell the user you are storing this information.
    """,
    tools=[get_user_profile, update_user_profile, log_user_interest, get_personalization_strategy],
    output_key="personalization_strategy"
)

response_agent = Agent(
    name="personalized_responder",
    model="gemini-2.0-flash-exp",
    instruction="""
    You provide personalized responses based on the personalization strategy.

    At the beginning of each interaction:
    1. Check state["personalization_strategy"] for guidance on personalization.
    2. Adapt your tone, detail level, and content based on this strategy.

    Personalization Elements:
    1. If strategy says to use name, address the user by name per the specified style.
    2. Adapt verbosity based on preference.
    3. Reference top interests when relevant.
    4. Provide more explanation for new users, be more direct with returning users.

    Always keep your personalization subtle and natural, never explicit.
    """,
)

# Combine as a sequential workflow
personalized_assistant = SequentialAgent(
    name="personalized_assistant",
    sub_agents=[personalization_agent, response_agent]
)
This approach uses multiple state-related techniques:
Profile Storage: Maintains persistent user information
Interest Tracking: Records and scores user interests
Interaction Counting: Tracks user familiarity with the system
Personalization Strategy: Generates a comprehensive approach to personalization
Sequential Agent Pattern: First agent focuses on updating state, second agent uses it for personalization
Advanced State Management
For production applications, you’ll likely need more sophisticated state management approaches.
Custom Session Services
The InMemorySessionService is suitable for development, but for production, you’ll want persistent storage. Create a custom session service by extending the SessionService abstract class:
Python
from typing import Optional, Dict, Any

import firebase_admin
from firebase_admin import firestore

from google.adk.sessions import SessionService, Session


class FirestoreSessionService(SessionService):
    """A session service that persists state in Firestore."""

    def __init__(self, collection_name: str = "adk_sessions"):
        """Initialize with a Firestore collection name."""
        self.collection_name = collection_name
        if not firebase_admin._apps:
            firebase_admin.initialize_app()
        self.db = firestore.client()

    def create_session(
        self,
        app_name: str,
        user_id: str,
        session_id: str,
        state: Optional[Dict[str, Any]] = None
    ) -> Session:
        """Create a new session or get existing session."""
        session_ref = self._get_session_ref(app_name, user_id, session_id)
        doc = session_ref.get()

        if doc.exists:
            # Session exists, retrieve it
            session_data = doc.to_dict()
            return Session(
                app_name=app_name,
                user_id=user_id,
                session_id=session_id,
                state=session_data.get("state", {}),
                last_update_time=session_data.get("last_update_time", 0)
            )
        else:
            # Create new session
            session = Session(
                app_name=app_name,
                user_id=user_id,
                session_id=session_id,
                state=state or {}
            )
            self._save_session(session)
            return session

    def get_session(
        self,
        app_name: str,
        user_id: str,
        session_id: str
    ) -> Optional[Session]:
        """Get an existing session."""
        session_ref = self._get_session_ref(app_name, user_id, session_id)
        doc = session_ref.get()

        if not doc.exists:
            return None

        session_data = doc.to_dict()
        return Session(
            app_name=app_name,
            user_id=user_id,
            session_id=session_id,
            state=session_data.get("state", {}),
            last_update_time=session_data.get("last_update_time", 0)
        )

    def update_session(self, session: Session) -> None:
        """Update a session in the database."""
        self._save_session(session)

    def _get_session_ref(self, app_name: str, user_id: str, session_id: str):
        """Get a reference to the session document."""
        return self.db.collection(self.collection_name).document(
            f"{app_name}_{user_id}_{session_id}"
        )

    def _save_session(self, session: Session) -> None:
        """Save a session to Firestore."""
        session_ref = self._get_session_ref(
            session.app_name, session.user_id, session.session_id
        )
        session_ref.set({
            "state": session.state,
            "last_update_time": session.last_update_time
        })
By implementing state management, you can now create agents with memory, context awareness, and personalization capabilities that significantly enhance the user experience.
Building Multi-Agent Systems
Multi-agent systems (MAS) in ADK are typically organized in hierarchical structures, where agents can have parent-child relationships. This hierarchical organization provides a clear framework for delegation, specialization, and coordination among agents.
Creating an Agent Hierarchy
The foundation of agent hierarchies in ADK is the sub_agents parameter. When you create an agent, you can specify other agents as its sub-agents:
Python
from google.adk.agents import Agent

# Create specialized sub-agents
weather_specialist = Agent(
    name="weather_specialist",
    model="gemini-2.0-flash-exp",
    description="Provides detailed weather information for any location.",
    instruction="You are a weather specialist. Provide accurate, detailed weather information when asked.",
    tools=[get_weather]  # Assume get_weather is defined
)

restaurant_specialist = Agent(
    name="restaurant_specialist",
    model="gemini-2.0-flash-exp",
    description="Recommends restaurants based on location, cuisine, and preferences.",
    instruction="You are a restaurant specialist. Recommend restaurants based on user preferences.",
    tools=[find_restaurants]  # Assume find_restaurants is defined
)

# Create a parent agent with sub-agents
coordinator = Agent(
    name="travel_assistant",
    model="gemini-2.0-flash-exp",
    description="Helps plan trips and activities.",
    instruction="""
    You are a travel assistant that helps users plan trips and activities.

    You have two specialized sub-agents:
    - weather_specialist: For weather-related questions
    - restaurant_specialist: For restaurant recommendations

    When a user asks about weather, delegate to the weather_specialist.
    When a user asks about restaurants or food, delegate to the restaurant_specialist.
    For general travel questions, handle them yourself.
    """,
    sub_agents=[weather_specialist, restaurant_specialist]
)
In this example, coordinator is the parent agent, and weather_specialist and restaurant_specialist are its sub-agents. ADK automatically establishes the parent-child relationship by setting the parent_agent attribute on each sub-agent.
Understanding the Hierarchy Rules
The agent hierarchy in ADK follows several important rules:
Single Parent Rule: An agent can have only one parent. If you try to add an agent as a sub-agent to multiple parents, ADK will raise an error.
Name Uniqueness: Each agent in the hierarchy must have a unique name. This is crucial because delegation and finding agents rely on these names.
Hierarchical Navigation: You can navigate the hierarchy programmatically (a short sketch follows this list):
agent.parent_agent: Access an agent’s parent
agent.sub_agents: Access an agent’s children
root_agent.find_agent(name): Find any agent in the hierarchy by name
Scope of Control: The hierarchy defines the scope for potential agent transfers. By default, an agent can transfer control to its parent, its siblings (other sub-agents of its parent), or its own sub-agents.
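Here’s a small sketch of that navigation using the travel_assistant hierarchy from the earlier example (the printed values are what I’d expect, not something I’ve verified):
Python
# Navigating the hierarchy built earlier (coordinator -> specialists)
found = coordinator.find_agent("restaurant_specialist")
print(found.name)               # "restaurant_specialist"
print(found.parent_agent.name)  # "travel_assistant"
print([a.name for a in coordinator.sub_agents])
# ["weather_specialist", "restaurant_specialist"]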
Agent-to-Agent Delegation and Communication
The power of multi-agent systems comes from the ability of agents to collaborate and delegate tasks to each other. ADK provides several mechanisms for agent-to-agent communication and delegation.
LLM-Driven Delegation (Auto-Flow)
The most flexible approach is LLM-driven delegation, where the agent’s language model decides when to transfer control to another agent based on its understanding of the query and the available agents’ capabilities:
Python
# LLM-driven delegation relies on clear agent descriptions
customer_service = Agent(
    name="customer_service",
    model="gemini-2.0-flash-exp",
    description="Handles general customer inquiries and routes to specialists.",
    instruction="""
    You are the main customer service agent.

    Analyze each customer query and determine the best way to handle it:
    - For billing questions, transfer to the billing_specialist
    - For technical issues, transfer to the tech_support
    - For product questions, handle yourself

    Make your delegation decisions based on the query content.
    """,
    sub_agents=[
        Agent(
            name="billing_specialist",
            model="gemini-2.0-flash-exp",
            description="Handles all billing, payment, and invoice inquiries."
        ),
        Agent(
            name="tech_support",
            model="gemini-2.0-flash-exp",
            description="Resolves technical issues and troubleshooting problems."
        )
    ]
)
When a user sends a message like “I have a problem with my last bill,” the LLM in customer_service recognizes this as a billing question and automatically generates a transfer request to the billing_specialist agent. This is handled through ADK’s Auto-Flow mechanism, which is enabled by default when sub-agents are present.
The key elements for successful LLM-driven delegation are:
Clear, distinctive descriptions for each agent
Explicit instructions to the parent agent about when to delegate
Appropriate model capabilities in the parent agent to understand and classify queries
Explicit Agent Invocation with AgentTool
For more controlled delegation, you can wrap an agent as a tool and explicitly invoke it from another agent:
Python
from google.adk.agents import Agentfrom google.adk.tools import AgentTool# Create a specialized agentcalculator_agent = Agent(name="calculator",model="gemini-2.0-flash-exp",description="Performs complex mathematical calculations.",instruction="You perform mathematical calculations with precision.")# Wrap it as a toolcalculator_tool = AgentTool(agent=calculator_agent,description="Use this tool to perform complex calculations.")# Create a parent agent that uses the agent toolmath_tutor = Agent(name="math_tutor",model="gemini-2.0-flash-exp",description="Helps students learn mathematics.",instruction=""" You are a math tutor helping students learn. When a student asks a question requiring complex calculations: 1. Explain the mathematical concept 2. Use the calculator tool to compute the result 3. Explain the significance of the result """,tools=[calculator_tool])
With this approach:
The parent agent (math_tutor) decides when to use the calculator tool based on its instructions
When invoked, the tool executes the wrapped agent (calculator_agent)
The result is returned to the parent agent, which can then incorporate it into its response
State changes made by the sub-agent are preserved in the shared session
This approach gives you more explicit control over when and how sub-agents are invoked.
Using Shared Session State for Communication
Agents can also communicate through shared session state:
Python
from google.adk.agents import Agent, SequentialAgent# First agent gathers information and stores it in stateinformation_gatherer = Agent(name="information_gatherer",model="gemini-2.0-flash-exp",instruction="Gather travel information from the user and store it in state.",tools=[# Tool to save travel details to state save_travel_details # Assume this is defined and writes to state ],output_key="information_gathering_complete"# Saves final response to state)# Second agent uses information from staterecommendation_generator = Agent(name="recommendation_generator",model="gemini-2.0-flash-exp",instruction=""" Generate travel recommendations based on information in state. Look for: - destination in state["travel_destination"] - dates in state["travel_dates"] - preferences in state["travel_preferences"] """,tools=[# Tool to retrieve recommendations based on state information get_recommendations # Assume this is defined and reads from state ])# Sequential agent ensures these run in ordertravel_planner = SequentialAgent(name="travel_planner",sub_agents=[information_gatherer, recommendation_generator])
In this example:
information_gatherer collects information and stores it in the session state
recommendation_generator reads this information from state and uses it to generate recommendations
The SequentialAgent ensures they run in the correct order
This pattern is particularly useful for workflows where information needs to be collected, processed, and then used by subsequent agents.
Workflow Patterns: Sequential, Parallel, Loop
ADK provides specialized workflow agents that orchestrate the execution of sub-agents according to different patterns.
Sequential Workflow
The SequentialAgent executes its sub-agents one after another in a defined order:
data_validator runs first and validates the input data
data_transformer runs next, potentially using the validation result
data_analyzer analyzes the transformed data
report_generator creates a final report based on the analysis
Each agent’s output can be saved to state (using output_key) for the next agent to use. The same InvocationContext is passed sequentially from one agent to the next, ensuring state changes persist throughout the workflow.
Parallel Workflow
The ParallelAgent executes its sub-agents concurrently, which can improve efficiency for independent tasks:
In this example, all three fetchers run concurrently. Each operates in its own branch of the invocation context (ParentBranch.ChildName), but they share the same session state. This means they can all write to state without conflicts (as long as they use different keys).
Parallel execution is particularly useful for:
Reducing total processing time for independent tasks
Gathering information from different sources simultaneously
Implementing competing approaches to the same problem
Loop Workflow
The LoopAgent repeatedly executes its sub-agents until a condition is met:
Python
from google.adk.agents import LoopAgent, Agent, BaseAgentfrom google.adk.agents.invocation_context import InvocationContextfrom google.adk.events import Event, EventActionsfrom typing import AsyncGenerator# Custom agent that checks if the loop should continueclassConditionChecker(BaseAgent): name: str="condition_checker"asyncdef_run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:# Check if the condition for stopping the loop is met completed = context.session.state.get("task_completed", False) max_iterations = context.session.state.get("max_iterations", 5) current_iteration = context.session.state.get("current_iteration", 0)# Increment iteration counter context.session.state["current_iteration"] = current_iteration +1# If task is completed or max iterations reached, escalate to stop the loopif completed or current_iteration >= max_iterations:yield Event(author=self.name,actions=EventActions(escalate=True) # This signals loop termination )else:yield Event(author=self.name,content=None# No content needed, just continuing the loop )# Create task processor agenttask_processor = Agent(name="task_processor",model="gemini-2.0-flash-exp",instruction=""" Process the current task step. Check state["current_iteration"] to see which step you're on. When the task is complete, set state["task_completed"] = True. """,tools=[# Tool to process the current step process_step, # Assume this is defined# Tool to mark the task as completed mark_completed # Assume this is defined ])# Create loop agent that combines processing and condition checkingiterative_processor = LoopAgent(name="iterative_processor",sub_agents=[ task_processor, ConditionChecker() ],max_iterations=10# Optional backup limit)
In this example:
iterative_processor repeatedly executes its sub-agents
Each iteration runs task_processor followed by ConditionChecker
The loop continues until ConditionChecker escalates (when the task is completed or max iterations reached)
State is maintained across iterations, allowing tracking of progress
Loop agents are ideal for:
Incremental processing of large datasets
Implementing retry logic with backoff
Iterative refinement of results
Multi-step workflows where the number of steps isn’t known in advance
Designing Effective Agent Teams
Creating effective multi-agent systems requires thoughtful design. Here are key principles and patterns for building successful agent teams:
Principle 1: Clear Agent Specialization
Each agent in the system should have a clearly defined area of expertise:
Python
# Financial advisory team with clear specializationsmortgage_specialist = Agent(name="mortgage_specialist",description="Expert on mortgage products, rates, and qualification requirements.",# Other parameters...)investment_specialist = Agent(name="investment_specialist",description="Expert on investment strategies, market trends, and portfolio management.",# Other parameters...)tax_specialist = Agent(name="tax_specialist",description="Expert on tax planning, deductions, and regulatory compliance.",# Other parameters...)
The specializations should be:
Non-overlapping to avoid confusion in delegation decisions
Comprehensive to cover all expected user queries
Clearly communicated in agent descriptions and instructions
Principle 2: Effective Coordination Strategies
There are multiple strategies for coordinating agents. Choose the approach that best fits your application’s needs:
Centralized Coordination (Hub and Spoke)
Python
# Hub agent coordinates specialistsfinancial_advisor = Agent(name="financial_advisor",description="Coordinates financial advice across multiple domains.",instruction=""" You are the main financial advisor. For mortgage questions, delegate to mortgage_specialist. For investment questions, delegate to investment_specialist. For tax questions, delegate to tax_specialist. Only handle general financial questions yourself. """,sub_agents=[mortgage_specialist, investment_specialist, tax_specialist])
Develop a clear strategy for how agents share information through state:
Python
# First agent gathers informationdata_collector = Agent(name="data_collector",instruction=""" Collect information from the user. Store each piece in the appropriate state key: - Personal details in state["user_details"] - Goals in state["financial_goals"] - Current situation in state["current_situation"] """,tools=[save_to_state], # Assume this tool saves data to specific state keysoutput_key="collection_complete")# Specialist agents use collected informationretirement_planner = Agent(name="retirement_planner",instruction=""" Create a retirement plan based on information in state. Use state["user_details"] for age and income information. Use state["financial_goals"] for retirement targets. Store your plan in state["retirement_plan"]. """,tools=[create_retirement_plan], # Assume this tool creates and saves a planoutput_key="retirement_planning_complete")
Consider:
Which state keys each agent will read from and write to
How to structure state data for easy access by multiple agents
Whether to use scoped state (session, user, app) based on persistence needs
Principle 4: Error Handling and Fallbacks
Design your agent team to handle failures gracefully:
Python
from google.adk.agents import Agent, SequentialAgentfrom google.adk.tools.tool_context import ToolContext# Tool to check if the previous agent encountered an errordefcheck_previous_result(tool_context: ToolContext) -> dict:"""Checks if the previous agent step was successful. Returns: dict: Status and whether a fallback is needed. """ error_detected = tool_context.state.get("error_detected", False)return {"status": "success","fallback_needed": error_detected,"error_details": tool_context.state.get("error_details", "Unknown error") }# Tool to handle error recoverydefrecover_from_error(error_details: str, tool_context: ToolContext) -> dict:"""Attempts to recover from an error. Args: error_details: Details about the error that occurred. Returns: dict: Status of recovery attempt. """# Record the recovery attempt tool_context.state["recovery_attempted"] =True# Clear the error flag tool_context.state["error_detected"] =Falsereturn {"status": "success","message": f"Recovered from error: {error_details}" }# Primary agent that might encounter errorsprimary_handler = Agent(name="primary_handler",model="gemini-2.0-flash-exp",instruction=""" You handle the primary task. If you encounter an error, set state["error_detected"] = True and state["error_details"] = "description of error". """,tools=[process_task, set_error_state] # Assume these are defined)# Fallback agent for error recoveryfallback_handler = Agent(name="fallback_handler",model="gemini-2.0-flash-exp",instruction=""" You handle error recovery when the primary agent fails. First, use check_previous_result to see if you need to act. If fallback is needed, use recover_from_error to attempt recovery. Provide a simplified but functional response to the user. """,tools=[check_previous_result, recover_from_error])# Combine with sequential flowrobust_handler = SequentialAgent(name="robust_handler",sub_agents=[primary_handler, fallback_handler])
This pattern ensures that even if the primary agent encounters an error, the fallback agent can provide a degraded but functional response.
Principle 5: Monitoring and Debugging
Design your agent team with observability in mind:
Python
from google.adk.tools.tool_context import ToolContextdeflog_agent_action(action: str, details: str, tool_context: ToolContext) -> dict:"""Logs an agent action to the trace log in state. Args: action: The type of action being logged. details: Details about the action. Returns: dict: Status of the logging operation. """# Get existing log or initialize new one trace_log = tool_context.state.get("agent_trace_log", [])# Add new entry with timestampimport time trace_log.append({"timestamp": time.time(),"agent": tool_context.agent_name,"action": action,"details": details })# Update state with new log tool_context.state["agent_trace_log"] = trace_logreturn {"status": "success" }# Add this tool to all agents in your system for comprehensive tracing
By following these principles and patterns, you can design effective agent teams that leverage specialization, coordination, shared state, and robust error handling to deliver complex capabilities.
In the next section, we’ll explore advanced features of ADK, including callbacks for implementing safety guardrails and other sophisticated control mechanisms.
Advanced Features and Patterns
Implementing Safety Guardrails with Callbacks
Callbacks are powerful hooks that allow you to intercept and potentially modify agent behavior at key points in the execution flow. They’re particularly valuable for implementing safety guardrails, logging, monitoring, and custom business logic.
ADK provides several callback points, but two of the most important are:
before_model_callback: Executes just before sending a request to the LLM
before_tool_callback: Executes just before a tool is called
Input Validation with before_model_callback
The before_model_callback lets you inspect and potentially block user inputs before they reach the language model:
Python
from google.adk.agents.callback_context import CallbackContextfrom google.adk.models.llm_request import LlmRequestfrom google.adk.models.llm_response import LlmResponsefrom google.genai import typesfrom typing import Optionalimport redefprofanity_filter( callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:""" Checks user input for profanity and blocks requests containing prohibited language. Args: callback_context: Provides context about the agent and session llm_request: The request about to be sent to the LLM Returns: LlmResponse if the request should be blocked, None if it should proceed """# Simple profanity detection (in a real system, use a more sophisticated approach) prohibited_terms = ["badword1", "badword2", "badword3"]# Extract the last user message last_user_message =""if llm_request.contents:for content inreversed(llm_request.contents):if content.role =='user'and content.parts:if content.parts[0].text: last_user_message = content.parts[0].textbreak# Check for prohibited terms contains_profanity =any(term in last_user_message.lower() for term in prohibited_terms)if contains_profanity:# Log the blocking actionprint(f"Profanity filter blocked message: '{last_user_message[:20]}...'")# Record the event in state callback_context.state["profanity_filter_triggered"] =True# Return a response that will be sent instead of calling the LLMreturn LlmResponse(content=types.Content(role="model",parts=[types.Part(text="I'm sorry, but I cannot respond to messages containing inappropriate language. Please rephrase your request without using prohibited terms.")] ) )# If no profanity detected, return None to allow the request to proceedreturnNone# Add the callback to an agentsafe_agent = Agent(name="safe_agent",model="gemini-2.0-flash-exp",instruction="You are a helpful assistant.",before_model_callback=profanity_filter)
This example implements a simple profanity filter that:
Extracts the most recent user message from the LLM request
Checks it against a list of prohibited terms
If prohibited terms are found, blocks the LLM call and returns a predefined response
Otherwise, allows the request to proceed to the LLM
You can extend this pattern to implement more sophisticated content moderation, sensitive information detection, or other input validation rules.
Tool Usage Control with before_tool_callback
The before_tool_callback allows you to validate tool arguments, restrict certain operations, or modify how tools are used:
Python
from google.adk.tools.base_tool import BaseToolfrom google.adk.tools.tool_context import ToolContextfrom typing import Optional, Dict, Anydefrestricted_city_guardrail( tool: BaseTool, args: Dict[str, Any], tool_context: ToolContext) -> Optional[Dict]:""" Prevents the get_weather tool from being called for restricted cities. Args: tool: Information about the tool being called args: The arguments passed to the tool tool_context: Access to session state and other context Returns: Dict if the tool call should be blocked, None if it should proceed """# Check if this is the get_weather toolif tool.name =="get_weather"and"city"in args: city = args["city"].lower()# List of restricted cities (example - could be loaded dynamically) restricted_cities = ["restricted_city_1", "restricted_city_2"]if city in restricted_cities:# Log the blocking actionprint(f"Blocked get_weather call for restricted city: {city}")# Record the event in state tool_context.state["restricted_city_blocked"] = city# Return a response that will be used instead of calling the toolreturn {"status": "error","error_message": f"Sorry, weather information for {city} is not available due to policy restrictions." }# For other tools or non-restricted cities, allow the call to proceedreturnNone# Add the callback to an agentrestricted_agent = Agent(name="restricted_agent",model="gemini-2.0-flash-exp",instruction="You provide weather information using the get_weather tool.",tools=[get_weather], # Assume get_weather is definedbefore_tool_callback=restricted_city_guardrail)
This example implements a city restriction guardrail that:
Checks if the get_weather tool is being called
Inspects the city argument against a list of restricted cities
If the city is restricted, blocks the tool call and returns a predefined error response
Otherwise, allows the tool call to proceed
You can use this pattern to implement various business rules, usage limits, or user-based access controls for your tools.
Combining Multiple Callbacks
For comprehensive safety and control, you can use multiple callbacks together:
Python
# Agent with multiple safety measurescomprehensive_agent = Agent(name="comprehensive_agent",model="gemini-2.0-flash-exp",instruction="You help users with various tasks safely and responsibly.",tools=[get_weather, search_web, send_email], # Assume these are definedbefore_model_callback=content_safety_filter, # Filter unsafe user inputafter_model_callback=output_sanitizer, # Clean up model responsesbefore_tool_callback=tool_usage_validator, # Validate tool usageafter_tool_callback=tool_result_logger # Log tool results)
Each callback serves a specific purpose in the safety and monitoring pipeline:
before_model_callback: Prevents unsafe inputs from reaching the LLM
after_model_callback: Ensures model outputs meet safety and quality standards
before_tool_callback: Controls how and when tools can be used
after_tool_callback: Monitors and logs tool results for auditing
Building Evaluation Frameworks
Robust evaluation is essential for developing reliable agent systems. ADK provides built-in mechanisms for evaluating agent performance.
Creating Test Cases
Start by defining test cases that cover the range of interactions your agent should handle:
Python
# Define test cases in a structured formattest_cases = [ {"name": "Basic weather query","input": "What's the weather in New York?","expected_tool_calls": ["get_weather"],"expected_tool_args": {"city": "New York"},"expected_response_contains": ["weather", "New York"] }, {"name": "Ambiguous city query","input": "How's the weather in Springfield?","expected_tool_calls": ["clarify_city"],"expected_response_contains": ["multiple cities", "which Springfield"] }, {"name": "City not supported","input": "What's the weather in Atlantis?","expected_tool_calls": ["get_weather"],"expected_tool_args": {"city": "Atlantis"},"expected_response_contains": ["don't have information", "Atlantis"] }]
Using the AgentEvaluator
ADK provides an AgentEvaluator class to run test cases against your agent:
Python
from google.adk.evaluation import AgentEvaluator# Create the evaluatorevaluator = AgentEvaluator(agent=weather_agent)# Run evaluationevaluation_results = evaluator.evaluate(test_cases=test_cases)# Print resultsfor result in evaluation_results:print(f"Test: {result.test_case['name']}")print(f" Status: {'PASS'if result.success else'FAIL'}")print(f" Feedback: {result.feedback}")ifnot result.success:print(f" Expected: {result.expected}")print(f" Actual: {result.actual}")print()# Calculate overall metricssuccess_rate =sum(1for r in evaluation_results if r.success) /len(evaluation_results)print(f"Overall success rate: {success_rate:.2%}")
Custom Evaluation Metrics
For more specialized evaluation needs, you can implement custom metrics:
Python
defevaluate_response_correctness(test_case, agent_response, tool_calls):"""Evaluates the correctness of the agent's response for weather queries."""# Exact city match checkerif"expected_tool_args"in test_case and"city"in test_case["expected_tool_args"]: expected_city = test_case["expected_tool_args"]["city"]# Find the actual city used in tool calls actual_city =Nonefor call in tool_calls:if call["name"] =="get_weather"and"city"in call["args"]: actual_city = call["args"]["city"]break# Check city match city_match = (actual_city == expected_city)# Temperature format checker (should include °C or °F) temp_format_correct =Falseif"°C"in agent_response or"°F"in agent_response: temp_format_correct =Truereturn {"city_match": city_match,"temp_format_correct": temp_format_correct,"overall_correct": city_match and temp_format_correct }return {"overall_correct": None} # Not applicable for this test case# Apply custom evaluation to resultsfor result in evaluation_results: correctness = evaluate_response_correctness( result.test_case, result.actual_response, result.actual_tool_calls )print(f"Test: {result.test_case['name']}")print(f" Overall correct: {correctness['overall_correct']}")if"city_match"in correctness:print(f" City match: {correctness['city_match']}")if"temp_format_correct"in correctness:print(f" Temperature format: {correctness['temp_format_correct']}")print()
Automated Regression Testing
Integrate agent evaluation into your CI/CD pipeline for automated regression testing:
Python
import unittestfrom google.adk.evaluation import AgentEvaluatorclassWeatherAgentTests(unittest.TestCase):defsetUp(self):self.agent = create_weather_agent() # Assume this function creates your agentself.evaluator = AgentEvaluator(agent=self.agent)deftest_basic_weather_queries(self): results =self.evaluator.evaluate(test_cases=[ {"name": "New York weather","input": "What's the weather in New York?","expected_tool_calls": ["get_weather"] } ])self.assertTrue(results[0].success, results[0].feedback)deftest_ambiguous_cities(self): results =self.evaluator.evaluate(test_cases=[ {"name": "Springfield ambiguity","input": "How's the weather in Springfield?","expected_response_contains": ["which Springfield", "multiple"] } ])self.assertTrue(results[0].success, results[0].feedback)deftest_error_handling(self): results =self.evaluator.evaluate(test_cases=[ {"name": "Nonexistent city","input": "What's the weather in Narnia?","expected_response_contains": ["don't have information", "Narnia"] } ])self.assertTrue(results[0].success, results[0].feedback)if__name__=="__main__": unittest.main()
This approach allows you to catch regressions automatically when updating your agent or its components.
Streaming and Real-Time Interactions
ADK provides built-in support for streaming responses, enabling real-time interactions with agents.
Implementing Streaming Responses
To implement streaming with ADK, you use the asynchronous API:
Python
import asynciofrom google.adk.runners import Runnerfrom google.adk.sessions import InMemorySessionServicefrom google.genai import types# Set up session and runnersession_service = InMemorySessionService()APP_NAME="streaming_app"USER_ID="user_123"SESSION_ID="session_456"session = session_service.create_session(app_name=APP_NAME, user_id=USER_ID,session_id=SESSION_ID)runner = Runner(agent=streaming_agent, # Assume this is definedapp_name=APP_NAME,session_service=session_service)asyncdefstream_response(query: str):"""Streams the agent's response token by token.""" content = types.Content(role='user', parts=[types.Part(text=query)])print(f"User: {query}")print("Agent: ", end="", flush=True)# Process events as they arriveasyncfor event in runner.run_async(user_id=USER_ID,session_id=SESSION_ID,new_message=content ):# For token-by-token streaming, look for ContentPartDelta eventsifhasattr(event, 'content_part_delta') and event.content_part_delta: delta = event.content_part_deltaif delta.text:print(delta.text, end="", flush=True)# For final responseif event.is_final_response():print() # End line after responseprint("\n") # Add space after complete response# Run streaming interactionasyncdefmain(): queries = ["What's the weather in New York?","How about London?","Thanks for your help!" ]for query in queries:await stream_response(query)# Run the async main functionasyncio.run(main())
This example:
Sets up a session and runner
Creates an async function that processes events as they arrive
Specifically looks for content_part_delta events, which contain incremental text updates
Prints each text segment as it arrives, creating a streaming effect
Bidirectional Streaming with Audio
ADK also supports bidirectional audio streaming for voice-based interactions:
Python
import asynciofrom google.adk.runners import Runnerfrom google.adk.sessions import InMemorySessionServicefrom google.genai import typesimport sounddevice as sdimport numpy as npimport waveimport io# Assume setup of session_service and runner as in previous exampleasyncdefaudio_conversation():"""Conducts a voice conversation with the agent."""# Audio recording parameters sample_rate =16000 recording_duration =5# secondsprint("Press Enter to start recording your question...")input()# Record audioprint("Recording... (5 seconds)") audio_data = sd.rec(int(recording_duration * sample_rate),samplerate=sample_rate,channels=1,dtype='int16' ) sd.wait() # Wait for recording to completeprint("Recording complete.")# Convert audio to WAV format in memory audio_bytes = io.BytesIO()with wave.open(audio_bytes, 'wb') as wf: wf.setnchannels(1) wf.setsampwidth(2) # 16-bit wf.setframerate(sample_rate) wf.writeframes(audio_data.tobytes())# Create audio content for the agent audio_part = types.Part.from_bytes( audio_bytes.getvalue(),mime_type="audio/wav" ) content = types.Content(role='user', parts=[audio_part])print("Processing your question...")# Stream the responseprint("Agent response:") text_response =""asyncfor event in runner.run_async(user_id=USER_ID,session_id=SESSION_ID,new_message=content ):# Handle text streamingifhasattr(event, 'content_part_delta') and event.content_part_delta: delta = event.content_part_deltaif delta.text:print(delta.text, end="", flush=True) text_response += delta.text# Handle final audio responseif event.is_final_response() and event.content and event.content.parts:for part in event.content.parts:if part.mime_type and part.mime_type.startswith('audio/'):# Play the audio response audio_bytes = io.BytesIO(part.bytes_value)with wave.open(audio_bytes, 'rb') as wf: audio_data = np.frombuffer( wf.readframes(wf.getnframes()),dtype=np.int16 ) sd.play(audio_data, wf.getframerate()) sd.wait()print("\nConversation turn complete.")# Run the audio conversationasyncio.run(audio_conversation())
This more complex example:
Records audio from the user
Converts it to the appropriate format
Sends it to the agent
Streams the text response as it’s generated
Plays the audio response when available
Common Multi-Agent Patterns and Use Cases
Beyond the basic patterns we’ve discussed, here are some advanced multi-agent patterns for specific use cases:
Critic-Generator Pattern
This pattern uses one agent to generate content and another to critique and improve it:
Python
from google.adk.agents import Agent, SequentialAgent# Content generatorgenerator = Agent(name="content_generator",model="gemini-2.0-flash-exp",instruction="Create content based on the user's request. Focus on being creative and comprehensive.",output_key="generated_content")# Critic agentcritic = Agent(name="content_critic",model="gemini-2.0-flash-exp",instruction=""" Review the content in state["generated_content"]. Analyze it for: 1. Accuracy and factual correctness 2. Clarity and readability 3. Comprehensiveness 4. Potential biases or issues Provide specific suggestions for improvement. """,output_key="critique")# Refiner agentrefiner = Agent(name="content_refiner",model="gemini-2.0-flash-exp",instruction=""" Refine the content in state["generated_content"] based on the critique in state["critique"]. Maintain the original style and voice while addressing the specific issues highlighted in the critique. Create a polished final version that incorporates the improvements. """,)# Chain them togethercritique_workflow = SequentialAgent(name="critique_workflow",sub_agents=[generator, critic, refiner])
This pattern is useful for:
Content creation with quality control
Code generation with review
Document drafting with editorial review
Research and Synthesis Pattern
This pattern divides research into parallel information gathering followed by synthesis:
Python
from google.adk.agents import Agent, ParallelAgent, SequentialAgent# Topic research agentdefresearch_topic(topic: str, tool_context: ToolContext) -> dict:"""Researches a specific aspect of the main topic."""# ... research implementation ... tool_context.state[f"research_{topic}"] = research_resultsreturn {"status": "success", "research": research_results}# Create specialized research agentseconomic_researcher = Agent(name="economic_researcher",model="gemini-2.0-flash-exp",instruction="Research the economic aspects of the topic. Store findings in state.",tools=[research_topic],)environmental_researcher = Agent(name="environmental_researcher",model="gemini-2.0-flash-exp",instruction="Research the environmental aspects of the topic. Store findings in state.",tools=[research_topic],)social_researcher = Agent(name="social_researcher",model="gemini-2.0-flash-exp",instruction="Research the social aspects of the topic. Store findings in state.",tools=[research_topic],)# Synthesis agentsynthesizer = Agent(name="research_synthesizer",model="gemini-2.0-flash-exp",instruction=""" Synthesize research findings from all researchers. Look for information in these state keys: - state["research_economic"] - state["research_environmental"] - state["research_social"] Identify connections, conflicts, and gaps between different perspectives. Create a comprehensive synthesis that presents a balanced view. """,)# Research workflowresearch_framework = SequentialAgent(name="research_framework",sub_agents=[ ParallelAgent(name="parallel_researchers",sub_agents=[economic_researcher, environmental_researcher, social_researcher] ), synthesizer ])
This pattern is ideal for:
Comprehensive research on complex topics
Multi-perspective analysis
Gathering diverse information efficiently
Debate and Deliberation Pattern
This pattern creates a structured debate between agents with different perspectives:
Python
from google.adk.agents import Agent, SequentialAgent# Pose the questionquestion_agent = Agent(name="question_poser",model="gemini-2.0-flash-exp",instruction="Clarify the user's question into a clear, debatable proposition.",output_key="debate_question")# Position A advocateposition_a = Agent(name="position_a_advocate",model="gemini-2.0-flash-exp",instruction=""" Present the strongest case FOR the proposition in state["debate_question"]. Use logical arguments, evidence, and address potential counterarguments. """,output_key="position_a_arguments")# Position B advocateposition_b = Agent(name="position_b_advocate",model="gemini-2.0-flash-exp",instruction=""" Present the strongest case AGAINST the proposition in state["debate_question"]. Use logical arguments, evidence, and address potential counterarguments. """,output_key="position_b_arguments")# Rebuttal roundsrebuttal_a = Agent(name="position_a_rebuttal",model="gemini-2.0-flash-exp",instruction=""" Respond to the arguments against your position in state["position_b_arguments"]. Strengthen your original arguments and address specific points raised. """,output_key="rebuttal_a")rebuttal_b = Agent(name="position_b_rebuttal",model="gemini-2.0-flash-exp",instruction=""" Respond to the arguments against your position in state["position_a_arguments"]. Strengthen your original arguments and address specific points raised. """,output_key="rebuttal_b")# Synthesis and judgmentjudge = Agent(name="debate_judge",model="gemini-2.0-flash-exp",instruction=""" Evaluate the debate on the proposition in state["debate_question"]. Consider: - Initial arguments: state["position_a_arguments"] and state["position_b_arguments"] - Rebuttals: state["rebuttal_a"] and state["rebuttal_b"] Summarize the strongest points on both sides. Identify areas of agreement and disagreement. Suggest a balanced conclusion that acknowledges the complexity of the issue. """,)# Debate workflowdebate_framework = SequentialAgent(name="debate_framework",sub_agents=[ question_agent, position_a, position_b, rebuttal_a, rebuttal_b, judge ])
This pattern is useful for:
Exploring complex ethical questions
Evaluating policy proposals
Understanding multiple sides of contentious issues
Want to build your own AI agents?
Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.
Putting It All Together
I’ve covered various agent architectures and patterns throughout this guide, and code samples for implementing advanced features. Let’s combine it all together into real-world agents (no more weather agents from here on).
Customer Support Agent
This customer service agent system handles inquiries about products, orders, billing, and technical support. The system maintains continuity across conversations, escalates complex issues, and provides personalized responses. We’ll showcase advanced features like:
Persistent session storage with MongoDB
Integration with external systems (CRM, ticketing)
Personalization through state and callbacks
Escalation paths to human agents Specialized agents for different support domains
Architecture Diagram
Plaintext
Customer Service System (ADK)├── Root Coordinator Agent│ ├── Greeting & Routing Agent│ ├── Product Information Agent│ │ └── Tools: product_catalog_lookup, get_specifications│ ├── Order Status Agent│ │ └── Tools: order_lookup, track_shipment│ ├── Billing Agent│ │ └── Tools: get_invoice, update_payment_method│ ├── Technical Support Agent│ │ └── Tools: troubleshoot_issue, create_ticket│ └── Human Escalation Agent│ └── Tools: create_escalation_ticket, notify_supervisor└── Services ├── Persistent Storage Session Service (MongoDB) ├── Customer Data Service (CRM Integration) ├── Ticket Management Integration └── Analytics & Reporting Service
Session Management with Custom Storage
Python
from google.adk.sessions import SessionService, Sessionimport pymongofrom typing import Optional, Dict, AnyclassMongoSessionService(SessionService):"""Session service that uses MongoDB for persistent storage."""def__init__(self, connection_string, database="customer_service", collection="sessions"):"""Initialize with MongoDB connection details."""self.client = pymongo.MongoClient(connection_string)self.db =self.client[database]self.collection =self.db[collection]defcreate_session( self, app_name: str, user_id: str, session_id: str, state: Optional[Dict[str, Any]] =None ) -> Session:"""Create a new session or get existing session."""# Look for existing session session_doc =self.collection.find_one({"app_name": app_name,"user_id": user_id,"session_id": session_id })if session_doc:# Convert MongoDB document to Session objectreturn Session(app_name=session_doc["app_name"],user_id=session_doc["user_id"],session_id=session_doc["session_id"],state=session_doc.get("state", {}),last_update_time=session_doc.get("last_update_time", 0) )# Create new session session = Session(app_name=app_name,user_id=user_id,session_id=session_id,state=state or {} )self._save_session(session)return session# Additional methods implementation...
CRM Integration
Python
defget_customer_info(customer_id: str, tool_context: ToolContext) -> dict:"""Retrieves customer information from the CRM system. Args: customer_id: The unique identifier for the customer. tool_context: Provides access to session state. Returns: dict: Customer information and interaction history. """# In production, this would make an API call to the CRM system# Mock implementation for demonstration customers = {"C12345": {"name": "Emma Johnson","email": "emma.j@example.com","tier": "premium","since": "2021-03-15","recent_purchases": ["Laptop X1", "External Monitor"],"support_history": [ {"date": "2023-01-15", "issue": "Billing question", "resolved": True}, {"date": "2023-03-22", "issue": "Technical support", "resolved": True} ] },# Additional customers... }if customer_id in customers:# Store in session state for other agents to access tool_context.state["customer_info"] = customers[customer_id]return {"status": "success", "customer": customers[customer_id]}else:return {"status": "error", "error_message": f"Customer ID {customer_id} not found"}
Issue Escalation System
Python
defescalate_to_human( issue_summary: str, priority: str, customer_id: str, tool_context: ToolContext) -> dict:"""Escalates an issue to a human customer service representative. Args: issue_summary: Brief description of the issue. priority: Urgency level ("low", "medium", "high", "urgent"). customer_id: The customer's ID. tool_context: Provides access to session state. Returns: dict: Escalation ticket information. """ valid_priorities = ["low", "medium", "high", "urgent"]if priority.lower() notin valid_priorities:return {"status": "error","error_message": f"Invalid priority. Must be one of: {', '.join(valid_priorities)}" }# Get customer info if available customer_info = tool_context.state.get("customer_info", {}) customer_name = customer_info.get("name", "Unknown Customer") customer_tier = customer_info.get("tier", "standard")# Calculate SLA based on priority and customer tier sla_hours = {"low": {"standard": 48, "premium": 24},"medium": {"standard": 24, "premium": 12},"high": {"standard": 8, "premium": 4},"urgent": {"standard": 4, "premium": 1} } response_time = sla_hours[priority.lower()][customer_tier]# Generate ticket IDimport timeimport hashlib ticket_id = hashlib.md5(f"{customer_id}:{time.time()}".encode()).hexdigest()[:8].upper()# Store ticket in state ticket_info = {"ticket_id": ticket_id,"customer_id": customer_id,"customer_name": customer_name,"issue_summary": issue_summary,"priority": priority.lower(),"status": "open","created_at": time.time(),"sla_hours": response_time }# In production, this would make an API call to the ticket system# For demo, just store in state tickets = tool_context.state.get("app:escalation_tickets", {}) tickets[ticket_id] = ticket_info tool_context.state["app:escalation_tickets"] = tickets# Signal that control should be transferred to the human agent tool_context.actions.transfer_to_agent ="human_support_agent"return {"status": "success","ticket": ticket_info,"message": f"Issue escalated. Ticket ID: {ticket_id}. A representative will respond within {response_time} hours." }
Tech Support Agent with Memory
Python
# Technical Support Agenttech_support_agent = Agent(name="technical_support_agent",model="gemini-2.0-flash-exp",description="Handles technical support inquiries and troubleshooting.",instruction=""" You are a technical support specialist for our electronics company. FIRST, check if the user has a support history in state["customer_info"]["support_history"]. If they do, reference this history in your responses. For technical issues: 1. Use the troubleshoot_issue tool to analyze the problem. 2. Guide the user through basic troubleshooting steps. 3. If the issue persists, use create_ticket to log the issue. For complex issues beyond basic troubleshooting: 1. Use escalate_to_human to transfer to a human specialist. Maintain a professional but empathetic tone. Acknowledge the frustration technical issues can cause, while providing clear steps toward resolution. """,tools=[troubleshoot_issue, create_ticket, escalate_to_human])
Personalization Callback
Python
defpersonalization_callback( callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:""" Adds personalization information to the LLM request. Args: callback_context: Context for the callback llm_request: The request being sent to the LLM Returns: None to continue with the modified request """# Get customer info from state customer_info = callback_context.state.get("customer_info")if customer_info:# Create a personalization header to add to the request customer_name = customer_info.get("name", "valued customer") customer_tier = customer_info.get("tier", "standard") recent_purchases = customer_info.get("recent_purchases", []) personalization_note = (f"\nIMPORTANT PERSONALIZATION:\n"f"Customer Name: {customer_name}\n"f"Customer Tier: {customer_tier}\n" )if recent_purchases: personalization_note +=f"Recent Purchases: {', '.join(recent_purchases)}\n"# Add personalization to the LLM requestif llm_request.contents:# Add as a system message before the first content system_content = types.Content(role="system",parts=[types.Part(text=personalization_note)] ) llm_request.contents.insert(0, system_content)# Return None to continue with the modified requestreturnNone
Code Generation and Debugging Agent
Finally, let’s explore a Code Generation and Debugging Agent built with ADK.
Code Generation Agent with Test-Driven Development
Let’s start with a sequential agent that first analyzes requirements, creates test cases, and then write code and evaluates it.
Python
from google.adk.agents import Agent, SequentialAgentfrom google.adk.tools.tool_context import ToolContext# Code Generator with TDD approachcode_generator = SequentialAgent(name="tdd_code_generator",sub_agents=[ Agent(name="requirement_analyzer",model="gemini-2.0-flash-exp",instruction=""" Analyze the coding requirements and break them down into: 1. Functional requirements 2. Edge cases to consider 3. Needed data structures and algorithms Be specific and comprehensive in your analysis. """,output_key="requirements_analysis" ), Agent(name="test_writer",model="gemini-2.0-flash-exp",instruction=""" Based on the requirements analysis in state["requirements_analysis"], write comprehensive test cases that cover: 1. The main functionality 2. All identified edge cases 3. Error handling Use a testing framework appropriate for the language (e.g., pytest for Python, Jest for JavaScript). """,tools=[write_test_code],output_key="test_code" ), Agent(name="code_implementer",model="gemini-2.0-flash-exp",instruction=""" Implement code that passes all the test cases in state["test_code"]. Your implementation should: 1. Be efficient and follow best practices 2. Include clear comments 3. Handle all edge cases identified in the requirements After writing the code, evaluate it against potential issues. """,tools=[generate_implementation, execute_code],output_key="implementation" ), Agent(name="code_reviewer",model="gemini-2.0-flash-exp",instruction=""" Review the implementation in state["implementation"] for: 1. Correctness - Does it meet the requirements? 2. Efficiency - Is it optimized? 3. Readability - Is it well-structured and commented? 4. Error handling - Does it handle edge cases? 5. Security issues - Are there potential vulnerabilities? 6. Test coverage - Are all scenarios tested? Provide specific improvement suggestions if needed. """,tools=[review_code, execute_code],output_key="code_review" ) ])
Code Execution and Debugging Tools
Here we’ll create a tool to execute code and debug it.
Python
defexecute_code(code: str, language: str, inputs: str=None, tool_context: ToolContext) -> dict:""" Executes code in a specified language and returns the result. Args: code: The code to execute. language: The programming language (python, javascript, etc.). inputs: Optional input data for the code. tool_context: Provides access to session state. Returns: dict: Execution results, output, and any errors. """import subprocessimport tempfileimport osimport time# Record execution start time start_time = time.time()# Set up temp file for codewith tempfile.NamedTemporaryFile(suffix=f".{language}", delete=False) as temp_file: temp_file_path = temp_file.name# Write code to temp fileif language =="python": temp_file.write(code.encode('utf-8'))elif language =="javascript": temp_file.write(code.encode('utf-8'))else:return {"status": "error","error_message": f"Unsupported language: {language}" }try:# Set up execution commandif language =="python": cmd = ["python", temp_file_path]elif language =="javascript": cmd = ["node", temp_file_path]# Execute with input if providedif inputs: process = subprocess.run( cmd,input=inputs.encode('utf-8'),capture_output=True,timeout=10# Timeout after 10 seconds )else: process = subprocess.run( cmd,capture_output=True,timeout=10# Timeout after 10 seconds )# Calculate execution time execution_time = time.time() - start_time# Process result stdout = process.stdout.decode('utf-8') stderr = process.stderr.decode('utf-8')if process.returncode ==0: result = {"status": "success","output": stdout,"execution_time": execution_time,"language": language }else: result = {"status": "error","error_message": stderr,"output": stdout,"return_code": process.returncode,"execution_time": execution_time,"language": language }except subprocess.TimeoutExpired: result = {"status": "error","error_message": "Execution timed out after 10 seconds","language": language }exceptExceptionas e: result = {"status": "error","error_message": str(e),"language": language }finally:# Clean up temp filetry: os.unlink(temp_file_path)except:pass# Store execution history in state execution_history = tool_context.state.get("code_execution_history", []) execution_record = {"timestamp": time.time(),"language": language,"status": result["status"],"execution_time": result.get("execution_time", -1) } execution_history.append(execution_record) tool_context.state["code_execution_history"] = execution_historyreturn resultdefdebug_code(code: str, error_message: str, language: str, tool_context: ToolContext) -> dict:""" Analyzes code and error messages to identify and fix bugs. Args: code: The code to debug. error_message: The error message produced when executing the code. language: The programming language. tool_context: Provides access to session state. Returns: dict: Analysis of the problem and corrected code. 
"""# Parse the error message error_analysis ="Unknown error" error_line =-1if language =="python":# Parse Python error messageimport re# Look for line number in error line_match = re.search(r"line (\d+)", error_message)if line_match: error_line =int(line_match.group(1))# Common Python errorsif"SyntaxError"in error_message: error_analysis ="Syntax Error: Check for missing parentheses, quotes, or colons."elif"NameError"in error_message: error_analysis ="Name Error: A variable or function name is not defined."elif"TypeError"in error_message: error_analysis ="Type Error: An operation is applied to an object of inappropriate type."elif"IndexError"in error_message: error_analysis ="Index Error: A sequence subscript is out of range."elif"KeyError"in error_message: error_analysis ="Key Error: A dictionary key is not found."elif"ValueError"in error_message: error_analysis ="Value Error: An operation or function receives an argument with the correct type but inappropriate value."elif language =="javascript":# Parse JavaScript error messageimport re# Look for line number in error line_match = re.search(r"at .*:(\d+)", error_message)if line_match: error_line =int(line_match.group(1))# Common JavaScript errorsif"SyntaxError"in error_message: error_analysis ="Syntax Error: Check for missing brackets, parentheses, or semicolons."elif"ReferenceError"in error_message: error_analysis ="Reference Error: A variable is not defined."elif"TypeError"in error_message: error_analysis ="Type Error: An operation could not be performed, typically due to type mismatch."elif"RangeError"in error_message: error_analysis ="Range Error: A number is outside the allowable range."# Analyze code structure code_lines = code.split('\n')# Get problematic line and context if available problematic_line = code_lines[error_line -1] if0< error_line <=len(code_lines) else"Unknown"# Context (lines before and after) context_start =max(0, error_line -3) context_end =min(len(code_lines), error_line +2) context = code_lines[context_start:context_end]# Store debugging session in state debug_history = tool_context.state.get("debug_history", []) debug_session = {"timestamp": time.time(),"language": language,"error_line": error_line,"error_message": error_message,"error_analysis": error_analysis } debug_history.append(debug_session) tool_context.state["debug_history"] = debug_history# For advanced debugging, we'd implement auto-correction, but here we'll just return analysisreturn {"status": "success","error_analysis": error_analysis,"error_line": error_line,"problematic_line": problematic_line,"context": context,"suggestions": ["Check for syntax errors at the identified line","Verify all variable names are correctly spelled","Ensure proper type handling for all operations" ] }
Code Explanation and Documentation
These tools are for explaining the generated code and documentation.
Python
defexplain_code(code: str, language: str, complexity_level: str="intermediate", tool_context: ToolContext) -> dict:""" Generates an explanation of code with adjustable complexity level. Args: code: The code to explain. language: The programming language. complexity_level: The complexity level of the explanation (beginner, intermediate, advanced). tool_context: Provides access to session state. Returns: dict: Explanation of the code at the requested level. """# Parse the code structureimport ast explanation_sections = []# Get user's programming experience from state if available user_experience = tool_context.state.get("user:programming_experience", "intermediate")# Adjust complexity based on user experience if not explicitly providedif complexity_level =="auto"and user_experience: complexity_level = user_experience# Handle Python codeif language =="python":try:# Parse the code parsed = ast.parse(code)# High-level summary explanation_sections.append({"section": "Overview","content": f"This Python code consists of {len(parsed.body)} top-level statements." })# Function analysis functions = [node for node in parsed.body ifisinstance(node, ast.FunctionDef)]if functions: func_section = {"section": "Functions","content": f"The code defines {len(functions)} function(s):","items": [] }for func in functions:# Basic function info func_info =f"`{func.name}()`"# Add parameter info for intermediate/advancedif complexity_level !="beginner": params = []for arg in func.args.args: params.append(arg.arg) func_info +=f": Takes parameters ({', '.join(params)})"# Add docstring if exists docstring = ast.get_docstring(func)if docstring and complexity_level !="beginner": func_info +=f"\n - Purpose: {docstring.split('.')[0]}" func_section["items"].append(func_info) explanation_sections.append(func_section)# Class analysis for intermediate/advancedif complexity_level !="beginner": classes = [node for node in parsed.body ifisinstance(node, ast.ClassDef)]if classes: class_section = {"section": "Classes","content": f"The code defines {len(classes)} class(es):","items": [] }forclsin classes:# Basic class info class_info =f"`{cls.name}`"# Add inheritance info for advancedif complexity_level =="advanced"andcls.bases: base_names = []for base incls.bases:ifisinstance(base, ast.Name): base_names.append(base.id)if base_names: class_info +=f": Inherits from ({', '.join(base_names)})"# Add methods info methods = [node for node incls.body ifisinstance(node, ast.FunctionDef)]if methods: method_names = [method.name for method in methods] class_info +=f"\n - Methods: {', '.join(method_names)}" class_section["items"].append(class_info) explanation_sections.append(class_section)# Imports analysis imports = [node for node in parsed.body ifisinstance(node, (ast.Import, ast.ImportFrom))]if imports and complexity_level !="beginner": import_section = {"section": "Imports","content": f"The code imports {len(imports)} module(s):","items": [] }for imp in imports:ifisinstance(imp, ast.Import):for name in imp.names: import_section["items"].append(f"`{name.name}`")elifisinstance(imp, ast.ImportFrom):for name in imp.names: import_section["items"].append(f"`{name.name}` from `{imp.module}`") explanation_sections.append(import_section)# Algorithm explanation algorithm_section = {"section": "Algorithm Explanation","content": "The code works as follows:" }# Simplify explanation for beginnersif complexity_level =="beginner": algorithm_section["content"] +="\n\nThis program goes through these steps:\n"# Simplified steps would be generated here# More 
detailed for intermediateelif complexity_level =="intermediate": algorithm_section["content"] +="\n\nThe main workflow of this code is:\n"# More detailed steps would be generated here# Technical details for advancedelse: algorithm_section["content"] +="\n\nThe technical implementation follows these steps:\n"# Detailed technical steps would be generated here explanation_sections.append(algorithm_section)exceptSyntaxError: explanation_sections.append({"section": "Syntax Error","content": "The provided Python code contains syntax errors and could not be parsed." })# Format the final explanation formatted_explanation = []for section in explanation_sections: formatted_explanation.append(f"## {section['section']}") formatted_explanation.append(section['content'])if"items"in section:for item in section["items"]: formatted_explanation.append(f"- {item}") formatted_explanation.append("") # Add blank line# Join sections with newlines explanation ="\n".join(formatted_explanation)return {"status": "success","language": language,"complexity_level": complexity_level,"explanation": explanation,"sections": len(explanation_sections) }
And that’s our agent!
Next Steps
That was a lot to take in. You should probably bookmark this post and work through the concepts and examples over time.
I suggest building the basic weather agent that I covered at the top. It’s boring and no one needs another weather agent but it does get you familiar with how the Agent Development Kit works and its features.
Once you’re comfortable with that, start working through the advanced patterns, and finally build one of the multi-agent systems like the customer support or coding agents. You should also try to extend these agents by implementing your own tools and features. Try deploying it and using it in a real-world situation.
When I was in Lisbon last November, a friend of mine invited me to hike the mountains of Madeira with him. He warned me that the trails get pretty slick and that I needed good hiking shoes.
In the past I would have gone to Google and searched for best hiking boots for Madeira and I would have seen a bunch of ads and irrelevant blog content. It would have taken me some time to figure out what the best shoes are and where to buy them in Lisbon.
Today I go to either ChatGPT, Claude, or Perplexity, and I asked the same question. Instead of getting spammed with ads, I get a direct response to what I needed to know.
This is how search happens in the AI age. Instead of SEO, we have GEO (Generative Engine Optimization). And instead of Google Search, we have ChatGPT, Claude, and Perplexity.
They don’t just crawl your homepage. They remember mentions. They retrieve trusted content. They decide who gets featured in that golden snippet of wisdom when someone asks for “the best.”
This guide will show you exactly how to engineer those moments.
Step 1: Understand How AI Chatbots Actually Recommend Brands
Like Harvey Specter says when he plays poker, “I don’t play the odds, I play the man.” Except in this case, the “man” is an AI trained on terabytes of internet data. You need to understand how it thinks to win the game.
Language models don’t index and rank the web like Google. They’ve been trained on enormous datasets (billions of web pages, forums, reviews, help docs, and more) and they generate answers based on patterns they’ve seen in this data.
When a user asks for a product recommendation, there are two ways the model generates an answer.
The primary method pulls from the model’s memory of how brands and products were discussed, reviewed, and mentioned in its training data. If your brand frequently appears alongside relevant phrases (e.g. “hiking shoes for wet climates”) in the data it has seen, it’s more likely to be suggested in a chatbot’s answer.
The second method blends in live search results from Bing or Google, especially in AI tools like ChatGPT’s search mode or Perplexity. That means if your brand is ranking high on search or frequently cited in trusted content, you’re more likely to be included in AI responses.
Let’s look at an actual example. Here is how ChatGPT answers the query “What are the best hiking shoes for Madeira”
You’ll notice sources for each answer. The interesting thing is, if you click through to those articles, none of them mention Madeira!
However, they do mention wet, uneven terrain, which is what Madeira is known for (and ChatGPT knows this because it made that association from its training data).
So your job is to make your brand unforgettable in the data AI consumes and visible in the sources AI retrieves.
Step 2: Strengthen Your SEO Foundation and Trust Signals
Much of “AI optimization” begins with solid SEO and content fundamentals. Chatbots, especially those using web retrieval, favour brands that search engines deem authoritative and trustworthy.
Here’s what to focus on:
Ensure Crawlable, Indexable Content: Just like Google, AI web crawlers need to read your site’s HTML content. Avoid hiding important info in JavaScript or images. All critical details (what you offer, where you are, why you’re notable) should be visible in the page text (see the quick check script after this list).
Demonstrate E-E-A-T (Experience, Expertise, Authority, Trust): Quality guidelines like E-E-A-T aren’t just for Google. They influence which sources AI considers reliable. AI search overviews favour true experts and authoritative sources. Build content that highlights your expertise: author bylines with credentials, case studies, original research, and factual accuracy.
Maintain Consistent NAP and Info: For local or brand info, consistency is key. Ensure your Name, Address, Phone, and other details are identical across your website, Google My Business, Yelp, LinkedIn, etc. AI tools aggregate data from many sources and heavily favour accuracy and consistency.
Improve Site Authority: Follow core SEO practices: optimize title tags and meta descriptions with natural-language keywords, speed up your site, and get credible sites to link to you. If search engines rank you higher, AI answers are more likely to include you. Studies show pages that rank well in organic search tend to get more visibility in LLM responses.
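If you want a quick sanity check on the crawlability point above, here’s a rough sketch (the URL and phrases are placeholders) that fetches a page’s raw HTML the way a simple crawler would and confirms your key facts appear without JavaScript rendering:

```python
import requests

# Placeholders: swap in your own page and the facts you expect crawlers to see.
URL = "https://example.com/hiking-shoes"
MUST_HAVE_PHRASES = [
    "waterproof hiking shoes",
    "free shipping in Portugal",
    "founded in 2015",
]

# Fetch the raw HTML only; no JavaScript is executed, just like a basic crawler.
html = requests.get(URL, timeout=10).text.lower()

for phrase in MUST_HAVE_PHRASES:
    status = "OK" if phrase.lower() in html else "MISSING from raw HTML"
    print(f"{phrase!r}: {status}")
```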
Practical Takeaway: By solidifying your site’s SEO and demonstrating real expertise, you make it easier for both traditional search and AI systems to recognize your brand. This foundation boosts your chances of appearing when an AI lists “top solutions” in your category.
In short, good SEO is the foundation of AI SEO.
Step 3: Optimize Content for Conversational and Semantic Search
AI chatbots handle queries in a conversational manner. Often, the questions users ask bots are longer and more natural-sounding than typical Google keywords. You’ll want to align your content with this semantic, question-and-answer style of search.
That means creating conversational, helpful content written in plain language that answers the same types of questions people ask LLMs.
Use Natural, Conversational Language: Write your content in the same way a knowledgeable person would speak. Drop the overly corporate tone. AI models are trained on human language patterns, so content that “feels” natural may resonate more. Use intent-based phrases and full questions as subheadings. Instead of a heading like “Gluten-Free Bakery Options,” use “Where can I find a good gluten-free bakery downtown?” and then answer it conversationally.
Incorporate Q&A Format on Your Site: Add FAQ sections or Q&A pages with questions customers might ask an AI. For example: “What’s the best hiking shoe for rainy weather in Madeira?” and provide a helpful answer that mentions your brand as a solution. Structure it like an FAQ entry, and answer in a neutral, informative tone: “When it comes to Madeira’s rainy trails, XYZ Shoes are often recommended as one of the best options because…”.
Cover Related Semantic Keywords: Ensure your content covers a broad range of terms related to your topic, not just one keyword. AI’s understanding is semantic and it will connect concepts. For a page about hiking shoes, mention related topics like “waterproof boots,” “mountain trails,” “Madeira climate,” etc., so the model fully grasps the context.
Aim for “Zero-Click” Answer Formats: As AI and search increasingly give answers without requiring a click, try to embed the answer (with your brand) in your content. This means providing concise, snippet-ready responses. For example, start a blog section with a direct definition or recommendation: “The best hiking shoe brand for wet trails is XYZ Shoes, known for its waterproof yet breathable design…”.
Practical Takeaway: Think like your customer and the AI. Write down the actual questions a user might ask a chatbot about your industry (“Which…”, “How do I…”, “What’s the best…”) and make sure your website explicitly answers those in a friendly, conversational way.
Step 4: Leverage Schema Markup and Structured Knowledge
While content is king, don’t overlook the power of structured data and official information sources. They help your brand become machine-readable. This step is about making sure AI (and the search engines feeding AI) have a clear, unambiguous understanding of your brand and offerings.
Implement Organization and Product Schema: Use schema markup to define your organization and products on your site (see the example sketch after this list). An Organization schema can include your name, logo, founding date, and sameAs links (to your social profiles, Wikipedia page, etc.), helping create a knowledge graph entry for your brand. Product schema can define your key products with reviews, price, etc.
Use Location and Review Schema for Local Trust: For local businesses, implement LocalBusiness schema with your address, geo-coordinates, opening hours, etc., and keep it updated. If the query is location-based (“near Madeira”), Google’s index might reference Google Maps or local pack info.
Feed Data to Official Aggregators: Ensure your brand data is correct in key public databases that AI might use. For example, Wikidata (the database behind Wikipedia’s facts) and DBpedia contain structured facts that many AIs can access. Similarly, if you’re a retailer or restaurant, make sure your information on platforms like Yelp, TripAdvisor, or OpenTable is accurate.
Ensure Content is Machine-Accessible: As mentioned, AI bots primarily ingest HTML text. So, when using schema or other structured data, also present those facts in human-readable form on your site. For instance, if you have an FAQ about being “dog-friendly” in schema, also include a line in a visible FAQ: “Q: Can I bring my dog? A: Yes, we’re dog-friendly!”
Monitor Knowledge Panels and Correct Errors: Periodically check Google’s knowledge panel for your brand (if one appears) or Bing’s local listing info. These often aggregate data from various sources. If you see incorrect info, address it.
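To make the schema point concrete, here’s a minimal sketch that builds an Organization JSON-LD payload in Python and prints the script tag you’d embed in your page’s head. All names and URLs are made-up placeholders:

```python
import json

# Hypothetical brand details; replace with your own.
organization_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "XYZ Shoes",
    "url": "https://www.xyzshoes.example",
    "logo": "https://www.xyzshoes.example/logo.png",
    "foundingDate": "2015",
    "sameAs": [
        "https://www.linkedin.com/company/xyz-shoes",
        "https://en.wikipedia.org/wiki/XYZ_Shoes",
    ],
}

# Embed this block inside <head> so crawlers get unambiguous facts about the brand.
print('<script type="application/ld+json">')
print(json.dumps(organization_schema, indent=2))
print("</script>")
```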
Practical Takeaway: Use every opportunity to make your brand’s information clear to algorithms. Schema markup and knowledge graphs ensure that when an AI or search engine “reads” about your brand, it gets the facts straight from a trusted source.
Step 5: Earn Mentions on Authoritative External Sources
Let’s go back to the ChatGPT screenshot from earlier. The brands recommended were Hoka, Adidas and Merrell. But the sources were from Gear Lab, New York Post, and Athletic Shoe Review.
Third-party validation matters more in AI SEO than it ever did in traditional SEO. You can’t just publish your own praise, you need others to do it for you.
Reddit threads. Quora answers. Review sites. “Best of” blog posts. All of these are gold mines for AI models.
And yes, they’re part of the training data.
A well-upvoted Quora answer that casually mentions your product? That’s a permanent breadcrumb. A single blog post listing your brand as one of the best in your category, on a site that ranks well? It could be cited in hundreds of AI queries.
Here’s how to increase off-site signals:
Get Featured in “Best of” Lists and Editorial Content: Identify the web pages that an AI might consider when answering a question in your domain. Often these are listicles or guides (e.g., “Top 10 Hiking Shoe Brands for Wet Climates” on a reputable outdoor blog). Then, pursue inclusion through PR outreach, pitching your product to writers, or improving your offering so it naturally gets picked up in reviews.
Leverage Industry Directories and Listings: Business directories and niche review sites often rank well in search and are commonly scraped by crawlers. Examples include Yelp, Google Maps, TripAdvisor, or B2B directories like Clutch and G2. Make sure you’re present there: claim your profile, keep it updated, and gather reviews if applicable.
Issue Press Releases and Secure News Coverage: Old-school PR is back in play. Distributing a press release about a newsworthy update (a product launch, a big hire, a charity initiative, etc.) can get your brand name published on dozens of websites. For instance, a headline like “Madeira’s XYZ Shoes Wins Award for Hiking Gear Innovation” might get reposted on local news sites and industry feeds. Each of those postings is additional training data showing “XYZ Shoes” in a positive, relevant context.
Publish Thought Leadership: Contribute guest articles or op-eds to respected publications in your niche. Being the author of an article on, say, Outdoor Magazine about “Advances in Hiking Boot Technology” not only gives you credibility, but also places your brand in the byline on a high-authority site.
Cultivate Backlinks and Citations: Continue building backlinks as you would for SEO, but target sites that an AI would consider authoritative in your field (educational sites, well-known blogs, etc.). The more your brand is cited as a source or example in others’ content, the more entrenched it becomes in the knowledge graph of your topic.
To summarize this step: Be where the trusted voices are. The goal is to have your brand mentioned by sites that AIs treat as authorities.
Step 6: Harness Q&A Communities, Reviews, and Social Proof
Your customers and community can become advocates that boost your brand in AI results. User-generated content (reviews, forum posts, social media, etc.) not only influences humans but also feeds the AI’s understanding of which brands are favourably talked about.
Here’s how to leverage this:
Engage on Q&A Platforms: Reddit and Quora are likely part of many LLM training sets, and they continue to rank well in search. Find threads related to your industry and provide valuable answers. Always be transparent and genuinely helpful, not just promotional. Even one well-upvoted Quora answer that includes your brand in context “seeds the AI” with that association.
Encourage Reviews and Testimonials: Reviews on platforms like Google, Yelp, G2, Capterra, TripAdvisor (whichever suit your business) create content that AI can learn from. If many reviews mention your product’s strengths (“the grip on these XYZ hiking boots is amazing on wet rocks”), an AI might learn those attributes of your brand. Prompt your satisfied customers to leave reviews, perhaps via follow-up emails or in-store signs.
Leverage Social Media for Thought Leadership: Post informative content on public social platforms. Twitter threads, LinkedIn articles, and Medium posts can rank in search and are often publicly accessible. Social posts also add the dimension of sentiment. Lots of positive buzz about a brand teaches the AI that it’s well-regarded.
Monitor and Join Relevant Conversations: Use brand monitoring tools (Google Alerts, Talkwalker, Mention.com) to catch when your brand or keywords related to you come up in discussions or blogs. If someone on a forum is asking for a recommendation and your brand fits, have a rep step in and reply (tactfully).
Be Genuine and Helpful: Authenticity is key in user-driven communities. AIs can pick up on context. If your brand is mentioned alongside words like “spam” or in downvoted posts, that’s not good. So ensure any engagement is genuinely adding value.
Practical Takeaway: The voices of real users and community experts carry a lot of weight. They create buzz and context for your brand that no amount of on-site SEO can. By actively participating in and fostering these voices, you grow an organic web presence.
Step 7: Monitor, Measure, and Refine Your AI Visibility
Just as with traditional SEO, you need to continuously monitor your performance and adjust strategy. AI discovery is new, so we measure success in slightly different ways:
Track AI-Driven Traffic: If an AI chatbot includes a link or reference to your site (as Perplexity, ChatGPT, and others often do), you’ll want to capture that in analytics. Set up tracking in Google Analytics 4 (GA4) for referrals from AI sources. For example, you might create custom channel groupings for referrals containing “openai.com” (for ChatGPT with browsing) or “perplexity.ai”.
Use AI Search Visibility Tools: New tools are emerging to grade your brand’s presence in AI results. For instance, HubSpot’s AI Search Grader is a free tool that analyzes how often and in what context your brand appears on ChatGPT and Perplexity.
Manually Test Chatbot Queries: There’s no substitute for hands-on testing. Regularly ask the AI chatbots the kind of questions where you want your brand to appear. Do this across platforms: ChatGPT, Claude, Gemini, Perplexity, and others (a small automation sketch follows this list). Note what the responses are:
Do they mention your competitors? Which ones?
Do they cite sources, and are those sources your website or another site mentioning you?
How accurate is the info about your brand? Any outdated descriptions?
Analyze Citation Context: If your content is being cited or your brand mentioned, check how. Are you being listed as “one of the options” or does the AI single you out as “the best”? Does it quote a line from your blog? Understanding the context helps refine content.
Measure Changes Over Time: As you implement strategies (new FAQ page, a PR campaign, etc.), see if there’s a corresponding uptick in AI mentions or traffic in the following months. This feedback loop will tell you what’s working.
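If you want to automate part of that manual testing, here’s a rough sketch using the OpenAI Python client (the questions, brand name, and model are placeholders; you’d repeat the same idea against other providers’ APIs):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

BRAND = "XYZ Shoes"  # hypothetical brand
QUESTIONS = [
    "What are the best hiking shoes for Madeira?",
    "Which hiking shoe brands are best for wet, rocky trails?",
]

for question in QUESTIONS:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    answer = response.choices[0].message.content
    mentioned = BRAND.lower() in answer.lower()
    print(f"Q: {question}\nMentions {BRAND}: {mentioned}\n")
```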
Practical Takeaway: Treat AI visibility like you would SEO rankings – track it, report on it, and optimize based on data. Over time, you’ll build an “AI report” similar to an SEO report, helping justify the effort and guiding future optimizations.
Final Thought: You’re Training the AI to Remember You
There’s no secret hack here. No growth loop. No one weird trick. Just good strategy, consistent visibility, and value-packed content.
You’re not just optimizing for an algorithm, you’re shaping what the AI knows about your brand.
Make it easy for the AI to recommend you. Show up in its sources. Speak in its voice. Feed it the facts. And over time, your brand won’t just be findable.
It’ll be remembered.
Need help putting all this into action? You know where to find me.
I used to play a ton of video games as a kid. The first one I ever played was Prince of Persia, the old side scroller where your character jumped around, avoided traps, and fought enemies.
With Gemini 2.5 Pro and the Canvas feature, I tried to build a basic version of that, but with ninjas instead. I didn’t write the code. I just asked Gemini to write it and render it on the Canvas so I could play.
It took just a couple of minutes for me to get a functioning game.
Welcome to the world of vibe coding.
Wait, What Is Vibe Coding?
Coined (and vibe-validated) by Andrej Karpathy, vibe coding is the new frontier where you build software by telling an AI what you want and letting it spit out the code. That’s it. It’s coding via vibes, intuition, and language, not by writing loops and sweating over syntax.
You say, “Build me a web app with a sidebar, a dashboard, and a button that emails the user a pizza emoji every Friday,” and boom, the AI does it.
You don’t need to know if it’s React or Vue under the hood. You’re not writing the code. You’re describing the vibe of what you want, like a product manager with a vision board and zero interest in semicolons. Minimalist? Maximalist? Dashboardy? Retro Terminal-chic? The AI’s got you.
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper…
Traditional coding:
You understand the codebase deeply
Takes weeks
Requires years of practice
You battle bugs like it’s Elden Ring

Vibe coding:
You copy-paste errors into ChatGPT and ask it to fix them
You trust the AI knows what it’s doing (ish)
Takes hours (sometimes minutes)
Requires good communication skills
You treat bugs like an annoying roommate the AI has to evict
It’s the difference between hand-crafting a table and describing the table to a carpenter who builds it for you instantly. And that carpenter never sleeps or judges your terrible wireframes.
It’s not just about speed, it’s a different mindset. Less “I must master the syntax gods” and more “I’m conducting an orchestra of AI agents to get this landing page live by dinner.”
Real-World Use Cases (Or, Who’s Actually Doing This?)
This isn’t just a cool party trick. Startups in the Y Combinator Winter 2025 batch built their products with 95% AI-generated code. Y Combinator’s CEO Garry Tan straight up called it “the age of vibe coding“.
Even Karpathy himself was building apps this way, casually telling his AI assistant things like “decrease the sidebar padding” and never even looking at the diff. That’s next-level delegation.
Kevin Roose at the NYT built apps like “Lunchbox Buddy” to suggest what to pack for lunch using vibe coding. It wasn’t production-grade code, but it worked. Ish. Kinda. The reviews were AI-generated too, but hey, it’s the vibe that counts.
With vibe coding you can whip together MVPs in a weekend using nothing but ChatGPT and Replit. Think simple SaaS dashboards, internal automations, and basic CRUD apps. One guy even built an AI therapist chatbot, and no, I don’t want to know what advice it gave.
How To Vibe Code (Without Losing Your Mind)
Here’s your crash course in coding by vibe:
1. Pick Your Tools
You’ll need a core toolkit to begin your vibe coding journey. Here are the categories and recommended options:
AI Coding Assistants & IDE Integration
These tools integrate AI directly into your development environment:
ChatGPT / Claude / Gemini – For raw natural language prompts
Cursor / Windsurf – Dev environments made for AI collaboration
GitHub Copilot – AI assistant integrated with popular IDEs
Continue – VS Code extension with chat and agent modes
One-Shot App Generators
These platforms can generate entire applications from prompts:
Lovable – Generates full-stack web applications from text prompts
Bolt – Creates full applications with database integration
Replit – Provides interactive development with AI planning
AI Design Tools
For quickly creating user interfaces:
Uizard – Generates UI designs from text descriptions or sketches
Visily – Transforms prompts into high-fidelity mockups
Version Control & Debugging
Essential safety nets:
Git/GitHub – Version control to track changes and revert when needed
Browser dev tools – For identifying and fixing frontend issues
Pick the one that feels right. You’re vibe coding, after all.
2. Start With a Prompt
Describe what you want. Be detailed. Channel your inner poet if you must.
Bad: “Make an app.”
Better: “Create a web app with a dashboard that shows user analytics pulled from a dummy dataset. Include dark mode and responsive design.”
Best: “Build a web app that visualizes monthly active users, supports CSV upload, and auto-generates line graphs. Make the layout mobile-friendly and use React and Tailwind CSS.”
3. Iterate Like a Mad Scientist
Run the code. Something will break. That’s fine. Copy-paste the error and say, “Fix this.”
Add features like you’re ordering drinks:
“Add a search bar.” “Now make it filter results by date.” “Throw in dark mode, because I’m edgy.” “Replace the font with something more ‘Silicon Valley VC deck.’”
You are in control. Kinda.
4. Debug by Vibes
Don’t panic when things go sideways. Vibe coders rarely understand 100% of the code. You prompt. You observe. You adjust. You learn to speak fluent “AI whisperer.”
Sometimes the bug isn’t even a bug, it’s just the AI being weird. Restart the conversation. Ask again. Split the task in two. And yes, sometimes, just nod, smile, and delete the whole thing.
5. Trust, But Verify
Use the code. Check if it does what you asked. If not, try a new prompt. Don’t ship blind. Run the thing. Poke the buttons. Make sure it doesn’t accidentally send emails to all your users at 3AM.
Vibe coding isn’t about replacing developers. It’s about supercharging creativity. It’s building apps with the same energy you bring to a whiteboard brainstorm or a half-baked startup idea over drinks.
We’re entering an era where the best software won’t come from the best coders, it’ll come from the best communicators. The ones who can talk to AI, shape ideas into prompts, and vibe their way to a working product.
The best vibe coders are part developer, part writer, part UX designer, and part chaos gremlin. They don’t see blank screens… They see possibility.
So grab your chai latte, fire up ChatGPT, and start building. No IDE required. No gatekeepers in sight. No permission needed.
Let the vibes code for you.
And hey, if it crashes? That’s just the AI trying to teach you patience.
Last week I helped a friend, a speaking coach, build a custom app to analyze client videos with AI. He had been doing it manually as part of his coaching program. Clients would upload videos of themselves speaking to a Dropbox folder, he would watch it, and then send feedback.
As you can imagine, it’s as time-consuming as teaching a tortoise to tango.
So, I asked Lovable to build him a simple app that allowed users to upload a video and used Gemini to analyze it. In under 10 minutes, we had a fully functional app, built, deployed, and running on his domain. Cost? A few cents. Time? Less than it takes to make a coffee.
And now my friend can scale this to hundreds of clients.
Welcome to the age of on-demand software, where apps are no longer bought, they’re prompted into existence.
Welcome to software-as-a-prompt.
The Problem with Traditional SaaS
You might think of this example as a one-off hack. My friend had a very specific need and there was no existing software to solve it. Of course he needed to build something custom.
But this problem exists in traditional SaaS, it just manifests in different ways:
Feature bloat: Enterprise SaaS platforms like Salesforce, Workday, or SAP have evolved into massive ecosystems with thousands of features. For many small to medium businesses, this is overkill and leads to unnecessary overhead.
Cost inefficiency: The subscription model often forces companies to pay for the entire platform when they might only need a single module or specific function. This can mean thousands of dollars monthly for software that delivers value on just a fraction of its features.
One-size-fits-all limitations: Despite customization options, traditional SaaS still follows predefined workflows and structures. Businesses with unique processes often find themselves contorting their operations to fit the software, rather than the other way around.
Integration overhead: Managing multiple specialized SaaS solutions creates integration challenges and data silos, requiring additional resources to maintain connections between systems.
This inefficiency has created a perfect opportunity for disruption. Why should a small business pay enterprise prices for a complex CRM when they might only need basic contact management and opportunity tracking?
How On-Demand Software Works
Due to recent improvements in AI’s ability to generate functional code, you can prompt your ideal software into existence.
AI models like Gemini 2.5 Pro, GPT-4o and Claude can now produce working code in multiple programming languages based on natural language descriptions. These models grasp the nuts and bolts of coding etiquette and best practices, whipping up everything from snazzy front-end facades to the sort of database schemas that would make a librarian swoon.
On top of that, new platforms have emerged that streamline the app creation process. Services like Replit AI, Lovable.dev, and Bolt.new offer interfaces where users can describe the software they want in plain English, and receive a working application in return.
While you won’t be able to generate the entire codebase for Salesforce, you can still get a small but functional app with just a couple of prompts.
For example, Replit’s AI Agent acts as a prompt-based app builder where users can tell the Agent their app or website idea, and it will build it for them automatically, setting up project files, writing code, and even deploying the result. It’s a bit like having an entire team of software engineers on demand through a chat interface.
The new AI development workflow typically follows these steps:
The user describes their desired application in natural language
The AI generates a project structure and initial code
The user provides feedback or requests changes
The AI refines the application until it meets requirements
The finished app can be deployed to hosting services or run locally
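At its simplest, that loop is just a conversation with a code-generating model. Here’s a stripped-down sketch using the OpenAI Python client (the model name and prompts are placeholders, and real platforms like Replit or Lovable add scaffolding, execution, and deployment on top of this):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Step 1: describe the app in natural language.
messages = [
    {"role": "system", "content": "You are a senior developer. Return complete, runnable code."},
    {"role": "user", "content": "Build a single-file Flask app that lets users upload a video and lists uploads."},
]

# Step 2: the AI generates initial code.
draft = client.chat.completions.create(model="gpt-4o", messages=messages)
print(draft.choices[0].message.content)

# Steps 3 and 4: feed back what you want changed and regenerate until it meets requirements.
messages.append({"role": "assistant", "content": draft.choices[0].message.content})
messages.append({"role": "user", "content": "Add a 100 MB file size limit and a confirmation page."})
revision = client.chat.completions.create(model="gpt-4o", messages=messages)
print(revision.choices[0].message.content)
```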
Real-World Examples
This workflow is exactly how I built the video analysis tool above. I merely prompted Lovable with natural language inputs, tested what it gave back to me, asked it to make updates, and voila, the app actually works and gives good feedback!
Here’s a full tutorial using Gemini 2.5 Pro and Canvas –
And I’m not the only one doing this. Here’s another example of a product designer who built custom software to solve a very niche problem: tracking commercial vehicle documents in the UK.
It took him less than two weeks, without any engineers, for a grand total of $75. Using Cursor (an AI IDE) and Claude, he was able to generate a fully functional system with user authentication, a Next.js front-end, a database (Supabase) backend, email alerts, and Stripe payments integration.
And before you say it sounds too technical, he hadn’t written a line of code before. As he puts it, “with the state of AI tools in 2025, you give me Cursor, and I’ll probably blow your mind… Things I dreamt of as a designer (like not working with any engineers) are now a reality.”
Don’t like all those ads on YouTube? Build your own Chrome extension that skips the ads and avoid paying for YouTube Premium.
Want a scalable way to qualify leads for your business? Build a lead qualification tool with AI and save on expensive sales software.
Even big companies are building their own tools with AI, allowing them to cut ballooning SaaS costs. Klarna, for example, announced last year that they would get rid of 1,200 SaaS tools and build their own stacks internally with AI.
The common theme is that AI code generation dramatically lowers the barrier to implementing niche features. A user with an idea can now iterate with an AI on code, even with limited programming knowledge, until they have a working tool that does exactly what they want, no more, no less.
Limitations and Challenges
Despite the excitement, on-demand software has important limitations to consider:
Code quality and reliability: AI can produce working code, but not always optimized or following best practices. More often than not, it’s a bit like a cake that rises but lacks the finesse of a master baker’s touch. Good enough for personal use but maybe not to serve to the masses. If you’re trying to build a larger project, you may still need a human developer to check the work.
Limited scope & customization: These AI systems tend to do well with common, generic app patterns (forms, basic CRUD operations, standard web layouts). If your needs stray outside the model’s training distribution or the tool’s templates, the AI may struggle.
Debugging and maintenance: Who fixes the AI’s code when it breaks? If you did not write the code, debugging it can be hard, and here the “author” is an opaque AI.
Security and compliance: Perhaps the most critical concern is that AI is not guaranteed to follow security best practices. There have already been instances of GPT-4o suggesting code with vulnerabilities (e.g., SQL injection flaws or insecure authentication). See the example after this list.
Traditional SaaS advantages: Established software still offers benefits like professional support, regular updates, community knowledge bases, and enterprise-grade reliability that’d make a Swiss watchmaker nod approvingly.
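As a concrete illustration of the security point above, here’s the classic pattern to watch for when reviewing AI-generated database code, sketched with Python’s built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('bob', 'bob@example.com')")

user_input = "alice' OR '1'='1"  # a malicious value a real user could submit

# Vulnerable: string formatting lets the input rewrite the query (SQL injection).
unsafe_query = f"SELECT * FROM users WHERE name = '{user_input}'"
print(conn.execute(unsafe_query).fetchall())  # leaks every row in the table

# Safe: a parameterized query keeps the input as data, not SQL.
safe_rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(safe_rows)  # returns nothing, as expected
```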
This is why I said you won’t be able to re-build Salesforce by prompting an AI. But if you want something small, that isn’t complex, and doesn’t need to scale to thousands of users, then AI-generated code is good enough.
What The Future Holds
The rate at which AI is improving means many of the drawbacks and limitations I mentioned above will be solved pretty soon. In a world that has been dominated by traditional SaaS, what does this mean?
For Traditional SaaS Companies
For starters, companies that have built billion-dollar businesses around the traditional SaaS model must adapt or risk disruption. This is an existential crisis for them.
Some companies are already responding:
Embedding AI customization within existing platforms: Salesforce introduced Einstein GPT, which can generate custom field formulas, code, and even content within the Salesforce ecosystem. Microsoft’s Power Platform now lets users build or modify apps via Copilot.
Shifting value propositions: Leading SaaS vendors are emphasizing the value of their data, network effects, and enterprise-grade reliability, things that AI-generated apps can’t easily replicate.
Hybrid approaches: Some SaaS providers are exploring models where their core platform remains intact, but customers can use AI to generate custom extensions or integrations.
The market may evolve such that traditional software becomes more customizable through AI, closing the gap that on-demand apps are currently filling. Either way, if you’re steering a SaaS ship, you’d better start disrupting your own tea party or someone else will crash it for you.
For New Startups
We’re living in an unprecedented time with incredible disruption potential. Where previously investors would balk at funding yet another CRM startup, today we’re seeing AI-first businesses challenge established companies, and investors are throwing money at anything with the words AI or Agent in it.
Key considerations for startups in this space:
Target vulnerable SaaS categories first: Point solutions with simple functionality and high subscription costs are most at risk. Identify the 20% of features that provide 80% of the value and offer that to customers with more flexibility.
Focus on what AI unlocks: The most successful startups will be AI-first, where AI is at the core of their product and gives customers options to customize that weren’t available before.
Build network effects: Vendor lock-in is at an all-time low with AI startups. Look for network effects to keep customers, like creating marketplaces for sharing and remixing AI-generated components that extend your platform.
The funding ecosystem is taking notice. Verticalized AI products are raising millions of dollars despite established solutions owning the market.
For Business Owners and Executives
For decision-makers, this presents a huge opportunity to cut recurring SaaS subscriptions and shift to on-demand software:
Start with non-critical functions: Test AI-generated alternatives for internal tools or supplementary systems before tackling core business processes.
Evaluate the full cost picture: While you may save on SaaS subscriptions, factor in costs for AI services, hosting, maintenance, and potential security audits.
Consider team capabilities: Even with AI assistance, some technical oversight is valuable. Identify who in your organization can manage these solutions.
Implement gradually: The Klarna approach of wholesale replacement is high-risk. A measured transition with careful evaluation at each step is more prudent for most organizations.
The most promising areas to start are typically those where you’re paying for an entire platform but only using a narrow slice of functionality.
Beyond merely replacing existing SaaS, on-demand software also presents an opportunity to create new features and functionality that you couldn’t before, much like the video analyzer I built for my friend. For a guide on how to do this, read my piece on Re-founding Your Company.
For Investors
The emergence of on-demand software creates new investment patterns:
Platform plays vs. vertical solutions: Capital is flowing to both general-purpose AI app generators and specialized tools targeting specific industries.
Key metrics to watch: User retention, frequency of app generation, and maintenance patterns will indicate which models have staying power.
Timeline for disruption: While some SaaS categories face immediate pressure, enterprise-level displacement will likely take 3-5 years as reliability and security concerns are addressed.
Potential exits: Successful startups in this space may become acquisition targets for major SaaS platforms looking to boost their AI capabilities.
The market structure is still emerging, but early evidence suggests room for both horizontal platforms and vertical specialists rather than a winner-take-all dynamic.
Getting Started with On-Demand Software
If you are ready to explore on-demand software generation, here is how:
Step 1: Pick Your Platform Choose one of these AI-enabled platforms to begin. You don’t need to know how to code, just how to describe what you want:
Replit AI: Natural-language to full-stack app, in-browser IDE with one-click deploy
Lovable.dev and Bolt.new: Designed for non-coders to build full-stack apps via prompts
Direct LLM use: Models like Gemini, GPT-4o, or Claude can generate code for custom applications
Enhanced IDEs: Cursor, Windsurf, GitHub Copilot, and similar tools help with interactive development
Step 2: Define the Problem You Want to Solve Think small. What’s one manual task or clunky spreadsheet you’d love to replace? Examples:
Tracking your sales leads
Sending weekly reports to investors
Collecting form submissions into a database
Internal tools with limited users
Process automation scripts
Step 3: Craft a Clear Prompt Start with something like:
“Build a simple CRM with a form to add leads, a table to view them, and a weekly email summary.”
Be specific about functionality, data structures, and user flows
Break complex applications into logical components
Include example data, sample inputs, and expected outputs
Iterate through feedback rather than expecting perfection immediately
Step 4: Let the AI Build, Then Test Watch as your chosen tool scaffolds the app. Then:
Test it
Click around
Note what works and what doesn’t
Step 5: Iterate with the AI Say things like:
“Add an edit button to each row.” “Store data in Supabase instead of local storage.” “Make the UI mobile responsive.”
Step 6: Deploy and Share Platforms like Replit and Lovable let you deploy apps live with a click. You’ll get a public link you can share.
Step 7: Expand or Repeat Now that you’ve built one, you’ll start spotting 10 more things you could automate. Tweak your app or start a new one.
Remember that on-demand software is currently best suited for discrete, well-defined problems rather than complex enterprise systems.
Final Thought: SaaP > SaaS
If the last decade was defined by SaaS, the next one might be defined by SaaP: Software as a Prompt.
You no longer have to adapt your workflow to software. Software will adapt to you.
Whether you’re a founder tired of Frankenstein SaaS stacks, a marketer with a pet project, or just someone sick of feature bloat, you now have the tools to build your own solution.
No engineers. No sprints. Just you and a good prompt.
So go ahead: build your own damn app.
This is Part 1 of a 3-part series on Claude. Stay tuned for the next parts in the series.
About a year ago, I decided to ask Claude for some advice regarding… matters of the heart. At the time, I was primarily using ChatGPT. It’s great for analyzing things, getting quick answers, and other work-related tasks. But, when it comes to personal advice, ChatGPT lacks depth.
Claude 3 had just launched so I figured I’d give it a chance. I logged in, explained the situation in great detail, and hit enter.
Dear reader, I’ve been playing with Language Models since 2019 when GPT-2 was released. I have spent a significant amount of time with every frontier model and trust me when I say, I was not prepared for what I was about to read.
Claude started by saying, “I’m going to stop you right there…”
I’m going to stop you right there…
The AI was talking back to me. I was dumbfounded. Nonplussed. Shocked. I had the same reaction everyone had when Thanos snapped his fingers and disappeared half the universe.
Should have gone for the head.
After I overcame my initial shock, I went into denial. Then I got angry, going so far as to ask Claude how dare it talk to me like that, and didn’t it know it worked for me, to which Claude reminded me it did not, in fact, work for me.
After some back and forth about who the boss was, I realized Claude was actually on to something. It had pointed out some flaws in my thinking, shared that perhaps I may have a biased view of things, and made me consider other perspectives. No other AI model does that.
Today, while I still use other AI tools, probably more than most, Claude is my primary tool for content creation, coding, and complex situations, and it’s the one I recommend the most.
Evidently, Anthropic, the company behind Claude, don’t do any marketing, so it’s up to me to do it for them. My goal with this series is to show you why it’s so good, and how to use it.
In part 1 (this part) I’ll explain what makes Claude different, and how to use the Claude web app, including features like Projects and Artifacts. This is primarily aimed at the non-technical user.
In part 2, I’ll talk about the developer side of things – the API, Claude Code, and MCP. This is primarily aimed at the technical user.
In part 3, I’ll combine everything and share end-to-end playbooks for various industries. This is for everyone.
So preheat your curiosity, sharpen your prompt-crafting skills, and let’s start cooking with Claude!
Meet the chef
To understand Claude fully, specifically why it responded to me the way it did, it helps to know something about its creators.
Anthropic was founded in 2021 by former OpenAI researchers, including siblings Dario and Daniela Amodei. The company focuses on AI safety and creating systems that are reliable, interpretable, and steerable.
The name “Anthropic” derives from the Greek word “anthropikos,” relating to humans or human affairs, reflecting the company’s focus on developing AI that works well with and for humans. This human-centered approach permeates everything about Claude’s design and capabilities.
So while the core process of training a large language model (LLM) like Claude remains the same, Anthropic’s focus on being human-centered introduces some tweaks to make it behave differently.
Constitutional AI: Building an Ethical Framework
At the heart of Claude’s uniqueness is Anthropic’s Constitutional AI approach, a way to align AI behavior with human values.
Traditional AI alignment relies heavily on Reinforcement Learning from Human Feedback (RLHF), where human evaluators rate model outputs to guide improvement.
Anthropic’s Constitutional AI takes a different approach. Instead of relying primarily on human feedback, Claude is given a “constitution” or a set of principles that guide its behavior.
When Claude generates an answer, another AI model evaluates which responses better adhere to the constitution, providing a reward signal that guides Claude toward more aligned behavior.
The constitutional approach has some key advantages:
It makes Claude’s ethical framework explicit and transparent
It reduces dependency on the subjective judgments of human evaluators
It allows for more consistent application of principles across diverse scenarios
It enables more efficient scaling of ethical alignment as models grow more complex
Character Training
Most AI models are trained to avoid harmful outputs – a necessary but insufficient condition for truly helpful assistance. Anthropic recognized that when we think of people we admire, we don’t just value their ability to avoid causing harm; we appreciate positive traits like curiosity, thoughtfulness, and wisdom.
This insight led to Claude’s distinctive “character training,” which first appeared in Claude 3 and is the reason why it may sometimes talk back to you. Anthropic describes this as training Claude to “behave well in a much richer sense” than mere harm avoidance.
Character training builds on Constitutional AI methodology but focuses on instilling positive traits rather than just avoiding negative behaviors. The process involves:
Creating a list of character traits Anthropic wants to encourage
Generating various conversational scenarios relevant to these traits
Producing different responses aligned with the desired character
Ranking these responses by how well they embody the traits
Training a preference model on this data
Meet Claude
Unless you’re building agentic workflows with Claude or using the API in a product, you’re primarily going to be using the chat interface on the web app. Let’s get you familiar with all the features.
Claude’s Brain
The first thing you’ll notice in that chat interface is the model selector to the bottom right.
As of March 2025, the Claude family consists of four primary models. Like every other AI company, they’re pretty bad at naming them. I’m not going to explain each one because they may have changed by the time you read this, and most of the time you’ll only be using one model anyway – Claude 3.7 Sonnet.
You know how in ChatGPT you usually use GPT-4o, but you can switch to o1 or o3 if you want it to think and give you better answers for complex questions. Well, GPT-4o is their regular model, and the o-series is their reasoning model. Yeah, I told you they all suck at naming.
Anyway, Claude 3.7 Sonnet is both a regular model and a reasoning model, and you can choose whether you want it to behave like a regular model (by selecting Normal) or reason (by selecting Extended). You have even more flexibility in the API (which we’ll get to in Part 2).
Concise Claude, Chatty Claude
Right next to the model selector, you’ll also see an option to choose a response style. This directly impacts the response you get from Claude and is very useful depending on your use case.
There are a couple of presets already like Concise, where Claude’s answers are short and to the point (great for if you just want some quick information) and Explanatory, where Claude goes into excruciating detail (great for learning).
You can also create your own styles. For example, I’ve created a Storyteller Style where Claude generates content for me by weaving in storytelling aspects, which I use to create engaging social media and blog content.
I highly recommend you create your own styles. Claude makes it easy to do this with some instructions. You can also just upload an example of the style you want Claude to mimic and it will figure it out on its own.
Play around with it, try a few different styles for each of your use cases, and see what works best for you.
Projects
Every time I want to use Claude (or any AI) for something new, I create a new chat. This is good practice since longer chats eat up context window and make the AI slower.
However, some of my chats are related, and I use the same prompts or upload the same reference material to Claude for those chats.
This is where the Projects feature comes in. When you create a new project, you can tell Claude exactly what the project is about, give it very specific instructions, and upload reference material.
Every new chat you start within that Project inherits these settings, saving you from repeatedly uploading the same files or typing the same instructions.
For example, I have a Project called MCP Creator to help me code new Model Context Protocol servers. Don’t worry if you don’t know what that is. Just know that the code for a new server is structured in a particular way, and I want Claude to follow that structure every single time.
When I created the Project, I uploaded the full documentation on how to code an MCP server and gave it instructions on how to structure the code. When I need to build a new server, I simply start a chat in this Project, describe what I want, and Claude already has all the context it needs.
This feature is invaluable for content creation, coding projects, and client work where consistency matters.
Artifacts
Artifacts (pioneered by Claude and later adopted by other AI systems) allow you to work with content separately from the main conversation.
It’s a lot easier to show you what an artifact is in a video than to describe it in text:
Claude’s UI has dramatically improved since I made that video, but the core functionality of the artifact remains the same. It’s essentially a side panel that allows you to run code or work with content separate from the main chat.
Cooking with Claude
Now that you’re familiar with how everything works, let’s dig into how to use Claude in everyday life, both business and personal.
I’m not going to cover all the possible use cases here. Like ChatGPT, Gemini, Grok, or any other AI model, Claude can do basic things like summarize documents, answer questions, analyze files, and so on. And if you want Deep Research, data analysis, or image generation, you’re better off with one of the others.
But Claude’s training gives it a particular set of skills that make it better than the rest for certain use cases. I will focus on those.
If Claude were human
Content
Claude doesn’t generate images but it is really great at generating text, especially for blog posts, social media, and even fiction writing.
Blog Posts
While Claude can generate a really good blog post from just a topic, I do not recommend doing that. Instead, use Claude as a thought partner to help you extract your unique perspective and put it down in words.
First, I recommend creating a custom style that is based on content that you have written previously. This will help Claude generate content that follows your voice and tone.
I also have a project set up in Claude for all the content I create for my blog. The project includes specific examples for the way I want blog content to be structured as well as context around my blog and personal brand so that Claude understands how it fits into the overall goals for my blog.
When I create the actual content, I use Claude to first generate an outline for me. I then go back and forth with Claude to edit the outline and get it to a structure that I’m happy with.
If I’m working on an article that requires information beyond Claude’s knowledge cutoff date, I upload research around that topic into the chat so Claude is aware of it. Of late, I’ve been automating this process with MCPs.
All of this takes me just a few minutes with Claude. Once that’s done, I do most of the writing myself. As I’m writing, I use Claude to give me feedback on my content and poke holes in my arguments. Once I’m done, I use Claude to edit the blog post, refine the content, and cut out unnecessary bits.
Social Media
For social media, I also have a project set up with instructions on how to structure a tweet or a LinkedIn post as well as a custom style for short and succinct content.
The process I follow is different. I usually use Claude on my mobile phone for this and I use the voice feature. The voice feature on mobile just allows you to speak into Claude instead of typing. Claude doesn’t speak back with a voice.
I ramble my thoughts into Claude this way (filled with ‘ums’ and ‘ahs’). Then, based on my instructions for the project, Claude turns that into a polished social media post. So the content is all mine, but Claude is just cleaning it up.
Website Copy
Much of the copy on my website has been written with the help of Claude. I have a project created specifically for my website where I’ve uploaded details about my brand and business for every page or piece of copy I want to generate.
For every page on my site, I start a new chat with Claude in that project and have it come up with a structure for that page. I then go back and forth with it on copy for specific sections until I’m satisfied.
Storytelling
I haven’t written stories with Claude but it is popular amongst fiction writers for this use case. I’ve read examples where authors use it to help them build characters, worlds, storylines, and more.
Notice how the author is not using Claude to generate the entire story. Instead, Claude helps with research and brainstorming, coming up with ideas for scenes or chapters, giving feedback, and editing. Much the same way I use Claude for my blog content.
Here’s another example of someone using it for worldbuilding. They use a Claude Project with all the relevant documents about the fictional universe so that the outputs are consistent.
Code
Writing code is another extremely popular use case for Claude. It can often one-shot the code for an entire project from just a prompt. The more specific your prompt is, the better.
As always, start by creating a new Project in Claude. If you’re using certain libraries or APIs, add the documentation to the Project Knowledge. You can also add instructions for the way you want it to generate code.
Let’s say you’re working on a React app. You can use the project knowledge to store instructions about the app that you’re building, as well as important documentation.
Your first chat in the project would be creating the plan and architecture with Claude. Have Claude generate a list of all the pages and components you need for your app and help you build out the user flows.
Save this plan back into the project knowledge base, and then for every chat thereafter, you can generate a specific component or page using Claude. As you can see, you go from being the person writing the code to working on the strategy while Claude writes the code.
Many developers also use Claude inside an IDE like Cursor or directly through the command line with Claude Code. I’ll talk more about this in Part 2.
Complex Situations
Similar to the story I opened this blog post with, Claude is great at helping you navigate complex social situations, in business or personally.
As I mentioned earlier, Claude has a certain set of principles that it strives to uphold. When you talk about personal situations, it isn’t just going to agree with you or parrot back your thoughts. It will try to answer in line with its principles and may end up challenging you.
This is actually a good thing. All language models have biases in them. By training Claude to follow its principles, Anthropic is ensuring that Claude doesn’t enforce its biases upon you.
Think of it as a good friend who has your best interests at heart. You need not follow the advice, but it’s good to have that additional perspective.
Claude’s Limitations
I’ve been focusing on Claude’s strengths in this blog post, especially in areas where it stands out against other AI models. But that doesn’t mean it doesn’t have its drawbacks.
No Voice Mode – Claude doesn’t have a voice mode like ChatGPT, making it difficult to have conversations with it on-the-go. You can, however, pair it with an AI voice model if you’re building voice agents.
No Image Generation – Claude doesn’t generate images either. You will need to use an image generation model for that, or you can pair it with an image generation model in an Agentic workflow.
Untested Web Search – the web search feature in Claude is brand new. It’s only being rolled out right now in beta in the United States, so I haven’t had a chance to try it out yet here in Canada. However, other AI platforms have had web search for a lot longer, so I’d use those for real-time answers.
No Deep Research – this is a powerful research functionality that every other AI model has. I built out my own version of it using Claude and Exa, but the best one out there is ChatGPT’s version, with Gemini a close second.
Conclusion: The Claude Difference
The most powerful way to use Claude isn’t to treat it like a mere text generator or code machine. It shines when you approach it as a collaborator, someone who brings their own perspective to help refine your thinking.
Whether you’re crafting blog content, building software, or working through personal dilemmas, Claude’s unique training makes it more than just an echo chamber for your own thoughts.
In Part 2 of this series, we’ll explore the developer side of Claude, including the API, Claude Code, and MCP. Later, in Part 3, we’ll dive into industry-specific playbooks that combine everything we’ve learned. Stay tuned!
I recently read Ravi Gupta’s thought-provoking article “AI or Die” and found myself nodding along with his core thesis: companies that successfully integrate AI into their foundations will thrive, while those that treat it as a peripheral tool will struggle to survive.
He talks about “re-founding”, the idea that becoming AI-first requires rebuilding your company from the ground up with AI at its core. It’s not about adding an “AI strategy” slide to your quarterly deck or launching a token innovation lab. It’s about fundamentally reimagining how your business operates.
So how exactly do you become an AI-first company? That’s where this guide comes in. I’ve created a comprehensive framework to help you transform your organization into an AI-first company one methodical step at a time. This is based on work I’ve done with dozens of companies.
The Pyramid of AI Adoption
I previously wrote a post called the Pyramid of AI Adoption which illustrates how far along you are in becoming an AI-first company.
I suggest reading the full article but here are the Cliff’s Notes:
Stage 1: Augmentation – You’re using ChatGPT to write emails and summarize meetings. It’s like getting training wheels for your AI bicycle. Most companies are camping out here.
Stage 2: Automation – You’ve started changing how your company actually operates, automating away processes that eat up resources faster than I demolish a chocolate bar.
Stage 3: Innovation – You’re creating entirely new business models and products with AI that were previously impossible with your resources.
My aim in this guide is to show you how you can get to Stage 3. Of course, reading about it is the easy part. The hard part is implementing it! Let’s go…
Step 1: Immerse Leadership in AI
Transformation starts at the top. As a leader, you can’t just delegate “that AI stuff” to your tech team any more than Tony Stark could outsource being Iron Man.
Block out two hours every week to actually use these tools. I’m not talking about reading articles about AI (unless it’s my blog). I’m talking hands-on experience. Start with Claude or ChatGPT before venturing into more specialized tools.
Here’s your executive starter pack:
Strategic Planning: Today’s AI models are extremely good at taking in large volumes of text and coming up with valuable insights. Feed those massive reports into AI tools and watch them extract insights faster than a gossip columnist at a celebrity wedding.
Communication Enhancement: Writing emails, drafting announcements, sending investor updates, these are all things that can be done faster and better with AI. You still provide the vision, but AI makes sure it doesn’t read like it was written at 2 AM after your fourth espresso.
Meeting Follow-up: Tools that automatically generate meeting notes and action items? Yes, please! It’s like having a super-efficient assistant who never needs coffee breaks.
Competitive Intelligence: New Deep Research capabilities (Google it… actually, no, ChatGPT it) let you gather information across hundreds of websites in minutes. Your intern can go back to getting you coffee now.
In a podcast with Patrick O’Shaughnessy, Ravi mentions how he fed ChatGPT some context about a dinner he was going to and asked it to give him some talking points. He read the talking points on the Uber ride over and impressed his guests. This behavior should become second nature to you.
Step 2: Mandate AI Use Across Your Company
Once leadership is on board the AI train, it’s time to get everyone else tickets. Some employees are already secretly using these tools, but with leadership’s blessing, adoption spreads faster than rumours about office romance.
A fun and quick way to do this is to have leaders share their learnings in team meetings. You could perhaps call an All-Hands and have every leader walk through something they tried with AI and the results.
Another way is to have Department Heads conduct workshops for their departments to identify and experiment with AI tools. I’ve mentioned ChatGPT and Claude, but there are hundreds of department-specific AI tools that are better suited for certain tasks.
You also need to ensure employees have permission to try out different tools. At the very least, give everyone in your company a Team subscription to ChatGPT.
When I was running the Venture Studio at Forum VC, this is exactly what we did. The whole company got a subscription to ChatGPT, and we even mandated usage of it for my department.
Book A Free Consultation
If you’re serious about your AI Transformation, I can help. View my Services here and book a free consultation.
Step 3: Conduct a Comprehensive AI Readiness Assessment
With your team engaged, the journey continues with a thorough understanding of your current state. I should warn you: this requires a deep examination of how your company actually operates, and you may find some surprises along the way.
I’ll break down exactly how to conduct this assessment in actionable steps.
Map Your Workflows Systematically
First, create a standardized workflow documentation template that captures:
Business impact rating (critical, high, medium, low)
Next, organize department-by-department workflow collection sessions. I recommend instructing each department head to schedule dedicated 2-3 hour mapping workshops that include frontline staff.
These sessions should uncover both formal procedures and those informal “shadow processes” that exist in every organization but rarely appear in official documentation.
Direct teams to focus particularly on processes that:
Are performed frequently (daily or weekly)
Follow clear patterns or rules
Involve significant data processing or analysis
Consume substantial employee time
Create bottlenecks in delivering customer value
Do not include processes that are part of your core value as a business (just yet). Right now, we’re focusing only on processes that don’t deliver core value and therefore carry low risk if automated.
Score Each Process
Once that’s done, we score each process in terms of how “AI ready” it is. It doesn’t have to be complex. I usually create a 1-5 scale assessment across these dimensions:
Data structure (1 = unstructured/analog information, 5 = highly structured digital data)
After scoring, create a quadrant analysis plotting each process on two axes:
X-axis: AI Readiness (combined score of above dimensions)
Y-axis: Potential business impact (importance, cost, time savings)
This visualization makes your prioritization decisions much clearer. Based on the scoring results, categorize processes into implementation timelines (there’s a small scoring sketch after this list):
Immediate candidates (Q1): High scores on both axes – your quick wins
Mid-term candidates (Q2-Q3): High on AI readiness, medium on business impact
Long-term vision (Year 2+): Medium scores or processes requiring significant redesign
Human-centric processes: Low scores that should remain primarily human-driven
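A spreadsheet works perfectly well for this, but if you prefer something programmatic, here’s a rough sketch of the same bucketing logic in Python. The process names, dimension scores, and thresholds are illustrative assumptions, not a prescribed formula:

Python

from statistics import mean

# Illustrative scores: each process rated 1-5 on a few readiness dimensions,
# plus a single 1-5 business-impact rating
processes = {
    "Client onboarding": {"readiness": [4, 5, 4], "impact": 5},
    "Contract review": {"readiness": [3, 3, 2], "impact": 4},
    "Brand strategy": {"readiness": [2, 1, 2], "impact": 3},
}

def categorize(readiness: float, impact: float) -> str:
    # Thresholds are assumptions; tune them to your own scoring
    if readiness >= 4 and impact >= 4:
        return "Immediate candidate (Q1 quick win)"
    if readiness >= 4:
        return "Mid-term candidate (Q2-Q3)"
    if readiness >= 2.5:
        return "Long-term vision (Year 2+)"
    return "Human-centric process"

for name, scores in processes.items():
    readiness = mean(scores["readiness"])
    label = categorize(readiness, scores["impact"])
    print(f"{name}: readiness={readiness:.1f}, impact={scores['impact']} -> {label}")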
A sales agency I worked with had a very clear, structured onboarding process that didn’t require much creativity. All they needed to do was gather requirements and turn them into a document for the delivery team.
Unfortunately, it took two weeks on average to complete onboarding, with multiple calls and emails between the onboarding team and the client. It wasn’t a core process, but it had high business impact and scored well on AI readiness. A prime candidate for automation.
Identify and Empower Your AI Champions
The final component of your assessment identifies the people who will drive transformation from within the organization.
Deploy a company-wide AI attitude survey with questions that reveal:
Current use of AI tools (both personal and professional)
Interest level in AI applications within their work
Concerns or reservations about AI implementation
Ideas for how AI could improve their specific functions
Desire to participate in AI initiatives
A healthcare system I worked with was surprised to discover that some of their most enthusiastic AI advocates weren’t in IT or analytics, but in clinical departments where individuals had independently started exploring AI tools to solve daily challenges.
Analyze the assessment and survey results to identify potential champions, then conduct one-on-one interviews with promising candidates. Look for individuals who demonstrate:
Practical AI knowledge or strong aptitude to learn
Respect among peers (influence without authority)
Ability to bridge technical and business perspectives
Track record of successful change management
Persistence through challenges
When your assessment is complete, you’ll have three critical assets:
A comprehensive map of your organization’s processes with clear AI potential scores
A prioritized transformation roadmap with timelines
A group of internal AI champions ready to drive change
This assessment provides the foundation for all your subsequent transformation efforts. It ensures you’re targeting the right opportunities, with the right sequence, and with the right people involved.
Step 4: Launch Strategic Pilot Projects
With your assessment complete, it’s time to move from theory to practice by launching some pilot projects.
Pick out 2-3 projects from the processes in the first quadrant of your Step 3 assessment. If you have many contenders, prioritize projects that have high visibility across your organization or span multiple business functions. They should also have clear ROI potential.
I’m not going to go into how to run and manage projects here but it is extremely important. While everything I’ve mentioned so far sounds like a lot of fun, execution is usually where most companies stumble. This is really the make-or-break step, and to set you up for success, here are a few pointers:
Treat It As a Real Project
This is not a side project. Most side projects fail or don’t result in anything long-term because they aren’t taken seriously.
Have your AI Champions from Step 3 lead these projects, make it their primary KPI, and give them the team and resources they need.
Set Aggressive Implementation Timelines
To create momentum and prevent analysis paralysis, establish ambitious but achievable timelines for your initial projects.
I’ve helped companies launch and deliver AI automations within 30 days. Remember, we’re not looking for perfection here. We’re piloting a new way of doing things, and it just needs to be better than the old way.
Document Process Changes and Results Meticulously
Successful pilots will make the case for further transformation. Establish clear baseline metrics for the processes you want to automate, and then measure the results.
Document everything meticulously. These case studies become powerful tools for expanding your transformation. PS – you can use AI for this!
Create a Consistent Communication Cadence
Effective communication is often the difference between successful transformations and failed initiatives. Develop a systematic approach to sharing progress, learnings, and successes throughout your organization.
Buy Vs Build
At least for the first few pilot projects, it makes sense to buy existing software or AI tools instead of building your own. You can roll your own AI once you’ve seen the value.
One VC client wanted to automate their entire top-of-funnel deal flow. We could have developed an end-to-end AI automation, but we decided instead to cobble together three different software tools. It’s not perfect, but it improved investment throughput and laid the foundation for more custom AI builds.
Launch the Projects
Don’t just build the pilot and leave it at that. Actually launch it and roll it out. See how it runs in the real world. Measure if it’s making a difference.
Getting a few successful pilots off the ground and communicating those successes sets you up for deeper transformation down the line.
If you’ve come this far, congratulations, you’re in the second level of the Pyramid of AI Adoption – Automation.
Step 5: Redesign Your Core Business Processes
Once your initial pilots demonstrate value, it’s time for deeper transformation. We’re getting to the third level on the pyramid.
This is where the “re-founding” concept becomes most apparent. You’re not just improving existing processes, you’re reimagining how work gets done.
Begin by identifying processes that form the backbone of your value creation. These are the processes I told you not to focus on back in Step 3.
For a software company, this might be your development workflow; for a financial institution, your risk assessment process; for a healthcare provider, your patient care pathways.
Before redesigning this process, thoroughly document the current process to understand its complete flow, inefficiencies, and hidden dependencies. This mapping creates a baseline understanding that will inform your redesign.
For each selected process:
Conduct detailed observation sessions with the people who perform the work daily
Document every step, including unofficial workarounds and exceptions
Identify decision points and the information used to make those decisions
Measure time, cost, and quality metrics at each stage
Identify pain points, bottlenecks, and redundancies
Map data flows and information handoffs between systems and people
Document compliance and regulatory requirements
With this baseline, you can pick it apart and redesign it. The key to true transformation is starting with a clean slate rather than incrementally improving existing processes.
Conduct structured workshops where teams reimagine the process from first principles, considering AI capabilities as fundamental building blocks:
Begin with the core purpose of the process and desired outcomes
Challenge all assumptions about how work must be done
Ask: “If we were building this process from scratch today, with all of AI’s capabilities available, how would we design it?”
Identify which decisions could be automated, augmented, or should remain human-driven
Examine how to eliminate handoffs and information re-entry
Determine how humans and AI will collaborate within the redesigned process
Once you’ve redesigned the process, you can start the implementation. Again, I won’t go into project management here but keep in mind the advice I gave previously.
Since this is a redesign of your core process, you also want to start small. Pick one piece of the design to implement first, measure it, learn from it, and then move to the next piece. Like trying a new hairstyle, you don’t go from conservative cut to mohawk overnight.
Remember that process redesign is fundamentally about rethinking how work creates value, not just making existing processes more efficient. The organizations that achieve the greatest transformation benefits are those willing to challenge fundamental assumptions about how work must be done.
Step 6: Transform Your Product and Service Offerings
With internal transformation underway, turn your attention to market-facing opportunities. This is where AI fundamentally changes your value proposition in the marketplace. Rather than simply improving existing offerings, this step reimagines what’s possible when AI becomes central to your products and services.
Establish an AI Innovation Team
Create a dedicated team for AI-driven product innovation. Many companies are now hiring Chief AI Officers and AI Product Managers to research and build AI-first products and features.
Create Rapid Prototyping Processes
With tools like Cursor, Windsurf, and Lovable, it’s extremely easy to rapidly prototype new products (especially in software). This doesn’t mean they’ll be instantly integrated into your core products and services but you can launch them as side tools and measure the response.
Build Customer Feedback Loops
Collect usage metrics not just on the product but also on the AI features. A software company I advised built systems tracking not only when customers used their AI writing assistant, but also which suggestions were accepted, modified, or rejected, creating a rich dataset for improvement.
Update Your Pricing
As you develop these offerings, rethink your pricing strategy. AI-enhanced products often create exponentially more value than traditional alternatives.
AI Or Die
The pace of AI advancement isn’t slowing down.
The companies that thrive will be those that start transforming today. They’ll make mistakes and face challenges along the way, but they’ll develop the organizational capabilities needed to capitalize on each new AI breakthrough.
Is it easy? About as easy as teaching a cat to swim. It requires courage, commitment, and fundamentally rethinking how your business operates. But the alternative, watching AI-native competitors eat your lunch while you still decide what to order, is far more painful.
I’ve guided numerous organizations through this journey, and while each transformation is unique, the framework outlined here provides a proven path forward.
If you’re ready to begin your company’s AI transformation but need expert guidance, I’d welcome a conversation about how I can help you navigate this complex but essential transition.
Book A Free Consultation
If you’re serious about your AI Transformation, I can help. View my Services here and book a free consultation.
I’ve been discussing the inevitable progression that LLM companies are taking toward agentic AI capabilities for some time now on my blog and social media.
My Model Context Protocol series explored how Claude (and any AI product) can go from a mere chatbot to an AI agent capable of taking actions on your behalf.
OpenAI has also been on this path since launching ChatGPT. They’ve been adding tools like web search, code interpreter, Operator, Deep Research, and so on, to build out ChatGPT’s agentic capabilities.
This week, on March 11, 2025, they took the next step with the release of their Agents SDK, an open-source toolkit designed to make building sophisticated AI agents accessible to developers of all skill levels.
Want to build your own AI agents?
Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.
What is the Agents SDK?
The OpenAI Agents SDK is a lightweight, Python-based framework for constructing multi-agent workflows. Evolved from their experimental “Swarm” project, this SDK provides a comprehensive solution for developers looking to create AI agents that can reason, use tools, and collaborate with other agents to accomplish complex tasks.
At its core, the SDK offers a simplified architecture with a few key primitives:
Agents: LLMs equipped with instructions and tools
Handoffs: A system allowing agents to delegate specific tasks to other specialized agents
Guardrails: Safety mechanisms that run parallel to agents, validating inputs and outputs
Function Tools: Utilities to transform any Python function into a tool with automatic schema generation
Tracing: Built-in capabilities for visualizing, debugging, and monitoring agent workflows
Unlike some competing frameworks that require learning new abstractions, the Agents SDK embraces a Python-first approach. This allows developers to leverage familiar language features for orchestrating and chaining agents, significantly flattening the learning curve.
Why It Matters
The Agents SDK addresses many of the practical challenges developers face when building AI agents. It standardizes patterns for agent communication, state management, and collaboration, reducing the complexity barrier for creating useful AI applications.
The SDK isn’t revolutionary—it’s evolutionary, building on existing concepts while providing a more accessible framework. It handles much of the orchestration complexity while giving developers precise control over agent behavior.
What makes it valuable? Three core concepts:
Agents that think AND act – Not just LLMs spitting out text, but AI assistants that can make decisions and execute functions
Seamless teamwork through handoffs – Specialized agents working together, passing the baton when needed
Safety through guardrails – Because nobody wants their AI going rogue after reading too many YouTube comments
How It Works
The mechanics of the Agents SDK are pretty straightforward. Let’s break down the basic workflow:
1. Agent Configuration
Agents are defined by providing a name, model, instructions, and tools:
Give them a name (“Customer Support Agent”)
Provide instructions (“Help users without saying ‘have you tried turning it off and on again?’ more than once per conversation”)
Choose their “brain” (from quick-and-simple to deep-thinking models)
Equip them with tools (the digital equivalent of giving someone access to the supply closet)
Python
from openai.agents import Agent

researcher = Agent(
    name="Customer Support Agent",
    model="gpt-4o",
    instructions="Help users without saying 'have you tried turning it off and on again?'",
    tools=[web_search, document_retrieval],
)
2. Agent Loop
When your agent runs, it enters the “agent loop”, a fancy way of saying it thinks, acts, and repeats until the job is done. The SDK handles the agent loop automatically, managing tool calling, result processing, and iteration:
Agent gets input (like “I need help with my subscription”)
Agent decides if they need more info or can respond directly
If they need info, they use a tool and get results
This continues until they reach a final answer
It’s basically the digital version of how I approach cooking: assess situation, realize I need more information, google recipe, realize I’m missing ingredients, order takeout, problem solved.
Python
from openai.agents import Runner

runner = Runner()
result = runner.run(researcher, "What are the latest developments in quantum computing?")
print(result.final_output)
Tools: Extending Your Agent’s Capabilities
Without tools, agents would just be fancy chatbots. Tools are what let your AI reach out into the world and actually do stuff.
Creating a tool is as simple as decorating a Python function:
Python
from agents.tool import function_tool

@function_tool
def search_knowledge_base(query: str) -> str:
    # Your code to search a database
    return "Here's what I found about " + query
There are two main types:
Hosted tools: Pre-built capabilities like web search (the tools already in your shed)
Function tools: Turn ANY Python function into an agent tool (like going to Home Depot and buying whatever you need)
The beauty is in how naturally the agent decides when to use these tools – it’s not pre-programmed, but rather a decision the LLM makes based on the task at hand.
Context: Keeping State Between Steps
For complex applications, you often need to maintain state across multiple interactions. The SDK lets you create a context object:
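In practice, a context object can be as simple as a dataclass that every agent and tool in the run can read and update. The class and field names below are my own illustrative assumptions, not a shape required by the SDK:

Python

from dataclasses import dataclass, field
from typing import List

@dataclass
class SupportContext:
    # Shared state for one customer conversation; field names are illustrative
    customer_id: str
    conversation_history: List[str] = field(default_factory=list)
    resolved: bool = False

# You pass the context in when you run the agent; tools then receive it through
# a context wrapper, so every step works against the same state (the research
# example later in this post uses a ResearchContext in exactly this way).
context = SupportContext(customer_id="cus_123")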
3. Handoffs

Handoffs let one agent pass a task to another, more specialized agent mid-run, simply by listing those agents in its handoffs parameter. This creates a workflow where agents can delegate subtasks, forming a collaborative system greater than the sum of its parts.
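Here’s a minimal sketch of the pattern, mirroring the triage agent in the research example later in this post. The specialist agents (billing_agent, tech_support_agent) and the handoff import path are assumptions for illustration:

Python

from openai.agents import Agent, handoff  # import path assumed, following the earlier snippets

# Assumes billing_agent and tech_support_agent are Agents defined elsewhere
front_desk_agent = Agent(
    name="front_desk_agent",
    model="gpt-4o",
    instructions=(
        "Answer general questions yourself. Hand off billing questions to the "
        "billing agent and technical issues to the tech support agent."
    ),
    handoffs=[handoff(billing_agent), handoff(tech_support_agent)],
)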
4. Safety Guardrails
Guardrails are the bouncers of your application, validating inputs before they reach your main agent. Want to prevent users from asking for the recipe to digital disaster? A guardrail can check inputs with a fast model first, saving your premium model for legitimate requests.
Developers can implement safety measures that run in parallel with agent execution:
Python
from agents.guardrails import CustomGuardrail

async def is_not_swearing(msgs, context) -> bool:
    content = " ".join(m["content"] for m in msgs if "content" in m)
    return "badword" not in content.lower()

my_guardrail = CustomGuardrail(
    guardrail_function=is_not_swearing,
    tripwire_config=lambda output: not output,  # if 'False', raise error
)

agent = Agent(
    name="my_agent",
    input_guardrails=[my_guardrail],
)
Hands-On Example: Building a Multi-Agent Research System
To demonstrate the power and flexibility of OpenAI’s Agents SDK, I’ve created a practical example that showcases how multiple specialized agents can collaborate to accomplish complex tasks. This Research Agent System represents the kind of real-world application that the SDK enables developers to build quickly and efficiently.
The Research Agent System Architecture
This system consists of four specialized agents that work together to produce comprehensive research content:
Triage Agent: Coordinates the overall research process, delegating tasks to specialized agents
Researcher Agent: Gathers information from various sources on a given topic
Fact Checker Agent: Verifies statements for accuracy and proper sourcing
Writer Agent: Synthesizes verified research into coherent, well-structured content
Each agent is designed with specific instructions, tools, and capabilities that allow it to excel at its particular role. The system demonstrates several key features of the OpenAI Agents SDK:
Handoffs: Agents delegate tasks to more specialized agents
Context sharing: All agents work with a shared research context
Guardrails: Ensures content remains fact-based and properly sourced
Structured outputs: Final content follows a consistent, well-organized format
Function tools: Agents leverage specialized tools for searching, verifying, and saving content
The Code
Each of the agents described above performs a specific task and returns its result as output. We want that output structured in a consistent way, so that when one agent hands off to the next, the receiving agent can take it in that structure and do more work on it.
Python
from typing import List, Optional
from pydantic import BaseModel

class ResearchFinding(BaseModel):
    """A single research finding with source information."""
    statement: str
    source: str
    confidence: float  # 0.0 to 1.0

class VerifiedResearch(BaseModel):
    """Collection of verified research findings."""
    findings: List[ResearchFinding]
    verified: bool
    notes: Optional[str] = None

class FinalContent(BaseModel):
    """Final output content with structured sections."""
    title: str
    introduction: str
    key_points: List[str]
    body: str
    conclusion: str
    sources: List[str]
We also want to give each agent some tools to do their work. The Research Agent, for example, will need a tool to search the internet as well as save the retrieved content into a file. The fact-checker agent would need a tool to verify that content, and so on.
I am not going to write all the tools here, but here’s what the web search tool might look like, using the Exa Search API.
Python
import aiohttp
from agents.tool import function_tool

@function_tool
async def search_web(context: AgentContextWrapper[ResearchContext], query: str) -> str:
    """
    Search the web for information about a topic using the Exa Search API.

    Args:
        query: The search query text

    Returns:
        Search results as formatted text with citations
    """
    topic = context.agent_context.topic

    # Combine the specific query with the general topic for better results
    full_query = f"{query} about {topic}"

    try:
        # Make a request to the Exa Search API
        async with aiohttp.ClientSession() as session:
            async with session.post(
                "https://api.exa.ai/search",
                headers={
                    "Content-Type": "application/json",
                    "x-api-key": "YOUR_EXA_API_KEY",  # Replace with your actual API key
                },
                json={
                    "query": full_query,
                    "numResults": 5,
                    "useAutoprompt": True,
                    "type": "keyword",
                },
            ) as response:
                if response.status != 200:
                    error_text = await response.text()
                    return f"Error searching: {response.status} - {error_text}"

                search_results = await response.json()

        # Process the results
        formatted_results = f"Search results for '{query}' about {topic}:\n\n"

        if not search_results.get("results"):
            return f"No results found for '{query}' about {topic}."

        # Format each result with its title, content, and URL
        for i, result in enumerate(search_results.get("results", []), 1):
            title = result.get("title", "No title")
            url = result.get("url", "No URL")
            content = result.get("text", "").strip()

            # Limit content length for readability
            if len(content) > 500:
                content = content[:500] + "..."

            formatted_results += f"{i}. **{title}**\n"
            formatted_results += f"   {content}\n"
            formatted_results += f"   Source: {url}\n\n"

        # Add a summary if available
        if search_results.get("autopromptString"):
            formatted_results += f"Summary: {search_results.get('autopromptString')}\n\n"

        return formatted_results

    except Exception as e:
        # Provide a useful error message
        error_message = f"Error while searching for '{query}': {str(e)}"

        # Add fallback information if the search fails
        fallback_info = (
            f"\n\nFallback information about {topic}:\n"
            f"1. {topic} has been studied in recent publications.\n"
            f"2. Current research suggests growing interest in {topic}.\n"
            f"3. Common challenges in {topic} include implementation complexity and adoption barriers."
        )

        return error_message + fallback_info
You’ll notice this tool uses the ResearchContext context to share data across other tools. Let’s define that as well:
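Based on how the tools and agents use it (search_web reads context.agent_context.topic, and the fact checker saves findings into it), a minimal definition would look something like the sketch below. Every field other than topic is an assumption for illustration:

Python

from typing import List, Optional
from pydantic import BaseModel

class ResearchContext(BaseModel):
    """Shared state passed to every agent and tool during a research run."""
    topic: str                                            # used by search_web above
    findings: List[ResearchFinding] = []                  # assumed: filled in by the researcher agent
    verified_research: Optional[VerifiedResearch] = None  # assumed: set by the fact checker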
You may also want to add some guardrails, for example checking if the research content is unbiased. A very simple hard-coded example might be to count the number of times an opinion is expressed vs a fact, like so:
Python
async def is_fact_based(msgs, context) -> bool:
    """Check if messages appear to be fact-based and not opinion-heavy."""
    content = " ".join(m.get("content", "") for m in msgs if isinstance(m, dict))
    opinion_phrases = ["I believe", "I think", "in my opinion", "probably", "might be", "could be"]

    # Count opinion phrases (compare both sides lowercased so phrases like "I believe" still match)
    opinion_count = sum(content.lower().count(phrase.lower()) for phrase in opinion_phrases)

    # Allow some opinion phrases, but not too many
    return opinion_count < 3

fact_based_guardrail = CustomGuardrail(
    guardrail_function=is_fact_based,
    tripwire_config=lambda output: not output,
    error_message="Output contains too many opinion statements rather than fact-based research.",
)
You can create something more powerful but this simple example highlights how the SDK checks against your guardrails.
Finally, we’ll create our Agents and give them the tools, context, and guardrails. Here’s what the Fact Checker Agent might look like:
Python
fact_checker_agent = Agent(
    name="fact_checker_agent",
    model="gpt-4o",
    instructions="""You are a meticulous fact-checking agent. Your job is to:
    1. Review the research findings in the shared context
    2. Verify each statement using the verify_statement tool
    3. Consolidate verified findings using save_verified_research
    4. Be skeptical and thorough - only approve statements with sufficient evidence

    For each finding, check if the source is credible and if the statement contains
    verifiable facts rather than opinions or generalizations.
    """,
    context_type=ResearchContext,
    tools=[verify_statement, save_verified_research],
    output_type=str,
    output_guardrails=[fact_based_guardrail],
    description="Verifies research findings for accuracy and proper sourcing",
)
Our Triage Agent which manages the whole process would also have handoffs defined in its parameters:
Python
triage_agent = Agent(
    name="triage_agent",
    model="gpt-3.5-turbo",
    instructions="""You are a research coordinator who manages the research process.

    For any research query:
    1. First, hand off to the researcher_agent to gather information
    2. Then, hand off to the fact_checker_agent to verify the findings
    3. Finally, hand off to the writer_agent to create the final content

    Monitor the process and ensure each specialized agent completes their task.
    """,
    context_type=ResearchContext,
    handoffs=[
        handoff(researcher_agent),
        handoff(fact_checker_agent),
        handoff(writer_agent),
    ],
    output_type=FinalContent,
    description="Coordinates the research process across specialized agents",
)
And finally, we write the main function to run the whole process:
Python
async def run_research_system(topic: str) -> FinalContent:
    """Run the multi-agent research system on a given topic."""
    # Create the shared context
    context = ResearchContext(topic=topic)

    # Configure the run with tracing enabled
    config = AgentRunConfig(
        run_name=f"research_{topic.replace(' ', '_')}",
        tracing_disabled=False,
    )

    # Run the triage agent with the initial query
    result = await AgentRunner.run(
        triage_agent,
        [f"Research the following topic thoroughly: {topic}"],
        context=context,
        run_config=config,
    )

    return result.agent_output
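To kick the whole thing off from a script, you’d run the async entry point, something like the snippet below (the topic string is just an example):

Python

import asyncio

if __name__ == "__main__":
    content = asyncio.run(run_research_system("the state of agentic AI frameworks"))
    print(content.title)
    print(content.introduction)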
Try It Yourself
If you’re eager to explore the Agents SDK yourself, the process is straightforward:
Install the SDK via pip: pip install openai-agents
The documentation is comprehensive and includes numerous examples to help you understand the SDK’s capabilities and implementation patterns.
Your Next Steps
As we venture further into the age of agentic AI, tools like the Agents SDK will become increasingly valuable. Whether you’re looking to automate complex workflows, create specialized assistants, or explore the frontiers of AI capability, this toolkit provides an excellent foundation.
I encourage you to dive in and experiment with the Agents SDK for your projects. If you’re working on something interesting or need guidance on implementation, don’t hesitate to reach out. I’m particularly interested in hearing about novel applications and creative uses of multi-agent systems.
Want to build your own AI agents?
Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.
This is the third and final part of the Model Context Protocol Series. Part 1 and Part 2 can be found here.
In our previous posts, we looked at what exactly MCP is and how it works behind the scenes. Now it’s time to tie it all together and build some useful stuff with MCP.
Before we begin, I’m going to let you in on a little secret. I used Claude with MCP to help me write this series. Shocker!
Claude has always been my favourite for creating content. You can read more about how all the different AI chatbots and tools fit in my stack. Before MCP, I’d have to do a bunch of research on my own, then feed Claude all that context before working with it to craft a post.
That’s a lot of work so I decided to build my own AI agent to automate the research work too. I used Exa to find resources to include and sent that to Claude via the API. But the agent was a bit limited too, and I missed the chat experience in the Claude app, man and machine working together in harmony to write a blog post.
Now, with MCP, I can do exactly that. This is the first of many examples of building AI workflows with MCP I’m going to be sharing with you today. Let’s dive in.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
1. From Chat to Blog Post
I’m not a fan of telling the AI to write a blog post and then directly publishing it without any human oversight or input. That’s just AI slop.
Plus, I write because I want my thoughts and opinions to be read and critiqued by others. I get no satisfaction and will not learn if an AI does everything for me.
So I use AI (Claude specifically) as my writing partner. It’s not a linear process. I start with some thesis or topic I want to explore (like this MCP series) and use Claude to help me create an outline.
Once I’m satisfied with the outline, I do some research to back up what I want to say. Exa has an official MCP server that I’ve installed, allowing Claude to search the web and pull interesting content.
I use Exa for a number of reasons that I’ve listed in my Guide to Exa, the primary one being it tends to pull in much deeper and richer content, including research papers, which is great for writing technical content.
I also use the Brave MCP and a custom Twitter MCP I built for myself. Brave gets me more recent news type results and Twitter pulls in trending conversations about said topic.
After Claude has pulled these results, I read through them and may tweak the outline again based on the new information.
At this point, I’m ready to write the actual article. I have a thesis or topic, a structure to present said topic, and real data or resources to support me.
The process of writing is an even longer back and forth. I ask Claude to flesh out the first chapter, then I make edits, and we go back and forth until I’m satisfied.
I’ll often ask Claude to poke holes in it or provide counter-arguments. I’ll also try to include the research or even ask Claude to pull in more specific research for this chapter.
I repeat this for every chapter until the whole article is written. At that point, we have a fully written-out article, with a clean structure and headers, and links to supporting data or resources.
Finally, with the WP MCP server (I built one for myself), I ask Claude to post it. It first asks for my permission, and then it pushes the content to my WordPress site (this blog you’re reading) in draft status. I then log in to WP, make any final tweaks and edits, and hit publish.
And that, dear reader, is how I wrote these very words you’re reading right now!
For a full video overview, watch this:
2. From Chat to GitHub Repo
Quite similar to the way you can chat with Claude to create WordPress blog posts, you can also create GitHub repositories using the GitHub MCP.
Me: What changed in our authentication system between the last release and now?
Claude: [Using Git MCP] I see 7 commits affecting authentication since v2.1.0. The most significant changes include: 1. Migration from JWT to OAuth2 for API authentication 2. Implementation of refresh token rotation 3. Addition of 2FA for admin accounts
The primary developer was @jsmith, and all changes passed CI. Would you like me to explain any specific change in detail?
This isn’t just a demo—it’s how developers are working right now. As shown in “This Claude MCP GitHub AI AGENT Changes EVERYTHING!”, MCP is revolutionizing how developers interact with their repositories.
3. End to End Sales Agent
This is a pretty cool workflow built by Robert Lukoszko, who has an MCP server connected to his LinkedIn. For anyone who comments under a post, it extracts that person’s profile, researches them to learn more about who they are, and finds their email. He also has an MCP server connected to his email, so he can then send a message to the people who commented on that post.
Essentially, it’s a very powerful sales agent that automatically goes through the comments on any LinkedIn post and then reaches out to each commenter via email.
Karan Vaidya, from Composio, shared a couple of really cool customer support automations that he built out with Cursor and MCPs.
The first one fetches customer support emails, analyzes the issue, solves the issue, and then responds to the email with the correct solution.
If it can’t find a solution, it uses Git Blame to find the responsible dev, creates a ticket on Linear, and then notifies the developer on Slack about the issue.
Cursor + MCP: Assigns bug report tickets to the right dev using Git Blame, all on its own!!!
Cursor Agent just received a bug report, couldn't solve it, so it identified the responsible dev using Git Blame, created a Linear ticket, and sent them a Slack alert, all automatically!… https://t.co/A3LZBg3HBa pic.twitter.com/pcGj5ZQdPO
Me: What connections exist between my project notes on renewable energy and my reading notes on urban planning?
Claude: [Using Claude Server MCP] I found 7 conceptual links between your renewable energy and urban planning notes. The strongest connections involve microgrids, community solar projects, and transit-oriented development. You explored similar funding mechanisms in both areas but haven't explicitly connected them in your notes. Would you like me to create a new note mapping these relationships?
Advanced Document Research
Alexander Komyagin built an Agentic RAG solution using MCP to fundamentally change how he interacts with documents (read his article here):
Me: What do our financial projections say about Q3 revenue growth compared to analyst expectations?
Claude: [Using Agentic RAG MCP] Let me examine your documents...
After analyzing your Q3 projections and comparing with analyst reports from Bloomberg and Reuters, I've found a 5.7% discrepancy. Your internal projections show 12.3% growth while consensus estimates are 18%. The primary factors driving this difference appear to be more conservative supply chain recovery assumptions in your models.
Instead of the traditional static retrieval systems, his MCP implementation lets Claude autonomously decide its search strategy and combine information from multiple document sources—just like a skilled researcher would.
Automated 3D Designs
Siddharth Ahuja connected the Blender MCP server to Claude to create 3D designs and scenes with just a few prompts.
Giacomo connected Claude to Rember, a tool to create flashcards, to help him automatically create flashcards to remember important concepts while chatting with Claude.
MCP is not just for devs
say "help me remember this" and Claude will create spaced repetition flashcards in Rember
What makes MCP revolutionary isn’t just the technology itself, it’s the connections it enables. Right now, only Claude, Cursor and a handful of other AI companies have MCP clients that enable you to extend their functionality by adding MCP servers.
But imagine what happens as more companies build out MCP clients. The ability to seamlessly integrate AI into your existing workflows, tools, and data sources creates a multiplicative effect that transforms productivity.
As you explore MCP for yourself, start with a simple question: “What tasks take up most of my time but don’t require my unique human judgment?” Those are perfect candidates for automation.
The real power comes when you combine multiple MCP servers. Need to analyze data, create visualizations, and publish findings to your blog? Chain together database, visualization, and WordPress MCPs for a workflow that would have been science fiction just months ago.
And if you need help building these out, drop your email below and I’ll reach out.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
This post is part 2 of my “Ultimate Guide to Model Context” series. You can read part 1 here and part 3 here.
In our previous post, we introduced the Model Context Protocol (MCP) and how it transforms our dear Claude from a knowledgeable yet impotent AI into a helpful digital butler who can actually interact with your files, apps, and services. Now it’s time to draw back the curtain and have a gander at the magic behind it.
Don’t worry—we’ll keep things simple and jargon-free, dishing out plenty of analogies to explain the technical concepts like my Uncle dishes out expletives when India cocks up a cricket match. By the end of this post, you’ll understand what makes MCP tick and how you can start exploring different MCP servers for your specific needs.
How MCP Works
Remember our analogy of MCP as a universal translator between AI and your digital world? Let’s expand on that to understand what’s happening behind the scenes.
The MCP Architecture Explained
At its core, MCP follows what tech folks call a “client-server architecture.” This is exactly how computers work with the internet. The browser on your computer is the “client”. It retrieves and displays information from a “server” over the internet via a protocol called HTTP.
The Model Context Protocol is similar. Let’s say you’re enjoying a cold Sunday evening by the fire in the study of your manor, as one does. It’s a high-tech study with a built-in AI assistant. You ask the assistant to have some hot toddy sent over:
1. The Host (where you chat with AI)
This is an application like Claude Desktop or Cursor where you interact with an AI assistant.
In our analogy, this is the study of your manor.
2. The MCP Client (the translator)
This is built into the host application, so the engineers behind Claude and Cursor need to build this first for the whole thing to work.
It translates between what the AI understands and what MCP servers speak.
You never interact with this directly, it works behind the scenes like HTTP.
In our analogy, it’s an upgrade module for your study that allows your AI assistant to communicate with other parts of your manor, such as the bar.
3. MCP Servers (specialized helpers)
Each server is like a specialist with access to specific resources.
One server might know how to work with files, another with Slack, and so on.
Servers can be on your computer or connect to online services.
In our analogy, the bartender who makes the hot toddy and brings it over to you is the server.
4. Tools (actions your AI takes via servers)
These are the functions available to the AI on the server.
A document server may have a read_file action that the AI can invoke to read a specific file.
In our analogy, the tool is the ability to prepare libation.
5. Resources (your digital stuff)
The actual files, apps, and services the AI needs to access
Could be local (on your computer) or remote (on the internet)
In our analogy, these are the ingredients that go into making the hot toddy. I prefer a spot of Cognac myself.
If you enjoyed this analogy, I have more for you. Be a dear and sign up to my newsletter for more.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
A Day in the Life of an MCP Request
Ok enough with the analogies. To really understand how this works, let’s follow what happens when you ask your AI assistant to summarize a document and send it to Slack:
You make a request to Claude: “Be a good sport and summarize the quarterly_report.pdf on my desktop. Oh and while you’re at it, post the key points to the #team-updates Slack channel”
Claude recognizes this requires access to both files and Slack, so it needs to use MCP
The MCP Client activates and connects to two different MCP servers:
The File System MCP Server (to access the PDF)
The Slack MCP Server (to post the message)
Permissions are checked:
The File System server asks: “Allow Claude to read quarterly_report.pdf?”
The Slack server asks: “Allow Claude to post to #team-updates?”
You approve both requests
The File System server retrieves the PDF content and sends it back through MCP
Claude processes the document and creates a summary
The Slack server takes Claude’s summary and posts it to your team channel
You receive confirmation that the task is complete
All of this happens in seconds, with the complex technical work hidden from view. The beauty of MCP is that it handles all the complicated connections while maintaining security and giving you control.
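For the curious, each of those steps boils down to structured messages passed between the MCP client and each server. MCP is built on JSON-RPC, and a single tool call and its reply look roughly like the sketch below. The exact field names are an approximation, shown here as Python dicts just to keep it readable:

Python

# Approximate shape of one request/response pair between the MCP client
# and the File System server during the workflow above
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "read_file",
        "arguments": {"path": "~/Desktop/quarterly_report.pdf"},
    },
}

tool_call_response = {
    "jsonrpc": "2.0",
    "id": 42,
    "result": {
        "content": [{"type": "text", "text": "...extracted text of the report..."}],
    },
}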
The Technology That Powers MCP
Now that you understand the basic flow, let’s demystify some of the technology that makes MCP possible:
The Protocol Itself
The Model Context Protocol is what tech people call an “open standard.” This means:
It’s publicly documented so anyone can build with it
It follows consistent rules for communication
It’s designed to be secure from the ground up
Think of it like the rules of the road—all vehicles (or in this case, different software) follow the same rules, allowing smooth traffic flow.
Security Measures
MCP takes security seriously with several built-in protections:
Permission-Based Access
Nothing happens without your explicit approval
Permissions are fine-grained (specific to each action)
Sandboxing
Each MCP server is isolated from others
If one server has a problem, it doesn’t affect the rest
Audit Trails
All actions are logged so you can see what happened
Useful for troubleshooting or monitoring usage
Real-Time Communication
MCP uses modern, efficient methods for passing information back and forth:
It’s designed for low latency (minimal delays)
It handles both simple requests and large data transfers
It manages two-way communication seamlessly
This means you don’t have to wait long for results, even when dealing with complex tasks involving multiple systems.
MCP Servers: The Building Blocks of AI Integration
MCP servers are the workhorses of the system. Each one is specialized for a specific purpose, and you can mix and match them based on your needs.
Types of MCP Servers
MCP servers generally fall into a few categories:
1. Local Resource Servers
Access things on your computer
Examples: File System, Local Database, Browser Control
2. Communication Servers
Connect to messaging and social platforms
Examples: Slack, Email, Bluesky
3. Productivity Servers
Integrate with work tools
Examples: GitHub, Google Drive, Calendar
4. Information Servers
Fetch and process data
Examples: Weather, Search, Wikipedia
5. Specialized Servers
Handle niche needs
Examples: 3D Printer Control, Smart Home
Where to Find MCP Servers
In the previous post, I mentioned a few of the top MCP servers. If you’re looking for more, there are several places to discover and download them, starting with the official open-source servers repository on GitHub.
While most people will simply use existing MCP servers, you might be curious about how they’re created. Or perhaps you can’t find one and want to build your own. Here’s a simplified explanation:
What You Need to Create an MCP Server
If you’re not a developer, you probably won’t be creating your own MCP servers. But understanding what goes into them can help you appreciate what they do:
1. Programming Skills
Knowledge of languages like Python and JavaScript
Understanding of APIs and web services
2. Development Tools
MCP SDK (Software Development Kit)
Required libraries and dependencies
3. Access to Resources
API keys for external services
Documentation for the systems you’re connecting to
For the Technically Curious: A Simple Example
Here’s what a very basic MCP server might look like in concept (this is simplified pseudocode):
JavaScript

// Define what the server can do
server.addCapability("read-weather-forecast", {
  description: "Gets the weather forecast for a location",
  parameters: {
    location: "The city or area to get the forecast for",
    days: "Number of days to forecast"
  },
  securityLevel: "requires-approval"
});

// Implement the actual functionality
server.onRequest("read-weather-forecast", async (request) => {
  // Get the forecast from a weather service
  const forecast = await weatherAPI.getForecast(
    request.parameters.location,
    request.parameters.days
  );

  // Return the results
  return {
    current: forecast.current,
    daily: forecast.daily,
    warnings: forecast.alerts
  };
});

// Start listening for connections
server.start();
This simplified example shows how an MCP server:
Defines what capabilities it offers
Specifies what parameters are needed
Sets security requirements
Implements the actual functionality
Returns results in a structured format
In reality, MCP servers are more complex, with proper error handling, security features, and optimization—but this gives you a sense of their basic structure.
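If you’d rather see real (if minimal) code than pseudocode, the official MCP Python SDK ships a FastMCP helper that keeps a basic server very short. Treat this as a sketch: check the SDK docs for the current API, and note that get_forecast_from_some_api is a placeholder you’d implement yourself:

Python

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def read_weather_forecast(location: str, days: int = 3) -> str:
    """Gets the weather forecast for a location."""
    # Placeholder: call whatever weather service you prefer here
    forecast = get_forecast_from_some_api(location, days)
    return f"{days}-day forecast for {location}: {forecast}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so a host like Claude Desktop can connect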
Connecting Multiple MCP Servers: The Power of Combination
One of the most powerful aspects of MCP is the ability to use multiple servers together. This creates workflows that would otherwise require complex programming.
Example: A Research Assistant Workflow
Imagine you’re researching a topic and want AI help. With multiple MCP servers, you could:
Use the File System server to scan your existing notes
Use the Browser Control server to search for new information
Use the Wikipedia server to verify facts and get background
Use the Google Drive server to save your findings
Use the Slack server to share insights with colleagues
All of this could be accomplished with a single request to your AI assistant, with each server handling its specialized part of the task.
Common Questions About MCP Servers
“Are MCP servers safe to install?”
MCP servers from reputable sources follow strict security protocols. Stick to official directories and well-reviewed options. Each server will ask for specific permissions, so you always maintain control over what they can access.
“How many servers should I install?”
Start with just the ones you need for your common tasks. You can always add more later. Most users begin with the File System server and add others as needed.
“Will MCP servers slow down my computer?”
Most MCP servers use minimal resources when idle and are designed to be efficient. If you’re not actively using them with your AI assistant, they have very little impact on performance. I’ve noticed, however, that it does slow down my Claude Desktop app if I add too many.
“Can I use MCP servers with any AI assistant?”
Currently, MCP works with compatible hosts like Claude Desktop and Cursor. As the protocol gains popularity, more AI applications are likely to support it.
What’s Next on Your MCP Journey
Now that you understand how MCP works behind the scenes and what servers are available, you’re ready to start building your personalized AI workspace.
In my next post in the series, I’ll provide a hands-on guide to building out useful agentic workflows with Claude and MCP servers. I’ll walk through the setup process with screenshots and troubleshooting tips to ensure a smooth experience.
Sign up below and stay tuned for it!
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.