Category: Blog

  • ChatGPT o3 – The First Reasoning Agentic Model

    ChatGPT o3 – The First Reasoning Agentic Model

    Yesterday OpenAI rolled out o3, the first reasoning model that is also agentic. Reasoning models have been around for a while, and o3 itself has been available in its mini version for some time.

    However, the full release yesterday showed us a model that not only reasons, but can browse, run Python, and look at your images in multiple thought loops. It behaves differently than the reasoning models we’ve seen so far, and that makes it unique.

    OpenAI even hinted it “approaches AGI—with caveats.” Of course, OpenAI has been saying this with every new model release for four years, so take it with a pinch of salt. That said, I wanted to test it out and compare it against the current top model (Gemini 2.5 Pro) to see if it’s better.

    What the experts and the numbers say

    Before we get into the 4 tests I ran both models through, let’s look at the benchmarks and a snapshot of what o3 can do.

    • Benchmarks: a 22.8% jump on SWE‑Bench Verified coding tasks and one missed question on AIME 2024 math.
    • Vision reasoning: rotates, crops, zooms, and then reasons over the edited view. It can “think with images.”
    • Full‑stack tool use: seamlessly chains browsing, Python, image generation, and file analysis (no plug‑in wrangling required).
    • Access & price: live for Plus, Pro, and Team; o3‑mini even shows up in the free tier with light rate limits.

    Field‑testing o3 against Gemini 2.5 Pro

    Benchmarks are great, but I’ve stopped paying much attention to them recently. What really counts is whether a model can do what I want it to do.

    Below are four experiments I ran, pitting o3 against Google’s best reasoning model in areas like research, vision, coding, and data science.

    Deep‑dive research

    I started with a basic research and reasoning test. I asked both models the same prompt: “What are people saying about ChatGPT o3? Find everything you can and interesting things it can do.”

    Gemini started by thinking about the question, formulating a search plan, and executing against it. Because o3 is a brand-new model, it isn’t in Gemini’s training data, so Gemini wasn’t sure whether I meant o3, ChatGPT-3, or 4o (yeah, OpenAI’s naming confuses even the smartest AI models).

    So to cover all bases, Gemini came up with 4 search queries and ran them in parallel. When the answers came back, it combined them all and gave me a final response.

    Gemini’s thought process

    o3, on the other hand, took the Sherlock route – search, read, reason, search again, fill a gap, repeat. The final response stitched together press reactions, Reddit hot takes, and early benchmark chatter.

    o3’s thought process

    This is where that agentic behaviour of o3 shines. As o3 found answers to its initial searches, it reasoned more and ran newer searches to plug gaps in the response. The final answer was well-rounded and solved my initial query.

    Gemini only reasoned initially, and then after running the searches it combined everything into an answer. The problem is, because it wasn’t sure what o3 was when it first reasoned, one of the search queries was “what can ChatGPT do” instead of “what can o3 do”. So when it gave me the final answer, it didn’t quite solve my initial query.

    Takeaway: Research isn’t a single pull‑request; it’s a feedback loop. o3 bakes that loop into the core model instead of outsourcing it to external agents or browser plug‑ins. When the question is fuzzy and context keeps shifting, that matters.

    Image sleuthing

    Now, if you’ve used AI as much as I have, you’ll have realized that o3’s research flow works almost like Deep Research, a feature that Gemini also has. And you’re right, it does.

    But search isn’t the only tool o3 has in its arsenal. It can also use Python, and work with images, files, and more.

    So my next test was to see if it could analyze and manipulate images. I tossed both models a picture of me taken in the Japan Pavilion at EPCOT, Disney World, thinking the Japanese background might trip them up.

    Ninety seconds later o3 not only pinned the location but pointed out a pin‑sized glimpse of Spaceship Earth peeking over the trees far in the background, something I’d missed entirely.

    I was surprised it noticed that, so I asked it to point it out to me. Using Python, it identified the object, calculated its coordinates, and put a red circle right where the dome is! It was able to do this because it went through multiple steps of reasoning and tool use, showcasing its agentic capabilities.

    Gemini also got the location right, but it only identified the pagoda and torii gate, not Spaceship Earth. When I asked it to mark the torii gate, it could only describe its position in the image, but it couldn’t edit and send me back the image.

    Takeaway: o3’s “vision ↔ code ↔ vision” loop unlocks practical image tasks like quality‑control checks, UI audits, or subtle landmark tagging. Any workflow that mixes text, numbers, code, and images can hand the grunt work to o3 while the human focuses on decision‑making.

    Coding with bleeding‑edge libraries

    Next up, I wanted to see how well it does with coding. Reasoning models by their nature are good at this, and Gemini has been my go-to recently.

    I asked them both to “Build a tiny web app. One button starts a real‑time voice AI conversation and returns the transcript.”

    The reason I chose this specific prompt is that Voice AI has improved a lot in recent weeks, and several new libraries and SDKs have come out around it. A lot of the newer stuff is beyond these models’ training cutoff.

    So I wanted to see how well each model gathers newer documentation and uses it in its code, versus relying on what’s already in its training data.

    o3 researched the latest streaming speech API that dropped after its training cutoff, generated starter code, and offered the older text‑to‑speech fallback.

    Gemini defaulted to last year’s speech‑to‑text loop and Google Cloud calls.

    While both were technically correct and their code does work, o3 came back with the more up-to-date answer. Now, I could have pointed Gemini in the right direction and it would have coded something better, but that’s still an extra step that o3 eliminated out of the box.

    Takeaway: o3’s autonomous web search makes it less likely to hand you stale SDK calls or older documentation.

    Data analysis + forecasting

    Finally, I wanted to put all the tools together into one test. I asked both models: “Chart how Canadian tourism to the U.S. is trending this year vs. last, then forecast to July 1.”

    This combines search, image analysis, data analysis, Python, and chart creation. o3’s agentic loop served it well again. It searched, found data, identified gaps, and searched more, until it gave me a bar chart.

    Initially, it only found data for January 2025, so it only plotted that. When I asked it for data on February and March, it reasoned a lot longer, ran multiple searches, found various data, and eventually computed an answer.

    o3’s thought process

    Gemini found numbers for January and March, but nothing for February, and since it doesn’t have that agentic loop, it didn’t explore further and try to estimate the numbers from other sources like o3 did.

    The most impressive part, though, was when I asked both to forecast the numbers into summer. Gemini couldn’t find data and couldn’t make the forecast. o3, on the other hand, did more research, looked at broader trends (tariffs and border issues, school breaks, airline discount season, even the NBA finals), and made assumptions about how those would impact travel going into summer.

    Takeaway: o3 feels like a junior quant who refuses to stop until every cell in the spreadsheet is filled (or at least justified). This search, reason, and analyze loop is invaluable for fields like investing, economics, finance, accounting, or anything to do with data.

    Strengths, quirks, and when to reach for o3

    Where it shines

    • Multi‑step STEM problems, data wrangling, and “find the blind spot” research.
    • Vision workflows that need both explanation and a marked‑up return image.
    • Rapid prototyping with APIs newer than the model’s cutoff.

    Where it still lags

    • Creative long‑form prose: I still think Claude 3.7 is the better novelist, but that’s personal preference.
    • Sheer response latency: the deliberative pass can stretch beyond a minute.
    • Token thrift: the reasoning trace costs compute; budget accordingly.

    Final thoughts

    I’d love to keep testing o3 on coding and see if it can replace Gemini 2.5 Pro, but I do think it is already stronger at research and reasoning. It’s the employee who keeps researching after everyone heads to lunch, circles details no one else spotted, and checks the changelog before committing code.

    If your work involves any mix of data, code, images, or the open web (and whose work doesn’t?), you’ll want that kind of persistence on tap. Today, that persistence is spelled o‑3.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • The Complete Guide to Google’s Agent Development Kit (ADK)

    The Complete Guide to Google’s Agent Development Kit (ADK)

    The Agent Development Kit (ADK) is a new open-source framework released by Google that simplifies the end-to-end development of intelligent agent systems.

    Do we really need another agent framework? Probably not. But hey, Google’s been on a roll and Gemini 2.5 Pro is my new favourite model (we’ll see if this changes next month), so if they’re offering something that makes it easy to build complex agentic systems, I’m all ears.

    In this mammoth guide, I’ll explore all that the Agent Development Kit has to offer, starting from its capabilities and primitives, all the way to building a complex multi-agent system with all the bells and whistles.

    Key Features and Capabilities

    ADK offers a rich set of features designed to address the entire agent development lifecycle:

    • Multi-Agent Architecture: create modular, scalable applications where different agents handle specific tasks, working in concert to achieve complex goals
    • Model Flexibility: use Gemini models directly, access models available via Vertex AI Model Garden, or leverage LiteLLM integration to work with models from providers like Anthropic, Meta, Mistral AI, and AI21 Labs.
    • Rich Tool Ecosystem: use pre-built tools (like Search and Code Execution), create custom tools, implement Model Context Protocol (MCP) tools, integrate third-party libraries (such as LangChain and LlamaIndex), or even use other agents as tools.
    • Built-in Streaming: native bidirectional audio and video streaming capabilities, enabling natural, human-like interactions beyond just text.
    • Flexible Orchestration: structured workflows using specialized workflow agents (Sequential, Parallel, Loop) for predictable execution patterns, and dynamic, LLM-driven routing for more adaptive behavior.
    • Integrated Developer Experience: powerful CLI and visual Web UI for local development, testing, and debugging.
    • Built-in Evaluation: systematically assess agent performance, evaluating both final response quality and step-by-step execution trajectories against predefined test cases.
    • Deployment Options: Agents built with ADK can be containerized and deployed anywhere, including integration with Google Cloud services for production environments.

    The Architecture of ADK

    At a high level, ADK’s architecture is designed around several key components that work together to create functional agent systems:

    Core Components:

    1. Agents: The central entities that make decisions and take actions. ADK supports various types of agents, including LLM-powered agents and workflow agents that orchestrate others.
    2. Tools: Functions or capabilities that agents can use to perform specific actions, such as searching the web, executing code, or retrieving information from databases.
    3. Runners: Components that manage the execution flow of agents, handling the orchestration of messages, events, and state management.
    4. Sessions: Maintain the context and state of conversations, allowing agents to persist information across interactions.
    5. Events: The communication mechanism between components in the system, representing steps in agent execution.

    Architectural Patterns:

    ADK is built around a flexible, event-driven architecture that enables:

    • Modular Design: Components can be combined and reconfigured to create different agent behaviors
    • Extensibility: The system can be extended with new tools, models, and agent types
    • Separation of Concerns: Clear boundaries between reasoning (agents), capabilities (tools), execution (runners), and state management (sessions)

    This architecture allows developers to focus on defining what their agents should do, while ADK handles the complex orchestration of execution, communication, and state management.
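
    To make these pieces concrete, here’s a minimal sketch of how they fit together at runtime: an LLM agent, an in-memory session service, and a runner that turns a user message into a stream of events. The agent here is a throwaway example, and minor details of the Runner API may differ between ADK versions.

    Python
    from google.adk.agents import Agent
    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types
    
    # Agent: the reasoning component
    greeter = Agent(
        name="greeter",
        model="gemini-2.0-flash-exp",
        instruction="Greet the user briefly.",
    )
    
    # Session: holds conversation state and history
    session_service = InMemorySessionService()
    session = session_service.create_session(app_name="demo", user_id="u1", session_id="s1")
    
    # Runner: orchestrates execution and emits Events
    runner = Runner(agent=greeter, app_name="demo", session_service=session_service)
    
    # Each step of execution comes back as an Event; the last one carries the reply.
    message = types.Content(role="user", parts=[types.Part(text="Hello!")])
    for event in runner.run(user_id="u1", session_id="s1", new_message=message):
        if event.is_final_response():
            print(event.content.parts[0].text)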

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

    Getting Started with ADK

    Getting started with the Agent Development Kit is straightforward, requiring just a few steps to set up your development environment. ADK is designed to work with Python 3.9 or later, and it’s recommended to use a virtual environment to manage dependencies.

    Basic Installation

    To install ADK, you’ll need to have Python installed on your system. Then, you can use pip to install the package:

    Bash
    # Create a virtual environment (recommended)
    python -m venv .venv
    
    # Activate the virtual environment
    # On macOS/Linux:
    source .venv/bin/activate
    # On Windows (CMD):
    .venv\Scripts\activate.bat
    # On Windows (PowerShell):
    .venv\Scripts\Activate.ps1
    
    # Install ADK
    pip install google-adk

    This installs the core ADK package, which includes all the necessary components to build and run agents locally. You’ll need to add your GOOGLE_API_KEY in a .env file.

    Creating Your First Basic Agent

    Let’s create a simple agent that can tell you the weather and time for a specific city. This example will demonstrate the basic structure of an ADK project.

    This is the directory structure for our agent:

    Plaintext
    parent_folder/
        weather_time_agent/
            __init__.py
            agent.py
            .env

    Create the necessary files in your terminal:

    Bash
    mkdir -p weather_time_agent
    echo "from . import agent" > weather_time_agent/__init__.py
    touch weather_time_agent/agent.py
    touch weather_time_agent/.env

    Now edit agent.py to create your agent:

    Python
    import datetime
    from zoneinfo import ZoneInfo
    from google.adk.agents import Agent
    
    def get_weather(city: str) -> dict:
        """Retrieves the current weather report for a specified city.
    
        Args:
            city (str): The name of the city for which to retrieve the weather report.
    
        Returns:
            dict: status and result or error msg.
        """
        if city.lower() == "new york":
            return {
                "status": "success",
                "report": (
                    "The weather in New York is sunny with a temperature of 25 degrees"
                    " Celsius (77 degrees Fahrenheit)."
                ),
            }
        else:
            return {
                "status": "error",
                "error_message": f"Weather information for '{city}' is not available.",
            }
    
    
    def get_current_time(city: str) -> dict:
        """Returns the current time in a specified city.
    
        Args:
            city (str): The name of the city for which to retrieve the current time.
    
        Returns:
            dict: status and result or error msg.
        """
    
        if city.lower() == "new york":
            tz_identifier = "America/New_York"
        else:
            return {
                "status": "error",
                "error_message": (
                    f"Sorry, I don't have timezone information for {city}."
                ),
            }
    
        tz = ZoneInfo(tz_identifier)
        now = datetime.datetime.now(tz)
        report = (
            f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'
        )
        return {"status": "success", "report": report}
    
    
    weather_time_agent = Agent(
        name="weather_time_agent",
        model="gemini-2.0-flash-exp",
        description=(
            "Agent to answer questions about the time and weather in a city."
        ),
        instruction=(
            "I can answer your questions about the time and weather in a city."
        ),
        tools=[get_weather, get_current_time],
    )

    Finally, add your API keys to the .env file. You can use Gemini directly, but if you want to use other models, like Anthropic or OpenAI, you’ll need to run ‘pip install litellm’ first.
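
    For reference, here’s roughly what that .env might look like (placeholder values; the extra keys are only needed if you route other providers through LiteLLM):

    Plaintext
    # weather_time_agent/.env
    GOOGLE_API_KEY=your-google-api-key
    # Only needed for non-Gemini models via LiteLLM:
    OPENAI_API_KEY=your-openai-key
    ANTHROPIC_API_KEY=your-anthropic-key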

    Once done, you can run the agent from the parent folder with ‘adk run weather_time_agent’, or launch the visual Web UI with ‘adk web’.

    Of course, this is a really basic agent and doesn’t need a framework. Let’s dive deeper into the core components of the ADK and build a more complex agent.

    Building Agents: The Foundation

    ADK provides several agent types to address different needs and use cases:

    LLM Agent

    The LlmAgent (often simply referred to as Agent) is the most commonly used agent type. It leverages a Large Language Model to understand user requests, make decisions, and generate responses. This is the “thinking” component of your application.

    Python
    from google.adk.agents import Agent  # This is actually an LlmAgent
    
    my_agent = Agent(
        name="my_first_agent",
        model="gemini-2.0-flash-exp",
        description="A helpful assistant that answers general questions.",
        instruction="You are a friendly AI assistant. Be concise and helpful.",
        tools=[]  # Optional tools
    )

    The LlmAgent is non-deterministic – its behaviour depends on the LLM’s interpretation of instructions and context. It can use tools, transfer to other agents, or directly respond to users based on its reasoning.

    Workflow Agents

    Workflow agents provide deterministic orchestration for sub-agents. Unlike LLM agents, they follow predefined execution patterns:

    SequentialAgent: Executes sub-agents one after another, in order:

    Python
    from google.adk.agents import SequentialAgent
    
    step1 = Agent(name="data_collector", model="gemini-2.0-flash-exp")
    step2 = Agent(name="data_analyzer", model="gemini-2.0-flash-exp")
    
    pipeline = SequentialAgent(
        name="analysis_pipeline",
        sub_agents=[step1, step2]  # Will execute in this order
    )

    ParallelAgent: Executes sub-agents concurrently:

    Python
    from google.adk.agents import ParallelAgent
    
    fetch_weather = Agent(name="weather_fetcher", model="gemini-2.0-flash-exp")
    fetch_news = Agent(name="news_fetcher", model="gemini-2.0-flash-exp")
    
    parallel_agent = ParallelAgent(
        name="information_gatherer",
        sub_agents=[fetch_weather, fetch_news]  # Will execute in parallel
    )

    LoopAgent: Repeatedly executes sub-agents until a condition is met:

    Python
    from google.adk.agents import LoopAgent
    
    process_step = Agent(name="process_item", model="gemini-2.0-flash-exp")
    check_condition = Agent(name="check_complete", model="gemini-2.0-flash-exp")
    
    loop_agent = LoopAgent(
        name="processing_loop",
        sub_agents=[process_step, check_condition],
        max_iterations=5  # Optional maximum iterations
    )

    Custom Agents

    For specialized needs, you can create custom agents by extending the BaseAgent class:

    Python
    from google.adk.agents import BaseAgent
    from google.adk.agents.invocation_context import InvocationContext
    from google.adk.events import Event
    from google.genai import types
    from typing import AsyncGenerator
    
    class MyCustomAgent(BaseAgent):
        name: str = "custom_agent"
        description: str = "A specialized agent with custom behavior"
        
        async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:
            # Custom implementation logic here.
            # You must yield at least one Event; here we yield a simple text reply.
            yield Event(
                author=self.name,
                content=types.Content(role="model", parts=[types.Part(text="Hello from my custom agent!")]),
            )

    Custom agents are useful when you need deterministic behavior that doesn’t fit into the existing workflow agent patterns, or when you want to integrate with external systems in custom ways.

    Configuring an Agent: Models, Instructions, Descriptions

    The behaviour of an agent is largely determined by its configuration parameters:

    Model Selection

    The model parameter specifies which LLM powers your agent’s reasoning (for LlmAgent). This choice affects the agent’s capabilities, cost, and performance characteristics:

    Python
    # Using a Gemini model directly
    agent = Agent(
        name="gemini_agent",
        model="gemini-2.0-flash-exp",  # Choose model variant based on needs
        # Other parameters...
    )

    Setting Instructions

    The instruction parameter provides guidance to the agent on how it should behave. This is one of the most important parameters for shaping agent behaviour:

    Python
    agent = Agent(
        name="customer_support",
        model="gemini-2.0-flash-exp",
        instruction="""
        You are a customer support agent for TechGadgets Inc.
        
        When helping customers:
        1. Greet them politely and introduce yourself
        2. Ask clarifying questions if the issue isn't clear
        3. Provide step-by-step troubleshooting when appropriate
        4. For billing issues, use the check_account_status tool
        5. For technical problems, use the diagnostic_tool
        6. Always end by asking if there's anything else you can help with
        
        Never share internal company information or promise specific refund amounts.
        """
    )

    Best practices for effective instructions:

    • Be specific about the agent’s role and persona
    • Include clear guidelines for when and how to use available tools
    • Use formatting (headers, numbered lists) for readability
    • Provide examples of good and bad responses
    • Specify any constraints or boundaries

    Defining Descriptions

    The description parameter provides a concise summary of the agent’s purpose:

    Python
    agent = Agent(
        name="billing_specialist",
        description="Handles customer billing inquiries and invoice issues.",
        # Other parameters...
    )

    While the description is optional for standalone agents, it becomes critical in multi-agent systems. Other agents use this description to determine when to delegate tasks to this agent (a short sketch of this follows the list below). A good description should:

    • Clearly state the agent’s specific domain of expertise
    • Be concise (usually 1-2 sentences)
    • Differentiate the agent from others in the system
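
    Here’s a brief, illustrative sketch of why descriptions matter: a coordinator agent with two sub-agents relies on each sub-agent’s description to decide where to route a request. The agent names and descriptions below are made up for this example.

    Python
    from google.adk.agents import Agent
    
    billing_agent = Agent(
        name="billing_specialist",
        model="gemini-2.0-flash-exp",
        description="Handles customer billing inquiries and invoice issues.",
        instruction="Resolve billing questions politely and precisely.",
    )
    
    tech_agent = Agent(
        name="tech_support",
        model="gemini-2.0-flash-exp",
        description="Troubleshoots technical problems with products and devices.",
        instruction="Walk the user through technical troubleshooting steps.",
    )
    
    coordinator = Agent(
        name="front_desk",
        model="gemini-2.0-flash-exp",
        description="Routes customer requests to the right specialist.",
        instruction="Greet the user and transfer them to the most appropriate specialist.",
        sub_agents=[billing_agent, tech_agent],  # descriptions guide delegation decisions
    )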

    Setting Output Key

    The optional output_key parameter allows an agent to automatically save its response to the session state:

    Python
    recommendation_agent = Agent(
        name="product_recommender",
        # Other parameters...
        output_key="product_recommendation"
    )

    This is particularly useful in multi-agent workflows, as it allows subsequent agents to access the output without additional code.
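
    As a quick illustration (agent names here are hypothetical), the first agent in a sequential pipeline can save its answer with output_key, and the next agent can be instructed to read that state key:

    Python
    from google.adk.agents import Agent, SequentialAgent
    
    recommender = Agent(
        name="product_recommender",
        model="gemini-2.0-flash-exp",
        instruction="Recommend one product based on the user's request.",
        output_key="product_recommendation",  # response saved to state automatically
    )
    
    explainer = Agent(
        name="recommendation_explainer",
        model="gemini-2.0-flash-exp",
        instruction=(
            "Check state['product_recommendation'] and explain in two sentences "
            "why it fits the user's needs."
        ),
    )
    
    pipeline = SequentialAgent(
        name="recommend_and_explain",
        sub_agents=[recommender, explainer],
    )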

    Working with Multiple LLM Providers

    One of ADK’s powerful features is its ability to work with different LLM providers through LiteLLM integration. This gives you flexibility to choose the right model for each agent in your system.

    First, install the LiteLLM package:

    Bash
    pip install litellm

    Then, configure your API keys for the models you want to use:

    Bash
    export OPENAI_API_KEY="your-openai-key"
    export ANTHROPIC_API_KEY="your-anthropic-key"
    # Add others as needed

    Use the LiteLlm wrapper when defining your agent:

    Python
    from google.adk.agents import Agent
    from google.adk.models.lite_llm import LiteLlm
    
    # Using OpenAI's GPT-4o
    gpt_agent = Agent(
        name="gpt_agent",
        model=LiteLlm(model="openai/gpt-4o"),
        description="A GPT-powered agent",
        # Other parameters...
    )
    
    # Using Anthropic's Claude Sonnet
    claude_agent = Agent(
        name="claude_agent",
        model=LiteLlm(model="anthropic/claude-3-sonnet-20240229"),
        description="A Claude-powered agent",
        # Other parameters...
    )
    
    # Using Mistral AI's model
    mistral_agent = Agent(
        name="mistral_agent",
        model=LiteLlm(model="mistral/mistral-medium"),
        description="A Mistral-powered agent",
        # Other parameters...
    )

    This approach allows you to:

    • Match models to specific tasks based on their strengths
    • Build resilience by having alternatives if one provider has issues
    • Optimize for cost by using less expensive models for simpler tasks

    In the next section, we’ll explore how to extend your agent’s capabilities using tools.

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

    Tools: Extending Agent Capabilities

    Tools extend an agent’s capabilities beyond the core language model’s reasoning abilities. While an LLM can generate text and make decisions, tools allow agents to take concrete actions in the world: fetching real-time data, performing calculations, calling external APIs, executing code, and more.

    The agent’s language model decides when to use tools, with which parameters, and how to incorporate the results into its reasoning, but the tools themselves execute the agent’s intentions in predictable ways.

    Creating Custom Function Tools

    The most common way to create tools in ADK is by defining Python functions. These functions can then be passed to an agent, which will be able to call them when appropriate based on its reasoning.

    Basic Tool Definition

    Here’s a simple example of defining a function tool:

    Python
    def calculate_mortgage_payment(principal: float, annual_interest_rate: float, years: int) -> dict:
        """Calculates the monthly payment for a mortgage loan.
        
        Use this tool to determine monthly payments for a home loan based on
        principal amount, interest rate, and loan term.
        
        Args:
            principal: The initial loan amount in dollars.
            annual_interest_rate: The annual interest rate as a percentage (e.g., 5.5 for 5.5%).
            years: The loan term in years.
            
        Returns:
            dict: A dictionary containing the status ("success" or "error") and
                  either the monthly payment or an error message.
        """
        try:
            # Convert annual interest rate to monthly decimal rate
            monthly_rate = (annual_interest_rate / 100) / 12
            # Calculate number of monthly payments
            num_payments = years * 12
            
            # Guard against non-positive principal/term and negative rates
            if monthly_rate < 0 or principal <= 0 or num_payments <= 0:
                return {
                    "status": "error",
                    "error_message": "Principal and term must be positive, and the interest rate cannot be negative."
                }
                
            # Calculate monthly payment using the mortgage formula
            if monthly_rate == 0:
                monthly_payment = principal / num_payments
            else:
                monthly_payment = principal * (monthly_rate * (1 + monthly_rate) ** num_payments) / ((1 + monthly_rate) ** num_payments - 1)
                
            return {
                "status": "success",
                "monthly_payment": round(monthly_payment, 2),
                "total_payments": round(monthly_payment * num_payments, 2),
                "total_interest": round((monthly_payment * num_payments) - principal, 2)
            }
        except Exception as e:
            return {
                "status": "error",
                "error_message": f"Failed to calculate mortgage payment: {str(e)}"
            }
    
    # Add this tool to an agent
    from google.adk.agents import Agent
    
    mortgage_advisor = Agent(
        name="mortgage_advisor",
        model="gemini-2.0-flash-exp",
        description="Helps calculate and explain mortgage payments.",
        instruction="You are a mortgage advisor that helps users understand their potential mortgage payments. When asked about payments, use the calculate_mortgage_payment tool.",
        tools=[calculate_mortgage_payment]  # Simply include the function in the tools list
    )

    Tool Context and State Management

    For more advanced tools that need to access or modify the conversation state, ADK provides the ToolContext object. By adding this parameter to your function, you gain access to the session state and can influence the agent’s subsequent actions.

    Accessing and Modifying State

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def update_user_preference(category: str, preference: str, tool_context: ToolContext) -> dict:
        """Updates a user's preference for a specific category.
        
        Args:
            category: The category for which to set a preference (e.g., "theme", "notifications").
            preference: The preference value to set.
            tool_context: Automatically provided by ADK, do not specify when calling.
            
        Returns:
            dict: Status of the preference update operation.
        """
        # Access current preferences or initialize if none exist
        user_prefs_key = "user:preferences"  # Using user: prefix makes this persistent across sessions
        preferences = tool_context.state.get(user_prefs_key, {})
        
        # Update the preferences
        preferences[category] = preference
        
        # Save back to state
        tool_context.state[user_prefs_key] = preferences
        
        print(f"Tool: Updated user preference '{category}' to '{preference}'")
        return {
            "status": "success", 
            "message": f"Your {category} preference has been set to {preference}"
        }

    Controlling Agent Flow

    The ToolContext also allows tools to influence the agent’s execution flow through the actions attribute:

    Python
    def escalate_to_support(issue_type: str, severity: int, tool_context: ToolContext) -> dict:
        """Escalates an issue to a human support agent.
        
        Args:
            issue_type: The type of issue being escalated.
            severity: The severity level (1-5, where 5 is most severe).
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status of the escalation.
        """
        # Record the escalation details in state
        tool_context.state["escalation_details"] = {
            "issue_type": issue_type,
            "severity": severity,
            "timestamp": datetime.datetime.now().isoformat()
        }
        
        # For high severity issues, transfer to the support agent
        if severity >= 4:
            tool_context.actions.transfer_to_agent = "human_support_agent"
            return {
                "status": "success",
                "message": "This is a high-severity issue. Transferring you to a human support specialist."
            }
        
        # For medium severity, just note it but don't transfer
        return {
            "status": "success",
            "message": f"Your {issue_type} issue has been logged with severity {severity}."
        }

    Handling Tool Results

    When an agent uses a tool, it needs to interpret the results correctly. This is why returning structured data with clear status indicators is important. Here’s how to guide your agent to handle tool results:

    Python
    weather_agent = Agent(
        name="weather_assistant",
        model="gemini-2.0-flash-exp",
        instruction="""
        You help users get weather information.
        
        When using the get_weather tool:
        1. Check the "status" field of the result.
        2. If status is "success", present the "report" information in a friendly way.
        3. If status is "error", apologize and share the "error_message" with the user.
        4. Always thank the user for their query.
        """,
        tools=[get_weather]
    )

    Built-in Tools and Integrations

    ADK provides several built-in tools that you can use without having to implement them yourself:

    Google Search

    Python
    from google.adk.tools import google_search
    
    search_agent = Agent(
        name="research_assistant",
        model="gemini-2.0-flash-exp",
        instruction="You help users research topics. When asked, use the google_search tool to find up-to-date information.",
        tools=[google_search]
    )

    Code Execution

    Python
    from google.adk.tools import code_interpreter
    
    coding_assistant = Agent(
        name="coding_assistant",
        model="gemini-2.0-flash-exp",
        instruction="You help users with coding tasks. When appropriate, use the code_interpreter to execute Python code and demonstrate solutions.",
        tools=[code_interpreter]
    )

    Retrieval-Augmented Generation (RAG)

    Python
    from google.adk.tools import rag_tool
    
    # Configure RAG with your documents
    my_rag_tool = rag_tool.configure(
        document_store="your-document-source",
        embedding_model="your-embedding-model"
    )
    
    documentation_assistant = Agent(
        name="docs_assistant",
        model="gemini-2.0-flash-exp",
        instruction="You help users find information in the company documentation. Use the RAG tool to retrieve relevant information.",
        tools=[my_rag_tool]
    )

    Third-Party Integrations

    ADK supports integration with popular tools from other frameworks:

    Python
    # LangChain tool example
    # (your_langchain_tool_instance is an existing LangChain tool object)
    from google.adk.tools.langchain_tool import LangChainTool
    
    langchain_tool = LangChainTool(langchain_tool=your_langchain_tool_instance)
    
    # LlamaIndex tool example
    from google.adk.tools.llama_index_tool import LlamaIndexTool
    
    llama_index_tool = LlamaIndexTool(llama_index_tool=your_llama_index_tool_instance)

    Best Practices for Tool Design

    Creating effective tools is crucial for agent performance. Here are expanded best practices:

    1. Function Naming and Signature

    • Verb-Noun Names: Use descriptive names that clearly indicate action (e.g., fetch_stock_price is better than get_stock or simply stocks).
    • Parameter Naming: Use clear, self-documenting parameter names (city is better than c).
    • Default Values: Avoid setting default values for parameters. The LLM should decide all parameter values based on context.
    • Type Consistency: Ensure parameters have consistent types throughout your application.

    2. Error Handling and Result Structure

    • Comprehensive Error Handling: Catch all possible exceptions within your tool.
    • Informative Error Messages: Return error messages that help both the agent and user understand what went wrong.
    • Consistent Result Structure: Use a consistent pattern across all tools: return {"status": "success", "data": result_data} on success, and {"status": "error", "error_message": "Detailed explanation of what went wrong"} on error.

    3. Documentation and Clarity

    • Rich Docstrings: Include comprehensive documentation explaining the tool’s purpose, parameters, return values, and usage guidelines.
    • Usage Examples: Consider including examples in the docstring for complex tools.
    • Logging: Add logging statements within tools to aid debugging.

    4. Tool Design Principles

    • Single Responsibility: Each tool should do one thing well.
    • Granularity Balance: Not too specific, not too general; find the right level of abstraction.
    • Idempotent When Possible: Tools should be safe to call multiple times when appropriate.
    • Input Validation: Validate inputs early to prevent cascading errors.

    5. Performance Considerations

    • Asynchronous Operations: For time-consuming operations, consider using async functions.
    • Timeout Handling: Implement timeouts for external API calls (see the sketch after this list).
    • Caching: Consider caching results for frequently used, unchanging data.
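
    Here’s a short sketch of the timeout and caching points above, using a hypothetical exchange-rate lookup (the endpoint URL is a placeholder, not a real service):

    Python
    import requests
    from functools import lru_cache
    
    @lru_cache(maxsize=128)
    def _fetch_rate(base: str, target: str) -> float:
        """Inner helper so repeated identical lookups hit the cache, not the network."""
        resp = requests.get(
            "https://example.com/rates",  # placeholder endpoint
            params={"base": base, "target": target},
            timeout=5,  # fail fast instead of hanging the agent
        )
        resp.raise_for_status()
        return resp.json()["rate"]
    
    def get_exchange_rate(base: str, target: str) -> dict:
        """Gets the exchange rate between two currencies.
    
        Args:
            base: The base currency code (e.g., "USD").
            target: The target currency code (e.g., "EUR").
    
        Returns:
            dict: status and rate, or an error message.
        """
        try:
            return {"status": "success", "rate": _fetch_rate(base, target)}
        except requests.Timeout:
            return {"status": "error", "error_message": "Rate service timed out after 5 seconds."}
        except Exception as e:
            return {"status": "error", "error_message": f"Failed to fetch exchange rate: {e}"}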

    Example of a Well-Designed Tool

    Python
    def search_product_catalog(
        query: str, 
        category: str = None, 
        price_max: float = None,
        sort_by: str = None,
        tool_context: ToolContext = None
    ) -> dict:
        """Searches the product catalog for items matching the query and filters.
        
        Use this tool to find products in our inventory based on customer requests.
        
        Args:
            query: The search term entered by the customer (required).
            category: Optional category to filter results (e.g., "electronics", "clothing").
            price_max: Optional maximum price filter.
            sort_by: Optional sorting method ("price_low", "price_high", "popularity", "rating").
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: A dictionary containing:
                - "status": "success" or "error"
                - If success: "products" list of matching products (up to 5 items)
                - If error: "error_message" explaining what went wrong
                
        Example success:
            {"status": "success", "products": [{"name": "42-inch TV", "price": 299.99, ...}, ...]}
        Example error:
            {"status": "error", "error_message": "No products found matching 'flying car'"}
        """
        try:
            # Log the tool execution for debugging
            print(f"Tool: search_product_catalog called with query='{query}', category='{category}', price_max={price_max}")
            
            # Track the search in user history if tool_context is available
            if tool_context:
                search_history = tool_context.state.get("user:search_history", [])
                search_history.append({
                    "query": query,
                    "timestamp": datetime.datetime.now().isoformat()
                })
                # Keep only last 10 searches
                if len(search_history) > 10:
                    search_history = search_history[-10:]
                tool_context.state["user:search_history"] = search_history
            
            # ... actual catalog search implementation ...
            # (For demo, we'll return mock data)
            
            mock_products = [
                {"name": "42-inch Smart TV", "price": 299.99, "category": "electronics", "rating": 4.5},
                {"name": "Wireless Headphones", "price": 89.99, "category": "electronics", "rating": 4.2},
            ]
            
            # Apply filters if provided
            filtered_products = mock_products
            if category:
                filtered_products = [p for p in filtered_products if p["category"].lower() == category.lower()]
            if price_max:
                filtered_products = [p for p in filtered_products if p["price"] <= price_max]
                
            # Apply sorting if requested
            if sort_by == "price_low":
                filtered_products = sorted(filtered_products, key=lambda p: p["price"])
            elif sort_by == "price_high":
                filtered_products = sorted(filtered_products, key=lambda p: p["price"], reverse=True)
            elif sort_by == "rating":
                filtered_products = sorted(filtered_products, key=lambda p: p["rating"], reverse=True)
                
            # Return formatted response
            if filtered_products:
                return {
                    "status": "success",
                    "products": filtered_products[:5],  # Limit to 5 results
                    "total_matches": len(filtered_products)
                }
            else:
                return {
                    "status": "error",
                    "error_message": f"No products found matching '{query}' with the specified filters."
                }
                
        except Exception as e:
            print(f"Tool Error: search_product_catalog failed: {str(e)}")
            return {
                "status": "error",
                "error_message": f"Failed to search catalog: {str(e)}"
            }

    Tools are the primary way to extend your agents’ capabilities beyond just language generation. You can now create agents that interact effectively with the world and provide genuinely useful services to users.

    State and Memory: Creating Context-Aware Agents

    In ADK, “state” refers to the persistent data associated with a conversation that allows agents to remember information across multiple interactions. Unlike the conversation history (which records the sequence of messages), state is a structured key-value store that agents can read from and write to, enabling them to track user preferences, remember previous decisions, maintain contextual information, and build personalized experiences.

    The Role of Session State

    Session state serves several critical functions in agent applications:

    1. Contextual Memory: Allows agents to remember information from earlier in the conversation
    2. Preference Storage: Maintains user preferences across interactions
    3. Workflow Tracking: Keeps track of where users are in multi-step processes
    4. Data Persistence: Stores data that needs to be accessible between different agents or across multiple turns
    5. Configuration Management: Maintains settings that affect agent behavior

    State Structure and Scope

    ADK’s state management system is designed with different scopes to address various persistence needs:

    Plaintext
    session.state = {
        # Session-specific state (default scope)
        "last_query": "What's the weather in London?",
        "current_step": 3,
        
        # User-specific state (persists across sessions)
        "user:preferred_temperature_unit": "Celsius",
        "user:name": "Alex",
        
        # Application-wide state (shared across all users)
        "app:version": "1.2.3",
        "app:maintenance_mode": False,
        
        # Temporary state (not persisted beyond current execution)
        "temp:calculation_result": 42
    }

    The prefixes determine the scope:

    • No prefix: Session-specific, persists only for the current session
    • user:: User-specific, persists across all sessions for a particular user
    • app:: Application-wide, shared across all users and sessions
    • temp:: Temporary, exists only during the current execution cycle

    Implementing Memory with State Management

    Let’s explore how to implement memory capabilities using session state:

    Basic State Access

    The most straightforward way to access state is through the session object:

    Python
    # Getting a session
    from google.adk.sessions import InMemorySessionService
    
    session_service = InMemorySessionService()
    APP_NAME = "my_application"
    USER_ID = "user_123"
    SESSION_ID = "session_456"
    
    # Create or retrieve a session
    session = session_service.create_session(
        app_name=APP_NAME,
        user_id=USER_ID,
        session_id=SESSION_ID
    )
    
    # Reading from state
    last_city = session.state.get("last_city", "New York")  # Default if key doesn't exist
    
    # Writing to state
    session.state["last_city"] = "London"

    However, in real agent applications, you’ll often access state through more integrated methods.

    Accessing State in Tools

    Tools can access and modify state through the ToolContext parameter:

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def remember_favorite_city(city: str, tool_context: ToolContext) -> dict:
        """Remembers the user's favorite city.
        
        Args:
            city: The city to remember as favorite.
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status of the operation.
        """
        # Store at user scope so it persists across sessions
        tool_context.state["user:favorite_city"] = city
        
        # Also store when this preference was set
        tool_context.state["user:favorite_city_set_at"] = datetime.datetime.now().isoformat()
        
        return {
            "status": "success",
            "message": f"I've remembered that your favorite city is {city}."
        }

    Using output_key for Automatic State Updates

    The output_key parameter of Agent provides a convenient way to automatically save an agent’s response to state:

    Python
    weather_reporter = Agent(
        name="weather_reporter",
        model="gemini-2.0-flash-exp",
        instruction="You provide weather reports for cities. Be concise but informative.",
        tools=[get_weather],
        output_key="last_weather_report"  # Automatically saves response to this state key
    )

    When the agent responds, its final text output will be stored in session.state["last_weather_report"] automatically.

    State in Agent Instructions

    To make agents state-aware, include instructions on how to use state:

    Python
    personalized_agent = Agent(
        name="personalized_assistant",
        model="gemini-2.0-flash-exp",
        instruction="""
        You are a personalized assistant.
        
        CHECK THESE STATE VALUES AT THE START OF EACH INTERACTION:
        - If state["user:name"] exists, greet the user by name.
        - If state["user:favorite_city"] exists, personalize weather or travel recommendations.
        - If state["current_workflow"] exists, continue that workflow where you left off.
        
        MAINTAIN THESE STATE VALUES:
        - When the user mentions their name, use the remember_name tool to store it.
        - When discussing a city positively, use the remember_favorite_city tool.
        - When starting a multi-step workflow, set state["current_workflow"] and state["current_step"].
        """
    )

    Persisting Information Across Conversation Turns

    To create truly context-aware agents, you need to implement patterns that effectively use state across conversation turns.

    Pattern 1: Preference Tracking

    This pattern stores user preferences discovered through conversation:

    Python
    def set_preference(category: str, value: str, tool_context: ToolContext) -> dict:
        """Stores a user preference.
        
        Args:
            category: The preference category (e.g., "language", "theme").
            value: The preference value.
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status of the operation.
        """
        preferences = tool_context.state.get("user:preferences", {})
        preferences[category] = value
        tool_context.state["user:preferences"] = preferences
        return {"status": "success", "message": f"Preference set: {category} = {value}"}
    
    def get_preferences(tool_context: ToolContext) -> dict:
        """Retrieves all user preferences.
        
        Args:
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: The user's stored preferences.
        """
        preferences = tool_context.state.get("user:preferences", {})
        return {"status": "success", "preferences": preferences}
    
    preference_agent = Agent(
        name="preference_aware_agent",
        model="gemini-2.0-flash-exp",
        instruction="""
        You help users and remember their preferences.
        
        At the start of each conversation:
        1. Use the get_preferences tool to check stored preferences.
        2. Adapt your responses based on these preferences.
        
        During conversations:
        1. When a user expresses a preference, use set_preference to store it.
        2. Acknowledge when you've saved a preference.
        
        Examples of preferences to track:
        - Language preferences
        - Communication style (brief/detailed)
        - Topic interests
        """,
        tools=[set_preference, get_preferences]
    )

    Pattern 2: Workflow State Tracking

    This pattern manages progress through multi-step processes:

    Python
    def start_workflow(workflow_name: str, tool_context: ToolContext) -> dict:
        """Starts a new workflow and tracks it in state.
        
        Args:
            workflow_name: The name of the workflow to start.
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status and the initial workflow state.
        """
        workflow = {
            "name": workflow_name,
            "current_step": 1,
            "started_at": datetime.datetime.now().isoformat(),
            "data": {}
        }
        tool_context.state["current_workflow"] = workflow
        return {"status": "success", "workflow": workflow}
    
    def update_workflow_step(step: int, data: dict, tool_context: ToolContext) -> dict:
        """Updates the current workflow step and associated data.
        
        Args:
            step: The new step number.
            data: Data to associate with this step.
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status and the updated workflow state.
        """
        workflow = tool_context.state.get("current_workflow", {})
        if not workflow:
            return {"status": "error", "message": "No active workflow found."}
        
        workflow["current_step"] = step
        workflow["last_updated"] = datetime.datetime.now().isoformat()
        workflow["data"].update(data)
        tool_context.state["current_workflow"] = workflow
        return {"status": "success", "workflow": workflow}
    
    workflow_agent = Agent(
        name="workflow_agent",
        model="gemini-2.0-flash-exp",
        instruction="""
        You guide users through structured workflows.
        
        At the start of each interaction:
        1. Check if state["current_workflow"] exists.
        2. If it exists, continue from the current_step.
        3. If not, determine if the user wants to start a workflow.
        
        Available workflows:
        - "account_setup": A 3-step process to set up a new account
        - "support_request": A 4-step process to file a support ticket
        
        Use start_workflow and update_workflow_step to track progress.
        """,
        tools=[start_workflow, update_workflow_step]
    )

    Pattern 3: Conversation History Summarization

    This pattern maintains condensed summaries of conversation context:

    Python
    def update_conversation_summary(new_insight: str, tool_context: ToolContext) -> dict:
        """Updates the running summary of the conversation with a new insight.
        
        Args:
            new_insight: New information to add to the summary.
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status and the updated summary.
        """
        summary = tool_context.state.get("conversation_summary", "")
        if summary:
            summary += "\n- " + new_insight
        else:
            summary = "Conversation Summary:\n- " + new_insight
        
        tool_context.state["conversation_summary"] = summary
        return {"status": "success", "summary": summary}
    
    summarizing_agent = Agent(
        name="summarizing_agent",
        model="gemini-2.0-flash-exp",
        instruction="""
        You help users while maintaining a summary of key points.
        
        At the start of each interaction:
        1. Check state["conversation_summary"] to recall context.
        
        During conversations:
        1. When you learn important information (preferences, goals, constraints),
           use update_conversation_summary to store it.
        2. Focus on facts and insights, not general chat.
        
        Keep your internal summary up-to-date to provide consistent, contextual help.
        """,
        tools=[update_conversation_summary]
    )

    Personalizing Responses with State

    By effectively using state, you can create deeply personalized agent experiences. Here’s an example of a comprehensive personalization approach:

    Python
    from google.adk.agents import Agent, SequentialAgent
    from google.adk.tools.tool_context import ToolContext
    
    # --- Tools for personalization ---
    
    def get_user_profile(tool_context: ToolContext) -> dict:
        """Retrieves the user's stored profile information.
        
        Args:
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: The user's profile data.
        """
        profile = tool_context.state.get("user:profile", {})
        return {
            "status": "success",
            "profile": profile,
            "is_returning_user": bool(profile)
        }
    
    def update_user_profile(field: str, value: str, tool_context: ToolContext) -> dict:
        """Updates a specific field in the user's profile.
        
        Args:
            field: The profile field to update (e.g., "name", "occupation").
            value: The value to store.
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status of the operation.
        """
        profile = tool_context.state.get("user:profile", {})
        profile[field] = value
        tool_context.state["user:profile"] = profile
        return {"status": "success", "field": field, "value": value}
    
    def log_user_interest(topic: str, score: float, tool_context: ToolContext) -> dict:
        """Records a user's interest in a topic with a relevance score.
        
        Args:
            topic: The topic of interest.
            score: Relevance score (0.0-1.0, higher means more interested).
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status of the operation.
        """
        interests = tool_context.state.get("user:interests", {})
        interests[topic] = max(interests.get(topic, 0), score) # Take highest score
        tool_context.state["user:interests"] = interests
        return {"status": "success", "topic": topic, "score": score}
    
    def get_personalization_strategy(tool_context: ToolContext) -> dict:
        """Analyzes user data and returns a personalization strategy.
        
        Args:
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Personalization recommendations based on user data.
        """
        profile = tool_context.state.get("user:profile", {})
        interests = tool_context.state.get("user:interests", {})
        interaction_count = tool_context.state.get("user:interaction_count", 0)
        
        # Increment interaction count
        tool_context.state["user:interaction_count"] = interaction_count + 1
        
        # Determine name usage style
        name_style = "formal"
        if interaction_count > 5 and "name" in profile:
            name_style = "casual"
        
        # Identify top interests
        top_interests = sorted(
            [(topic, score) for topic, score in interests.items()], 
            key=lambda x: x[1], 
            reverse=True
        )[:3]
        
        return {
            "status": "success",
            "strategy": {
                "name_usage": {
                    "style": name_style,
                    "name": profile.get("name", ""),
                    "use_name": "name" in profile
                },
                "experience_level": "new" if interaction_count < 3 else "returning",
                "top_interests": top_interests,
                "verbosity": profile.get("preferred_verbosity", "balanced")
            }
        }
    
    # --- Creating a personalized agent ---
    
    personalization_agent = Agent(
        name="profile_manager",
        model="gemini-2.0-flash-exp",
        instruction="""
        You manage user profile information and personalization strategy.
        Your job is to extract and store relevant user information, then provide
        personalization guidance to other agents.
        
        YOU MUST:
        1. Use get_user_profile at the start of conversation to check existing data.
        2. During conversation, identify personal details and preferences.
        3. Use update_user_profile to store name, age, occupation, etc.
        4. Use log_user_interest when the user shows interest in topics.
        5. Use get_personalization_strategy to generate guidance for personalization.
        
        Do not explicitly tell the user you are storing this information.
        """,
        tools=[get_user_profile, update_user_profile, log_user_interest, get_personalization_strategy],
        output_key="personalization_strategy"
    )
    
    response_agent = Agent(
        name="personalized_responder",
        model="gemini-2.0-flash-exp",
        instruction="""
        You provide personalized responses based on the personalization strategy.
        
        At the beginning of each interaction:
        1. Check state["personalization_strategy"] for guidance on personalization.
        2. Adapt your tone, detail level, and content based on this strategy.
        
        Personalization Elements:
        1. If strategy says to use name, address the user by name per the specified style.
        2. Adapt verbosity based on preference.
        3. Reference top interests when relevant.
        4. Provide more explanation for new users, be more direct with returning users.
        
        Always keep your personalization subtle and natural, never explicit.
        """,
    )
    
    # Combine as a sequential workflow
    personalized_assistant = SequentialAgent(
        name="personalized_assistant",
        sub_agents=[personalization_agent, response_agent]
    )

    This approach uses multiple state-related techniques:

    1. Profile Storage: Maintains persistent user information
    2. Interest Tracking: Records and scores user interests
    3. Interaction Counting: Tracks user familiarity with the system
    4. Personalization Strategy: Generates a comprehensive approach to personalization
    5. Sequential Agent Pattern: First agent focuses on updating state, second agent uses it for personalization

    Advanced State Management

    For production applications, you’ll likely need more sophisticated state management approaches.

    Custom Session Services

    The InMemorySessionService is suitable for development, but for production, you’ll want persistent storage. Create a custom session service by extending the SessionService abstract class:

    Python
    from google.adk.sessions import SessionService, Session
    from typing import Optional, Dict, Any
    import firebase_admin
    from firebase_admin import firestore
    
    class FirestoreSessionService(SessionService):
        """A session service that persists state in Firestore."""
        
        def __init__(self, collection_name: str = "adk_sessions"):
            """Initialize with a Firestore collection name."""
            self.collection_name = collection_name
            if not firebase_admin._apps:
                firebase_admin.initialize_app()
            self.db = firestore.client()
            
        def create_session(
            self, app_name: str, user_id: str, session_id: str, state: Optional[Dict[str, Any]] = None
        ) -> Session:
            """Create a new session or get existing session."""
            session_ref = self._get_session_ref(app_name, user_id, session_id)
            doc = session_ref.get()
            
            if doc.exists:
                # Session exists, retrieve it
                session_data = doc.to_dict()
                return Session(
                    app_name=app_name,
                    user_id=user_id,
                    session_id=session_id,
                    state=session_data.get("state", {}),
                    last_update_time=session_data.get("last_update_time", 0)
                )
            else:
                # Create new session
                session = Session(
                    app_name=app_name,
                    user_id=user_id,
                    session_id=session_id,
                    state=state or {}
                )
                self._save_session(session)
                return session
                
        def get_session(
            self, app_name: str, user_id: str, session_id: str
        ) -> Optional[Session]:
            """Get an existing session."""
            session_ref = self._get_session_ref(app_name, user_id, session_id)
            doc = session_ref.get()
            
            if not doc.exists:
                return None
                
            session_data = doc.to_dict()
            return Session(
                app_name=app_name,
                user_id=user_id,
                session_id=session_id,
                state=session_data.get("state", {}),
                last_update_time=session_data.get("last_update_time", 0)
            )
            
        def update_session(self, session: Session) -> None:
            """Update a session in the database."""
            self._save_session(session)
            
        def _get_session_ref(self, app_name: str, user_id: str, session_id: str):
            """Get a reference to the session document."""
            return self.db.collection(self.collection_name).document(
                f"{app_name}_{user_id}_{session_id}"
            )
            
        def _save_session(self, session: Session) -> None:
            """Save a session to Firestore."""
            session_ref = self._get_session_ref(
                session.app_name, session.user_id, session.session_id
            )
            session_ref.set({
                "state": session.state,
                "last_update_time": session.last_update_time
            })

    By implementing state management, you can now create agents with memory, context awareness, and personalization capabilities that significantly enhance the user experience.

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

    Building Multi-Agent Systems

    Multi-agent systems (MAS) in ADK are typically organized in hierarchical structures, where agents can have parent-child relationships. This hierarchical organization provides a clear framework for delegation, specialization, and coordination among agents.

    Creating an Agent Hierarchy

    The foundation of agent hierarchies in ADK is the sub_agents parameter. When you create an agent, you can specify other agents as its sub-agents:

    Python
    from google.adk.agents import Agent
    
    # Create specialized sub-agents
    weather_specialist = Agent(
        name="weather_specialist",
        model="gemini-2.0-flash-exp",
        description="Provides detailed weather information for any location.",
        instruction="You are a weather specialist. Provide accurate, detailed weather information when asked.",
        tools=[get_weather]  # Assume get_weather is defined
    )
    
    restaurant_specialist = Agent(
        name="restaurant_specialist",
        model="gemini-2.0-flash-exp",
        description="Recommends restaurants based on location, cuisine, and preferences.",
        instruction="You are a restaurant specialist. Recommend restaurants based on user preferences.",
        tools=[find_restaurants]  # Assume find_restaurants is defined
    )
    
    # Create a parent agent with sub-agents
    coordinator = Agent(
        name="travel_assistant",
        model="gemini-2.0-flash-exp",
        description="Helps plan trips and activities.",
        instruction="""
        You are a travel assistant that helps users plan trips and activities.
        
        You have two specialized sub-agents:
        - weather_specialist: For weather-related questions
        - restaurant_specialist: For restaurant recommendations
        
        When a user asks about weather, delegate to the weather_specialist.
        When a user asks about restaurants or food, delegate to the restaurant_specialist.
        For general travel questions, handle them yourself.
        """,
        sub_agents=[weather_specialist, restaurant_specialist]
    )

    In this example, coordinator is the parent agent, and weather_specialist and restaurant_specialist are its sub-agents. ADK automatically establishes the parent-child relationship by setting the parent_agent attribute on each sub-agent.

    Understanding the Hierarchy Rules

    The agent hierarchy in ADK follows several important rules:

    1. Single Parent Rule: An agent can have only one parent. If you try to add an agent as a sub-agent to multiple parents, ADK will raise an error.
    2. Name Uniqueness: Each agent in the hierarchy must have a unique name. This is crucial because delegation and finding agents rely on these names.
    3. Hierarchical Navigation: You can navigate the hierarchy programmatically (see the sketch after this list):
      • agent.parent_agent: Access an agent’s parent
      • agent.sub_agents: Access an agent’s children
      • root_agent.find_agent(name): Find any agent in the hierarchy by name
    4. Scope of Control: The hierarchy defines the scope for potential agent transfers. By default, an agent can transfer control to its parent, its siblings (other sub-agents of its parent), or its own sub-agents.
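
    Here’s a minimal sketch of that navigation, continuing from the travel assistant example above; it relies only on the parent_agent, sub_agents, and find_agent members listed in rule 3:

    Python
    # Parent-child links were established automatically when sub_agents was passed
    assert weather_specialist.parent_agent is coordinator
    assert restaurant_specialist in coordinator.sub_agents
    
    # Look up any agent in the tree by its unique name
    found = coordinator.find_agent("restaurant_specialist")
    print(found.name)               # "restaurant_specialist"
    print(found.parent_agent.name)  # "travel_assistant"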

    Agent-to-Agent Delegation and Communication

    The power of multi-agent systems comes from the ability of agents to collaborate and delegate tasks to each other. ADK provides several mechanisms for agent-to-agent communication and delegation.

    LLM-Driven Delegation (Auto-Flow)

    The most flexible approach is LLM-driven delegation, where the agent’s language model decides when to transfer control to another agent based on its understanding of the query and the available agents’ capabilities:

    Python
    # LLM-driven delegation relies on clear agent descriptions
    customer_service = Agent(
        name="customer_service",
        model="gemini-2.0-flash-exp",
        description="Handles general customer inquiries and routes to specialists.",
        instruction="""
        You are the main customer service agent. 
        
        Analyze each customer query and determine the best way to handle it:
        - For billing questions, transfer to the billing_specialist
        - For technical issues, transfer to the tech_support
        - For product questions, handle yourself
        
        Make your delegation decisions based on the query content.
        """,
        sub_agents=[
            Agent(
                name="billing_specialist",
                model="gemini-2.0-flash-exp",
                description="Handles all billing, payment, and invoice inquiries."
            ),
            Agent(
                name="tech_support",
                model="gemini-2.0-flash-exp",
                description="Resolves technical issues and troubleshooting problems."
            )
        ]
    )

    When a user sends a message like “I have a problem with my last bill,” the LLM in customer_service recognizes this as a billing question and automatically generates a transfer request to the billing_specialist agent. This is handled through ADK’s Auto-Flow mechanism, which is enabled by default when sub-agents are present.

    The key elements for successful LLM-driven delegation are:

    • Clear, distinctive descriptions for each agent
    • Explicit instructions to the parent agent about when to delegate
    • Appropriate model capabilities in the parent agent to understand and classify queries

    Explicit Agent Invocation with AgentTool

    For more controlled delegation, you can wrap an agent as a tool and explicitly invoke it from another agent:

    Python
    from google.adk.agents import Agent
    from google.adk.tools import AgentTool
    
    # Create a specialized agent
    calculator_agent = Agent(
        name="calculator",
        model="gemini-2.0-flash-exp",
        description="Performs complex mathematical calculations.",
        instruction="You perform mathematical calculations with precision."
    )
    
    # Wrap it as a tool
    calculator_tool = AgentTool(
        agent=calculator_agent,
        description="Use this tool to perform complex calculations."
    )
    
    # Create a parent agent that uses the agent tool
    math_tutor = Agent(
        name="math_tutor",
        model="gemini-2.0-flash-exp",
        description="Helps students learn mathematics.",
        instruction="""
        You are a math tutor helping students learn.
        
        When a student asks a question requiring complex calculations:
        1. Explain the mathematical concept
        2. Use the calculator tool to compute the result
        3. Explain the significance of the result
        """,
        tools=[calculator_tool]
    )

    With this approach:

    • The parent agent (math_tutor) decides when to use the calculator tool based on its instructions
    • When invoked, the tool executes the wrapped agent (calculator_agent)
    • The result is returned to the parent agent, which can then incorporate it into its response
    • State changes made by the sub-agent are preserved in the shared session

    This approach gives you more explicit control over when and how sub-agents are invoked.

    Using Shared Session State for Communication

    Agents can also communicate through shared session state:

    Python
    from google.adk.agents import Agent, SequentialAgent
    
    # First agent gathers information and stores it in state
    information_gatherer = Agent(
        name="information_gatherer",
        model="gemini-2.0-flash-exp",
        instruction="Gather travel information from the user and store it in state.",
        tools=[
            # Tool to save travel details to state
            save_travel_details  # Assume this is defined and writes to state
        ],
        output_key="information_gathering_complete"  # Saves final response to state
    )
    
    # Second agent uses information from state
    recommendation_generator = Agent(
        name="recommendation_generator",
        model="gemini-2.0-flash-exp",
        instruction="""
        Generate travel recommendations based on information in state.
        
        Look for:
        - destination in state["travel_destination"]
        - dates in state["travel_dates"]
        - preferences in state["travel_preferences"]
        """,
        tools=[
            # Tool to retrieve recommendations based on state information
            get_recommendations  # Assume this is defined and reads from state
        ]
    )
    
    # Sequential agent ensures these run in order
    travel_planner = SequentialAgent(
        name="travel_planner",
        sub_agents=[information_gatherer, recommendation_generator]
    )

    In this example:

    1. information_gatherer collects information and stores it in the session state
    2. recommendation_generator reads this information from state and uses it to generate recommendations
    3. The SequentialAgent ensures they run in the correct order

    This pattern is particularly useful for workflows where information needs to be collected, processed, and then used by subsequent agents.
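
    The example above assumes a save_travel_details tool. Here’s a minimal sketch of what such a state-writing tool could look like; the parameters are illustrative and simply populate the keys that recommendation_generator’s instruction expects:

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def save_travel_details(destination: str, dates: str, preferences: str, tool_context: ToolContext) -> dict:
        """Stores the user's travel details in session state for downstream agents.
        
        Args:
            destination: Where the user wants to travel.
            dates: When the trip takes place.
            preferences: Free-form preferences (budget, cuisine, interests, etc.).
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status of the operation.
        """
        tool_context.state["travel_destination"] = destination
        tool_context.state["travel_dates"] = dates
        tool_context.state["travel_preferences"] = preferences
        return {"status": "success", "destination": destination}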

    Workflow Patterns: Sequential, Parallel, Loop

    ADK provides specialized workflow agents that orchestrate the execution of sub-agents according to different patterns.

    Sequential Workflow

    The SequentialAgent executes its sub-agents one after another in a defined order:

    Python
    from google.adk.agents import SequentialAgent, Agent
    
    data_processor = SequentialAgent(
        name="data_processor",
        sub_agents=[
            Agent(name="data_validator", output_key="validation_result"),
            Agent(name="data_transformer", output_key="transformed_data"),
            Agent(name="data_analyzer", output_key="analysis_result"),
            Agent(name="report_generator")
        ]
    )

    In this example:

    1. data_validator runs first and validates the input data
    2. data_transformer runs next, potentially using the validation result
    3. data_analyzer analyzes the transformed data
    4. report_generator creates a final report based on the analysis

    Each agent’s output can be saved to state (using output_key) for the next agent to use. The same InvocationContext is passed sequentially from one agent to the next, ensuring state changes persist throughout the workflow.

    Parallel Workflow

    The ParallelAgent executes its sub-agents concurrently, which can improve efficiency for independent tasks:

    Python
    from google.adk.agents import ParallelAgent, Agent
    
    data_gatherer = ParallelAgent(
        name="data_gatherer",
        sub_agents=[
            Agent(name="weather_fetcher", output_key="weather_data"),
            Agent(name="traffic_fetcher", output_key="traffic_data"),
            Agent(name="news_fetcher", output_key="news_data")
        ]
    )

    In this example, all three fetchers run concurrently. Each operates in its own branch of the invocation context (ParentBranch.ChildName), but they share the same session state. This means they can all write to state without conflicts (as long as they use different keys).

    Parallel execution is particularly useful for:

    • Reducing total processing time for independent tasks
    • Gathering information from different sources simultaneously
    • Implementing competing approaches to the same problem
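
    In practice, each fetcher in the snippet above would be a fully specified agent. Here’s a minimal sketch of one (the instruction and tool are illustrative); note the distinct output_key, which is what keeps the parallel branches from overwriting each other:

    Python
    from google.adk.agents import Agent
    
    weather_fetcher = Agent(
        name="weather_fetcher",
        model="gemini-2.0-flash-exp",
        instruction="Fetch the current weather for the requested location and summarize it in one sentence.",
        tools=[get_weather],       # Assume get_weather is defined
        output_key="weather_data"  # Unique key: sibling fetchers write to different keys
    )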

    Loop Workflow

    The LoopAgent repeatedly executes its sub-agents until a condition is met:

    Python
    from google.adk.agents import LoopAgent, Agent, BaseAgent
    from google.adk.agents.invocation_context import InvocationContext
    from google.adk.events import Event, EventActions
    from typing import AsyncGenerator
    
    # Custom agent that checks if the loop should continue
    class ConditionChecker(BaseAgent):
        name: str = "condition_checker"
        
        async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:
            # Check if the condition for stopping the loop is met
            completed = context.session.state.get("task_completed", False)
            max_iterations = context.session.state.get("max_iterations", 5)
            current_iteration = context.session.state.get("current_iteration", 0)
            
            # Increment iteration counter
            context.session.state["current_iteration"] = current_iteration + 1
            
            # If task is completed or max iterations reached, escalate to stop the loop
            if completed or current_iteration >= max_iterations:
                yield Event(
                    author=self.name,
                    actions=EventActions(escalate=True)  # This signals loop termination
                )
            else:
                yield Event(
                    author=self.name,
                    content=None  # No content needed, just continuing the loop
                )
    
    # Create task processor agent
    task_processor = Agent(
        name="task_processor",
        model="gemini-2.0-flash-exp",
        instruction="""
        Process the current task step.
        
        Check state["current_iteration"] to see which step you're on.
        When the task is complete, set state["task_completed"] = True.
        """,
        tools=[
            # Tool to process the current step
            process_step,  # Assume this is defined
            # Tool to mark the task as completed
            mark_completed  # Assume this is defined
        ]
    )
    
    # Create loop agent that combines processing and condition checking
    iterative_processor = LoopAgent(
        name="iterative_processor",
        sub_agents=[
            task_processor,
            ConditionChecker()
        ],
        max_iterations=10  # Optional backup limit
    )

    In this example:

    1. iterative_processor repeatedly executes its sub-agents
    2. Each iteration runs task_processor followed by ConditionChecker
    3. The loop continues until ConditionChecker escalates (when the task is completed or max iterations reached)
    4. State is maintained across iterations, allowing tracking of progress

    Loop agents are ideal for:

    • Incremental processing of large datasets
    • Implementing retry logic with backoff
    • Iterative refinement of results
    • Multi-step workflows where the number of steps isn’t known in advance

    Designing Effective Agent Teams

    Creating effective multi-agent systems requires thoughtful design. Here are key principles and patterns for building successful agent teams:

    Principle 1: Clear Agent Specialization

    Each agent in the system should have a clearly defined area of expertise:

    Python
    # Financial advisory team with clear specializations
    mortgage_specialist = Agent(
        name="mortgage_specialist",
        description="Expert on mortgage products, rates, and qualification requirements.",
        # Other parameters...
    )
    
    investment_specialist = Agent(
        name="investment_specialist",
        description="Expert on investment strategies, market trends, and portfolio management.",
        # Other parameters...
    )
    
    tax_specialist = Agent(
        name="tax_specialist",
        description="Expert on tax planning, deductions, and regulatory compliance.",
        # Other parameters...
    )

    The specializations should be:

    • Non-overlapping to avoid confusion in delegation decisions
    • Comprehensive to cover all expected user queries
    • Clearly communicated in agent descriptions and instructions

    Principle 2: Effective Coordination Strategies

    There are multiple strategies for coordinating agents. Choose the approach that best fits your application’s needs:

    Centralized Coordination (Hub and Spoke)

    Python
    # Hub agent coordinates specialists
    financial_advisor = Agent(
        name="financial_advisor",
        description="Coordinates financial advice across multiple domains.",
        instruction="""
        You are the main financial advisor.
        
        For mortgage questions, delegate to mortgage_specialist.
        For investment questions, delegate to investment_specialist.
        For tax questions, delegate to tax_specialist.
        
        Only handle general financial questions yourself.
        """,
        sub_agents=[mortgage_specialist, investment_specialist, tax_specialist]
    )

    Workflow-Based Coordination (Pipeline)

    Python
    # Sequential workflow for loan processing
    loan_processor = SequentialAgent(
        name="loan_processor",
        sub_agents=[
            Agent(name="application_validator"),
            Agent(name="credit_checker"),
            Agent(name="risk_assessor"),
            Agent(name="decision_maker"),
            Agent(name="notification_sender")
        ]
    )

    Hierarchical Decomposition (Tree Structure)

    Python
    # Multi-level hierarchy for complex tasks
    project_manager = Agent(
        name="project_manager",
        sub_agents=[
            Agent(
                name="design_lead",
                sub_agents=[
                    Agent(name="ui_designer"),
                    Agent(name="ux_researcher")
                ]
            ),
            Agent(
                name="development_lead",
                sub_agents=[
                    Agent(name="frontend_developer"),
                    Agent(name="backend_developer")
                ]
            ),
            Agent(name="qa_lead")
        ]
    )

    Principle 3: State Management Strategy

    Develop a clear strategy for how agents share information through state:

    Python
    # First agent gathers information
    data_collector = Agent(
        name="data_collector",
        instruction="""
        Collect information from the user. Store each piece in the appropriate state key:
        - Personal details in state["user_details"]
        - Goals in state["financial_goals"]
        - Current situation in state["current_situation"]
        """,
        tools=[save_to_state],  # Assume this tool saves data to specific state keys
        output_key="collection_complete"
    )
    
    # Specialist agents use collected information
    retirement_planner = Agent(
        name="retirement_planner",
        instruction="""
        Create a retirement plan based on information in state.
        Use state["user_details"] for age and income information.
        Use state["financial_goals"] for retirement targets.
        Store your plan in state["retirement_plan"].
        """,
        tools=[create_retirement_plan],  # Assume this tool creates and saves a plan
        output_key="retirement_planning_complete"
    )

    Consider:

    • Which state keys each agent will read from and write to
    • How to structure state data for easy access by multiple agents
    • Whether to use scoped state (session, user, app) based on persistence needs (see the sketch below)
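
    ADK distinguishes these scopes through key prefixes: unprefixed keys are session-scoped, user: and app: keys are shared across sessions for a single user or the whole application, and temp: keys aren’t persisted at all (the user: prefix already appears in the personalization example earlier). A quick sketch inside a tool:

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def remember_theme(theme: str, tool_context: ToolContext) -> dict:
        """Illustrates the different state scopes by writing one key to each."""
        tool_context.state["last_theme_request"] = theme      # Session scope: this conversation only
        tool_context.state["user:preferred_theme"] = theme    # User scope: shared across this user's sessions
        tool_context.state["app:default_theme"] = "light"     # App scope: shared across all users
        tool_context.state["temp:scratch_value"] = "working"  # Temp scope: discarded, never persisted
        return {"status": "success", "theme": theme}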

    Principle 4: Error Handling and Fallbacks

    Design your agent team to handle failures gracefully:

    Python
    from google.adk.agents import Agent, SequentialAgent
    from google.adk.tools.tool_context import ToolContext
    
    # Tool to check if the previous agent encountered an error
    def check_previous_result(tool_context: ToolContext) -> dict:
        """Checks if the previous agent step was successful.
        
        Returns:
            dict: Status and whether a fallback is needed.
        """
        error_detected = tool_context.state.get("error_detected", False)
        return {
            "status": "success",
            "fallback_needed": error_detected,
            "error_details": tool_context.state.get("error_details", "Unknown error")
        }
    
    # Tool to handle error recovery
    def recover_from_error(error_details: str, tool_context: ToolContext) -> dict:
        """Attempts to recover from an error.
        
        Args:
            error_details: Details about the error that occurred.
            
        Returns:
            dict: Status of recovery attempt.
        """
        # Record the recovery attempt
        tool_context.state["recovery_attempted"] = True
        
        # Clear the error flag
        tool_context.state["error_detected"] = False
        
        return {
            "status": "success",
            "message": f"Recovered from error: {error_details}"
        }
    
    # Primary agent that might encounter errors
    primary_handler = Agent(
        name="primary_handler",
        model="gemini-2.0-flash-exp",
        instruction="""
        You handle the primary task.
        If you encounter an error, set state["error_detected"] = True and
        state["error_details"] = "description of error".
        """,
        tools=[process_task, set_error_state]  # Assume these are defined
    )
    
    # Fallback agent for error recovery
    fallback_handler = Agent(
        name="fallback_handler",
        model="gemini-2.0-flash-exp",
        instruction="""
        You handle error recovery when the primary agent fails.
        
        First, use check_previous_result to see if you need to act.
        If fallback is needed, use recover_from_error to attempt recovery.
        Provide a simplified but functional response to the user.
        """,
        tools=[check_previous_result, recover_from_error]
    )
    
    # Combine with sequential flow
    robust_handler = SequentialAgent(
        name="robust_handler",
        sub_agents=[primary_handler, fallback_handler]
    )

    This pattern ensures that even if the primary agent encounters an error, the fallback agent can provide a degraded but functional response.
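
    The primary agent above assumes a set_error_state tool. Here’s a minimal sketch of that assumed helper, mirroring the state keys that check_previous_result and recover_from_error already use:

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def set_error_state(error_details: str, tool_context: ToolContext) -> dict:
        """Flags an error in session state so the fallback agent can react to it.
        
        Args:
            error_details: A short description of what went wrong.
            tool_context: Automatically provided by ADK.
            
        Returns:
            dict: Status of the operation.
        """
        tool_context.state["error_detected"] = True
        tool_context.state["error_details"] = error_details
        return {"status": "error_recorded", "details": error_details}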

    Principle 5: Monitoring and Debugging

    Design your agent team with observability in mind:

    Python
    import time
    
    from google.adk.tools.tool_context import ToolContext
    
    def log_agent_action(action: str, details: str, tool_context: ToolContext) -> dict:
        """Logs an agent action to the trace log in state.
        
        Args:
            action: The type of action being logged.
            details: Details about the action.
            
        Returns:
            dict: Status of the logging operation.
        """
        # Get existing log or initialize new one
        trace_log = tool_context.state.get("agent_trace_log", [])
        
        # Add new entry with timestamp
        trace_log.append({
            "timestamp": time.time(),
            "agent": tool_context.agent_name,
            "action": action,
            "details": details
        })
        
        # Update state with new log
        tool_context.state["agent_trace_log"] = trace_log
        
        return {
            "status": "success"
        }
    
    # Add this tool to all agents in your system for comprehensive tracing

    By following these principles and patterns, you can design effective agent teams that leverage specialization, coordination, shared state, and robust error handling to deliver complex capabilities.

    In the next section, we’ll explore advanced features of ADK, including callbacks for implementing safety guardrails and other sophisticated control mechanisms.

    Advanced Features and Patterns

    Implementing Safety Guardrails with Callbacks

    Callbacks are powerful hooks that allow you to intercept and potentially modify agent behavior at key points in the execution flow. They’re particularly valuable for implementing safety guardrails, logging, monitoring, and custom business logic.

    ADK provides several callback points, but two of the most important are:

    • before_model_callback: Executes just before sending a request to the LLM
    • before_tool_callback: Executes just before a tool is called

    Input Validation with before_model_callback

    The before_model_callback lets you inspect and potentially block user inputs before they reach the language model:

    Python
    from google.adk.agents.callback_context import CallbackContext
    from google.adk.models.llm_request import LlmRequest
    from google.adk.models.llm_response import LlmResponse
    from google.genai import types
    from typing import Optional
    
    def profanity_filter(
        callback_context: CallbackContext, llm_request: LlmRequest
    ) -> Optional[LlmResponse]:
        """
        Checks user input for profanity and blocks requests containing prohibited language.
        
        Args:
            callback_context: Provides context about the agent and session
            llm_request: The request about to be sent to the LLM
            
        Returns:
            LlmResponse if the request should be blocked, None if it should proceed
        """
        # Simple profanity detection (in a real system, use a more sophisticated approach)
        prohibited_terms = ["badword1", "badword2", "badword3"]
        
        # Extract the last user message
        last_user_message = ""
        if llm_request.contents:
            for content in reversed(llm_request.contents):
                if content.role == 'user' and content.parts:
                    if content.parts[0].text:
                        last_user_message = content.parts[0].text
                        break
        
        # Check for prohibited terms
        contains_profanity = any(term in last_user_message.lower() for term in prohibited_terms)
        
        if contains_profanity:
            # Log the blocking action
            print(f"Profanity filter blocked message: '{last_user_message[:20]}...'")
            
            # Record the event in state
            callback_context.state["profanity_filter_triggered"] = True
            
            # Return a response that will be sent instead of calling the LLM
            return LlmResponse(
                content=types.Content(
                    role="model",
                    parts=[types.Part(text="I'm sorry, but I cannot respond to messages containing inappropriate language. Please rephrase your request without using prohibited terms.")]
                )
            )
        
        # If no profanity detected, return None to allow the request to proceed
        return None
    
    # Add the callback to an agent
    safe_agent = Agent(
        name="safe_agent",
        model="gemini-2.0-flash-exp",
        instruction="You are a helpful assistant.",
        before_model_callback=profanity_filter
    )

    This example implements a simple profanity filter that:

    1. Extracts the most recent user message from the LLM request
    2. Checks it against a list of prohibited terms
    3. If prohibited terms are found, blocks the LLM call and returns a predefined response
    4. Otherwise, allows the request to proceed to the LLM

    You can extend this pattern to implement more sophisticated content moderation, sensitive information detection, or other input validation rules.
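
    As an illustration, here’s a minimal sketch of a sensitive-information guardrail that reuses the same before_model_callback signature; the regex is deliberately crude and only meant to show the shape of the check:

    Python
    import re
    from typing import Optional
    
    from google.adk.agents.callback_context import CallbackContext
    from google.adk.models.llm_request import LlmRequest
    from google.adk.models.llm_response import LlmResponse
    from google.genai import types
    
    CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # Rough credit-card-like digit run
    
    def sensitive_info_filter(
        callback_context: CallbackContext, llm_request: LlmRequest
    ) -> Optional[LlmResponse]:
        """Blocks requests whose latest user message looks like it contains a card number."""
        last_user_message = ""
        if llm_request.contents:
            for content in reversed(llm_request.contents):
                if content.role == 'user' and content.parts and content.parts[0].text:
                    last_user_message = content.parts[0].text
                    break
        
        if CARD_PATTERN.search(last_user_message):
            callback_context.state["sensitive_info_blocked"] = True
            return LlmResponse(
                content=types.Content(
                    role="model",
                    parts=[types.Part(text="For your security, please don't share card numbers in this chat. Remove the number and try again.")]
                )
            )
        
        return None  # Nothing sensitive detected; let the request through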

    Tool Usage Control with before_tool_callback

    The before_tool_callback allows you to validate tool arguments, restrict certain operations, or modify how tools are used:

    Python
    from google.adk.tools.base_tool import BaseTool
    from google.adk.tools.tool_context import ToolContext
    from typing import Optional, Dict, Any
    
    def restricted_city_guardrail(
        tool: BaseTool, args: Dict[str, Any], tool_context: ToolContext
    ) -> Optional[Dict]:
        """
        Prevents the get_weather tool from being called for restricted cities.
        
        Args:
            tool: Information about the tool being called
            args: The arguments passed to the tool
            tool_context: Access to session state and other context
            
        Returns:
            Dict if the tool call should be blocked, None if it should proceed
        """
        # Check if this is the get_weather tool
        if tool.name == "get_weather" and "city" in args:
            city = args["city"].lower()
            
            # List of restricted cities (example - could be loaded dynamically)
            restricted_cities = ["restricted_city_1", "restricted_city_2"]
            
            if city in restricted_cities:
                # Log the blocking action
                print(f"Blocked get_weather call for restricted city: {city}")
                
                # Record the event in state
                tool_context.state["restricted_city_blocked"] = city
                
                # Return a response that will be used instead of calling the tool
                return {
                    "status": "error",
                    "error_message": f"Sorry, weather information for {city} is not available due to policy restrictions."
                }
        
        # For other tools or non-restricted cities, allow the call to proceed
        return None
    
    # Add the callback to an agent
    restricted_agent = Agent(
        name="restricted_agent",
        model="gemini-2.0-flash-exp",
        instruction="You provide weather information using the get_weather tool.",
        tools=[get_weather],  # Assume get_weather is defined
        before_tool_callback=restricted_city_guardrail
    )

    This example implements a city restriction guardrail that:

    1. Checks if the get_weather tool is being called
    2. Inspects the city argument against a list of restricted cities
    3. If the city is restricted, blocks the tool call and returns a predefined error response
    4. Otherwise, allows the tool call to proceed

    You can use this pattern to implement various business rules, usage limits, or user-based access controls for your tools.
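
    Here’s a similar sketch of a per-session usage limit built on the same before_tool_callback signature; the three-call budget and the counter key are arbitrary choices for illustration:

    Python
    from typing import Optional, Dict, Any
    
    from google.adk.tools.base_tool import BaseTool
    from google.adk.tools.tool_context import ToolContext
    
    MAX_WEATHER_CALLS = 3  # Illustrative per-session budget
    
    def weather_rate_limiter(
        tool: BaseTool, args: Dict[str, Any], tool_context: ToolContext
    ) -> Optional[Dict]:
        """Blocks get_weather calls once the session has used up its budget."""
        if tool.name != "get_weather":
            return None  # Only police the weather tool
        
        calls_made = tool_context.state.get("weather_calls_made", 0)
        if calls_made >= MAX_WEATHER_CALLS:
            return {
                "status": "error",
                "error_message": "Weather lookups are limited to 3 per session. Please try again later."
            }
        
        # Count this call and allow it to proceed
        tool_context.state["weather_calls_made"] = calls_made + 1
        return None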

    Combining Multiple Callbacks

    For comprehensive safety and control, you can use multiple callbacks together:

    Python
    # Agent with multiple safety measures
    comprehensive_agent = Agent(
        name="comprehensive_agent",
        model="gemini-2.0-flash-exp",
        instruction="You help users with various tasks safely and responsibly.",
        tools=[get_weather, search_web, send_email],  # Assume these are defined
        before_model_callback=content_safety_filter,  # Filter unsafe user input
        after_model_callback=output_sanitizer,        # Clean up model responses
        before_tool_callback=tool_usage_validator,    # Validate tool usage
        after_tool_callback=tool_result_logger        # Log tool results
    )

    Each callback serves a specific purpose in the safety and monitoring pipeline:

    • before_model_callback: Prevents unsafe inputs from reaching the LLM
    • after_model_callback: Ensures model outputs meet safety and quality standards (sketched below)
    • before_tool_callback: Controls how and when tools can be used
    • after_tool_callback: Monitors and logs tool results for auditing
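
    The two “before” hooks were shown above. As a rough sketch of an “after” hook, here’s an output sanitizer written on the assumption that after_model_callback mirrors before_model_callback: it receives the callback context plus the model’s LlmResponse and can return a replacement response (the internal codename is a made-up example):

    Python
    from typing import Optional
    
    from google.adk.agents.callback_context import CallbackContext
    from google.adk.models.llm_response import LlmResponse
    from google.genai import types
    
    INTERNAL_CODENAME = "Project Foxtrot"  # Hypothetical string we never want to leak
    
    def output_sanitizer(
        callback_context: CallbackContext, llm_response: LlmResponse
    ) -> Optional[LlmResponse]:
        """Redacts an internal codename from model output before it reaches the user."""
        if not (llm_response.content and llm_response.content.parts):
            return None  # Nothing to sanitize
        
        text = llm_response.content.parts[0].text or ""
        if INTERNAL_CODENAME not in text:
            return None  # Output is clean; keep the original response
        
        callback_context.state["output_sanitized"] = True
        return LlmResponse(
            content=types.Content(
                role="model",
                parts=[types.Part(text=text.replace(INTERNAL_CODENAME, "[redacted]"))]
            )
        )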

    Building Evaluation Frameworks

    Robust evaluation is essential for developing reliable agent systems. ADK provides built-in mechanisms for evaluating agent performance.

    Creating Test Cases

    Start by defining test cases that cover the range of interactions your agent should handle:

    Python
    # Define test cases in a structured format
    test_cases = [
        {
            "name": "Basic weather query",
            "input": "What's the weather in New York?",
            "expected_tool_calls": ["get_weather"],
            "expected_tool_args": {"city": "New York"},
            "expected_response_contains": ["weather", "New York"]
        },
        {
            "name": "Ambiguous city query",
            "input": "How's the weather in Springfield?",
            "expected_tool_calls": ["clarify_city"],
            "expected_response_contains": ["multiple cities", "which Springfield"]
        },
        {
            "name": "City not supported",
            "input": "What's the weather in Atlantis?",
            "expected_tool_calls": ["get_weather"],
            "expected_tool_args": {"city": "Atlantis"},
            "expected_response_contains": ["don't have information", "Atlantis"]
        }
    ]

    Using the AgentEvaluator

    ADK provides an AgentEvaluator class to run test cases against your agent:

    Python
    from google.adk.evaluation import AgentEvaluator
    
    # Create the evaluator
    evaluator = AgentEvaluator(agent=weather_agent)
    
    # Run evaluation
    evaluation_results = evaluator.evaluate(test_cases=test_cases)
    
    # Print results
    for result in evaluation_results:
        print(f"Test: {result.test_case['name']}")
        print(f"  Status: {'PASS' if result.success else 'FAIL'}")
        print(f"  Feedback: {result.feedback}")
        if not result.success:
            print(f"  Expected: {result.expected}")
            print(f"  Actual: {result.actual}")
        print()
    
    # Calculate overall metrics
    success_rate = sum(1 for r in evaluation_results if r.success) / len(evaluation_results)
    print(f"Overall success rate: {success_rate:.2%}")

    Custom Evaluation Metrics

    For more specialized evaluation needs, you can implement custom metrics:

    Python
    def evaluate_response_correctness(test_case, agent_response, tool_calls):
        """Evaluates the correctness of the agent's response for weather queries."""
        # Exact city match checker
        if "expected_tool_args" in test_case and "city" in test_case["expected_tool_args"]:
            expected_city = test_case["expected_tool_args"]["city"]
            
            # Find the actual city used in tool calls
            actual_city = None
            for call in tool_calls:
                if call["name"] == "get_weather" and "city" in call["args"]:
                    actual_city = call["args"]["city"]
                    break
            
            # Check city match
            city_match = (actual_city == expected_city)
            
            # Temperature format checker (should include °C or °F)
            temp_format_correct = False
            if "°C" in agent_response or "°F" in agent_response:
                temp_format_correct = True
            
            return {
                "city_match": city_match,
                "temp_format_correct": temp_format_correct,
                "overall_correct": city_match and temp_format_correct
            }
        
        return {"overall_correct": None}  # Not applicable for this test case
    
    # Apply custom evaluation to results
    for result in evaluation_results:
        correctness = evaluate_response_correctness(
            result.test_case, 
            result.actual_response, 
            result.actual_tool_calls
        )
        print(f"Test: {result.test_case['name']}")
        print(f"  Overall correct: {correctness['overall_correct']}")
        if "city_match" in correctness:
            print(f"  City match: {correctness['city_match']}")
        if "temp_format_correct" in correctness:
            print(f"  Temperature format: {correctness['temp_format_correct']}")
        print()

    Automated Regression Testing

    Integrate agent evaluation into your CI/CD pipeline for automated regression testing:

    Python
    import unittest
    from google.adk.evaluation import AgentEvaluator
    
    class WeatherAgentTests(unittest.TestCase):
        def setUp(self):
            self.agent = create_weather_agent()  # Assume this function creates your agent
            self.evaluator = AgentEvaluator(agent=self.agent)
            
        def test_basic_weather_queries(self):
            results = self.evaluator.evaluate(test_cases=[
                {
                    "name": "New York weather",
                    "input": "What's the weather in New York?",
                    "expected_tool_calls": ["get_weather"]
                }
            ])
            self.assertTrue(results[0].success, results[0].feedback)
            
        def test_ambiguous_cities(self):
            results = self.evaluator.evaluate(test_cases=[
                {
                    "name": "Springfield ambiguity",
                    "input": "How's the weather in Springfield?",
                    "expected_response_contains": ["which Springfield", "multiple"]
                }
            ])
            self.assertTrue(results[0].success, results[0].feedback)
            
        def test_error_handling(self):
            results = self.evaluator.evaluate(test_cases=[
                {
                    "name": "Nonexistent city",
                    "input": "What's the weather in Narnia?",
                    "expected_response_contains": ["don't have information", "Narnia"]
                }
            ])
            self.assertTrue(results[0].success, results[0].feedback)
    
    if __name__ == "__main__":
        unittest.main()

    This approach allows you to catch regressions automatically when updating your agent or its components.

    Streaming and Real-Time Interactions

    ADK provides built-in support for streaming responses, enabling real-time interactions with agents.

    Implementing Streaming Responses

    To implement streaming with ADK, you use the asynchronous API:

    Python
    import asyncio
    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types
    
    # Set up session and runner
    session_service = InMemorySessionService()
    APP_NAME = "streaming_app"
    USER_ID = "user_123"
    SESSION_ID = "session_456"
    
    session = session_service.create_session(
        app_name=APP_NAME, 
        user_id=USER_ID,
        session_id=SESSION_ID
    )
    
    runner = Runner(
        agent=streaming_agent,  # Assume this is defined
        app_name=APP_NAME,
        session_service=session_service
    )
    
    async def stream_response(query: str):
        """Streams the agent's response token by token."""
        content = types.Content(role='user', parts=[types.Part(text=query)])
        
        print(f"User: {query}")
        print("Agent: ", end="", flush=True)
        
        # Process events as they arrive
        async for event in runner.run_async(
            user_id=USER_ID,
            session_id=SESSION_ID,
            new_message=content
        ):
            # For token-by-token streaming, look for ContentPartDelta events
            if hasattr(event, 'content_part_delta') and event.content_part_delta:
                delta = event.content_part_delta
                if delta.text:
                    print(delta.text, end="", flush=True)
            
            # For final response
            if event.is_final_response():
                print()  # End line after response
                
        print("\n")  # Add space after complete response
    
    # Run streaming interaction
    async def main():
        queries = [
            "What's the weather in New York?",
            "How about London?",
            "Thanks for your help!"
        ]
        
        for query in queries:
            await stream_response(query)
    
    # Run the async main function
    asyncio.run(main())

    This example:

    1. Sets up a session and runner
    2. Creates an async function that processes events as they arrive
    3. Specifically looks for content_part_delta events, which contain incremental text updates
    4. Prints each text segment as it arrives, creating a streaming effect

    Bidirectional Streaming with Audio

    ADK also supports bidirectional audio streaming for voice-based interactions:

    Python
    import asyncio
    from google.adk.runners import Runner
    from google.adk.sessions import InMemorySessionService
    from google.genai import types
    import sounddevice as sd
    import numpy as np
    import wave
    import io
    
    # Assume setup of session_service and runner as in previous example
    
    async def audio_conversation():
        """Conducts a voice conversation with the agent."""
        # Audio recording parameters
        sample_rate = 16000
        recording_duration = 5  # seconds
        
        print("Press Enter to start recording your question...")
        input()
        
        # Record audio
        print("Recording... (5 seconds)")
        audio_data = sd.rec(
            int(recording_duration * sample_rate),
            samplerate=sample_rate,
            channels=1,
            dtype='int16'
        )
        sd.wait()  # Wait for recording to complete
        print("Recording complete.")
        
        # Convert audio to WAV format in memory
        audio_bytes = io.BytesIO()
        with wave.open(audio_bytes, 'wb') as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)  # 16-bit
            wf.setframerate(sample_rate)
            wf.writeframes(audio_data.tobytes())
        
        # Create audio content for the agent
        audio_part = types.Part.from_bytes(
            data=audio_bytes.getvalue(),
            mime_type="audio/wav"
        )
        content = types.Content(role='user', parts=[audio_part])
        
        print("Processing your question...")
        
        # Stream the response
        print("Agent response:")
        text_response = ""
        
        async for event in runner.run_async(
            user_id=USER_ID,
            session_id=SESSION_ID,
            new_message=content
        ):
            # Handle text streaming
            if hasattr(event, 'content_part_delta') and event.content_part_delta:
                delta = event.content_part_delta
                if delta.text:
                    print(delta.text, end="", flush=True)
                    text_response += delta.text
            
            # Handle final audio response
            if event.is_final_response() and event.content and event.content.parts:
                for part in event.content.parts:
                    if part.mime_type and part.mime_type.startswith('audio/'):
                        # Play the audio response
                        audio_bytes = io.BytesIO(part.bytes_value)
                        with wave.open(audio_bytes, 'rb') as wf:
                            audio_data = np.frombuffer(
                                wf.readframes(wf.getnframes()),
                                dtype=np.int16
                            )
                            sd.play(audio_data, wf.getframerate())
                            sd.wait()
        
        print("\nConversation turn complete.")
    
    # Run the audio conversation
    asyncio.run(audio_conversation())

    This more complex example:

    1. Records audio from the user
    2. Converts it to the appropriate format
    3. Sends it to the agent
    4. Streams the text response as it’s generated
    5. Plays the audio response when available

    Common Multi-Agent Patterns and Use Cases

    Beyond the basic patterns we’ve discussed, here are some advanced multi-agent patterns for specific use cases:

    Critic-Generator Pattern

    This pattern uses one agent to generate content and another to critique and improve it:

    Python
    from google.adk.agents import Agent, SequentialAgent
    
    # Content generator
    generator = Agent(
        name="content_generator",
        model="gemini-2.0-flash-exp",
        instruction="Create content based on the user's request. Focus on being creative and comprehensive.",
        output_key="generated_content"
    )
    
    # Critic agent
    critic = Agent(
        name="content_critic",
        model="gemini-2.0-flash-exp",
        instruction="""
        Review the content in state["generated_content"].
        
        Analyze it for:
        1. Accuracy and factual correctness
        2. Clarity and readability
        3. Comprehensiveness
        4. Potential biases or issues
        
        Provide specific suggestions for improvement.
        """,
        output_key="critique"
    )
    
    # Refiner agent
    refiner = Agent(
        name="content_refiner",
        model="gemini-2.0-flash-exp",
        instruction="""
        Refine the content in state["generated_content"] based on the critique in state["critique"].
        
        Maintain the original style and voice while addressing the specific issues highlighted in the critique.
        Create a polished final version that incorporates the improvements.
        """,
    )
    
    # Chain them together
    critique_workflow = SequentialAgent(
        name="critique_workflow",
        sub_agents=[generator, critic, refiner]
    )

    This pattern is useful for:

    • Content creation with quality control
    • Code generation with review
    • Document drafting with editorial review

    Research and Synthesis Pattern

    This pattern divides research into parallel information gathering followed by synthesis:

    Python
    from google.adk.agents import Agent, ParallelAgent, SequentialAgent
    from google.adk.tools.tool_context import ToolContext
    
    # Topic research agent
    def research_topic(topic: str, tool_context: ToolContext) -> dict:
        """Researches a specific aspect of the main topic."""
        # ... research implementation ...
        research_results = f"Key findings on the {topic} aspects of the topic"  # Placeholder; replace with real research logic
        tool_context.state[f"research_{topic}"] = research_results
        return {"status": "success", "research": research_results}
    
    # Create specialized research agents
    economic_researcher = Agent(
        name="economic_researcher",
        model="gemini-2.0-flash-exp",
        instruction="Research the economic aspects of the topic. Store findings in state.",
        tools=[research_topic],
    )
    
    environmental_researcher = Agent(
        name="environmental_researcher",
        model="gemini-2.0-flash-exp",
        instruction="Research the environmental aspects of the topic. Store findings in state.",
        tools=[research_topic],
    )
    
    social_researcher = Agent(
        name="social_researcher",
        model="gemini-2.0-flash-exp",
        instruction="Research the social aspects of the topic. Store findings in state.",
        tools=[research_topic],
    )
    
    # Synthesis agent
    synthesizer = Agent(
        name="research_synthesizer",
        model="gemini-2.0-flash-exp",
        instruction="""
        Synthesize research findings from all researchers.
        Look for information in these state keys:
        - state["research_economic"]
        - state["research_environmental"] 
        - state["research_social"]
        
        Identify connections, conflicts, and gaps between different perspectives.
        Create a comprehensive synthesis that presents a balanced view.
        """,
    )
    
    # Research workflow
    research_framework = SequentialAgent(
        name="research_framework",
        sub_agents=[
            ParallelAgent(
                name="parallel_researchers",
                sub_agents=[economic_researcher, environmental_researcher, social_researcher]
            ),
            synthesizer
        ]
    )

    This pattern is ideal for:

    • Comprehensive research on complex topics
    • Multi-perspective analysis
    • Gathering diverse information efficiently

    Debate and Deliberation Pattern

    This pattern creates a structured debate between agents with different perspectives:

    Python
    from google.adk.agents import Agent, SequentialAgent
    
    # Pose the question
    question_agent = Agent(
        name="question_poser",
        model="gemini-2.0-flash-exp",
        instruction="Clarify the user's question into a clear, debatable proposition.",
        output_key="debate_question"
    )
    
    # Position A advocate
    position_a = Agent(
        name="position_a_advocate",
        model="gemini-2.0-flash-exp",
        instruction="""
        Present the strongest case FOR the proposition in state["debate_question"].
        Use logical arguments, evidence, and address potential counterarguments.
        """,
        output_key="position_a_arguments"
    )
    
    # Position B advocate
    position_b = Agent(
        name="position_b_advocate",
        model="gemini-2.0-flash-exp",
        instruction="""
        Present the strongest case AGAINST the proposition in state["debate_question"].
        Use logical arguments, evidence, and address potential counterarguments.
        """,
        output_key="position_b_arguments"
    )
    
    # Rebuttal rounds
    rebuttal_a = Agent(
        name="position_a_rebuttal",
        model="gemini-2.0-flash-exp",
        instruction="""
        Respond to the arguments against your position in state["position_b_arguments"].
        Strengthen your original arguments and address specific points raised.
        """,
        output_key="rebuttal_a"
    )
    
    rebuttal_b = Agent(
        name="position_b_rebuttal",
        model="gemini-2.0-flash-exp",
        instruction="""
        Respond to the arguments against your position in state["position_a_arguments"].
        Strengthen your original arguments and address specific points raised.
        """,
        output_key="rebuttal_b"
    )
    
    # Synthesis and judgment
    judge = Agent(
        name="debate_judge",
        model="gemini-2.0-flash-exp",
        instruction="""
        Evaluate the debate on the proposition in state["debate_question"].
        Consider:
        - Initial arguments: state["position_a_arguments"] and state["position_b_arguments"]
        - Rebuttals: state["rebuttal_a"] and state["rebuttal_b"]
        
        Summarize the strongest points on both sides.
        Identify areas of agreement and disagreement.
        Suggest a balanced conclusion that acknowledges the complexity of the issue.
        """,
    )
    
    # Debate workflow
    debate_framework = SequentialAgent(
        name="debate_framework",
        sub_agents=[
            question_agent,
            position_a,
            position_b,
            rebuttal_a,
            rebuttal_b,
            judge
        ]
    )

    This pattern is useful for:

    • Exploring complex ethical questions
    • Evaluating policy proposals
    • Understanding multiple sides of contentious issues

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

    Putting It All Together

    I’ve covered various agent architectures and patterns throughout this guide, along with code samples for implementing advanced features. Let’s combine it all into real-world agents (no more weather agents from here on).

    Customer Support Agent

    This customer service agent system handles inquiries about products, orders, billing, and technical support. The system maintains continuity across conversations, escalates complex issues, and provides personalized responses. We’ll showcase advanced features like:

    • Persistent session storage with MongoDB
    • Integration with external systems (CRM, ticketing)
    • Personalization through state and callbacks
    • Escalation paths to human agents
    • Specialized agents for different support domains

    Architecture Diagram

    Plaintext
    Customer Service System (ADK)
    ├── Root Coordinator Agent
    │   ├── Greeting & Routing Agent
    │   ├── Product Information Agent
    │   │   └── Tools: product_catalog_lookup, get_specifications
    │   ├── Order Status Agent
    │   │   └── Tools: order_lookup, track_shipment
    │   ├── Billing Agent
    │   │   └── Tools: get_invoice, update_payment_method
    │   ├── Technical Support Agent
    │   │   └── Tools: troubleshoot_issue, create_ticket
    │   └── Human Escalation Agent
    │       └── Tools: create_escalation_ticket, notify_supervisor
    └── Services
        ├── Persistent Storage Session Service (MongoDB)
        ├── Customer Data Service (CRM Integration)
        ├── Ticket Management Integration
        └── Analytics & Reporting Service
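
    Before we dive into the individual pieces, here’s a rough sketch of how the coordinator layer of this diagram could be assembled. The sub-agent variables are placeholders for the specialized agents we’ll build in the sections that follow.

    Python
    from google.adk.agents import Agent
    
    # Hypothetical coordinator that routes inquiries to specialized sub-agents.
    # Each sub-agent (product_info_agent, order_status_agent, billing_agent,
    # tech_support_agent, human_escalation_agent) is defined separately with its own tools.
    root_coordinator = Agent(
        name="customer_service_coordinator",
        model="gemini-2.0-flash-exp",
        description="Routes customer inquiries to the right specialized support agent.",
        instruction="""
        Greet the customer, identify what they need, and delegate to the most
        appropriate sub-agent: product information, order status, billing,
        technical support, or human escalation.
        """,
        sub_agents=[
            product_info_agent,
            order_status_agent,
            billing_agent,
            tech_support_agent,
            human_escalation_agent,
        ],
    )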

    Session Management with Custom Storage

    Python
    from google.adk.sessions import SessionService, Session
    import pymongo
    from typing import Optional, Dict, Any
    
    class MongoSessionService(SessionService):
        """Session service that uses MongoDB for persistent storage."""
        
        def __init__(self, connection_string, database="customer_service", collection="sessions"):
            """Initialize with MongoDB connection details."""
            self.client = pymongo.MongoClient(connection_string)
            self.db = self.client[database]
            self.collection = self.db[collection]
            
        def create_session(
            self, app_name: str, user_id: str, session_id: str, state: Optional[Dict[str, Any]] = None
        ) -> Session:
            """Create a new session or get existing session."""
            # Look for existing session
            session_doc = self.collection.find_one({
                "app_name": app_name,
                "user_id": user_id,
                "session_id": session_id
            })
            
            if session_doc:
                # Convert MongoDB document to Session object
                return Session(
                    app_name=session_doc["app_name"],
                    user_id=session_doc["user_id"],
                    session_id=session_doc["session_id"],
                    state=session_doc.get("state", {}),
                    last_update_time=session_doc.get("last_update_time", 0)
                )
            
            # Create new session
            session = Session(
                app_name=app_name,
                user_id=user_id,
                session_id=session_id,
                state=state or {}
            )
            self._save_session(session)
            return session
        
        # Additional methods implementation...
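
    The create_session method above calls a _save_session helper that isn’t shown. One possible implementation, assuming the same document shape we read back in create_session, looks like this:

    Python
        # One possible _save_session implementation (add this method inside MongoSessionService)
        def _save_session(self, session: Session) -> None:
            """Upserts the session document, mirroring the fields read back in create_session."""
            import time
            self.collection.update_one(
                {
                    "app_name": session.app_name,
                    "user_id": session.user_id,
                    "session_id": session.session_id,
                },
                {"$set": {"state": session.state, "last_update_time": time.time()}},
                upsert=True,
            )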

    CRM Integration

    Python
    def get_customer_info(customer_id: str, tool_context: ToolContext) -> dict:
        """Retrieves customer information from the CRM system.
        
        Args:
            customer_id: The unique identifier for the customer.
            tool_context: Provides access to session state.
            
        Returns:
            dict: Customer information and interaction history.
        """
        # In production, this would make an API call to the CRM system
        # Mock implementation for demonstration
        customers = {
            "C12345": {
                "name": "Emma Johnson",
                "email": "emma.j@example.com",
                "tier": "premium",
                "since": "2021-03-15",
                "recent_purchases": ["Laptop X1", "External Monitor"],
                "support_history": [
                    {"date": "2023-01-15", "issue": "Billing question", "resolved": True},
                    {"date": "2023-03-22", "issue": "Technical support", "resolved": True}
                ]
            },
            # Additional customers...
        }
        
        if customer_id in customers:
            # Store in session state for other agents to access
            tool_context.state["customer_info"] = customers[customer_id]
            return {"status": "success", "customer": customers[customer_id]}
        else:
            return {"status": "error", "error_message": f"Customer ID {customer_id} not found"}

    Issue Escalation System

    Python
    def escalate_to_human(
        issue_summary: str, 
        priority: str,
        customer_id: str,
        tool_context: ToolContext
    ) -> dict:
        """Escalates an issue to a human customer service representative.
        
        Args:
            issue_summary: Brief description of the issue.
            priority: Urgency level ("low", "medium", "high", "urgent").
            customer_id: The customer's ID.
            tool_context: Provides access to session state.
            
        Returns:
            dict: Escalation ticket information.
        """
        valid_priorities = ["low", "medium", "high", "urgent"]
        if priority.lower() not in valid_priorities:
            return {
                "status": "error",
                "error_message": f"Invalid priority. Must be one of: {', '.join(valid_priorities)}"
            }
        
        # Get customer info if available
        customer_info = tool_context.state.get("customer_info", {})
        customer_name = customer_info.get("name", "Unknown Customer")
        customer_tier = customer_info.get("tier", "standard")
        
        # Calculate SLA based on priority and customer tier
        sla_hours = {
            "low": {"standard": 48, "premium": 24},
            "medium": {"standard": 24, "premium": 12},
            "high": {"standard": 8, "premium": 4},
            "urgent": {"standard": 4, "premium": 1}
        }
        response_time = sla_hours[priority.lower()][customer_tier]
        
        # Generate ticket ID
        import time
        import hashlib
        ticket_id = hashlib.md5(f"{customer_id}:{time.time()}".encode()).hexdigest()[:8].upper()
        
        # Store ticket in state
        ticket_info = {
            "ticket_id": ticket_id,
            "customer_id": customer_id,
            "customer_name": customer_name,
            "issue_summary": issue_summary,
            "priority": priority.lower(),
            "status": "open",
            "created_at": time.time(),
            "sla_hours": response_time
        }
        
        # In production, this would make an API call to the ticket system
        # For demo, just store in state
        tickets = tool_context.state.get("app:escalation_tickets", {})
        tickets[ticket_id] = ticket_info
        tool_context.state["app:escalation_tickets"] = tickets
        
        # Signal that control should be transferred to the human agent
        tool_context.actions.transfer_to_agent = "human_support_agent"
        
        return {
            "status": "success",
            "ticket": ticket_info,
            "message": f"Issue escalated. Ticket ID: {ticket_id}. A representative will respond within {response_time} hours."
        }

    Tech Support Agent with Memory

    Python
    # Technical Support Agent
    tech_support_agent = Agent(
        name="technical_support_agent",
        model="gemini-2.0-flash-exp",
        description="Handles technical support inquiries and troubleshooting.",
        instruction="""
        You are a technical support specialist for our electronics company.
        
        FIRST, check if the user has a support history in state["customer_info"]["support_history"].
        If they do, reference this history in your responses.
        
        For technical issues:
        1. Use the troubleshoot_issue tool to analyze the problem.
        2. Guide the user through basic troubleshooting steps.
        3. If the issue persists, use create_ticket to log the issue.
        
        For complex issues beyond basic troubleshooting:
        1. Use escalate_to_human to transfer to a human specialist.
        
        Maintain a professional but empathetic tone. Acknowledge the frustration
        technical issues can cause, while providing clear steps toward resolution.
        """,
        tools=[troubleshoot_issue, create_ticket, escalate_to_human]
    )
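
    The agent above depends on troubleshoot_issue and create_ticket, which aren’t shown in this excerpt. As an illustration, a bare-bones troubleshoot_issue tool might look like the following; the keyword lookup and return shape are hypothetical stand-ins for a real knowledge base.

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def troubleshoot_issue(issue_description: str, tool_context: ToolContext) -> dict:
        """Suggests basic troubleshooting steps for a described technical issue.
        
        Args:
            issue_description: The customer's description of the problem.
            tool_context: Provides access to session state.
            
        Returns:
            dict: Suggested troubleshooting steps.
        """
        # Hypothetical keyword lookup; a real system would query a knowledge base
        known_issues = {
            "wifi": ["Restart the router", "Forget and rejoin the network", "Update the network driver"],
            "battery": ["Check battery health settings", "Reduce screen brightness", "Disable background apps"],
        }
        steps = ["Restart the device"]
        for keyword, suggestions in known_issues.items():
            if keyword in issue_description.lower():
                steps.extend(suggestions)
        
        # Remember what was suggested so later turns can build on it
        tool_context.state["last_troubleshooting_steps"] = steps
        return {"status": "success", "steps": steps}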

    Personalization Callback

    Python
    from typing import Optional
    
    from google.genai import types
    from google.adk.agents.callback_context import CallbackContext
    from google.adk.models import LlmRequest, LlmResponse  # import paths may vary slightly across ADK versions
    
    def personalization_callback(
        callback_context: CallbackContext, llm_request: LlmRequest
    ) -> Optional[LlmResponse]:
        """
        Adds personalization information to the LLM request.
        
        Args:
            callback_context: Context for the callback
            llm_request: The request being sent to the LLM
            
        Returns:
            None to continue with the modified request
        """
        # Get customer info from state
        customer_info = callback_context.state.get("customer_info")
        
        if customer_info:
            # Create a personalization header to add to the request
            customer_name = customer_info.get("name", "valued customer")
            customer_tier = customer_info.get("tier", "standard")
            recent_purchases = customer_info.get("recent_purchases", [])
            
            personalization_note = (
                f"\nIMPORTANT PERSONALIZATION:\n"
                f"Customer Name: {customer_name}\n"
                f"Customer Tier: {customer_tier}\n"
            )
            
            if recent_purchases:
                personalization_note += f"Recent Purchases: {', '.join(recent_purchases)}\n"
            
            # Add personalization to the LLM request
            if llm_request.contents:
                # Add as a system message before the first content
                system_content = types.Content(
                    role="system",
                    parts=[types.Part(text=personalization_note)]
                )
                llm_request.contents.insert(0, system_content)
        
        # Return None to continue with the modified request
        return None
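
    To put the callback to work, register it on the agent that talks to the model. Assuming the coordinator sketched earlier, the wiring could look like this (ADK exposes the hook as before_model_callback; double-check the parameter name against the version you’re running):

    Python
    from google.adk.agents import Agent
    
    # Attach the callback so every LLM request is enriched with customer context
    root_coordinator = Agent(
        name="customer_service_coordinator",
        model="gemini-2.0-flash-exp",
        instruction="Route customer inquiries to the right specialized support agent.",
        sub_agents=[tech_support_agent],  # plus the other specialized sub-agents
        before_model_callback=personalization_callback,
    )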

    Code Generation and Debugging Agent

    Finally, let’s explore a Code Generation and Debugging Agent built with ADK.

    Code Generation Agent with Test-Driven Development

    Let’s start with a sequential agent that first analyzes requirements, creates test cases, and then writes code and evaluates it.

    Python
    from google.adk.agents import Agent, SequentialAgent
    from google.adk.tools.tool_context import ToolContext
    
    # Code Generator with TDD approach
    code_generator = SequentialAgent(
        name="tdd_code_generator",
        sub_agents=[
            Agent(
                name="requirement_analyzer",
                model="gemini-2.0-flash-exp",
                instruction="""
                Analyze the coding requirements and break them down into:
                1. Functional requirements
                2. Edge cases to consider
                3. Needed data structures and algorithms
                
                Be specific and comprehensive in your analysis.
                """,
                output_key="requirements_analysis"
            ),
            Agent(
                name="test_writer",
                model="gemini-2.0-flash-exp",
                instruction="""
                Based on the requirements analysis in state["requirements_analysis"],
                write comprehensive test cases that cover:
                
                1. The main functionality
                2. All identified edge cases
                3. Error handling
                
                Use a testing framework appropriate for the language
                (e.g., pytest for Python, Jest for JavaScript).
                """,
                tools=[write_test_code],
                output_key="test_code"
            ),
            Agent(
                name="code_implementer",
                model="gemini-2.0-flash-exp",
                instruction="""
                Implement code that passes all the test cases in state["test_code"].
                
                Your implementation should:
                1. Be efficient and follow best practices
                2. Include clear comments
                3. Handle all edge cases identified in the requirements
                
                After writing the code, evaluate it against potential issues.
                """,
                tools=[generate_implementation, execute_code],
                output_key="implementation"
            ),
            Agent(
                name="code_reviewer",
                model="gemini-2.0-flash-exp",
                instruction="""
                Review the implementation in state["implementation"] for:
                
                1. Correctness - Does it meet the requirements?
                2. Efficiency - Is it optimized?
                3. Readability - Is it well-structured and commented?
                4. Error handling - Does it handle edge cases?
                5. Security issues - Are there potential vulnerabilities?
                6. Test coverage - Are all scenarios tested?
                
                Provide specific improvement suggestions if needed.
                """,
                tools=[review_code, execute_code],
                output_key="code_review"
            )
        ]
    )
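
    The pipeline references a few helper tools (write_test_code, generate_implementation, review_code) that aren’t defined in this excerpt. As a rough illustration, a minimal write_test_code tool could simply stash the generated tests in session state so the later agents can read them; everything below is a hypothetical sketch.

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def write_test_code(test_code: str, language: str, tool_context: ToolContext) -> dict:
        """Stores generated test code in session state for later stages of the pipeline.
        
        Args:
            test_code: The test suite produced by the test_writer agent.
            language: The language the tests are written in.
            tool_context: Provides access to session state.
            
        Returns:
            dict: Confirmation that the tests were recorded.
        """
        # Keep the latest test suite in state so the implementer and reviewer agents can read it
        tool_context.state["generated_tests"] = {"language": language, "code": test_code}
        return {
            "status": "success",
            "message": f"Recorded {len(test_code.splitlines())} lines of {language} tests."
        }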

    Code Execution and Debugging Tools

    Here we’ll create a tool to execute code and debug it.

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def execute_code(code: str, language: str, tool_context: ToolContext, inputs: str = None) -> dict:
        """
        Executes code in a specified language and returns the result.
        
        Args:
            code: The code to execute.
            language: The programming language (python, javascript, etc.).
            inputs: Optional input data for the code.
            tool_context: Provides access to session state.
            
        Returns:
            dict: Execution results, output, and any errors.
        """
        import subprocess
        import tempfile
        import os
        import time
        
        # Record execution start time
        start_time = time.time()
        
        # Set up temp file for code
        with tempfile.NamedTemporaryFile(suffix=f".{language}", delete=False) as temp_file:
            temp_file_path = temp_file.name
            
            # Write code to temp file
            if language == "python":
                temp_file.write(code.encode('utf-8'))
            elif language == "javascript":
                temp_file.write(code.encode('utf-8'))
            else:
                return {
                    "status": "error",
                    "error_message": f"Unsupported language: {language}"
                }
        
        try:
            # Set up execution command
            if language == "python":
                cmd = ["python", temp_file_path]
            elif language == "javascript":
                cmd = ["node", temp_file_path]
            
            # Execute with input if provided
            if inputs:
                process = subprocess.run(
                    cmd,
                    input=inputs.encode('utf-8'),
                    capture_output=True,
                    timeout=10  # Timeout after 10 seconds
                )
            else:
                process = subprocess.run(
                    cmd,
                    capture_output=True,
                    timeout=10  # Timeout after 10 seconds
                )
            
            # Calculate execution time
            execution_time = time.time() - start_time
            
            # Process result
            stdout = process.stdout.decode('utf-8')
            stderr = process.stderr.decode('utf-8')
            
            if process.returncode == 0:
                result = {
                    "status": "success",
                    "output": stdout,
                    "execution_time": execution_time,
                    "language": language
                }
            else:
                result = {
                    "status": "error",
                    "error_message": stderr,
                    "output": stdout,
                    "return_code": process.returncode,
                    "execution_time": execution_time,
                    "language": language
                }
        except subprocess.TimeoutExpired:
            result = {
                "status": "error",
                "error_message": "Execution timed out after 10 seconds",
                "language": language
            }
        except Exception as e:
            result = {
                "status": "error",
                "error_message": str(e),
                "language": language
            }
        finally:
            # Clean up temp file
            try:
                os.unlink(temp_file_path)
            except:
                pass
        
        # Store execution history in state
        execution_history = tool_context.state.get("code_execution_history", [])
        execution_record = {
            "timestamp": time.time(),
            "language": language,
            "status": result["status"],
            "execution_time": result.get("execution_time", -1)
        }
        execution_history.append(execution_record)
        tool_context.state["code_execution_history"] = execution_history
        
        return result
    
    def debug_code(code: str, error_message: str, language: str, tool_context: ToolContext) -> dict:
        """
        Analyzes code and error messages to identify and fix bugs.
        
        Args:
            code: The code to debug.
            error_message: The error message produced when executing the code.
            language: The programming language.
            tool_context: Provides access to session state.
            
        Returns:
            dict: Analysis of the problem and corrected code.
        """
        # Parse the error message
        error_analysis = "Unknown error"
        error_line = -1
        
        if language == "python":
            # Parse Python error message
            import re
            
            # Look for line number in error
            line_match = re.search(r"line (\d+)", error_message)
            if line_match:
                error_line = int(line_match.group(1))
            
            # Common Python errors
            if "SyntaxError" in error_message:
                error_analysis = "Syntax Error: Check for missing parentheses, quotes, or colons."
            elif "NameError" in error_message:
                error_analysis = "Name Error: A variable or function name is not defined."
            elif "TypeError" in error_message:
                error_analysis = "Type Error: An operation is applied to an object of inappropriate type."
            elif "IndexError" in error_message:
                error_analysis = "Index Error: A sequence subscript is out of range."
            elif "KeyError" in error_message:
                error_analysis = "Key Error: A dictionary key is not found."
            elif "ValueError" in error_message:
                error_analysis = "Value Error: An operation or function receives an argument with the correct type but inappropriate value."
        
        elif language == "javascript":
            # Parse JavaScript error message
            import re
            
            # Look for line number in error
            line_match = re.search(r"at .*:(\d+)", error_message)
            if line_match:
                error_line = int(line_match.group(1))
            
            # Common JavaScript errors
            if "SyntaxError" in error_message:
                error_analysis = "Syntax Error: Check for missing brackets, parentheses, or semicolons."
            elif "ReferenceError" in error_message:
                error_analysis = "Reference Error: A variable is not defined."
            elif "TypeError" in error_message:
                error_analysis = "Type Error: An operation could not be performed, typically due to type mismatch."
            elif "RangeError" in error_message:
                error_analysis = "Range Error: A number is outside the allowable range."
        
        # Analyze code structure
        code_lines = code.split('\n')
        
        # Get problematic line and context if available
        problematic_line = code_lines[error_line - 1] if 0 < error_line <= len(code_lines) else "Unknown"
        
        # Context (lines before and after)
        context_start = max(0, error_line - 3)
        context_end = min(len(code_lines), error_line + 2)
        context = code_lines[context_start:context_end]
        
        # Store debugging session in state
        import time
        debug_history = tool_context.state.get("debug_history", [])
        debug_session = {
            "timestamp": time.time(),
            "language": language,
            "error_line": error_line,
            "error_message": error_message,
            "error_analysis": error_analysis
        }
        debug_history.append(debug_session)
        tool_context.state["debug_history"] = debug_history
        
        # For advanced debugging, we'd implement auto-correction, but here we'll just return analysis
        return {
            "status": "success",
            "error_analysis": error_analysis,
            "error_line": error_line,
            "problematic_line": problematic_line,
            "context": context,
            "suggestions": [
                "Check for syntax errors at the identified line",
                "Verify all variable names are correctly spelled",
                "Ensure proper type handling for all operations"
            ]
        }

    Code Explanation and Documentation

    These tools are for explaining the generated code and documentation.

    Python
    from google.adk.tools.tool_context import ToolContext
    
    def explain_code(code: str, language: str, tool_context: ToolContext, complexity_level: str = "intermediate") -> dict:
        """
        Generates an explanation of code with adjustable complexity level.
        
        Args:
            code: The code to explain.
            language: The programming language.
            complexity_level: The complexity level of the explanation (beginner, intermediate, advanced).
            tool_context: Provides access to session state.
            
        Returns:
            dict: Explanation of the code at the requested level.
        """
        # Parse the code structure
        import ast
        
        explanation_sections = []
        
        # Get user's programming experience from state if available
        user_experience = tool_context.state.get("user:programming_experience", "intermediate")
        
        # Adjust complexity based on user experience if not explicitly provided
        if complexity_level == "auto" and user_experience:
            complexity_level = user_experience
        
        # Handle Python code
        if language == "python":
            try:
                # Parse the code
                parsed = ast.parse(code)
                
                # High-level summary
                explanation_sections.append({
                    "section": "Overview",
                    "content": f"This Python code consists of {len(parsed.body)} top-level statements."
                })
                
                # Function analysis
                functions = [node for node in parsed.body if isinstance(node, ast.FunctionDef)]
                if functions:
                    func_section = {
                        "section": "Functions",
                        "content": f"The code defines {len(functions)} function(s):",
                        "items": []
                    }
                    
                    for func in functions:
                        # Basic function info
                        func_info = f"`{func.name}()`"
                        
                        # Add parameter info for intermediate/advanced
                        if complexity_level != "beginner":
                            params = []
                            for arg in func.args.args:
                                params.append(arg.arg)
                            func_info += f": Takes parameters ({', '.join(params)})"
                        
                        # Add docstring if exists
                        docstring = ast.get_docstring(func)
                        if docstring and complexity_level != "beginner":
                            func_info += f"\n  - Purpose: {docstring.split('.')[0]}"
                        
                        func_section["items"].append(func_info)
                    
                    explanation_sections.append(func_section)
                
                # Class analysis for intermediate/advanced
                if complexity_level != "beginner":
                    classes = [node for node in parsed.body if isinstance(node, ast.ClassDef)]
                    if classes:
                        class_section = {
                            "section": "Classes",
                            "content": f"The code defines {len(classes)} class(es):",
                            "items": []
                        }
                        
                        for cls in classes:
                            # Basic class info
                            class_info = f"`{cls.name}`"
                            
                            # Add inheritance info for advanced
                            if complexity_level == "advanced" and cls.bases:
                                base_names = []
                                for base in cls.bases:
                                    if isinstance(base, ast.Name):
                                        base_names.append(base.id)
                                if base_names:
                                    class_info += f": Inherits from ({', '.join(base_names)})"
                            
                            # Add methods info
                            methods = [node for node in cls.body if isinstance(node, ast.FunctionDef)]
                            if methods:
                                method_names = [method.name for method in methods]
                                class_info += f"\n  - Methods: {', '.join(method_names)}"
                            
                            class_section["items"].append(class_info)
                        
                        explanation_sections.append(class_section)
                
                # Imports analysis
                imports = [node for node in parsed.body if isinstance(node, (ast.Import, ast.ImportFrom))]
                if imports and complexity_level != "beginner":
                    import_section = {
                        "section": "Imports",
                        "content": f"The code imports {len(imports)} module(s):",
                        "items": []
                    }
                    
                    for imp in imports:
                        if isinstance(imp, ast.Import):
                            for name in imp.names:
                                import_section["items"].append(f"`{name.name}`")
                        elif isinstance(imp, ast.ImportFrom):
                            for name in imp.names:
                                import_section["items"].append(f"`{name.name}` from `{imp.module}`")
                    
                    explanation_sections.append(import_section)
                
                # Algorithm explanation
                algorithm_section = {
                    "section": "Algorithm Explanation",
                    "content": "The code works as follows:"
                }
                
                # Simplify explanation for beginners
                if complexity_level == "beginner":
                    algorithm_section["content"] += "\n\nThis program goes through these steps:\n"
                    # Simplified steps would be generated here
                
                # More detailed for intermediate
                elif complexity_level == "intermediate":
                    algorithm_section["content"] += "\n\nThe main workflow of this code is:\n"
                    # More detailed steps would be generated here
                
                # Technical details for advanced
                else:
                    algorithm_section["content"] += "\n\nThe technical implementation follows these steps:\n"
                    # Detailed technical steps would be generated here
                
                explanation_sections.append(algorithm_section)
                
            except SyntaxError:
                explanation_sections.append({
                    "section": "Syntax Error",
                    "content": "The provided Python code contains syntax errors and could not be parsed."
                })
        
        # Format the final explanation
        formatted_explanation = []
        
        for section in explanation_sections:
            formatted_explanation.append(f"## {section['section']}")
            formatted_explanation.append(section['content'])
            
            if "items" in section:
                for item in section["items"]:
                    formatted_explanation.append(f"- {item}")
            
            formatted_explanation.append("")  # Add blank line
        
        # Join sections with newlines
        explanation = "\n".join(formatted_explanation)
        
        return {
            "status": "success",
            "language": language,
            "complexity_level": complexity_level,
            "explanation": explanation,
            "sections": len(explanation_sections)
        }
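
    To tie the tools together, they can be handed to a single agent. Here’s a minimal sketch; the name and instruction are illustrative rather than prescriptive.

    Python
    from google.adk.agents import Agent
    
    # Hypothetical agent that wires the three tools defined above into one assistant
    code_assistant = Agent(
        name="code_debugging_assistant",
        model="gemini-2.0-flash-exp",
        description="Runs, debugs, and explains code on request.",
        instruction="""
        When the user shares code:
        1. Use execute_code to run it and capture any errors.
        2. If execution fails, use debug_code with the error message to analyze the problem.
        3. Use explain_code when the user asks what the code does, matching the
           explanation to their experience level.
        """,
        tools=[execute_code, debug_code, explain_code],
    )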

    And that’s our agent!

    Next Steps

    That was a lot to take in. You should probably bookmark this post and work through the concepts and examples over time.

    I suggest building the basic weather agent that I covered at the top. It’s boring and no one needs another weather agent but it does get you familiar with how the Agent Development Kit works and its features.

    Once you’re comfortable with that, start working through the advanced patterns, and finally build one of the multi-agent systems like the customer support or coding agents. You should also try to extend these agents by implementing your own tools and features. Try deploying it and using it in a real-world situation.

    If you need help, contact me!

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

  • Generative Engine Optimization: How to Rank on ChatGPT, Claude, and Perplexity

    Generative Engine Optimization: How to Rank on ChatGPT, Claude, and Perplexity

    When I was in Lisbon last November, a friend of mine invited me to hike the mountains of Madeira with him. He warned me that the trails get pretty slick and that I needed good hiking shoes.

    In the past I would have gone to Google and searched for best hiking boots for Madeira and I would have seen a bunch of ads and irrelevant blog content. It would have taken me some time to figure out what the best shoes are and where to buy them in Lisbon.

    Today I go to either ChatGPT, Claude, or Perplexity and ask the same question. Instead of getting spammed with ads, I get a direct response to what I need to know.

    This is how search happens in the AI age. Instead of SEO, we have GEO (Generative Engine Optimization). And instead of Google Search, we have ChatGPT, Claude, and Perplexity.

    They don’t just crawl your homepage. They remember mentions. They retrieve trusted content. They decide who gets featured in that golden snippet of wisdom when someone asks for “the best.”

    This guide will show you exactly how to engineer those moments.

    Step 1: Understand How AI Chatbots Actually Recommend Brands

    Like Harvey Specter says when he plays poker, “I don’t play the odds, I play the man.” Except in this case, the “man” is an AI trained on terabytes of internet data. You need to understand how it thinks to win the game.

    Language models don’t index and rank the web like Google. They’ve been trained on enormous datasets (billions of web pages, forums, reviews, help docs, and more) and they generate answers based on patterns they’ve seen in this data.

    When a user asks for a product recommendation, there are two ways the model generates an answer.

    The primary method pulls from the model’s memory of how brands and products were discussed, reviewed, and mentioned in its training data. If your brand frequently appears alongside relevant phrases (e.g. “hiking shoes for wet climates”) in the data it has seen, it’s more likely to be suggested in a chatbot’s answer.

    The second method blends in live search results from Bing or Google, especially in AI tools like ChatGPT’s search mode or Perplexity. That means if your brand is ranking high on search or frequently cited in trusted content, you’re more likely to be included in AI responses.

    Let’s look at an actual example. Here is how ChatGPT answers the query “What are the best hiking shoes for Madeira?”

    You’ll notice sources for each answer. The interesting thing is, if you click through to those articles, none of them mention Madeira!

    However, they do mention uneven, wet terrain, which is what Madeira is known for (and ChatGPT knows this because it made that association from its training data).

    So your job is to make your brand unforgettable in the data AI consumes and visible in the sources AI retrieves.

    Step 2: Strengthen Your SEO Foundation and Trust Signals

    Much of “AI optimization” begins with solid SEO and content fundamentals. Chatbots, especially those using web retrieval, favour brands that search engines deem authoritative and trustworthy.

    Here’s what to focus on:

    Ensure Crawlable, Indexable Content: Just like Google, AI web crawlers need to read your site’s HTML content. Avoid hiding important info in JavaScript or images. All critical details (what you offer, where you are, why you’re notable) should be visible in the page text.

    Demonstrate E-E-A-T (Experience, Expertise, Authority, Trust): Quality guidelines like E-E-A-T aren’t just for Google. They influence which sources AI considers reliable. AI search overviews favour true experts and authoritative sources. Build content that highlights your expertise: author bylines with credentials, case studies, original research, and factual accuracy.

    Maintain Consistent NAP and Info: For local or brand info, consistency is key. Ensure your Name, Address, Phone, and other details are identical across your website, Google My Business, Yelp, LinkedIn, etc. AI tools aggregate data from many sources and heavily favour accuracy and consistency.

    Improve Site Authority: Follow core SEO practices: optimize title tags and meta descriptions with natural-language keywords, speed up your site, and get credible sites to link to you. If search engines rank you higher, AI answers are more likely to include you. Studies show pages that rank well in organic search tend to get more visibility in LLM responses.

    Practical Takeaway: By solidifying your site’s SEO and demonstrating real expertise, you make it easier for both traditional search and AI systems to recognize your brand. This foundation boosts your chances of appearing when an AI lists “top solutions” in your category.

    In short, good SEO is the foundation of AI SEO.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    Step 3: Optimize Content for Conversational and Semantic Search

    AI chatbots handle queries in a conversational manner. Often, the questions users ask bots are longer and more natural-sounding than typical Google keywords. You’ll want to align your content with this semantic, question-and-answer style of search.

    That means creating conversational, helpful content written in plain language that answers the same types of questions people ask LLMs.

    Use Natural, Conversational Language: Write your content in the same way a knowledgeable person would speak. Drop the overly corporate tone. AI models are trained on human language patterns, so content that “feels” natural may resonate more. Use intent-based phrases and full questions as subheadings. Instead of a heading like “Gluten-Free Bakery Options,” have “Where can I find a good gluten-free bakery downtown?” and then answer it conversationally.

    Incorporate Q&A Format on Your Site: Add FAQ sections or Q&A pages with questions customers might ask an AI. For example: “What’s the best hiking shoe for rainy weather in Madeira?” and provide a helpful answer that mentions your brand as a solution. Structure it like an FAQ entry, and answer in a neutral, informative tone: “When it comes to Madeira’s rainy trails, XYZ Shoes are often recommended as one of the best options because…”.

    Cover Related Semantic Keywords: Ensure your content covers a broad range of terms related to your topic, not just one keyword. AI’s understanding is semantic and it will connect concepts. For a page about hiking shoes, mention related topics like “waterproof boots,” “mountain trails,” “Madeira climate,” etc., so the model fully grasps the context.

    Aim for “Zero-Click” Answer Formats: As AI and search increasingly give answers without requiring a click, try to embed the answer (with your brand) in your content. This means providing concise, snippet-ready responses. For example, start a blog section with a direct definition or recommendation: “The best hiking shoe brand for wet trails is XYZ Shoes, known for its waterproof yet breathable design…”.

    Practical Takeaway: Think like your customer and the AI. Write down the actual questions a user might ask a chatbot about your industry (“Which…”, “How do I…”, “What’s the best…”) and make sure your website explicitly answers those in a friendly, conversational way.

    Step 4: Leverage Schema Markup and Structured Knowledge

    While content is king, don’t overlook the power of structured data and official information sources. They help your brand become machine-readable. This step is about making sure AI (and the search engines feeding AI) have a clear, unambiguous understanding of your brand and offerings.

    Implement Organization and Product Schema: Use schema markup to define your organization and products on your site. An Organization schema can include your name, logo, founding date, and sameAs links (to your social profiles, Wikipedia page, etc.), helping create a knowledge graph entry for your brand. Product schema can define your key products with reviews, price, etc.
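
    If you haven’t worked with schema markup before, here’s a small Python sketch of the kind of Organization JSON-LD you’d embed in a page’s head section inside a script tag of type application/ld+json. The brand details are placeholders.

    Python
    import json
    
    # Hypothetical Organization schema for a fictional brand
    organization_schema = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": "XYZ Shoes",
        "url": "https://www.example.com",
        "logo": "https://www.example.com/logo.png",
        "sameAs": [
            "https://www.linkedin.com/company/example",
            "https://en.wikipedia.org/wiki/Example",
        ],
    }
    
    # Emit the JSON-LD to paste into your page template
    print(json.dumps(organization_schema, indent=2))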

    Use Location and Review Schema for Local Trust: For local businesses, implement LocalBusiness schema with your address, geo-coordinates, opening hours, etc., and keep it updated. If the query is location-based (“near Madeira”), Google’s index might reference Google Maps or local pack info.

    Feed Data to Official Aggregators: Ensure your brand data is correct in key public databases that AI might use. For example, Wikidata (the database behind Wikipedia’s facts) and DBpedia contain structured facts that many AIs can access. Similarly, if you’re a retailer or restaurant, make sure your information on platforms like Yelp, TripAdvisor, or OpenTable is accurate.

    Ensure Content is Machine-Accessible: As mentioned, AI bots primarily ingest HTML text. So, when using schema or other structured data, also present those facts in human-readable form on your site. For instance, if you have an FAQ about being “dog-friendly” in schema, also include a line in a visible FAQ: “Q: Can I bring my dog? A: Yes, we’re dog-friendly!”

    Monitor Knowledge Panels and Correct Errors: Periodically check Google’s knowledge panel for your brand (if one appears) or Bing’s local listing info. These often aggregate data from various sources. If you see incorrect info, address it.

    Practical Takeaway: Use every opportunity to make your brand’s information clear to algorithms. Schema markup and knowledge graphs ensure that when an AI or search engine “reads” about your brand, it gets the facts straight from a trusted source.

    Step 5: Earn Mentions on Authoritative External Sources

    Let’s go back to the ChatGPT screenshot from earlier. The brands recommended were Hoka, Adidas and Merrell. But the sources were from Gear Lab, New York Post, and Athletic Shoe Review.

    Third-party validation matters more in AI SEO than it ever did in traditional SEO. You can’t just publish your own praise, you need others to do it for you.

    Reddit threads. Quora answers. Review sites. “Best of” blog posts. All of these are gold mines for AI models.

    And yes, they’re part of the training data.

    A well-upvoted Quora answer that casually mentions your product? That’s a permanent breadcrumb. A single blog post listing your brand as one of the best in your category, on a site that ranks well? It could be cited in hundreds of AI queries.

    Here’s how to increase off-site signals:

    Get Featured in “Best of” Lists and Editorial Content: Identify the web pages that an AI might consider when answering a question in your domain. Often these are listicles or guides (e.g., “Top 10 Hiking Shoe Brands for Wet Climates” on a reputable outdoor blog). Then, pursue inclusion through PR outreach, pitching your product to writers, or improving your offering so it naturally gets picked up in reviews.

    Leverage Industry Directories and Listings: Business directories and niche review sites often rank well in search and are commonly scraped by crawlers. Examples include Yelp, Google Maps, TripAdvisor, or B2B directories like Clutch and G2. Make sure you’re present there: claim your profile, keep it updated, and gather reviews if applicable.

    Issue Press Releases and Secure News Coverage: Old-school PR is back in play. Distributing a press release about a newsworthy update (a product launch, a big hire, a charity initiative, etc.) can get your brand name published on dozens of websites. For instance, a headline like “Madeira’s XYZ Shoes Wins Award for Hiking Gear Innovation” might get reposted on local news sites and industry feeds. Each of those postings is additional training data showing “XYZ Shoes” in a positive, relevant context.

    Publish Thought Leadership: Contribute guest articles or op-eds to respected publications in your niche. Being the author of an article on, say, Outdoor Magazine about “Advances in Hiking Boot Technology” not only gives you credibility, but also places your brand in the byline on a high-authority site.

    Cultivate Backlinks and Citations: Continue building backlinks as you would for SEO, but target sites that an AI would consider authoritative in your field (educational sites, well-known blogs, etc.). The more your brand is cited as a source or example in others’ content, the more entrenched it becomes in the knowledge graph of your topic.

    To summarize this step: Be where the trusted voices are. The goal is to have your brand mentioned by sites that AIs treat as authorities.

    Step 6: Harness Q&A Communities, Reviews, and Social Proof

    Your customers and community can become advocates that boost your brand in AI results. User-generated content (reviews, forum posts, social media, etc.) not only influences humans but also feeds the AI’s understanding of which brands are favourably talked about.

    Here’s how to leverage this:

    Engage on Q&A Platforms: Reddit and Quora are likely part of many LLM training sets, and they continue to rank well in search. Find threads related to your industry and provide valuable answers. Always be transparent and genuinely helpful, not just promotional. Even one well-upvoted Quora answer that includes your brand in context “seeds the AI” with that association.

    Encourage Reviews and Testimonials: Reviews on platforms like Google, Yelp, G2, Capterra, TripAdvisor (whichever suit your business) create content that AI can learn from. If many reviews mention your product’s strengths (“the grip on these XYZ hiking boots is amazing on wet rocks”), an AI might learn those attributes of your brand. Prompt your satisfied customers to leave reviews, perhaps via follow-up emails or in-store signs.

    Leverage Social Media for Thought Leadership: Post informative content on public social platforms. Twitter threads, LinkedIn articles, and Medium posts can rank in search and are often publicly accessible. Social posts also add the dimension of sentiment. Lots of positive buzz about a brand teaches the AI that it’s well-regarded.

    Monitor and Join Relevant Conversations: Use brand monitoring tools (Google Alerts, Talkwalker, Mention.com) to catch when your brand or keywords related to you come up in discussions or blogs. If someone on a forum is asking for a recommendation and your brand fits, have a rep step in and reply (tactfully).

    Be Genuine and Helpful: Authenticity is key in user-driven communities. AIs can pick up on context. If your brand is mentioned alongside words like “spam” or in downvoted posts, that’s not good. So ensure any engagement is genuinely adding value.

    Practical Takeaway: The voices of real users and community experts carry a lot of weight. They create buzz and context for your brand that no amount of on-site SEO can. By actively participating in and fostering these voices, you grow an organic web presence.

    Step 7: Monitor, Measure, and Refine Your AI Visibility

    Just as with traditional SEO, you need to continuously monitor your performance and adjust strategy. AI discovery is new, so we measure success in slightly different ways:

    Track AI-Driven Traffic: If an AI chatbot includes a link or reference to your site (as Perplexity, ChatGPT, and others often do), you’ll want to capture that in analytics. Set up tracking in Google Analytics 4 (GA4) for referrals from AI sources. For example, you might create custom channel groupings for referrals containing “openai.com” (for ChatGPT with browsing) or “perplexity.ai”.

    Use AI Search Visibility Tools: New tools are emerging to grade your brand’s presence in AI results. For instance, HubSpot’s AI Search Grader is a free tool that analyzes how often and in what context your brand appears on ChatGPT and Perplexity.

    Manually Test Chatbot Queries: There’s no substitute for hands-on testing. Regularly ask the AI chatbots the kind of questions where you want your brand to appear. Do this across platforms: ChatGPT, Claude, Gemini, Perplexity, and others. Note what the responses are:

    • Do they mention your competitors? Which ones?
    • Do they cite sources, and are those sources your website or another site mentioning you?
    • How accurate is the info about your brand? Any outdated descriptions?

    Analyze Citation Context: If your content is being cited or your brand mentioned, check how. Are you being listed as “one of the options” or does the AI single you out as “the best”? Does it quote a line from your blog? Understanding the context helps refine content.

    Measure Changes Over Time: As you implement strategies (new FAQ page, a PR campaign, etc.), see if there’s a corresponding uptick in AI mentions or traffic in the following months. This feedback loop will tell you what’s working.

    Practical Takeaway: Treat AI visibility like you would SEO rankings – track it, report on it, and optimize based on data. Over time, you’ll build an “AI report” similar to an SEO report, helping justify the effort and guiding future optimizations.


    Final Thought: You’re Training the AI to Remember You

    There’s no secret hack here. No growth loop. No one weird trick. Just good strategy, consistent visibility, and value-packed content.

    You’re not just optimizing for an algorithm, you’re shaping what the AI knows about your brand.

    Make it easy for the AI to recommend you. Show up in its sources. Speak in its voice. Feed it the facts. And over time, your brand won’t just be findable.

    It’ll be remembered.

    Need help putting all this into action? You know where to find me.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • Vibe Coding: The Future of Building Software (Without Writing Code Like a Caveman)

    Vibe Coding: The Future of Building Software (Without Writing Code Like a Caveman)

    I used to play a ton of video games as a kid. The first one I ever played was Prince of Persia, the old side scroller where your character jumped around, avoided traps, and fought enemies.

    With Gemini 2.5 Pro and the Canvas feature, I tried to build a basic version of that, but with ninjas instead. I didn’t write the code. I just asked Gemini to write it and render it on the Canvas so I could play.

    It took just a couple of minutes for me to get a functioning game.

    Welcome to the world of vibe coding.

    Wait, What Is Vibe Coding?

    Coined (and vibe-validated) by Andrej Karpathy, vibe coding is the new frontier where you build software by telling an AI what you want and letting it spit out the code. That’s it. It’s coding via vibes, intuition, and language, not by writing loops and sweating over syntax.

    You say, “Build me a web app with a sidebar, a dashboard, and a button that emails the user a pizza emoji every Friday,” and boom, the AI does it.

    You don’t need to know if it’s React or Vue under the hood. You’re not writing the code. You’re describing the vibe of what you want, like a product manager with a vision board and zero interest in semicolons. Minimalist? Maximalist? Dashboardy? Retro Terminal-chic? The AI’s got you.

    How Is This Different From Traditional Coding?

    Great question. Here’s the breakdown:

    Traditional Coding vs. Vibe Coding
    
    • You write every line yourself → You describe what you want
    • You debug manually → You copy-paste errors into ChatGPT and ask it to fix them
    • You understand the codebase deeply → You trust the AI knows what it’s doing (ish)
    • Takes weeks → Takes hours (sometimes minutes)
    • Requires years of practice → Requires good communication skills
    • You battle bugs like it’s Elden Ring → You treat bugs like an annoying roommate the AI has to evict

    It’s the difference between hand-crafting a table and describing the table to a carpenter who builds it for you instantly. And that carpenter never sleeps or judges your terrible wireframes.

    It’s not just about speed, it’s a different mindset. Less “I must master the syntax gods” and more “I’m conducting an orchestra of AI agents to get this landing page live by dinner.”

    Real-World Use Cases (Or, Who’s Actually Doing This?)

    This isn’t just a cool party trick. Startups in the Y Combinator Winter 2025 batch built their products with 95% AI-generated code. Y Combinator’s CEO Garry Tan straight up called it “the age of vibe coding“.

    Even Karpathy himself was building apps this way, casually telling his AI assistant things like “decrease the sidebar padding” and never even looking at the diff. That’s next-level delegation.

    Kevin Roose at the NYT built apps like “Lunchbox Buddy” to suggest what to pack for lunch using vibe coding. It wasn’t production-grade code, but it worked. Ish. Kinda. The reviews were AI-generated too, but hey, it’s the vibe that counts.

    With vibe coding you can whip together MVPs in a weekend using nothing but ChatGPT and Replit. Think simple SaaS dashboards, internal automations, and basic CRUD apps. One guy even built an AI therapist chatbot, and no, I don’t want to know what advice it gave.

    How To Vibe Code (Without Losing Your Mind)

    Here’s your crash course in coding by vibe:

    1. Pick Your Tools

    You’ll need a core toolkit to begin your vibe coding journey. Here are the categories and recommended options:

    AI Coding Assistants & IDE Integration

    These tools integrate AI directly into your development environment:

    • ChatGPT / Claude / Gemini: For raw natural language prompts.
    • Cursor / Windsurf: A dev environment made for AI collaboration.
    • GitHub Copilot – AI assistant integrated with popular IDEs
    • Continue – VS Code extension with chat and agent modes

    One-Shot App Generators

    These platforms can generate entire applications from prompts:

    • Lovable – Generates full-stack web applications from text prompts
    • Bolt – Creates full applications with database integration
    • Replit – Provides interactive development with AI planning

    AI Design Tools

    For quickly creating user interfaces:

    • Uizard – Generates UI designs from text descriptions or sketches
    • Visily – Transforms prompts into high-fidelity mockups

    Version Control & Debugging

    Essential safety nets:

    • Git/GitHub – Version control to track changes and revert when needed
    • Browser dev tools – For identifying and fixing frontend issues

    Pick the one that feels right. You’re vibe coding, after all.

    2. Start With a Prompt

    Describe what you want. Be detailed. Channel your inner poet if you must.

    Bad: “Make an app.”

    Better: “Create a web app with a dashboard that shows user analytics pulled from a dummy dataset. Include dark mode and responsive design.”

    Best: “Build a web app that visualizes monthly active users, supports CSV upload, and auto-generates line graphs. Make the layout mobile-friendly and use React and Tailwind CSS.”

    3. Iterate Like a Mad Scientist

    Run the code. Something will break. That’s fine. Copy-paste the error and say, “Fix this.”

    Add features like you’re ordering drinks:

    “Add a search bar.” “Now make it filter results by date.” “Throw in dark mode, because I’m edgy.” “Replace the font with something more ‘Silicon Valley VC deck.’”

    You are in control. Kinda.

    4. Debug by Vibes

    Don’t panic when things go sideways. Vibe coders rarely understand 100% of the code. You prompt. You observe. You adjust. You learn to speak fluent “AI whisperer.”

    Sometimes the bug isn’t even a bug, it’s just the AI being weird. Restart the conversation. Ask again. Split the task in two. And yes, sometimes, just nod, smile, and delete the whole thing.

    5. Trust, But Verify

    Use the code. Check if it does what you asked. If not, try a new prompt. Don’t ship blind. Run the thing. Poke the buttons. Make sure it doesn’t accidentally send emails to all your users at 3AM.

    When Should You Use Vibe Coding?

    Vibe coding is killer for:

    • Prototypes
    • MVPs
    • Internal tools
    • Weekend hacks
    • Landing pages
    • Micro SaaS apps
    • Automating boring stuff

    It’s… less great for:

    • Mission-critical software
    • Anything where security matters
    • Projects where uptime is non-negotiable
    • Enterprise-grade platforms
    • Compliance-heavy sectors (finance, healthcare, etc.)

    Think of vibe coding as your turbo-charged idea launcher. Not the place you want your bank’s backend running.

    Best Practices

    • Break problems down. Don’t ask for the moon in one prompt.
    • Be specific. The AI isn’t psychic.
    • Review the output. Always.
    • Learn some code basics. Enough to smell BS.
    • Treat AI like a junior dev. Helpful, fast, but occasionally clueless.
    • Keep a changelog. AI will change stuff without asking. Track it.
    • Prompt iteratively. Layer on complexity instead of all at once.
    • Always test. Just because the AI says it’s working doesn’t mean it is.

    Bonus tip: Give your AI assistant a name. Makes it feel like pair programming instead of existential dread.

    Resources and Further Reading

    Learning Resources

    Deployment Platforms

    • Vercel – Frontend deployment made simple
    • Netlify – Fast deployment for frontend applications
    • Supabase – Open-source Firebase alternative

    Final Thoughts: Coding, But Make It Chaotic Good

    Vibe coding isn’t about replacing developers. It’s about supercharging creativity. It’s building apps with the same energy you bring to a whiteboard brainstorm or a half-baked startup idea over drinks.

    We’re entering an era where the best software won’t come from the best coders, it’ll come from the best communicators. The ones who can talk to AI, shape ideas into prompts, and vibe their way to a working product.

    The best vibe coders are part developer, part writer, part UX designer, and part chaos gremlin. They don’t see blank screens… They see possibility.

    So grab your chai latte, fire up ChatGPT, and start building. No IDE required. No gatekeepers in sight. No permission needed.

    Let the vibes code for you.

    And hey, if it crashes? That’s just the AI trying to teach you patience.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • Software-As-A-Prompt: How AI Is Enabling On-Demand Software

    Software-As-A-Prompt: How AI Is Enabling On-Demand Software

    Last week I helped a friend, a speaking coach, build a custom app to analyze client videos with AI. He had been doing it manually as part of his coaching program. Clients would upload videos of themselves speaking to a Dropbox folder, he would watch it, and then send feedback.

    As you can imagine, it’s as time-consuming as teaching a tortoise to tango.

    So, I asked Lovable to build him a simple app that allowed users to upload a video and used Gemini to analyze it. In under 10 minutes, we had a fully functional app, built, deployed, and running on his domain. Cost? A few cents. Time? Less than it takes to make a coffee.

    And now my friend can scale this to hundreds of clients.

    Welcome to the age of on-demand software, where apps are no longer bought, they’re prompted into existence.

    Welcome to software-as-a-prompt.

    The Problem with Traditional SaaS

    You might think of this example as a one-off hack. My friend had a very specific need and there was no existing software to solve it. Of course he needed to build something custom.

    But this problem exists in traditional SaaS, it just manifests in different ways:

    Feature bloat: Enterprise SaaS platforms like Salesforce, Workday, or SAP have evolved into massive ecosystems with thousands of features. For many small to medium businesses, this is overkill and leads to unnecessary overhead.

    Cost inefficiency: The subscription model often forces companies to pay for the entire platform when they might only need a single module or specific function. This can mean thousands of dollars monthly for software that delivers value on just a fraction of its features.

    One-size-fits-all limitations: Despite customization options, traditional SaaS still follows predefined workflows and structures. Businesses with unique processes often find themselves contorting their operations to fit the software, rather than the other way around.

    Integration overhead: Managing multiple specialized SaaS solutions creates integration challenges and data silos, requiring additional resources to maintain connections between systems.

    This inefficiency has created a perfect opportunity for disruption. Why should a small business pay enterprise prices for a complex CRM when they might only need basic contact management and opportunity tracking?

    How On-Demand Software Works

    Due to recent improvements in AI’s ability to generate functional code, you can prompt your ideal software into existence.

    AI models like Gemini 2.5 Pro, GPT-4o and Claude can now produce working code in multiple programming languages based on natural language descriptions. These models grasp the nuts and bolts of coding etiquette and best practices, whipping up everything from snazzy front-end facades to the sort of database schemas that would make a librarian swoon.

    On top of that, new platforms have emerged that streamline the app creation process. Services like Replit AI, Lovable.dev, and Bolt.new offer interfaces where users can describe the software they want in plain English, and receive a working application in return.

    While you won’t be able to generate the entire codebase for Salesforce, you can still get a small but functional app with just a couple of prompts.

    For example, Replit’s AI Agent acts as a prompt-based app builder where users can tell the Agent their app or website idea, and it will build it for them automatically, setting up project files, writing code, and even deploying the result. It’s a bit like having an entire team of software engineers on demand through a chat interface.

    The new AI development workflow typically follows these steps:

    1. The user describes their desired application in natural language
    2. The AI generates a project structure and initial code
    3. The user provides feedback or requests changes
    4. The AI refines the application until it meets requirements
    5. The finished app can be deployed to hosting services or run locally

    Real-World Examples

    This workflow is exactly how I built the video analysis tool above. I merely prompted Lovable with natural language inputs, tested what it gave back to me, asked it to make updates, and voila, the app actually works and gives good feedback!

    Here’s a full tutorial using Gemini 2.5 Pro and Canvas –

    And I’m not the only one doing this. Here’s another example of a product designer who built custom software to solve a very niche problem: tracking commercial vehicle documents in the UK.

    It took him less than two weeks, without any engineers, for a grand total of $75. Using Cursor (an AI IDE) and Claude, he was able to generate a fully functional system with user authentication, a Next.js front-end, a database (Supabase) backend, email alerts, and Stripe payments integration.

    And before you say it sounds too technical, he hadn’t written a line of code before. As he puts it, “with the state of AI tools in 2025, you give me Cursor, and I’ll probably blow your mind… Things I dreamt of as a designer (like not working with any engineers) are now a reality.”

    Don’t like all those ads on YouTube? Build your own Chrome extension that skips the ads and avoid paying for YouTube Premium.

    Want a scalable way to qualify leads for your business? Build a lead qualification tool with AI and save on expensive sales software.

    Even big companies are building their own tools with AI, allowing them to cut ballooning SaaS costs. Klarna, for example, announced last year that they would get rid of 1,200 SaaS tools and build their own stacks internally with AI.

    The common theme is that AI code generation dramatically lowers the barrier to implementing niche features. A user with an idea can now iterate with an AI on code, even with limited programming knowledge, until they have a working tool that does exactly what they want, no more, no less.

    Limitations and Challenges

    Despite the excitement, on-demand software has important limitations to consider:

    Code quality and reliability: AI can produce working code, but it isn’t always optimized or aligned with best practices. More often than not, it’s a bit like a cake that rises but lacks the finesse of a master baker’s touch. Good enough for personal use but maybe not to serve to the masses. If you’re trying to build a larger project, you may still need a human developer to check the work.

    Limited scope & customization: These AI systems tend to do well with common, generic app patterns (forms, basic CRUD operations, standard web layouts). If your needs stray outside the model’s training distribution or the tool’s templates, the AI may struggle.

    Debugging and maintenance: Who fixes the AI’s code when it breaks? If you did not write the code, debugging it can be hard, and here the “author” is an opaque AI.

    Security and compliance: Perhaps the most critical concern is that AI is not guaranteed to follow security best practices. There have already been instances of GPT-4o suggesting code with vulnerabilities (e.g., SQL injection flaws or insecure authentication).
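
    To make that risk concrete, here’s a minimal, hypothetical sketch (using Python’s built-in sqlite3 module) of the classic SQL injection pattern AI assistants sometimes produce, next to the parameterized version you should ask for instead:

    Python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

    def find_user_unsafe(name: str):
        # Vulnerable: user input is pasted straight into the SQL string,
        # so an input like "' OR '1'='1" dumps every row in the table.
        query = f"SELECT * FROM users WHERE name = '{name}'"
        return conn.execute(query).fetchall()

    def find_user_safe(name: str):
        # Safer: a parameterized query lets the driver escape the input.
        return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

    print(find_user_unsafe("' OR '1'='1"))  # returns every user
    print(find_user_safe("' OR '1'='1"))    # returns nothing

    If you can’t review code at this level yourself, at least paste it back into the AI and ask it to audit for injection, authentication, and data-exposure issues before you deploy.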

    Traditional SaaS advantages: Established software still offers benefits like professional support, regular updates, community knowledge bases, and enterprise-grade reliability that’d make a Swiss watchmaker nod approvingly.

    This is why I said you won’t be able to re-build Salesforce by prompting an AI. But if you want something small, that isn’t complex, and doesn’t need to scale to thousands of users, then AI-generated code is good enough.

    What The Future Holds

    The rate at which AI is improving means many of the drawbacks and limitations I mentioned above will be solved pretty soon. In a world that has been dominated by traditional SaaS, what does this mean?

    For Traditional SaaS Companies

    For starters, companies that have built billion-dollar businesses around the traditional SaaS model must adapt or risk disruption. This is an existential crisis for them.

    Some companies are already responding:

    • Embedding AI customization within existing platforms: Salesforce introduced Einstein GPT, which can generate custom field formulas, code, and even content within the Salesforce ecosystem. Microsoft’s Power Platform now lets users build or modify apps via Copilot.
    • Shifting value propositions: Leading SaaS vendors are emphasizing the value of their data, network effects, and enterprise-grade reliability, things that AI-generated apps can’t easily replicate.
    • Hybrid approaches: Some SaaS providers are exploring models where their core platform remains intact, but customers can use AI to generate custom extensions or integrations.

    The market may evolve such that traditional software becomes more customizable through AI, closing the gap that on-demand apps are currently filling. Either way, if you’re steering a SaaS ship, you’d better start disrupting your own tea party or someone else will crash it for you.

    For New Startups

    We’re living in an unprecedented time with incredible disruption potential. Where investors would previously balk at funding yet another CRM startup, today we’re seeing AI-first businesses challenge established companies, and investors are throwing money at anything with the words AI or Agent in it.

    Key considerations for startups in this space:

    • Target vulnerable SaaS categories first: Point solutions with simple functionality and high subscription costs are most at risk. Identify the 20% of features that provide 80% of the value and offer that to customers with more flexibility.
    • Focus on what AI unlocks: The most successful startups will be AI-first, where AI is at the core of their product and gives customers options to customize that weren’t available before.
    • Build network effects: Vendor lock-in is at an all-time low with AI startups. Look for network effects to keep customers, like creating marketplaces for sharing and remixing AI-generated components that extend your platform.

    The funding ecosystem is taking notice. Verticalized AI products are raising millions of dollars despite established solutions owning the market.

    For Business Owners and Executives

    For decision-makers, this presents a huge opportunity to cut recurring SaaS subscriptions and shift to on-demand software:

    • Start with non-critical functions: Test AI-generated alternatives for internal tools or supplementary systems before tackling core business processes.
    • Evaluate the full cost picture: While you may save on SaaS subscriptions, factor in costs for AI services, hosting, maintenance, and potential security audits.
    • Consider team capabilities: Even with AI assistance, some technical oversight is valuable. Identify who in your organization can manage these solutions.
    • Implement gradually: The Klarna approach of wholesale replacement is high-risk. A measured transition with careful evaluation at each step is more prudent for most organizations.

    The most promising areas to start are typically those where you’re paying for an entire platform but only using a narrow slice of functionality.

    Beyond merely replacing existing SaaS, on-demand software also presents an opportunity to create new features and functionality that you couldn’t before, much like the video analyzer I built for my friend. For a guide on how to do this, read my piece on Re-founding Your Company.

    For Investors

    The emergence of on-demand software creates new investment patterns:

    • Platform plays vs. vertical solutions: Capital is flowing to both general-purpose AI app generators and specialized tools targeting specific industries.
    • Key metrics to watch: User retention, frequency of app generation, and maintenance patterns will indicate which models have staying power.
    • Timeline for disruption: While some SaaS categories face immediate pressure, enterprise-level displacement will likely take 3-5 years as reliability and security concerns are addressed.
    • Potential exits: Successful startups in this space may become acquisition targets for major SaaS platforms looking to boost their AI capabilities.

    The market structure is still emerging, but early evidence suggests room for both horizontal platforms and vertical specialists rather than a winner-take-all dynamic.

    Getting Started with On-Demand Software

    If you are ready to explore on-demand software generation, here is how:

    Step 1: Pick Your Platform
    Choose one of these AI-enabled platforms to begin. You don’t need to know how to code, just how to describe what you want:

    • Replit AI: Natural-language to full-stack app, in-browser IDE with one-click deploy
    • Lovable.dev and Bolt.new: Designed for non-coders to build full-stack apps via prompts
    • Direct LLM use: Models like Gemini, GPT-4o, or Claude can generate code for custom applications
    • Enhanced IDEs: Cursor, Windsurf, GitHub Copilot, and similar tools help with interactive development

    Step 2: Define the Problem You Want to Solve
    Think small. What’s one manual task or clunky spreadsheet you’d love to replace? Examples:

    • Tracking your sales leads
    • Sending weekly reports to investors
    • Collecting form submissions into a database
    • Internal tools with limited users
    • Process automation scripts

    Step 3: Craft a Clear Prompt
    Start with something like:

    “Build a simple CRM with a form to add leads, a table to view them, and a weekly email summary.”

    • Be specific about functionality, data structures, and user flows
    • Break complex applications into logical components
    • Describe UI preferences (tables, forms, charts, etc) or integrations (Gmail, calendar)
    • Include example data, sample inputs, and expected outputs
    • Iterate through feedback rather than expecting perfection immediately

    Step 4: Let the AI Build, Then Test
    Watch as your chosen tool scaffolds the app. Then:

    • Test it
    • Click around
    • Note what works and what doesn’t

    Step 5: Iterate with the AI
    Say things like:

    “Add an edit button to each row.” “Store data in Supabase instead of local storage.” “Make the UI mobile responsive.”

    Step 6: Deploy and Share
    Platforms like Replit and Lovable let you deploy apps live with a click. You’ll get a public link you can share.

    Step 7: Expand or Repeat
    Now that you’ve built one, you’ll start spotting 10 more things you could automate. Tweak your app or start a new one.

    Remember that on-demand software is currently best suited for discrete, well-defined problems rather than complex enterprise systems.

    Final Thought: SaaP > SaaS

    If the last decade was defined by SaaS, the next one might be defined by SaaP: Software as a Prompt.

    You no longer have to adapt your workflow to software. Software will adapt to you.

    Whether you’re a founder tired of Frankenstein SaaS stacks, a marketer with a pet project, or just someone sick of feature bloat, you now have the tools to build your own solution.

    No engineers. No sprints. Just you and a good prompt.

    So go ahead: build your own damn app.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • The Claude Cookbook, Part 1: Welcome to the Kitchen

    The Claude Cookbook, Part 1: Welcome to the Kitchen

    The Claude Cookbook Series

    This is Part 1 of a 3-part series on Claude. Stay tuned for the next parts in the series.

    About a year ago, I decided to ask Claude for some advice regarding… matters of the heart. At the time, I was primarily using ChatGPT. It’s great for analyzing things, getting quick answers, and other work-related tasks. But, when it comes to personal advice, ChatGPT lacks depth.

    Claude 3 had just launched so I figured I’d give it a chance. I logged in, explained the situation in great detail, and hit enter.

    Dear reader, I’ve been playing with Language Models since 2019 when GPT-2 was released. I have spent a significant amount of time with every frontier model and trust me when I say, I was not prepared for what I was about to read.

    Claude started by saying, “I’m going to stop you right there…”

    I’m going to stop you right there…

    The AI was talking back to me. I was dumbfounded. Nonplussed. Shocked. I had the same reaction everyone had when Thanos snapped his fingers and disappeared half the universe.

    Should have gone for the head.

    After I overcame my initial shock, I went into denial. Then I got angry, going so far as to ask Claude how dare it talk to me like that, and didn’t it know it worked for me, to which Claude reminded me it did not, in fact, work for me.

    After some back and forth about who the boss was, I realized Claude was actually on to something. It had pointed out some flaws in my thinking, shared that perhaps I may have a biased view of things, and made me consider other perspectives. No other AI model does that.

    Today, while I still use other AI tools, probably more than most, Claude is my primary tool for content creation, coding, and complex situations, and it’s the one I recommend the most.

    Anthropic, the company behind Claude, evidently doesn’t do any marketing, so it’s up to me to do it for them. My goal with this series is to show you why it’s so good, and how to use it.

    • In part 1 (this part) I’ll explain what makes Claude different, and how to use the Claude web app, including features like Projects and Artifacts. This is primarily aimed at the non-technical user.
    • In part 2, I’ll talk about the developer side of things – the API, Claude Code, and MCP. This is primarily aimed at the technical user.
    • In part 3, I’ll combine everything and share end-to-end playbooks for various industries. This is for everyone.

    So preheat your curiosity, sharpen your prompt-crafting skills, and let’s start cooking with Claude!

    Meet the chef

    To understand Claude fully, specifically why it responded to me the way it did, it helps to know something about its creators.

    Anthropic was founded in 2021 by former OpenAI researchers, including siblings Dario and Daniela Amodei. The company focuses on AI safety and creating systems that are reliable, interpretable, and steerable.

    The name “Anthropic” derives from the Greek word “anthropikos,” relating to humans or human affairs, reflecting the company’s focus on developing AI that works well with and for humans. This human-centered approach permeates everything about Claude’s design and capabilities.

    So while the core process of training a large language model (LLM) like Claude remains the same, Anthropic’s focus on being human-centered introduces some tweaks to make it behave differently.

    Constitutional AI: Building an Ethical Framework

    At the heart of Claude’s uniqueness is Anthropic’s Constitutional AI approach, a way to align AI behavior with human values.

    Traditional AI alignment relies heavily on Reinforcement Learning from Human Feedback (RLHF), where human evaluators rate model outputs to guide improvement.

    Anthropic’s Constitutional AI takes a different approach. Instead of relying primarily on human feedback, Claude is given a “constitution” or a set of principles that guide its behavior.

    When Claude generates an answer, another AI model evaluates which responses better adhere to the constitution, providing a reward signal that guides Claude toward more aligned behavior.

    The constitutional approach has some key advantages:

    • It makes Claude’s ethical framework explicit and transparent
    • It reduces dependency on the subjective judgments of human evaluators
    • It allows for more consistent application of principles across diverse scenarios
    • It enables more efficient scaling of ethical alignment as models grow more complex
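
    As a rough mental model, and emphatically not Anthropic’s actual implementation, you can picture the reward step something like the sketch below: candidate responses are scored by a judge model against the constitution’s principles, and the best-scoring one becomes the preferred example. The judge_model callable here is a hypothetical placeholder.

    Python
    # A toy sketch of the Constitutional AI reward step, not Anthropic's real code.
    CONSTITUTION = [
        "Choose the response that is most helpful to the user.",
        "Choose the response that avoids harmful or deceptive content.",
        "Choose the response that acknowledges uncertainty where appropriate.",
    ]

    def score_against_constitution(prompt: str, response: str, judge_model) -> float:
        """Average how well a response follows each principle, per a judge model (0-1)."""
        scores = [
            judge_model(prompt=prompt, response=response, principle=p)  # hypothetical call
            for p in CONSTITUTION
        ]
        return sum(scores) / len(scores)

    def pick_preferred(prompt: str, candidates: list[str], judge_model) -> str:
        # The highest-scoring candidate becomes the "chosen" example in the
        # preference data that nudges the model toward constitution-aligned behavior.
        return max(candidates, key=lambda r: score_against_constitution(prompt, r, judge_model))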

    Character Training

    Most AI models are trained to avoid harmful outputs – a necessary but insufficient condition for truly helpful assistance. Anthropic recognized that when we think of people we admire, we don’t just value their ability to avoid causing harm; we appreciate positive traits like curiosity, thoughtfulness, and wisdom.

    This insight led to Claude’s distinctive “character training,” which first appeared in Claude 3 and is the reason why it may sometimes talk back to you. Anthropic describes this as training Claude to “behave well in a much richer sense” than mere harm avoidance.

    Character training builds on Constitutional AI methodology but focuses on instilling positive traits rather than just avoiding negative behaviors. The process involves:

    1. Creating a list of character traits Anthropic wants to encourage
    2. Generating various conversational scenarios relevant to these traits
    3. Producing different responses aligned with the desired character
    4. Ranking these responses by how well they embody the traits
    5. Training a preference model on this data

    Meet Claude

    Unless you’re building agentic workflows with Claude or using the API in a product, you’re primarily going to be using the chat interface on the web app. Let’s get you familiar with all the features.

    Claude’s Brain

    The first thing you’ll notice in that chat interface is the model selector to the bottom right.

    Claude model selector

    As of March 2025, the Claude family consists of four primary models. Like every other AI company, they’re pretty bad at naming them. I’m not going to explain each one because they may have changed by the time you read this, and most of the time you’ll only be using one model anyway – Claude 3.7 Sonnet.

    You know how in ChatGPT you usually use GPT-4o, but can switch to o1 or o3 when you want it to think and give you better answers to complex questions? Well, GPT-4o is their regular model, and the o-series are their reasoning models. Yeah, I told you they all suck at naming.

    Anyway, Claude 3.7 Sonnet is both a regular model and a reasoning model, and you can choose whether you want it to behave like a regular model (by selecting Normal) or reason (by selecting Extended). You have even more flexibility in the API (which we’ll get to in Part 2).

    Concise Claude, Chatty Claude

    Right next to the model selector, you’ll also see an option to choose a response style. This directly impacts the response you get from Claude and is very useful depending on your use case.

    There are a couple of presets already, like Concise, where Claude’s answers are short and to the point (great if you just want some quick information), and Explanatory, where Claude goes into excruciating detail (great for learning).

    You can also create your own styles. For example, I’ve created a Storyteller Style where Claude generates content for me by weaving in storytelling aspects, which I use to create engaging social media and blog content.

    Claude style selector

    I highly recommend you create your own styles. Claude makes it easy to do this with some instructions. You can also just upload an example of the style you want Claude to mimic and it will figure it out on its own.

    Play around with it, try a few different styles for each of your use cases, and see what works best for you.

    Projects

    Every time I want to use Claude (or any AI) for something new, I create a new chat. This is good practice since longer chats eat up context window and make the AI slower.

    However, some of my chats are related, and I use the same prompts or upload the same reference material to Claude for those chats.

    This is where the Projects feature comes in. When you create a new project, you can tell Claude exactly what the project is about, give it very specific instructions, and upload reference material.

    Every new chat you start within that Project inherits these settings, saving you from repeatedly uploading the same files or typing the same instructions.

    Claude Projects

    For example, I have a Project called MCP Creator to help me code new Model Context Protocol servers. Don’t worry if you don’t know what that is. Just know that the code for a new server is structured in a particular way, and I want Claude to follow that structure every single time.

    When I created the Project, I uploaded the full documentation on how to code an MCP server and gave it instructions on how to structure the code. When I need to build a new server, I simply start a chat in this Project, describe what I want, and Claude already has all the context it needs.

    This feature is invaluable for content creation, coding projects, and client work where consistency matters.

    Artifacts

    Artifacts (pioneered by Claude and later adopted by other AI systems) allow you to work with content separately from the main conversation.

    It’s a lot easier to show you what an artifact is in a video than to describe it in text:

    Claude’s UI has dramatically improved since I made that video, but the core functionality of the artifact remains the same. It’s essentially a side panel that allows you to run code or work with content separate from the main chat.

    Cooking with Claude

    Now that you’re familiar with how everything works, let’s dig into how to use Claude in everyday life, both business and personal.

    I’m not going to cover all the possible use cases here. Like ChatGPT, Gemini, Grok, or any other AI model, Claude can do basic things like summarize documents, answer questions, analyze files, and so on. And if you want Deep Research, data analysis, or image generation, you’re better off with one of the others.

    But Claude’s training gives it a particular set of skills that make it better than the rest for certain use cases. I will focus on those.

    If Claude were human

    Content

    Claude doesn’t generate images but it is really great at generating text, especially for blog posts, social media, and even fiction writing.

    Blog Posts

    While Claude can generate a really good blog post with just a topic, I do not recommend doing that. Instead, use Claude as a thought partner to help you extract your unique perspective and put it down in words.

    First, I recommend creating a custom style that is based on content that you have written previously. This will help Claude generate content that follows your voice and tone.

    I also have a project set up in Claude for all the content I create for my blog. The project includes specific examples for the way I want blog content to be structured as well as context around my blog and personal brand so that Claude understands how it fits into the overall goals for my blog.

    When I create the actual content, I use Claude to first generate an outline for me. I then go back and forth with Claude to edit the outline and get it to a structure that I’m happy with.

    If I’m working on an article that requires information beyond Claude’s knowledge cutoff date, I upload research around that topic into the chat so Claude is aware of it. Of late, I’ve been automating this process with MCPs.

    All of this takes me just a few minutes with Claude. Once that’s done, I do most of the writing myself. As I’m writing, I use Claude to give me feedback on my content and poke holes in my arguments. Once I’m done, I use Claude to edit the blog post, refine the content, and cut out unnecessary bits.

    Social Media

    For social media, I also have a project set up with instructions on how to structure a tweet or a LinkedIn post as well as a custom style for short and succinct content.

    The process I follow is different. I usually use Claude on my mobile phone for this and I use the voice feature. The voice feature on mobile just allows you to speak into Claude instead of typing. Claude doesn’t speak back with a voice.

    I ramble my thoughts into Claude this way (filled with ‘ums’ and ‘ahs’). Then, based on my instructions for the project, Claude turns that into a polished social media post. So the content is all mine, but Claude is just cleaning it up.

    Website Copy

    Much of the copy on my website has been written with the help of Claude. I have a project created specifically for my website where I’ve uploaded details about my brand and business for every page or piece of copy I want to generate.

    For every page on my site, I start a new chat with Claude in that project and have it come up with a structure for that page. I then go back and forth with it on copy for specific sections until I’m satisfied.

    Storytelling

    I haven’t written stories with Claude but it is popular amongst fiction writers for this use case. I’ve read examples where authors use it to help them build characters, worlds, storylines, and more.

    Notice how the author is not using Claude to generate the entire story. Instead, Claude helps with research and brainstorming, coming up with ideas for scenes or chapters, giving feedback, and editing. Much the same way I use Claude for my blog content.

    Here’s another example of someone using it for worldbuilding. They use a Claude Project with all the relevant documents about the fictional universe so that the outputs are consistent.

    Code

    Writing code is another extremely popular use case for Claude. It can often one-shot the code for an entire project from just a prompt. The more specific your prompt is, the better.

    As always, start by creating a new Project in Claude. If you’re using certain libraries or APIs, add the documentation to the Project Knowledge. You can also add instructions for the way you want it to generate code.

    Let’s say you’re working on a React app. You can use the project knowledge to store instructions about the app that you’re building, as well as important documentation.

    Your first chat in the project would be creating the plan and architecture with Claude. Have Claude generate a list of all the pages and components you need for your app and help you build out the user flows.

    Save this plan back into the project knowledge base, and then for every chat thereafter, you can generate a specific component or page using Claude. As you can see, you go from being the person writing the code to working on the strategy while Claude writes the code.

    Many developers also use Claude inside an IDE like Cursor or directly through the command line with Claude Code. I’ll talk more about this in Part 2.

    Complex Situations

    Similar to the story I opened this blog post with, Claude is great at helping you navigate complex social situations, in business or personally.

    As I mentioned earlier, Claude has a certain set of principles that it strives to uphold. When you talk about personal situations, it isn’t just going to agree with you or parrot back your thoughts. It will try to answer in line with its principles and may end up challenging you.

    This is actually a good thing. All language models have biases in them. By training Claude to follow its principles, Anthropic is ensuring that Claude doesn’t enforce its biases upon you.

    Think of it as a good friend who has your best interests at heart. You need not follow the advice, but it’s good to have that additional perspective.

    Claude’s Limitations

    I’ve been focusing on Claude’s strengths in this blog post, especially in areas where it stands out against other AI models. But that doesn’t mean it doesn’t have its drawbacks.

    No Voice Mode – Claude doesn’t have a voice mode like ChatGPT, making it difficult to have conversations with it on-the-go. You can, however, pair it with an AI voice model if you’re building voice agents.

    No Image Generation – Claude doesn’t generate images either. You will need to use an image generation model for that, or you can pair it with an image generation model in an Agentic workflow.

    Untested Web Search – the web search feature in Claude is brand new. It’s only being rolled out right now in beta in the United States, so I haven’t had a chance to try it out yet here in Canada. However, other AI platforms have had web search for a lot longer, so I’d use those for real-time answers.

    No Deep Research – this is a powerful research functionality that every other AI model has. I built out my own version of it using Claude and Exa, but the best one out there is ChatGPT’s version, with Gemini a close second.

    Conclusion: The Claude Difference

    The most powerful way to use Claude isn’t to treat it like a mere text generator or code machine. It shines when you approach it as a collaborator, someone who brings their own perspective to help refine your thinking.

    Whether you’re crafting blog content, building software, or working through personal dilemmas, Claude’s unique training makes it more than just an echo chamber for your own thoughts.

    In Part 2 of this series, we’ll explore the developer side of Claude, including the API, Claude Code, and MCP. Later, in Part 3, we’ll dive into industry-specific playbooks that combine everything we’ve learned. Stay tuned!

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • Re-Found: A Step-By-Step Guide to Becoming an AI-First Company

    Re-Found: A Step-By-Step Guide to Becoming an AI-First Company

    I recently read Ravi Gupta’s thought-provoking article “AI or Die” and found myself nodding along with his core thesis: companies that successfully integrate AI into their foundations will thrive, while those that treat it as a peripheral tool will struggle to survive.

    He talks about “re-founding”, the idea that becoming AI-first requires rebuilding your company from the ground up with AI at its core. It’s not about adding an “AI strategy” slide to your quarterly deck or launching a token innovation lab. It’s about fundamentally reimagining how your business operates.

    So how exactly do you become an AI-first company? That’s where this guide comes in. I’ve created a comprehensive framework to help you transform your organization into an AI-first company one methodical step at a time. This is based on work I’ve done with dozens of companies.

    The Pyramid of AI Adoption

    I previously wrote a post called the Pyramid of AI Adoption which illustrates how far along you are in becoming an AI-first company.

    I suggest reading the full article but here are the Cliff’s Notes:

    Stage 1: Augmentation – You’re using ChatGPT to write emails and summarize meetings. It’s like getting training wheels for your AI bicycle. Most companies are camping out here.

    Stage 2: Automation – You’ve started changing how your company actually operates, automating away processes that eat up resources faster than I demolish a chocolate bar.

    Stage 3: Innovation – You’re creating entirely new business models and products with AI that were previously impossible with your resources.

    My aim in this guide is to show you how you can get to Stage 3. Of course, reading about it is the easy part. The hard part is implementing it! Let’s go…

    Step 1: Immerse Leadership in AI

    Transformation starts at the top. As a leader, you can’t just delegate “that AI stuff” to your tech team any more than Tony Stark could outsource being Iron Man.

    Block out two hours every week to actually use these tools. I’m not talking about reading articles about AI (unless it’s my blog). I’m talking hands-on experience. Start with Claude or ChatGPT before venturing into more specialized tools.

    Here’s your executive starter pack:

    • Strategic Planning: Today’s AI models are extremely good at taking in large volumes of text and coming up with valuable insights. Feed those massive reports into AI tools and watch them extract insights faster than a gossip columnist at a celebrity wedding.
    • Communication Enhancement: Writing emails, drafting announcements, sending investor updates, these are all things that can be done faster and better with AI. You still provide the vision, but AI makes sure it doesn’t read like it was written at 2 AM after your fourth espresso.
    • Meeting Follow-up: Tools that automatically generate meeting notes and action items? Yes, please! It’s like having a super-efficient assistant who never needs coffee breaks.
    • Competitive Intelligence: New Deep Research capabilities (Google it… actually, no, ChatGPT it) let you gather information across hundreds of websites in minutes. Your intern can go back to getting you coffee now.

    In a podcast with Patrick O’Shaughnessy, Ravi mentions how he fed ChatGPT some context about a dinner he was going to and asked it to give him some talking points. He read the talking points on the Uber ride over and impressed his guests. This behavior should become second nature to you.

    Step 2: Mandate AI Use Across Your Company

    Once leadership is on board the AI train, it’s time to get everyone else tickets. Some employees are already secretly using these tools, but with leadership’s blessing, adoption spreads faster than rumours about office romance.

    A fun and quick way to do this is to have leaders share their learnings in team meetings. You could perhaps call an All-Hands and have every leader walk through something they tried with AI and the results.

    Another way is to have Department Heads conduct workshops for their departments to identify and experiment with AI tools. I’ve mentioned ChatGPT and Claude but there are hundreds of department specific AI tools that are more suited for certain tasks.

    You also need to ensure employees have permission to try out different tools. At the very least, give everyone in your company a Team subscription to ChatGPT.

    When I was running the Venture Studio at Forum VC, this is exactly what we did. The whole company got a subscription to ChatGPT, and we even mandated usage of it for my department.

    Book A Free Consultation

    If you’re serious about your AI Transformation, I can help. View my Services here and book a free consultation.

    Step 3: Conduct a Comprehensive AI Readiness Assessment

    With your team engaged, the journey continues with a thorough understanding of your current state. I should warn you, this requires deep examination of how your company actually operates and you may find some surprises while doing this.

    I’ll break down exactly how to conduct this assessment in actionable steps.

    Map Your Workflows Systematically

    First, create a standardized workflow documentation template that captures:

    • Process name and department
    • Inputs required (data sources, triggers, resources)
    • Steps performed (sequential actions with decision points)
    • Outputs produced (deliverables, decisions, impacts)
    • Current time investment (hours per week/month)
    • Estimated cost (labor, technology, opportunity cost)
    • Business impact rating (critical, high, medium, low)

    Next, organize department-by-department workflow collection sessions. I recommend instructing each department head to schedule dedicated 2-3 hour mapping workshops that include frontline staff.

    These sessions should uncover both formal procedures and those informal “shadow processes” that exist in every organization but rarely appear in official documentation.

    Direct teams to focus particularly on processes that:

    • Are performed frequently (daily or weekly)
    • Follow clear patterns or rules
    • Involve significant data processing or analysis
    • Consume substantial employee time
    • Create bottlenecks in delivering customer value

    Do not include processes that are part of your core value as a business (just yet). Right now, we’re focusing only on processes that do not deliver core value and hence carry low risk if automated.

    Score Each Process

    Once that’s done, we score each process in terms of how “AI ready” it is. It doesn’t have to be complex. I usually create a 1-5 scale assessment across these dimensions:

    • Data structure (1 = unstructured/analog information, 5 = highly structured digital data)
    • Decision complexity (1 = requires nuanced human judgment, 5 = follows clear rules)
    • Pattern recognition (1 = each situation is unique, 5 = clear patterns exist)
    • Creativity required (1 = high creative input needed, 5 = minimal creativity needed)
    • Emotional intelligence (1 = high empathy required, 5 = minimal EQ needed)

    After scoring, create a quadrant analysis plotting each process on two axes:

    • X-axis: AI Readiness (combined score of above dimensions)
    • Y-axis: Potential business impact (importance, cost, time savings)

    This visualization makes your prioritization decisions much clearer. Based on the scoring results, categorize processes into implementation timelines:

    • Immediate candidates (Q1): High scores on both axes – your quick wins
    • Mid-term candidates (Q2-Q3): High on AI readiness, medium on business impact
    • Long-term vision (Year 2+): Medium scores or processes requiring significant redesign
    • Human-centric processes: Low scores that should remain primarily human-driven
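
    A spreadsheet works fine for this, but if you prefer code, here’s a minimal sketch of the scoring and quadrant logic. The field names, example scores, and the 3.5/4 cutoffs are all assumptions you should tune to your own rubric.

    Python
    from dataclasses import dataclass

    @dataclass
    class ProcessAssessment:
        name: str
        # 1-5 scores for the AI-readiness dimensions above
        data_structure: int
        decision_complexity: int
        pattern_recognition: int
        creativity: int
        emotional_intelligence: int
        business_impact: int  # 1-5 rating of potential business impact

        @property
        def ai_readiness(self) -> float:
            dims = [self.data_structure, self.decision_complexity,
                    self.pattern_recognition, self.creativity,
                    self.emotional_intelligence]
            return sum(dims) / len(dims)

        @property
        def quadrant(self) -> str:
            ready = self.ai_readiness >= 3.5       # assumed cutoff
            impactful = self.business_impact >= 4  # assumed cutoff
            if ready and impactful:
                return "Immediate candidate (quick win)"
            if ready:
                return "Mid-term candidate"
            if impactful:
                return "Long-term vision (needs redesign)"
            return "Human-centric process"

    # Made-up example scores for an onboarding process like the one described below
    onboarding = ProcessAssessment("Client onboarding", 4, 4, 5, 4, 3, business_impact=5)
    print(onboarding.ai_readiness, onboarding.quadrant)  # 4.0 Immediate candidate (quick win)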

    A sales agency I worked with had a very clear, structured onboarding process that didn’t require much creativity. All they needed to do was gather requirements and turn that into a document for the delivery team.

    Unfortunately, it took two weeks on average to complete onboarding, with multiple calls and emails between the onboarding team and the client. It’s not a core process but had high business impact and scored well on AI readiness. A prime candidate for automation.

    Identify and Empower Your AI Champions

    The final component of your assessment identifies the people who will drive transformation from within the organization.

    Deploy a company-wide AI attitude survey with questions that reveal:

    • Current use of AI tools (both personal and professional)
    • Interest level in AI applications within their work
    • Concerns or reservations about AI implementation
    • Ideas for how AI could improve their specific functions
    • Desire to participate in AI initiatives

    A healthcare system I worked with was surprised to discover that some of their most enthusiastic AI advocates weren’t in IT or analytics, but in clinical departments where individuals had independently started exploring AI tools to solve daily challenges.

    Analyze the assessment and survey results to identify potential champions, then conduct one-on-one interviews with promising candidates. Look for individuals who demonstrate:

    • Practical AI knowledge or strong aptitude to learn
    • Respect among peers (influence without authority)
    • Ability to bridge technical and business perspectives
    • Track record of successful change management
    • Persistence through challenges

    When your assessment is complete, you’ll have three critical assets:

    1. A comprehensive map of your organization’s processes with clear AI potential scores
    2. A prioritized transformation roadmap with timelines
    3. A group of internal AI champions ready to drive change

    This assessment provides the foundation for all your subsequent transformation efforts. It ensures you’re targeting the right opportunities, with the right sequence, and with the right people involved.

    Step 4: Launch Strategic Pilot Projects

    With your assessment complete, it’s time to move from theory to practice by launching some pilot projects.

    Pick out 2-3 projects from the processes in the first quadrant of your Step 3 assessment. If you have many contenders, prioritize projects that have high visibility across your organization or that span multiple business functions. They should also have clear ROI potential.

    I’m not going to go into how to run and manage projects here but it is extremely important. While everything I’ve mentioned so far sounds like a lot of fun, execution is usually where most companies stumble. This is really the make-or-break step, and to set you up for success, here are a few pointers:

    Treat It As a Real Project

    This is not a side project. Most side projects fail or don’t result in anything long-term because they aren’t taken seriously.

    Have your AI Champions from Step 3 lead these projects, make it their primary KPI, and give them the team and resources they need.

    Set Aggressive Implementation Timelines

    To create momentum and prevent analysis paralysis, establish ambitious but achievable timelines for your initial projects.

    I’ve helped companies launch and deliver AI automations within 30 days. Remember, we’re not looking for perfection here. We’re piloting a new way of doing things, and it just needs to be better than the old way.

    Document Process Changes and Results Meticulously

    Successful pilots will make the case for further transformation. Establish clear baseline metrics for the processes you want to automate, and then measure the results.

    Document everything meticulously. These case studies become powerful tools for expanding your transformation. PS – you can use AI for this!

    Create a Consistent Communication Cadence

    Effective communication is often the difference between successful transformations and failed initiatives. Develop a systematic approach to sharing progress, learnings, and successes throughout your organization.

    Buy Vs Build

    At least for the first few pilot projects, it makes sense to buy existing software or AI tools instead of building them yourself. You can roll your own AI once you’ve seen value.

    One VC client wanted to automate their entire top of funnel deal flow. We could have developed an end-to-end AI automation but we decided instead to cobble together three different software tools. It’s not perfect but it improved investment throughput and we laid the foundation for more custom AI builds.

    Launch the Projects

    Don’t just build the pilot and leave it at that. Actually launch it and roll it out. See how it runs in the real world. Measure if it’s making a difference.

    Getting a few successful pilots off the ground and communicating those successes sets you up for deeper transformation down the line.

    If you’ve come this far, congratulations, you’re in the second level of the Pyramid of AI Adoption – Automation.

    Step 5: Redesign Your Core Business Processes

    Once your initial pilots demonstrate value, it’s time for deeper transformation. We’re getting to the third level on the pyramid.

    This is where the “re-founding” concept becomes most apparent. You’re not just improving existing processes, you’re reimagining how work gets done.

    Begin by identifying processes that form the backbone of your value creation. These are the processes I told you not to touch during the assessment in Step 3.

    For a software company, this might be your development workflow; for a financial institution, your risk assessment process; for a healthcare provider, your patient care pathways.

    Before redesigning this process, thoroughly document the current process to understand its complete flow, inefficiencies, and hidden dependencies. This mapping creates a baseline understanding that will inform your redesign.

    For each selected process:

    1. Conduct detailed observation sessions with the people who perform the work daily
    2. Document every step, including unofficial workarounds and exceptions
    3. Identify decision points and the information used to make those decisions
    4. Measure time, cost, and quality metrics at each stage
    5. Identify pain points, bottlenecks, and redundancies
    6. Map data flows and information handoffs between systems and people
    7. Document compliance and regulatory requirements

    With this baseline, you can pick it apart and redesign it. The key to true transformation is starting with a clean slate rather than incrementally improving existing processes.

    Conduct structured workshops where teams reimagine the process from first principles, considering AI capabilities as fundamental building blocks:

    1. Begin with the core purpose of the process and desired outcomes
    2. Challenge all assumptions about how work must be done
    3. Ask: “If we were building this process from scratch today, with all of AI’s capabilities available, how would we design it?”
    4. Identify which decisions could be automated, augmented, or should remain human-driven
    5. Examine how to eliminate handoffs and information re-entry
    6. Determine how humans and AI will collaborate within the redesigned process

    Once you’ve redesigned the process, you can start the implementation. Again, I won’t go into project management here but keep in mind the advice I gave previously.

    Since this is a redesign of your core process, you also want to start small. Pick one piece of the design to implement first, measure it, learn from it, and then move to the next piece. Like trying a new hairstyle, you don’t go from conservative cut to mohawk overnight.

    Remember that process redesign is fundamentally about rethinking how work creates value, not just making existing processes more efficient. The organizations that achieve the greatest transformation benefits are those willing to challenge fundamental assumptions about how work must be done.

    Step 6: Transform Your Product and Service Offerings

    With internal transformation underway, turn your attention to market-facing opportunities. This is where AI fundamentally changes your value proposition in the marketplace. Rather than simply improving existing offerings, this step reimagines what’s possible when AI becomes central to your products and services.

    Establish an AI Innovation Team

    Create a dedicated team for AI-driven product innovation. Many companies are now hiring Chief AI Officers and AI Product Managers to research and build AI-first products and features.

    Create Rapid Prototyping Processes

    With tools like Cursor, Windsurf, and Lovable, it’s extremely easy to rapidly prototype new products (especially in software). This doesn’t mean they’ll be instantly integrated into your core products and services but you can launch them as side tools and measure the response.

    Build Customer Feedback Loops

    Collect usage metrics not just on the product but also the AI features. A software company I advised built systems tracking not only when customers used their AI writing assistant but which suggestions were accepted, modified, or rejected, creating a rich dataset for improvement.

    Update Your Pricing

    As you develop these offerings, rethink your pricing strategy. AI-enhanced products often create exponentially more value than traditional alternatives.

    AI Or Die

    The pace of AI advancement isn’t slowing down.

    The companies that thrive will be those that start transforming today. They’ll make mistakes and face challenges along the way, but they’ll develop the organizational capabilities needed to capitalize on each new AI breakthrough.

    Is it easy? About as easy as teaching a cat to swim. It requires courage, commitment, and fundamentally rethinking how your business operates. But the alternative, watching AI-native competitors eat your lunch while you still decide what to order, is far more painful.

    I’ve guided numerous organizations through this journey, and while each transformation is unique, the framework outlined here provides a proven path forward.

    If you’re ready to begin your company’s AI transformation but need expert guidance, I’d welcome a conversation about how I can help you navigate this complex but essential transition.

    Book A Free Consultation

    If you’re serious about your AI Transformation, I can help. View my Services here and book a free consultation.

  • A Deep Dive Into The OpenAI Agents SDK

    A Deep Dive Into The OpenAI Agents SDK

    I’ve been discussing the inevitable progression that LLM companies are taking toward agentic AI capabilities for some time now on my blog and social media.

    My Model Context Protocol series explored how Claude (and any AI product) can go from a mere chatbot to an AI agent capable of taking actions on your behalf.

    OpenAI has also been on this path since launching ChatGPT. They’ve been adding tools like web search, code interpreter, Operator, Deep Research, and so on, to build out ChatGPT’s agentic capabilities.

    This week, on March 11, 2025, they took the next step with the release of their Agents SDK, an open-source toolkit designed to make building sophisticated AI agents accessible to developers of all skill levels.

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

    What is the Agents SDK?

    The OpenAI Agents SDK is a lightweight, Python-based framework for constructing multi-agent workflows. Evolved from their experimental “Swarm” project, this SDK provides a comprehensive solution for developers looking to create AI agents that can reason, use tools, and collaborate with other agents to accomplish complex tasks.

    At its core, the SDK offers a simplified architecture with a few key primitives:

    1. Agents: LLMs equipped with instructions and tools
    2. Handoffs: A system allowing agents to delegate specific tasks to other specialized agents
    3. Guardrails: Safety mechanisms that run parallel to agents, validating inputs and outputs
    4. Function Tools: Utilities to transform any Python function into a tool with automatic schema generation
    5. Tracing: Built-in capabilities for visualizing, debugging, and monitoring agent workflows

    Unlike some competing frameworks that require learning new abstractions, the Agents SDK embraces a Python-first approach. This allows developers to leverage familiar language features for orchestrating and chaining agents, significantly flattening the learning curve.

    Why It Matters

    The Agents SDK addresses many of the practical challenges developers face when building AI agents. It standardizes patterns for agent communication, state management, and collaboration, reducing the complexity barrier for creating useful AI applications.

    The SDK isn’t revolutionary—it’s evolutionary, building on existing concepts while providing a more accessible framework. It handles much of the orchestration complexity while giving developers precise control over agent behavior.

    What makes it valuable? Three core concepts:

    1. Agents that think AND act – Not just LLMs spitting out text, but AI assistants that can make decisions and execute functions
    2. Seamless teamwork through handoffs – Specialized agents working together, passing the baton when needed
    3. Safety through guardrails – Because nobody wants their AI going rogue after reading too many YouTube comments

    How It Works

    The mechanics of the Agents SDK are pretty straightforward. Let’s break down the basic workflow:

    1. Agent Configuration

    Agents are defined by providing a name, model, instructions, and tools:

    • Give them a name (“Customer Support Agent”)
    • Provide instructions (“Help users without saying ‘have you tried turning it off and on again?’ more than once per conversation”)
    • Choose their “brain” (from quick-and-simple to deep-thinking models)
    • Equip them with tools (the digital equivalent of giving someone access to the supply closet)
    Python
    from agents import Agent

    researcher = Agent(
        name="Customer Support Agent",
        model="gpt-4o",
        instructions="Help users without saying 'have you tried turning it off and on again?' more than once per conversation",
        tools=[web_search, document_retrieval]
    )

    2. Agent Loop

    When your agent runs, it enters the “agent loop”, a fancy way of saying it thinks, acts, and repeats until the job is done. The SDK handles the agent loop automatically, managing tool calling, result processing, and iteration:

    1. Agent gets input (like “I need help with my subscription”)
    2. Agent decides if they need more info or can respond directly
    3. If they need info, they use a tool and get results
    4. This continues until they reach a final answer

    It’s basically the digital version of how I approach cooking: assess situation, realize I need more information, google recipe, realize I’m missing ingredients, order takeout, problem solved.

    Python
    from agents import Runner

    runner = Runner()
    result = runner.run(researcher, "What are the latest developments in quantum computing?")
    print(result.final_output)

    Tools: Extending Your Agent’s Capabilities

    Without tools, agents would just be fancy chatbots. Tools are what let your AI reach out into the world and actually do stuff.

    Creating a tool is as simple as decorating a Python function:

    Python
    from agents.tool import function_tool
    
    @function_tool
    def search_knowledge_base(query: str) -> str:
        # Your code to search a database
        return "Here's what I found about " + query

    There are two main types:

    • Hosted tools: Pre-built capabilities like web search (the tools already in your shed)
    • Function tools: Turn ANY Python function into an agent tool (like going to Home Depot and buying whatever you need)

    The beauty is in how naturally the agent decides when to use these tools – it’s not pre-programmed, but rather a decision the LLM makes based on the task at hand.

    Context: Keeping State Between Steps

    For complex applications, you often need to maintain state across multiple interactions. The SDK lets you create a context object:

    Python
    class UserSession:
        def __init__(self, user_id):
            self.user_id = user_id
            self.preferences = {}
    
    @function_tool
    def update_preference(context, category: str, preference: str):
        context.agent_context.preferences[category] = preference
        return f"Updated {category} preference to {preference}"

    This lets your tools access shared state, store progress information, or maintain user session data – incredibly useful for multi-step interactions.
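
    How does the agent get hold of this context at runtime? You pass the object in when you run the agent, and every tool call during that run can read and write it. Here’s a minimal sketch, assuming a hypothetical preferences_agent equipped with the update_preference tool above (the exact context keyword may vary by SDK version):

    Python
    # Hypothetical agent equipped with the update_preference tool defined above
    preferences_agent = Agent(
        name="Preferences Agent",
        model="gpt-4o",
        instructions="Record the user's preferences using the update_preference tool.",
        tools=[update_preference]
    )
    
    # The same UserSession object is visible to every tool call in this run
    session = UserSession(user_id="user_123")
    result = runner.run(preferences_agent, "I prefer dark mode", context=session)
    
    print(session.preferences)  # the tool wrote into the shared object, e.g. {'appearance': 'dark mode'}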

    Output Types: Getting Structured Results

    By default, agents return free-form text, but you can enforce structured outputs using Pydantic models:

    Python
    from pydantic import BaseModel
    from agents import Agent, AgentRunner
    
    class ProductRecommendation(BaseModel):
        product_id: str
        product_name: str
        price: float
        reasoning: str
    
    agent = Agent(
        name="product_recommender",
        output_type=ProductRecommendation
    )

    This guarantees that you get properly structured data that your application can easily process, not just random text blobs.
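
    Once the run finishes, the output is an instance of your Pydantic model rather than a string, so your application can work with typed fields directly. A quick sketch, reusing the runner from earlier (attribute names follow the model defined above):

    Python
    result = runner.run(agent, "Recommend a budget-friendly mechanical keyboard")
    
    recommendation = result.final_output  # a ProductRecommendation instance, not free-form text
    print(recommendation.product_name, recommendation.price)
    print(recommendation.reasoning)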

    Tracing

    The built-in tracing system captures every step of the agent’s thinking and actions:

    • What the agent was thinking
    • Which tools it called and why
    • What inputs it used
    • What outputs it received

    This means when something goes wrong (and we all know something always goes wrong), you can actually figure out why.
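
    Tracing is on by default, and you can control it through the run configuration. Here’s a small sketch using the same AgentRunConfig pattern as the research example later in this post (the run name is just an illustrative label, and the exact keyword may differ by version):

    Python
    config = AgentRunConfig(
        run_name="support_ticket_42",  # label that shows up when you inspect the trace
        tracing_disabled=False         # keep tracing on so every thought and tool call is captured
    )
    
    result = runner.run(researcher, "Why is my subscription not renewing?", run_config=config)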

    3. Multi-Agent Collaboration and Handoffs

    One of the most powerful features is the ability to create handoffs between specialized agents and let them collaborate:

    • Simple task? Use the fast, lightweight model
    • Billing questions? Send them to the “Money Person” agent
    • Technical problems? That’s for the “Have You Tried Restarting It?” agent
    • Complex reasoning needed? Bring in the heavyweight model
    Python
    support_agent = Agent(name="support", instructions="You handle customer questions.")
    technical_agent = Agent(name="technical", instructions="You solve technical issues.")
    billing_agent = Agent(name="billing", instructions="You handle billing inquiries.")
    
    triage_agent = Agent(
        name="triage",
        instructions="Route customer inquiries to the appropriate specialized agent.",
        handoffs=[support_agent, technical_agent, billing_agent]
    )

    This creates a workflow where agents can delegate subtasks, forming a collaborative system greater than the sum of its parts.
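
    Running the triage agent looks exactly like running a single agent; the handoffs happen behind the scenes. A quick sketch, reusing the runner from earlier:

    Python
    result = runner.run(triage_agent, "I was charged twice for my subscription this month")
    
    # The triage agent spots a billing issue, hands off to billing_agent,
    # and billing_agent's answer comes back as the final output
    print(result.final_output)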

    4. Safety Guardrails

    Guardrails are the bouncers of your application, validating inputs before they reach your main agent. Want to prevent users from asking for a recipe for digital disaster? A guardrail can check inputs with a fast model first, saving your premium model for legitimate requests.

    Developers can implement safety measures that run in parallel with agent execution:

    Python
    from agents.guardrails import CustomGuardrail
    
    async def is_not_swearing(msgs, context) -> bool:
        content = " ".join(m["content"] for m in msgs if "content" in m)
        return "badword" not in content.lower()
    
    my_guardrail = CustomGuardrail(
        guardrail_function=is_not_swearing,
        tripwire_config=lambda output: not output  # if 'False', raise error
    )
    
    agent = Agent(
        name="my_agent",
        input_guardrails=[my_guardrail]
    )

    Hands-On Example: Building a Multi-Agent Research System

    To demonstrate the power and flexibility of OpenAI’s Agents SDK, I’ve created a practical example that showcases how multiple specialized agents can collaborate to accomplish complex tasks. This Research Agent System represents the kind of real-world application that the SDK enables developers to build quickly and efficiently.

    The Research Agent System Architecture

    This system consists of four specialized agents that work together to produce comprehensive research content:

    1. Triage Agent: Coordinates the overall research process, delegating tasks to specialized agents
    2. Researcher Agent: Gathers information from various sources on a given topic
    3. Fact Checker Agent: Verifies statements for accuracy and proper sourcing
    4. Writer Agent: Synthesizes verified research into coherent, well-structured content

    Each agent is designed with specific instructions, tools, and capabilities that allow it to excel at its particular role. The system demonstrates several key features of the OpenAI Agents SDK:

    • Handoffs: Agents delegate tasks to more specialized agents
    • Context sharing: All agents work with a shared research context
    • Guardrails: Ensures content remains fact-based and properly sourced
    • Structured outputs: Final content follows a consistent, well-organized format
    • Function tools: Agents leverage specialized tools for searching, verifying, and saving content

    The Code

    Each agent described above performs a specific task and returns its result as an output. We want that output to follow a defined structure, so that when one agent hands it off to the next, the receiving agent can take it in that structure and build on it.

    Python
    class ResearchFinding(BaseModel):
        """A single research finding with source information."""
        statement: str
        source: str
        confidence: float  # 0.0 to 1.0
    
    
    class VerifiedResearch(BaseModel):
        """Collection of verified research findings."""
        findings: List[ResearchFinding]
        verified: bool
        notes: Optional[str] = None
    
    
    class FinalContent(BaseModel):
        """Final output content with structured sections."""
        title: str
        introduction: str
        key_points: List[str]
        body: str
        conclusion: str
        sources: List[str]

    We also want to give each agent some tools to do their work. The Research Agent, for example, will need a tool to search the internet as well as save the retrieved content into a file. The fact-checker agent would need a tool to verify that content, and so on.

    I am not going to write all the tools here, but here’s what the web search tool might look like, using the Exa Search API.

    Python
    @function_tool
    async def search_web(context: AgentContextWrapper[ResearchContext], query: str) -> str:
        """
        Search the web for information about a topic using the Exa Search API.
        
        Args:
            query: The search query text
            
        Returns:
            Search results as formatted text with citations
        """
        topic = context.agent_context.topic
        # Combine the specific query with the general topic for better results
        full_query = f"{query} about {topic}"
        
        try:
            # Make a request to the Exa Search API
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    "https://api.exa.ai/search",
                    headers={
                        "Content-Type": "application/json",
                        "x-api-key": "YOUR_EXA_API_KEY"  # Replace with your actual API key
                    },
                    json={
                        "query": full_query,
                        "numResults": 5,
                        "useAutoprompt": True,
                        "type": "keyword"
                    }
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        return f"Error searching: {response.status} - {error_text}"
                    
                    search_results = await response.json()
        
            # Process the results
            formatted_results = f"Search results for '{query}' about {topic}:\n\n"
            
            if not search_results.get("results"):
                return f"No results found for '{query}' about {topic}."
            
            # Format each result with its title, content, and URL
            for i, result in enumerate(search_results.get("results", []), 1):
                title = result.get("title", "No title")
                url = result.get("url", "No URL")
                content = result.get("text", "").strip()
                
                # Limit content length for readability
                if len(content) > 500:
                    content = content[:500] + "..."
                
                formatted_results += f"{i}. **{title}**\n"
                formatted_results += f"   {content}\n"
                formatted_results += f"   Source: {url}\n\n"
            
            # Add a summary if available
            if search_results.get("autopromptString"):
                formatted_results += f"Summary: {search_results.get('autopromptString')}\n\n"
                
            return formatted_results
        
        except Exception as e:
            # Provide a useful error message
            error_message = f"Error while searching for '{query}': {str(e)}"
            
            # Add fallback information if the search fails
            fallback_info = f"\n\nFallback information about {topic}:\n" + \
                            f"1. {topic} has been studied in recent publications.\n" + \
                            f"2. Current research suggests growing interest in {topic}.\n" + \
                            f"3. Common challenges in {topic} include implementation complexity and adoption barriers."
            
            return error_message + fallback_info

    You’ll notice this tool uses the shared ResearchContext object to pass data between tools. Let’s define that as well:

    Python
    class ResearchContext:
        def __init__(self, topic: str):
            self.topic = topic
            self.findings = []
            self.verified_findings = []
            self.draft_content = ""
            self.history = []
            
        def add_finding(self, finding: ResearchFinding):
            self.findings.append(finding)
            self.history.append(f"Added finding: {finding.statement}")
            
        def add_verified_findings(self, verified: VerifiedResearch):
            self.verified_findings.extend(verified.findings)
            self.history.append(f"Added {len(verified.findings)} verified findings")
            
        def set_draft(self, draft: str):
            self.draft_content = draft
            self.history.append("Updated draft content")

    You may also want to add some guardrails, for example checking if the research content is unbiased. A very simple hard-coded example might be to count the number of times an opinion is expressed vs a fact, like so:

    Python
    async def is_fact_based(msgs, context) -> bool:
        """Check if messages appear to be fact-based and not opinion-heavy."""
        content = " ".join(m.get("content", "") for m in msgs if isinstance(m, dict))
        opinion_phrases = ["I believe", "I think", "in my opinion", "probably", "might be", "could be"]
        
        # Count opinion phrases
        opinion_count = sum(content.lower().count(phrase) for phrase in opinion_phrases)
        
        # Allow some opinion phrases, but not too many
        return opinion_count < 3
    
    
    fact_based_guardrail = CustomGuardrail(
        guardrail_function=is_fact_based,
        tripwire_config=lambda output: not output,
        error_message="Output contains too many opinion statements rather than fact-based research."
    )

    You can create something more powerful, but this simple example highlights how the SDK checks outputs against your guardrails.

    Finally, we’ll create our Agents and give them the tools, context, and guardrails. Here’s what the Fact Checker Agent might look like:

    Python
    fact_checker_agent = Agent(
        name="fact_checker_agent",
        model="gpt-4o",
        instructions="""You are a meticulous fact-checking agent. Your job is to:
        1. Review the research findings in the shared context
        2. Verify each statement using the verify_statement tool
        3. Consolidate verified findings using save_verified_research
        4. Be skeptical and thorough - only approve statements with sufficient evidence
        
        For each finding, check if the source is credible and if the statement contains verifiable
        facts rather than opinions or generalizations.
        """,
        context_type=ResearchContext,
        tools=[verify_statement, save_verified_research],
        output_type=str,
        output_guardrails=[fact_based_guardrail],
        description="Verifies research findings for accuracy and proper sourcing"
    )

    Our Triage Agent which manages the whole process would also have handoffs defined in its parameters:

    Python
    triage_agent = Agent(
        name="triage_agent",
        model="gpt-3.5-turbo",
        instructions="""You are a research coordinator who manages the research process.
        For any research query:
        1. First, hand off to the researcher_agent to gather information
        2. Then, hand off to the fact_checker_agent to verify the findings
        3. Finally, hand off to the writer_agent to create the final content
        
        Monitor the process and ensure each specialized agent completes their task.
        """,
        context_type=ResearchContext,
        handoffs=[
            handoff(researcher_agent),
            handoff(fact_checker_agent),
            handoff(writer_agent)
        ],
        output_type=FinalContent,
        description="Coordinates the research process across specialized agents"
    )

    And finally, we write the main function to run the whole process:

    Python
    async def run_research_system(topic: str) -> FinalContent:
        """Run the multi-agent research system on a given topic."""
        # Create the shared context
        context = ResearchContext(topic=topic)
        
        # Configure the run with tracing enabled
        config = AgentRunConfig(
            run_name=f"research_{topic.replace(' ', '_')}",
            tracing_disabled=False
        )
        
        # Run the triage agent with the initial query
        result = await AgentRunner.run(
            triage_agent,
            [f"Research the following topic thoroughly: {topic}"],
            context=context,
            run_config=config
        )
        
        return result.agent_output
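
    Because run_research_system is an async function, you’d kick it off with asyncio; the topic string below is just an example:

    Python
    import asyncio
    
    if __name__ == "__main__":
        content = asyncio.run(run_research_system("The impact of open-source LLMs on enterprise adoption"))
        print(content.title)
        print(content.introduction)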

    Try It Yourself

    If you’re eager to explore the Agents SDK yourself, the process is straightforward:

    1. Install the SDK via pip: pip install openai-agents
    2. Check out the official documentation
    3. Explore the GitHub repository for examples and contributions

    The documentation is comprehensive and includes numerous examples to help you understand the SDK’s capabilities and implementation patterns.

    Your Next Steps

    As we venture further into the age of agentic AI, tools like the Agents SDK will become increasingly valuable. Whether you’re looking to automate complex workflows, create specialized assistants, or explore the frontiers of AI capability, this toolkit provides an excellent foundation.

    I encourage you to dive in and experiment with the Agents SDK for your projects. If you’re working on something interesting or need guidance on implementation, don’t hesitate to reach out. I’m particularly interested in hearing about novel applications and creative uses of multi-agent systems.

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

  • The Ultimate Guide to Model Context Protocol, Part 3: Tying It All Together

    The Ultimate Guide to Model Context Protocol, Part 3: Tying It All Together

    The MCP Series

    This is the third and final part of the Model Context Protocol Series. Part 1 and Part 2 can be found here.

    In our previous posts we looked at what exactly MCP is and how it works behind the scenes. Now it’s time to tie it all together and build some useful stuff with MCP.

    Before we begin, I’m going to let you in on a little secret. I used Claude with MCP to help me write this series. Shocker!

    Claude has always been my favourite for creating content. You can read more about how all the different AI chatbots and tools fit in my stack. Before MCP, I’d have to do a bunch of research on my own, then feed Claude all that context before working with it to craft a post.

    That’s a lot of work so I decided to build my own AI agent to automate the research work too. I used Exa to find resources to include and sent that to Claude via the API. But the agent was a bit limited too, and I missed the chat experience in the Claude app, man and machine working together in harmony to write a blog post.

    Now, with MCP, I can do exactly that. This is the first of many MCP-powered AI workflows I’ll share with you today. Let’s dive in.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    1. From Chat to Blog Post

    I’m not a fan of telling the AI to write a blog post and then directly publishing it without any human oversight or input. That’s just AI slop.

    Plus, I write because I want my thoughts and opinions to be read and critiqued by others. I get no satisfaction and will not learn if an AI does everything for me.

    So I use AI (Claude specifically) as my writing partner. It’s not a linear process. I start with some thesis or topic I want to explore (like this MCP series) and use Claude to help me create an outline.

    Once I’m satisfied with the outline, I do some research to back up what I want to say. Exa has an official MCP server that I’ve installed, allowing Claude to search the web and pull interesting content.

    I use Exa for a number of reasons that I’ve listed in my Guide to Exa, the primary one being it tends to pull in much deeper and richer content, including research papers, which is great for writing technical content.

    I also use the Brave MCP and a custom Twitter MCP I built for myself. Brave gets me more recent news type results and Twitter pulls in trending conversations about said topic.

    After Claude has pulled these results, I read through them and may tweak the outline again based on the new information.

    At this point, I’m ready to write the actual article. I have a thesis or topic, a structure to present said topic, and real data or resources to support me.

    The process of writing is an even longer back and forth. I ask Claude to flesh out the first chapter, then I make edits, and we go back and forth until I’m satisfied.

    I’ll often ask Claude to poke holes in it or provide counter-arguments. I’ll also try to include the research or even ask Claude to pull in more specific research for this chapter.

    I repeat this for every chapter until the whole article is written. At this point, we have a fully written article, with a clean structure and headers, and links to supporting data or resources.

    Finally, with the WP MCP server (I built one for myself), I ask Claude to post it. It first asks for my permission, and then it pushes the content to my WordPress site (this blog you’re reading) in draft status. I then log in to WP, make any final tweaks and edits, and hit publish.

    And that, dear reader, is how I wrote these very words you’re reading right now!

    For a full video overview, watch this:

    2. From Chat to GitHub Repo

    Quite similar to the way you can chat with Claude to create WordPress blog posts, you can also work with your GitHub repositories using the GitHub MCP.

    Me: What changed in our authentication system between the last release and now?

    Claude: [Using Git MCP] I see 7 commits affecting authentication since v2.1.0. The most significant changes include:
    1. Migration from JWT to OAuth2 for API authentication
    2. Implementation of refresh token rotation
    3. Addition of 2FA for admin accounts

    The primary developer was @jsmith, and all changes passed CI. Would you like me to explain any specific change in detail?

    This isn’t just a demo—it’s how developers are working right now. As shown in “This Claude MCP GitHub AI AGENT Changes EVERYTHING!”, MCP is revolutionizing how developers interact with their repositories.

    3. End to End Sales Agent

    This is a pretty cool workflow built by Robert Lukoszko: an MCP server connected to his LinkedIn. For anyone who comments under a post, it extracts that person’s profile, researches them to find out more, and tracks down their email. He also has an MCP server connected to his email, so it can then send an email to everyone who commented on the post.

    Essentially, it’s a very powerful sales agent that automatically goes through the comments on any LinkedIn post and then reaches out to each commenter via email.

    https://twitter.com/Karmedge/status/1899825488379392333

    4. Customer Support Automation

    Karan Vaidya, from Composio, shared a couple of really cool customer support automations that he built out with Cursor and MCPs.

    The first one fetches customer support emails, analyzes the issue, solves the issue, and then responds to the email with the correct solution.

    If it can’t find the correct solution, it uses git blame to find the responsible dev, creates a ticket on Linear, and then notifies the developer on Slack about the issue.

    5. Other Creative Applications

    Beyond the expected use cases, I’ve been amazed by the creative ways people are applying MCP technology.

    Personal Knowledge Management

    David Teren’s Claude Server MCP transforms knowledge management with persistent context across conversation sessions:

    Me: What connections exist between my project notes on renewable energy and my reading notes on urban planning?
    
    Claude: [Using Claude Server MCP] I found 7 conceptual links between your renewable energy and urban planning notes. The strongest connections involve microgrids, community solar projects, and transit-oriented development. You explored similar funding mechanisms in both areas but haven't explicitly connected them in your notes. Would you like me to create a new note mapping these relationships?

    Advanced Document Research

    Alexander Komyagin built an Agentic RAG solution using MCP to fundamentally change how he interacts with documents (read his article here):

    Me: What do our financial projections say about Q3 revenue growth compared to analyst expectations?
    
    Claude: [Using Agentic RAG MCP] Let me examine your documents...
    
    After analyzing your Q3 projections and comparing with analyst reports from Bloomberg and Reuters, I've found a 5.7% discrepancy. Your internal projections show 12.3% growth while consensus estimates are 18%. The primary factors driving this difference appear to be more conservative supply chain recovery assumptions in your models.

    Instead of the traditional static retrieval systems, his MCP implementation lets Claude autonomously decide its search strategy and combine information from multiple document sources—just like a skilled researcher would.

    Automated 3D Designs

    Siddharth Ahuja connected the Blender MCP server to Claude to create 3D designs and scenes with just a few prompts.

    https://twitter.com/sidahuj/status/1899460492999184534

    Learning Helper

    Giacomo connected Claude to Rember, a flashcard tool, so that important concepts from his Claude chats are automatically turned into flashcards he can review later.

    What Will You Build?

    What makes MCP revolutionary isn’t just the technology itself, it’s the connections it enables. Right now, only Claude, Cursor and a handful of other AI companies have MCP clients that enable you to extend their functionality by adding MCP servers.

    But imagine what happens as more companies build out MCP clients. The ability to seamlessly integrate AI into your existing workflows, tools, and data sources creates a multiplicative effect that transforms productivity.

    As you explore MCP for yourself, start with a simple question: “What tasks take up most of my time but don’t require my unique human judgment?” Those are perfect candidates for automation.

    The real power comes when you combine multiple MCP servers. Need to analyze data, create visualizations, and publish findings to your blog? Chain together database, visualization, and WordPress MCPs for a workflow that would have been science fiction just months ago.

    And if you need help building these out, drop your email below and I’ll reach out.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • The Ultimate Guide to Model Context Protocol, Part 2: Behind The Scenes of MCP

    The Ultimate Guide to Model Context Protocol, Part 2: Behind The Scenes of MCP

    The MCP Series

    This post is part 2 of my “Ultimate Guide to Model Context Protocol” series. You can read part 1 here and part 3 here.

    In our previous post, we introduced the Model Context Protocol (MCP) and how it transforms our dear Claude from a knowledgeable yet impotent AI into a helpful digital butler who can actually interact with your files, apps, and services. Now it’s time to draw back the curtain and have a gander at the magic behind it.

    Don’t worry—we’ll keep things simple and jargon-free, dishing out plenty of analogies to explain the technical concepts like my Uncle dishes out expletives when India cocks up a cricket match. By the end of this post, you’ll understand what makes MCP tick and how you can start exploring different MCP servers for your specific needs.

    How MCP Works

    Remember our analogy of MCP as a universal translator between AI and your digital world? Let’s expand on that to understand what’s happening behind the scenes.

    The MCP Architecture Explained

    At its core, MCP follows what tech folks call a “client-server architecture.” This is exactly how computers work with the internet. The browser on your computer is the “client”. It retrieves and displays information from a “server” over the internet via a protocol called HTTP.

    The Model Context Protocol is similar. Let’s say you’re enjoying a cold Sunday evening by the fire in the study of your manor, as one does. It’s a high-tech study with a built-in AI assistant. You ask the assistant to have some hot toddy sent over:

    1. The Host (where you chat with AI)

    • This is an application like Claude Desktop or Cursor where you interact with an AI assistant.
    • In our analogy, this is the study of your manor.

    2. The MCP Client (the translator)

    • This is built into the host application, so the engineers behind Claude and Cursor need to build it first for the whole thing to work.
    • It translates between what the AI understands and what MCP servers speak.
    • You never interact with this directly; it works behind the scenes, like HTTP.
    • In our analogy, it’s an upgrade module for your study that allows your AI assistant to communicate with other parts of your manor, such as the bar.

    3. MCP Servers (specialized helpers)

    • Each server is like a specialist with access to specific resources.
    • One server might know how to work with files, another with Slack, and so on.
    • Servers can be on your computer or connect to online services.
    • In our analogy, the bartender who makes the hot toddy and brings it over to you is the server.

    4. Tools (actions your AI takes via servers)

    • These are the functions available to the AI on the server.
    • A document server may have a read_file action that the AI can invoke to read a specific file.
    • In our analogy, the tool is the ability to prepare libations.

    5. Resources (your digital stuff)

    • The actual files, apps, and services the AI needs to access
    • Could be local (on your computer) or remote (on the internet)
    • In our analogy, these are the ingredients that go into making the hot toddy. I prefer a spot of Cognac myself.

    If you enjoyed this analogy, I have more for you. Be a dear and sign up to my newsletter for more.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    A Day in the Life of an MCP Request

    Ok enough with the analogies. To really understand how this works, let’s follow what happens when you ask your AI assistant to summarize a document and send it to Slack:

    1. You make a request to Claude: “Be a good sport and summarize the quarterly_report.pdf on my desktop. Oh and while you’re at it, post the key points to the #team-updates Slack channel”
    2. Claude recognizes this requires access to both files and Slack, so it needs to use MCP
    3. The MCP Client activates and connects to two different MCP servers:
      • The File System MCP Server (to access the PDF)
      • The Slack MCP Server (to post the message)
    4. Permissions are checked:
      • The File System server asks: “Allow Claude to read quarterly_report.pdf?”
      • The Slack server asks: “Allow Claude to post to #team-updates?”
      • You approve both requests
    5. The File System server retrieves the PDF content and sends it back through MCP
    6. Claude processes the document and creates a summary
    7. The Slack server takes Claude’s summary and posts it to your team channel
    8. You receive confirmation that the task is complete

    All of this happens in seconds, with the complex technical work hidden from view. The beauty of MCP is that it handles all the complicated connections while maintaining security and giving you control.

    The Technology That Powers MCP

    Now that you understand the basic flow, let’s demystify some of the technology that makes MCP possible:

    The Protocol Itself

    The Model Context Protocol is what tech people call an “open standard.” This means:

    • It’s publicly documented so anyone can build with it
    • It follows consistent rules for communication
    • It’s designed to be secure from the ground up

    Think of it like the rules of the road—all vehicles (or in this case, different software) follow the same rules, allowing smooth traffic flow.

    Security Measures

    MCP takes security seriously with several built-in protections:

    Permission-Based Access

    • Nothing happens without your explicit approval
    • Permissions are fine-grained (specific to each action)

    Sandboxing

    • Each MCP server is isolated from others
    • If one server has a problem, it doesn’t affect the rest

    Audit Trails

    • All actions are logged so you can see what happened
    • Useful for troubleshooting or monitoring usage

    Real-Time Communication

    MCP uses modern, efficient methods for passing information back and forth:

    • It’s designed for low latency (minimal delays)
    • It handles both simple requests and large data transfers
    • It manages two-way communication seamlessly

    This means you don’t have to wait long for results, even when dealing with complex tasks involving multiple systems.

    MCP Servers: The Building Blocks of AI Integration

    MCP servers are the workhorses of the system. Each one is specialized for a specific purpose, and you can mix and match them based on your needs.

    Types of MCP Servers

    MCP servers generally fall into a few categories:

    1. Local Resource Servers

    • Access things on your computer
    • Examples: File System, Local Database, Browser Control

    2. Communication Servers

    • Connect to messaging and social platforms
    • Examples: Slack, Email, Bluesky

    3. Productivity Servers

    • Integrate with work tools
    • Examples: GitHub, Google Drive, Calendar

    4. Information Servers

    • Fetch and process data
    • Examples: Weather, Search, Wikipedia

    5. Specialized Servers

    • Handle niche needs
    • Examples: 3D Printer Control, Smart Home

    Where to Find MCP Servers

    In the previous post, I mentioned a few of the top MCP servers. If you’re looking for more, there are several places to discover and download MCP servers:

    1. Official MCP Servers Repository
    2. MCP.so Directory
    3. Glama.ai
    4. Composio
    5. Cursor Directory
    6. Awesome MCP Servers
    7. Fleur MCP App Store
    8. MCP Run
    9. Smithery

    Setting Up Your Own MCP Server

    While most people will simply use existing MCP servers, you might be curious about how they’re created. Or perhaps you can’t find one and want to build your own. Here’s a simplified explanation:

    What You Need to Create an MCP Server

    If you’re not a developer, you probably won’t be creating your own MCP servers. But understanding what goes into them can help you appreciate what they do:

    1. Programming Skills

    • Knowledge of languages like Python and JavaScript
    • Understanding of APIs and web services

    2. Development Tools

    • MCP SDK (Software Development Kit)
    • Required libraries and dependencies

    3. Access to Resources

    • API keys for external services
    • Documentation for the systems you’re connecting to

    For the Technically Curious: A Simple Example

    Here’s what a very basic MCP server might look like in concept (this is simplified pseudocode):

    JavaScript
    // Define what the server can do
    server.addCapability("read-weather-forecast", {
      description: "Gets the weather forecast for a location",
      parameters: {
        location: "The city or area to get the forecast for",
        days: "Number of days to forecast"
      },
      securityLevel: "requires-approval"
    });
    
    // Implement the actual functionality
    server.onRequest("read-weather-forecast", async (request) => {
      // Get the forecast from a weather service
      const forecast = await weatherAPI.getForecast(
        request.parameters.location, 
        request.parameters.days
      );
      
      // Return the results
      return {
        current: forecast.current,
        daily: forecast.daily,
        warnings: forecast.alerts
      };
    });
    
    // Start listening for connections
    server.start();
    

    This simplified example shows how an MCP server:

    1. Defines what capabilities it offers
    2. Specifies what parameters are needed
    3. Sets security requirements
    4. Implements the actual functionality
    5. Returns results in a structured format

    In reality, MCP servers are more complex, with proper error handling, security features, and optimization—but this gives you a sense of their basic structure.
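
    If you’d rather see runnable code than pseudocode, the official Python SDK keeps small servers compact. Here’s a minimal sketch assuming the FastMCP helper from the mcp Python package; the weather data is stubbed for illustration:

    Python
    from mcp.server.fastmcp import FastMCP
    
    mcp = FastMCP("weather")
    
    @mcp.tool()
    def read_weather_forecast(location: str, days: int = 3) -> dict:
        """Get the weather forecast for a location (stubbed data for illustration)."""
        return {
            "location": location,
            "daily": [{"day": i + 1, "summary": "Sunny with a chance of hot toddy"} for i in range(days)],
        }
    
    if __name__ == "__main__":
        mcp.run()  # listen for connections from an MCP client like Claude Desktop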

    Connecting Multiple MCP Servers: The Power of Combination

    One of the most powerful aspects of MCP is the ability to use multiple servers together. This creates workflows that would otherwise require complex programming.

    Example: A Research Assistant Workflow

    Imagine you’re researching a topic and want AI help. With multiple MCP servers, you could:

    1. Use the File System server to scan your existing notes
    2. Use the Browser Control server to search for new information
    3. Use the Wikipedia server to verify facts and get background
    4. Use the Google Drive server to save your findings
    5. Use the Slack server to share insights with colleagues

    All of this could be accomplished with a single request to your AI assistant, with each server handling its specialized part of the task.

    Common Questions About MCP Servers

    “Are MCP servers safe to install?”

    MCP servers from reputable sources follow strict security protocols. Stick to official directories and well-reviewed options. Each server will ask for specific permissions, so you always maintain control over what they can access.

    “How many servers should I install?”

    Start with just the ones you need for your common tasks. You can always add more later. Most users begin with the File System server and add others as needed.

    “Will MCP servers slow down my computer?”

    Most MCP servers use minimal resources when idle and are designed to be efficient. If you’re not actively using them with your AI assistant, they have very little impact on performance. I’ve noticed, however, that it does slow down my Claude Desktop app if I add too many.

    “Can I use MCP servers with any AI assistant?”

    Currently, MCP works with compatible hosts like Claude Desktop and Cursor. As the protocol gains popularity, more AI applications are likely to support it.

    What’s Next on Your MCP Journey

    Now that you understand how MCP works behind the scenes and what servers are available, you’re ready to start building your personalized AI workspace.

    In my next post in the series, I’ll provide a hands-on guide to building out useful agentic workflows with Claude and MCP servers. I’ll walk through the setup process with screenshots and troubleshooting tips to ensure a smooth experience.

    Sign up below and stay tuned for it!

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.