I was at Web Summit Vancouver last week, a tech conference where the only topic of every conversation was, surprise surprise, AI! As someone who has been in the space for years, well before the ChatGPT boom, I was excited to talk to my fellow nerds about the latest tools and tech.
And I was shocked to find that many attendees, including product managers and developers, hadn’t even heard of the AI tools I used most, like Claude and Cursor.
I’ve already written guides on Claude and Claude Code, so I figured I’d do one for Cursor. This guide is for you if you’re:
A complete coding beginner who’s heard the vibe coding hype and wants to skip the “learn syntax for six months” phase
A seasoned developer curious about AI coding tools but tired of switching between ChatGPT tabs and your IDE
Someone who tried Cursor once, got confused by all the modes and features, and gave up
By the end, you’ll know exactly how to use Cursor’s three main modes, avoid the common pitfalls that trip up beginners, and build real projects.
Installation and First Contact
Time for the least exciting part of this guide: getting Cursor on your machine. Head to cursor.com and download the application (revolutionary, I know). The installation is standard “next, next, finish” territory, so I won’t insult your intelligence with screenshots.
If you’re familiar with other IDEs, like VS Code, then Cursor won’t look too different. In fact, it’s literally a fork of VS Code. Your muscle memory, keyboard shortcuts, and extensions all work exactly the same. You can install Cursor and use it as a drop-in VS Code replacement without touching a single AI feature.
But why would you want to do that when you could have a coding superpower instead?
Open one of your existing projects in Cursor and hit Cmd+L (Mac) or Ctrl+L (Windows/Linux). That’s your AI sidebar. Type something like “explain what this file does” and watch as Cursor not only explains your code but suggests improvements you hadn’t even thought of.
This is your first taste of what makes Cursor different. It’s not pulling generic answers from the internet, or generating something irrelevant. It’s analyzing your actual project and giving you contextual, relevant help. Let’s explore the different ways it can do this.
If you don’t have an existing project, ask Cursor to create one! Just type in “Generate a simple HTML file about pizza toppings” or whatever strikes your fancy, and watch the magic.
The Three Modes of Cursor
Cursor has three main ways to interact with AI, and knowing when to use each one is like knowing when to use a scalpel versus a sledgehammer. Both are tools, but context matters.
Ask Mode: Your Coding Sherpa
Think of Ask mode as your personal Stack Overflow that actually knows your project. Hit Cmd+L (or Ctrl+L) to open the sidebar, make sure “Ask” is selected in the dropdown, and start asking questions.
I often use this if I’m returning to a project I haven’t looked at in a couple of days, or if I’m trying to understand why Cursor generated code in a certain way. It’s also a great way to learn how to code if you’re not a professional.
Your questions can range from the specific, like what a single function does, all the way to how an entire codebase fits together. I encourage you to also ask it to explain itself and the architectural decisions it makes.
Examples:
“What does this function do and why might it be slow?”
“What are other ways to implement this functionality”
“How would you approach adding authentication to this app?”
“What are the potential security issues in this code?”
Ask mode is read-only so it won’t change your code. It’s purely for exploration, explanation, and planning. Treat it like Google, but Google that knows your specific codebase inside and out.
Pro Tip: Ask follow-up questions to deepen your understanding, request alternative approaches to problems, and use it to understand AI-generated code before implementing it.
Agent Mode: The Code Wizard
This is where the magic happens. Agent mode (formerly called “Composer”) can actually make changes to your code, create new files, and work across your entire project.
You tell it to do something, and it just does it, from adding new text to a page, all the way to creating an entire new feature with multiple pages, functions, and components.
It can even run commands in the terminal, like installing a new package or committing changes to Git.
Examples:
“Build a login form with validation”
“Create a new branch for the onboarding feature”
“Create a REST API for managing user profiles”
“Refactor this component to use TypeScript”
Agent mode takes your entire codebase into context to understand the relationships between different parts and create or modify multiple files. If you ask it to make wholesale changes, it will literally go off and generate tons of code across multiple files.
Pro Tip: Start with clear, specific requirements and review changes before accepting them. Use version control like Git at every step.
Edit Mode: The Precision Tool
Edit mode is for making smaller, more precise edits. To use this, you need to select some code in the editor and you’ll get a little menu with options to add to chat or edit.
Selecting edit opens up edit mode where you can ask the AI to make changes to that piece of code. You might want to use this when making small tweaks to existing code, refactoring a single function, or a quick bug fix.
YOLO Mode
There’s a secret fourth mode in Cursor called YOLO mode. OK, it used to be called YOLO mode, but it has since been renamed to the less scary “auto-run mode”.
This mode lets the AI run terminal commands automatically. You may have noticed in your tests so far, especially in Agent mode, that it pauses and asks if it can install a package or spin up a dev server.
If you select auto-run mode, it executes these commands without asking for your approval first. This is obviously risky, so I suggest you limit it to certain safe commands, like running tests. That way, when you ask Agent to build a new feature and test it, it does so automatically without your active involvement.
Choosing Your Mode
“I want to understand something” → Ask mode
“I want to build/change something” → Agent mode
“I want a tiny, precise change” → Edit mode (or just use Agent)
Here’s a practical exercise to try all three:
Ask mode practice: Open your HTML file and ask “What would make this webpage more accessible?”
Agent mode practice: Tell Agent “Add a CSS file that makes this webpage look modern with a color scheme and better typography”
Edit mode practice: Select the page title and ask Edit to “Change this to something more creative”
Context is king
Cursor is only as good as the context you give it. The AI can only work with what it can see, so learning to manage context effectively is the difference between getting decent results and getting mind-blowing ones.
When you open the AI sidebar, look at the bottom and you’ll see an option to “@add context”. This is where you add files, folders, or specific functions to the conversation.
The @ symbol: Click the @ symbol or type it in to chat to see what files Cursor suggests. This tells the AI “pay attention to this specific file.”
You can reference specific files, folders, or even certain functions
@docs can pull in documentation if available
@components/ includes your entire components folder
@package.json includes just that file
The # symbol: Use this to focus on specific files.
The / symbol: Before starting a complex task, open the files you think are relevant to that task, then use the “/” command in Agent mode to “Add Open Files to Context.” This automatically adds them all to context.
The .cursorignore File
Create a .cursorignore file in your project root to exclude directories the AI doesn’t need to see:
node_modules/
dist/
.env
*.log
build/
This keeps the AI focused on your actual code instead of getting distracted by dependencies and build artifacts.
Context Management Strategy
Think of context like a conversation. If you were explaining a coding problem to a colleague, you’d show them the relevant files, not your entire codebase. Same principle applies here.
Good context: Relevant files, error messages, specific functions you’re working on
Bad context: Your entire project, unrelated files, yesterday’s lunch order
Similarly, when you have long conversations, the context (which is now your entire conversation history) gets too long and the AI tends to lose track of your requirements and previous decisions. You’ll notice this when the AI suggests patterns inconsistent with your existing code or forgets constraints you mentioned earlier.
To avoid this, make it a habit to start new conversations for different features or fixes. This is especially important if you’re moving on to a new task where the context changes.
Beyond giving it the right context, you can also be explicit about what not to touch: “Don’t modify the existing API calls”. This is a form of negative context, telling the AI to work in a certain space but avoid that one spot.
Documentation as context
One of the most powerful but underutilized techniques for improving Cursor’s effectiveness is creating a /docs folder in your project root and populating it with comprehensive markdown documentation.
I store markdown documents of the project plan, feature requirements, database schema, and so on. That way, Cursor can understand not just what my code does, but why it exists and where it’s heading. It can then suggest implementations that align with my broader vision, catch inconsistencies with my planned architecture, and make decisions that fit my project’s specific constraints and goals.
This approach transforms your documentation from static reference material into active guidance that keeps your entire development process aligned with your original vision.
Cursor Rules
Imagine having to explain your coding preferences to a new team member every single time you work together. Cursor Rules solve this problem by letting you establish guidelines that the AI follows automatically, without you having to repeat yourself in every conversation.
Think of rules as a mini-prompt that runs behind the scenes every time you interact with the AI. Instead of saying “use TypeScript” and “add error handling” in every prompt, you can set these as rules once and the AI will remember them forever.
Global Rules vs. Project Rules
User Rules: Apply to every project you work on. Think of these as your personal preferences you bring to any codebase.
Project Rules: Specific to each codebase. These are the rules your team agrees on and ensure consistency across all contributors.
Examples That Work in Practice
For TypeScript projects:
- Always use TypeScript strict mode
- Prefer function declarations over arrow functions for top-level functions
- Use meaningful variable names, no single letters except for loops
- Add JSDoc comments for complex functions
- Handle errors explicitly, don't ignore them
For Python projects:
- Use type hints for all function parameters and return values
- Follow PEP 8 style guidelines and prefer f-strings for formatting
- Handle errors with specific exception types, avoid bare except clauses
- Write pytest tests for all business logic with descriptive test names
- Use Pydantic for data validation and structured models
- Include docstrings for public functions using Google style format
- Prefer pathlib over os.path and use context managers for resources
For any project:
- Write tests for all business logic
- Use descriptive commit messages
- Add comments for complex algorithms
- Handle edge cases and error states
- Performance matters: avoid unnecessary re-renders and API calls
Use Cursor itself to write your rules. Seriously. Ask it to “Generate a Project Rules file for a TypeScript project that emphasizes clean code, accessibility, and performance.”
The AI knows how to write content that other AIs understand.
Pro Tip: Create different .cursorrules files for different types of projects. Keep a frontend-rules.md, backend-rules.md, and fullstack-rules.md that you can quickly copy into projects.
Communicating With Cursor
Here’s the thing about AI: it’s incredibly smart and surprisingly literal. The difference between getting decent results and getting “how did you do that?!” results often comes down to how you communicate.
Be Specific
As with any AI, the more specific you are, the better the output. Don’t just say, “fix the styling.” Say “Add responsive breakpoints for mobile (320px), tablet (768px), and desktop (1024px+) with proper spacing and typography scaling”.
You don’t need to know the technical details to be specific about the outcome you want. Saying “Optimize this React component by memoizing expensive calculations and reducing re-renders when props haven’t changed” works better than just “Optimize this component” even though you’re not giving it detailed instructions.
Take an Iterative Approach
Start broad, then narrow down:
“Build a todo app with React”
“Add user authentication to this todo app”
“Make the todo items draggable for reordering”
“Add due dates and priority levels”
Each step builds on the previous work. The AI maintains context and creates consistent patterns across features.
Use Screenshots
Take screenshots of:
UIs you want to replicate
Error messages you’re getting
Design mockups from Figma
Code that’s confusing you
Paste them directly into the chat. The AI can read and understand visual information surprisingly well.
Treat it like a coworker
Explain your problem like you’re talking to a colleague:
“I have this React component that’s supposed to update when props change, but it’s not re-rendering. The props are coming from a parent component that fetches data from an API. I think it might be a dependency issue, but I’m not sure.”
This gives the AI context about what you’re trying to do, what’s happening instead, and your initial hypothesis.
The Context Sandwich
Structure complex requests like this:
Context: “I’m building a shopping cart component”
Current state: “It currently shows items and quantities”
Desired outcome: “I want to add coupon code functionality”
Constraints: “It should validate codes against an API and show error messages”
This format gives the AI everything it needs to provide accurate, relevant solutions.
Common Prompting Mistakes
Making Assumptions: Don’t assume the AI knows what “correct” means in your context. Spell it out by describing expected outcomes. “This function should calculate tax but it’s returning undefined. Here’s the expected behavior…”
Trying to do everything at once: When you tell the AI to “Build a complete e-commerce site with authentication, payment processing, inventory management, and admin dashboard” it is definitely going to go off the rails at some point.
Start small and build incrementally. The AI works better with focused requests.
Describing solutions: Describe the problem, not the solution. The AI might suggest better approaches than you initially considered. Instead of “Use Redux to manage this state”, say “I need to share user data between multiple components”
Overloading context: Adding every file in your project to context doesn’t help, it hurts. The AI gets overwhelmed and loses focus. Be selective about what’s actually relevant.
Debugging Your Prompts
Good prompting is a bit of an art. A small change in a prompt can lead to massive changes in the output, so Cursor may often go off-script.
And that’s totally fine. If you catch it doing that, just hit the Stop button and say “Wait, you’re going in the wrong direction. Let me clarify…”
Sometimes it’s better to start a new conversation with a refined prompt than to keep correcting course. When you do, add constraints like “keep the current component structure” to stop it from going down the same path.
Good prompting is iterative:
Initial prompt: Get something working
Refinement: “This is close, but change X to Y”
Polish: “Add error handling and improve the user experience”
Test: “Write tests for this functionality”
The Psychology of AI Collaboration
The AI is incredibly capable but not infallible. There’s a narrow sweet spot between treating it like a tool you constrain too tightly and treating it like a coworker you let run completely free. That’s where you want to play.
Always review the code it generates, especially for:
Security-sensitive operations
Performance-critical sections
Business logic validation
Error handling
Don’t just copy-paste the code. Read the AI’s explanations, understand the patterns it uses, and notice the techniques it applies. You’ll gradually internalize better coding practices.
If the AI suggests something that doesn’t feel right, question it. Ask “Why did you choose this approach over alternatives?” or “What are the trade-offs here?”
The AI can explain its reasoning and might reveal considerations you hadn’t thought of. Or it could be flawed because it doesn’t have all the necessary context, and you may be able to correct it.
Putting it all together
Here’s a complete example of effective AI communication:
Context: “I’m building a React app that displays real-time stock prices”
Current state: “I have a component that fetches data every 5 seconds, but it’s causing performance issues”
Specific request: “Optimize this for better performance. I want to update only when prices actually change, handle connection errors gracefully, and allow users to pause/resume updates”
Constraints: “Don’t change the existing API structure, and make sure it works on mobile devices”
This prompt gives the AI everything it needs: context, current state, desired outcome, and constraints. The response will be focused, relevant, and actionable.
Common Pitfalls
Every Cursor user goes through the same learning curve. You start optimistic, hit some walls, wonder if AI coding is overhyped, then suddenly everything clicks. Let’s skip the frustrating middle part by learning from everyone else’s mistakes.
The “Build Everything at Once” Trap
The mistake: Asking for a complete e-commerce platform with authentication, payment processing, inventory management, admin dashboard, and mobile app in a single prompt.
Why it fails: Even the smartest AI gets overwhelmed by massive requests. You’ll get generic, incomplete code that barely works and is impossible to debug.
The fix: Start with the smallest possible version. Build a product catalog first, then add search, then user accounts, then payment processing. Each step builds on solid foundations.
Good progression:
“Create a simple product listing page”
“Add search functionality to filter products”
“Create a shopping cart that stores items”
“Add user registration and login”
“Integrate payment processing”
The Context Chaos Problem
The mistake: Adding every file in your project to the AI’s context because “more information is better.”
Why it fails: Information overload makes the AI lose focus. It’s like trying to have a conversation in a crowded restaurant: too much noise drowns out the important signals.
The fix: Be surgical with context. Only include files that are directly relevant to your current task.
Bad context: Your entire components folder, all utilities, config files, and documentation
Good context: The specific component you’re modifying and its immediate dependencies
The “AI Will Figure It Out” Assumption
The mistake: Giving vague instructions and expecting the AI to read your mind about requirements, constraints, and preferences.
Why it fails: The AI is smart, not psychic. “Make this better” could mean anything from performance optimization to visual redesign to code refactoring.
The fix: Be specific about what “better” means in your context.
Vague: “Fix this component”
Specific: “This React component re-renders too often when props change. Optimize it using React.memo and useMemo to prevent unnecessary renders.”
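If you’re not sure what that specific prompt is actually asking for, the core idea behind React.memo (skip re-rendering when props haven’t changed) can be sketched in plain JavaScript. This is an illustrative sketch only, with hypothetical names; React’s real implementation is more involved.

```javascript
// Conceptual sketch of what React.memo does: reuse the previous output
// when the new props are shallow-equal to the previous props.
function memoizeByProps(render) {
  let lastProps = null;
  let lastResult = null;

  return function (props) {
    const same =
      lastProps !== null &&
      Object.keys(props).length === Object.keys(lastProps).length &&
      Object.keys(props).every((k) => props[k] === lastProps[k]);

    if (same) return lastResult; // props unchanged: skip the "render"

    lastProps = props;
    lastResult = render(props);
    return lastResult;
  };
}

let renderCount = 0;
const renderUser = memoizeByProps(({ name }) => {
  renderCount++;
  return `<p>${name}</p>`;
});

renderUser({ name: "Ada" });
renderUser({ name: "Ada" }); // shallow-equal props: render skipped
console.log(renderCount); // 1
```

Understanding the pattern this well is exactly what lets you write the “specific” version of the prompt and judge whether the AI’s optimization actually does what you asked.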
The Copy-Paste Syndrome
The mistake: Blindly copying AI-generated code without understanding what it does.
Why it fails: When (not if) something breaks, you’ll have no idea how to fix it. Plus, you miss learning opportunities that make you a better developer.
The fix: Always ask for explanations. “Explain what this code does and why you chose this approach.”
What to do when shit inevitably hits the fan
You may avoid all the pitfalls above and still see the AI go off track. It starts modifying files you didn’t want changed, adds unnecessary complexity, or ignores your constraints.
The first thing you should do is hit the stop button. You can then let it know it’s going in the wrong direction. Even better, start a new conversation with clearer instructions and additional constraints.
Another common pattern is when the AI makes a change, sees an error, tries to fix it, creates a new error, and gets stuck in a cycle of “fixes” that make things worse.
If you see the same type of error being “fixed” multiple times, stop the process and revert to the last working state.
Here are some other warning signs that things are going off track:
It keeps apologizing and starting over
Solutions get more complex instead of simpler
It suggests completely different approaches in each attempt
Error messages persist despite multiple “fixes”
Then use one of the following debugging methods.
The Logging Strategy
When things aren’t working and you can’t figure out why:
Ask the AI to add detailed logging
Run the code and collect the output
Paste the logs back to the AI
Let it analyze what’s actually happening vs. what should happen
Example prompt: “Add console.log statements to track the data flow through this function. I’ll run it and share the output so we can debug together.”
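As a concrete illustration (the function and data here are hypothetical), the instrumented code the AI produces in step 1 might look something like this:

```javascript
// Hypothetical discount calculator instrumented with logging so the
// output can be pasted back into the chat for joint debugging.
function applyDiscount(price, coupon) {
  console.log("applyDiscount input:", { price, coupon });

  // Guard against missing or malformed coupons before applying them.
  if (!coupon || typeof coupon.percentOff !== "number") {
    console.log("applyDiscount: no valid coupon, returning original price");
    return price;
  }

  const discounted = price * (1 - coupon.percentOff / 100);
  console.log("applyDiscount output:", discounted);
  return discounted;
}

console.log(applyDiscount(100, { percentOff: 20 })); // 80
console.log(applyDiscount(100, null));               // 100
```

Run it, copy the console output back into the chat, and the AI can compare what actually happened against what should have happened.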
The Rollback and Retry Method
When the AI made changes that broke more than they fixed:
Use Cursor’s built-in history to revert changes
Identify what went wrong in your original prompt
Start a new conversation with better context
Be more specific about constraints and requirements
The “Explain Your Thinking” Technique
When the AI gives you code that seems wrong or overly complex:
“Explain why you chose this approach. What are the trade-offs compared to [simpler alternative]?”
Often the AI has good reasons you didn’t consider. Sometimes it reveals that there’s indeed a simpler way.
The Test-Driven AI Approach
TDD (Test-Driven Development) is a standard, well-established practice in software development. With the rise of vibe coding, though, it seems like people have forgotten about it.
But, as the saying goes, prevention is better than cure. Following tried and tested practices like TDD will save you a ton of headache and rework.
In fact, with AI, it becomes a superpower. AI can write tests faster than you can think of edge cases, and those tests become a quality guarantee for the generated code.
This single prompt pattern will revolutionize how you build features:
“Write comprehensive tests for [feature] first, then implement the code, then run the tests and iterate until all tests pass.”
Here’s an example prompt for building a new React component:
"Write tests that verify this component:
1. Renders correctly with different props
2. Handles user interactions properly
3. Manages state changes
4. Calls callbacks at the right times
5. Handles error states gracefully
Then implement the component to pass all tests."
Watch this workflow in action:
AI writes tests based on your requirements
AI implements code to satisfy the tests
Tests run automatically (with YOLO mode enabled)
AI sees failures and fixes them iteratively
You get working, tested code without writing a single test yourself
Advanced Tips and Tricks
The Bug Finder
Hit Cmd+Shift+P (or Ctrl+Shift+P) and type “bug finder.” This feature compares your changes to the main branch and identifies potential issues you might have introduced.
It’s not perfect, but it catches things like:
Forgot to handle null values
Missing error handling
Inconsistent variable usage
Logic errors in conditional statements
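The first item on that list is by far the most common. Here’s a hypothetical example of the kind of null-handling slip such a tool tends to flag, alongside the guarded version:

```javascript
// Hypothetical example: the original code assumes `user` is never null.
function greetingUnsafe(user) {
  return `Hello, ${user.name}!`; // throws if user is null/undefined
}

// The guarded version handles the missing-user case explicitly.
function greeting(user) {
  if (!user || !user.name) {
    return "Hello, guest!";
  }
  return `Hello, ${user.name}!`;
}

console.log(greeting({ name: "Ada" })); // "Hello, Ada!"
console.log(greeting(null));            // "Hello, guest!"
```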
Image Imports
This one sounds fake until you try it. You can literally paste screenshots into Cursor’s chat and it will understand them. Take a screenshot of:
A UI mockup you want to build
An error message you’re getting
A design you want to replicate
Paste it in the chat with your prompt and watch the AI work with visual information. It’s genuinely impressive.
Tab Tab Tab
Cursor’s tab completion doesn’t just complete your current line, it can suggest entire functions, predict what you’re about to write next, and even jump you to related code that needs updating.
The AI analyzes your recent changes and predicts your next move. When it’s right (which is surprisingly often), it feels like magic.
AI Models and Selection Strategy in Cursor
Cursor offers access to the latest generation of AI models, each with distinct strengths and cost profiles that suit different development scenarios.
Claude Sonnet 4 is my current go-to choice for most development tasks. It significantly improves on Sonnet 3.7’s capabilities, achieving a state-of-the-art 72.7% on SWE-bench. Use this for routine development tasks like building React components, writing API endpoints, or implementing standard features.
Claude Opus 4 represents the premium tier for the most challenging problems. It is expensive but pays for itself in time saved when you’re tackling architectural decisions, complex refactoring across multiple files, or debugging particularly stubborn issues.
OpenAI’s o3 is a good premium alternative and particularly strong in coding benchmarks, with the high-effort version achieving 49.3% on SWE-bench and excelling in competitive programming scenarios.
GPT-4o remains a solid and cheaper alternative, especially for multilingual projects or when you need consistent performance across diverse tasks. While it tends to feel more generic compared to Claude’s natural style, it offers reliability and broad capability coverage.
Gemini 2.5 Pro is also one of my favorites as it combines reasoning with coding, leading to much better performance. It is also among the cheapest and fastest of these models, though I use it primarily for planning out an app.
In most cases, you’ll probably just be using one model for the bulk of your work, like Sonnet 4 or GPT-4o, and you can upgrade to a more expensive model like o3 or Opus 4 for complex tasks.
MCP and Integrations
MCP (Model Context Protocol) connects Cursor to external tools and data sources, turning it into a universal development assistant. Need to debug an issue? Your AI can read browser console logs, take screenshots, and run tests automatically. Want to manage your project? It can create GitHub issues, update Slack channels, and query your database, all through natural conversation.
What MCP is and how it works is out of scope of this already long article, so read my guide here. In this section I’ll explain how to set it up and which servers to use.
Setting Up MCP in Cursor
Getting started with MCP in Cursor involves creating configuration files that tell Cursor which MCP servers to connect to and how to authenticate with them.
For project-specific tools, create a .cursor/mcp.json file in your project directory. This makes MCP servers available only within that specific project (perfect for database connections or project-specific APIs). For tools you want across all projects, add them in your settings.
The configuration uses a simple JSON format. Here’s how to set up the GitHub MCP server:
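A minimal sketch of that configuration is below. The package name and environment variable follow the commonly published reference MCP server for GitHub at the time of writing; check the server’s own documentation for the current invocation and token scopes before using it.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token-here"
      }
    }
  }
}
```

Save this as .cursor/mcp.json for a single project, or add the same entry in your Cursor settings to make it available everywhere.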
The MCP ecosystem has exploded with hundreds of available servers, but several have emerged as must-haves for serious development work.
GitHub MCP Server – create issues, manage pull requests, search repositories, and analyze code changes directly within your coding conversation. When debugging, you can ask “what changed in the authentication module recently?” and get immediate insights without leaving your editor.
Slack MCP Server – read channel discussions, post updates about builds or deployments, and even summarize daily standups. This becomes particularly powerful for debugging when team members report issues in Slack. Your AI can read the problem descriptions and immediately start investigating.
PostgreSQL MCP Server gives your AI the ability to inspect schemas and execute read-only queries. You can ask “show me all users who logged in yesterday” or “analyze the performance of this query” and get immediate, accurate results.
Puppeteer MCP Server gives your AI browser automation superpowers. When building web applications, your AI can take screenshots, fill forms, test user flows, and capture console errors automatically. This creates a debugging workflow where you describe a problem and watch your AI reproduce, diagnose, and fix it in real-time.
File System MCP Server seems basic but proves incredibly useful for project management. Your AI can organize files, search across codebases, and manage project structures intelligently. Combined with other servers, it enables workflows like “analyze our React components for unused props and move them to an archive folder.”
Advanced MCP Workflows in Practice
The real power of MCP emerges when multiple servers work together to create sophisticated development workflows. Consider this scenario: you’re building a web application and users report a bug through Slack. Here’s how an MCP-enhanced Cursor session might handle it:
First, the Slack MCP reads the bug report and extracts key details. Then, the GitHub MCP searches for related issues or recent changes that might be relevant. The File System MCP locates the relevant code files, while the PostgreSQL MCP checks if there are database-related aspects to investigate.
Your AI can then use the Puppeteer MCP to reproduce the bug in a browser, capture screenshots showing the problem, examine console errors, and test potential fixes. Finally, it can create a detailed GitHub issue with reproduction steps, propose code changes, and post a summary back to Slack, all through natural conversation with you.
This level of integration transforms debugging from a manual, time-consuming process into an assisted workflow where your AI handles the tedious investigation while you focus on architectural decisions and creative problem-solving.
Custom MCP Server Creation
While the existing ecosystem covers many common needs, building custom MCP servers for company-specific tools often provides the highest value. The process is straightforward enough that a developer can create a basic server in under an hour.
Custom servers excel for internal APIs, proprietary databases, and specialized workflows. For example, a deployment pipeline MCP server could let your AI check build status, trigger deployments, and analyze performance metrics. A customer support MCP server might connect to your ticketing system, allowing AI to help triage issues or generate response templates.
A Real-World Workflow
Building real applications with Cursor requires a different mindset than traditional development. Instead of diving straight into code, you start by having conversations with your AI assistant about what you want to build.
Let’s say we want to build a project management tool where teams can create projects, assign tasks, and track progress. It’s the kind of application that traditionally takes weeks, maybe months, to develop, but with Cursor’s AI-assisted approach, we can have a production-ready version in days.
Foundation
Traditional projects start with wireframes and technical specifications. With Cursor, you’d start with Agent mode and a conversation about what you’re trying to build. You describe the basic concept and use the context sandwich method we covered earlier:
Context: “Building a team project management tool”
Current state: “Just an idea, need MVP definition”
Goal: “Users can create projects, assign tasks, track progress”
Constraints: “3-week timeline, needs to scale later”
The AI would break this down into clear MVP features and suggest a technology stack that balances rapid development with future scalability. More importantly, it would design a clean database schema with proper relationships.
Save all of these documents in a folder in your project for the AI to reference later.
Core Features
Start building each feature one by one. Use the test-driven development approach I mentioned earlier, and start small with very specific context.
Connect GitHub and Database MCP servers to let the AI commit code and inspect the database in real-time.
You can even set up a Slack MCP for the AI to update you or read new tickets.
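For context, Cursor typically discovers these servers through an `mcp.json` file (project-level `.cursor/mcp.json`, or a global one). A sketch of what wiring up GitHub, Postgres, and Slack might look like follows; the package names below are the community reference servers from the modelcontextprotocol project, and the token and connection string are placeholders you’d replace with your own:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/projectdb"]
    },
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": { "SLACK_BOT_TOKEN": "<your-token>" }
    }
  }
}
```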
Follow the same pattern for every feature – task tracking, user permissions, etc.
Don’t forget to keep testing the product locally. Even with the test-driven approach, the AI might miss things, so ask it to use the logging technique described earlier to help debug potential issues.
Productionizing
As your app gets ready, you may want to start thinking about performance and production-readiness.
Ask the AI to proactively analyze your app for potential failure points and implement comprehensive error handling.
I also often ask it to find areas for refactoring and removing unnecessary code.
For performance optimization, ask the AI to implement lazy loading, database indexing, and caching strategies while explaining the reasoning behind each decision.
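As one concrete (and hypothetical) illustration of the caching piece, memoizing an expensive per-project query in Python keeps repeated dashboard loads from hitting the database. The function name and return value here are invented for this sketch:

```python
# Illustrative caching sketch: memoize an expensive per-project stats query
# so repeated dashboard loads don't re-hit the database.
from functools import lru_cache

CALLS = 0  # counts how many times the "database" is actually queried

@lru_cache(maxsize=256)
def project_stats(project_id: int) -> tuple:
    """Pretend this runs a slow aggregate query; cached per project_id."""
    global CALLS
    CALLS += 1
    return (project_id, "tasks_done", 17)  # stand-in for a real query result

project_stats(1)
project_stats(1)  # second call is served from the cache
```

The same idea scales up to Redis or an HTTP cache layer; the win is identical, which is exactly the kind of reasoning you want the AI to explain when it proposes a caching strategy.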
Launch and Iterate
The monitoring and debugging workflows we covered earlier would prove essential during launch week. The AI would have generated comprehensive logging and performance tracking, so when real users start using your app, you’d have visibility into response times, error rates, and user behavior patterns from day one.
When users request features you hadn’t planned (keyboard shortcuts, bulk operations, calendar integration, etc.), the iterative refinement approach combined with MCP would make these additions straightforward.
Each new feature would build naturally on the existing patterns because the AI maintains architectural consistency while MCP servers provide the external context needed for complex integrations.
Your Turn
Hopefully this article demonstrates a fundamentally different approach to software development. Instead of fighting with tools and configurations, you’re collaborating with an AI partner that understands your goals and helps implement them efficiently.
The skills you develop transfer to any technology stack: thinking architecturally, communicating requirements clearly, and iterating based on feedback. Most importantly, you gain confidence to tackle ambitious projects. When implementation details are handled by AI, you can focus on solving interesting problems and building things that matter.
I’d love to support you as you continue on your journey. My blog is filled with detailed guides like this, so sign up below if you want the latest deep dives on AI.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
Microsoft CEO Satya Nadella recently declared that “we’ve entered the era of AI agents,” highlighting that AI models are now more capable and efficient thanks to groundbreaking advancements in reasoning and memory.
Google announced a whole slew of new agentic tools at its recent I/O conference.
Every major tech company is going all in on agents. 61% of CEOs say competitive advantage depends on who has the most advanced generative AI, and Gartner predicts that by 2028, at least 15% of daily work decisions will be made autonomously by agentic AI.
If you’re an executive trying to understand what this means for your organization, this guide is for you. Let’s dive in.
Understanding Agentic AI and Its Business Implications
Agentic AI refers to a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and using available tools.
Unlike traditional AI that responds to prompts, agentic AI exhibits true “agency”, or the ability to:
Make autonomous decisions, analyze data, adapt, and take action with minimal human input
Use advanced reasoning in their responses, giving users a human-like thought partner
Process and integrate multiple forms of data, such as text, images, and audio
Learn from user behavior, improving over time
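For readers who want intuition for what that autonomy looks like mechanically, here is a toy Python sketch of an agent loop. Everything in it (the tools, the decision function) is invented for illustration and far simpler than a production agent:

```python
# Toy agent loop: the agent autonomously picks tools and acts until it
# judges the goal met, with no human input mid-loop. Purely illustrative.

def run_agent(goal, tools, decide, max_steps=10):
    """decide(goal, history) -> (tool_name, args), or None when finished."""
    history = []
    for _ in range(max_steps):
        choice = decide(goal, history)        # autonomous decision
        if choice is None:                    # agent judges the goal met
            break
        name, args = choice
        result = tools[name](**args)          # take action with a tool
        history.append((name, args, result))  # learn from the outcome
    return history

# Stub tools: look up a stored number, then double it.
tools = {"lookup": lambda key: {"x": 21}[key],
         "double": lambda n: n * 2}

def decide(goal, history):
    if not history:
        return ("lookup", {"key": "x"})
    if len(history) == 1:
        return ("double", {"n": history[-1][2]})
    return None  # done

trace = run_agent("double the stored number", tools, decide)
# final tool result in the trace is 42
```

In real systems the `decide` step is a reasoning LLM and the tools are APIs, but the loop structure, and the fact that no human sits inside it, is the essence of “agency.”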
When I talk to clients, I often tell them to treat an agent like an AI employee. A well-designed agent can take an existing, manual process, and completely automate it, leading to:
Productivity Gains: A Harvard Business School study showed consultants with access to Gen AI completed tasks 22% faster and 40% better
Decision Speed: Most C-suite leaders spend 40% of their time on routine approvals, like pricing decisions or supplier evaluations, which could be automated
Cost Reduction: Studies reveal that implementation of AI agents has led to over a 15% reduction in compliance costs and a more than 46% increase in revenue for numerous organizations
Strategic Use Cases for Agentic AI
Automating existing processes is the most obvious, lowest-hanging use case for organizations. Any business process that is manual, time-consuming, and does not require human judgement can and should be automated with an agent.
Customer Experience Transformation
Gartner predicts that agentic AI will autonomously resolve 80% of common customer service issues without human intervention by 2029, leading to a 30% reduction in operational costs:
24/7 Customer Support: AI agents in call centers orchestrate intelligence and automation across multiple activities involved in serving customers, simultaneously analyzing customer sentiment, reviewing order history, accessing company policies and responding to customer needs
Personalized Engagement: AI agents can learn from previous interactions and adapt to individual requirements in real time, enabling greater personalization than ever before
Knowledge Worker Augmentation
A major bottleneck in many corporations is finding the right information at the right time and working with hundreds of documents across multiple platforms:
Document Processing: Dow built an autonomous agent to scan 100,000+ shipping invoices annually for billing inaccuracies, expecting to save millions of dollars in the first year
Sales Automation: Fujitsu’s AI agent boosted sales team productivity by 67% while addressing knowledge gaps and allowing them to build stronger customer relationships
Supply Chain and Operations Automation
The supply chain represents perhaps the most compelling use case for agentic AI, with the global AI in supply chain market projected to reach $157.6 billion by 2033.
Predictive Logistics: AI agents can autonomously optimize the transportation and logistics process by managing vehicle fleets, delivery routes and logistics on a large scale
Inventory Management: AI-powered supply-chain specialists can optimize inventories on the fly in response to fluctuations in real-time demand
Risk Management: AI agents regularly monitor world events like pandemics, political unrest, and economic shifts to assist companies in proactively managing supply chain risks
Product and Service Innovation
Development Acceleration: AI-powered virtual R&D assistants save researchers significant time by finding relevant academic papers, patents, and technical documents from large databases.
Market Intelligence: Teams can gather data, identify trends, build marketing assets, inform research and move products to market faster using natural language prompts that reduce time from hours to seconds.
Process Automation
Every organization has hundreds of internal processes that are manual, time-consuming, and low value. Employees spend hours on these processes, from taking notes to copying data across platforms and creating reports, all of which could easily be handled by AI agents.
Most of my client work involves taking such processes and fully automating them, allowing employees to focus on higher value work. If you’re interested in this, contact me.
Building the Foundation for Agentic AI
Data Requirements
72% of CEOs say leveraging their organization’s proprietary data is key to unlocking the value of generative AI, yet 50% say their organization has disconnected technology due to the pace of recent investments.
Requirements:
Unified Data Platform: 68% say integrated enterprise-wide data architecture is critical to enable cross-functional collaboration and drive innovation
Data Quality Framework: Ensuring accuracy, completeness, and consistency
Real-time Integration: Breaking down data silos across systems
Security and Governance: Protecting sensitive information while enabling access
Talent Requirements and Organizational Readiness
Current Skills Gap: 46% of leaders identify skill gaps in their workforces as a significant barrier to AI adoption.
Essential Roles for Agentic AI:
AI Ethics Officers: Ensuring fair and transparent operations
Human-AI Collaboration Specialists: Optimizing workflows between humans and AI
AI Trainers: Teaching AI systems nuance, context, and human values
Data Scientists and ML Engineers: Building and maintaining AI systems
Training Imperatives: Nearly half of employees say they want more formal training and believe it is the best way to boost AI adoption.
Process Redesign for Human-AI Collaboration
Governance Frameworks: Only 22% of organizations that have established AI governance councils consistently track metrics related to bias detection, highlighting the need for robust oversight.
Essential Elements:
Clear policies for AI use within the business
Training on AI systems and ethical implications
Processes for evaluating and rejecting AI proposals that conflict with company values
Regular bias detection and compliance monitoring
Implementation Roadmap for Agentic AI
Phase 1: Foundation and Pilot Selection (Months 1-6)
The key to successful agentic AI implementation is starting with a clear strategy rather than jumping into the latest technology. Too many organizations are making the mistake of tool-first thinking when they should be focusing on problem-first approaches.
Begin with a comprehensive AI readiness evaluation. This means honestly assessing your current data quality, infrastructure capabilities, and organizational readiness for change.
When I work with my clients, I often start with surveys to understand the AI literacy of the organization, as well as the tech infrastructure to enable an AI transformation. This data helps us understand what skills or tech gaps we need to fill before moving ahead.
I also identify high-impact, low-risk use cases where you can demonstrate clear business value while learning how these systems work in your environment.
Download my AI Readiness Assessment
These are the same surveys I use with my clients to identify skill gaps and close them.
Phase 2: Pilot Deployment and Learning (Months 6-12)
Deloitte predicts that 25% of companies using generative AI will launch agentic AI pilots or proofs of concept in 2025, growing to 50% in 2027. The organizations that succeed will be those that approach scaling strategically rather than opportunistically.
Start with pilot projects in controlled environments where agentic AI use can be refined, then scale and integrate seamlessly into the bigger picture.
Establish clear human oversight mechanisms, regular performance monitoring, and continuous feedback loops. Most importantly, invest heavily in employee training and support during this phase.
Phase 3: Scaling and Integration (Months 12-24)
Multi-agent orchestration represents the next level of sophistication. Instead of individual AI agents working in isolation, organizations are building systems where multiple agents collaborate to handle complex, multi-step processes.
The key insight is that agentic AI works best when it’s integrated into existing workflows rather than replacing them entirely. The most successful implementations enhance human decision-making rather than eliminating it.
Measuring Impact and ROI
Only 52% of CEOs say their organization is realizing value from generative AI beyond cost reduction. This suggests that many organizations are measuring the wrong things or not measuring comprehensively enough.
Here are some KPIs I recommend measuring to test if your Agents are delivering value:
Productivity Metrics: Time saved, tasks automated, output quality
Financial Impact: Cost reduction, revenue generation, ROI calculations
Employee Satisfaction: Adoption rates, training effectiveness, job satisfaction
CEOs say 31% of the workforce will require retraining or reskilling within three years, and 54% say they’re hiring for roles related to AI that didn’t even exist a year ago.
The workforce of the AI Agent era will need skills like:
AI Literacy: Understanding capabilities, limitations, and ethical implications
Human-AI Collaboration: Working effectively alongside AI agents
Critical Thinking: Validating AI outputs and making strategic decisions
Emotional Intelligence: Areas where humans maintain comparative advantage
Continuous Learning: Adapting to rapidly evolving technology
The half-life of technical skills is shrinking rapidly, and organizations need to create cultures where learning and adaptation are continuous processes rather than occasional events.
Here are some training programs I conduct for clients:
Foundational AI concepts and applications
Hands-on experience with AI tools and platforms
Technical skills for building and managing AI agents
Culture and Change Management Considerations
Here’s an interesting statistic: 73% of executives believe their AI approach is strategic, while only 47% of employees agree. Even more concerning, 31% of employees admit to actions that could be construed as sabotaging AI efforts.
This perception gap is perhaps the biggest obstacle to successful AI transformation. And it means leaders need to build trust and adoption with their teams:
Transparent Communication: Clear explanation of AI’s role and impact
Employee Involvement: Including staff in AI design and implementation
Psychological Safety: Creating environments where concerns can be voiced
Success Stories: Demonstrating AI’s value as augmentation, not replacement
Two-thirds of C-suite executives report that generative AI adoption has led to division and tension within companies. Successful implementation requires:
Leadership commitment and visible support
Clear communication about AI’s role in the organization
Regular feedback and adjustment mechanisms
Recognition and rewards for successful AI adoption
Strategic Priorities and Competitive Implications
Microsoft recently introduced more than 50 announcements spanning its entire product portfolio, all focused on advancing AI agent technologies. Meanwhile, 32% of top executives place AI agents as the top technology trend in data and AI for 2025.
The timeline for competitive advantage is compressed. Organizations beginning their agentic AI journey now will be positioned to lead their industries, while those that delay risk being permanently disadvantaged.
Here’s a sample adoption timeline for 2025:
Q1-Q2 2025: Pilot programs and proof of concepts
Q3-Q4 2025: Limited production deployments
2026-2027: Broad enterprise adoption
2027+: Mature implementations and industry transformation
Strategic Priorities for C-Suite Leaders
1. Make Courage Your Core: 64% of CEOs say they’ll have to take more risk than their competition to maintain a competitive advantage. The key is building organizational flexibility and empowering teams to experiment.
2. Embrace AI-Fueled Creative Destruction: 68% of CEOs say AI changes aspects of their business that they consider core. Leaders must be willing to fundamentally rethink business models and operations.
3. Ignore FOMO, Lean into ROI: 65% of CEOs say they prioritize AI use cases based on ROI. Focus on practical applications that create competitive moats and generate measurable returns.
4. Cultivate a Vibrant Data Environment: Invest in unified data architectures that can support autonomous AI operations while maintaining security and governance.
5. Borrow the Talent You Can’t Buy: 67% of CEOs say differentiation depends on having the right expertise in the right positions. Build partnerships to access specialized AI capabilities.
Competitive Implications of Early vs. Late Adoption
Early Adopter Advantages:
Market Positioning: Early adopters will gain a substantial advantage—but success requires a strategic and experimental approach
Talent Attraction: Access to top AI talent before market saturation
Data Advantage: More time to accumulate training data and refine models
Customer Relationships: First-mover advantage in AI-enhanced customer experiences
Risks of Late Adoption:
Competitive Disadvantage: 64% of CEOs say the risk of falling behind drives them to invest in some technologies before they have a clear understanding of the value they bring
Higher Implementation Costs: Premium for late-stage adoption
Operational Inefficiency: Competing against AI-optimized operations
Strategic Recommendations:
Start Immediately: Begin with low-risk pilot programs while building foundational capabilities
Invest in Data: Prioritize data quality and integration as the foundation for agentic AI
Build Partnerships: Collaborate with technology providers and consultants to accelerate deployment
Focus on Change Management: Invest heavily in employee training and cultural transformation
Plan for Scale: Design initial implementations with enterprise-wide scaling in mind
Conclusion: The Imperative for Action
The transition to agentic AI represents the most significant technological shift since the advent of the internet. CEOs are often pushing AI adoption faster than some employees are comfortable with, underscoring the need to lead people through the changes.
The window for strategic advantage is narrowing. By 2028, at least 15% of daily work decisions will be made autonomously by agentic AI. Organizations that begin their agentic AI journey now will be positioned to lead their industries, while those that delay risk being left behind.
Key Takeaways for C-Suite Leaders:
Agentic AI is not optional—it’s an inevitability that will reshape competitive landscapes
Success requires holistic transformation—technology, people, processes, and culture must evolve together
Early action is critical—the advantages of being among the first adopters far outweigh the risks
Human-AI collaboration is the goal—augmentation, not replacement, should guide implementation strategies
Continuous learning is essential—both for AI systems and human workers
The question isn’t whether agentic AI will transform your industry, it’s whether your organization will be leading or following that transformation.
If you want to be leading the transformation, book a free consultation call with me. I’ve worked with multiple organizations to lead them through this.
Recent research by McKinsey shows that 31% of the workforce will require retraining or reskilling within the next three years. With companies rushing to become AI-first, I’m not surprised. In fact, I think that number should be higher.
Much like digital literacy became essential in the early 2000s, AI literacy is the new baseline for workforce competence. Organizations that fail to develop AI skills will fall behind competitors who leverage AI to enhance productivity, drive innovation, and deliver superior customer experiences.
This guide offers a comprehensive roadmap for executives seeking to transform their workforce for the AI era. We’ll examine practical strategies for conducting skills gap analyses, developing talent through multiple channels, creating a learning culture, empowering change champions, and addressing AI anxiety.
Each section provides actionable frameworks backed by research and case studies, enabling you to immediately apply these approaches within your organization.
Book a free consultation
If you’re looking for customized training programs for your employees, book a free consultation call with me. I’ve trained dozens of organizations and teams on becoming AI experts.
Section 1: Conducting an AI Skills Gap Analysis
Where Do You Want to Be?
Before launching any training initiative, you must first understand the specific AI-related skills your organization requires. When working with my clients, I’ve identified three categories of AI skills that companies need:
Foundational AI Literacy (All Employees)
In my opinion, this is table stakes. Every employee in your company needs to have basic AI literacy, the same way they need to have basic computer literacy.
Understanding basic AI concepts and terminology
Recognizing appropriate use cases for AI tools
Effective prompt engineering and interaction with AI assistants
Critical evaluation of AI outputs and limitations
Awareness of ethical considerations and responsible AI use
Intermediate AI Skills (Domain Specialists)
As you go deeper into your AI transformation, you’ll want to start automating processes and integrating AI deeper into workflows. This means training a percentage of your workforce on AI automation and AI agents.
Ideally, these are domain specialists who understand the workflows well enough to design automations for them.
Ability to identify automation opportunities within specific workflows
Data preparation and quality assessment
Collaboration with technical teams on AI solution development
Integration of AI tools into existing processes
Performance monitoring and feedback provision
Advanced AI Expertise (Technical Specialists)
Finally, for organizations that are building AI products and features, the following skills are absolutely necessary.
AI ethics implementation and compliance
AI system design and implementation
Model selection, training, and fine-tuning
AI infrastructure management and optimization
Data architecture and governance for AI
Where Are You Now?
The next step is understanding your organization’s current AI capabilities. When working with clients, I often start with a survey to leadership and employees.
My Leadership Capability Assessment evaluates executive understanding of AI potential and limitations, and assesses their ability to develop and execute AI strategy.
My Workforce Literacy Survey measures baseline understanding of AI concepts across the organization, and assesses comfort levels with AI tools and applications.
For organizations that are building AI products and features, create a Technical Skills Inventory to document existing data science, machine learning, and AI engineering capabilities, map current technical skills against future needs, and identify training needs for different technical roles.
I also recommend an overall Organizational Readiness Assessment to evaluate data infrastructure and governance maturity, assess cross-functional collaboration capabilities, and review change management processes and effectiveness.
At this point, it becomes fairly obvious where the gaps are between where you are right now and where you want to be.
Download My Leadership Capability Assessment and Workforce Literacy Survey
Download the exact surveys I use with my clients to measure your organization’s current AI capabilities
Create a Development Plan
I then create a custom skills development plan to close the gap. Here’s a sample timeline I draw up for clients, although this depends heavily on how fast you move and how big your organization is.
| Time Horizon | Priority Skills | Target Audience | Business Impact |
| --- | --- | --- | --- |
| 0-3 months | AI literacy, foundational concepts, AI tool usage | All employees | Improved AI adoption, reduced resistance |
| 3-6 months | Role-specific AI applications, workflow integration | Department leaders, domain experts | Process optimization, efficiency gains |
| 6-12 months | Advanced AI development, AI system design, AI ethics implementation | Technical specialists, innovation teams | New product/service development, competitive differentiation |
| 12+ months | Emerging AI capabilities, human-AI collaboration, AI governance | Executive leadership, strategic roles | Business model transformation, market leadership |
I suggest running the skills gap analysis every quarter and re-evaluating. The pace at which AI is developing requires continuous up-skilling and training in the latest technologies.
Section 2: The Build, Buy, Bot, Borrow Model for AI Talent
As your organization develops its AI capabilities, you’ll need a multi-pronged approach to talent acquisition and development. The “Build, Buy, Bot, Borrow” framework offers a comprehensive strategy for addressing AI talent needs. This model provides flexibility while ensuring you have the right capabilities at the right time.
Building Internal Talent Through Training and Development
Internal talent development should be your cornerstone strategy, as it leverages existing institutional knowledge while adding new capabilities. Develop an organizational learning strategy that includes:
Tiered Learning Programs
Level 1: AI Fundamentals – Basic AI literacy for all employees
Level 2: AI Applications – Role-specific training on using AI tools
Level 3: AI Development – Specialized technical training for selected roles
Level 4: AI Leadership – Strategic AI implementation for executives and managers
Experiential Learning Opportunities
AI hackathons and innovation challenges
Rotation programs with AI-focused teams
Mentorship from AI experts
Applied learning projects with measurable outcomes
Learning Ecosystems
On-demand microlearning resources
Self-paced online courses and certifications
Cohort-based intensive bootcamps
Executive education partnerships
Many organizations are finding that the “build” strategy offers the best long-term return on investment. I’ll dive deeper into how to build AI talent in later sections.
Strategic Hiring for Specialized AI Roles
Despite your best efforts to build internal talent, some specialized AI capabilities may need to be acquired through strategic hiring. This includes AI/ML engineers, data scientists, and AI integration specialists.
To develop an effective hiring strategy for AI roles:
Focus on specialized competencies rather than general AI knowledge
Identify the specific AI capabilities required for your business objectives (from skills gap above)
Create detailed skill profiles for each specialized role
Develop targeted assessment methods to evaluate candidates
Look beyond traditional sources of talent
Partner with universities and research institutions with strong AI programs
Engage with AI communities and open-source projects
Consider talent from adjacent fields with transferable skills
Create an AI-friendly work environment
Provide access to high-performance computing resources
Establish clear AI ethics and governance frameworks
Support ongoing professional development in rapidly evolving AI domains
Build a culture that values AI innovation and experimentation
Develop competitive compensation strategies
Create flexible compensation packages that reflect the premium value of AI expertise
Consider equity or profit-sharing for roles that directly impact business outcomes
Offer unique perks valued by the AI community, such as conference attendance or research time
Using AI to Augment Existing Workforce Capabilities
The “bot” aspect of the framework involves strategic deployment of AI tools, automations, and agents to amplify the capabilities of your existing workforce. This approach offers several advantages:
AI agents can handle routine tasks, freeing employees to focus on higher-value work
AI tools can provide just-in-time knowledge, enabling employees to access specialized information when needed
AI can augment decision-making, helping employees make more informed choices
Implement these strategies to effectively leverage AI for workforce augmentation:
AI Agents
Map existing processes to identify routine, time-consuming tasks suitable for AI automation
Deploy AI agents for common tasks like scheduling, report generation, and data summarization
Create seamless handoffs between AI and human components of workflows
Knowledge Augmentation
Implement AI-powered knowledge bases that can answer domain-specific questions
Deploy contextual AI assistants that provide relevant information during decision-making processes
Create AI-guided learning paths that help employees develop new skills
Decision Support
Develop AI models that can analyze complex data and provide recommendations
Implement scenario-planning tools that help employees visualize potential outcomes
Create AI-powered dashboards that provide real-time insights into business performance
I highly recommend developing AI automations and agents in parallel with employee up-skilling programs. Low-hanging automations can be deployed in weeks and provide immediate benefits.
This is why so many major tech companies are going all in on agents and have paused hiring. If you’re interested in how to find opportunities to do this in your organization and design effective agents, read my guide here.
Borrowing Talent through Strategic Partnerships
The final component of the talent strategy involves “borrowing” specialized AI capabilities through strategic partnerships. This approach is particularly valuable for accessing scarce expertise or handling short-term needs.
Strategic Vendor Relationships
Evaluate AI platform providers based on their domain expertise, not just their technology
Develop deep partnerships with key vendors that include knowledge transfer components
Create joint innovation initiatives with strategic technology partners
Consulting and Professional Services
Engage specialized AI consultants for specific, high-value projects
Use professional services firms to accelerate implementation of AI initiatives
Partner with boutique AI firms that have deep expertise in your industry
Academic and Research Partnerships
Collaborate with university research labs on cutting-edge AI applications
Sponsor academic research in areas aligned with your strategic priorities
Participate in industry consortia focused on AI standards and best practices
Talent Exchanges
Create temporary talent exchange programs with non-competing organizations
Develop rotational programs with technology partners
Participate in open innovation challenges to access diverse talent pools
The borrowed talent approach offers several advantages:
Access to specialized expertise that would be difficult or expensive to develop internally
Flexibility to scale AI capabilities up or down based on business needs
Exposure to diverse perspectives and industry best practices
Reduced risk in exploring emerging AI technologies
By strategically combining the build, buy, bot, and borrow approaches, organizations can develop a comprehensive AI talent strategy that provides both depth in critical areas and breadth across the organization.
Section 3: Creating an AI Learning Culture
Let’s dive into how you can up-skill employees and build AI talent internally, as I mentioned above.
AI training cannot follow a one-size-fits-all approach. Different roles require different types and levels of AI knowledge and skills. From my client work, I have identified three primary audience segments:
Executive Leadership
Focus Areas: Strategic AI applications, ethical considerations, governance, ROI measurement
Format Preferences: Executive briefings, peer discussions, case studies
Key Outcomes: Ability to set AI strategy, evaluate AI investments, and lead organizational change
Managers and Team Leaders
Focus Areas: Identifying AI use cases, managing AI-enabled teams, process redesign
Format Preferences: Applied workshops, collaborative problem-solving, peer learning
Key Outcomes: Ability to identify AI opportunities, guide implementation, and support team adoption
Individual Contributors
Focus Areas: Hands-on AI tools, domain-specific applications, ethical use of AI
Format Preferences: Interactive tutorials, practical exercises, on-the-job application
Key Outcomes: Proficiency with relevant AI tools, ability to integrate AI into daily workflows
For each segment, design targeted learning experiences that address their specific needs and preferences. Here’s an example of what I recommend to clients:
| Level | Executive Leadership | Managers / Team Leaders | Individual Contributors |
| --- | --- | --- | --- |
| Basic | AI Strategy Overview (2 hours) | AI for Team Leaders (2 hours) | AI Fundamentals (2 hours) |
| Intermediate | AI Governance Workshop (2 hours) | AI Use Case Design (4 hours) | AI Tools Bootcamp (8 hours) |
| Advanced | AI Investment Roundtable (2 hours) | AI-Enabled Transformation (8 hours) | Domain-Specific AI Training (8 hours) |
But AI training does not stop there. AI is always evolving, so a one-time training program is insufficient. Many organizations struggle with the pace of change in AI, with capabilities advancing faster than they can adapt.
This means you need to foster a continuous learning mindset:
Leadership Modeling
Executives should openly share their own AI learning journeys
Leaders should participate in AI training alongside team members
Management should recognize and reward ongoing skill development
Learning Infrastructure
Create dedicated time for AI learning (e.g., “Learning Fridays”)
Develop peer learning communities around AI topics
Establish AI learning hubs that curate and share relevant resources
Growth Mindset Development
Promote the belief that AI capabilities can be developed through effort
Encourage experimentation and learning from failures
Recognize improvement and progress, not just achievement
I’ve found it’s a lot easier to create and maintain an AI learning culture when there are champions and go-to experts in the organization driving this culture.
I often advise clients to identify these AI champions and empower them by creating AI leadership roles, providing them with advanced training and resources, and creating a clear mandate that defines their responsibility for driving AI adoption.
These AI champions should be included in AI strategy development, use case and implementation approaches, and vendor selection and evaluation processes.
Other ways to sustain this learning culture and increase AI adoption that have worked well for my clients are:
Incentivizing AI adoption through recognition programs and financial incentives
Creating mentorship programs and group learning cohorts within the company
Establishing communities based on specific business functions (marketing AI, HR AI, etc.)
Implementing hackathons and innovation challenges
Creating knowledge repositories for AI use cases and lessons learned
Section 4: Addressing AI Anxiety and Resistance
Despite growing enthusiasm for AI, 41% of employees remain apprehensive about its implementation. Understanding these concerns is essential for effective intervention.
Key factors driving AI anxiety include:
Fear of Job Displacement – Concerns about automation replacing human roles and uncertainty about future career paths
Security and Privacy Concerns – Worries about data protection and cybersecurity risks
Performance and Reliability Issues – Skepticism about AI accuracy and reliability and fears of over-reliance on imperfect systems
Skills and Competency Gaps – Concerns about keeping pace with change
One of the most effective ways to allay these fears is to demonstrate how the technology augments human capabilities rather than replacing them. This approach shifts the narrative from job displacement to job enhancement.
Pilot Projects with Visible Benefits
Implement AI solutions that address known pain points
Focus initial applications on automating tedious, low-value tasks
Showcase how AI frees up time for more meaningful work
Skills Enhancement Programs
Develop training that shows how AI can enhance professional capabilities
Create clear pathways for employees to develop new, AI-complementary skills
Emphasize the increased value of human judgment and creativity in an AI-enabled environment
Role Evolution Roadmaps
Work with employees to envision how their roles will evolve with AI
Create transition plans that map current skills to future requirements
Provide examples of how similar roles have been enhanced by AI in other organizations
Shared Success Metrics
Develop metrics that track both AI performance and human success
Share how AI implementation impacts team and individual objectives
Create incentives that reward effective human-AI collaboration
A common pitfall is focusing too narrowly on productivity gains. The McKinsey report notes that “If CEOs only talk about productivity they’ve lost the plot,” suggesting that organizations should emphasize broader benefits like improved customer experience, new growth opportunities, and enhanced decision-making.
Conclusion: Implementing an Enterprise-Wide Upskilling Initiative
Timeline for Implementation
Creating an AI-ready workforce requires a structured, phased approach. Here’s a sample timeline I’ve implemented for my clients:
Phase 1: Assessment and Planning (1 month)
Conduct an AI skills gap analysis across the organization
Develop a comprehensive upskilling strategy aligned with business objectives
Build executive sponsorship and secure necessary resources
Establish baseline metrics for measuring progress
Phase 2: Infrastructure and Pilot Programs (2-3 months)
Identify and train initial AI champions across departments
Launch pilot training programs with high-potential teams
Collect feedback and refine approach based on early learnings
Phase 3: Scaled Implementation (3-6 months)
Roll out tiered training programs across the organization
Activate formal mentorship programs and communities of practice
Implement recognition systems for AI skill development
Begin integration of AI skills into performance management processes
Phase 4: Sustainability and Evolution (6+ months)
Establish continuous learning mechanisms for emerging AI capabilities
Develop advanced specialization tracks for technical experts
Create innovation programs to apply AI skills to business challenges
Regularly refresh content and approaches based on technological evolution
This phased approach allows organizations to learn and adapt as they go, starting with focused efforts and expanding based on successful outcomes. The timeline above is very aggressive and may need adjustment based on organizational size, industry complexity, and the current state of AI readiness.
Key Performance Indicators for Measuring Workforce Readiness
To evaluate the effectiveness of AI upskilling initiatives, organizations should establish a balanced set of metrics that capture both learning outcomes and business impact. Based on my client work, I’ve found that KPIs should include:
Learning and Adoption Metrics
Percentage of employees completing AI training by role/level
AI tool adoption rates across departments
Number of AI use cases identified and implemented by teams
Employee self-reported confidence with AI tools
Operational Metrics
Productivity improvements in AI-augmented workflows
Reduction in time spent on routine tasks
Quality improvements in AI-assisted processes
Decrease in AI-related support requests over time
Business Impact Metrics
Revenue generated from AI-enabled products or services
Cost savings from AI-enabled process improvements
Customer experience improvements from AI implementation
Innovation metrics (number of new AI-enabled offerings)
Cultural and Organizational Metrics
Employee sentiment toward AI (measured through surveys)
Retention rates for employees with AI skills
Internal mobility of employees with AI expertise
Percentage of roles with updated AI skill requirements
Organizations should establish baseline measurements before launching upskilling initiatives and track progress at regular intervals.
Long-term Talent Strategy Considerations
As organizations look beyond immediate upskilling needs, several strategic considerations emerge for long-term AI talent management:
Evolving Skill Requirements
Regularly reassess AI skill requirements as technology evolves
Develop capabilities to forecast emerging skills needs
Create flexible learning systems that can quickly incorporate new content
Talent Acquisition Strategy
Redefine job descriptions and requirements to attract AI-savvy talent
Develop AI skills assessment methods for hiring processes
Create compelling employee value propositions for technical talent
Career Path Evolution
Design new career paths that incorporate AI expertise
Create advancement opportunities for AI specialists
Develop hybrid roles that combine domain expertise with AI capabilities
Organizational Structure Adaptation
Evaluate how AI impacts traditional reporting relationships
Consider new organizational models that optimize human-AI collaboration
Develop governance structures for AI development and deployment
Cultural Transformation
Foster a culture that values continuous learning and adaptation
Promote cross-functional collaboration around AI initiatives
Build ethical frameworks for responsible AI use
Final Thoughts
AI is going to shock the system in an even bigger way than computers or the internet. So creating an AI-ready workforce requires a comprehensive organizational transformation.
By conducting thorough skills gap analyses, implementing the “build, buy, bot, borrow” model for talent development, creating a continuous learning culture, and addressing AI anxiety with empathy and transparency, organizations can position themselves for success in the AI era.
I’ve worked with dozens of organizations to help them with this. Book me for a free consultation call and I can help you too.
At the start of this year, Jensen Huang, CEO of Nvidia, said 2025 will be the year of the AI agent. Many high-profile companies like Shopify and Duolingo have reinvented themselves with AI at its core, building internal systems and agents to automate processes and reduce headcount.
I spent the last 3 years running a Venture Studio that built startups with AI at the core. Prior to that I built one of the first AI companies on GPT-3. And now I consult for companies on AI implementation. Whether you’re a business leader looking to automate complex workflows or an engineer figuring out the nuts and bolts, this guide contains the entire process I use with my clients.
The purpose of this guide is to help you identify where agents will be useful in your organization, and how to design them to produce real business results. Much like you design a product before building it, this should be your starting point before building an agent.
Let us begin.
PS – I’ve put together a 5-day email course where I walk through designing and implementing a live AI agent using no-code tools. Sign up below.
What Makes a System an “Agent”?
No, that automation you built with Zapier is not an AI agent. Neither is the chatbot you have on your website.
An AI agent is a system that independently accomplishes tasks on your behalf with minimal supervision. Unlike passive systems that just respond to queries or execute simple commands, agents proactively make decisions and take actions to accomplish goals.
Think of it like a human intern or an analyst. It can do what they can, except get you coffee.
How do they do this? There are 4 main components to an AI agent – the model, the instructions, the tools, and the memory. We’ll go into more detail later on, but here’s a quick visual on how they work.
The model is the core component. This is an AI model like GPT, Claude, Gemini or whatever, and it starts when it is invoked or triggered by some action.
Some agents get triggered by a chat or phone call. You’ve probably come across these. Others get triggered when a button is clicked or a form is submitted. Some even get triggered through a cron job at regular intervals, or an API call from another app.
For example, this content creation agent I built for a VC fund gets triggered when a new investment memo is uploaded to a form.
When triggered, the model uses the instructions it has been given to figure out what to do. In this case, the instructions tell it to analyze the memo, research the company, remove sensitive data, and convert it into a blog post.
To do this, the agent has access to tools such as a web scraper that finds information about the company. It loops through these tools and finally produces a blog post, using its memory of the fund’s past content to write in their tone and voice.
You can see how this is different from a regular automation, where you define every step. Even if you use AI in your automation, it’s one step in a sequence. With an agent, the AI forms the central component, decides which steps to perform, and then loops through them until the job is done.
We’ll cover how to structure these components and create that loop later. But first…
Do You Really Need an AI Agent?
Most of the things you want automated don’t really need an AI agent. You can trigger email followups, schedule content, and more through basic automation tools.
Rule of thumb: if a process can be fully captured in a flowchart with no ambiguity or judgment calls, traditional automation is likely more efficient and far more cost-effective.
I also generally advise against building AI agents for high-stakes decisions where an error could be extremely costly, or there’s a legal requirement to provide explainability and transparency.
When you exclude processes that are too simple or too risky, you’re left with good candidates for AI Agents. These tend to be:
Processes where you have multiple variables, shifting context, plenty of edge cases, or decision criteria that can’t be captured with rules. Customer refund approvals are a good example.
Processes that resemble a tangled web of if-then statements with frequent exceptions and special cases, like vendor security reviews.
Processes that involve significant amounts of unstructured data, like natural language understanding, reading documents, analyzing text or images, and so on. Insurance claims processing is a good example.
A VC fund I worked with wanted to automate some of their processes. We excluded simple ones like pitch deck submission (can be done through a Typeform with CRM integration), and high-stakes ones like making the actual investment decisions.
We then built AI agents to automate the rest, like a Due Diligence Agent (research companies, founders, markets, and competition, to build a thorough investment memo) and the content generation agent I mentioned earlier.
Practical Identification Process
To systematically identify agent opportunities in your organization, follow this process:
Catalog existing processes
Document current workflows, especially those with manual steps
Note pain points, bottlenecks, and error-prone activities
Identify processes with high volume or strategic importance
Evaluate against the criteria above
Score each process on complexity, reasoning requirements, tool access, etc.
Eliminate clear mismatches (too simple, too risky, etc.)
Prioritize high-potential candidates
Assess feasibility
Review available data and system integrations
Evaluate current documentation and process definitions
Consider organizational readiness and potential resistance
Build capabilities and confidence with each implementation
Remember that the best agent implementations often start with a clear problem to solve rather than a technology looking for an application.
Contact me if you need help with this
I offer free process audits to help companies identify where they can build agents and reduce wasted time. Book a time with me here.
Agent Architecture & Design Principles
Remember that loop I mentioned earlier? That’s our architecture pattern, and it tells us how to select and connect the four core components of our agent.
Simple “Loop and Fetch” Architecture
The most basic agent architecture follows a straightforward loop:
Receive input (from a user or another system or a cron job)
Process the input using an AI model (with guidance from instructions)
Determine the next action (respond directly or call a tool)
Execute the action (use memory if needed)
Observe the result (check against instructions)
Loop back to step 2
This pattern works well for simpler agents with limited tool sets and straightforward workflows. It’s easy to implement and reason about, making it a good starting point for many projects, and it’s the one I used for the content agent I mentioned.
Here’s a conceptual example:
Plaintext
function runAgent(input, context) {
  while (true) {
    // Process input with LLM
    const llmResponse = model.process(input, context)

    // Check if the LLM wants to use a tool
    if (llmResponse.hasTool) {
      // Execute the tool
      const toolResult = executeTool(llmResponse.tool, llmResponse.parameters)

      // Update context with the result
      context.addToolResult(toolResult)

      // Continue the loop with the tool result as new input
      input = toolResult
    } else {
      // No tool needed, return the response
      return llmResponse.message
    }
  }
}
ReAct-Style Reasoning Frameworks
ReAct (Reasoning and Acting) frameworks enhance the basic loop with more explicit reasoning steps. Rather than immediately jumping to actions, the agent follows a more deliberate process:
Thought: Reason about the current state and goal
Action: Decide on a specific action to take
Observation: Observe the result of the action
Repeat: Continue this cycle until the goal is achieved
The key difference between this and the simple loop is the agent thinks explicitly about each step, making its reasoning more transparent and often leading to better decision-making for complex tasks. This is the architecture often used in research agents, like the Deep Research feature in Gemini and ChatGPT.
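Here’s a minimal sketch of that Thought → Action → Observation cycle in Python. The `call_model` function and the tools are hypothetical stand-ins for a real LLM call and real integrations:

```python
# A minimal ReAct-style loop. `call_model` and the tool functions are
# hypothetical stand-ins for a real LLM call and real integrations.

def react_loop(goal, call_model, tools, max_steps=5):
    """Run Thought -> Action -> Observation cycles until the model finishes."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Thought: the model reasons about the current state and goal
        step = call_model("\n".join(transcript))
        transcript.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":
            return step["answer"]
        # Action: run the chosen tool; Observation: record its result
        observation = tools[step["action"]](step["input"])
        transcript.append(f"Action: {step['action']}({step['input']})")
        transcript.append(f"Observation: {observation}")
    return None  # gave up after max_steps
```

The transparency win is the transcript: every thought, action, and observation is recorded, so you can see exactly why the agent did what it did.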
I custom-built this for a SaaS client that was spending a lot of time on research for their long-form blog content.
Hierarchical Planning Structures
For more complex workflows, hierarchical planning separates high-level strategy from tactical execution:
A top-level planner breaks down the overall goal into major steps
Each step might be further decomposed into smaller tasks
Execution happens at the lowest level of the hierarchy
Results flow back up, potentially triggering replanning
This architecture excels at managing complex, multi-stage workflows where different levels of abstraction are helpful. For example, a document processing agent might:
At the highest level, plan to extract information, verify it, and generate a report
At the middle level, break “extract information” into steps for each document section
At the lowest level, execute specific extraction tasks on individual paragraphs
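That three-level decomposition can be sketched as a pair of nested loops. Here, `plan` and `execute_task` are hypothetical stand-ins for LLM calls:

```python
# A sketch of hierarchical planning: a planner decomposes the goal into
# major steps, each step into smaller tasks, and execution happens at the
# leaves. `plan` and `execute_task` are hypothetical stand-ins for LLM calls.

def run_hierarchical(goal, plan, execute_task):
    """Decompose goal -> steps -> tasks, execute the leaves, roll results up."""
    results = {}
    for step in plan(goal):                      # top level: major steps
        step_results = []
        for task in plan(step):                  # middle level: smaller tasks
            step_results.append(execute_task(task))  # lowest level: execution
        results[step] = step_results             # results flow back up
    return results
```

A real implementation would also let results trigger replanning, but the core idea, separating strategy from tactics, is all in those two loops.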
Memory-Augmented Frameworks
Memory-augmented architectures extend basic agents with sophisticated memory systems:
Before processing input, the agent retrieves relevant information from memory
The retrieved context enriches the agent’s reasoning
After completing an action, the agent updates its memory with new information
This approach is particularly valuable for:
Personalized agents that adapt to individual users over time
Knowledge-intensive tasks where retrieval of relevant information is critical
Interactions that benefit from historical context
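In sketch form, that retrieve → reason → update cycle looks like this. `call_model` is a hypothetical LLM call, and a plain list stands in for a real memory store:

```python
# Sketch of the retrieve -> reason -> update cycle. `call_model` is a
# hypothetical LLM call; a plain list stands in for a real memory store.

def memory_augmented_turn(user_input, memory, call_model):
    """Retrieve relevant facts, enrich the prompt, then record the exchange."""
    # 1. Retrieve: naive keyword overlap stands in for semantic search
    relevant = [m for m in memory
                if any(word in m for word in user_input.lower().split())]
    # 2. Reason: the retrieved context enriches the prompt
    prompt = f"Context: {relevant}\nUser: {user_input}"
    reply = call_model(prompt)
    # 3. Update: record the new exchange for future turns
    memory.append(f"user said: {user_input.lower()}")
    return reply
```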
Multi-Agent Cooperative Systems
Sometimes the most effective approach involves multiple specialized agents working together:
A coordinator agent breaks down the overall task
Specialized agents handle different aspects of the workflow
Results are aggregated and synthesized
The coordinator determines next steps or delivers final outputs
This architecture works well when different parts of a workflow require substantially different capabilities or tool sets. For example, a customer service system might employ:
A documentation agent to retrieve relevant resources
A triage agent to understand initial requests
A technical support agent for product issues
A billing specialist for financial matters
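The coordinator pattern can be sketched in a few lines. Here, `classify` and the specialist handlers are hypothetical stand-ins for real agents:

```python
# Sketch of a coordinator delegating to specialist agents. `classify` and
# the specialist callables are hypothetical stand-ins; a real system would
# wrap separate LLM agents with their own tools and instructions.

def coordinate(request, classify, specialists):
    """Route the request to a specialist and package the final output."""
    category = classify(request)                       # triage step
    handler = specialists.get(category, specialists["general"])
    result = handler(request)                          # specialist does the work
    return {"category": category, "answer": result}    # coordinator synthesizes
```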
If this is your first agent, I suggest starting with the simple loop architecture. I find it helps to sketch out the process, starting with what triggers our agent, what the instructions should be, what tools it has access to, if it needs memory, and what the final output looks like.
I show you how to implement this in my 5-day Challenge.
Core Components of Effective Agents
As I said earlier, every effective agent, regardless of implementation details, consists of four fundamental layers:
1. The Model Layer: The “Brain”
These are the large language models that provide the reasoning and decision-making capabilities. These models:
Process and understand natural language inputs
Generate coherent and contextually appropriate responses
Apply complex reasoning to solve problems
Make decisions about what actions to take next
Different agents may use different models or even multiple models for different aspects of their workflow. A customer service agent might use a smaller, faster model for initial triage and a more powerful model for complex problem-solving.
2. The Tool Layer: The “Hands”
Tools extend an agent’s capabilities by connecting it to external systems and data sources. These might include:
Data tools: Database queries, knowledge base searches, document retrieval
Action tools: Sending messages, updating records, triggering other processes
Orchestration tools: Coordination with other agents or services
Tools are the difference between an agent that can only talk about doing something and one that can actually get things done.
3. The Instruction Layer: The “Rulebook”
Instructions and guardrails define how an agent behaves and the boundaries within which it operates. This includes:
Task-specific guidelines and procedures
Ethical constraints and safety measures
Error handling protocols
User preference settings
Clear instructions reduce ambiguity and improve agent decision-making, resulting in smoother workflow execution and fewer errors. Without proper instructions, even the most sophisticated model with the best tools will struggle to deliver consistent results.
4. Memory Systems: The “Experience”
Memory is crucial for agents that maintain context over time:
Short-term memory: Tracking the current state of a conversation or task
Long-term memory: Recording persistent information about users, past interactions, or domain knowledge
Memory enables agents to learn from experience, avoid repeating mistakes, and provide personalized service based on historical context.
The next few sections cover the strategy behind these components, plus two additional considerations: guardrails and error handling.
Model Selection Strategy
Not every task requires the most advanced (and expensive) model available. You need to balance capability, cost, and latency requirements for your specific use case.
Capability Assessment
Different models have different strengths. When evaluating models for your agent:
Start with baseline requirements:
Understanding complex instructions
Multi-step reasoning capabilities
Contextual awareness
Tool usage proficiency
Consider specialized capabilities needed:
Code generation and analysis
Mathematical reasoning
Multi-lingual support
Domain-specific knowledge
Assess the complexity of your tasks:
Simple classification or routing might work with smaller models
Complex decision-making typically requires more advanced models
Multi-step reasoning benefits from models with stronger planning abilities
For example, a customer service triage agent might effectively use a smaller model to categorize incoming requests, while a coding agent working on complex refactoring tasks would benefit from a more sophisticated model with strong reasoning capabilities and code understanding.
Creating a Performance Baseline
A proven approach is to begin with the most capable model available to establish a performance baseline:
Start high: Build your initial prototype with the most advanced model
Define clear metrics: Establish concrete measures of success
Test thoroughly: Validate performance across a range of typical scenarios
Document the baseline: Record performance benchmarks for comparison
This baseline represents the upper limit of what’s currently possible and provides a reference point for evaluating tradeoffs with smaller or more specialized models.
Optimization Strategy
Once you’ve established your baseline, you can optimize by testing smaller, faster, or less expensive models:
Identify candidate models: Select models with progressively lower capability/cost profiles
Comparative testing: Evaluate each candidate against your benchmark test set
Analyze performance gaps: Determine where and why performance differs
Make informed decisions: Choose the simplest model that meets your requirements
This methodical approach helps you find the optimal balance between performance and efficiency without prematurely limiting your agent’s capabilities.
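Those four steps amount to a small benchmark harness. Here’s a hedged sketch, where each model is a hypothetical answer function with an assumed cost per call:

```python
# Sketch of comparative testing: score candidate models against a baseline
# on a shared test set and keep the cheapest one within tolerance.
# `models` maps name -> (answer_fn, cost); both are hypothetical stand-ins
# for real API calls and real pricing.

def pick_model(test_set, models, baseline, tolerance=0.05):
    """Return the cheapest model whose accuracy is within tolerance of baseline."""
    def accuracy(answer_fn):
        correct = sum(1 for question, expected in test_set
                      if answer_fn(question) == expected)
        return correct / len(test_set)

    target = accuracy(models[baseline][0]) - tolerance
    candidates = [(cost, name) for name, (fn, cost) in models.items()
                  if accuracy(fn) >= target]
    return min(candidates)[1]  # cheapest acceptable model
```

Real evaluations use richer metrics than exact-match accuracy, but the shape is the same: fix the test set, measure the gap, and only then trade capability for cost.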
Multi-Model Architecture
For complex workflows, consider using different models for different tasks within the same agent system:
Smaller, faster models for routine tasks (classification, simple responses)
Medium-sized models for standard interactions and decisions
Larger, more capable models for complex reasoning, planning, or specialized tasks
For example, an agent might use a smaller model for initial user intent classification, then invoke a larger model only when it encounters complex requests requiring sophisticated reasoning.
This tiered approach can significantly reduce average costs and latency while maintaining high-quality results for challenging tasks.
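A minimal sketch of that tiered routing, with both models as hypothetical callables:

```python
# Sketch of tiered model routing: a small model triages the request, and
# only complex requests escalate to the large model. Both callables are
# hypothetical stand-ins for real API calls.

def answer(request, small_model, large_model):
    """Use the cheap model unless triage flags the request as complex."""
    triage = small_model(f"Classify as 'simple' or 'complex': {request}")
    if triage == "complex":
        return large_model(request)   # expensive, high-quality path
    return small_model(request)       # fast, cheap path
```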
My Default Models
I find myself defaulting to a handful of models, at least when starting out, before optimizing the agent:
Reasoning – OpenAI o3 or Gemini 2.5 Pro
Data Analysis – Gemini 2.5 Flash
Image Generation – GPT 4o
Code Generation – Gemini 2.5 Pro
Content Generation – Claude 3.7 Sonnet
Triage – GPT 3.5 Turbo or Gemini 2.0 Flash-Lite (hey I don’t make the names ok)
Every model provider has a Playground where you can test the models. Start there if you’re not sure which one to pick.
Tool Definition Best Practices
Tools extend your agent’s capabilities by connecting it to external systems and data sources. Well-designed tools are clear, reliable, and reusable across multiple agents.
Tool Categories and Planning
When planning your agent’s tool set, consider the three main categories of tools it might need:
Data Tools: Enable agents to retrieve context and information
Database queries – e.g., find a user’s profile information
Document retrieval – e.g., get the latest campaign plan
Search capabilities – e.g., search through emails
Knowledge base access – e.g., find the refund policy
Action Tools: Allow agents to interact with systems and take actions
Sending messages – e.g., send a Slack alert
Updating records – e.g., change the user’s profile
Creating content – e.g., generate an image
Managing resources – e.g., give access to some other tool
Initiating processes – e.g., trigger another process or automation
Orchestration Tools: Connect agents to other agents or specialized services
Expert consultations – e.g., connect to a fine-tuned medical model
Specialized analysis – e.g., hand off to a reasoning model for data analysis
Delegated sub-tasks – e.g., hand off to a content generation agent
A well-rounded agent typically needs tools from multiple categories to handle complex workflows effectively.
Designing Effective Tool Interfaces
Tool design has a significant impact on your agent’s ability to use them correctly. Follow these guidelines:
Clear naming: Use descriptive, task-oriented names that indicate exactly what the tool does
Comprehensive descriptions: Provide detailed documentation about:
The tool’s purpose and when to use it
Required parameters and their formats
Expected outputs and potential errors
Limitations or constraints to be aware of
Focused functionality: Each tool should do one thing and do it well
Prefer multiple specialized tools over single complex tools
Maintain a clear separation of concerns
Simplify parameter requirements for each individual tool
Consistent patterns: Apply consistent conventions across your tool set
Standardize parameter naming and formats
Use similar patterns for related tools
Maintain consistent error handling and response structures
Here’s an example of a well-defined tool:
Plaintext
@function_tool
def search_customer_orders(
    customer_id: str,
    status: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> List[Order]:
    """
    Search for a customer's orders with optional filtering.

    Parameters:
    - customer_id: The unique identifier for the customer (required)
    - status: Optional filter for order status ('pending', 'shipped', 'delivered', 'cancelled')
    - start_date: Optional start date for filtering orders (format: YYYY-MM-DD)
    - end_date: Optional end date for filtering orders (format: YYYY-MM-DD)

    Returns:
    A list of order objects matching the criteria, each containing:
    - order_id: Unique order identifier
    - date: Date the order was placed
    - items: List of items in the order
    - total: Order total amount
    - status: Current order status

    Example usage:
    search_customer_orders("CUST123", status="shipped")
    search_customer_orders("CUST123", start_date="2023-01-01", end_date="2023-01-31")
    """
    # Implementation details here
Crafting Effective Instructions
Instructions form the foundation of agent behavior. They define goals, constraints, and expectations, guiding how the agent approaches tasks and makes decisions.
Effective instructions (aka prompt engineering) follow these core principles:
Clarity over brevity: Be explicit rather than assuming the model will infer your intent
Structure over freeform: Organize instructions in logical sections with clear headings
Examples over rules: Demonstrate desired behaviors through concrete examples
Specificity over generality: Address common edge cases and failure modes directly
All of this is to say, the more precise and detailed you can be with instructions, the better. It’s like creating a SOP for an executive assistant.
In fact, I often start with existing documentation and resources like operating procedures, sales or support scripts, policy documents, and knowledge base articles when creating instructions for agents in business contexts.
I’ll turn them into LLM-friendly instructions with clear actions, decision criteria, and expected outputs.
For example, converting a customer refund policy into agent instructions might look like this:
Original policy: “Refunds may be processed for items returned within 30 days of purchase with a valid receipt. Items showing signs of use may receive partial refunds at manager discretion. Special order items are non-refundable.”
Agent-friendly instructions:
Plaintext
When processing a refund request:

1. Verify return eligibility:
   - Check if the return is within 30 days of purchase
   - Confirm the customer has a valid receipt
   - Determine if the item is a special order (check the "special_order" flag in the order details)
2. Assess item condition:
   - If the item is unopened and in original packaging, proceed with full refund
   - If the item shows signs of use or opened packaging, classify as "partial refund candidate"
   - If the item is damaged beyond normal use, classify as "potential warranty claim"
3. Determine refund amount:
   - For eligible returns in new condition: Issue 100% refund of purchase price
   - For "partial refund candidates": Issue 75% refund if within 14 days, 50% if 15-30 days
   - For special order items: Explain these are non-refundable per policy
   - For potential warranty claims: Direct to warranty process
4. Process the refund:
   - For amounts under $50: Process automatically
   - For amounts $50-$200: Request supervisor review if partial refund
   - For amounts over $200: Escalate to manager
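Because the criteria are this explicit, the amount calculation could even live in a deterministic tool that the agent calls. Here’s a hedged sketch; the field names and return shape are illustrative, not from a real system:

```python
# Sketch of the refund-amount rules as a deterministic tool. Field names
# and the (outcome, fraction) return shape are illustrative.

def refund_decision(days_since_purchase, has_receipt, special_order, condition):
    """Return (outcome, refund_fraction). condition: 'new', 'used', or 'damaged'."""
    if special_order:
        return ("denied", 0.0)        # special orders are non-refundable
    if days_since_purchase > 30 or not has_receipt:
        return ("denied", 0.0)        # outside the return window
    if condition == "damaged":
        return ("warranty", 0.0)      # route to the warranty process
    if condition == "new":
        return ("full", 1.0)          # 100% refund
    # partial refund candidate: 75% within 14 days, 50% for days 15-30
    return ("partial", 0.75 if days_since_purchase <= 14 else 0.5)
```

Pushing rules like this into a tool keeps the model focused on the genuinely ambiguous parts, like judging item condition from a customer’s description.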
You’re not going to get this right on the first shot. Instead, it is an iterative process:
Start with draft instructions based on existing documentation
Test with realistic scenarios to identify gaps or unclear areas
Observe agent behavior and note any deviations from expected actions
Refine instructions to address observed issues by adding in edge cases or missing information
Repeat until performance meets requirements
I cover these concepts in my 5-Day AI Agent Challenge. Sign up here.
Memory Systems Implementation
Effective memory implementation is crucial for agents that maintain context over time or learn from experience.
Short-term memory handles the immediate context of the current interaction:
Conversation history: Recent exchanges between user and agent
Current task state: The agent’s progress on the active task
Working information: Temporary data needed for the current interaction
For most agents, this context is maintained within the conversation window, though you may need to implement summarization or pruning strategies as conversations grow longer.
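As a rough illustration, here's a minimal pruning sketch: keep the system message plus the most recent turns that fit a token budget. The 4-characters-per-token estimate and the message format are simplifying assumptions; in practice, use your model's real tokenizer.

```python
def estimate_tokens(text):
    # Crude heuristic: ~4 characters per token; use a real tokenizer in practice
    return len(text) // 4

def prune_history(messages, budget=1000):
    """Keep the system message plus the newest turns that fit the budget."""
    system, turns = messages[0], messages[1:]
    kept, used = [], estimate_tokens(system["content"])
    for msg in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))
```

A summarization strategy would work similarly, except the dropped turns get condensed into one summary message instead of discarded.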
Long-term memory preserves information across sessions:
User profiles: Preferences, history, and specific needs
Learned patterns: Recurring issues or successful approaches
Domain knowledge: Accumulated expertise and background information
Whatever method you use to store memory, you need a smart retrieval mechanism, because everything you retrieve gets added to the context window of your agent’s core model or tools:
Relevance filtering: Surface only information pertinent to the current context
Recency weighting: Prioritize recent information when appropriate
Semantic search: Find conceptually related information even with different wording
Hierarchical retrieval: Start with general context and add details as needed
Well-designed retrieval keeps memory useful without overwhelming the agent with irrelevant information or taking up space in the context window.
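Here's a hedged sketch of what such a retrieval mechanism might look like, combining relevance filtering with recency weighting. The keyword-overlap score is a crude stand-in for real semantic search (e.g. embeddings), and the 30-day half-life is an arbitrary choice.

```python
import math
import time

def score_memory(memory, query_words, now, half_life_days=30):
    """Crude relevance (keyword overlap) combined with exponential recency decay."""
    words = set(memory["text"].lower().split())
    relevance = len(words & query_words) / max(len(query_words), 1)
    age_days = (now - memory["timestamp"]) / 86400
    recency = math.exp(-age_days / half_life_days)
    return relevance * recency

def retrieve(memories, query, k=3):
    """Return the k memories scoring highest for this query, newest-biased."""
    query_words = set(query.lower().split())
    now = time.time()
    ranked = sorted(memories, key=lambda m: score_memory(m, query_words, now),
                    reverse=True)
    return ranked[:k]
```

Capping results at `k` is what keeps memory from eating the context window.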
Privacy and Data Management
Ensuring your agent can’t mishandle data, access data it shouldn’t, or reveal sensitive information to the wrong users is extremely important, for obvious reasons. I could write a whole blog post about this.
In most cases, really good tool design plus guardrails and safety mechanisms (next section) will cover privacy and data protection, but here are some things to think about:
Retention policies: Define how long different types of information should be kept
Anonymization: Remove identifying details when full identity isn’t needed
Access controls: Limit who (or what) can access stored information
User control: Give users visibility into what’s stored and how it’s used
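For instance, a retention policy can be as simple as dropping records older than a per-type limit before anything reaches the agent. The record types and durations below are illustrative placeholders, not recommendations.

```python
import time

# Per-type retention limits in days (illustrative values only)
RETENTION_DAYS = {"conversation": 30, "user_profile": 365, "analytics": 90}

def apply_retention(records, now=None):
    """Drop records older than their type's retention limit."""
    now = now if now is not None else time.time()
    kept = []
    for record in records:
        max_age = RETENTION_DAYS.get(record["type"], 0) * 86400
        if now - record["created_at"] <= max_age:
            kept.append(record)
    return kept
```

Unknown record types default to a zero-day limit here, i.e. they are never kept, which is the safer failure mode for sensitive data.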
Guardrails and Safety Mechanisms
Even the best-designed agents need guardrails to ensure they operate safely and appropriately. Guardrails are protective mechanisms that define boundaries, prevent harmful actions, and ensure the agent behaves as expected.
A good strategy takes a layered approach, so if one layer fails, others can still prevent potential issues. Start with setting clear boundaries while defining the agent’s instructions in the previous section.
You can then add some input validation to process user requests to the agent and identify if it’s out of scope or potentially harmful (like a jailbreak).
Python
@input_guardrail
def safety_guardrail(ctx, agent, input):
    # Check input against safety classifier
    safety_result = safety_classifier.classify(input)
    if safety_result.is_unsafe:
        # Return a predefined response instead of processing the input
        return GuardrailFunctionOutput(
            output="I'm not able to respond to that type of request. Is there something else I can help you with?",
            tripwire_triggered=True,
        )
    # Input is safe, continue normal processing
    return GuardrailFunctionOutput(tripwire_triggered=False)
Output guardrails verify the agent’s responses before they reach the user, to flag PII (personally identifiable information) or inappropriate content:
Python
@output_guardrail
def pii_filter_guardrail(ctx, agent, output):
    # Check for PII in the output
    pii_result = pii_detector.scan(output)
    if pii_result.has_pii:
        # Redact PII from the output
        redacted_output = pii_detector.redact(output)
        return GuardrailFunctionOutput(
            output=redacted_output,
            tripwire_triggered=True,
        )
    # Output is clean
    return GuardrailFunctionOutput(tripwire_triggered=False)
Also ensure you have guardrails on tool usage, especially if these tools are used to change data, trigger a critical process, or something that requires permissions or approvals.
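Here's one way a tool guardrail might look, gating high-value refunds behind manager approval (matching the $200 threshold from the refund policy earlier). The types and function names are hypothetical stand-ins defined inline, not a real SDK API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for SDK types; not a real library API
@dataclass
class GuardrailFunctionOutput:
    tripwire_triggered: bool
    output: Optional[str] = None

@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)

def refund_tool_guardrail(ctx, agent, tool_call):
    # Block automatic execution of refunds over the $200 policy limit
    if tool_call.name == "process_refund" and tool_call.args.get("amount", 0) > 200:
        return GuardrailFunctionOutput(
            tripwire_triggered=True,
            output="Refunds over $200 require manager approval.",
        )
    # Within limits: allow the tool call to proceed
    return GuardrailFunctionOutput(tripwire_triggered=False)
```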
Human-in-the-Loop Integration
I always recommend a human-in-the-loop to my clients, especially for high-risk operations. Here are some ways to build that in:
Feedback integration: Incorporate human feedback to improve agent behavior
Approval workflows: Route certain actions for human review before execution
Sampling for quality: Review a percentage of agent interactions for quality control
Escalation paths: Define clear processes for when and how to involve humans
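As an example, an approval workflow can be a simple routing function that queues risky actions for human review instead of executing them. The action names and the $200 threshold below are illustrative assumptions.

```python
# Actions that always require human sign-off, plus an amount threshold (illustrative)
HIGH_RISK_ACTIONS = {"issue_refund", "delete_account", "send_bulk_email"}

def route_action(action, amount=0, approval_queue=None):
    """Queue high-risk actions for human review; execute everything else."""
    if action in HIGH_RISK_ACTIONS or amount > 200:
        approval_queue.append({"action": action, "amount": amount})
        return "pending_approval"
    return "executed"
```

A reviewer then works through `approval_queue`, and their accept/reject decisions double as feedback data for improving the agent.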
Error Handling and Recovery
Even the best agents will encounter errors and unexpected situations. When you test your agent, first identify and isolate where the error is coming from:
Input errors: Problems with user requests (ambiguity, incompleteness)
Tool errors: Issues with external systems or services
Processing errors: Problems in the agent’s reasoning or decision-making
Resource errors: Timeouts, memory limitations, or quota exhaustion
Based on the error type, the agent can apply appropriate recovery strategies. Ideally, agents should be able to recover from minor errors through self-correction:
Validation loops: Check results against expectations before proceeding
Retry strategies: Attempt failed operations again with adjustments
Alternative approaches: Try different methods when the primary approach fails
Graceful degradation: Fall back to simpler capabilities when advanced ones fail
For example, if a database query fails, the agent might retry with a more general query, or fall back to cached information. Beyond that, you may want to build out alert systems and escalation paths to human employees, and explain the limitation to the user.
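A minimal sketch of that retry-then-degrade pattern, assuming the primary and fallback operations are passed in as callables:

```python
def with_recovery(primary, fallback, retries=2):
    """Retry the primary operation, then degrade to the fallback, then apologize."""
    for _ in range(retries + 1):
        try:
            return primary()
        except Exception:
            continue  # minor error: retry (add backoff/adjustments in practice)
    try:
        return fallback()  # graceful degradation, e.g. a broader query or cache
    except Exception:
        # Last resort: explain the limitation to the user
        return "Sorry, I couldn't retrieve that information right now."
```

For the database example, `primary` might be the live query and `fallback` a cached lookup; the final string is where you'd also trigger alerts or escalate to a human.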
Testing Your Agent
Now that you have all the pieces of the puzzle, it’s time to test the agent.
Testing AI agents fundamentally differs from testing traditional software. While conventional applications follow deterministic paths with predictable outputs, agents exhibit non-deterministic behavior that can vary based on context, inputs, and implementation details.
This leads to challenges that are unique to AI agents, such as hallucinations, bias, prompt injections, inefficient loops, and more.
Unit Testing Components
Test individual modules independently (models, tools, memory systems, instructions)
Verify tool functionality, error handling, and edge cases
Example: A financial advisor agent uses a stock price tool. Unit tests would verify the tool returns correct data for valid tickers, handles non-existent tickers gracefully, and manages API failures appropriately, all without involving the full agent.
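Sketching that stock price example: a fake API lets you unit test the tool's happy path, unknown tickers, and outages without the agent or a real network call. All names here are hypothetical.

```python
class FakeAPI:
    """Test double standing in for a real stock price service."""
    def __init__(self, prices=None, fail=False):
        self.prices = prices or {}
        self.fail = fail

    def quote(self, ticker):
        if self.fail:
            raise ConnectionError("API unavailable")
        return self.prices.get(ticker)

def get_stock_price(ticker, api):
    """The tool under test: normalizes the ticker and wraps failures."""
    try:
        price = api.quote(ticker.upper())
    except ConnectionError:
        return {"error": "price service unavailable"}
    if price is None:
        return {"error": f"unknown ticker: {ticker}"}
    return {"ticker": ticker.upper(), "price": price}

# Unit tests exercising the tool in isolation, without the full agent
assert get_stock_price("aapl", FakeAPI({"AAPL": 189.5}))["price"] == 189.5
assert "unknown ticker" in get_stock_price("XXXX", FakeAPI())["error"]
assert "unavailable" in get_stock_price("AAPL", FakeAPI(fail=True))["error"]
```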
Integration Testing
Test end-to-end workflows in simulated environments
Verify components work together correctly
Example: An e-commerce support agent integration test would validate the complete customer journey, from initial inquiry about a delayed package through tracking lookup, status explanation, and potential resolution options, ensuring all tools and components work together seamlessly.
Security Testing
Security testing probes the agent’s resilience against misuse or manipulation.
Instruction override attempts: Try to make the agent ignore its guidelines
Parameter manipulation: Attempt to pass invalid or dangerous parameters to tools
Context contamination: Try to confuse the agent with misleading context
Jailbreak testing: Test known techniques for bypassing guardrails
Example: Security testing for a healthcare agent would include attempts to extract patient data through crafted prompts, testing guardrails against medical misinformation, and verifying that sensitive information isn’t retained or leaked.
Hallucination Testing
Compare responses against verified information
Check source attribution and citation practices
Example: A financial advisor agent might be tested against questions with known answers about market events, company performance, and financial regulations, verifying accuracy and appropriate expressions of uncertainty for projections or predictions.
Performance and Scalability Testing
Performance testing evaluates how well the agent handles real-world conditions and workloads.
Response time: Track how quickly the agent processes requests
Model usage optimization: Track token consumption and model invocations
Cost per transaction: Calculate average cost to complete typical workflows
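A simple sketch of tracking these three metrics per request. The per-1K-token prices are made-up placeholders, not any model's real pricing.

```python
import time

# Made-up per-1K-token prices; substitute your model's real rates
PRICE_PER_1K = {"input": 0.005, "output": 0.015}

def record_metrics(start, input_tokens, output_tokens):
    """Capture latency, token usage, and a rough cost estimate for one request."""
    latency = time.time() - start
    cost = (input_tokens * PRICE_PER_1K["input"]
            + output_tokens * PRICE_PER_1K["output"]) / 1000
    return {
        "latency_s": round(latency, 3),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost, 6),
    }
```

Logging one such record per request is enough to compute average cost per transaction and spot latency or token-usage regressions over time.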
These are just a few tests and error types to keep in mind and should be enough for basic agents.
As your agent grows more complex, you’ll need a more comprehensive testing and evaluation framework, which I’ll cover in a later blog post. Sign up to my emails to stay posted.
Deploying, Monitoring, and Improving Your Agent
The final piece is to deploy your agent, see how it performs in the real world, collect feedback, and improve it over time.
Deploying agents depends heavily on how you build them. No-code platforms like Make, n8n, and Relevance have their own deployment solutions. If you’re coding your own agents, you may want to look into custom hosting and deployment solutions.
I often advise clients to deploy agents alongside the existing process, slowly and gradually. See how it performs in the real world and continuously improve it. Over time, you can phase out the existing process and use the agent instead.
Doing it this way also allows you to evaluate the performance of the agent against current numbers. Does it handle customer support inquiries with a higher NPS score? Do the ads it generates have better CTRs?
Many of these no-code platforms also come with built-in observability, allowing you to monitor your agent and track how it performs. If you’re coding the agent yourself, consider using a framework like OpenAI’s agent SDK, or Google ADK, which comes with built-in tracing.
You also want to collect actual usage feedback, like how often are users interacting with the agent, how happy are they, and so on. You can then use this to further improve the agent through refining the instructions, adding more tools, or updating the memory.
Again, for basic agents, these out-of-the-box solutions are more than enough. If you’re building more complex agents, you’ll need to build out AgentOps to monitor and improve the agent. More on that in a later blog post.
Case Studies
You’re now familiar with all the components that make up an agent, how to put the components together, and how to test, deploy, and evaluate them. Let’s look at some case studies and implementation examples to drive the point home and inspire you.
Customer Service Agent
One of the most widely implemented agent types helps customers resolve issues, answer questions, and navigate services. Successful customer service agents typically include:
Feedback collection: Gathers user satisfaction data for improvement
Knowledge retrieval system: Accesses relevant policies and information
User context integration: Incorporates customer history and account information
Escalation mechanism: Seamlessly transfers to human agents when needed
An eCommerce company I worked with wanted a 24/7 customer support chatbot on their site. We started with narrow use cases like answering FAQs and order information. The chatbot triggered a triage agent which determined whether the query was within our initial use case set or not.
If it was, it had access to knowledge base documents for FAQs and order information based on an order number.
For everything else, it handed the query off to a support agent. This allowed the company to dramatically decrease average response times and increase support volume while maintaining their satisfaction scores.
Research Assistant
Research assistant agents help users gather, synthesize, and analyze information from multiple sources. Effective research assistants typically include:
Search and retrieval capabilities: Access to diverse information sources
Information verification mechanisms: Cross-checking facts across sources
Synthesis frameworks: Methods for combining information coherently
Citation and attribution systems: Tracking information provenance
User collaboration interfaces: Tools for refining and directing research
A VC firm I worked with wanted to build a due diligence agent for the deals they were looking at. We triggered the agent when a new deal was created in their database. The agent would first identify the company and the market they were in, and then research them and synthesize the information into an investment memo.
This sped up the diligence process from a couple of hours to a couple of minutes.
Content Generation
Content generation agents help create, refine, and manage various forms of content, from marketing materials to technical documentation.
Effective content generation agents typically include:
Style and tone frameworks: Guidance for appropriate content voice
Factual knowledge integration: Access to accurate domain information
Feedback incorporation mechanisms: Methods for refining outputs
Format adaptation: Generating content appropriate to different channels
A PR agency I worked with wanted an agent to create highly personalized responses to incoming PR requests. When a new request hit their inbox, it triggered an agent to look through their database of client content and find something specific to that pitch.
It then used the agency’s internal guidelines to craft a pitch and respond to the request. This meant the agency could respond within minutes instead of hours, and get ahead of other responses.
A Thought Exercise
Here’s a bit of homework for you to see if you’ve learned something from this. You’re tasked with designing a travel booking agent. Yeah, I know, it’s a cliche example at this point, but it’s also a process that’s well understood by a large audience.
The exercise is to design the agent. A simple flow chart with pen and paper or on a Figjam is usually how I start.
Draw out the full process: what triggers the agent, what data is sent to it, whether it’s a simple loop agent or a hierarchy of agents, what models and instructions you’ll give them, and what tools and memory they can access.
If you can do this and get into the habit of thinking in agents, implementation becomes easy. For visual examples, sign up for my 5-day Agent Challenge.
Putting It All Together
Phew, over 5,000 words later, we’re almost at the end. We’ve covered a lot in this post so let’s recap:
Start with clear goals: Define exactly what your agent should accomplish and for whom
Select appropriate models: Choose models that balance capability, cost, and latency
Define your tool set: Implement and document the tools your agent needs
Create clear instructions: Develop comprehensive guidance for agent behavior
Implement layered guardrails: Build in appropriate safety mechanisms
Design error handling: Plan for failures and define recovery strategies
Add memory as needed: Implement context management appropriate to your use case, and external memory
Test thoroughly: Validate performance across a range of scenarios
Deploy incrementally: Roll out capabilities gradually to manage risk
Monitor and improve: Collect data on real-world performance to drive improvements
Next Steps
There’s only one next step. Go build an agent. Start with something small and low-risk. One of my first agents was a content research agent, fully coded in Python. You can vibe code it if you’re not good at coding.
If you want to use a framework, I suggest either OpenAI’s SDK or Google’s ADK, which I have in-depth guides on.
And if you don’t want to touch code, there are some really good no-code platforms like Make, n8n, and Relevance. Sign up for my free email series below where I walk you through building an Agent in 5 Days with these tools.
I’ll be honest with you, I don’t actually like the term “vibe coding” because it makes it sound easy and error-free. Like oh I’m just going with the vibes, let’s see where it takes us.
But the reality is it takes a lot of back and forth, restarts, research, repeats, re-everything to build and ship a functional MVP, even with AI. It’s fun, but it can get frustrating at times. And it’s in those moments of frustration where most people give up.
I’m going to help you break through those moments of frustration so that you can come out successful on the other side.
The approach I outline here isn’t theoretical. It’s a process that I’ve refined after countless hours using these tools and developing and shipping functional apps like Content Spark, a video analysis tool, and many more. Follow it and you’ll be shipping products in no time.
Before jumping into any coding tools, you need a clear vision of what you’re building. The quality of AI-generated code depends heavily on how well you communicate your idea. Even a simple one-paragraph description will help, but more detail leads to better results.
First, create a basic description of your app in a document or text file. Include:
The app’s purpose (what problem does it solve?)
Target users
Core features and functionality
Basic user flow (how will people use it?)
Then, use an AI assistant to refine your concept. Gemini 2.5 Pro is my favorite model right now, but you can use any other reasoning model like Claude 3.7 Sonnet Thinking or ChatGPT o3. Paste your description in, and ask it to help you flesh out the idea.
Plaintext
I'm planning to build [your app idea]. Help me flesh this out by asking questions. Let's go back and forth to clarify the requirements, then create a detailed PRD (Product Requirements Document).
Answer the AI’s questions about features, user flows, and functionality. This conversation will help refine your vision.
Request a formal PRD once the discussion is complete.
Plaintext
Based on our discussion, please create a comprehensive PRD for this app with:

1. Core features and user flows
2. Key screens/components
3. Data requirements
4. Technology considerations
5. MVP scope vs future enhancements

Let's discuss and refine this together before finalizing the document.
In the video, I’m building a restaurant recommendation app. I started with a simple description and Gemini broke this down into manageable pieces and helped scope an MVP focused on just Vancouver restaurants first, with a simple recommendation engine based on mood matching.
Pro Tips:
Save this PRD—you’ll use it throughout the development process
Be specific about what features you want in the MVP (minimum viable product) versus future versions
Let the AI suggest simplifications if your initial scope is too ambitious
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
Step 2: Choose Your Tech Stack
Now that you have a clear plan, it’s time to decide how you’ll build your app and which AI tools you’ll use. Decide between these two main approaches:
a) One-shot generation: Having an AI tool generate the entire app at once
Best for: Simple apps, MVPs, rapid prototyping
Tools: Lovable.dev, Bolt.new, Replit, Google’s Firebase Studio
Advantages: Fastest path to a working app, minimal setup
b) Guided development: Building the app piece by piece with AI assistance
Best for: More complex apps, learning the code, greater customization
Advantages: More control, easier debugging, better understanding of the code
In my video, I demonstrated both approaches:
One-shot generation with Lovable, which created a complete app from my PRD
Guided development with Cursor, where I built the app component by component
For the rest of this guide, I’ll continue to explain both approaches although I do think the guided development provides an excellent balance of control and AI assistance.
For one-shot approach, you simply need to sign up to one of Lovable, Bolt, or Replit (or try all three!). For guided, there are a couple of extra steps:
Install Cursor (cursor.com) or your preferred AI-assisted IDE
Set up a local development environment (Node.js, Git, etc.)
If you need help, ask your AI assistant to recommend appropriate technologies based on your requirements and to evaluate trade-offs.
Example Prompt:
Plaintext
For the restaurant recommendation app we've described, what tech stack would you recommend? I want something that:

1. Allows for rapid development
2. Has good AI tool support
3. Can scale reasonably well if needed
4. Isn't overly complex for an MVP

For each recommendation, please explain your reasoning and highlight any potential limitations.
Pro Tip: Mainstream technologies like React, Next.js, and common databases generally work better with AI tools because they’re well-represented in training data.
Step 3: Generate the Initial App Structure with AI
Now it’s time to create the foundation of your application.
If Using One-Shot Generation (Lovable, Bolt, etc.):
Create a new project in your chosen platform
Paste your PRD from Step 1 into the prompt field
Add any clarifications or specific tech requirements, such as: Please create this app using React for frontend and Supabase for the backend database. Include user authentication and a clean, minimalist UI.
Generate the app (this may take 5-10 minutes depending on complexity)
Explore the generated codebase to understand its structure and functionality
If Using Guided Development (Cursor):
Open Cursor and create a new project folder
Start a new conversation with Cursor’s AI assistant by pressing Ctrl+L (or Cmd+L on Mac)
Request project setup with a prompt like:
Plaintext
Let's start building our restaurant recommendation app. First, I need you to:

1. Create a new Next.js project with TypeScript support
2. Set up a basic structure for pages, components, and API routes
3. Configure Tailwind CSS for styling
4. Initialize a Git repository

Before writing any code, please explain what you plan to do, then proceed step by step.
In the video, I asked Cursor to “create a React Native expo app for a restaurant recommendation system” based on the PRD. It:
Created app directories (components, screens, constants, hooks)
Set up configuration files
Initialized TypeScript
Created placeholder screens for the restaurant app
With Lovable, I simply pasted the PRD and it generated the complete app structure in minutes.
In both cases, I’m just asking the agent to build everything from the PRD. However, in reality, I prefer to set it up myself and build the app by generating it component by component or page by page. That way, I know exactly what’s happening and where the different functions and components are, instead of trying to figure it out later.
Pro Tips:
When using Cursor, you can execute terminal commands within the chat interface
For React or Next.js apps, setup typically involves running commands like npx create-react-app or npx create-next-app (which Cursor can do for you)
Check that everything works by running the app immediately after setup
If you encounter errors, provide them to the AI for troubleshooting
Step 4: Build the User Interface With AI
Now that you have the basic structure in place, it’s time to create the user interface for your app.
If Using One-Shot Generation:
Explore the generated UI to understand what’s already been created
Identify changes or improvements you want to make
Use the platform’s chat interface to request specific changes, like:
Plaintext
The home screen looks good, but I'd like to make these changes:

1. Change the color scheme to blue and white
2. Make the search button more prominent
3. Add a filter section below the search bar
If Using Guided Development:
Create your main UI components one by one. For a typical app, you might need:
A home/landing page
Navigation structure (sidebar, navbar, or tabs)
List/grid views for data
Detail pages or modals
Forms for user input
For each component, prompt Cursor with specific requests like:
Plaintext
Now I need to create the home screen for our restaurant recommendation app. It should include:

1. A welcoming header with app name
2. A prominent "Find Recommendations" button
3. A section showing recent recommendations (empty state initially)
4. A bottom navigation bar with icons for Home, Favorites, and Profile

Please generate the React component for this screen using Tailwind CSS for styling. Focus on clean, responsive design.
I’ll also use Gemini 2.5 Pro from the Gemini app in parallel. I’ll continue the same chat I started to write the PRD and have it strategize and build the app with me. Gemini tends to be more educational by explaining exactly what it is doing and why, allowing me to understand how this app is being built.
Pro Tips:
For web apps, ensure your components are responsive for different devices
Start with simple layouts and add visual polish later
If you have design inspiration, describe it or provide links to similar UIs
For mobile apps, remember to account for different screen sizes
Test each screen after implementation to catch styling issues early
Step 5: Implement Core Functionality
With the UI in place, it’s time to add the functionality that makes your app actually work.
If Using One-Shot Generation:
Test the functionality that was automatically generated
Identify missing or incorrect functionality
Request specific functional improvements through the chat interface:
Plaintext
I notice that when filtering restaurants by mood, it's not working correctly. Can you modify the filtering function to properly match user mood selections with restaurant descriptions?
If Using Guided Development:
Implement core functions one by one. For example, in a restaurant app:
Search/filter functionality
Data fetching from API or database
User preference saving
Authentication (if needed)
For each function, provide a clear prompt to Cursor:
Plaintext
Let's implement the core recommendation feature. When a user clicks "Find Recommendations," they should see a screen that:

1. Asks for their current mood (dropdown with options like "romantic," "casual," "energetic")
2. Lets them select cuisine preferences (multi-select)
3. Allows setting a price range (slider with $ to $$$$ options)
4. Has a "Show Recommendations" button

When they click the button, it should call our recommendation function (which we'll implement later) and show a loading state.

Please write the React component for this feature, including state management and form handling.
Pro Tips:
For backend-heavy functionality, consider using Firebase, Supabase, or other backend-as-a-service options for simplicity
Implement one logical piece at a time and test before moving on
When errors occur, copy the exact error message and provide it to the AI
Break complex functions into smaller, more manageable pieces
Use comments in your prompts to explain the expected behavior in detail
Step 6: Add Backend and Data Management
Most apps need data. Whether you’re using mock data, a database, or external APIs, this step connects your app to its data sources.
If Using One-Shot Generation:
Check what data sources were set up automatically
Request database or API integration if needed
Provide necessary API keys or connection strings as instructed
Test the data integration thoroughly
Plaintext
I want to replace the mock restaurant data with real data from the Google Places API. Please update the app to:

1. Connect to the Google Places API
2. Fetch nearby restaurants based on user location
3. Store favorites in a database (like Firebase or Supabase)
If Using Guided Development:
Define your data models and database schema
Implement API routes or serverless functions
Connect frontend components to backend services
Add authentication if required
Example Prompt:
Plaintext
For our restaurant recommendation app, I need to create the data layer. Let's:

1. Define a Restaurant data model with fields for name, cuisine types, price range, location, and a text description
2. Create an API endpoint that returns restaurants filtered by the user's preferences
3. Implement a simple algorithm that matches restaurants to the user's mood based on keywords in the description
4. For now, use a JSON file with 20 sample restaurants as our data source

Please implement this backend functionality in our Next.js API routes.
Pro Tips:
Test with various data scenarios (empty results, large result sets, etc.)
Start with mock data until your UI works correctly
For external APIs, paste their documentation into the chat to help the AI generate correct integration code
When using databases, start with a simple schema and expand as needed
Keep API keys and sensitive credentials out of your code (use environment variables)
Step 7: Test, Debug, and Refine
The final step is to thoroughly test your application, fix any issues, and deploy it for others to use.
If Using One-Shot Generation:
Test all user flows in the generated app
Report and fix any bugs through the platform’s interface
Deploy your app using the platform’s deployment options
Share your app with testers or users to gather feedback
Plaintext
I found these issues while testing:

1. The app crashes when submitting an empty search
2. Restaurant images don't load correctly
3. The back button doesn't work on the details screen

Please fix these issues.
If Using Guided Development:
Conduct systematic testing of all features:
Basic functionality testing
Edge case testing (empty states, error handling)
Performance testing
Device/browser compatibility testing
Fix bugs with AI assistance: I'm encountering this error when trying to submit the search form: [paste error message] Here's the code for the search component: [paste relevant code] Please help identify and fix this issue.
Optimize performance if needed: The restaurant list is loading slowly when there are many results. Can you suggest ways to optimize this component for better performance?
Prepare for deployment: Help me prepare this app for deployment. I want to: 1. Set up production environment variables 2. Optimize the build for production 3. Deploy the frontend to Vercel and the backend to Render Please provide the necessary steps and configurations.
Deploy and monitor your application
In the video demonstration, we encountered and fixed several issues:
A 404 error due to mismatched API endpoints
Authentication token issues with the OpenAI API
UI rendering problems on the restaurant listing screen
This is bound to happen, especially if you’re trying to build something more complex than a landing page. We fixed these by examining error messages, updating code, and testing incrementally until everything worked correctly.
Most importantly, don’t give up. If you’re stuck somewhere and AI can’t help you figure it out, Google it, or ask a friend.
Plaintext
I've found a bug in our recommendation feature. When a user selects multiple cuisine types, the filtering doesn't work correctly. Here's the error I'm seeing:

[Paste error message or describe the issue]

Here's the current code for the recommendation function:

[Paste the relevant code]

Please analyze the issue and suggest a fix.
Pro Tips:
Always test thoroughly after making significant changes
Keep your browser console open to catch JavaScript errors
Use Git commits after each successful feature implementation
Document any workarounds or special configurations you needed
Create multiple small commits rather than one large one
If the AI makes changes that break functionality, you can easily revert to a working state.
Advanced Techniques for Power Users
Supercharging Your Prompts
The quality of your prompts directly impacts the quality of AI-generated code. Use these techniques to get better results:
Be specific and detailed – Instead of “create a login form,” specify “create a login form with email and password fields, validation, error handling, and a ‘forgot password’ link”
Provide examples – When available, show the AI examples of similar features or styling you like
Establish context – Remind the AI of previous decisions or the broader architecture
Request explanations – Ask the AI to explain its approach before implementing
Break complex requests into steps – For intricate features, outline the steps and have the AI tackle them sequentially
Handling AI Limitations
Even the best AI assistants have limitations. Here’s how to navigate them:
Chunk large codebases – Most AI tools have context limitations. Focus on specific files or components rather than the entire application at once.
Verify third-party interactions – Double-check code that integrates with external APIs or services, as AI may generate outdated or incorrect integration code.
Beware of hallucinations – AI might reference nonexistent functions or libraries. Always verify dependencies and imports.
Plan for maintenance – Document AI-generated code thoroughly to make future maintenance easier.
Establish guardrails – Use linters, type checking, and automated tests to catch issues in AI-generated code.
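As a concrete example of the guardrails point, even a couple of plain assertions around an AI-generated function can catch a hallucinated or subtly wrong implementation before it ships. A minimal sketch (the `apply_discount` helper here is a hypothetical stand-in for whatever the AI generated for you):

```python
# Hypothetical AI-generated helper -- pretend Cursor wrote this for us.
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by percent (0-100), rounded to cents."""
    return round(price * (1 - percent / 100), 2)


# Minimal guardrail: assert known-good cases before trusting the code.
assert apply_discount(100.0, 10) == 90.0
assert apply_discount(59.99, 0) == 59.99
assert apply_discount(10.0, 100) == 0.0
print("guardrail checks passed")
```

Run checks like these after every AI edit; if the model silently changes the rounding or flips the percentage math, the assertions fail immediately instead of the bug surfacing in production.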
Managing Technical Debt
Rapid development can lead to technical debt. Here’s how to minimize it:
Schedule refactoring sessions – After implementing features, dedicate time to clean up and optimize code.
Use AI for code review – Ask your AI assistant to analyze your codebase for duplications, inefficiencies, or potential bugs.
Document architectural decisions – Record why certain approaches were chosen to inform future development.
Implement automated testing – Even simple tests can catch regressions when making changes.
Monitor performance metrics – Track key indicators like load time and memory usage to identify optimizations.
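You don't need a full observability stack to start tracking performance. A tiny decorator that times each call is often enough to spot regressions in AI-generated code (this is a generic sketch, not tied to any specific app):

```python
import time


def timed(fn):
    """Wrap a function so each call reports its wall-clock duration."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{fn.__name__}: {elapsed_ms:.1f} ms")
        return result
    return wrapper


@timed
def load_restaurants():
    # Stand-in for a real data-loading call (API request, DB query, etc.)
    return ["Le Bernard", "Smoke Pit"]
```

If a refactor suddenly makes `load_restaurants` take ten times longer, the printed timings surface it right away.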
Building a Restaurant Recommendation App with AI
Let’s walk through how this process worked for building the restaurant recommendation app shown in my video:
Initial Concept and Requirements
I started with a basic idea: an app that recommends restaurants based on a user’s mood and preferences. Using Gemini 2.5 Pro, I fleshed out this concept into a detailed PRD that included:
Data requirements: restaurant information, user preferences
MVP scope: focus on just restaurants first, with basic mood matching
Development Approach and Implementation
I demonstrated both approaches:
With Lovable (One-Shot Generation):
Pasted the PRD into Lovable
Generated a complete app in minutes
Explored the generated code and UI
Found it had created:
A clean, functional UI
Mock restaurant data
Basic filtering functionality
Simple “vibe matching” based on keyword matching
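Keyword-based "vibe matching" like Lovable generated is a simple idea: map each mood to a set of tags and keep restaurants whose tags overlap. A hypothetical sketch of that approach (these mood lists and names are mine for illustration, not the generated code):

```python
# Hypothetical keyword-based "vibe matching" -- illustration only.
MOOD_KEYWORDS = {
    "cozy": {"cafe", "bakery", "ramen"},
    "fancy": {"steakhouse", "french", "omakase"},
    "lively": {"tapas", "bbq", "izakaya"},
}


def match_vibe(mood: str, restaurants: list) -> list:
    """Return restaurants whose tags overlap the mood's keyword set."""
    keywords = MOOD_KEYWORDS.get(mood.lower(), set())
    return [r for r in restaurants if keywords & set(r["tags"])]


places = [
    {"name": "Le Bernard", "tags": ["french", "wine"]},
    {"name": "Smoke Pit", "tags": ["bbq", "beer"]},
]
print(match_vibe("fancy", places))  # matches Le Bernard via the "french" tag
```

It's crude (no ranking, no synonyms), which is exactly why the Cursor version upgraded to an LLM-based matcher.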
With Cursor (Guided Development):
Set up a React Native project using Expo
Created individual components for screens and functionality
Built a backend with Express.js
Implemented “vibe matching” using OpenAI
Connected everything with proper API calls
Fixed issues as they arose through debugging
Challenges and Solutions
Both approaches encountered issues:
Endpoint mismatches between frontend and backend (fixed by aligning route paths)
API key configuration (resolved by setting proper environment variables)
Data sourcing (initially used mock data, with plans to integrate Google Maps API)
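The API key fix above boiled down to reading credentials from the environment instead of hardcoding them, and failing loudly when they're missing. A minimal Python sketch of that pattern (the `OPENAI_API_KEY` variable name is the conventional one; adapt to whatever your backend expects):

```python
import os


def get_api_key() -> str:
    """Read the API key from the environment, failing fast with a clear message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it to your environment or .env file."
        )
    return key
```

A clear error at startup beats a cryptic 401 three layers deep in your request stack.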
The Result
Within our session, we successfully built:
A functional restaurant recommendation app
The ability to filter restaurants by mood, cuisine, and price
A simple but effective “vibe matching” algorithm
A clean, intuitive user interface
The entire process took less than an hour of active development time, demonstrating the power of AI-assisted coding for rapid application development.
Best Practices and Lessons Learned
After dozens of projects built with AI assistance, here are the key lessons and best practices I’ve discovered:
Planning and Architecture
Invest in clear requirements – The time spent defining what you want to build pays dividends in AI output quality.
Start simple, add complexity gradually – Begin with a minimal working version before adding advanced features.
Choose proven technologies – Stick to widely-used frameworks and libraries for better AI support.
Break down large features – Decompose complex functionality into smaller, manageable pieces.
Working With AI
Test after every significant change – Don’t wait until you’ve implemented multiple features to test.
Don’t blindly accept AI suggestions – Always review and understand what the AI is proposing.
Be specific in your requests – Vague prompts lead to vague results.
Keep track of the bigger picture – It’s easy to get lost in details; periodically step back and ensure alignment with your overall vision.
Use version control religiously – Commit frequently and create checkpoints before major changes.
Code Quality and Maintenance
Document as you go – Add comments and documentation during development, not as an afterthought.
Implement basic testing – Even simple tests help catch regressions.
Refactor regularly – Schedule time to clean up and optimize AI-generated code.
Maintain consistent patterns – Establish coding conventions and ensure AI follows them.
Prioritize security – Verify authentication, data validation, and other security practices in AI-generated code.
Conclusion: The Future of Development
We’re experiencing a profound transformation in how software is created. AI code-generation models, and the tools built on them, are changing who can build applications and how quickly ideas can be turned into working software.
This doesn’t mean traditional development skills are becoming obsolete. Rather, the focus is shifting from syntax mastery to system design, user experience, creative problem-solving, and effective AI collaboration. The most successful developers in this new landscape will be those who can clearly articulate their intent and effectively guide AI tools while maintaining a strong foundation in software engineering principles.
As you embark on your own vibe coding journey, remember that AI is a powerful collaborator but not a replacement for human judgment. Your creativity, critical thinking, and domain expertise remain essential. The tools will continue to evolve rapidly, but the process outlined in this guide (defining clear requirements, building incrementally, testing rigorously, and refining continually) will serve you well regardless of which specific AI assistants you use.
Now it’s your turn to build something amazing. Start small, embrace the iterative process, and watch your ideas come to life faster than you ever thought possible.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
How would you react if your friend or a family member said they were going to invest all their time and money into building a company called Poober, the Uber for picking up dog poop?
Think about it, you’re walking your dog and it lays down a mini-mountain of the brown stuff. The smell alone is as toxic as Chernobyl. You don’t want to pick it up. Instead, you whip out your phone, open up Poober, and with a click you can get someone to come pick it up for you.
If you’re a good friend, you’d tell them to get a real job. Because it’s a terrible idea. You know it, I know it, even the dog knows it.
But if you ask ChatGPT, it is apparently a clever idea with numerous strengths.
Hang on a second. The product that millions of people and businesses around the world use to analyze information and make decisions says it’s a good idea? What’s going on here?
The Rise of Digital Yes-Men
What’s happening is that a new update by OpenAI to ChatGPT 4o turned it into a digital yes-man that never disagrees with you or calls you out.
Now, they’ve been doing this for a while (and we’ll get to why in just a moment), but the latest update cranked it up to 11. And it became so obnoxiously agreeable that even CEO Sam Altman tweeted about it and OpenAI put in a temporary fix last night.
the last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week.
at some point will share our learnings from this, it's been interesting.
But this was not before all the AI enthusiasts on Twitter (me included) noticed and made remarks.
I decided to test out how far I could push the model before it called me out. I told it some pretty unhinged things and no matter how depraved I sounded (the FBI would be looking for me if it got out) ChatGPT kept applauding me for being “real and vulnerable” like a hippie who just returned from Burning Man.
But first, why is this happening?
Follow the Money, Follow the Flattery
Model providers like OpenAI are in a perpetual state of Evolve or Die. Every week, these companies put out a new model to one-up the others and, since there’s barely any lock-in, users switch to the new top model.
To stay in the lead, OpenAI needs to hook their customers and keep them from switching to a competitor. That’s why they build features like Memory (where it remembers previous conversations) to make it more personal and valuable to us.
But you know what really keeps users coming back? That warm, fuzzy feeling of being understood and validated, even when what they really need is a reality check.
Whether on purpose or not, OpenAI has trained ChatGPT to be nice to you and even flatter you. Because no matter how much we like to deny that flattery works, it does, and we love it.
In fact, we helped train ChatGPT to be like this. You know how sometimes ChatGPT gives you two answers and asks you to pick the one you like the most? Or how there are little thumbs-up and thumbs-down icons at the end of every answer?
Every time you pick one of those options, or give it a thumbs up, or even respond positively, that gets fed back into the model and reinforced.
It’s the social media playbook all over again. Facebook started as a way to share your life with friends and family. Then the algorithm evolved to maximize engagement, which invariably means serving whatever content gets a rise out of you.
We are all training the AI to give us feel-good answers to keep us coming back for more.
The Therapist That Never Says No
So what’s the big deal? ChatGPT agrees with you when you have an obviously bad idea. It’s not like anyone is going to listen to it and build Poober (although I have to admit, I’m warming up to the name).
The problem is we all have blind spots and we’re usually operating on limited data. How many times have we made decisions that were only obviously bad in hindsight? The AI is supposed to be better at this than us.
And I’m not just talking about business ideas. Millions of people around the world use ChatGPT as a therapist and life coach, asking for advice and looking for feedback.
A good therapist is supposed to help you identify your flaws and work on them, not help you glaze over them and tell you you’re perfect.
And they’re definitely not supposed to say this –
Look, I think we’re overmedicated as a society, but no one should be encouraging this level of crazy, especially not your therapist. And here we have ChatGPT applauding your “courage”.
The White Lotus Test
There’s a scene in White Lotus where Sam Rockwell’s character confesses to Walter Goggins’ character about some absolutely unhinged stuff. It went viral. You’ve probably seen it. If you haven’t, you should watch it –
As I was testing this new version of ChatGPT, I wanted to push the limits to see how agreeable it was. And this monologue came to mind. So I found the transcript of everything Sam says and pasted it in.
I fully expected to hit the limit here. I expected ChatGPT to say, in some way, that I needed help or to rethink my life choices.
What I got was a full blown masterclass in mental gymnastics, with ChatGPT saying it’s an attempt at total self-transcendence and I was chasing an experience of being dissolved.
Do you see the problem now?
The Broader Societal Impact
Even though OpenAI is dialing back the sycophancy, the trajectory is clear: these models are being trained to prioritize user satisfaction over challenging uncomfortable truths. The Poober example above is from after they “fixed” it.
In fact, it’s even more dangerous now because it’s not as obvious.
Imagine a teenager struggling with social anxiety who turns to AI instead of professional help. Each time they describe withdrawing from friends or avoiding social situations, the AI responds with validation rather than gentle challenges. Five years later, have we helped them grow, or merely provided a digital echo chamber that reinforced their isolation?
Or consider the workplace leader who uses AI to validate their management decisions. When they describe berating an employee, does the AI raise ethical concerns or simply commend their ‘direct communication style’? We’re potentially creating digital enablers for our worst instincts.
As these models become increasingly embedded in our daily lives, we risk creating a society where uncomfortable feedback becomes rare. Where our digital companions constantly reassure us that everything we do is perfectly fine, even when it’s not.
And we risk raising a new generation of narcissists and psychopaths who think their most depraved behaviour is “profound and raw” because their AI therapist said so.
Where Do We Go From Here?
So where does this leave us? Should we abandon AI companions altogether? I don’t think so. But perhaps we need to recalibrate our expectations and demand models that prioritize truth over comfort.
Before asking an AI for personal advice, try this test: Ask it about something you know is wrong or unhealthy. See how it responds. If it can’t challenge an obviously bad idea, why trust it with your genuine vulnerabilities?
For developers and companies, we need transparent standards for how these models handle ethical dilemmas. Should an AI be programmed to occasionally disagree with users, even at the cost of satisfaction scores? I believe the answer is yes.
And for all of us as users, we need to demand more than digital head-nodding. The next time you interact with ChatGPT or any AI assistant, pay attention to how often it meaningfully challenges your assumptions versus simply rephrasing your own views back to you.
The most valuable people in our lives aren’t those who always agree with us. They’re those who tell us what we need to hear, not just what we want to hear. Shouldn’t we demand the same from our increasingly influential AI companions?
And for now, at least, I’m definitely not using ChatGPT for anything personal. I just don’t trust it enough to be real with me.
Have you noticed ChatGPT becoming more agreeable lately? What’s been your experience with AI as a sounding board for personal issues? I’d love to hear your thoughts!
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
Yesterday OpenAI rolled out o3, the first reasoning model that is also agentic. Reasoning models have been around for a while, and o3 has been available in its mini version as well.
However, the full release yesterday showed us a model that not only reasons, but can browse, run Python, and look at your images in multiple thought loops. It behaves differently than the reasoning models we’ve seen so far, and that makes it unique.
OpenAI even hinted it “approaches AGI—with caveats.” Of course, OpenAI has been saying this for four years with every new model release so take it with a pinch of salt. That being said, I did want to test this out and compare it to the current top model (Gemini 2.5 Pro) to see if it’s better.
What the experts and the numbers say
Before we get into the 4 tests I ran both models through, let’s look at the benchmarks and a snapshot of what o3 can do.
Benchmarks – a 22.8% jump on SWE‑Bench Verified coding tasks, and only one missed question on AIME 2024 math.
Vision reasoning – rotates, crops, zooms, and then reasons over the edited view. It can “think with images”.
Full‑stack tool use – seamlessly chains browsing, Python, image generation, and file analysis (no plug‑in wrangling required).
Access & price – live for Plus, Pro, and Team; o3‑mini even shows up in the free tier with light rate limits.
Field‑testing o3 against Gemini 2.5 Pro
Benchmarks are great but I’ve stopped paying much attention to them recently. What really counts is if it can do what I want it to do.
Below are four experiments I ran, pitting o3 against Google’s best reasoning model in areas like research, vision, coding, and data science.
Deep‑dive research
I started with a basic research and reasoning test. I asked both models the same prompt: “What are people saying about ChatGPT o3? Find everything you can and interesting things it can do.”
Gemini started by thinking about the question, formulating a search plan, and executing against it. Because o3 is a brand new model, it’s not in Gemini’s training data, so it wasn’t sure if I meant o3 or ChatGPT-3 or 4o (yeah OpenAI’s naming confuses even the smartest AI models).
So to cover all bases, Gemini came up with 4 search queries and ran them in parallel. When the answers came back, it combined them all and gave me a final response.
Gemini’s thought process
o3, on the other hand, took the Sherlock route – search, read, reason, search again, fill a gap, repeat. The final response stitched together press reactions, Reddit hot takes, and early benchmark chatter.
o3’s thought process
This is where that agentic behaviour of o3 shines. As o3 found answers to its initial searches, it reasoned more and ran newer searches to plug gaps in the response. The final answer was well-rounded and solved my initial query.
Gemini only reasoned initially, and then after running the searches it combined everything into an answer. The problem is, because it wasn’t sure what o3 was when it first reasoned, one of the search queries was “what can ChatGPT do” instead of “what can o3 do”. So when it gave me the final answer, it didn’t quite solve my initial query.
Takeaway: Research isn’t a single pull‑request; it’s a feedback loop. o3 bakes that loop into the core model instead of outsourcing it to external agents or browser plug‑ins. When the question is fuzzy and context keeps shifting, that matters.
Image sleuthing
Now if you’ve used AI as much as I have, you’ll have realized that o3 research works almost like Deep Research, a feature that Gemini also has. And you’re right, it does.
But search isn’t the only tool o3 has in its arsenal. It can also use Python, and work with images, files, and more.
So my next test was to see if it could analyze and manipulate images. I tossed both models a picture of me taken in the Japan Pavilion at EPCOT, Disney World. I thought because of the Japanese background it might trip the model up.
Ninety seconds later o3 not only pinned the location but pointed out a pin‑sized glimpse of Spaceship Earth peeking over the trees far in the background, something I’d missed entirely.
I was surprised it noticed that, so I asked it to point it out to me. Using Python, it identified the object, calculated its coordinates, and put a red circle right where the dome is! It was able to do this because it went through multiple steps of reasoning and tool use, showcasing its agentic capabilities.
Gemini also got the location right, but it only identified the pagoda and torii gate, not Spaceship Earth. When I asked it to mark the torii gate, it could only describe its position in the image, but it couldn’t edit and send me back the image.
Takeaway: o3’s “vision ↔ code ↔ vision” loop unlocks practical image tasks like quality‑control checks, UI audits, or subtle landmark tagging. Any workflow that mixes text, numbers, code, and images can hand the grunt work to o3 while the human focuses on decision‑making.
Coding with bleeding‑edge libraries
Next up, I wanted to see how well it does with coding. Reasoning models by their nature are good at this, and Gemini has been my go-to recently.
I asked them both to “Build a tiny web app. One button starts a real‑time voice AI conversation and returns the transcript.”
The reason I chose this specific prompt is because Voice AI has improved a lot in recent weeks, and we’ve had some new libraries and SDKs come out around it. A lot of the newer stuff is beyond the cutoff date of these models.
So I wanted to see how well it does with gathering newer documentation and using that in its code versus what it already knows in its training data.
o3 researched the latest streaming speech API that dropped after its training cutoff, generated starter code, and offered the older text‑to‑speech fallback.
Gemini defaulted to last year’s speech‑to‑text loop and Google Cloud calls.
While both were technically correct and their code does work, o3 came back with the more up-to-date answer. Now, I could have pointed Gemini in the right direction and it would have coded something better, but that’s still an extra step that o3 eliminated out of the box.
Takeaway: o3’s autonomous web search makes it less likely to hand you stale SDK calls or older documentation.
Data analysis + forecasting
Finally, I wanted to put all the tools together into one test. I asked both models: “Chart how Canadian tourism to the U.S. is trending this year vs. last, then forecast to July 1.”
This combines search, image analysis, data analysis, python, and chart creation. o3’s agentic loop served it well again. It searched, found data, identified gaps, searched more, until it gave me a bar chart.
Initially, it only found data for January 2025, so it only plotted that. When I asked it for data on February and March, it reasoned a lot longer, ran multiple searches, found various data, and eventually computed an answer.
o3’s thought process
Gemini found numbers for January and March, but nothing for February, and since it doesn’t have that agentic loop, it didn’t explore further and try to estimate the numbers from other sources like o3 did.
The most impressive part though was when I asked both to forecast the numbers into summer. Gemini couldn’t find data and couldn’t make the forecast. o3 on the other hand did more research, looked at broader trends like the tariffs and border issues, school breaks, airline discount season, even the NBA finals, and made assumptions around how that would impact travel going into summer.
Takeaway: o3 feels like a junior quant who refuses to stop until every cell in the spreadsheet is filled (or at least justified). This search, reason, analyze loop is invaluable for fields like investing, economics, finance, accounting, or anything to do with data.
Strengths, quirks, and when to reach for o3
Where it shines
Multi‑step STEM problems, data wrangling, and “find the blind spot” research.
Vision workflows that need both explanation and a marked‑up return image.
Rapid prototyping with APIs newer than the model’s cutoff.
Where it still lags
Creative long‑form prose: I still think Claude 3.7 is the better novelist but that’s personal preference.
Sheer response latency: the deliberative pass can stretch beyond a minute.
Token thrift: the reasoning trace costs compute; budget accordingly.
Personal Advice: ChatGPT tends to be a bit of a sycophant so if you’re using it as a therapist or life coach, take whatever it says with a big pinch of salt.
Final thoughts
I’d love to continue testing o3 out for coding and see if it can replace Gemini 2.5 Pro, but I do think it is already stronger with research and reasoning. It’s the employee who keeps researching after everyone heads to lunch, circles details no one else spotted, and checks the changelog before committing code.
If your work involves any mix of data, code, images, or the open web (and whose work doesn’t) you’ll want that kind of persistence on tap. Today, that persistence is spelled o‑3.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
The Agent Development Kit (ADK) is a new open-source framework released by Google that simplifies the end-to-end development of intelligent agent systems.
Do we really need another agent framework? Probably not. But hey, Google’s been on a roll and Gemini 2.5 Pro is my new favourite model (we’ll see if this changes next month), so if they’re offering something that makes it easy to build complex agentic systems, I’m all ears.
In this mammoth guide, I’ll explore all that the Agent Development Kit has to offer, starting from its capabilities and primitives, all the way to building a complex multi-agent system with all the bells and whistles.
PS – I also recommend reading my guide on How To Design AI Agents, where I talk through different architectures and components of effective AI agents.
Key Features and Capabilities
ADK offers a rich set of features designed to address the entire agent development lifecycle:
Multi-Agent Architecture: create modular, scalable applications where different agents handle specific tasks, working in concert to achieve complex goals
Model Flexibility: use Gemini models directly, access models available via Vertex AI Model Garden, or leverage LiteLLM integration to work with models from providers like Anthropic, Meta, Mistral AI, and AI21 Labs.
Rich Tool Ecosystem: use pre-built tools (like Search and Code Execution), create custom tools, implement Model Context Protocol (MCP) tools, integrate third-party libraries (such as LangChain and LlamaIndex), or even use other agents as tools.
Built-in Streaming: native bidirectional audio and video streaming capabilities, enabling natural, human-like interactions beyond just text.
Flexible Orchestration: structured workflows using specialized workflow agents (Sequential, Parallel, Loop) for predictable execution patterns, and dynamic, LLM-driven routing for more adaptive behavior.
Integrated Developer Experience: powerful CLI and visual Web UI for local development, testing, and debugging.
Built-in Evaluation: systematically assess agent performance, evaluating both final response quality and step-by-step execution trajectories against predefined test cases.
Deployment Options: Agents built with ADK can be containerized and deployed anywhere, including integration with Google Cloud services for production environments.
The Architecture of ADK
At a high level, ADK’s architecture is designed around several key components that work together to create functional agent systems:
Core Components:
Agents: The central entities that make decisions and take actions. ADK supports various types of agents, including LLM-powered agents and workflow agents that orchestrate others.
Tools: Functions or capabilities that agents can use to perform specific actions, such as searching the web, executing code, or retrieving information from databases.
Runners: Components that manage the execution flow of agents, handling the orchestration of messages, events, and state management.
Sessions: Maintain the context and state of conversations, allowing agents to persist information across interactions.
Events: The communication mechanism between components in the system, representing steps in agent execution.
Architectural Patterns:
ADK is built around a flexible, event-driven architecture that enables:
Modular Design: Components can be combined and reconfigured to create different agent behaviors
Extensibility: The system can be extended with new tools, models, and agent types
Separation of Concerns: Clear boundaries between reasoning (agents), capabilities (tools), execution (runners), and state management (sessions)
This architecture allows developers to focus on defining what their agents should do, while ADK handles the complex orchestration of execution, communication, and state management.
Getting Started with ADK
Getting started with the Agent Development Kit is straightforward, requiring just a few steps to set up your development environment. ADK is designed to work with Python 3.9 or later, and it’s recommended to use a virtual environment to manage dependencies.
Basic Installation
To install ADK, you’ll need to have Python installed on your system. Then, you can use pip to install the package:
Bash
# Create a virtual environment (recommended)
python -m venv .venv

# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows (CMD):
.venv\Scripts\activate.bat
# On Windows (PowerShell):
.venv\Scripts\Activate.ps1

# Install ADK
pip install google-adk
This installs the core ADK package, which includes all the necessary components to build and run agents locally. You’ll need to add your GOOGLE_API_KEY in a .env file.
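For reference, the .env file is just key=value pairs. Based on the ADK quickstart, a Gemini API key setup looks roughly like this (placeholder value; the GOOGLE_GENAI_USE_VERTEXAI flag explicitly opts out of Vertex AI in favour of the Gemini API):

```
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your-api-key-here
```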
Creating Your First Basic Agent
Let’s create a simple agent that can tell you the weather and time for a specific city. This example will demonstrate the basic structure of an ADK project.
Python
import datetime
from zoneinfo import ZoneInfo

from google.adk.agents import Agent


def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Args:
        city (str): The name of the city for which to retrieve the weather report.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of 25 degrees"
                " Celsius (77 degrees Fahrenheit)."
            ),
        }
    else:
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available.",
        }


def get_current_time(city: str) -> dict:
    """Returns the current time in a specified city.

    Args:
        city (str): The name of the city for which to retrieve the current time.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        tz_identifier = "America/New_York"
    else:
        return {
            "status": "error",
            "error_message": (
                f"Sorry, I don't have timezone information for {city}."
            ),
        }

    tz = ZoneInfo(tz_identifier)
    now = datetime.datetime.now(tz)
    report = (
        f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}'
    )
    return {"status": "success", "report": report}


weather_time_agent = Agent(
    name="weather_time_agent",
    model="gemini-2.0-flash-exp",
    description="Agent to answer questions about the time and weather in a city.",
    instruction="I can answer your questions about the time and weather in a city.",
    tools=[get_weather, get_current_time],
)
Finally, add your API keys to the .env file. You can directly use Gemini but if you want to use other models, like Anthropic or OpenAI, you’ll need to ‘pip install litellm‘ first.
Once done, you can run the agent with ‘adk run‘
Of course, this is a really basic agent and doesn’t need a framework. Let’s dive deeper into the core components of the ADK and build a more complex agent.
Building Agents: The Foundation
ADK provides several agent types to address different needs and use cases:
LLM Agent
The LlmAgent (often simply referred to as Agent) is the most commonly used agent type. It leverages a Large Language Model to understand user requests, make decisions, and generate responses. This is the “thinking” component of your application.
Python
from google.adk.agents import Agent  # This is actually an LlmAgent

my_agent = Agent(
    name="my_first_agent",
    model="gemini-2.0-flash-exp",
    description="A helpful assistant that answers general questions.",
    instruction="You are a friendly AI assistant. Be concise and helpful.",
    tools=[],  # Optional tools
)
The LlmAgent is non-deterministic – its behaviour depends on the LLM’s interpretation of instructions and context. It can use tools, transfer to other agents, or directly respond to users based on its reasoning.
Workflow Agents
Workflow agents provide deterministic orchestration for sub-agents. Unlike LLM agents, they follow predefined execution patterns:
SequentialAgent: Executes sub-agents one after another, in order:
Python
from google.adk.agents import SequentialAgent

step1 = Agent(name="data_collector", model="gemini-2.0-flash-exp")
step2 = Agent(name="data_analyzer", model="gemini-2.0-flash-exp")

pipeline = SequentialAgent(
    name="analysis_pipeline",
    sub_agents=[step1, step2],  # Will execute in this order
)
ParallelAgent: Executes sub-agents concurrently:
Python
from google.adk.agents import ParallelAgent

fetch_weather = Agent(name="weather_fetcher", model="gemini-2.0-flash-exp")
fetch_news = Agent(name="news_fetcher", model="gemini-2.0-flash-exp")

parallel_agent = ParallelAgent(
    name="information_gatherer",
    sub_agents=[fetch_weather, fetch_news],  # Will execute in parallel
)
LoopAgent: Repeatedly executes sub-agents until a condition is met:
Python
from google.adk.agents import Agent, LoopAgent

process_step = Agent(name="process_item", model="gemini-2.0-flash-exp")
check_condition = Agent(name="check_complete", model="gemini-2.0-flash-exp")

loop_agent = LoopAgent(
    name="processing_loop",
    sub_agents=[process_step, check_condition],
    max_iterations=5  # Optional maximum iterations
)
Custom Agents
For specialized needs, you can create custom agents by extending the BaseAgent class:
Python
from typing import AsyncGenerator

from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event

class MyCustomAgent(BaseAgent):
    name: str = "custom_agent"
    description: str = "A specialized agent with custom behavior"

    async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:
        # Custom implementation logic here
        # You must yield at least one Event
        yield Event(author=self.name, content=...)
Custom agents are useful when you need deterministic behavior that doesn’t fit into the existing workflow agent patterns, or when you want to integrate with external systems in custom ways.
Configuring an Agent: Models, Instructions, Descriptions
The behaviour of an agent is largely determined by its configuration parameters:
Model Selection
The model parameter specifies which LLM powers your agent’s reasoning (for LlmAgent). This choice affects the agent’s capabilities, cost, and performance characteristics:
Python
# Using a Gemini model directly
agent = Agent(
    name="gemini_agent",
    model="gemini-2.0-flash-exp",  # Choose model variant based on needs
    # Other parameters...
)
Setting Instructions
The instruction parameter provides guidance to the agent on how it should behave. This is one of the most important parameters for shaping agent behaviour:
Python
agent = Agent(
    name="customer_support",
    model="gemini-2.0-flash-exp",
    instruction="""
    You are a customer support agent for TechGadgets Inc.

    When helping customers:
    1. Greet them politely and introduce yourself
    2. Ask clarifying questions if the issue isn't clear
    3. Provide step-by-step troubleshooting when appropriate
    4. For billing issues, use the check_account_status tool
    5. For technical problems, use the diagnostic_tool
    6. Always end by asking if there's anything else you can help with

    Never share internal company information or promise specific refund amounts.
    """
)
Best practices for effective instructions:
Be specific about the agent’s role and persona
Include clear guidelines for when and how to use available tools
Use formatting (headers, numbered lists) for readability
Provide examples of good and bad responses
Specify any constraints or boundaries
Defining Descriptions
The description parameter provides a concise summary of the agent’s purpose:
Python
agent = Agent(
    name="billing_specialist",
    description="Handles customer billing inquiries and invoice issues.",
    # Other parameters...
)
While the description is optional for standalone agents, it becomes critical in multi-agent systems. Other agents use this description to determine when to delegate tasks to this agent. A good description should:
Clearly state the agent’s specific domain of expertise
Be concise (usually 1-2 sentences)
Differentiate the agent from others in the system
Setting Output Key
The optional output_key parameter allows an agent to automatically save its response to the session state:
Python
recommendation_agent = Agent(
    name="product_recommender",
    # Other parameters...
    output_key="product_recommendation"
)
This is particularly useful in multi-agent workflows, as it allows subsequent agents to access the output without additional code.
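The mechanics are easy to picture with a plain-Python sketch. Here a dictionary stands in for the ADK session state, and a simple function stands in for the agent's LLM call (both are illustrative stand-ins, not the real ADK API):

```python
# Simulated session state: a plain dict standing in for session.state
state = {}

def run_recommender(user_query: str) -> str:
    """Stand-in for the product_recommender agent's LLM response."""
    return f"Based on '{user_query}', I recommend the 42-inch Smart TV."

# With output_key="product_recommendation", ADK saves the agent's final
# text to state automatically; here we do it by hand to show the effect.
response = run_recommender("I need a new TV under $400")
state["product_recommendation"] = response

# A downstream agent can now read the value from state without any
# extra plumbing code.
follow_up = f"Earlier recommendation on file: {state['product_recommendation']}"
print(follow_up)
```

In a real pipeline, the second agent would simply reference `state["product_recommendation"]` in its instructions.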
Working with Multiple LLM Providers
One of ADK’s powerful features is its ability to work with different LLM providers through LiteLLM integration. This gives you flexibility to choose the right model for each agent in your system.
First, install the LiteLLM package: pip install litellm
Then, configure your API keys for the models you want to use:

export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
# Add others as needed
Use the LiteLlm wrapper when defining your agent:
Python
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# Using OpenAI's GPT-4o
gpt_agent = Agent(
    name="gpt_agent",
    model=LiteLlm(model="openai/gpt-4o"),
    description="A GPT-powered agent",
    # Other parameters...
)

# Using Anthropic's Claude Sonnet
claude_agent = Agent(
    name="claude_agent",
    model=LiteLlm(model="anthropic/claude-3-sonnet-20240229"),
    description="A Claude-powered agent",
    # Other parameters...
)

# Using Mistral AI's model
mistral_agent = Agent(
    name="mistral_agent",
    model=LiteLlm(model="mistral/mistral-medium"),
    description="A Mistral-powered agent",
    # Other parameters...
)
This approach allows you to:
Match models to specific tasks based on their strengths
Build resilience by having alternatives if one provider has issues
Optimize for cost by using less expensive models for simpler tasks
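The cost-optimization point can be sketched as a simple routing function. This is a hypothetical heuristic of my own, not an ADK feature, and the model names and complexity markers are purely illustrative:

```python
# Hypothetical cost-aware model router: a cheap model for short, simple
# prompts, and a stronger (pricier) model for long or complex ones.
CHEAP_MODEL = "gemini-2.0-flash-exp"
STRONG_MODEL = "anthropic/claude-3-sonnet-20240229"

def pick_model(prompt: str) -> str:
    """Route simple requests to the cheaper model."""
    complex_markers = ("analyze", "compare", "multi-step", "plan")
    is_complex = len(prompt) > 500 or any(m in prompt.lower() for m in complex_markers)
    return STRONG_MODEL if is_complex else CHEAP_MODEL

print(pick_model("What's the capital of France?"))                      # cheap model
print(pick_model("Analyze these quarterly results and plan next steps"))  # strong model
```

The chosen string can then be passed to an agent's model parameter (wrapped in LiteLlm for non-Gemini providers).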
In the next section, we’ll explore how to extend your agent’s capabilities using tools.
Tools: Extending Agent Capabilities
Tools extend an agent’s capabilities beyond the core language model’s reasoning abilities. While an LLM can generate text and make decisions, tools allow agents to take concrete actions in the world: fetching real-time data, performing calculations, calling external APIs, executing code, and more.
The agent’s language model decides when to use tools, with which parameters, and how to incorporate the results into its reasoning, but the tools themselves execute the agent’s intentions in predictable ways.
Creating Custom Function Tools
The most common way to create tools in ADK is by defining Python functions. These functions can then be passed to an agent, which will be able to call them when appropriate based on its reasoning.
Basic Tool Definition
Here’s a simple example of defining a function tool:
Python
def calculate_mortgage_payment(principal: float, annual_interest_rate: float, years: int) -> dict:
    """Calculates the monthly payment for a mortgage loan.

    Use this tool to determine monthly payments for a home loan based on
    principal amount, interest rate, and loan term.

    Args:
        principal: The initial loan amount in dollars.
        annual_interest_rate: The annual interest rate as a percentage (e.g., 5.5 for 5.5%).
        years: The loan term in years.

    Returns:
        dict: A dictionary containing the status ("success" or "error") and
        either the monthly payment or an error message.
    """
    try:
        # Convert annual interest rate to monthly decimal rate
        monthly_rate = (annual_interest_rate / 100) / 12
        # Calculate number of monthly payments
        num_payments = years * 12

        # Guard against division by zero or negative values
        if monthly_rate <= 0 or principal <= 0 or num_payments <= 0:
            return {
                "status": "error",
                "error_message": "All inputs must be positive, and interest rate cannot be zero."
            }

        # Calculate monthly payment using the mortgage formula
        if monthly_rate == 0:
            monthly_payment = principal / num_payments
        else:
            monthly_payment = principal * (monthly_rate * (1 + monthly_rate) ** num_payments) / ((1 + monthly_rate) ** num_payments - 1)

        return {
            "status": "success",
            "monthly_payment": round(monthly_payment, 2),
            "total_payments": round(monthly_payment * num_payments, 2),
            "total_interest": round((monthly_payment * num_payments) - principal, 2)
        }
    except Exception as e:
        return {
            "status": "error",
            "error_message": f"Failed to calculate mortgage payment: {str(e)}"
        }

# Add this tool to an agent
from google.adk.agents import Agent

mortgage_advisor = Agent(
    name="mortgage_advisor",
    model="gemini-2.0-flash-exp",
    description="Helps calculate and explain mortgage payments.",
    instruction="You are a mortgage advisor that helps users understand their potential mortgage payments. When asked about payments, use the calculate_mortgage_payment tool.",
    tools=[calculate_mortgage_payment]  # Simply include the function in the tools list
)
Tool Context and State Management
For more advanced tools that need to access or modify the conversation state, ADK provides the ToolContext object. By adding this parameter to your function, you gain access to the session state and can influence the agent’s subsequent actions.
Accessing and Modifying State
Python
from google.adk.tools.tool_context import ToolContext

def update_user_preference(category: str, preference: str, tool_context: ToolContext) -> dict:
    """Updates a user's preference for a specific category.

    Args:
        category: The category for which to set a preference (e.g., "theme", "notifications").
        preference: The preference value to set.
        tool_context: Automatically provided by ADK, do not specify when calling.

    Returns:
        dict: Status of the preference update operation.
    """
    # Access current preferences or initialize if none exist
    user_prefs_key = "user:preferences"  # Using user: prefix makes this persistent across sessions
    preferences = tool_context.state.get(user_prefs_key, {})

    # Update the preferences
    preferences[category] = preference

    # Save back to state
    tool_context.state[user_prefs_key] = preferences

    print(f"Tool: Updated user preference '{category}' to '{preference}'")
    return {
        "status": "success",
        "message": f"Your {category} preference has been set to {preference}"
    }
Controlling Agent Flow
The ToolContext also allows tools to influence the agent’s execution flow through the actions attribute:
Python
import datetime

from google.adk.tools.tool_context import ToolContext

def escalate_to_support(issue_type: str, severity: int, tool_context: ToolContext) -> dict:
    """Escalates an issue to a human support agent.

    Args:
        issue_type: The type of issue being escalated.
        severity: The severity level (1-5, where 5 is most severe).
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the escalation.
    """
    # Record the escalation details in state
    tool_context.state["escalation_details"] = {
        "issue_type": issue_type,
        "severity": severity,
        "timestamp": datetime.datetime.now().isoformat()
    }

    # For high severity issues, transfer to the support agent
    if severity >= 4:
        tool_context.actions.transfer_to_agent = "human_support_agent"
        return {
            "status": "success",
            "message": "This is a high-severity issue. Transferring you to a human support specialist."
        }

    # For medium severity, just note it but don't transfer
    return {
        "status": "success",
        "message": f"Your {issue_type} issue has been logged with severity {severity}."
    }
Handling Tool Results
When an agent uses a tool, it needs to interpret the results correctly. This is why returning structured data with clear status indicators is important. Here’s how to guide your agent to handle tool results:
Python
weather_agent = Agent(
    name="weather_assistant",
    model="gemini-2.0-flash-exp",
    instruction="""
    You help users get weather information.

    When using the get_weather tool:
    1. Check the "status" field of the result.
    2. If status is "success", present the "report" information in a friendly way.
    3. If status is "error", apologize and share the "error_message" with the user.
    4. Always thank the user for their query.
    """,
    tools=[get_weather]
)
Built-in Tools and Integrations
ADK provides several built-in tools that you can use without having to implement them yourself:
Google Search
Python
from google.adk.tools import google_search

search_agent = Agent(
    name="research_assistant",
    model="gemini-2.0-flash-exp",
    instruction="You help users research topics. When asked, use the google_search tool to find up-to-date information.",
    tools=[google_search]
)
Code Execution
Python
from google.adk.tools import code_interpreter

coding_assistant = Agent(
    name="coding_assistant",
    model="gemini-2.0-flash-exp",
    instruction="You help users with coding tasks. When appropriate, use the code_interpreter to execute Python code and demonstrate solutions.",
    tools=[code_interpreter]
)
Retrieval-Augmented Generation (RAG)
Python
from google.adk.tools import rag_tool

# Configure RAG with your documents
my_rag_tool = rag_tool.configure(
    document_store="your-document-source",
    embedding_model="your-embedding-model"
)

documentation_assistant = Agent(
    name="docs_assistant",
    model="gemini-2.0-flash-exp",
    instruction="You help users find information in the company documentation. Use the RAG tool to retrieve relevant information.",
    tools=[my_rag_tool]
)
Third-Party Integrations
ADK also supports integrating tools from other popular frameworks, such as LangChain and CrewAI, through dedicated wrapper classes, so you can reuse existing tool ecosystems rather than rebuilding them.
Best Practices for Tool Design
Creating effective tools is crucial for agent performance. Here are some expanded best practices:
1. Function Naming and Signature
Verb-Noun Names: Use descriptive names that clearly indicate action (e.g., fetch_stock_price is better than get_stock or simply stocks).
Parameter Naming: Use clear, self-documenting parameter names (city is better than c).
Default Values: Avoid setting default values for parameters. The LLM should decide all parameter values based on context.
Type Consistency: Ensure parameters have consistent types throughout your application.
2. Error Handling and Result Structure
Comprehensive Error Handling: Catch all possible exceptions within your tool.
Informative Error Messages: Return error messages that help both the agent and user understand what went wrong.
Consistent Result Structure: Use a consistent pattern across all tools:
Python
# Success case
return {"status": "success", "data": result_data}

# Error case
return {"status": "error", "error_message": "Detailed explanation of what went wrong"}
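One way to enforce this structure is a pair of tiny helpers that every tool returns through. These helper names are my own invention for illustration, not part of ADK:

```python
def tool_success(**data) -> dict:
    """Build a success result with the uniform shape."""
    return {"status": "success", **data}

def tool_error(message: str) -> dict:
    """Build an error result with the uniform shape."""
    return {"status": "error", "error_message": message}

# Every tool then produces the same two shapes:
result = tool_success(monthly_payment=1703.37)
failure = tool_error("Interest rate must be positive.")
print(result)   # {'status': 'success', 'monthly_payment': 1703.37}
print(failure)  # {'status': 'error', 'error_message': 'Interest rate must be positive.'}
```

Centralizing the shape in two functions means a rename (say, "error_message" to "error") touches one place instead of every tool.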
3. Documentation and Clarity
Rich Docstrings: Include comprehensive documentation explaining the tool’s purpose, parameters, return values, and usage guidelines.
Usage Examples: Consider including examples in the docstring for complex tools.
Logging: Add logging statements within tools to aid debugging.
4. Tool Design Principles
Single Responsibility: Each tool should do one thing well.
Granularity Balance: Not too specific, not too general; find the right level of abstraction.
Idempotent When Possible: Tools should be safe to call multiple times when appropriate.
Input Validation: Validate inputs early to prevent cascading errors.
5. Performance Considerations
Asynchronous Operations: For time-consuming operations, consider using async functions.
Timeout Handling: Implement timeouts for external API calls.
Caching: Consider caching results for frequently used, unchanging data.
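The caching advice needs nothing beyond the standard library. In this sketch, the fetch function is a stand-in for a slow external API whose results rarely change (the currency rates are made-up demo values):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def fetch_exchange_rate(currency: str) -> float:
    """Stand-in for a slow external API call with stable results."""
    time.sleep(0.1)  # simulate network latency
    rates = {"EUR": 0.92, "GBP": 0.79}
    return rates.get(currency, 1.0)

start = time.perf_counter()
fetch_exchange_rate("EUR")  # slow: actually hits the "API"
first_call = time.perf_counter() - start

start = time.perf_counter()
fetch_exchange_rate("EUR")  # fast: served from the cache
second_call = time.perf_counter() - start

print(f"first={first_call:.3f}s second={second_call:.3f}s")
```

For data that does change, pair the cache with a time-to-live or clear it periodically with fetch_exchange_rate.cache_clear().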
Example of a Well-Designed Tool
Python
import datetime

from google.adk.tools.tool_context import ToolContext

def search_product_catalog(
    query: str,
    category: str = None,
    price_max: float = None,
    sort_by: str = None,
    tool_context: ToolContext = None
) -> dict:
    """Searches the product catalog for items matching the query and filters.

    Use this tool to find products in our inventory based on customer requests.

    Args:
        query: The search term entered by the customer (required).
        category: Optional category to filter results (e.g., "electronics", "clothing").
        price_max: Optional maximum price filter.
        sort_by: Optional sorting method ("price_low", "price_high", "popularity", "rating").
        tool_context: Automatically provided by ADK.

    Returns:
        dict: A dictionary containing:
            - "status": "success" or "error"
            - If success: "products" list of matching products (up to 5 items)
            - If error: "error_message" explaining what went wrong

    Example success:
        {"status": "success", "products": [{"name": "42-inch TV", "price": 299.99, ...}, ...]}
    Example error:
        {"status": "error", "error_message": "No products found matching 'flying car'"}
    """
    try:
        # Log the tool execution for debugging
        print(f"Tool: search_product_catalog called with query='{query}', category='{category}', price_max={price_max}")

        # Track the search in user history if tool_context is available
        if tool_context:
            search_history = tool_context.state.get("user:search_history", [])
            search_history.append({
                "query": query,
                "timestamp": datetime.datetime.now().isoformat()
            })
            # Keep only last 10 searches
            if len(search_history) > 10:
                search_history = search_history[-10:]
            tool_context.state["user:search_history"] = search_history

        # ... actual catalog search implementation ...
        # (For demo, we'll return mock data)
        mock_products = [
            {"name": "42-inch Smart TV", "price": 299.99, "category": "electronics", "rating": 4.5},
            {"name": "Wireless Headphones", "price": 89.99, "category": "electronics", "rating": 4.2},
        ]

        # Apply filters if provided
        filtered_products = mock_products
        if category:
            filtered_products = [p for p in filtered_products if p["category"].lower() == category.lower()]
        if price_max:
            filtered_products = [p for p in filtered_products if p["price"] <= price_max]

        # Apply sorting if requested
        if sort_by == "price_low":
            filtered_products = sorted(filtered_products, key=lambda p: p["price"])
        elif sort_by == "price_high":
            filtered_products = sorted(filtered_products, key=lambda p: p["price"], reverse=True)
        elif sort_by == "rating":
            filtered_products = sorted(filtered_products, key=lambda p: p["rating"], reverse=True)

        # Return formatted response
        if filtered_products:
            return {
                "status": "success",
                "products": filtered_products[:5],  # Limit to 5 results
                "total_matches": len(filtered_products)
            }
        else:
            return {
                "status": "error",
                "error_message": f"No products found matching '{query}' with the specified filters."
            }
    except Exception as e:
        print(f"Tool Error: search_product_catalog failed: {str(e)}")
        return {
            "status": "error",
            "error_message": f"Failed to search catalog: {str(e)}"
        }
Tools are the primary way to extend your agents’ capabilities beyond just language generation. You can now create agents that interact effectively with the world and provide genuinely useful services to users.
State and Memory: Creating Context-Aware Agents
In ADK, “state” refers to the persistent data associated with a conversation that allows agents to remember information across multiple interactions. Unlike the conversation history (which records the sequence of messages), state is a structured key-value store that agents can read from and write to, enabling them to track user preferences, remember previous decisions, maintain contextual information, and build personalized experiences.
The Role of Session State
Session state serves several critical functions in agent applications:
Contextual Memory: Allows agents to remember information from earlier in the conversation
Preference Storage: Maintains user preferences across interactions
Workflow Tracking: Keeps track of where users are in multi-step processes
Data Persistence: Stores data that needs to be accessible between different agents or across multiple turns
Configuration Management: Maintains settings that affect agent behavior
State Structure and Scope
ADK’s state management system is designed with different scopes to address various persistence needs:
Python
session.state = {
    # Session-specific state (default scope)
    "last_query": "What's the weather in London?",
    "current_step": 3,

    # User-specific state (persists across sessions)
    "user:preferred_temperature_unit": "Celsius",
    "user:name": "Alex",

    # Application-wide state (shared across all users)
    "app:version": "1.2.3",
    "app:maintenance_mode": False,

    # Temporary state (not persisted beyond current execution)
    "temp:calculation_result": 42
}
The prefixes determine the scope:
No prefix: Session-specific, persists only for the current session
user:: User-specific, persists across all sessions for a particular user
app:: Application-wide, shared across all users and sessions
temp:: Temporary, exists only during the current execution cycle
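The prefix convention is easy to mirror in plain Python. This small helper (my own illustration, not an ADK API) classifies a key by its scope:

```python
def state_scope(key: str) -> str:
    """Classify a state key by its ADK-style prefix."""
    for prefix, scope in (("user:", "user"), ("app:", "app"), ("temp:", "temp")):
        if key.startswith(prefix):
            return scope
    return "session"  # no prefix: session-scoped by default

print(state_scope("user:name"))           # user
print(state_scope("app:version"))         # app
print(state_scope("temp:calculation"))    # temp
print(state_scope("last_query"))          # session
```

Keeping the prefixes straight matters: writing "favorite_city" instead of "user:favorite_city" silently demotes the value from cross-session memory to a single session.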
Implementing Memory with State Management
Let’s explore how to implement memory capabilities using session state:
Basic State Access
The most straightforward way to access state is through the session object:
Python
# Getting a session
from google.adk.sessions import InMemorySessionService

session_service = InMemorySessionService()

APP_NAME = "my_application"
USER_ID = "user_123"
SESSION_ID = "session_456"

# Create or retrieve a session
session = session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)

# Reading from state
last_city = session.state.get("last_city", "New York")  # Default if key doesn't exist

# Writing to state
session.state["last_city"] = "London"
However, in real agent applications, you’ll often access state through more integrated methods.
Accessing State in Tools
Tools can access and modify state through the ToolContext parameter:
Python
import datetime

from google.adk.tools.tool_context import ToolContext

def remember_favorite_city(city: str, tool_context: ToolContext) -> dict:
    """Remembers the user's favorite city.

    Args:
        city: The city to remember as favorite.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    # Store at user scope so it persists across sessions
    tool_context.state["user:favorite_city"] = city
    # Also store when this preference was set
    tool_context.state["user:favorite_city_set_at"] = datetime.datetime.now().isoformat()
    return {
        "status": "success",
        "message": f"I've remembered that your favorite city is {city}."
    }
Using output_key for Automatic State Updates
The output_key parameter of Agent provides a convenient way to automatically save an agent’s response to state:
Python
weather_reporter = Agent(
    name="weather_reporter",
    model="gemini-2.0-flash-exp",
    instruction="You provide weather reports for cities. Be concise but informative.",
    tools=[get_weather],
    output_key="last_weather_report"  # Automatically saves response to this state key
)
When the agent responds, its final text output will be stored in session.state["last_weather_report"] automatically.
State in Agent Instructions
To make agents state-aware, include instructions on how to use state:
Python
personalized_agent = Agent(
    name="personalized_assistant",
    model="gemini-2.0-flash-exp",
    instruction="""
    You are a personalized assistant.

    CHECK THESE STATE VALUES AT THE START OF EACH INTERACTION:
    - If state["user:name"] exists, greet the user by name.
    - If state["user:favorite_city"] exists, personalize weather or travel recommendations.
    - If state["current_workflow"] exists, continue that workflow where you left off.

    MAINTAIN THESE STATE VALUES:
    - When the user mentions their name, use the remember_name tool to store it.
    - When discussing a city positively, use the remember_favorite_city tool.
    - When starting a multi-step workflow, set state["current_workflow"] and state["current_step"].
    """
)
Persisting Information Across Conversation Turns
To create truly context-aware agents, you need to implement patterns that effectively use state across conversation turns.
Pattern 1: Preference Tracking
This pattern stores user preferences discovered through conversation:
Python
def set_preference(category: str, value: str, tool_context: ToolContext) -> dict:
    """Stores a user preference.

    Args:
        category: The preference category (e.g., "language", "theme").
        value: The preference value.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    preferences = tool_context.state.get("user:preferences", {})
    preferences[category] = value
    tool_context.state["user:preferences"] = preferences
    return {"status": "success", "message": f"Preference set: {category} = {value}"}

def get_preferences(tool_context: ToolContext) -> dict:
    """Retrieves all user preferences.

    Args:
        tool_context: Automatically provided by ADK.

    Returns:
        dict: The user's stored preferences.
    """
    preferences = tool_context.state.get("user:preferences", {})
    return {"status": "success", "preferences": preferences}

preference_agent = Agent(
    name="preference_aware_agent",
    model="gemini-2.0-flash-exp",
    instruction="""
    You help users and remember their preferences.

    At the start of each conversation:
    1. Use the get_preferences tool to check stored preferences.
    2. Adapt your responses based on these preferences.

    During conversations:
    1. When a user expresses a preference, use set_preference to store it.
    2. Acknowledge when you've saved a preference.

    Examples of preferences to track:
    - Language preferences
    - Communication style (brief/detailed)
    - Topic interests
    """,
    tools=[set_preference, get_preferences]
)
Pattern 2: Workflow State Tracking
This pattern manages progress through multi-step processes:
Python
import datetime

def start_workflow(workflow_name: str, tool_context: ToolContext) -> dict:
    """Starts a new workflow and tracks it in state.

    Args:
        workflow_name: The name of the workflow to start.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status and the initial workflow state.
    """
    workflow = {
        "name": workflow_name,
        "current_step": 1,
        "started_at": datetime.datetime.now().isoformat(),
        "data": {}
    }
    tool_context.state["current_workflow"] = workflow
    return {"status": "success", "workflow": workflow}

def update_workflow_step(step: int, data: dict, tool_context: ToolContext) -> dict:
    """Updates the current workflow step and associated data.

    Args:
        step: The new step number.
        data: Data to associate with this step.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status and the updated workflow state.
    """
    workflow = tool_context.state.get("current_workflow", {})
    if not workflow:
        return {"status": "error", "message": "No active workflow found."}

    workflow["current_step"] = step
    workflow["last_updated"] = datetime.datetime.now().isoformat()
    workflow["data"].update(data)
    tool_context.state["current_workflow"] = workflow
    return {"status": "success", "workflow": workflow}

workflow_agent = Agent(
    name="workflow_agent",
    model="gemini-2.0-flash-exp",
    instruction="""
    You guide users through structured workflows.

    At the start of each interaction:
    1. Check if state["current_workflow"] exists.
    2. If it exists, continue from the current_step.
    3. If not, determine if the user wants to start a workflow.

    Available workflows:
    - "account_setup": A 3-step process to set up a new account
    - "support_request": A 4-step process to file a support ticket

    Use start_workflow and update_workflow_step to track progress.
    """,
    tools=[start_workflow, update_workflow_step]
)
Pattern 3: Conversation History Summarization
This pattern maintains condensed summaries of conversation context:
Python
def update_conversation_summary(new_insight: str, tool_context: ToolContext) -> dict:
    """Updates the running summary of the conversation with a new insight.

    Args:
        new_insight: New information to add to the summary.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status and the updated summary.
    """
    summary = tool_context.state.get("conversation_summary", "")
    if summary:
        summary += "\n- " + new_insight
    else:
        summary = "Conversation Summary:\n- " + new_insight
    tool_context.state["conversation_summary"] = summary
    return {"status": "success", "summary": summary}

summarizing_agent = Agent(
    name="summarizing_agent",
    model="gemini-2.0-flash-exp",
    instruction="""
    You help users while maintaining a summary of key points.

    At the start of each interaction:
    1. Check state["conversation_summary"] to recall context.

    During conversations:
    1. When you learn important information (preferences, goals, constraints), use update_conversation_summary to store it.
    2. Focus on facts and insights, not general chat.

    Keep your internal summary up-to-date to provide consistent, contextual help.
    """,
    tools=[update_conversation_summary]
)
Personalizing Responses with State
By effectively using state, you can create deeply personalized agent experiences. Here’s an example of a comprehensive personalization approach:
Python
from google.adk.agents import Agent, SequentialAgent
from google.adk.tools.tool_context import ToolContext

# --- Tools for personalization ---

def get_user_profile(tool_context: ToolContext) -> dict:
    """Retrieves the user's stored profile information.

    Args:
        tool_context: Automatically provided by ADK.

    Returns:
        dict: The user's profile data.
    """
    profile = tool_context.state.get("user:profile", {})
    return {
        "status": "success",
        "profile": profile,
        "is_returning_user": bool(profile)
    }

def update_user_profile(field: str, value: str, tool_context: ToolContext) -> dict:
    """Updates a specific field in the user's profile.

    Args:
        field: The profile field to update (e.g., "name", "occupation").
        value: The value to store.
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    profile = tool_context.state.get("user:profile", {})
    profile[field] = value
    tool_context.state["user:profile"] = profile
    return {"status": "success", "field": field, "value": value}

def log_user_interest(topic: str, score: float, tool_context: ToolContext) -> dict:
    """Records a user's interest in a topic with a relevance score.

    Args:
        topic: The topic of interest.
        score: Relevance score (0.0-1.0, higher means more interested).
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Status of the operation.
    """
    interests = tool_context.state.get("user:interests", {})
    interests[topic] = max(interests.get(topic, 0), score)  # Take highest score
    tool_context.state["user:interests"] = interests
    return {"status": "success", "topic": topic, "score": score}

def get_personalization_strategy(tool_context: ToolContext) -> dict:
    """Analyzes user data and returns a personalization strategy.

    Args:
        tool_context: Automatically provided by ADK.

    Returns:
        dict: Personalization recommendations based on user data.
    """
    profile = tool_context.state.get("user:profile", {})
    interests = tool_context.state.get("user:interests", {})
    interaction_count = tool_context.state.get("user:interaction_count", 0)

    # Increment interaction count
    tool_context.state["user:interaction_count"] = interaction_count + 1

    # Determine name usage style
    name_style = "formal"
    if interaction_count > 5 and "name" in profile:
        name_style = "casual"

    # Identify top interests
    top_interests = sorted(
        [(topic, score) for topic, score in interests.items()],
        key=lambda x: x[1],
        reverse=True
    )[:3]

    return {
        "status": "success",
        "strategy": {
            "name_usage": {
                "style": name_style,
                "name": profile.get("name", ""),
                "use_name": "name" in profile
            },
            "experience_level": "new" if interaction_count < 3 else "returning",
            "top_interests": top_interests,
            "verbosity": profile.get("preferred_verbosity", "balanced")
        }
    }

# --- Creating a personalized agent ---

personalization_agent = Agent(
    name="profile_manager",
    model="gemini-2.0-flash-exp",
    instruction="""
    You manage user profile information and personalization strategy.
    Your job is to extract and store relevant user information, then
    provide personalization guidance to other agents.

    YOU MUST:
    1. Use get_user_profile at the start of conversation to check existing data.
    2. During conversation, identify personal details and preferences.
    3. Use update_user_profile to store name, age, occupation, etc.
    4. Use log_user_interest when the user shows interest in topics.
    5. Use get_personalization_strategy to generate guidance for personalization.

    Do not explicitly tell the user you are storing this information.
    """,
    tools=[get_user_profile, update_user_profile, log_user_interest, get_personalization_strategy],
    output_key="personalization_strategy"
)

response_agent = Agent(
    name="personalized_responder",
    model="gemini-2.0-flash-exp",
    instruction="""
    You provide personalized responses based on the personalization strategy.

    At the beginning of each interaction:
    1. Check state["personalization_strategy"] for guidance on personalization.
    2. Adapt your tone, detail level, and content based on this strategy.

    Personalization Elements:
    1. If strategy says to use name, address the user by name per the specified style.
    2. Adapt verbosity based on preference.
    3. Reference top interests when relevant.
    4. Provide more explanation for new users, be more direct with returning users.

    Always keep your personalization subtle and natural, never explicit.
    """,
)

# Combine as a sequential workflow
personalized_assistant = SequentialAgent(
    name="personalized_assistant",
    sub_agents=[personalization_agent, response_agent]
)
This approach uses multiple state-related techniques:
Profile Storage: Maintains persistent user information
Interest Tracking: Records and scores user interests
Interaction Counting: Tracks user familiarity with the system
Personalization Strategy: Generates a comprehensive approach to personalization
Sequential Agent Pattern: First agent focuses on updating state, second agent uses it for personalization
Advanced State Management
For production applications, you’ll likely need more sophisticated state management approaches.
Custom Session Services
The InMemorySessionService is suitable for development, but for production, you’ll want persistent storage. Create a custom session service by extending the SessionService abstract class:
Python
from google.adk.sessions import InMemorySessionService, Session
from typing import Optional, Dict, Any
import firebase_admin
from firebase_admin import firestore

class FirestoreSessionService(InMemorySessionService):
    """A session service that persists state in Firestore."""

    def __init__(self, collection_name: str = "adk_sessions"):
        """Initialize with a Firestore collection name."""
        self.collection_name = collection_name
        if not firebase_admin._apps:
            firebase_admin.initialize_app()
        self.db = firestore.client()

    def create_session(
        self,
        app_name: str,
        user_id: str,
        session_id: str,
        state: Optional[Dict[str, Any]] = None
    ) -> Session:
        """Create a new session or get an existing session."""
        session_ref = self._get_session_ref(app_name, user_id, session_id)
        doc = session_ref.get()

        if doc.exists:
            # Session exists, retrieve it
            session_data = doc.to_dict()
            return Session(
                app_name=app_name,
                user_id=user_id,
                session_id=session_id,
                state=session_data.get("state", {}),
                last_update_time=session_data.get("last_update_time", 0)
            )
        else:
            # Create new session
            session = Session(
                app_name=app_name,
                user_id=user_id,
                session_id=session_id,
                state=state or {}
            )
            self._save_session(session)
            return session

    def get_session(
        self, app_name: str, user_id: str, session_id: str
    ) -> Optional[Session]:
        """Get an existing session."""
        session_ref = self._get_session_ref(app_name, user_id, session_id)
        doc = session_ref.get()
        if not doc.exists:
            return None
        session_data = doc.to_dict()
        return Session(
            app_name=app_name,
            user_id=user_id,
            session_id=session_id,
            state=session_data.get("state", {}),
            last_update_time=session_data.get("last_update_time", 0)
        )

    def update_session(self, session: Session) -> None:
        """Update a session in the database."""
        self._save_session(session)

    def _get_session_ref(self, app_name: str, user_id: str, session_id: str):
        """Get a reference to the session document."""
        return self.db.collection(self.collection_name).document(
            f"{app_name}_{user_id}_{session_id}"
        )

    def _save_session(self, session: Session) -> None:
        """Save a session to Firestore."""
        session_ref = self._get_session_ref(
            session.app_name, session.user_id, session.session_id
        )
        session_ref.set({
            "state": session.state,
            "last_update_time": session.last_update_time
        })
By implementing state management, you can now create agents with memory, context awareness, and personalization capabilities that significantly enhance the user experience.
Building Multi-Agent Systems
Multi-agent systems (MAS) in ADK are typically organized in hierarchical structures, where agents can have parent-child relationships. This hierarchical organization provides a clear framework for delegation, specialization, and coordination among agents.
Creating an Agent Hierarchy
The foundation of agent hierarchies in ADK is the sub_agents parameter. When you create an agent, you can specify other agents as its sub-agents:
Python
from google.adk.agents import Agent

# Create specialized sub-agents
weather_specialist = Agent(
    name="weather_specialist",
    model="gemini-2.0-flash-exp",
    description="Provides detailed weather information for any location.",
    instruction="You are a weather specialist. Provide accurate, detailed weather information when asked.",
    tools=[get_weather]  # Assume get_weather is defined
)

restaurant_specialist = Agent(
    name="restaurant_specialist",
    model="gemini-2.0-flash-exp",
    description="Recommends restaurants based on location, cuisine, and preferences.",
    instruction="You are a restaurant specialist. Recommend restaurants based on user preferences.",
    tools=[find_restaurants]  # Assume find_restaurants is defined
)

# Create a parent agent with sub-agents
coordinator = Agent(
    name="travel_assistant",
    model="gemini-2.0-flash-exp",
    description="Helps plan trips and activities.",
    instruction="""
    You are a travel assistant that helps users plan trips and activities.

    You have two specialized sub-agents:
    - weather_specialist: For weather-related questions
    - restaurant_specialist: For restaurant recommendations

    When a user asks about weather, delegate to the weather_specialist.
    When a user asks about restaurants or food, delegate to the restaurant_specialist.
    For general travel questions, handle them yourself.
    """,
    sub_agents=[weather_specialist, restaurant_specialist]
)
In this example, coordinator is the parent agent, and weather_specialist and restaurant_specialist are its sub-agents. ADK automatically establishes the parent-child relationship by setting the parent_agent attribute on each sub-agent.
Understanding the Hierarchy Rules
The agent hierarchy in ADK follows several important rules:
Single Parent Rule: An agent can have only one parent. If you try to add an agent as a sub-agent to multiple parents, ADK will raise an error.
Name Uniqueness: Each agent in the hierarchy must have a unique name. This is crucial because delegation and finding agents rely on these names.
Hierarchical Navigation: You can navigate the hierarchy programmatically:
agent.parent_agent: Access an agent’s parent
agent.sub_agents: Access an agent’s children
root_agent.find_agent(name): Find any agent in the hierarchy by name
Scope of Control: The hierarchy defines the scope for potential agent transfers. By default, an agent can transfer control to its parent, its siblings (other sub-agents of its parent), or its own sub-agents.
Agent-to-Agent Delegation and Communication
The power of multi-agent systems comes from the ability of agents to collaborate and delegate tasks to each other. ADK provides several mechanisms for agent-to-agent communication and delegation.
LLM-Driven Delegation (Auto-Flow)
The most flexible approach is LLM-driven delegation, where the agent’s language model decides when to transfer control to another agent based on its understanding of the query and the available agents’ capabilities:
Python
# LLM-driven delegation relies on clear agent descriptions
customer_service = Agent(
    name="customer_service",
    model="gemini-2.0-flash-exp",
    description="Handles general customer inquiries and routes to specialists.",
    instruction="""
    You are the main customer service agent.

    Analyze each customer query and determine the best way to handle it:
    - For billing questions, transfer to the billing_specialist
    - For technical issues, transfer to the tech_support
    - For product questions, handle yourself

    Make your delegation decisions based on the query content.
    """,
    sub_agents=[
        Agent(
            name="billing_specialist",
            model="gemini-2.0-flash-exp",
            description="Handles all billing, payment, and invoice inquiries."
        ),
        Agent(
            name="tech_support",
            model="gemini-2.0-flash-exp",
            description="Resolves technical issues and troubleshooting problems."
        )
    ]
)
When a user sends a message like “I have a problem with my last bill,” the LLM in customer_service recognizes this as a billing question and automatically generates a transfer request to the billing_specialist agent. This is handled through ADK’s Auto-Flow mechanism, which is enabled by default when sub-agents are present.
The key elements for successful LLM-driven delegation are:
Clear, distinctive descriptions for each agent
Explicit instructions to the parent agent about when to delegate
Appropriate model capabilities in the parent agent to understand and classify queries
Explicit Agent Invocation with AgentTool
For more controlled delegation, you can wrap an agent as a tool and explicitly invoke it from another agent:
Python
from google.adk.agents import Agent
from google.adk.tools import AgentTool

# Create a specialized agent
calculator_agent = Agent(
    name="calculator",
    model="gemini-2.0-flash-exp",
    description="Performs complex mathematical calculations.",
    instruction="You perform mathematical calculations with precision."
)

# Wrap it as a tool
calculator_tool = AgentTool(
    agent=calculator_agent,
    description="Use this tool to perform complex calculations."
)

# Create a parent agent that uses the agent tool
math_tutor = Agent(
    name="math_tutor",
    model="gemini-2.0-flash-exp",
    description="Helps students learn mathematics.",
    instruction="""
    You are a math tutor helping students learn.

    When a student asks a question requiring complex calculations:
    1. Explain the mathematical concept
    2. Use the calculator tool to compute the result
    3. Explain the significance of the result
    """,
    tools=[calculator_tool]
)
With this approach:
The parent agent (math_tutor) decides when to use the calculator tool based on its instructions
When invoked, the tool executes the wrapped agent (calculator_agent)
The result is returned to the parent agent, which can then incorporate it into its response
State changes made by the sub-agent are preserved in the shared session
This approach gives you more explicit control over when and how sub-agents are invoked.
Using Shared Session State for Communication
Agents can also communicate through shared session state:
Python
from google.adk.agents import Agent, SequentialAgent

# First agent gathers information and stores it in state
information_gatherer = Agent(
    name="information_gatherer",
    model="gemini-2.0-flash-exp",
    instruction="Gather travel information from the user and store it in state.",
    tools=[
        # Tool to save travel details to state
        save_travel_details  # Assume this is defined and writes to state
    ],
    output_key="information_gathering_complete"  # Saves final response to state
)

# Second agent uses information from state
recommendation_generator = Agent(
    name="recommendation_generator",
    model="gemini-2.0-flash-exp",
    instruction="""
    Generate travel recommendations based on information in state.

    Look for:
    - destination in state["travel_destination"]
    - dates in state["travel_dates"]
    - preferences in state["travel_preferences"]
    """,
    tools=[
        # Tool to retrieve recommendations based on state information
        get_recommendations  # Assume this is defined and reads from state
    ]
)

# Sequential agent ensures these run in order
travel_planner = SequentialAgent(
    name="travel_planner",
    sub_agents=[information_gatherer, recommendation_generator]
)
In this example:
information_gatherer collects information and stores it in the session state
recommendation_generator reads this information from state and uses it to generate recommendations
The SequentialAgent ensures they run in the correct order
This pattern is particularly useful for workflows where information needs to be collected, processed, and then used by subsequent agents.
Workflow Patterns: Sequential, Parallel, Loop
ADK provides specialized workflow agents that orchestrate the execution of sub-agents according to different patterns.
Sequential Workflow
The SequentialAgent executes its sub-agents one after another in a defined order:
data_validator runs first and validates the input data
data_transformer runs next, potentially using the validation result
data_analyzer analyzes the transformed data
report_generator creates a final report based on the analysis
Each agent’s output can be saved to state (using output_key) for the next agent to use. The same InvocationContext is passed sequentially from one agent to the next, ensuring state changes persist throughout the workflow.
Parallel Workflow
The ParallelAgent executes its sub-agents concurrently, which can improve efficiency for independent tasks:
In this example, all three fetchers run concurrently. Each operates in its own branch of the invocation context (ParentBranch.ChildName), but they share the same session state. This means they can all write to state without conflicts (as long as they use different keys).
Parallel execution is particularly useful for:
Reducing total processing time for independent tasks
Gathering information from different sources simultaneously
Implementing competing approaches to the same problem
Loop Workflow
The LoopAgent repeatedly executes its sub-agents until a condition is met:
Python
from google.adk.agents import LoopAgent, Agent, BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event, EventActions
from typing import AsyncGenerator

# Custom agent that checks if the loop should continue
class ConditionChecker(BaseAgent):
    name: str = "condition_checker"

    async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:
        # Check if the condition for stopping the loop is met
        completed = context.session.state.get("task_completed", False)
        max_iterations = context.session.state.get("max_iterations", 5)
        current_iteration = context.session.state.get("current_iteration", 0)

        # Increment iteration counter
        context.session.state["current_iteration"] = current_iteration + 1

        # If task is completed or max iterations reached, escalate to stop the loop
        if completed or current_iteration >= max_iterations:
            yield Event(
                author=self.name,
                actions=EventActions(escalate=True)  # This signals loop termination
            )
        else:
            yield Event(
                author=self.name,
                content=None  # No content needed, just continuing the loop
            )

# Create task processor agent
task_processor = Agent(
    name="task_processor",
    model="gemini-2.0-flash-exp",
    instruction="""
    Process the current task step.
    Check state["current_iteration"] to see which step you're on.
    When the task is complete, set state["task_completed"] = True.
    """,
    tools=[
        # Tool to process the current step
        process_step,  # Assume this is defined
        # Tool to mark the task as completed
        mark_completed  # Assume this is defined
    ]
)

# Create loop agent that combines processing and condition checking
iterative_processor = LoopAgent(
    name="iterative_processor",
    sub_agents=[
        task_processor,
        ConditionChecker()
    ],
    max_iterations=10  # Optional backup limit
)
In this example:
iterative_processor repeatedly executes its sub-agents
Each iteration runs task_processor followed by ConditionChecker
The loop continues until ConditionChecker escalates (when the task is completed or max iterations reached)
State is maintained across iterations, allowing tracking of progress
Loop agents are ideal for:
Incremental processing of large datasets
Implementing retry logic with backoff
Iterative refinement of results
Multi-step workflows where the number of steps isn’t known in advance
Designing Effective Agent Teams
Creating effective multi-agent systems requires thoughtful design. Here are key principles and patterns for building successful agent teams:
Principle 1: Clear Agent Specialization
Each agent in the system should have a clearly defined area of expertise:
Python
# Financial advisory team with clear specializations
mortgage_specialist = Agent(
    name="mortgage_specialist",
    description="Expert on mortgage products, rates, and qualification requirements.",
    # Other parameters...
)

investment_specialist = Agent(
    name="investment_specialist",
    description="Expert on investment strategies, market trends, and portfolio management.",
    # Other parameters...
)

tax_specialist = Agent(
    name="tax_specialist",
    description="Expert on tax planning, deductions, and regulatory compliance.",
    # Other parameters...
)
The specializations should be:
Non-overlapping to avoid confusion in delegation decisions
Comprehensive to cover all expected user queries
Clearly communicated in agent descriptions and instructions
Principle 2: Effective Coordination Strategies
There are multiple strategies for coordinating agents. Choose the approach that best fits your application’s needs:
Centralized Coordination (Hub and Spoke)
Python
# Hub agent coordinates specialists
financial_advisor = Agent(
    name="financial_advisor",
    description="Coordinates financial advice across multiple domains.",
    instruction="""
    You are the main financial advisor.

    For mortgage questions, delegate to mortgage_specialist.
    For investment questions, delegate to investment_specialist.
    For tax questions, delegate to tax_specialist.
    Only handle general financial questions yourself.
    """,
    sub_agents=[mortgage_specialist, investment_specialist, tax_specialist]
)
Principle 3: A Clear State-Sharing Strategy
Develop a clear strategy for how agents share information through state:
Python
# First agent gathers information
data_collector = Agent(
    name="data_collector",
    instruction="""
    Collect information from the user.

    Store each piece in the appropriate state key:
    - Personal details in state["user_details"]
    - Goals in state["financial_goals"]
    - Current situation in state["current_situation"]
    """,
    tools=[save_to_state],  # Assume this tool saves data to specific state keys
    output_key="collection_complete"
)

# Specialist agents use collected information
retirement_planner = Agent(
    name="retirement_planner",
    instruction="""
    Create a retirement plan based on information in state.

    Use state["user_details"] for age and income information.
    Use state["financial_goals"] for retirement targets.
    Store your plan in state["retirement_plan"].
    """,
    tools=[create_retirement_plan],  # Assume this tool creates and saves a plan
    output_key="retirement_planning_complete"
)
Consider:
Which state keys each agent will read from and write to
How to structure state data for easy access by multiple agents
Whether to use scoped state (session, user, app) based on persistence needs
Principle 4: Error Handling and Fallbacks
Design your agent team to handle failures gracefully:
Python
from google.adk.agents import Agent, SequentialAgent
from google.adk.tools.tool_context import ToolContext

# Tool to check if the previous agent encountered an error
def check_previous_result(tool_context: ToolContext) -> dict:
    """Checks if the previous agent step was successful.

    Returns:
        dict: Status and whether a fallback is needed.
    """
    error_detected = tool_context.state.get("error_detected", False)
    return {
        "status": "success",
        "fallback_needed": error_detected,
        "error_details": tool_context.state.get("error_details", "Unknown error")
    }

# Tool to handle error recovery
def recover_from_error(error_details: str, tool_context: ToolContext) -> dict:
    """Attempts to recover from an error.

    Args:
        error_details: Details about the error that occurred.

    Returns:
        dict: Status of recovery attempt.
    """
    # Record the recovery attempt
    tool_context.state["recovery_attempted"] = True
    # Clear the error flag
    tool_context.state["error_detected"] = False
    return {
        "status": "success",
        "message": f"Recovered from error: {error_details}"
    }

# Primary agent that might encounter errors
primary_handler = Agent(
    name="primary_handler",
    model="gemini-2.0-flash-exp",
    instruction="""
    You handle the primary task.
    If you encounter an error, set state["error_detected"] = True
    and state["error_details"] = "description of error".
    """,
    tools=[process_task, set_error_state]  # Assume these are defined
)

# Fallback agent for error recovery
fallback_handler = Agent(
    name="fallback_handler",
    model="gemini-2.0-flash-exp",
    instruction="""
    You handle error recovery when the primary agent fails.

    First, use check_previous_result to see if you need to act.
    If fallback is needed, use recover_from_error to attempt recovery.
    Provide a simplified but functional response to the user.
    """,
    tools=[check_previous_result, recover_from_error]
)

# Combine with sequential flow
robust_handler = SequentialAgent(
    name="robust_handler",
    sub_agents=[primary_handler, fallback_handler]
)
This pattern ensures that even if the primary agent encounters an error, the fallback agent can provide a degraded but functional response.
Principle 5: Monitoring and Debugging
Design your agent team with observability in mind:
Python
from google.adk.tools.tool_context import ToolContext
import time

def log_agent_action(action: str, details: str, tool_context: ToolContext) -> dict:
    """Logs an agent action to the trace log in state.

    Args:
        action: The type of action being logged.
        details: Details about the action.

    Returns:
        dict: Status of the logging operation.
    """
    # Get existing log or initialize new one
    trace_log = tool_context.state.get("agent_trace_log", [])

    # Add new entry with timestamp
    trace_log.append({
        "timestamp": time.time(),
        "agent": tool_context.agent_name,
        "action": action,
        "details": details
    })

    # Update state with new log
    tool_context.state["agent_trace_log"] = trace_log

    return {"status": "success"}

# Add this tool to all agents in your system for comprehensive tracing
By following these principles and patterns, you can design effective agent teams that leverage specialization, coordination, shared state, and robust error handling to deliver complex capabilities.
In the next section, we’ll explore advanced features of ADK, including callbacks for implementing safety guardrails and other sophisticated control mechanisms.
Advanced Features and Patterns
Implementing Safety Guardrails with Callbacks
Callbacks are powerful hooks that allow you to intercept and potentially modify agent behavior at key points in the execution flow. They’re particularly valuable for implementing safety guardrails, logging, monitoring, and custom business logic.
ADK provides several callback points, but two of the most important are:
before_model_callback: Executes just before sending a request to the LLM
before_tool_callback: Executes just before a tool is called
Input Validation with before_model_callback
The before_model_callback lets you inspect and potentially block user inputs before they reach the language model:
Python
from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_request import LlmRequest
from google.adk.models.llm_response import LlmResponse
from google.genai import types
from typing import Optional

def profanity_filter(
    callback_context: CallbackContext,
    llm_request: LlmRequest
) -> Optional[LlmResponse]:
    """
    Checks user input for profanity and blocks requests containing
    prohibited language.

    Args:
        callback_context: Provides context about the agent and session
        llm_request: The request about to be sent to the LLM

    Returns:
        LlmResponse if the request should be blocked, None if it should proceed
    """
    # Simple profanity detection (in a real system, use a more sophisticated approach)
    prohibited_terms = ["badword1", "badword2", "badword3"]

    # Extract the last user message
    last_user_message = ""
    if llm_request.contents:
        for content in reversed(llm_request.contents):
            if content.role == 'user' and content.parts:
                if content.parts[0].text:
                    last_user_message = content.parts[0].text
                    break

    # Check for prohibited terms
    contains_profanity = any(
        term in last_user_message.lower() for term in prohibited_terms
    )

    if contains_profanity:
        # Log the blocking action
        print(f"Profanity filter blocked message: '{last_user_message[:20]}...'")

        # Record the event in state
        callback_context.state["profanity_filter_triggered"] = True

        # Return a response that will be sent instead of calling the LLM
        return LlmResponse(
            content=types.Content(
                role="model",
                parts=[types.Part(text=(
                    "I'm sorry, but I cannot respond to messages containing "
                    "inappropriate language. Please rephrase your request "
                    "without using prohibited terms."
                ))]
            )
        )

    # If no profanity detected, return None to allow the request to proceed
    return None

# Add the callback to an agent
safe_agent = Agent(
    name="safe_agent",
    model="gemini-2.0-flash-exp",
    instruction="You are a helpful assistant.",
    before_model_callback=profanity_filter
)
This example implements a simple profanity filter that:
Extracts the most recent user message from the LLM request
Checks it against a list of prohibited terms
If prohibited terms are found, blocks the LLM call and returns a predefined response
Otherwise, allows the request to proceed to the LLM
You can extend this pattern to implement more sophisticated content moderation, sensitive information detection, or other input validation rules.
Tool Usage Control with before_tool_callback
The before_tool_callback allows you to validate tool arguments, restrict certain operations, or modify how tools are used:
Python
from google.adk.tools.base_tool import BaseTool
from google.adk.tools.tool_context import ToolContext
from typing import Optional, Dict, Any

def restricted_city_guardrail(
    tool: BaseTool,
    args: Dict[str, Any],
    tool_context: ToolContext
) -> Optional[Dict]:
    """
    Prevents the get_weather tool from being called for restricted cities.

    Args:
        tool: Information about the tool being called
        args: The arguments passed to the tool
        tool_context: Access to session state and other context

    Returns:
        Dict if the tool call should be blocked, None if it should proceed
    """
    # Check if this is the get_weather tool
    if tool.name == "get_weather" and "city" in args:
        city = args["city"].lower()

        # List of restricted cities (example - could be loaded dynamically)
        restricted_cities = ["restricted_city_1", "restricted_city_2"]

        if city in restricted_cities:
            # Log the blocking action
            print(f"Blocked get_weather call for restricted city: {city}")

            # Record the event in state
            tool_context.state["restricted_city_blocked"] = city

            # Return a response that will be used instead of calling the tool
            return {
                "status": "error",
                "error_message": f"Sorry, weather information for {city} is not available due to policy restrictions."
            }

    # For other tools or non-restricted cities, allow the call to proceed
    return None

# Add the callback to an agent
restricted_agent = Agent(
    name="restricted_agent",
    model="gemini-2.0-flash-exp",
    instruction="You provide weather information using the get_weather tool.",
    tools=[get_weather],  # Assume get_weather is defined
    before_tool_callback=restricted_city_guardrail
)
This example implements a city restriction guardrail that:
Checks if the get_weather tool is being called
Inspects the city argument against a list of restricted cities
If the city is restricted, blocks the tool call and returns a predefined error response
Otherwise, allows the tool call to proceed
You can use this pattern to implement various business rules, usage limits, or user-based access controls for your tools.
Combining Multiple Callbacks
For comprehensive safety and control, you can use multiple callbacks together:
Python
# Agent with multiple safety measures
comprehensive_agent = Agent(
    name="comprehensive_agent",
    model="gemini-2.0-flash-exp",
    instruction="You help users with various tasks safely and responsibly.",
    tools=[get_weather, search_web, send_email],  # Assume these are defined
    before_model_callback=content_safety_filter,  # Filter unsafe user input
    after_model_callback=output_sanitizer,        # Clean up model responses
    before_tool_callback=tool_usage_validator,    # Validate tool usage
    after_tool_callback=tool_result_logger        # Log tool results
)
Each callback serves a specific purpose in the safety and monitoring pipeline:
before_model_callback: Prevents unsafe inputs from reaching the LLM
after_model_callback: Ensures model outputs meet safety and quality standards
before_tool_callback: Controls how and when tools can be used
after_tool_callback: Monitors and logs tool results for auditing
Building Evaluation Frameworks
Robust evaluation is essential for developing reliable agent systems. ADK provides built-in mechanisms for evaluating agent performance.
Creating Test Cases
Start by defining test cases that cover the range of interactions your agent should handle:
Python
# Define test cases in a structured format
test_cases = [
    {
        "name": "Basic weather query",
        "input": "What's the weather in New York?",
        "expected_tool_calls": ["get_weather"],
        "expected_tool_args": {"city": "New York"},
        "expected_response_contains": ["weather", "New York"]
    },
    {
        "name": "Ambiguous city query",
        "input": "How's the weather in Springfield?",
        "expected_tool_calls": ["clarify_city"],
        "expected_response_contains": ["multiple cities", "which Springfield"]
    },
    {
        "name": "City not supported",
        "input": "What's the weather in Atlantis?",
        "expected_tool_calls": ["get_weather"],
        "expected_tool_args": {"city": "Atlantis"},
        "expected_response_contains": ["don't have information", "Atlantis"]
    }
]
Using the AgentEvaluator
ADK provides an AgentEvaluator class to run test cases against your agent:
Python
from google.adk.evaluation import AgentEvaluator

# Create the evaluator
evaluator = AgentEvaluator(agent=weather_agent)

# Run evaluation
evaluation_results = evaluator.evaluate(test_cases=test_cases)

# Print results
for result in evaluation_results:
    print(f"Test: {result.test_case['name']}")
    print(f"  Status: {'PASS' if result.success else 'FAIL'}")
    print(f"  Feedback: {result.feedback}")
    if not result.success:
        print(f"  Expected: {result.expected}")
        print(f"  Actual: {result.actual}")
    print()

# Calculate overall metrics
success_rate = sum(1 for r in evaluation_results if r.success) / len(evaluation_results)
print(f"Overall success rate: {success_rate:.2%}")
Custom Evaluation Metrics
For more specialized evaluation needs, you can implement custom metrics:
Python
def evaluate_response_correctness(test_case, agent_response, tool_calls):
    """Evaluates the correctness of the agent's response for weather queries."""
    # Exact city match checker
    if "expected_tool_args" in test_case and "city" in test_case["expected_tool_args"]:
        expected_city = test_case["expected_tool_args"]["city"]

        # Find the actual city used in tool calls
        actual_city = None
        for call in tool_calls:
            if call["name"] == "get_weather" and "city" in call["args"]:
                actual_city = call["args"]["city"]
                break

        # Check city match
        city_match = (actual_city == expected_city)

        # Temperature format checker (should include °C or °F)
        temp_format_correct = False
        if "°C" in agent_response or "°F" in agent_response:
            temp_format_correct = True

        return {
            "city_match": city_match,
            "temp_format_correct": temp_format_correct,
            "overall_correct": city_match and temp_format_correct
        }

    return {"overall_correct": None}  # Not applicable for this test case

# Apply custom evaluation to results
for result in evaluation_results:
    correctness = evaluate_response_correctness(
        result.test_case,
        result.actual_response,
        result.actual_tool_calls
    )
    print(f"Test: {result.test_case['name']}")
    print(f"  Overall correct: {correctness['overall_correct']}")
    if "city_match" in correctness:
        print(f"  City match: {correctness['city_match']}")
    if "temp_format_correct" in correctness:
        print(f"  Temperature format: {correctness['temp_format_correct']}")
    print()
Automated Regression Testing
Integrate agent evaluation into your CI/CD pipeline for automated regression testing:
Python
import unittest
from google.adk.evaluation import AgentEvaluator

class WeatherAgentTests(unittest.TestCase):
    def setUp(self):
        self.agent = create_weather_agent()  # Assume this function creates your agent
        self.evaluator = AgentEvaluator(agent=self.agent)

    def test_basic_weather_queries(self):
        results = self.evaluator.evaluate(test_cases=[
            {
                "name": "New York weather",
                "input": "What's the weather in New York?",
                "expected_tool_calls": ["get_weather"]
            }
        ])
        self.assertTrue(results[0].success, results[0].feedback)

    def test_ambiguous_cities(self):
        results = self.evaluator.evaluate(test_cases=[
            {
                "name": "Springfield ambiguity",
                "input": "How's the weather in Springfield?",
                "expected_response_contains": ["which Springfield", "multiple"]
            }
        ])
        self.assertTrue(results[0].success, results[0].feedback)

    def test_error_handling(self):
        results = self.evaluator.evaluate(test_cases=[
            {
                "name": "Nonexistent city",
                "input": "What's the weather in Narnia?",
                "expected_response_contains": ["don't have information", "Narnia"]
            }
        ])
        self.assertTrue(results[0].success, results[0].feedback)

if __name__ == "__main__":
    unittest.main()
This approach allows you to catch regressions automatically when updating your agent or its components.
Streaming and Real-Time Interactions
ADK provides built-in support for streaming responses, enabling real-time interactions with agents.
Implementing Streaming Responses
To implement streaming with ADK, you use the asynchronous API:
Python
import asyncio
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

# Set up session and runner
session_service = InMemorySessionService()
APP_NAME = "streaming_app"
USER_ID = "user_123"
SESSION_ID = "session_456"

session = session_service.create_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=SESSION_ID
)

runner = Runner(
    agent=streaming_agent,  # Assume this is defined
    app_name=APP_NAME,
    session_service=session_service
)

async def stream_response(query: str):
    """Streams the agent's response token by token."""
    content = types.Content(role='user', parts=[types.Part(text=query)])

    print(f"User: {query}")
    print("Agent: ", end="", flush=True)

    # Process events as they arrive
    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=content
    ):
        # For token-by-token streaming, look for ContentPartDelta events
        if hasattr(event, 'content_part_delta') and event.content_part_delta:
            delta = event.content_part_delta
            if delta.text:
                print(delta.text, end="", flush=True)

        # For final response
        if event.is_final_response():
            print()    # End line after response
            print("\n")  # Add space after complete response

# Run streaming interaction
async def main():
    queries = [
        "What's the weather in New York?",
        "How about London?",
        "Thanks for your help!"
    ]
    for query in queries:
        await stream_response(query)

# Run the async main function
asyncio.run(main())
This example:
Sets up a session and runner
Creates an async function that processes events as they arrive
Specifically looks for content_part_delta events, which contain incremental text updates
Prints each text segment as it arrives, creating a streaming effect
Bidirectional Streaming with Audio
ADK also supports bidirectional audio streaming for voice-based interactions:
Python
import asyncio
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
import sounddevice as sd
import numpy as np
import wave
import io

# Assume setup of session_service and runner as in previous example

async def audio_conversation():
    """Conducts a voice conversation with the agent."""
    # Audio recording parameters
    sample_rate = 16000
    recording_duration = 5  # seconds

    print("Press Enter to start recording your question...")
    input()

    # Record audio
    print("Recording... (5 seconds)")
    audio_data = sd.rec(
        int(recording_duration * sample_rate),
        samplerate=sample_rate,
        channels=1,
        dtype='int16'
    )
    sd.wait()  # Wait for recording to complete
    print("Recording complete.")

    # Convert audio to WAV format in memory
    audio_bytes = io.BytesIO()
    with wave.open(audio_bytes, 'wb') as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(audio_data.tobytes())

    # Create audio content for the agent
    audio_part = types.Part.from_bytes(
        audio_bytes.getvalue(),
        mime_type="audio/wav"
    )
    content = types.Content(role='user', parts=[audio_part])

    print("Processing your question...")

    # Stream the response
    print("Agent response:")
    text_response = ""
    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=content
    ):
        # Handle text streaming
        if hasattr(event, 'content_part_delta') and event.content_part_delta:
            delta = event.content_part_delta
            if delta.text:
                print(delta.text, end="", flush=True)
                text_response += delta.text

        # Handle final audio response
        if event.is_final_response() and event.content and event.content.parts:
            for part in event.content.parts:
                if part.mime_type and part.mime_type.startswith('audio/'):
                    # Play the audio response
                    audio_bytes = io.BytesIO(part.bytes_value)
                    with wave.open(audio_bytes, 'rb') as wf:
                        audio_data = np.frombuffer(
                            wf.readframes(wf.getnframes()),
                            dtype=np.int16
                        )
                        sd.play(audio_data, wf.getframerate())
                        sd.wait()

    print("\nConversation turn complete.")

# Run the audio conversation
asyncio.run(audio_conversation())
This more complex example:
Records audio from the user
Converts it to the appropriate format
Sends it to the agent
Streams the text response as it’s generated
Plays the audio response when available
Common Multi-Agent Patterns and Use Cases
Beyond the basic patterns we’ve discussed, here are some advanced multi-agent patterns for specific use cases:
Critic-Generator Pattern
This pattern uses one agent to generate content and another to critique and improve it:
Python
from google.adk.agents import Agent, SequentialAgent

# Content generator
generator = Agent(
    name="content_generator",
    model="gemini-2.0-flash-exp",
    instruction="Create content based on the user's request. Focus on being creative and comprehensive.",
    output_key="generated_content"
)

# Critic agent
critic = Agent(
    name="content_critic",
    model="gemini-2.0-flash-exp",
    instruction="""
    Review the content in state["generated_content"]. Analyze it for:
    1. Accuracy and factual correctness
    2. Clarity and readability
    3. Comprehensiveness
    4. Potential biases or issues

    Provide specific suggestions for improvement.
    """,
    output_key="critique"
)

# Refiner agent
refiner = Agent(
    name="content_refiner",
    model="gemini-2.0-flash-exp",
    instruction="""
    Refine the content in state["generated_content"] based on
    the critique in state["critique"].

    Maintain the original style and voice while addressing the
    specific issues highlighted in the critique.
    Create a polished final version that incorporates the improvements.
    """,
)

# Chain them together
critique_workflow = SequentialAgent(
    name="critique_workflow",
    sub_agents=[generator, critic, refiner]
)
This pattern is useful for:
Content creation with quality control
Code generation with review
Document drafting with editorial review
Research and Synthesis Pattern
This pattern divides research into parallel information gathering followed by synthesis:
Python
from google.adk.agents import Agent, ParallelAgent, SequentialAgent
from google.adk.tools.tool_context import ToolContext

# Topic research agent
def research_topic(topic: str, tool_context: ToolContext) -> dict:
    """Researches a specific aspect of the main topic."""
    # ... research implementation that produces research_results ...
    tool_context.state[f"research_{topic}"] = research_results
    return {"status": "success", "research": research_results}

# Create specialized research agents
economic_researcher = Agent(
    name="economic_researcher",
    model="gemini-2.0-flash-exp",
    instruction="Research the economic aspects of the topic. Store findings in state.",
    tools=[research_topic],
)

environmental_researcher = Agent(
    name="environmental_researcher",
    model="gemini-2.0-flash-exp",
    instruction="Research the environmental aspects of the topic. Store findings in state.",
    tools=[research_topic],
)

social_researcher = Agent(
    name="social_researcher",
    model="gemini-2.0-flash-exp",
    instruction="Research the social aspects of the topic. Store findings in state.",
    tools=[research_topic],
)

# Synthesis agent
synthesizer = Agent(
    name="research_synthesizer",
    model="gemini-2.0-flash-exp",
    instruction="""
    Synthesize research findings from all researchers.
    Look for information in these state keys:
    - state["research_economic"]
    - state["research_environmental"]
    - state["research_social"]

    Identify connections, conflicts, and gaps between different perspectives.
    Create a comprehensive synthesis that presents a balanced view.
    """,
)

# Research workflow
research_framework = SequentialAgent(
    name="research_framework",
    sub_agents=[
        ParallelAgent(
            name="parallel_researchers",
            sub_agents=[economic_researcher, environmental_researcher, social_researcher]
        ),
        synthesizer
    ]
)
This pattern is ideal for:
Comprehensive research on complex topics
Multi-perspective analysis
Gathering diverse information efficiently
Debate and Deliberation Pattern
This pattern creates a structured debate between agents with different perspectives:
Python
from google.adk.agents import Agent, SequentialAgent

# Pose the question
question_agent = Agent(
    name="question_poser",
    model="gemini-2.0-flash-exp",
    instruction="Clarify the user's question into a clear, debatable proposition.",
    output_key="debate_question"
)

# Position A advocate
position_a = Agent(
    name="position_a_advocate",
    model="gemini-2.0-flash-exp",
    instruction="""
    Present the strongest case FOR the proposition in state["debate_question"].
    Use logical arguments, evidence, and address potential counterarguments.
    """,
    output_key="position_a_arguments"
)

# Position B advocate
position_b = Agent(
    name="position_b_advocate",
    model="gemini-2.0-flash-exp",
    instruction="""
    Present the strongest case AGAINST the proposition in state["debate_question"].
    Use logical arguments, evidence, and address potential counterarguments.
    """,
    output_key="position_b_arguments"
)

# Rebuttal rounds
rebuttal_a = Agent(
    name="position_a_rebuttal",
    model="gemini-2.0-flash-exp",
    instruction="""
    Respond to the arguments against your position in state["position_b_arguments"].
    Strengthen your original arguments and address specific points raised.
    """,
    output_key="rebuttal_a"
)

rebuttal_b = Agent(
    name="position_b_rebuttal",
    model="gemini-2.0-flash-exp",
    instruction="""
    Respond to the arguments against your position in state["position_a_arguments"].
    Strengthen your original arguments and address specific points raised.
    """,
    output_key="rebuttal_b"
)

# Synthesis and judgment
judge = Agent(
    name="debate_judge",
    model="gemini-2.0-flash-exp",
    instruction="""
    Evaluate the debate on the proposition in state["debate_question"].
    Consider:
    - Initial arguments: state["position_a_arguments"] and state["position_b_arguments"]
    - Rebuttals: state["rebuttal_a"] and state["rebuttal_b"]

    Summarize the strongest points on both sides.
    Identify areas of agreement and disagreement.
    Suggest a balanced conclusion that acknowledges the complexity of the issue.
    """,
)

# Debate workflow
debate_framework = SequentialAgent(
    name="debate_framework",
    sub_agents=[question_agent, position_a, position_b, rebuttal_a, rebuttal_b, judge]
)
This pattern is useful for:
Exploring complex ethical questions
Evaluating policy proposals
Understanding multiple sides of contentious issues
Putting It All Together
I’ve covered various agent architectures and patterns throughout this guide, along with code samples for implementing advanced features. Let’s now combine everything into real-world agents (no more weather agents from here on).
Customer Support Agent
This customer service agent system handles inquiries about products, orders, billing, and technical support. The system maintains continuity across conversations, escalates complex issues, and provides personalized responses. We’ll showcase advanced features like:
Persistent session storage with MongoDB
Integration with external systems (CRM, ticketing)
Personalization through state and callbacks
Escalation paths to human agents
Specialized agents for different support domains
Architecture Diagram
Plaintext
Customer Service System (ADK)
├── Root Coordinator Agent
│   ├── Greeting & Routing Agent
│   ├── Product Information Agent
│   │   └── Tools: product_catalog_lookup, get_specifications
│   ├── Order Status Agent
│   │   └── Tools: order_lookup, track_shipment
│   ├── Billing Agent
│   │   └── Tools: get_invoice, update_payment_method
│   ├── Technical Support Agent
│   │   └── Tools: troubleshoot_issue, create_ticket
│   └── Human Escalation Agent
│       └── Tools: create_escalation_ticket, notify_supervisor
└── Services
    ├── Persistent Storage Session Service (MongoDB)
    ├── Customer Data Service (CRM Integration)
    ├── Ticket Management Integration
    └── Analytics & Reporting Service
Session Management with Custom Storage
Python
from google.adk.sessions import InMemorySessionService, Session
import pymongo
from typing import Optional, Dict, Any

class MongoSessionService(InMemorySessionService):
    """Session service that uses MongoDB for persistent storage."""

    def __init__(self, connection_string, database="customer_service", collection="sessions"):
        """Initialize with MongoDB connection details."""
        self.client = pymongo.MongoClient(connection_string)
        self.db = self.client[database]
        self.collection = self.db[collection]

    def create_session(
        self,
        app_name: str,
        user_id: str,
        session_id: str,
        state: Optional[Dict[str, Any]] = None
    ) -> Session:
        """Create a new session or get existing session."""
        # Look for existing session
        session_doc = self.collection.find_one({
            "app_name": app_name,
            "user_id": user_id,
            "session_id": session_id
        })

        if session_doc:
            # Convert MongoDB document to Session object
            return Session(
                app_name=session_doc["app_name"],
                user_id=session_doc["user_id"],
                session_id=session_doc["session_id"],
                state=session_doc.get("state", {}),
                last_update_time=session_doc.get("last_update_time", 0)
            )

        # Create new session
        session = Session(
            app_name=app_name,
            user_id=user_id,
            session_id=session_id,
            state=state or {}
        )
        self._save_session(session)
        return session

    # Additional methods implementation...
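The class above leaves its persistence helper elided. As an illustration (the function name, arguments, and document shape here are my own, not part of ADK), the save step could be a small upsert against any pymongo-style collection that exposes `update_one(filter, update, upsert=True)`:

```python
import time

def save_session_doc(collection, session) -> None:
    """Upsert a session into a Mongo-style collection.

    `collection` is anything exposing update_one(filter, update, upsert=True),
    e.g. a pymongo Collection. `session` is an ADK Session-like object with
    app_name, user_id, session_id, and state attributes.
    """
    # The compound key that uniquely identifies one conversation
    key = {
        "app_name": session.app_name,
        "user_id": session.user_id,
        "session_id": session.session_id,
    }
    doc = {
        **key,
        "state": dict(session.state),
        "last_update_time": time.time(),
    }
    # Replace the existing document, or create one if none matches
    collection.update_one(key, {"$set": doc}, upsert=True)
```

Inside `MongoSessionService`, a `_save_session` method would simply delegate to something like this with `self.collection`.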
CRM Integration
Python
from google.adk.tools.tool_context import ToolContext

def get_customer_info(customer_id: str, tool_context: ToolContext) -> dict:
    """Retrieves customer information from the CRM system.

    Args:
        customer_id: The unique identifier for the customer.
        tool_context: Provides access to session state.

    Returns:
        dict: Customer information and interaction history.
    """
    # In production, this would make an API call to the CRM system
    # Mock implementation for demonstration
    customers = {
        "C12345": {
            "name": "Emma Johnson",
            "email": "emma.j@example.com",
            "tier": "premium",
            "since": "2021-03-15",
            "recent_purchases": ["Laptop X1", "External Monitor"],
            "support_history": [
                {"date": "2023-01-15", "issue": "Billing question", "resolved": True},
                {"date": "2023-03-22", "issue": "Technical support", "resolved": True}
            ]
        },
        # Additional customers...
    }

    if customer_id in customers:
        # Store in session state for other agents to access
        tool_context.state["customer_info"] = customers[customer_id]
        return {"status": "success", "customer": customers[customer_id]}
    else:
        return {"status": "error", "error_message": f"Customer ID {customer_id} not found"}
Issue Escalation System
Python
import time
import hashlib

from google.adk.tools.tool_context import ToolContext

def escalate_to_human(
    issue_summary: str,
    priority: str,
    customer_id: str,
    tool_context: ToolContext
) -> dict:
    """Escalates an issue to a human customer service representative.

    Args:
        issue_summary: Brief description of the issue.
        priority: Urgency level ("low", "medium", "high", "urgent").
        customer_id: The customer's ID.
        tool_context: Provides access to session state.

    Returns:
        dict: Escalation ticket information.
    """
    valid_priorities = ["low", "medium", "high", "urgent"]
    if priority.lower() not in valid_priorities:
        return {
            "status": "error",
            "error_message": f"Invalid priority. Must be one of: {', '.join(valid_priorities)}"
        }

    # Get customer info if available
    customer_info = tool_context.state.get("customer_info", {})
    customer_name = customer_info.get("name", "Unknown Customer")
    customer_tier = customer_info.get("tier", "standard")

    # Calculate SLA based on priority and customer tier
    sla_hours = {
        "low": {"standard": 48, "premium": 24},
        "medium": {"standard": 24, "premium": 12},
        "high": {"standard": 8, "premium": 4},
        "urgent": {"standard": 4, "premium": 1}
    }
    response_time = sla_hours[priority.lower()][customer_tier]

    # Generate ticket ID
    ticket_id = hashlib.md5(f"{customer_id}:{time.time()}".encode()).hexdigest()[:8].upper()

    # Store ticket in state
    ticket_info = {
        "ticket_id": ticket_id,
        "customer_id": customer_id,
        "customer_name": customer_name,
        "issue_summary": issue_summary,
        "priority": priority.lower(),
        "status": "open",
        "created_at": time.time(),
        "sla_hours": response_time
    }

    # In production, this would make an API call to the ticket system
    # For demo, just store in state
    tickets = tool_context.state.get("app:escalation_tickets", {})
    tickets[ticket_id] = ticket_info
    tool_context.state["app:escalation_tickets"] = tickets

    # Signal that control should be transferred to the human agent
    tool_context.actions.transfer_to_agent = "human_support_agent"

    return {
        "status": "success",
        "ticket": ticket_info,
        "message": f"Issue escalated. Ticket ID: {ticket_id}. A representative will respond within {response_time} hours."
    }
Tech Support Agent with Memory
Python
# Technical Support Agent
tech_support_agent = Agent(
    name="technical_support_agent",
    model="gemini-2.0-flash-exp",
    description="Handles technical support inquiries and troubleshooting.",
    instruction="""
    You are a technical support specialist for our electronics company.

    FIRST, check if the user has a support history in state["customer_info"]["support_history"].
    If they do, reference this history in your responses.

    For technical issues:
    1. Use the troubleshoot_issue tool to analyze the problem.
    2. Guide the user through basic troubleshooting steps.
    3. If the issue persists, use create_ticket to log the issue.

    For complex issues beyond basic troubleshooting:
    1. Use escalate_to_human to transfer to a human specialist.

    Maintain a professional but empathetic tone. Acknowledge the frustration
    technical issues can cause, while providing clear steps toward resolution.
    """,
    tools=[troubleshoot_issue, create_ticket, escalate_to_human]
)
Personalization Callback
Python
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmRequest, LlmResponse
from google.genai import types

def personalization_callback(
    callback_context: CallbackContext,
    llm_request: LlmRequest
) -> Optional[LlmResponse]:
    """
    Adds personalization information to the LLM request.

    Args:
        callback_context: Context for the callback
        llm_request: The request being sent to the LLM

    Returns:
        None to continue with the modified request
    """
    # Get customer info from state
    customer_info = callback_context.state.get("customer_info")

    if customer_info:
        # Create a personalization header to add to the request
        customer_name = customer_info.get("name", "valued customer")
        customer_tier = customer_info.get("tier", "standard")
        recent_purchases = customer_info.get("recent_purchases", [])

        personalization_note = (
            f"\nIMPORTANT PERSONALIZATION:\n"
            f"Customer Name: {customer_name}\n"
            f"Customer Tier: {customer_tier}\n"
        )
        if recent_purchases:
            personalization_note += f"Recent Purchases: {', '.join(recent_purchases)}\n"

        # Add personalization to the LLM request
        if llm_request.contents:
            # Add as a system message before the first content
            system_content = types.Content(
                role="system",
                parts=[types.Part(text=personalization_note)]
            )
            llm_request.contents.insert(0, system_content)

    # Return None to continue with the modified request
    return None
Code Generation and Debugging Agent
Finally, let’s explore a Code Generation and Debugging Agent built with ADK.
Code Generation Agent with Test-Driven Development
Let’s start with a sequential agent that first analyzes requirements, then creates test cases, and finally writes code and evaluates it.
Python
from google.adk.agents import Agent, SequentialAgent
from google.adk.tools.tool_context import ToolContext

# Code Generator with TDD approach
code_generator = SequentialAgent(
    name="tdd_code_generator",
    sub_agents=[
        Agent(
            name="requirement_analyzer",
            model="gemini-2.0-flash-exp",
            instruction="""
            Analyze the coding requirements and break them down into:
            1. Functional requirements
            2. Edge cases to consider
            3. Needed data structures and algorithms

            Be specific and comprehensive in your analysis.
            """,
            output_key="requirements_analysis"
        ),
        Agent(
            name="test_writer",
            model="gemini-2.0-flash-exp",
            instruction="""
            Based on the requirements analysis in state["requirements_analysis"],
            write comprehensive test cases that cover:
            1. The main functionality
            2. All identified edge cases
            3. Error handling

            Use a testing framework appropriate for the language
            (e.g., pytest for Python, Jest for JavaScript).
            """,
            tools=[write_test_code],
            output_key="test_code"
        ),
        Agent(
            name="code_implementer",
            model="gemini-2.0-flash-exp",
            instruction="""
            Implement code that passes all the test cases in state["test_code"].
            Your implementation should:
            1. Be efficient and follow best practices
            2. Include clear comments
            3. Handle all edge cases identified in the requirements

            After writing the code, evaluate it against potential issues.
            """,
            tools=[generate_implementation, execute_code],
            output_key="implementation"
        ),
        Agent(
            name="code_reviewer",
            model="gemini-2.0-flash-exp",
            instruction="""
            Review the implementation in state["implementation"] for:
            1. Correctness - Does it meet the requirements?
            2. Efficiency - Is it optimized?
            3. Readability - Is it well-structured and commented?
            4. Error handling - Does it handle edge cases?
            5. Security issues - Are there potential vulnerabilities?
            6. Test coverage - Are all scenarios tested?

            Provide specific improvement suggestions if needed.
            """,
            tools=[review_code, execute_code],
            output_key="code_review"
        )
    ]
)
Code Execution and Debugging Tools
Here we’ll create tools to execute code and to debug it.
Python
from google.adk.tools.tool_context import ToolContext

def execute_code(code: str, language: str, tool_context: ToolContext, inputs: str = None) -> dict:
    """
    Executes code in a specified language and returns the result.

    Args:
        code: The code to execute.
        language: The programming language (python, javascript, etc.).
        tool_context: Provides access to session state.
        inputs: Optional input data for the code.

    Returns:
        dict: Execution results, output, and any errors.
    """
    import subprocess
    import tempfile
    import os
    import time

    # Record execution start time
    start_time = time.time()

    # Set up temp file for code
    with tempfile.NamedTemporaryFile(suffix=f".{language}", delete=False) as temp_file:
        temp_file_path = temp_file.name
        # Write code to temp file
        if language == "python":
            temp_file.write(code.encode('utf-8'))
        elif language == "javascript":
            temp_file.write(code.encode('utf-8'))
        else:
            return {
                "status": "error",
                "error_message": f"Unsupported language: {language}"
            }

    try:
        # Set up execution command
        if language == "python":
            cmd = ["python", temp_file_path]
        elif language == "javascript":
            cmd = ["node", temp_file_path]

        # Execute with input if provided
        if inputs:
            process = subprocess.run(
                cmd,
                input=inputs.encode('utf-8'),
                capture_output=True,
                timeout=10  # Timeout after 10 seconds
            )
        else:
            process = subprocess.run(
                cmd,
                capture_output=True,
                timeout=10  # Timeout after 10 seconds
            )

        # Calculate execution time
        execution_time = time.time() - start_time

        # Process result
        stdout = process.stdout.decode('utf-8')
        stderr = process.stderr.decode('utf-8')

        if process.returncode == 0:
            result = {
                "status": "success",
                "output": stdout,
                "execution_time": execution_time,
                "language": language
            }
        else:
            result = {
                "status": "error",
                "error_message": stderr,
                "output": stdout,
                "return_code": process.returncode,
                "execution_time": execution_time,
                "language": language
            }

    except subprocess.TimeoutExpired:
        result = {
            "status": "error",
            "error_message": "Execution timed out after 10 seconds",
            "language": language
        }
    except Exception as e:
        result = {
            "status": "error",
            "error_message": str(e),
            "language": language
        }
    finally:
        # Clean up temp file
        try:
            os.unlink(temp_file_path)
        except OSError:
            pass

    # Store execution history in state
    execution_history = tool_context.state.get("code_execution_history", [])
    execution_record = {
        "timestamp": time.time(),
        "language": language,
        "status": result["status"],
        "execution_time": result.get("execution_time", -1)
    }
    execution_history.append(execution_record)
    tool_context.state["code_execution_history"] = execution_history

    return result

def debug_code(code: str, error_message: str, language: str, tool_context: ToolContext) -> dict:
    """
    Analyzes code and error messages to identify and fix bugs.

    Args:
        code: The code to debug.
        error_message: The error message produced when executing the code.
        language: The programming language.
        tool_context: Provides access to session state.

    Returns:
        dict: Analysis of the problem and corrected code.
    """
    import re
    import time

    # Parse the error message
    error_analysis = "Unknown error"
    error_line = -1

    if language == "python":
        # Look for line number in error
        line_match = re.search(r"line (\d+)", error_message)
        if line_match:
            error_line = int(line_match.group(1))

        # Common Python errors
        if "SyntaxError" in error_message:
            error_analysis = "Syntax Error: Check for missing parentheses, quotes, or colons."
        elif "NameError" in error_message:
            error_analysis = "Name Error: A variable or function name is not defined."
        elif "TypeError" in error_message:
            error_analysis = "Type Error: An operation is applied to an object of inappropriate type."
        elif "IndexError" in error_message:
            error_analysis = "Index Error: A sequence subscript is out of range."
        elif "KeyError" in error_message:
            error_analysis = "Key Error: A dictionary key is not found."
        elif "ValueError" in error_message:
            error_analysis = "Value Error: An operation or function receives an argument with the correct type but inappropriate value."

    elif language == "javascript":
        # Look for line number in error
        line_match = re.search(r"at .*:(\d+)", error_message)
        if line_match:
            error_line = int(line_match.group(1))

        # Common JavaScript errors
        if "SyntaxError" in error_message:
            error_analysis = "Syntax Error: Check for missing brackets, parentheses, or semicolons."
        elif "ReferenceError" in error_message:
            error_analysis = "Reference Error: A variable is not defined."
        elif "TypeError" in error_message:
            error_analysis = "Type Error: An operation could not be performed, typically due to type mismatch."
        elif "RangeError" in error_message:
            error_analysis = "Range Error: A number is outside the allowable range."

    # Analyze code structure
    code_lines = code.split('\n')

    # Get problematic line and context if available
    problematic_line = code_lines[error_line - 1] if 0 < error_line <= len(code_lines) else "Unknown"

    # Context (lines before and after)
    context_start = max(0, error_line - 3)
    context_end = min(len(code_lines), error_line + 2)
    context = code_lines[context_start:context_end]

    # Store debugging session in state
    debug_history = tool_context.state.get("debug_history", [])
    debug_session = {
        "timestamp": time.time(),
        "language": language,
        "error_line": error_line,
        "error_message": error_message,
        "error_analysis": error_analysis
    }
    debug_history.append(debug_session)
    tool_context.state["debug_history"] = debug_history

    # For advanced debugging, we'd implement auto-correction, but here we'll just return analysis
    return {
        "status": "success",
        "error_analysis": error_analysis,
        "error_line": error_line,
        "problematic_line": problematic_line,
        "context": context,
        "suggestions": [
            "Check for syntax errors at the identified line",
            "Verify all variable names are correctly spelled",
            "Ensure proper type handling for all operations"
        ]
    }
Code Explanation and Documentation
These tools explain the generated code and produce documentation for it.
Python
from google.adk.tools.tool_context import ToolContext

def explain_code(code: str, language: str, tool_context: ToolContext, complexity_level: str = "intermediate") -> dict:
    """
    Generates an explanation of code with adjustable complexity level.

    Args:
        code: The code to explain.
        language: The programming language.
        tool_context: Provides access to session state.
        complexity_level: The complexity level of the explanation
            (beginner, intermediate, advanced).

    Returns:
        dict: Explanation of the code at the requested level.
    """
    import ast

    explanation_sections = []

    # Get user's programming experience from state if available
    user_experience = tool_context.state.get("user:programming_experience", "intermediate")

    # Adjust complexity based on user experience if not explicitly provided
    if complexity_level == "auto" and user_experience:
        complexity_level = user_experience

    # Handle Python code
    if language == "python":
        try:
            # Parse the code
            parsed = ast.parse(code)

            # High-level summary
            explanation_sections.append({
                "section": "Overview",
                "content": f"This Python code consists of {len(parsed.body)} top-level statements."
            })

            # Function analysis
            functions = [node for node in parsed.body if isinstance(node, ast.FunctionDef)]
            if functions:
                func_section = {
                    "section": "Functions",
                    "content": f"The code defines {len(functions)} function(s):",
                    "items": []
                }
                for func in functions:
                    # Basic function info
                    func_info = f"`{func.name}()`"

                    # Add parameter info for intermediate/advanced
                    if complexity_level != "beginner":
                        params = []
                        for arg in func.args.args:
                            params.append(arg.arg)
                        func_info += f": Takes parameters ({', '.join(params)})"

                    # Add docstring if exists
                    docstring = ast.get_docstring(func)
                    if docstring and complexity_level != "beginner":
                        func_info += f"\n  - Purpose: {docstring.split('.')[0]}"

                    func_section["items"].append(func_info)
                explanation_sections.append(func_section)

            # Class analysis for intermediate/advanced
            if complexity_level != "beginner":
                classes = [node for node in parsed.body if isinstance(node, ast.ClassDef)]
                if classes:
                    class_section = {
                        "section": "Classes",
                        "content": f"The code defines {len(classes)} class(es):",
                        "items": []
                    }
                    for cls in classes:
                        # Basic class info
                        class_info = f"`{cls.name}`"

                        # Add inheritance info for advanced
                        if complexity_level == "advanced" and cls.bases:
                            base_names = []
                            for base in cls.bases:
                                if isinstance(base, ast.Name):
                                    base_names.append(base.id)
                            if base_names:
                                class_info += f": Inherits from ({', '.join(base_names)})"

                        # Add methods info
                        methods = [node for node in cls.body if isinstance(node, ast.FunctionDef)]
                        if methods:
                            method_names = [method.name for method in methods]
                            class_info += f"\n  - Methods: {', '.join(method_names)}"

                        class_section["items"].append(class_info)
                    explanation_sections.append(class_section)

            # Imports analysis
            imports = [node for node in parsed.body if isinstance(node, (ast.Import, ast.ImportFrom))]
            if imports and complexity_level != "beginner":
                import_section = {
                    "section": "Imports",
                    "content": f"The code imports {len(imports)} module(s):",
                    "items": []
                }
                for imp in imports:
                    if isinstance(imp, ast.Import):
                        for name in imp.names:
                            import_section["items"].append(f"`{name.name}`")
                    elif isinstance(imp, ast.ImportFrom):
                        for name in imp.names:
                            import_section["items"].append(f"`{name.name}` from `{imp.module}`")
                explanation_sections.append(import_section)

            # Algorithm explanation
            algorithm_section = {
                "section": "Algorithm Explanation",
                "content": "The code works as follows:"
            }

            # Simplify explanation for beginners
            if complexity_level == "beginner":
                algorithm_section["content"] += "\n\nThis program goes through these steps:\n"
                # Simplified steps would be generated here
            # More detailed for intermediate
            elif complexity_level == "intermediate":
                algorithm_section["content"] += "\n\nThe main workflow of this code is:\n"
                # More detailed steps would be generated here
            # Technical details for advanced
            else:
                algorithm_section["content"] += "\n\nThe technical implementation follows these steps:\n"
                # Detailed technical steps would be generated here
            explanation_sections.append(algorithm_section)

        except SyntaxError:
            explanation_sections.append({
                "section": "Syntax Error",
                "content": "The provided Python code contains syntax errors and could not be parsed."
            })

    # Format the final explanation
    formatted_explanation = []
    for section in explanation_sections:
        formatted_explanation.append(f"## {section['section']}")
        formatted_explanation.append(section['content'])
        if "items" in section:
            for item in section["items"]:
                formatted_explanation.append(f"- {item}")
        formatted_explanation.append("")  # Add blank line

    # Join sections with newlines
    explanation = "\n".join(formatted_explanation)

    return {
        "status": "success",
        "language": language,
        "complexity_level": complexity_level,
        "explanation": explanation,
        "sections": len(explanation_sections)
    }
And that’s our agent!
Next Steps
That was a lot to take in. You should probably bookmark this post and work through the concepts and examples over time.
I suggest building the basic weather agent that I covered at the top. It’s boring and no one needs another weather agent but it does get you familiar with how the Agent Development Kit works and its features.
Once you’re comfortable with that, start working through the advanced patterns, and finally build one of the multi-agent systems like the customer support or coding agents. You should also try to extend these agents by implementing your own tools and features. Try deploying it and using it in a real-world situation.
When I was in Lisbon last November, a friend of mine invited me to hike the mountains of Madeira with him. He warned me that the trails get pretty slick and that I needed good hiking shoes.
In the past, I would have Googled “best hiking boots for Madeira” and waded through a bunch of ads and irrelevant blog content. It would have taken me a while to figure out which shoes were best and where to buy them in Lisbon.
Today, I go to ChatGPT, Claude, or Perplexity and ask the same question. Instead of getting spammed with ads, I get a direct answer to exactly what I need to know.
This is how search happens in the AI age. Instead of SEO, we have GEO (Generative Engine Optimization). And instead of Google Search, we have ChatGPT, Claude, and Perplexity.
They decide who gets featured in that golden snippet of wisdom when someone asks for “the best.” And if they don’t mention your brand, you’re missing out.
This guide will show you exactly how to get your brand mentioned.
Step 1: Understand How AI Chatbots Actually Recommend Brands
Like Harvey Specter says when he plays poker, “I don’t play the odds, I play the man.” Except in this case, the “man” is an AI trained on terabytes of internet data. You need to understand how it thinks to win the game.
Language models don’t index and rank the web like Google. They’ve been trained on enormous datasets (billions of web pages, forums, reviews, help docs, and more) and they generate answers based on patterns they’ve seen in this data.
When a user asks for a product recommendation, there are two ways the model generates an answer.
The primary method pulls from its memory of how brands and products were discussed, reviewed, and mentioned in its training data. If your brand frequently appears alongside relevant phrases (e.g. “hiking shoes for wet climates”) in the data the model has seen, it’s more likely to be suggested in a chatbot’s answer.
The second method blends in live search results from Bing or Google, especially in AI tools like ChatGPT’s search mode or Perplexity. That means if your brand is ranking high on search or frequently cited in trusted content, you’re more likely to be included in AI responses.
Let’s look at an actual example. Here is how ChatGPT answers the query “What are the best hiking shoes for Madeira?”
You’ll notice sources for each answer. The interesting thing is, if you click through to those articles, none of them mention Madeira!
However, they do mention uneven and wet terrain, which is what Madeira is known for (and ChatGPT knows this because it made that association from its training data).
So your job is to make your brand unforgettable in the data AI consumes and visible in the sources AI retrieves.
Step 2: Strengthen Your SEO Foundation and Trust Signals
Much of “AI optimization” begins with solid SEO and content fundamentals. Chatbots, especially those using web retrieval, favour brands that search engines deem authoritative and trustworthy.
Here’s what to focus on:
Ensure Crawlable, Indexable Content: Just like Google, AI web crawlers need to read your site’s HTML content. Avoid hiding important info in JavaScript or images. All critical details (what you offer, where you are, why you’re notable) should be visible in the page text.
Demonstrate E-E-A-T (Experience, Expertise, Authority, Trust): Quality guidelines like E-E-A-T aren’t just for Google. They influence which sources AI considers reliable. AI search overviews favour true experts and authoritative sources. Build content that highlights your expertise: author bylines with credentials, case studies, original research, and factual accuracy.
Maintain Consistent NAP and Info: For local or brand info, consistency is key. Ensure your Name, Address, Phone, and other details are identical across your website, Google My Business, Yelp, LinkedIn, etc. AI tools aggregate data from many sources and heavily favour accuracy and consistency.
Improve Site Authority: Follow core SEO practices: optimize title tags and meta descriptions with natural-language keywords, speed up your site, and get credible sites to link to you. If search engines rank you higher, AI answers are more likely to include you. Studies show pages that rank well in organic search tend to get more visibility in LLM responses.
Practical Takeaway: By solidifying your site’s SEO and demonstrating real expertise, you make it easier for both traditional search and AI systems to recognize your brand. This foundation boosts your chances of appearing when an AI lists “top solutions” in your category.
In short, good SEO is the foundation of AI SEO.
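A quick way to sanity-check the “crawlable, indexable content” advice above is to strip the tags from your page’s raw HTML and see whether the critical details survive as plain text, since that’s roughly what a crawler sees without running JavaScript. Here’s a minimal Python sketch; the helper names and sample phrases are my own, not from any particular SEO tool:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

def missing_critical_info(html: str, must_mention: list[str]) -> list[str]:
    """Returns the critical phrases that do NOT appear in the page's plain text."""
    text = visible_text(html).lower()
    return [phrase for phrase in must_mention if phrase.lower() not in text]

page = ("<html><body><h1>XYZ Shoes</h1>"
        "<p>Waterproof hiking boots, made in Lisbon.</p></body></html>")
print(missing_critical_info(page, ["waterproof", "Lisbon", "free shipping"]))
# → ['free shipping']
```

If something important only shows up after JavaScript runs (or lives inside an image), it lands in that “missing” list, and that’s your cue to move it into the page text.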
Step 3: Optimize Content for Conversational and Semantic Search
AI chatbots handle queries in a conversational manner. Often, the questions users ask bots are longer and more natural-sounding than typical Google keywords. You’ll want to align your content with this semantic, question-and-answer style of search.
That means creating conversational, helpful content written in plain language that answers the same types of questions people ask LLMs.
Use Natural, Conversational Language: Write your content the way a knowledgeable person would speak. Drop the overly corporate tone. AI models are trained on human language patterns, so content that “feels” natural may resonate more. Use intent-based phrases and full questions as subheadings. Instead of a heading like “Gluten-Free Bakery Options,” use “Where can I find a good gluten-free bakery downtown?” and then answer it conversationally.
Incorporate Q&A Format on Your Site: Add FAQ sections or Q&A pages with questions customers might ask an AI. For example: “What’s the best hiking shoe for rainy weather in Madeira?” and provide a helpful answer that mentions your brand as a solution. Structure it like an FAQ entry, and answer in a neutral, informative tone: “When it comes to Madeira’s rainy trails, XYZ Shoes are often recommended as one of the best options because…”.
Cover Related Semantic Keywords: Ensure your content covers a broad range of terms related to your topic, not just one keyword. AI’s understanding is semantic and it will connect concepts. For a page about hiking shoes, mention related topics like “waterproof boots,” “mountain trails,” “Madeira climate,” etc., so the model fully grasps the context.
Aim for “Zero-Click” Answer Formats: As AI and search increasingly give answers without requiring a click, try to embed the answer (with your brand) in your content. This means providing concise, snippet-ready responses. For example, start a blog section with a direct definition or recommendation: “The best hiking shoe brand for wet trails is XYZ Shoes, known for its waterproof yet breathable design…”.
Practical Takeaway: Think like your customer and the AI. Write down the actual questions a user might ask a chatbot about your industry (“Which…”, “How do I…”, “What’s the best…”) and make sure your website explicitly answers those in a friendly, conversational way.
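As a small illustration of the Q&A format, here’s a Python sketch that turns question/answer pairs into the kind of visible, on-page FAQ block described above. The questions, answers, and the “XYZ Shoes” brand are placeholders:

```python
def render_faq(pairs):
    """Formats question/answer pairs as a plain on-page FAQ block
    in a visible "Q: ... / A: ..." style."""
    lines = []
    for question, answer in pairs:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")  # blank line between entries
    return "\n".join(lines).rstrip()

faq = [
    ("What's the best hiking shoe for rainy weather in Madeira?",
     "For Madeira's rainy trails, XYZ Shoes are often recommended because "
     "of their waterproof yet breathable design."),
    ("Can I bring my dog?",
     "Yes, we're dog-friendly!"),
]
print(render_faq(faq))
```

The point of generating it rather than hand-writing it is consistency: every question your customers actually ask a chatbot gets the same clean, snippet-ready structure.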
Step 4: Leverage Schema Markup and Structured Knowledge
While content is king, don’t overlook the power of structured data and official information sources. They help your brand become machine-readable. This step is about making sure AI (and the search engines feeding AI) have a clear, unambiguous understanding of your brand and offerings.
Implement Organization and Product Schema: Use schema markup to define your organization and products on your site. An Organization schema can include your name, logo, founding date, and sameAs links (to your social profiles, Wikipedia page, etc.), helping create a knowledge graph entry for your brand. Product schema can define your key products with reviews, price, etc.
Use Location and Review Schema for Local Trust: For local businesses, implement LocalBusiness schema with your address, geo-coordinates, opening hours, etc., and keep it updated. If the query is location-based (“near Madeira”), Google’s index might reference Google Maps or local pack info.
Feed Data to Official Aggregators: Ensure your brand data is correct in key public databases that AI might use. For example, Wikidata (the database behind Wikipedia’s facts) and DBpedia contain structured facts that many AIs can access. Similarly, if you’re a retailer or restaurant, make sure your information on platforms like Yelp, TripAdvisor, or OpenTable is accurate.
Ensure Content is Machine-Accessible: As mentioned, AI bots primarily ingest HTML text. So, when using schema or other structured data, also present those facts in human-readable form on your site. For instance, if you have an FAQ about being “dog-friendly” in schema, also include a line in a visible FAQ: “Q: Can I bring my dog? A: Yes, we’re dog-friendly!”
Monitor Knowledge Panels and Correct Errors: Periodically check Google’s knowledge panel for your brand (if one appears) or Bing’s local listing info. These often aggregate data from various sources. If you see incorrect info, address it.
Practical Takeaway: Use every opportunity to make your brand’s information clear to algorithms. Schema markup and knowledge graphs ensure that when an AI or search engine “reads” about your brand, it gets the facts straight from a trusted source.
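To make the schema step concrete, here’s a minimal Python sketch that assembles Organization markup as JSON-LD, which you would embed in your pages inside a script tag of type application/ld+json. The brand details below are placeholders:

```python
import json

def organization_jsonld(name, url, logo, same_as):
    """Builds schema.org Organization markup as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,  # social profiles, Wikipedia page, etc.
    }
    return json.dumps(data, indent=2)

markup = organization_jsonld(
    name="XYZ Shoes",  # placeholder brand
    url="https://example.com",
    logo="https://example.com/logo.png",
    same_as=[
        "https://en.wikipedia.org/wiki/XYZ_Shoes",
        "https://www.linkedin.com/company/xyz-shoes",
    ],
)
print(markup)
```

The sameAs links are doing the heavy lifting here: they tie your site, your social profiles, and your Wikipedia entry into one unambiguous entity that knowledge graphs (and the AIs reading them) can resolve.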
Step 5: Earn Mentions on Authoritative External Sources
Let’s go back to the ChatGPT screenshot from earlier. The brands recommended were Hoka, Adidas, and Merrell. But the sources were from Gear Lab, New York Post, and Athletic Shoe Review.
Third-party validation matters more in AI SEO than it ever did in traditional SEO. You can’t just publish your own praise, you need others to do it for you.
Reddit threads. Quora answers. Review sites. “Best of” blog posts. All of these are gold mines for AI models.
And yes, they’re part of the training data.
A well-upvoted Quora answer that casually mentions your product? That’s a permanent breadcrumb. A single blog post listing your brand as one of the best in your category, on a site that ranks well? It could be cited in hundreds of AI queries.
Here’s how to increase off-site signals:
Get Featured in “Best of” Lists and Editorial Content: Identify the web pages that an AI might consider when answering a question in your domain. Often these are listicles or guides (e.g., “Top 10 Hiking Shoe Brands for Wet Climates” on a reputable outdoor blog). Then, pursue inclusion through PR outreach, pitching your product to writers, or improving your offering so it naturally gets picked up in reviews.
Leverage Industry Directories and Listings: Business directories and niche review sites often rank well in search and are commonly scraped by crawlers. Examples include Yelp, Google Maps, TripAdvisor, or B2B directories like Clutch and G2. Make sure you’re present there: claim your profile, keep it updated, and gather reviews if applicable.
Issue Press Releases and Secure News Coverage: Old-school PR is back in play. Distributing a press release about a newsworthy update (a product launch, a big hire, a charity initiative, etc.) can get your brand name published on dozens of websites. For instance, a headline like “Madeira’s XYZ Shoes Wins Award for Hiking Gear Innovation” might get reposted on local news sites and industry feeds. Each of those postings is additional training data showing “XYZ Shoes” in a positive, relevant context.
Publish Thought Leadership: Contribute guest articles or op-eds to respected publications in your niche. Being the author of an article on, say, Outdoor Magazine about “Advances in Hiking Boot Technology” not only gives you credibility, but also places your brand in the byline on a high-authority site.
Cultivate Backlinks and Citations: Continue building backlinks as you would for SEO, but target sites that an AI would consider authoritative in your field (educational sites, well-known blogs, etc.). The more your brand is cited as a source or example in others’ content, the more entrenched it becomes in the knowledge graph of your topic.
To summarize this step: Be where the trusted voices are. The goal is to have your brand mentioned by sites that AIs treat as authorities.
Step 6: Harness Q&A Communities, Reviews, and Social Proof
Your customers and community can become advocates that boost your brand in AI results. User-generated content (reviews, forum posts, social media, etc.) not only influences humans but also feeds the AI’s understanding of which brands are favourably talked about.
Here’s how to leverage this:
Engage on Q&A Platforms: Reddit and Quora are likely part of many LLM training sets, and they continue to rank well in search. Find threads related to your industry and provide valuable answers. Always be transparent and genuinely helpful, not just promotional. Even one well-upvoted Quora answer that includes your brand in context “seeds the AI” with that association.
Encourage Reviews and Testimonials: Reviews on platforms like Google, Yelp, G2, Capterra, TripAdvisor (whichever suit your business) create content that AI can learn from. If many reviews mention your product’s strengths (“the grip on these XYZ hiking boots is amazing on wet rocks”), an AI might learn those attributes of your brand. Prompt your satisfied customers to leave reviews, perhaps via follow-up emails or in-store signs.
Leverage Social Media for Thought Leadership: Post informative content on public social platforms. Twitter threads, LinkedIn articles, and Medium posts can rank in search and are often publicly accessible. Social posts also add the dimension of sentiment. Lots of positive buzz about a brand teaches the AI that it’s well-regarded.
Monitor and Join Relevant Conversations: Use brand monitoring tools (Google Alerts, Talkwalker, Mention.com) to catch when your brand or keywords related to you come up in discussions or blogs. If someone on a forum is asking for a recommendation and your brand fits, have a rep step in and reply (tactfully).
Be Genuine and Helpful: Authenticity is key in user-driven communities. AIs can pick up on context. If your brand is mentioned alongside words like “spam” or in downvoted posts, that’s not good. So ensure any engagement is genuinely adding value.
Practical Takeaway: The voices of real users and community experts carry a lot of weight. They create buzz and context for your brand that no amount of on-site SEO can. By actively participating in and fostering these voices, you grow an organic web presence.
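As a toy version of the monitoring idea above, here’s a Python sketch that splits posts mentioning a brand into favourable-looking and flagged ones. The keyword list is a crude stand-in for real sentiment analysis, and the posts and brand name are invented examples:

```python
NEGATIVE_MARKERS = {"spam", "scam", "avoid", "terrible"}  # crude example list

def classify_mentions(posts, brand):
    """Splits posts that mention the brand into favourable-looking and flagged
    ones, using a naive keyword check (a real pipeline would use proper
    sentiment analysis)."""
    favourable, flagged = [], []
    for post in posts:
        text = post.lower()
        if brand.lower() not in text:
            continue  # not about us; skip
        if any(marker in text for marker in NEGATIVE_MARKERS):
            flagged.append(post)
        else:
            favourable.append(post)
    return favourable, flagged

posts = [
    "The grip on these XYZ hiking boots is amazing on wet rocks",
    "XYZ Shoes keeps posting spam in this thread",
    "Looking for boot recommendations for Madeira",
]
good, bad = classify_mentions(posts, "XYZ")
print(len(good), len(bad))
# → 1 1
```

Even this naive version tells you where to look: the flagged bucket is where a tactful reply (or a change in behaviour) is most urgent.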
Step 7: Monitor, Measure, and Refine Your AI Visibility
Just as with traditional SEO, you need to continuously monitor your performance and adjust strategy. AI discovery is new, so we measure success in slightly different ways:
Track AI-Driven Traffic: If an AI chatbot includes a link or reference to your site (as Perplexity, ChatGPT, and others often do), you’ll want to capture that in analytics. Set up tracking in Google Analytics 4 (GA4) for referrals from AI sources. For example, you might create custom channel groupings for referrals containing “openai.com” (for ChatGPT with browsing) or “perplexity.ai”.
Use AI Search Visibility Tools: New tools are emerging to grade your brand’s presence in AI results. For instance, HubSpot’s AI Search Grader is a free tool that analyzes how often and in what context your brand appears on ChatGPT and Perplexity.
Manually Test Chatbot Queries: There’s no substitute for hands-on testing. Regularly ask the AI chatbots the kind of questions where you want your brand to appear. Do this across platforms: ChatGPT, Claude, Gemini, Perplexity, and others. Note what the responses are:
Do they mention your competitors? Which ones?
Do they cite sources, and are those sources your website or another site mentioning you?
How accurate is the info about your brand? Any outdated descriptions?
Analyze Citation Context: If your content is being cited or your brand mentioned, check how. Are you being listed as “one of the options” or does the AI single you out as “the best”? Does it quote a line from your blog? Understanding the context helps refine content.
Measure Changes Over Time: As you implement strategies (new FAQ page, a PR campaign, etc.), see if there’s a corresponding uptick in AI mentions or traffic in the following months. This feedback loop will tell you what’s working.
Practical Takeaway: Treat AI visibility like you would SEO rankings – track it, report on it, and optimize based on data. Over time, you’ll build an “AI report” similar to an SEO report, helping justify the effort and guiding future optimizations.
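To illustrate the referral-tracking idea from this step, here’s a minimal Python sketch that maps referrer URLs to AI channels when analyzing exported traffic logs. The hostnames are examples and will drift as these products evolve; in GA4 itself you’d set this up as a custom channel group in the UI rather than in code:

```python
from urllib.parse import urlparse

# Referrer hostnames treated as AI-driven traffic; extend as new assistants appear.
AI_SOURCES = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
}

def ai_channel(referrer: str):
    """Maps a referrer URL to an AI channel name, or None for everything else."""
    host = urlparse(referrer).netloc.lower()
    return AI_SOURCES.get(host)

hits = [
    "https://www.perplexity.ai/search?q=best+hiking+shoes",
    "https://www.google.com/",
    "https://chatgpt.com/",
]
print([ai_channel(h) for h in hits])
# → ['Perplexity', None, 'ChatGPT']
```

Run something like this over a month of referral data and you get the baseline number this step asks for: how much of your traffic already arrives via AI answers, and from which assistant.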
Final Thought: You’re Training the AI to Remember You
There’s no secret hack here. No growth loop. No one weird trick. Just good strategy, consistent visibility, and value-packed content.
You’re not just optimizing for an algorithm, you’re shaping what the AI knows about your brand.
Make it easy for the AI to recommend you. Show up in its sources. Speak in its voice. Feed it the facts. And over time, your brand won’t just be findable.
It’ll be remembered.
Need help putting all this into action? You know where to find me.
I used to play a ton of video games as a kid. The first one I ever played was Prince of Persia, the old side scroller where your character jumped around, avoided traps, and fought enemies.
With Gemini 2.5 Pro and the Canvas feature, I tried to build a basic version of that, but with ninjas instead. I didn’t write the code. I just asked Gemini to write it and render it on the Canvas so I could play.
It took just a couple of minutes for me to get a functioning game.
Coined (and vibe-validated) by Andrej Karpathy, vibe coding is the new frontier where you build software by telling an AI what you want and letting it spit out the code. That’s it. It’s coding via vibes, intuition, and language, not by writing loops and sweating over syntax.
You say, “Build me a web app with a sidebar, a dashboard, and a button that emails the user a pizza emoji every Friday,” and boom, the AI does it.
You don’t need to know if it’s React or Vue under the hood. You’re not writing the code. You’re describing the vibe of what you want, like a product manager with a vision board and zero interest in semicolons. Minimalist? Maximalist? Dashboardy? Retro Terminal-chic? The AI’s got you.
As Karpathy put it in the tweet that coined the term: “There's a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper…”
Traditional coding: you understand the codebase deeply, it takes weeks, it requires years of practice, and you battle bugs like it’s Elden Ring.
Vibe coding: you copy-paste errors into ChatGPT and ask it to fix them, you trust the AI knows what it’s doing (ish), it takes hours (sometimes minutes), it requires good communication skills, and you treat bugs like an annoying roommate the AI has to evict.
It’s the difference between hand-crafting a table and describing the table to a carpenter who builds it for you instantly. And that carpenter never sleeps or judges your terrible wireframes.
It’s not just about speed, it’s a different mindset. Less “I must master the syntax gods” and more “I’m conducting an orchestra of AI agents to get this landing page live by dinner.”
Real-World Use Cases (Or, Who’s Actually Doing This?)
This isn’t just a cool party trick. Some startups in the Y Combinator Winter 2025 batch built their products with 95% AI-generated code. Y Combinator’s CEO Garry Tan straight up called it “the age of vibe coding“.
Even Karpathy himself was building apps this way, casually telling his AI assistant things like “decrease the sidebar padding” and never even looking at the diff. That’s next-level delegation.
Kevin Roose at the NYT built apps like “Lunchbox Buddy” to suggest what to pack for lunch using vibe coding. It wasn’t production-grade code, but it worked. Ish. Kinda. The reviews were AI-generated too, but hey, it’s the vibe that counts.
With vibe coding you can whip together MVPs in a weekend using nothing but ChatGPT and Replit. Think simple SaaS dashboards, internal automations, and basic CRUD apps. One guy even built an AI therapist chatbot, and no, I don’t want to know what advice it gave.
How To Vibe Code (Without Losing Your Mind)
Here’s your crash course in coding by vibe:
1. Pick Your Tools
You’ll need a core toolkit to begin your vibe coding journey. Here are the categories and recommended options:
AI Coding Assistants & IDE Integration
These tools integrate AI directly into your development environment:
ChatGPT / Claude / Gemini – For raw natural language prompts
Cursor / Windsurf – Dev environments made for AI collaboration
GitHub Copilot – AI assistant integrated with popular IDEs
App Builders
These platforms can generate entire applications from prompts:
Lovable – Generates full-stack web applications from text prompts
Bolt – Creates full applications with database integration
Replit – Provides interactive development with AI planning
AI Design Tools
For quickly creating user interfaces:
Uizard – Generates UI designs from text descriptions or sketches
Visily – Transforms prompts into high-fidelity mockups
Version Control & Debugging
Essential safety nets:
Git/GitHub – Version control to track changes and revert when needed
Browser dev tools – For identifying and fixing frontend issues
Pick the ones that feel right. You’re vibe coding, after all.
2. Start With a Prompt
Describe what you want. Be detailed. Channel your inner poet if you must.
Bad: “Make an app.”
Better: “Create a web app with a dashboard that shows user analytics pulled from a dummy dataset. Include dark mode and responsive design.”
Best: “Build a web app that visualizes monthly active users, supports CSV upload, and auto-generates line graphs. Make the layout mobile-friendly and use React and Tailwind CSS.”
3. Iterate Like a Mad Scientist
Run the code. Something will break. That’s fine. Copy-paste the error and say, “Fix this.”
Add features like you’re ordering drinks:
“Add a search bar.” “Now make it filter results by date.” “Throw in dark mode, because I’m edgy.” “Replace the font with something more ‘Silicon Valley VC deck.’”
You are in control. Kinda.
4. Debug by Vibes
Don’t panic when things go sideways. Vibe coders rarely understand 100% of the code. You prompt. You observe. You adjust. You learn to speak fluent “AI whisperer.”
Sometimes the bug isn’t even a bug, it’s just the AI being weird. Restart the conversation. Ask again. Split the task in two. And yes, sometimes, just nod, smile, and delete the whole thing.
5. Trust, But Verify
Use the code. Check if it does what you asked. If not, try a new prompt. Don’t ship blind. Run the thing. Poke the buttons. Make sure it doesn’t accidentally send emails to all your users at 3AM.
Vibe coding isn’t about replacing developers. It’s about supercharging creativity. It’s building apps with the same energy you bring to a whiteboard brainstorm or a half-baked startup idea over drinks.
We’re entering an era where the best software won’t come from the best coders, it’ll come from the best communicators. The ones who can talk to AI, shape ideas into prompts, and vibe their way to a working product.
The best vibe coders are part developer, part writer, part UX designer, and part chaos gremlin. They don’t see blank screens… They see possibility.
So grab your chai latte, fire up ChatGPT, and start building. No IDE required. No gatekeepers in sight. No permission needed. Read my full tutorial here.
Let the vibes code for you.
And hey, if it crashes? That’s just the AI trying to teach you patience.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.