There’s a research paper that’s been making the rounds recently, a study by MIT’s Media Lab, that talks about the cognitive cost of using LLMs (Large Language Models that power AI apps like ChatGPT).
In the study, researchers asked 54 participants to write essays. They divided them into 3 groups – one that could use ChatGPT, one that could only use a search engine like Google, and the third (Brain-only) that couldn’t use any tool.
And, surprise surprise, they found that ChatGPT users had the lowest brain engagement, while participants who used only their brains to write the essays had the highest. ChatGPT users also had a harder time recalling quotes from the essay.
No shit.
Let’s leave aside the fact that 54 participants is a tiny sample and that writing an essay is maybe not a comprehensive test of cognitive load. The paper is essentially saying that if you use AI to help you think, then you are reducing the cognitive load on your brain. This is obvious.
Look, if you use ChatGPT to write an entire article for you, without any input, then of course you’re not using your brain. And of course you’re not going to remember much of it, you didn’t write it!
Does that mean it’s making you dumber? Not really.
But it’s also not making you smarter. And that should be obvious to you too.
Active vs passive mode
AI is a tool, like any other, and there’s a right way to use it and a wrong way to use it.
If I need to study a market to evaluate an investment opportunity, I could easily ask ChatGPT to run deep research on the market, and then write up a report. It would take a few minutes, as opposed to a few hours if I did it myself.
Even better, I can ask ChatGPT to make an investment recommendation based on the report. That way I don’t even need to read it!
But have I learned anything at all from this exercise? Of course not. The only work I did was write a prompt, and then AI did everything else. There was no input from me, not even a thought.
Again, all of this is obvious, but it’s also the default mode for most people using AI. That’s why the participants in the study showed low levels of brain activity. They asked AI to do all the work for them.
This is the passive mode.
But there’s a better way, one where you can use AI to speed things up and also learn and exercise critical thinking.
I call this active mode.
Thinking with AI
Any task can be broken down into steps that require critical thinking or creative input, and steps that don’t. In the market research example, searching doesn’t require critical thinking, but understanding the findings and writing the report does.
In active mode, we use AI to do the steps that don’t require critical thinking.
We use ChatGPT Deep Research to find relevant information, but we read it. And once we read it, we figure out what’s missing and ask ChatGPT to search for that information.
When we’re done understanding the market, we write the report and we ask ChatGPT to help us improve a sentence or paragraph. We decide what information to put into the report but we ask ChatGPT to find a source to back it up.
And when we’re done, we ask ChatGPT to poke holes in our report, and to ask us questions that haven’t been covered. And we try to answer those questions ourselves, and go back to our research or ask ChatGPT to research more if we don’t have the answers.
Writing a report, planning a project, building an app, designing a process, anything can be done this way, you doing the critical thinking and creative stuff, and AI doing the rest.
You just need to make this your default way of using AI.
Practical Steps for Active AI Use
Here’s how to make active mode your default:
1. Start with Your Framework
Before touching AI, spend 5-10 minutes outlining:
What you’re trying to accomplish
What you already know about the topic
What questions you need answered
How you’ll evaluate success
This prevents AI from hijacking your thought process from the start.
2. Use AI for Research
Ask AI to find information, but don’t ask it to summarize anything you haven’t read through yourself
Instead of: “What does this data mean for my business?”
Try: “Find data on customer churn rates in SaaS companies with 100-500 employees”
Then draw your own conclusions about what the data means.
That’s not to say you shouldn’t ask AI to analyze data. You absolutely should, but only after you’ve drawn your own conclusions, as a way to uncover things you’ve missed.
3. Think Out Loud With AI
Use AI as a sounding board for your thinking:
“I’m seeing a pattern in this data where X leads to Y. What other factors might explain this relationship?”
“My hypothesis is that Z will happen because of A and B. What evidence would support or contradict this?”
4. Ask AI to Challenge You
After developing your ideas, ask AI to poke holes:
“What assumptions am I making that might be wrong?”
“What questions haven’t I considered?”
“What would someone who disagrees with me say?”
5. Use the 70/30 Rule
Aim for roughly 70% of the cognitive work to come from you, 30% from AI. If AI is doing most of the thinking, you’re in passive mode.
6. Maintain Ownership of Synthesis
AI can gather information and even organize it, but you should be the one connecting dots and drawing conclusions. When AI offers synthesis, use it as a starting point for your own analysis, not the final answer.
7. Test Your Understanding
Regularly check if you can explain the topic to someone else without referencing AI’s output. If you can’t, you’ve been too passive.
When Passive Mode Is Fine
Active mode isn’t always necessary. Use passive mode for:
Getting quick background on unfamiliar topics
Formatting and editing tasks
Generating initial ideas to spark your own thinking
Routine tasks that don’t require learning or growth
The Long Game
The MIT study participants who relied entirely on AI showed less brain engagement, but they also completed their tasks faster. That’s the trade-off: immediate efficiency versus long-term capability development.
In active mode, you might take slightly longer upfront, but you build knowledge, develop better judgment, and create mental models you can apply to future problems.
The goal isn’t to avoid AI or to make every interaction with it a learning exercise. It’s to be intentional about when you’re thinking with AI versus letting it think for you.
The best-performing post on this blog is a 20,000-word tome on the Google Agent Development Kit. Granted, maybe half the words are code samples, but without AI this would have taken me weeks to write. With AI, it was just a few days.
Great articles, the kind that get shared in Slack channels, bookmarked for later, or ranked on Google or ChatGPT, don’t just happen. They require deep research, careful structure, compelling arguments, and that yo no sé qué quality we call a tone or voice.
They need to solve real problems, offer genuine insights, and reflect the author’s hard-earned expertise.
The traditional writing process goes something like this: ideation (where you wrestle with “what should I even write about?”), research (down the rabbit hole of sources and statistics), outlining (organizing your scattered thoughts into something coherent), drafting (the actual writing), editing (realizing half of it makes no sense), revising (again), and finally polishing (until you hate every word you’ve written).
That’s a lot of work. For a 2,000-word post like the one you’re reading, probably a couple of days of work. And then AI came along and everyone thought they could short-circuit this process with “vibe marketing”, and now we have slop everywhere and no one wins.
Stop serving slop
The problem is that most people have fallen into one of two camps when it comes to AI writing:
Camp 1: The AI Content Mills
These are the people who’ve decided that if AI can write, then obviously the solution is to generate unlimited blog posts and articles with minimal human input. More content equals more traffic equals more success, right?
They’re pumping out dozens of articles per week, each one a generic regurgitation of the same information you can find anywhere else, just rearranged by an algorithm.
Who’s going to read this? They are bots creating content for other bots. Any real human traffic that hits their site will take one look at it and then bounce.
Camp 2: The One-Prompt Writers
On the flip side, you’ve got well-meaning writers who heard AI could help with content creation, so they fired up ChatGPT and typed something like “write me a 2000-word article on productivity.”
Twenty seconds later, they got back a wall of text that reads like it was written by an intern who’d never experienced productivity problems themselves, which, in a way, it was.
Frustrated by the generic drivel, they declared AI “not ready for serious writing” and went back to their caves, doing everything the old way. They still create good content, but it takes too long and requires too many resources.
Both camps are missing the point entirely. The problem isn’t AI itself. It’s over-reliance on automation without essential quality control measures in place. They’re both treating AI like a magic one-click content machine.
The Missing Ingredient: Your Actual Brain
Here’s a novel concept. What if humans want to read content that is new and interesting?
Think about what you bring to the writing table that no AI can replicate… creativity, emotional intelligence, ethical reasoning, and unique perspectives. Your years of experience in your field. Your understanding of your audience’s real pain points. Your ability to connect seemingly unrelated concepts. Your voice, your humor, your way of explaining complex ideas.
AI, meanwhile, excels at the stuff that usually makes you want to procrastinate, like processing vast amounts of information quickly, organizing scattered thoughts into logical structures, and generating that dreaded first draft that’s always the hardest part.
Two entities with complementary skill sets. You and the AI. Like Luke and R2-D2.
You’re the director, the editor, the strategic thinker, and the voice curator. AI is the research assistant and first-draft collaborator.
Let me walk you through exactly how I’ve been using this collaboration to go from scattered thoughts to published article in 1-2 hours instead of a full day, while actually improving quality.
Step 1: I Pick the Topic
This is where your expertise and market understanding are irreplaceable. I don’t ask AI “what should I write about?” That’s a recipe for generic content that already exists everywhere else.
Instead, I pick topics that genuinely interest me or that I think are timely and underexplored. For example, my piece on ChatGPT’s glazing problem, or my deep dive into Model Context Protocol.
The blog post you’re reading right now came from a tweet (xeet?) I responded to.
I start by doing what I call the “thesis dump.” I open a new chat in my dedicated Claude project for blog content and just brain-dump everything I think about the topic. Stream-of-consciousness thoughts, half-formed arguments, random observations, and whatever connections I’m seeing that others might not.
Pro-tip: Create a Claude project specifically for blog content (or whatever type of content you write), upload samples of past work or work you want to emulate, and give it specific instructions on your writing style and tone.
Pro-pro-tip: Use the voice mode on Claude’s mobile app or Wispr Flow on your computer to talk instead of type. And just ramble on, don’t self-edit.
This dump becomes the foundation of everything that follows. It’s my unique perspective, my angle, my voice. The stuff that makes the eventual article mine rather than just another generic take on the topic.
Step 2: AI Does the Research Legwork
Now comes the part where AI really shines. AI excels at supporting literature review and synthesis, processing vast amounts of information that would take me hours to gather manually.
I ask AI to research the topic thoroughly. Before Claude had web search, I’d use ChatGPT for this step. The key questions I want answered are:
What’s already been written on this topic?
What angles have been covered extensively?
What gaps exist in the current conversation?
What data, statistics, or examples support (or challenge) my thesis?
This research phase is crucial because understanding the landscape helps you write something better than what already exists. I’m not looking to regurgitate what everyone else has said. I want to know what they’ve said so I can say something different, better, or more useful.
The AI comes back with a comprehensive overview that would have taken me hours to compile. Sometimes it surfaces angles I hadn’t considered. Sometimes it finds data that strengthens my argument. Sometimes it reveals that my hot take has already been thoroughly explored, saving me from publishing something redundant.
My WordPress is littered with drafts of posts I thought were genius insights, only to find out smarter people than me had already covered everything on the topic.
Step 3: Collaborative Outlining
This is where the collaboration really starts to sing. I ask Claude to create an outline that brings together my original thesis dump and the research it has gathered.
Here it becomes a cycle of drafting, editing, and reworking where I’m actively shaping the structure based on my strategic vision.
“Move that section earlier.” “Combine these two points.” “This needs a stronger opener.” “Add a section addressing the obvious counterargument.” And so on.
By the time I’m done with this back-and-forth, usually about 30 minutes, I’ve got something that looks like a mini-article. It’s got a clear structure, logical flow, and it’s heavily influenced by both my original thinking and the research insights. Most importantly, it already feels like something I would write.
Step 4: Section-by-Section Development
Now comes the actual writing, but in a much more manageable way. Instead of staring at a blank page wondering how to start, I work with AI to flesh out each section one by one.
My guiding principle is to maximize information per word. Every section needs to drive home one key concept or argument, and it needs to do it efficiently. No padding, no fluff, no generic statements that could apply to any article on any topic.
I’ll say something like, “For the section on why most AI content fails, I want to emphasize that it’s not the technology’s fault, it’s how people are using it. Include specific examples of both failure modes, and make sure we’re being concrete rather than abstract.”
Just like with outline creation, I’m working with AI closely to refine each individual section. “Make this more conversational.” “Add a specific example here.” “This paragraph is getting too long, break it up.”
I’ll also directly make edits myself. I add sentences or rewrite something completely. No sentence is untouched by me. AI handles the initial generation and helps maintain consistency, but I ensure the voice, examples, and strategic emphasis stay authentically mine.
Step 5: The Critical Review
Here’s a step most people skip, and it’s what separates good AI-human collaboration from slop.
I ask AI to be my harshest critic.
“Poke holes in this argument.” “Where am I not making sense?” “What obvious questions am I not answering?” “Where could someone legitimately disagree with me?” “What gaps do you see in the logic?”
This critical review often surfaces weaknesses I missed because I was too close to the content. Maybe I’m assuming knowledge my readers don’t have. Maybe I’m making a logical leap without explaining it. Maybe I’m not addressing an obvious counterargument.
I don’t blindly accept the AI’s critique though. Sometimes it gets it wrong, or I just don’t agree with it. But sometimes it gets it right and I fix the issues it identifies.
Step 6: The Sid Touch
Now comes the final step, no AI involved here. I go through the entire article, put myself in the reader’s shoes, and make sure it flows well. I make edits or change things if needed.
I’ll also add a bit of my personality to it. This might be a joke that lightens a heavy section, a personal anecdote that illustrates a point, or just tweaking the language to sound more like how I actually talk.
I call this the “Sid touch” but you can call it something else. Sid touch has a nice ring to it.
“Hey did you finish that article on productivity?”
“Almost! Just giving it a Sid touch.”
See what I did there?
Proof This Actually Works
What used to take me the better part of a day now takes an hour or two tops if I’m being a perfectionist. But more importantly, the quality hasn’t suffered.
I actually think it has improved because the research and outline process is more thorough. The structure is more logical because we’re iterating on it deliberately. The arguments are stronger because I’m actively testing them during the writing process.
I started writing this blog in February this year and I’m already on track to go past 5,000 monthly visitors this month, with hundreds of subscribers. Not because I’m publishing a ton of content (I’m not), but because I’m combining AI’s data processing capabilities with my creativity and strategic thinking to create genuinely useful content.
The Future of Content Writing
If you’re thinking this sounds like too much work and you’d rather create a fully automated AI slop factory, I can promise you that while you may see some results in the short-term, you will get destroyed in the long term.
Platforms will get better at filtering AI slop, just like they learned to handle email spam. It’s already starting to get buried in search results and ignored by readers.
That means the writers who figure out effective human-AI collaboration now will have a massive competitive advantage. While others are either avoiding AI entirely or drowning their audiences in generic content, you’ll be creating genuinely valuable content faster than ever before.
So here’s my challenge to you: audit your current writing process. Are you spending hours on research that AI could handle in minutes? Are you staring at blank pages when you could be starting with a solid structure? Are you avoiding AI because you tried it once and got generic results?
Or maybe you’re on the other extreme, using AI to replace your thinking instead of amplifying it?
If so, try the process I’ve outlined. Pick a topic you genuinely care about, dump your thoughts, let AI help with research and structure, then work together section by section while keeping your voice and expertise front and center.
Let me know how it goes!
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
The CEO of Zapier recently announced that they have more AI agents working for them than human employees. Now that sounds exciting and terrifying but the truth is most of the “agents” he listed out are really just simple automations (with some AI sprinkled in).
He is, after all, promoting his own company, which he uses to build these automations.
In this guide, I will show you how to build those same automations on Make.com. It’s designed for business owners, no-code enthusiasts, and consultants looking to automate real-world tasks in marketing, sales, and operations.
The first thing you need to do is create a Make.com account. Sign up here to get one month free on the Pro plan.
I’ve split this guide up into sections for Marketing, Sales, HR, Product, and Customer Support. The following automations are beginner-friendly, and the best way to learn is to follow the instructions and build them yourself.
If you’re looking to build more complex AI Agents, I have a full guide here. I also have a free email course which you can sign up for below.
High-Impact Use Cases for Marketing Teams
Write a blog post, summarize it for LinkedIn, create social variations, send campaign results to the team, draft the newsletter…and repeat. Every. Single. Week.
You’re drowning in content demands and half of it is grunt work that AI can now handle. Enter: Make.com + AI.
This combo turns your messy marketing checklist into an elegant flowchart. You write once, and AI helps you remix, repurpose, and report across all your channels.
Here’s what you can automate today:
Turn blog posts into LinkedIn content
Repurpose content into tweets, emails, or IG captions
Summarize campaign performance into Slack reports
Generate social variations for A/B testing
Create email copy from feature releases
Summarize webinars or podcasts for newsletters
Let’s build a few of these together.
Project 1: Blog to LinkedIn Auto-Post
Scenario: You’ve just published a blog post. Instead of opening LinkedIn and crafting a summary from scratch, this automation turns your post into a social-ready snippet instantly.
How it works: Make.com watches your RSS feed for new content. When a new blog post is detected, it sends the blog’s title and content to Claude or OpenAI with a carefully constructed prompt. The AI replies with a LinkedIn-ready post featuring a hook and CTA. This is then routed to Buffer for scheduling or Slack for internal review. All content can be logged to Google Sheets or Airtable for records and team collaboration.
Step-by-Step:
Watch the Blog Feed (RSS Module)
Add the “RSS Watch Items” module and point it at your blog’s feed.
Set the module to check for new posts every X minutes.
Ensure the output includes title, link, and content/excerpt.
AI Content Creation (OpenAI Module)
Add the OpenAI module (ChatGPT model) or Claude by Anthropic.
Create a prompt like: “You are a copywriter creating LinkedIn posts for a B2B audience. Write a short, engaging post that summarizes this blog. Include a 1-line hook, 1–2 insights, and end with a call-to-action. Blog title: {{title}}, Excerpt: {{content}}.”
Choose GPT-4o or Claude Sonnet 4 (I prefer Claude).
Output should be plain text.
Routing Options (Router Node)
You can either insert a Router node after the OpenAI output or do this in a sequence (like my setup above).
Route A: Manual Review
Add Slack module.
Post the AI-generated copy to #marketing-content.
Include buttons for “Approve” or “Revise” via Slack reactions or separate review workflows.
Route B: Auto Schedule
Add Buffer or LinkedIn module.
Schedule the post directly.
Add time delay if needed before posting (e.g., delay by 30 min to allow override).
Log It (Google Sheets or Airtable)
Add a Google Sheets or Airtable module.
Create a row with blog title, link, and generated post.
Optional: Include timestamp and user who approved the content.
Optional Enhancements:
Add a “fallback content” path if AI fails or times out.
Use a Make “Text Parser” to clean up or trim content to fit platform character limits.
Add UTM parameters to links using a “Set Variable” step before publishing.
Why this helps: This flow cuts down repetitive work, ensures content consistency, and keeps your distribution engine running on autopilot with human review only when needed.
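If you ever want to prototype the AI step outside of Make.com (or just test the prompt), here’s a rough Python sketch of the RSS-to-LinkedIn-copy logic. It assumes the feedparser and openai packages, an OPENAI_API_KEY in your environment, and a placeholder feed URL; it’s a sketch of the idea, not part of the Make.com scenario itself.
Python
# Rough sketch of the RSS -> LinkedIn copy step, outside of Make.com.
# Assumes: `pip install feedparser openai` and OPENAI_API_KEY set in your environment.
import feedparser
from openai import OpenAI

client = OpenAI()
feed = feedparser.parse("https://yourblog.com/rss")  # placeholder feed URL

for entry in feed.entries[:1]:  # just the latest post for this demo
    prompt = (
        "You are a copywriter creating LinkedIn posts for a B2B audience. "
        "Write a short, engaging post that summarizes this blog. Include a 1-line hook, "
        "1-2 insights, and end with a call-to-action. "
        f"Blog title: {entry.title}, Excerpt: {entry.get('summary', '')}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    linkedin_post = response.choices[0].message.content
    print(linkedin_post)  # in Make.com, this is where Buffer/Slack/Sheets take over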
Project 2: AI Campaign Performance Digest
Scenario: When I started my career in marketing, over a decade ago, before AI, I would manually compile Google Ads campaign reports every Monday morning. Today, AI does it for you and posts a clean summary to Slack every morning.
How it works: Make.com runs a scheduled workflow each morning. It pulls campaign data from Google Ads, sends it to GPT-4o with a prompt designed to extract insights, and then posts a summary digest to a Slack channel.
Step-by-Step Walkthrough:
Trigger on Schedule:
Use the “Scheduler” module in Make.
Set the time to run daily at 8:00 AM (or whatever cadence fits your reporting cycle).
Fetch Campaign Data (Google Ads Module):
Add a Google Ads module.
Authenticate with your account and select the appropriate campaign.
Configure it to retrieve key metrics like impressions, clicks, CTR, cost, conversions, and ROAS.
Ensure the output is formatted clearly to be used in the next step.
Summarize Metrics (OpenAI Module):
Add an OpenAI (ChatGPT) module.
Use a system + user prompt combo to ensure structured output: System: “You are a digital marketing analyst summarizing ad performance.” User: “Summarize the following Google Ads metrics in 3 concise bullet points. Highlight performance trends, wins, and concerns. Metrics: {{output from Google Ads module}}”
Choose GPT-4o for better language quality and reliability.
Post to Slack (Slack Module):
Add the Slack module and connect your workspace.
Send the AI summary to your marketing channel (e.g., #ads-daily).
Format the message cleanly using markdown, and optionally include a link to the Google Ads dashboard for deeper inspection.
Log for Reference (Optional):
Add a Google Sheets or Airtable module to log the raw metrics + AI summary.
Include a date stamp and campaign ID for tracking trends over time.
Optional Enhancements:
Add a fallback message if AI output is blank or token limits are exceeded.
Use a router to conditionally summarize different campaign types differently (e.g., brand vs. performance).
Include comparison to previous day or week by pulling two data sets and calculating diffs before sending to GPT.
Why this helps: It delivers a high-signal snapshot of ad performance daily without wasting your time, and keeps everyone on the same page.
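If you prefer to sanity-check the summarization step in code before wiring up the scenario, here’s a minimal Python sketch. The metrics dict is made up, and it assumes the openai and slack_sdk packages plus OPENAI_API_KEY and SLACK_BOT_TOKEN in your environment.
Python
# Minimal sketch of the "summarize metrics and post to Slack" step.
# The metrics below are invented placeholders standing in for the Google Ads module output.
import os
from openai import OpenAI
from slack_sdk import WebClient

metrics = {"impressions": 120_000, "clicks": 3_400, "ctr": "2.8%",
           "cost": "$950", "conversions": 85, "roas": 3.2}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a digital marketing analyst summarizing ad performance."},
        {"role": "user", "content": "Summarize the following Google Ads metrics in 3 concise bullet points. "
                                    f"Highlight performance trends, wins, and concerns. Metrics: {metrics}"},
    ],
)
summary = response.choices[0].message.content

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
slack.chat_postMessage(channel="#ads-daily", text=summary)  # same channel as in the walkthrough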
Project 3: AI Content Research Assistant
Scenario: You’re planning a new blog post, campaign, or social series and need quick, high-quality content research. Instead of spending hours googling stats, quotes, and trending ideas, let an AI-powered automation do the heavy lifting.
How it works: You input a topic into Airtable (or another database), which triggers a workflow in Make.com. The AI uses that topic to generate:
A list of content angles
Related stats or facts
Popular subtopics or related trends
Potential hooks or titles
Everything gets logged into a Google Sheet or Notion database for review and use.
Step-by-Step:
Trigger: Airtable Record Created
Use the Airtable “Watch Records” module.
Set it to monitor a “Content Ideas” table.
Capture fields like: Topic, Target Audience, Tone (optional).
AI Research Prompt (OpenAI Module):
Add OpenAI ChatGPT-4 module.
Prompt: “You are a content strategist researching ideas for a blog post or campaign. Given the topic ‘{{Topic}}’ and the audience ‘{{Audience}}’, generate:
3 content angles
3 surprising stats or insights with real examples
3 hook ideas or headline starters. Format clearly with numbered sections.”
Parse and Organize (Text Parser or Set Variables):
If needed, extract each section into separate fields using Text Parser or Set Variable modules.
Log to Google Sheets or Notion:
Add a new row with:
Topic
Audience
Generated angles
Hooks/headlines
Suggested stats
Optional Enhancements:
Add a Slack notification: “New content research ready for review!”
Add a filter so only topics marked as “High Priority” trigger AI research.
Why this helps: You eliminate blank-page paralysis and get rich, contextual research for any content initiative without wasting your team’s time or creativity on preliminary digging.
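If you want to test the research prompt before building the scenario, a quick Python sketch looks like this. The topic and audience values are examples standing in for the Airtable fields, and it assumes the openai package with an OPENAI_API_KEY set.
Python
# Sketch of the content research prompt with example values in place of Airtable fields.
from openai import OpenAI

client = OpenAI()
topic, audience = "AI-assisted content writing", "B2B marketers"  # placeholders

prompt = (
    "You are a content strategist researching ideas for a blog post or campaign. "
    f"Given the topic '{topic}' and the audience '{audience}', generate: "
    "3 content angles, 3 surprising stats or insights with real examples, "
    "and 3 hook ideas or headline starters. Format clearly with numbered sections."
)
research = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content
print(research)  # log this to Sheets/Notion in the actual scenario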
The Sales Bottleneck: Manual Follow-Ups & Cold Data
Sales teams waste hours every week:
Manually sorting through low-quality leads
Writing cold emails from scratch
Logging CRM updates by hand
Missing follow-ups because of clunky tools
With Make.com and AI, you can automate the entire pre-sale pipeline—from qualification to enrichment to personalized outreach—while still keeping it human where it counts.
Project 1: AI Lead Qualification & Outreach Workflow
Scenario: Automatically qualify new leads and kick off personalized outreach. Imagine you have a web form or marketing funnel capturing leads. Instead of manually sifting through them, we’ll build a Make.com workflow that uses AI to evaluate each lead’s potential and respond accordingly. High-quality leads will get a custom email (drafted by AI) and be logged in a CRM, while unqualified ones might get a polite decline or be deprioritized.
How it works: Whenever a new lead comes in (with details like name, company, message, etc.), the workflow triggers. It sends the lead info to an AI (GPT-4o) to determine if the lead is “Qualified” or “Not Qualified,” along with reasoning. Based on the AI’s decision, Make branches into different actions.
Step-by-Step:
Trigger on New Lead:
Use a Webhook module if your lead form sends a webhook
Or use a Google Sheets module if leads are collected there
Or integrate with your CRM (HubSpot, Pipedrive, etc.)
Example: If using a form with a webhook, create a Webhook module to receive new lead data like name, email, company, and message.
AI Qualification (OpenAI):
Add OpenAI (ChatGPT) module
Prompt: System: “You are a sales assistant that qualifies leads for our business.” User: “Lead details: Name: {{name}}, Company: {{company}}, Message: {{message}}. Based on this, decide if this lead is Qualified or Not Qualified for our services, and provide a brief reason. Respond in the format: Qualified/Not Qualified – Reason.”
This gives you structured output like: “Qualified – The company fits our target profile and expressed interest,” or “Not Qualified – Budget mismatch.”
Branching Logic (Router or IF):
Use an IF or Router module to check if the response contains “Qualified.”
Route accordingly:
Qualified → Follow-up path
Not Qualified → Logging or polite response
Qualified Lead Path:
Generate Email: Use another OpenAI module to draft a personalized email. Prompt: “Write a friendly email to this lead introducing our services. Use this info: {{lead data + qualification reasoning}}.”
Send Email: Use Gmail or SMTP module to send the AI-generated message.
Log Lead: Add/update lead in your CRM or Google Sheet.
Unqualified Lead Path:
Polite Decline (Optional): Use GPT to generate a kind “not the right fit” email.
Internal Log: Mark the lead in CRM or Sheet as disqualified.
Test the Workflow:
Use test leads to verify AI outputs and routing logic.
Ensure prompt format is consistent for accurate branching.
Bonus Ideas:
Human-in-the-loop Review: Send AI-drafted email to Slack for approval before sending.
Scoring instead of binary: Ask AI to score Hot, Warm, Cold.
Enrichment before AI: Use Clearbit or Apollo API to add job title, company size, industry.
Why this helps: Your sales team only sees high-quality leads and can follow up instantly with personalized, AI-written messages.
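Here’s what the qualify-then-branch logic boils down to if you sketch it in Python rather than Make.com modules. The lead data is invented, and the branching assumes the model follows the Qualified/Not Qualified format we asked for, which is exactly why consistent prompt formatting matters for accurate routing.
Python
# Sketch of the qualify-then-branch logic with placeholder lead data.
from openai import OpenAI

client = OpenAI()
lead = {"name": "Jane Doe", "company": "Acme Corp",
        "message": "Looking for help automating our sales ops."}  # invented lead

qualification = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a sales assistant that qualifies leads for our business."},
        {"role": "user", "content": f"Lead details: Name: {lead['name']}, Company: {lead['company']}, "
                                    f"Message: {lead['message']}. Based on this, decide if this lead is Qualified "
                                    "or Not Qualified for our services, and provide a brief reason. "
                                    "Respond in the format: Qualified/Not Qualified – Reason."},
    ],
).choices[0].message.content

if qualification.lower().startswith("qualified"):
    # Qualified path: draft a personalized intro email (sending + CRM logging happen elsewhere)
    email = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a friendly email to this lead introducing our services. "
                                              f"Use this info: {lead}. Qualification notes: {qualification}"}],
    ).choices[0].message.content
    print(email)
else:
    print("Logged as disqualified:", qualification)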
Project 2: AI-Powered CRM Enrichment & Follow-Up
Scenario: Automate enrichment for CRM records and schedule follow-ups based on lead type.
How it works: Whenever a new contact is added to your CRM (or manually tagged), the workflow enriches the contact (e.g. via Clearbit), uses AI to suggest next actions, and schedules a follow-up reminder.
Step-by-Step:
Trigger: Watch for a new CRM contact (e.g., HubSpot “New Contact” trigger).
Enrichment: Call Clearbit or similar API to retrieve job title, company data.
AI Recommendation: Use OpenAI. Prompt: “Based on this lead info, suggest next sales action and urgency level. Respond with a 1-sentence summary.”
Create Task: Add to Trello/Asana/Google Calendar or CRM task board.
Notify Salesperson: Slack message or email summary with AI’s next step.
Why this helps: Keeps your CRM smart and your reps focused on the right next step.
Project 3: AI Deal Progress Updates to Stakeholders
Scenario: Keep internal stakeholders updated as deals progress, without constant emails or meetings.
How it works:
When a deal stage changes in your CRM, AI summarizes deal context and posts an update to a Slack channel or email digest.
Step-by-Step:
Trigger: Watch for deal stage change (e.g. from “Demo” to “Negotiation”).
Pull Context: Use previous notes or contact data.
AI Summary: Prompt: “Summarize this deal update with name, stage, client concern, and next step. Make it brief but informative.”
Send Digest: Post to Slack #deals or email manager/team.
Why this helps: Reduces status meetings while keeping everyone aligned.
Automation For Product Teams
Product managers juggle user feedback, bug reports, feature requests, competitor research, roadmap planning, internal prioritization, and stakeholder updates, all at once. It’s chaos. And most of it is repetitive, noisy, and hard to scale.
Project 1: Feature Request Summarizer and Categorizer
Scenario: Users submit feature requests through a form, support tool, or product portal. Instead of manually reviewing each one, this automation summarizes and categorizes requests using AI, then logs them in your product management system.
How it works: A new request triggers the workflow. AI (via GPT-4o) reads and classifies the submission (e.g., UX, performance, integrations), writes a short summary, and sends the data to Airtable or Notion for prioritization.
Step-by-Step:
Trigger: Form Submission or Inbox Monitoring
Use the “Webhook” module if collecting feedback via a form (e.g., Typeform, Tally).
Or use the “Gmail” or “Intercom” module to watch for new support emails or messages.
AI Summarization and Categorization (OpenAI Module):
Add the OpenAI module.
Use the following prompt: “You are a product manager assistant. Summarize the following user feature request in 1–2 sentences. Then categorize it as one of: UX/UI, Performance, Integrations, New Feature, Other. Respond with: Summary: … / Category: …”
Process Output (Text Parser, Set Variable):
If needed, parse out “Summary:” and “Category:” into separate fields.
Log to Product Tracker (Airtable/Notion/Google Sheets):
Add a module to write the summary, category, and source to your product request tracker.
Optional: Add a timestamp and auto-assign priority if source = “VIP” or “internal.”
Bonus Enhancements:
Add Slack notifications to alert the product team when a new high-priority request is submitted.
Use a Router node to auto-tag requests into different buckets (e.g., roadmap now/later/backlog).
Why this helps: Instead of skimming dozens of tickets, PMs see a categorized, summarized list ready to evaluate in minutes.
Project 2: Bug Report Classifier and Assignment
Scenario: Your support team logs bugs from users. Instead of having a PM manually triage and assign each one, this workflow uses AI to determine severity and auto-assigns to the right team or Slack channel.
How it works: When a new bug report is added to your tracking system (e.g., Airtable, Google Sheet, or Intercom), the workflow triggers. GPT-4o reads the bug report, labels it by severity, recommends the team, and routes the report to a Jira board or Slack for resolution.
Step-by-Step:
Trigger: New Bug Logged
Use “Airtable – Watch Records” or “Google Sheets – Watch Rows.”
Trigger on a new row in your “Bugs” table with fields: Description, Environment, App Version, Submitter.
AI Classification (OpenAI Module):
Add the OpenAI module.
Prompt: “You are a technical triage assistant. Read this bug description and assign: a) Severity: Low, Medium, High b) Team: Frontend, Backend, Infra, QA. Description: {{bug_text}} Respond: Severity: … / Team: …”
Parse Output:
Use a Text Parser or Set Variable module to extract the fields.
Routing & Assignment (Router + Slack/Jira):
Use a Router module to route based on team.
For each branch:
Slack: Send bug summary to respective team channel
Jira: Create issue with pre-filled metadata
Log Final Record:
Update Airtable/Sheet with AI’s classification, routing action, and date.
Why this helps: Triage happens instantly, teams are alerted without delay, and engineering isn’t bogged down by unclear, unprioritized issues.
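If it helps to see the triage logic in one place, here’s a hedged Python sketch of the classification, parsing, and routing steps. The channel map and bug text are made up, and the regex parsing only works if the model sticks to the “Severity: … / Team: …” format, which is why the prompt asks for it explicitly.
Python
# Sketch of the classify-and-route step with an invented bug report and channel map.
import re
from openai import OpenAI

TEAM_CHANNELS = {"Frontend": "#bugs-frontend", "Backend": "#bugs-backend",
                 "Infra": "#bugs-infra", "QA": "#bugs-qa"}  # placeholder channels

client = OpenAI()
bug_text = "Checkout page throws a 500 error on mobile Safari since v2.3.1."  # invented

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "You are a technical triage assistant. Read this bug description and assign: "
                                          "a) Severity: Low, Medium, High b) Team: Frontend, Backend, Infra, QA. "
                                          f"Description: {bug_text} Respond: Severity: ... / Team: ..."}],
).choices[0].message.content

severity = re.search(r"Severity:\s*(\w+)", reply)
team = re.search(r"Team:\s*(\w+)", reply)
if severity and team:
    channel = TEAM_CHANNELS.get(team.group(1), "#bugs-general")
    print(f"Route to {channel} with severity {severity.group(1)}")  # Slack/Jira modules take it from here
else:
    print("Could not parse classification, falling back to manual triage:", reply)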
Project 3: Competitor Research Digest
Scenario: Your product team wants to monitor competitor news (feature launches, pricing changes, new positioning) but no one has time to check their blogs or Twitter every day. Let automation do it for you.
How it works: Make.com monitors competitor blogs or news feeds using RSS. New content is piped into GPT-4o, which extracts relevant summaries and logs them to Notion or shares them in Slack.
Step-by-Step:
Trigger: RSS Feed Monitoring
Use the “RSS Watch Items” module.
Add feeds from competitor blogs (e.g., /news, /blog/rss).
Trigger the scenario when new items appear.
AI Summary (OpenAI Module):
Add the OpenAI module.
Prompt: “You are a product strategist summarizing competitor updates. Summarize the following blog post in 2–3 sentences. Focus on new features, strategic changes, and pricing or positioning shifts.” Input: {{rss_content}}
Routing and Output:
Slack: Send formatted summary with post link to #product-intel
Notion: Append to a Competitive Insights database (Title, Summary, Source URL, Date)
Optional Enhancements:
Add a keyword filter (e.g., only send if post mentions “AI,” “pricing,” “feature,” etc.)
Use sentiment analysis to mark as positive/negative/neutral (another AI call)
Why this helps: Keeps product and strategy teams aware of external moves without manual research, freeing time for response planning or differentiation work.
Project 4: Generate User Stories from Feedback
Scenario: You’ve collected raw user feedback from forms, surveys, support tickets, or customer interviews. Now you need to turn that messy, unstructured input into clear, actionable user stories. Let AI write the first draft for your backlog.
How it works: Whenever feedback is marked as actionable or tagged with “feature request,” Make.com sends it to GPT-4o. The AI rewrites it in proper user story format and logs it to your dev tracker (Notion, Airtable, Trello, Jira, etc.).
Step-by-Step:
Trigger: Tagged Feedback Entry
Use “Watch Records” (Airtable) or “Watch Database Items” (Notion).
Set a filter: Only run if field ‘Type’ = “Feature Request.”
Prompt AI to Generate User Story (OpenAI Module):
Prompt: “You are a product manager preparing backlog items. Turn this raw feedback into a user story using this format: ‘As a [user role], I want to [goal/action], so that [benefit].’ Feedback: {{feedback_text}}”
Post-processing (Optional):
Add a sentiment analysis module (e.g., another AI call) to assess urgency.
Use Router to assign story to the correct product squad based on keyword/topic.
Log Story:
Notion: Add to product backlog database
Airtable: Insert as a new story row
Jira/Trello: Create new ticket with AI-generated description
Notify Stakeholders (Optional):
Slack alert to product owner: “New story added from feedback: {{story}}”
Why this helps: Turns raw, unstructured user data into clean, consistent backlog items—without product managers rewriting every ticket themselves.
HR Teams: Automate Onboarding and Employee Insights
HR teams are buried under repetitive, time-consuming tasks:
Answering the same policy questions again and again
Sorting resumes manually
Drafting internal emails and updates
AI automations free up time for strategic people ops work while giving employees faster responses and a better experience.
Project 1: AI-Powered HR Slack Assistant
Scenario: Employees constantly ask HR about leave policies, benefits, or internal procedures. This workflow creates an AI-powered Slack bot that answers common questions instantly.
How it works: Employees post questions in a designated Slack channel. Make.com captures the question, sends it to GPT-4 (with your handbook or policies as context), and posts the AI-generated answer back in the thread.
Step-by-Step:
Trigger:
Use Slack’s “Watch Messages in Channel” module
Monitor #ask-hr or a similar channel
AI Response (OpenAI):
Prompt: “You are an HR assistant. Use the following context from our handbook to answer questions. If you don’t know the answer, say so. Question: {{message_text}}.”
Provide a static section of company policy or use a database API to insert context
Respond in Thread:
Post the AI-generated answer as a reply to the original Slack message
Fallback Handling:
If AI is unsure, route to a human HR rep with a notification
Why this helps: Reduces HR interruptions while improving employee experience with instant, contextual answers.
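Here’s a rough Python sketch of the answer step, in case you want to test your handbook prompt before pointing Slack at it. The handbook excerpt, channel, and thread timestamp are placeholders, and it assumes the openai and slack_sdk packages with the usual API tokens in your environment.
Python
# Sketch of the "answer from the handbook and reply in thread" step.
import os
from openai import OpenAI
from slack_sdk import WebClient

HANDBOOK_EXCERPT = """Employees accrue 1.5 vacation days per month.
Unused days carry over up to 5 days per year."""  # placeholder policy text

def answer_hr_question(question: str) -> str:
    client = OpenAI()
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "You are an HR assistant. Use the following context from our handbook "
                                              "to answer questions. If you don't know the answer, say so. "
                                              f"Context: {HANDBOOK_EXCERPT} Question: {question}"}],
    ).choices[0].message.content

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
answer = answer_hr_question("How many vacation days do I get per year?")
slack.chat_postMessage(channel="#ask-hr",
                       thread_ts="1718000000.000100",  # placeholder thread timestamp
                       text=answer)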
Project 2: Resume Screening Assistant
Scenario: You receive a high volume of applicants and need to quickly assess fit for a role based on resumes and job descriptions.
How it works: Applicants submit resumes through a form or ATS. Make.com collects the submission, sends it to GPT-4 with the job description, and receives a scored summary with a short rationale.
Step-by-Step:
Trigger:
Watch new form submission or integrate with ATS (e.g., Google Form, Typeform)
Collect name, resume text (or file), and job applied for
AI Fit Evaluation (OpenAI):
Prompt: “You are an HR recruiter. Based on this job description and the applicant resume, rate this candidate as High, Medium, or Low Fit. Provide a 1-sentence reason.” Input: {{resume}}, {{job_description}}
Parse Response:
Extract score and reason using text parser or Set Variable
Log Result:
Add to Airtable or Google Sheet for internal review
Optional:
Notify hiring manager via Slack if rating is “High Fit”
Why this helps: Quickly filters high-potential candidates without sifting through every resume manually.
Project 3: Personalized Onboarding Plan Generator
Scenario: New hires need to go through onboarding. Instead of sending the same emails manually or giving them generic documents, generate a tailored onboarding plan.
How it works: When a new hire is added to Airtable or your HRIS, GPT-4o generates a personalized onboarding checklist and intro email based on their role and department.
Step-by-Step:
Trigger:
Watch for a new employee record in Airtable (or Google Sheet or BambooHR)
Generate Plan (OpenAI):
Prompt: “You are an HR onboarding assistant. Based on this employee’s name, role, and department, write a custom onboarding checklist for their first 2 weeks. Also generate a welcome email.”
Send Outputs:
Email the onboarding checklist and welcome message to the new hire
Optionally send a copy to their manager
Log or Archive:
Save plan to a shared onboarding doc or Notion database
Why this helps: Makes onboarding feel personal and organized without HR lifting a finger.
Support Teams: Automate Ticket Triage And Responses
Customer support is repetitive by nature, but that doesn’t mean it should be manual. With AI and automation, you can:
Instantly classify and route tickets
Auto-draft replies to common questions
Summarize conversations for handoffs or escalations
Proactively flag critical issues
Let’s break down a few powerful workflows you can launch today.
Project 1: AI-Powered Ticket Triage Bot
Scenario: Incoming support tickets vary widely. Some are technical, others are billing-related, some are spam. Instead of human agents triaging each one manually, AI can analyze and route them to the right person or tool.
How it works: Make.com monitors your support inbox or form. For each new ticket, GPT-4o classifies it (Billing, Technical, Account, Spam) and assigns urgency. Based on the result, the ticket is routed to the correct Slack channel, person, or tool.
Step-by-Step:
Trigger:
Watch new entries from Gmail, HelpScout, Intercom, or a form tool like Typeform.
Capture subject, message body, and metadata.
Classify Ticket (OpenAI):
Prompt: “You are a support assistant. Read the message below and categorize it as one of: Billing, Technical, Account, Spam. Also assign an urgency level (Low, Medium, High). Respond like: Category: …, Urgency: …”
Parse Output:
Use a Text Parser or Set Variable module to extract Category and Urgency
Route Based on Logic:
Use a Router or Switch module
Route Technical → #support-dev, Billing → #support-billing, etc.
Notify urgent issues in a priority Slack channel or tag a team lead
Log for Analytics (Optional):
Save categorized tickets to Airtable or Sheets for trend tracking
Why this helps: Your team spends less time sorting and more time solving. Escalations are never missed.
Project 2: AI Auto-Responder for Common Questions
Scenario: Many support tickets are variations of the same FAQ: password resets, refund policies, shipping delays. Let AI draft helpful responses automatically, ready for human review or direct sending.
How it works: When a new ticket arrives, GPT-4o reviews the content and drafts a relevant reply using company policy snippets or a knowledge base.
Step-by-Step:
Trigger:
Monitor new support tickets via Help Desk or form integration
Draft Response (OpenAI):
Prompt: “You are a support rep. Read this customer message and write a helpful reply using our policies: {{kb_snippets}}. Message: {{ticket_text}}”
Review Flow:
Send AI draft to Slack for human review (or assign a Google Doc comment task)
Use Slack emoji as approval trigger, or set manual override option
Send Response:
Upon approval, send email via Gmail, Outlook, or HelpDesk API
Why this helps: Reduces response time for repetitive inquiries and gives your team a first draft to edit instead of starting from scratch.
Project 3: Conversation Summary Generator for Escalations
Scenario: When tickets get escalated across teams, agents spend time writing summaries of what’s happened so far. Use AI to generate this summary instantly.
How it works: When a ticket is tagged for escalation or transfer, Make.com grabs the conversation thread and asks GPT-4o to summarize the key points.
Step-by-Step:
Trigger:
Tag change or status update in HelpScout/Intercom (e.g., “Escalated”)
Summarize Conversation (OpenAI):
Prompt: “Summarize this customer support conversation: who is the customer, what’s the issue, what’s been tried, and what’s needed next. Format as: Summary: … / Next Action: …”
Send to Escalation Path:
Post to Slack or assign Jira/Trello task with summary included
Tag original agent and team lead
Why this helps: Handoffs are cleaner, faster, and no critical context is lost.
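To see what the summary step is doing under the hood, here’s a small Python sketch with an invented conversation thread. In the real scenario, Make.com pulls the transcript from HelpScout or Intercom; the thread below is purely illustrative.
Python
# Sketch of the escalation summary step over an invented support conversation.
from openai import OpenAI

conversation = [
    ("customer", "My invoices from March are missing in the dashboard."),
    ("agent", "Thanks for flagging. I've re-synced your billing account, can you check again?"),
    ("customer", "Still missing, and I need them for an audit by Friday."),
]
transcript = "\n".join(f"{speaker}: {text}" for speaker, text in conversation)

client = OpenAI()
summary = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this customer support conversation: who is the customer, "
                                          "what's the issue, what's been tried, and what's needed next. "
                                          "Format as: Summary: ... / Next Action: ...\n\n" + transcript}],
).choices[0].message.content
print(summary)  # post to Slack or attach to the Jira/Trello escalation task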
Start with something small
The best way to get started building automations is with something small and non-critical. If it fails, it shouldn’t bring the house down.
Over time, as you get comfortable with this, you can add on more complexity and transition from automations to autonomous AI agents.
I was at Web Summit Vancouver last week, a tech conference where the only topic of every conversation was, surprise surprise, AI! As someone who has been in the space for years, well before the ChatGPT boom, I was excited to talk to my fellow nerds about the latest tools and tech.
And I was shocked to find that many attendees, including product managers and developers, hadn’t even heard of the AI tools I used most, like Claude and Cursor.
I’ve already written guides on Claude so I figured I’d do one for Cursor. This guide is for you if you’re:
A complete coding beginner who’s heard the vibe coding hype and wants to skip the “learn syntax for six months” phase
A seasoned developer curious about AI coding tools but tired of switching between ChatGPT tabs and your IDE
Someone who tried Cursor once, got confused by all the modes and features, and gave up
By the end, you’ll know exactly how to use Cursor’s three main modes, avoid the common pitfalls that trip up beginners, and build real projects.
Installation and First Contact
Time for the least exciting part of this guide: getting Cursor on your machine. Head to cursor.com and download the application (revolutionary, I know). The installation is standard “next, next, finish” territory, so I won’t insult your intelligence with screenshots.
If you’re familiar with other IDEs, like VS Code, then Cursor won’t look too different. In fact, it’s literally a fork of VS Code. Your muscle memory, keyboard shortcuts, and extensions all work exactly the same. You can install Cursor and use it as a drop-in VS Code replacement without touching a single AI feature.
But why would you want to do that when you could have a coding superpower instead?
Open one of your existing projects in Cursor and hit Cmd+L (Mac) or Ctrl+L (Windows/Linux). That’s your AI sidebar. Type something like “explain what this file does” and watch as Cursor not only explains your code but suggests improvements you hadn’t even thought of.
This is your first taste of what makes Cursor different. It’s not pulling generic answers from the internet, or generating something irrelevant. It’s analyzing your actual project and giving you contextual, relevant help. Let’s explore the different ways it can do this.
If you don’t have an existing project, ask Cursor to create one! Just type in “Generate a simple HTML file about pizza toppings” or whatever strikes your fancy, and watch the magic.
The Three Modes of Cursor
Cursor has three main ways to interact with AI, and knowing when to use each one is like knowing when to use a scalpel versus a sledgehammer. Both are tools, but context matters.
Ask Mode: Your Coding Sherpa
Think of Ask mode as your personal Stack Overflow that actually knows your project. Hit Cmd+L (or Ctrl+L) to open the sidebar, make sure “Ask” is selected in the dropdown, and start asking questions.
I often use this if I’m returning to a project I haven’t looked at in a couple of days, or if I’m trying to understand why Cursor generated code in a certain way. It’s also a great way to learn how to code if you’re not a professional.
You can ask it something specific, like what does this function do, all the way to asking it how an entire codebase works. I encourage you to also ask it to explain itself and some of the architectural decisions it makes.
Examples:
“What does this function do and why might it be slow?”
“What are other ways to implement this functionality?”
“How would you approach adding authentication to this app?”
“What are the potential security issues in this code?”
Ask mode is read-only so it won’t change your code. It’s purely for exploration, explanation, and planning. Treat it like Google, but Google that knows your specific codebase inside and out.
Pro Tip: Ask follow-up questions to deepen your understanding, request alternative approaches to problems, and use it to understand AI-generated code before implementing it.
Agent Mode: The Code Wizard
This is where the magic happens. Agent mode (formerly called “Composer”) can actually make changes to your code, create new files, and work across your entire project.
You tell it to do something, and it just does it, from adding new text to a page, all the way to creating an entire new feature with multiple pages, functions, and components.
It can even run commands in the terminal, like installing a new package or committing changes to Git.
Examples:
“Build a login form with validation”
“Create a new branch for the onboarding feature”
“Create a REST API for managing user profiles”
“Refactor this component to use TypeScript”
Agent mode takes your entire codebase into context to understand relationships between different parts and create or modify multiple files. If you ask it to make wholesale changes, it will literally go off and generate tons of code across multiple files.
Pro Tip: Start with clear, specific requirements and review changes before accepting them. Use version-control like Git at every step.
Edit Mode: The Precision Tool
Edit mode is for making smaller, more precise edits. To use this, you need to select some code in the editor and you’ll get a little menu with options to add to chat or edit.
Selecting edit opens up edit mode where you can ask the AI to make changes to that piece of code. You might want to use this when making small tweaks to existing code, refactoring a single function, or a quick bug fix.
YOLO Mode
There’s a secret fourth mode in Cursor called YOLO mode. Ok it used to be called YOLO Mode but they’ve changed it to the less scary “auto-run mode”.
This mode lets the AI run terminal commands automatically. You may have noticed in your tests so far, especially in Agent mode, that it pauses and asks if it can install a package or spin up a dev server.
If you select auto-run mode, it executes these commands without asking you first. This is obviously risky, so I suggest you limit it to certain commands, like running tests. That way, when you ask Agent to build a new feature and test it, it does so automatically without your active involvement.
Choosing Your Mode
“I want to understand something” → Ask mode
“I want to build/change something” → Agent mode
“I want a tiny, precise change” → Edit mode (or just use Agent)
Here’s a practical exercise to try all three:
Ask mode practice: Open your HTML file and ask “What would make this webpage more accessible?”
Agent mode practice: Tell Agent “Add a CSS file that makes this webpage look modern with a color scheme and better typography”
Edit mode practice: Select the page title and ask Edit to “Change this to something more creative”
Context is king
Cursor is only as good as the context you give it. The AI can only work with what it can see, so learning to manage context effectively is the difference between getting decent results and getting mind-blowing ones.
When you open the AI sidebar, look at the bottom and you’ll see an option to “@add context”. This is where you add files, folders, or specific functions to the conversation.
The @ symbol: Click the @ symbol or type it into the chat to see what files Cursor suggests. This tells the AI “pay attention to this specific file.”
You can reference specific files, folders, or even certain functions
@docs can pull in documentation if available
@components/ includes your entire components folder
@package.json includes just that file
The # symbol: Use this to focus on specific files.
The / symbol: Before starting a complex task, open the files you think are relevant to that task, then use the “/” command in Agent mode to “Add Open Files to Context.” This automatically adds them all to context.
The .cursorignore File
Create a .cursorignore file in your project root to exclude directories the AI doesn’t need to see:
Plaintext
node_modules/
dist/
.env
*.log
build/
This keeps the AI focused on your actual code instead of getting distracted by dependencies and build artifacts.
Context Management Strategy
Think of context like a conversation. If you were explaining a coding problem to a colleague, you’d show them the relevant files, not your entire codebase. Same principle applies here.
Good context: Relevant files, error messages, specific functions you’re working on
Bad context: Your entire project, unrelated files, yesterday’s lunch order
Similarly, when you have long conversations, the context (which is now your entire conversation history) gets too long and the AI tends to lose track of your requirements and previous decisions. You’ll notice this when the AI suggests patterns inconsistent with your existing code or forgets constraints you mentioned earlier.
To avoid this, make it a habit to start new conversations for different features or fixes. This is especially important if you’re moving on to a new task where the context changes.
Beyond giving it the right context, you can also be explicit about what not to touch: “Don’t modify the existing API calls”. This is a form of negative context, telling the AI to work in a certain space but avoid that one spot.
Documentation as context
One of the most powerful but underutilized techniques for improving Cursor’s effectiveness is creating a /docs folder in your project root and populating it with comprehensive markdown documentation.
I store markdown documents of the project plan, feature requirements, database schema, and so on. That way, Cursor can understand not just what my code does, but why it exists and where it’s heading. It can then suggest implementations that align with my broader vision, catch inconsistencies with my planned architecture, and make decisions that fit my project’s specific constraints and goals.
This approach transforms your documentation from static reference material into active guidance that keeps your entire development process aligned with your original vision.
Cursor Rules
Imagine having to explain your coding preferences to a new team member every single time you work together. Cursor Rules solve this problem by letting you establish guidelines that the AI follows automatically, without you having to repeat yourself in every conversation.
Think of rules as a mini-prompt that runs behind the scenes every time you interact with the AI. Instead of saying “use TypeScript” and “add error handling” in every prompt, you can set these as rules once and the AI will remember them forever.
Global Rules vs. Project Rules
User Rules: Apply to every project you work on. Think of these as your personal preferences you bring to any codebase.
Project Rules: Specific to each codebase. These are the rules your team agrees on and ensure consistency across all contributors.
Examples That Work in Practice
For TypeScript projects:
Plaintext
- Always use TypeScript strict mode
- Prefer function declarations over arrow functions for top-level functions
- Use meaningful variable names, no single letters except for loops
- Add JSDoc comments for complex functions
- Handle errors explicitly, don't ignore them
For Python projects:
Plaintext
- Use type hints for all function parameters and return values
- Follow PEP 8 style guidelines and prefer f-strings for formatting
- Handle errors with specific exception types, avoid bare except clauses
- Write pytest tests for all business logic with descriptive test names
- Use Pydantic for data validation and structured models
- Include docstrings for public functions using Google style format
- Prefer pathlib over os.path and use context managers for resources
For any project:
- Write tests for all business logic
- Use descriptive commit messages
- Add comments for complex algorithms
- Handle edge cases and error states
- Performance matters: avoid unnecessary re-renders and API calls
Use Cursor itself to write your rules. Seriously. Ask it to “Generate a Project Rules file for a TypeScript project that emphasizes clean code, accessibility, and performance.”
The AI knows how to write content that other AIs understand.
Pro Tip: Create different .cursorrules files for different types of projects. Keep a frontend-rules.md, backend-rules.md, and fullstack-rules.md that you can quickly copy into projects.
Communicating With Cursor
Here’s the thing about AI: it’s incredibly smart and surprisingly literal. The difference between getting decent results and getting “how did you do that?!” results often comes down to how you communicate.
Be Specific
As with any AI, the more specific you are, the better the output. Don’t just say, “fix the styling.” Say “Add responsive breakpoints for mobile (320px), tablet (768px), and desktop (1024px+) with proper spacing and typography scaling”.
You don’t need to know the technical details to be specific about the outcome you want. Saying “Optimize this React component by memoizing expensive calculations and reducing re-renders when props haven’t changed” works better than just “Optimize this component” even though you’re not giving it detailed instructions.
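To make that concrete, here's a minimal sketch of the kind of change that prompt tends to produce, using a hypothetical PriceSummary component (the names and props are illustrative, not from any real project):

```tsx
import React, { useMemo } from "react";

type Props = { prices: number[] };

// React.memo skips re-renders when the props haven't changed.
const PriceSummary = React.memo(function PriceSummary({ prices }: Props) {
  // useMemo caches the expensive calculation until the prices array changes.
  const stats = useMemo(() => {
    const total = prices.reduce((sum, p) => sum + p, 0);
    return { total, average: prices.length ? total / prices.length : 0 };
  }, [prices]);

  return (
    <div>
      <p>Total: {stats.total.toFixed(2)}</p>
      <p>Average: {stats.average.toFixed(2)}</p>
    </div>
  );
});

export default PriceSummary;
```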
Take an Iterative Approach
Start broad, then narrow down:
“Build a todo app with React”
“Add user authentication to this todo app”
“Make the todo items draggable for reordering”
“Add due dates and priority levels”
Each step builds on the previous work. The AI maintains context and creates consistent patterns across features.
Use Screenshots
Take screenshots of:
UIs you want to replicate
Error messages you’re getting
Design mockups from Figma
Code that’s confusing you
Paste them directly into the chat. The AI can read and understand visual information surprisingly well.
Treat it like a coworker
Explain your problem like you’re talking to a colleague:
“I have this React component that’s supposed to update when props change, but it’s not re-rendering. The props are coming from a parent component that fetches data from an API. I think it might be a dependency issue, but I’m not sure.”
This gives the AI context about what you’re trying to do, what’s happening instead, and your initial hypothesis.
The Context Sandwich
Structure complex requests like this:
Context: “I’m building a shopping cart component”
Current state: “It currently shows items and quantities”
Desired outcome: “I want to add coupon code functionality”
Constraints: “It should validate codes against an API and show error messages”
This format gives the AI everything it needs to provide accurate, relevant solutions.
Common Prompting Mistakes
Making Assumptions: Don’t assume the AI knows what “correct” means in your context. Spell it out by describing expected outcomes. “This function should calculate tax but it’s returning undefined. Here’s the expected behavior…”
Trying to do everything at once: When you tell the AI to “Build a complete e-commerce site with authentication, payment processing, inventory management, and admin dashboard” it is definitely going to go off the rails at some point.
Start small and build incrementally. The AI works better with focused requests.
Describing solutions: Describe the problem, not the solution. The AI might suggest better approaches than you initially considered. Instead of "Use Redux to manage this state", say "I need to share user data between multiple components".
Overloading context: Adding every file in your project to context doesn’t help, it hurts. The AI gets overwhelmed and loses focus. Be selective about what’s actually relevant.
Debugging Your Prompts
Good prompting is a bit of an art. A small change in a prompt can lead to massive changes in the output, so Cursor may often go off-script.
And that’s totally fine. If you catch it doing that, just hit the Stop button and say “Wait, you’re going in the wrong direction. Let me clarify…”
Sometimes it’s better to start a new conversation with a refined prompt than to keep correcting course. When you do this, add constraints like “keep the current component structure” to stop it from heading down the same path.
Good prompting is iterative:
Initial prompt: Get something working
Refinement: “This is close, but change X to Y”
Polish: “Add error handling and improve the user experience”
Test: “Write tests for this functionality”
The Psychology of AI Collaboration
The AI is incredibly capable but not infallible. There’s a narrow band between treating it like a tool you over-constrain and treating it like a coworker you let run free. That’s where you want to operate.
Always review the code it generates, especially for:
Security-sensitive operations
Performance-critical sections
Business logic validation
Error handling
Don’t just copy-paste the code. Read the AI’s explanations, understand the patterns it uses, and notice the techniques it applies. You’ll gradually internalize better coding practices.
If the AI suggests something that doesn’t feel right, question it. Ask “Why did you choose this approach over alternatives?” or “What are the trade-offs here?”
The AI can explain its reasoning and might reveal considerations you hadn’t thought of. Or it could be flawed because it doesn’t have all the necessary context, and you may be able to correct it.
Putting it all together
Here’s a complete example of effective AI communication:
Context: “I’m building a React app that displays real-time stock prices”
Current state: “I have a component that fetches data every 5 seconds, but it’s causing performance issues”
Specific request: “Optimize this for better performance. I want to update only when prices actually change, handle connection errors gracefully, and allow users to pause/resume updates”
Constraints: “Don’t change the existing API structure, and make sure it works on mobile devices”
This prompt gives the AI everything it needs: context, current state, desired outcome, and constraints. The response will be focused, relevant, and actionable.
Common Pitfalls
Every Cursor user goes through the same learning curve. You start optimistic, hit some walls, wonder if AI coding is overhyped, then suddenly everything clicks. Let’s skip the frustrating middle part by learning from everyone else’s mistakes.
The “Build Everything at Once” Trap
The mistake: Asking for a complete e-commerce platform with authentication, payment processing, inventory management, admin dashboard, and mobile app in a single prompt.
Why it fails: Even the smartest AI gets overwhelmed by massive requests. You’ll get generic, incomplete code that barely works and is impossible to debug.
The fix: Start with the smallest possible version. Build a product catalog first, then add search, then user accounts, then payment processing. Each step builds on solid foundations.
Good progression:
“Create a simple product listing page”
“Add search functionality to filter products”
“Create a shopping cart that stores items”
“Add user registration and login”
“Integrate payment processing”
The Context Chaos Problem
The mistake: Adding every file in your project to the AI’s context because “more information is better.”
Why it fails: Information overload makes the AI lose focus. It’s like trying to have a conversation in a crowded restaurant: too much noise drowns out the important signals.
The fix: Be surgical with context. Only include files that are directly relevant to your current task.
Bad context: Your entire components folder, all utilities, config files, and documentation
Good context: The specific component you’re modifying and its immediate dependencies
The “AI Will Figure It Out” Assumption
The mistake: Giving vague instructions and expecting the AI to read your mind about requirements, constraints, and preferences.
Why it fails: The AI is smart, not psychic. “Make this better” could mean anything from performance optimization to visual redesign to code refactoring.
The fix: Be specific about what “better” means in your context.
Vague: “Fix this component”
Specific: “This React component re-renders too often when props change. Optimize it using React.memo and useMemo to prevent unnecessary renders.”
The Copy-Paste Syndrome
The mistake: Blindly copying AI-generated code without understanding what it does.
Why it fails: When (not if) something breaks, you’ll have no idea how to fix it. Plus, you miss learning opportunities that make you a better developer.
The fix: Always ask for explanations. “Explain what this code does and why you chose this approach.”
What to do when shit inevitably hits the fan
You may avoid all the pitfalls above and still see the AI go off track. It starts modifying files you didn’t want changed, adds unnecessary complexity, or ignores your constraints.
The first thing you should do is hit the stop button. You can then let it know it’s going in the wrong direction. Even better, start a new conversation with clearer instructions and additional constraints.
Another common pattern is when the AI makes a change, sees an error, tries to fix it, creates a new error, and gets stuck in a cycle of “fixes” that make things worse.
If you see the same type of error being “fixed” multiple times, stop the process and revert to the last working state.
Here are some other warning signs that things are going off track:
It keeps apologizing and starting over
Solutions get more complex instead of simpler
It suggests completely different approaches in each attempt
Error messages persist despite multiple “fixes”
Then use one of the following debugging methods.
The Logging Strategy
When things aren’t working and you can’t figure out why:
Ask the AI to add detailed logging
Run the code and collect the output
Paste the logs back to the AI
Let it analyze what’s actually happening vs. what should happen
Example prompt: “Add console.log statements to track the data flow through this function. I’ll run it and share the output so we can debug together.”
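As a rough illustration, here's the kind of instrumentation that prompt might produce, shown on a hypothetical normalizeOrders helper (the function and its data shape are made up for the example):

```ts
type Order = { id: string; total: number };

// Logging added at each step so the actual data flow can be compared to expectations.
function normalizeOrders(rawOrders: unknown[]): Order[] {
  console.log("normalizeOrders: received", rawOrders.length, "raw items");

  const orders = rawOrders
    .filter((o): o is Order => {
      const ok = typeof o === "object" && o !== null && "id" in o && "total" in o;
      if (!ok) console.log("normalizeOrders: dropping malformed item", o);
      return ok;
    })
    .map((o) => ({ id: o.id, total: Number(o.total) }));

  console.log("normalizeOrders: returning", orders.length, "normalized orders");
  return orders;
}
```

Run the code, paste the console output back into the chat, and let the AI compare what actually happened with what it expected.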
The Rollback and Retry Method
When the AI made changes that broke more than they fixed:
Use Cursor’s built-in history to revert changes
Identify what went wrong in your original prompt
Start a new conversation with better context
Be more specific about constraints and requirements
The “Explain Your Thinking” Technique
When the AI gives you code that seems wrong or overly complex:
“Explain why you chose this approach. What are the trade-offs compared to [simpler alternative]?”
Often the AI has good reasons you didn’t consider. Sometimes it reveals that there’s indeed a simpler way.
The Test-Driven AI Approach
TDD (Test-Driven Development) is a standard, well-proven practice in web development. With vibe coding, though, it seems people have forgotten about it.
But, as the saying goes, prevention is better than cure. Following tried and tested practices like TDD will save you a ton of headaches and rework.
In fact, with AI, it becomes a superpower. AI can write tests faster than you can think of edge cases, and those tests become a quality guarantee for the generated code.
This single prompt pattern will revolutionize how you build features:
“Write comprehensive tests for [feature] first, then implement the code, then run the tests and iterate until all tests pass.”
Here’s an example prompt for building a new React component:
"Write tests that verify this component:
1. Renders correctly with different props
2. Handles user interactions properly
3. Manages state changes
4. Calls callbacks at the right times
5. Handles error states gracefully
Then implement the component to pass all tests."
Watch this workflow in action:
AI writes tests based on your requirements
AI implements code to satisfy the tests
Tests run automatically (with YOLO mode enabled)
AI sees failures and fixes them iteratively
You get working, tested code without writing a single test yourself
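To make the first step concrete, here's a minimal sketch of tests written before the component exists, assuming a hypothetical TodoItem component and a Vitest + React Testing Library setup (swap in whatever test runner your project already uses):

```tsx
import { render, screen, fireEvent } from "@testing-library/react";
import { describe, it, expect, vi } from "vitest";
import TodoItem from "./TodoItem"; // doesn't exist yet; the AI implements it to make these pass

describe("TodoItem", () => {
  it("renders the todo title", () => {
    render(<TodoItem title="Buy milk" done={false} onToggle={() => {}} />);
    expect(screen.getByText("Buy milk")).toBeTruthy();
  });

  it("calls onToggle when the checkbox is clicked", () => {
    const onToggle = vi.fn();
    render(<TodoItem title="Buy milk" done={false} onToggle={onToggle} />);
    fireEvent.click(screen.getByRole("checkbox"));
    expect(onToggle).toHaveBeenCalledTimes(1);
  });
});
```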
Advanced Tips and Tricks
The Bug Finder
Hit Cmd+Shift+P (or Ctrl+Shift+P) and type “bug finder.” This feature compares your changes to the main branch and identifies potential issues you might have introduced.
It’s not perfect, but it catches things like:
Forgot to handle null values
Missing error handling
Inconsistent variable usage
Logic errors in conditional statements
Image Imports
This one sounds fake until you try it. You can literally paste screenshots into Cursor’s chat and it will understand them. Take a screenshot of:
A UI mockup you want to build
An error message you’re getting
A design you want to replicate
Paste it in the chat with your prompt and watch the AI work with visual information. It’s genuinely impressive.
Tab Tab Tab
Cursor’s tab completion doesn’t just complete your current line, it can suggest entire functions, predict what you’re about to write next, and even jump you to related code that needs updating.
The AI analyzes your recent changes and predicts your next move. When it’s right (which is surprisingly often), it feels like magic.
AI Models and Selection Strategy in Cursor
Cursor offers access to the latest generation of AI models, each with distinct strengths and cost profiles that suit different development scenarios.
Claude Sonnet 4 is my current go-to choice for most development tasks. It significantly improves on Sonnet 3.7’s capabilities, achieving a state-of-the-art 72.7% on SWE-bench. Use this for routine development tasks like building React components, writing API endpoints, or implementing standard features.
Claude Opus 4 represents the premium tier for the most challenging problems. It is expensive but pays for itself in time saved when you’re tackling architectural decisions, complex refactoring across multiple files, or debugging particularly stubborn issues.
OpenAI’s o3 is a good premium alternative and particularly strong in coding benchmarks, with the high-effort version achieving 49.3% on SWE-bench and excelling in competitive programming scenarios.
GPT-4o remains a solid and cheaper alternative, especially for multilingual projects or when you need consistent performance across diverse tasks. While it tends to feel more generic compared to Claude’s natural style, it offers reliability and broad capability coverage.
Gemini 2.5 Pro is also one of my favorites as it combines reasoning with coding, leading to much better performance. It is also the cheapest and fastest of these models, though I use it primarily for planning out an app.
In most cases, you’ll probably just be using one model for the bulk of your work, like Sonnet 4 or GPT-4o, and you can upgrade to a more expensive model like o3 or Opus 4 for complex tasks.
MCP and Integrations
MCP (Model Context Protocol) connects Cursor to external tools and data sources, turning it into a universal development assistant. Need to debug an issue? Your AI can read browser console logs, take screenshots, and run tests automatically. Want to manage your project? It can create GitHub issues, update Slack channels, and query your database, all through natural conversation.
What MCP is and how it works is out of scope for this already long article, so read my guide here. In this section I’ll explain how to set it up and which servers to use.
Setting Up MCP in Cursor
Getting started with MCP in Cursor involves creating configuration files that tell Cursor which MCP servers to connect to and how to authenticate with them.
For project-specific tools, create a .cursor/mcp.json file in your project directory. This makes MCP servers available only within that specific project (perfect for database connections or project-specific APIs). For tools you want across all projects, add them in your settings.
The configuration uses a simple JSON format. Here’s how to set up the GitHub MCP server:
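As a rough sketch (the exact package name and fields depend on the server you install, so check its documentation), a .cursor/mcp.json for the GitHub server might look something like this:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-token-here"
      }
    }
  }
}
```

Once Cursor picks up the config, the server's tools become available in your conversations.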
The MCP ecosystem has exploded with hundreds of available servers, but several have emerged as must-haves for serious development work.
GitHub MCP Server – create issues, manage pull requests, search repositories, and analyze code changes directly within your coding conversation. When debugging, you can ask “what changed in the authentication module recently?” and get immediate insights without leaving your editor.
Slack MCP Server – read channel discussions, post updates about builds or deployments, and even summarize daily standups. This becomes particularly powerful for debugging when team members report issues in Slack. Your AI can read the problem descriptions and immediately start investigating.
PostgreSQL MCP Server gives your AI the ability to inspect schemas and execute read-only queries. You can ask “show me all users who logged in yesterday” or “analyze the performance of this query” and get immediate, accurate results.
Puppeteer MCP Server gives your AI browser automation superpowers. When building web applications, your AI can take screenshots, fill forms, test user flows, and capture console errors automatically. This creates a debugging workflow where you describe a problem and watch your AI reproduce, diagnose, and fix it in real-time.
File System MCP Server seems basic but proves incredibly useful for project management. Your AI can organize files, search across codebases, and manage project structures intelligently. Combined with other servers, it enables workflows like “analyze our React components for unused props and move them to an archive folder.”
Advanced MCP Workflows in Practice
The real power of MCP emerges when multiple servers work together to create sophisticated development workflows. Consider this scenario: you’re building a web application and users report a bug through Slack. Here’s how an MCP-enhanced Cursor session might handle it:
First, the Slack MCP reads the bug report and extracts key details. Then, the GitHub MCP searches for related issues or recent changes that might be relevant. The File System MCP locates the relevant code files, while the PostgreSQL MCP checks if there are database-related aspects to investigate.
Your AI can then use the Puppeteer MCP to reproduce the bug in a browser, capture screenshots showing the problem, examine console errors, and test potential fixes. Finally, it can create a detailed GitHub issue with reproduction steps, propose code changes, and post a summary back to Slack, all through natural conversation with you.
This level of integration transforms debugging from a manual, time-consuming process into an assisted workflow where your AI handles the tedious investigation while you focus on architectural decisions and creative problem-solving.
Custom MCP Server Creation
While the existing ecosystem covers many common needs, building custom MCP servers for company-specific tools often provides the highest value. The process is straightforward enough that a developer can create a basic server in under an hour.
Custom servers excel for internal APIs, proprietary databases, and specialized workflows. For example, a deployment pipeline MCP server could let your AI check build status, trigger deployments, and analyze performance metrics. A customer support MCP server might connect to your ticketing system, allowing AI to help triage issues or generate response templates.
A Real-World Workflow
Building real applications with Cursor requires a different mindset than traditional development. Instead of diving straight into code, you start by having conversations with your AI assistant about what you want to build.
Let’s say we want to build a project management tool where teams can create projects, assign tasks, and track progress. It’s the kind of application that traditionally takes weeks, maybe months, to develop, but with Cursor’s AI-assisted approach, we can have a production-ready version in days.
Foundation
Traditional projects start with wireframes and technical specifications. With Cursor, you’d start with Agent mode and a conversation about what you’re trying to build. You describe the basic concept and use the context sandwich method we covered earlier:
Context: “Building a team project management tool”
Current state: “Just an idea, need MVP definition”
Goal: “Users can create projects, assign tasks, track progress”
Constraints: “3-week timeline, needs to scale later”
The AI would break this down into clear MVP features and suggest a technology stack that balances rapid development with future scalability. More importantly, it would design a clean database schema with proper relationships.
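As an illustration of what that schema might look like at this stage, here's a rough sketch in TypeScript types (entity and field names are hypothetical, and you'd capture this in a markdown doc along with the rest of the plan):

```ts
// Hypothetical entities for the project management MVP.
interface User {
  id: string;
  email: string;
  name: string;
}

interface Project {
  id: string;
  name: string;
  ownerId: string; // references User.id
  createdAt: Date;
}

interface Task {
  id: string;
  projectId: string;         // references Project.id
  assigneeId: string | null; // references User.id; unassigned tasks allowed
  title: string;
  status: "todo" | "in_progress" | "done";
  dueDate: Date | null;
}
```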
Save all of these documents in a folder in your project for the AI to reference later.
Core Features
Start building each feature one by one. Use the test-driven development approach I mentioned earlier, and start small with very specific context.
Connect GitHub and Database MCP servers to let the AI commit code and inspect the database in real-time.
You can even set up a Slack MCP for the AI to update you or read new tickets.
Follow the same pattern for every feature – task tracking, user permissions, etc.
Don’t forget to keep testing the product locally. Even with the test-driven approach, the AI might miss things, so ask it to use the logging technique described earlier to help debug potential issues.
Productionizing
As your app gets ready, you may want to start thinking about performance and production-readiness.
Ask the AI to proactively analyze your app for potential failure points and implement comprehensive error handling.
I also often ask it to find areas for refactoring and removing unnecessary code.
For performance optimization, ask the AI to implement lazy loading, database indexing, and caching strategies while explaining the reasoning behind each decision.
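For example, route-level lazy loading is usually one of the simpler wins; here's a minimal sketch (the component names are hypothetical):

```tsx
import React, { Suspense, lazy } from "react";

// The board's code is split into its own bundle and only downloaded when it renders.
const ProjectBoard = lazy(() => import("./ProjectBoard"));

export function App() {
  return (
    <Suspense fallback={<p>Loading board...</p>}>
      <ProjectBoard />
    </Suspense>
  );
}
```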
Launch and iterate
The monitoring and debugging workflows we covered earlier would prove essential during launch week. The AI would have generated comprehensive logging and performance tracking, so when real users start using your app, you’d have visibility into response times, error rates, and user behavior patterns from day one.
When users request features you hadn’t planned (keyboard shortcuts, bulk operations, calendar integration, etc.), the iterative refinement approach combined with MCP would make these additions straightforward.
Each new feature would build naturally on the existing patterns because the AI maintains architectural consistency while MCP servers provide the external context needed for complex integrations.
Your Turn
Hopefully this article demonstrates a fundamentally different approach to software development. Instead of fighting with tools and configurations, you’re collaborating with an AI partner that understands your goals and helps implement them efficiently.
The skills you develop transfer to any technology stack: thinking architecturally, communicating requirements clearly, and iterating based on feedback. Most importantly, you gain confidence to tackle ambitious projects. When implementation details are handled by AI, you can focus on solving interesting problems and building things that matter.
I’d love to support you as you continue on your journey. My blog is filled with detailed guides like this, so sign up below if you want the latest deep dives on AI.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
Microsoft CEO Satya Nadella recently declared that “we’ve entered the era of AI agents,” highlighting that AI models are now more capable and efficient thanks to groundbreaking advancements in reasoning and memory.
Google recently announced a whole slew of new agentic tools in their recent I/O conference.
Every major tech company is going all in on agents. 61% of CEOs say competitive advantage depends on who has the most advanced generative AI, and Gartner predicts that by 2028, at least 15% of daily work decisions will be made autonomously by agentic AI.
If you’re an executive trying to understand what this means for your organization, this guide is for you. Let’s dive in.
Understanding Agentic AI and Its Business Implications
Agentic AI refers to a system or program that is capable of autonomously performing tasks on behalf of a user or another system by designing its workflow and using available tools.
Unlike traditional AI that responds to prompts, agentic AI exhibits true “agency”, or the ability to:
Make autonomous decisions, analyze data, adapt, and take action with minimal human input
Use advanced reasoning in their responses, giving users a human-like thought partner
Process and integrate multiple forms of data, such as text, images, and audio
Learn from user behavior, improving over time
When I talk to clients, I often tell them to treat an agent like an AI employee. A well-designed agent can take an existing manual process and completely automate it, leading to:
Productivity Gains: A Harvard Business School study showed consultants with access to Gen AI completed tasks 22% faster and 40% better
Decision Speed: Most C-suite leaders spend 40% of their time on routine approvals, like pricing decisions or supplier evaluations, which could be automated
Cost Reduction: Studies reveal that implementation of AI agents has led to over a 15% reduction in compliance costs and a more than 46% increase in revenue for numerous organizations
Strategic Use Cases for Agentic AI
Automating existing processes is the most obvious and low-hanging use case for organizations. Any business process that is manual, time-consuming, and does not require human judgement can and should be automated with an agent.
Customer Experience Transformation
Gartner predicts that agentic AI will autonomously resolve 80% of common customer service issues without human intervention by 2029, leading to a 30% reduction in operational costs:
24/7 Customer Support: AI agents in call centers orchestrate intelligence and automation across multiple activities involved in serving customers, simultaneously analyzing customer sentiment, reviewing order history, accessing company policies and responding to customer needs
Personalized Engagement: AI agents can learn from previous interactions and adapt to individual requirements in real time, enabling greater personalization than ever before
Knowledge Worker Augmentation
A major bottleneck in many corporations is finding the right information at the right time and working with hundreds of documents across multiple platforms:
Document Processing: Dow built an autonomous agent to scan 100,000+ shipping invoices annually for billing inaccuracies, expecting to save millions of dollars in the first year
Sales Automation: Fujitsu’s AI agent boosted sales team productivity by 67% while addressing knowledge gaps and allowing them to build stronger customer relationships
Supply Chain and Operations Automation
The supply chain represents perhaps the most compelling use case for agentic AI, with the global AI in supply chain market projected to reach $157.6 billion by 2033.
Predictive Logistics: AI agents can autonomously optimize the transportation and logistics process by managing vehicle fleets, delivery routes and logistics on a large scale
Inventory Management: AI-powered supply-chain specialists can optimize inventories on the fly in response to fluctuations in real-time demand
Risk Management: AI agents regularly monitor world events like pandemics, political unrest, and economic shifts to assist companies in proactively managing supply chain risks
Product and Service Innovation
Development Acceleration: AI-powered virtual R&D assistants save researchers significant time by finding relevant academic papers, patents, and technical documents from large databases.
Market Intelligence: Teams can gather data, identify trends, build marketing assets, inform research and move products to market faster using natural language prompts that reduce time from hours to seconds.
Process Automation
Every organization has hundreds of internal processes that are manual, time-consuming, and low value. Employees spend hours on these processes, from taking notes to copying data across platforms and creating reports, that could easily be done with AI Agents.
Most of my client work involves taking such processes and fully automating them, allowing employees to focus on higher value work. If you’re interested in this, contact me.
Building the Foundation for Agentic AI
Data Requirements
72% of CEOs say leveraging their organization’s proprietary data is key to unlocking the value of generative AI, yet 50% say their organization has disconnected technology due to the pace of recent investments.
Requirements:
Unified Data Platform: 68% say integrated enterprise-wide data architecture is critical to enable cross-functional collaboration and drive innovation
Data Quality Framework: Ensuring accuracy, completeness, and consistency
Real-time Integration: Breaking down data silos across systems
Security and Governance: Protecting sensitive information while enabling access
Talent Requirements and Organizational Readiness
Current Skills Gap: 46% of leaders identify skill gaps in their workforces as a significant barrier to AI adoption.
Essential Roles for Agentic AI:
AI Ethics Officers: Ensuring fair and transparent operations
Human-AI Collaboration Specialists: Optimizing workflows between humans and AI
AI Trainers: Teaching AI systems nuance, context, and human values
Data Scientists and ML Engineers: Building and maintaining AI systems
Training Imperatives: Nearly half of employees say they want more formal training and believe it is the best way to boost AI adoption.
Process Redesign for Human-AI Collaboration
Governance Frameworks: Only 22% of organizations that have established AI governance councils consistently track metrics related to bias detection, highlighting the need for robust oversight.
Essential Elements:
Clear policies for AI use within the business
Training on AI systems and ethical implications
Processes for evaluating and rejecting AI proposals that conflict with company values
Regular bias detection and compliance monitoring
Implementation Roadmap for Agentic AI
Phase 1: Foundation and Pilot Selection (Months 1-6)
The key to successful agentic AI implementation is starting with a clear strategy rather than jumping into the latest technology. Too many organizations are making the mistake of tool-first thinking when they should be focusing on problem-first approaches.
Begin with a comprehensive AI readiness evaluation. This means honestly assessing your current data quality, infrastructure capabilities, and organizational readiness for change.
When I work with my clients, I often start with surveys to understand the AI literacy of the organization, as well as the tech infrastructure to enable an AI transformation. This data helps us understand what skills or tech gaps we need to fill before moving ahead.
I also identify high-impact, low-risk use cases where you can demonstrate clear business value while learning how these systems work in your environment.
Download my AI Readiness Assessment
These are the same surveys I use with my clients to identify skill gaps and close them.
Phase 2: Pilot Deployment and Learning (Months 6-12)
Deloitte predicts that 25% of companies using generative AI will launch agentic AI pilots or proofs of concept in 2025, growing to 50% in 2027. The organizations that succeed will be those that approach scaling strategically rather than opportunistically.
Start with pilot projects in controlled environments where agentic AI use can be refined, then scale and integrate seamlessly into the bigger picture.
Establish clear human oversight mechanisms, regular performance monitoring, and continuous feedback loops. Most importantly, invest heavily in employee training and support during this phase.
Phase 3: Scaling and Integration (Months 12-24)
Multi-agent orchestration represents the next level of sophistication. Instead of individual AI agents working in isolation, organizations are building systems where multiple agents collaborate to handle complex, multi-step processes.
The key insight is that agentic AI works best when it’s integrated into existing workflows rather than replacing them entirely. The most successful implementations enhance human decision-making rather than eliminating it.
Measuring Impact and ROI
Only 52% of CEOs say their organization is realizing value from generative AI beyond cost reduction. This suggests that many organizations are measuring the wrong things or not measuring comprehensively enough.
Here are some KPIs I recommend measuring to test if your Agents are delivering value:
Productivity Metrics: Time saved, tasks automated, output quality
Financial Impact: Cost reduction, revenue generation, ROI calculations
Employee Satisfaction: Adoption rates, training effectiveness, job satisfaction
CEOs say 31% of the workforce will require retraining or reskilling within three years, and 54% say they’re hiring for roles related to AI that didn’t even exist a year ago.
The workforce of the AI Agent era will need skills like:
AI Literacy: Understanding capabilities, limitations, and ethical implications
Human-AI Collaboration: Working effectively alongside AI agents
Critical Thinking: Validating AI outputs and making strategic decisions
Emotional Intelligence: Areas where humans maintain comparative advantage
Continuous Learning: Adapting to rapidly evolving technology
The half-life of technical skills is shrinking rapidly, and organizations need to create cultures where learning and adaptation are continuous processes rather than occasional events.
Here are some training programs I conduct for clients:
Foundational AI concepts and applications
Hands-on experience with AI tools and platforms
Technical skills for building and managing AI agents
Culture and Change Management Considerations
Here’s an interesting statistic: 73% of executives believe their AI approach is strategic, while only 47% of employees agree. Even more concerning, 31% of employees admit to actions that could be construed as sabotaging AI efforts.
This perception gap is perhaps the biggest obstacle to successful AI transformation. And it means leaders need to build trust and adoption with their teams:
Transparent Communication: Clear explanation of AI’s role and impact
Employee Involvement: Including staff in AI design and implementation
Psychological Safety: Creating environments where concerns can be voiced
Success Stories: Demonstrating AI’s value as augmentation, not replacement
Two-thirds of C-suite executives report that generative AI adoption has led to division and tension within companies. Successful implementation requires:
Leadership commitment and visible support
Clear communication about AI’s role in the organization
Regular feedback and adjustment mechanisms
Recognition and rewards for successful AI adoption
Strategic Priorities and Competitive Implications
Microsoft recently introduced more than 50 announcements spanning its entire product portfolio, all focused on advancing AI agent technologies. Meanwhile, 32% of top executives place AI agents as the top technology trend in data and AI for 2025.
The timeline for competitive advantage is compressed. Organizations beginning their agentic AI journey now will be positioned to lead their industries, while those that delay risk being permanently disadvantaged.
Here’s a sample adoption timeline for 2025:
Q1-Q2 2025: Pilot programs and proof of concepts
Q3-Q4 2025: Limited production deployments
2026-2027: Broad enterprise adoption
2027+: Mature implementations and industry transformation
Strategic Priorities for C-Suite Leaders
1. Make Courage Your Core
64% of CEOs say they’ll have to take more risk than their competition to maintain a competitive advantage. The key is building organizational flexibility and empowering teams to experiment.
2. Embrace AI-Fueled Creative Destruction
68% of CEOs say AI changes aspects of their business that they consider core. Leaders must be willing to fundamentally rethink business models and operations.
3. Ignore FOMO, Lean into ROI
65% of CEOs say they prioritize AI use cases based on ROI. Focus on practical applications that create competitive moats and generate measurable returns.
4. Cultivate a Vibrant Data Environment
Invest in unified data architectures that can support autonomous AI operations while maintaining security and governance.
5. Borrow the Talent You Can’t Buy
67% of CEOs say differentiation depends on having the right expertise in the right positions. Build partnerships to access specialized AI capabilities.
Competitive Implications of Early vs. Late Adoption
Early Adopter Advantages:
Market Positioning: Early adopters will gain a substantial advantage—but success requires a strategic and experimental approach
Talent Attraction: Access to top AI talent before market saturation
Data Advantage: More time to accumulate training data and refine models
Customer Relationships: First-mover advantage in AI-enhanced customer experiences
Risks of Late Adoption:
Competitive Disadvantage: 64% of CEOs say the risk of falling behind drives them to invest in some technologies before they have a clear understanding of the value they bring
Higher Implementation Costs: Premium for late-stage adoption
Operational Inefficiency: Competing against AI-optimized operations
Strategic Recommendations:
Start Immediately: Begin with low-risk pilot programs while building foundational capabilities
Invest in Data: Prioritize data quality and integration as the foundation for agentic AI
Build Partnerships: Collaborate with technology providers and consultants to accelerate deployment
Focus on Change Management: Invest heavily in employee training and cultural transformation
Plan for Scale: Design initial implementations with enterprise-wide scaling in mind
Conclusion: The Imperative for Action
The transition to agentic AI represents the most significant technological shift since the advent of the internet. CEOs are often pushing AI adoption faster than some employees are comfortable with, underscoring the need to lead people through the changes.
The window for strategic advantage is narrowing. By 2028, at least 15% of daily work decisions will be made autonomously by agentic AI. Organizations that begin their agentic AI journey now will be positioned to lead their industries, while those that delay risk being left behind.
Key Takeaways for C-Suite Leaders:
Agentic AI is not optional—it’s an inevitability that will reshape competitive landscapes
Success requires holistic transformation—technology, people, processes, and culture must evolve together
Early action is critical—the advantages of being among the first adopters far outweigh the risks
Human-AI collaboration is the goal—augmentation, not replacement, should guide implementation strategies
Continuous learning is essential—both for AI systems and human workers
The question isn’t whether agentic AI will transform your industry, it’s whether your organization will be leading or following that transformation.
If you want to be leading the transformation, book a free consultation call with me. I’ve worked with multiple organizations to lead them through this.
Recent research by McKinsey shows that 31% of the workforce will require retraining or reskilling within the next three years. With companies rushing to become AI-first, I’m not surprised. In fact, I think that number should be higher.
Much like digital literacy became essential in the early 2000s, AI literacy is the new baseline for workforce competence. Organizations that fail to develop AI skills will fall behind competitors who leverage AI to enhance productivity, drive innovation, and deliver superior customer experiences.
This guide offers a comprehensive roadmap for executives seeking to transform their workforce for the AI era. We’ll examine practical strategies for conducting skills gap analyses, developing talent through multiple channels, creating a learning culture, empowering change champions, and addressing AI anxiety.
Each section provides actionable frameworks backed by research and case studies, enabling you to immediately apply these approaches within your organization.
Book a free consultation
If you’re looking for customized training programs for your employees, book a free consultation call with me. I’ve trained dozens of organizations and teams on becoming AI experts.
Section 1: Conducting an AI Skills Gap Analysis
Where Do You Want to Be?
Before launching any training initiative, you must first understand the specific AI-related skills your organization requires. When working with my clients, I’ve identified three categories of AI skills that companies need:
Foundational AI Literacy (All Employees)
In my opinion, this is table-stakes. Every employee in your company needs to have basic AI literacy, the same way they need to have basic computer literacy.
Understanding basic AI concepts and terminology
Recognizing appropriate use cases for AI tools
Effective prompt engineering and interaction with AI assistants
Critical evaluation of AI outputs and limitations
Awareness of ethical considerations and responsible AI use
Intermediate AI Skills (Domain Specialists)
As you go deeper into your AI transformation, you’ll want to start automating processes and integrating AI deeper into workflows. This means training a percentage of your workforce on AI automation and AI agents.
Ideally, these are domain specialists who understand the workflows well enough to design automations for them.
Ability to identify automation opportunities within specific workflows
Data preparation and quality assessment
Collaboration with technical teams on AI solution development
Integration of AI tools into existing processes
Performance monitoring and feedback provision
Advanced AI Expertise (Technical Specialists)
Finally, for organizations that are building AI products and features, the following skills are absolutely necessary.
AI ethics implementation and compliance
AI system design and implementation
Model selection, training, and fine-tuning
AI infrastructure management and optimization
Data architecture and governance for AI
Where Are You Now?
The next step is understanding your organization’s current AI capabilities. When working with clients, I often start with a survey to leadership and employees.
My Leadership Capability Assessment evaluates executive understanding of AI potential and limitations, and assesses their ability to develop and execute AI strategy.
My Workforce Literacy Survey measures baseline understanding of AI concepts across the organization, and assesses comfort levels with AI tools and applications.
For organizations that are building AI products and features, create a Technical Skills Inventory to document existing data science, machine learning, and AI engineering capabilities, map current technical skills against future needs, and identify training needs for different technical roles.
I also recommend an overall Organizational Readiness Assessment to evaluate data infrastructure and governance maturity, assess cross-functional collaboration capabilities, and review change management processes and effectiveness.
At this point, it becomes fairly obvious where the gaps are in where you are right now and where you want to be.
Download my Leadership Capability Assessment and Workforce Literacy Survey
Download the exact surveys I use with my clients to measure your organization’s current AI capabilities
Create a Development Plan
I then create a custom skills development plan to close the gap. Here’s a sample timeline I draw up for clients, although this depends heavily on how fast you move and how big your organization is.
| Time Horizon | Priority Skills | Target Audience | Business Impact |
| --- | --- | --- | --- |
| 0-3 months | AI literacy, foundational concepts, AI tool usage | All employees | Improved AI adoption, reduced resistance |
| 3-6 months | Role-specific AI applications, workflow integration | Department leaders, domain experts | Process optimization, efficiency gains |
| 6-12 months | Advanced AI development, AI system design, AI ethics implementation | Technical specialists, innovation teams | New product/service development, competitive differentiation |
| 12+ months | Emerging AI capabilities, human-AI collaboration, AI governance | Executive leadership, strategic roles | Business model transformation, market leadership |
I suggest running the skills gap analysis every quarter and re-evaluating. The pace at which AI is developing requires continuous up-skilling and training in the latest technologies.
Section 2: The Build, Buy, Bot, Borrow Model for AI Talent
As your organization develops its AI capabilities, you’ll need a multi-pronged approach to talent acquisition and development. The “Build, Buy, Bot, Borrow” framework offers a comprehensive strategy for addressing AI talent needs. This model provides flexibility while ensuring you have the right capabilities at the right time.
Building Internal Talent Through Training and Development
Internal talent development should be your cornerstone strategy, as it leverages existing institutional knowledge while adding new capabilities. Develop an organizational learning strategy that includes:
Tiered Learning Programs
Level 1: AI Fundamentals – Basic AI literacy for all employees
Level 2: AI Applications – Role-specific training on using AI tools
Level 3: AI Development – Specialized technical training for selected roles
Level 4: AI Leadership – Strategic AI implementation for executives and managers
Experiential Learning Opportunities
AI hackathons and innovation challenges
Rotation programs with AI-focused teams
Mentorship from AI experts
Applied learning projects with measurable outcomes
Learning Ecosystems
On-demand microlearning resources
Self-paced online courses and certifications
Cohort-based intensive bootcamps
Executive education partnerships
Many organizations are finding that the “build” strategy offers the best long-term return on investment. I’ll dive deeper into how to build AI talent in later sections.
Strategic Hiring for Specialized AI Roles
Despite your best efforts to build internal talent, some specialized AI capabilities may need to be acquired through strategic hiring. This includes AI/ML engineers, data scientists, and AI integration specialists.
To develop an effective hiring strategy for AI roles:
Focus on specialized competencies rather than general AI knowledge
Identify the specific AI capabilities required for your business objectives (from skills gap above)
Create detailed skill profiles for each specialized role
Develop targeted assessment methods to evaluate candidates
Look beyond traditional sources of talent
Partner with universities and research institutions with strong AI programs
Engage with AI communities and open-source projects
Consider talent from adjacent fields with transferable skills
Create an AI-friendly work environment
Provide access to high-performance computing resources
Establish clear AI ethics and governance frameworks
Support ongoing professional development in rapidly evolving AI domains
Build a culture that values AI innovation and experimentation
Develop competitive compensation strategies
Create flexible compensation packages that reflect the premium value of AI expertise
Consider equity or profit-sharing for roles that directly impact business outcomes
Offer unique perks valued by the AI community, such as conference attendance or research time
Using AI to Augment Existing Workforce Capabilities
The “bot” aspect of the framework involves strategic deployment of AI tools, automations, and agents to amplify the capabilities of your existing workforce. This approach offers several advantages:
AI agents can handle routine tasks, freeing employees to focus on higher-value work
AI tools can provide just-in-time knowledge, enabling employees to access specialized information when needed
AI can augment decision-making, helping employees make more informed choices
Implement these strategies to effectively leverage AI for workforce augmentation:
AI Agents
Map existing processes to identify routine, time-consuming tasks suitable for AI automation
Deploy AI agents for common tasks like scheduling, report generation, and data summarization
Create seamless handoffs between AI and human components of workflows
Knowledge Augmentation
Implement AI-powered knowledge bases that can answer domain-specific questions
Deploy contextual AI assistants that provide relevant information during decision-making processes
Create AI-guided learning paths that help employees develop new skills
Decision Support
Develop AI models that can analyze complex data and provide recommendations
Implement scenario-planning tools that help employees visualize potential outcomes
Create AI-powered dashboards that provide real-time insights into business performance
I highly recommend developing AI automations and agents in parallel with employee up-skilling programs. Low hanging automations can be deployed in weeks and provide immediate benefits.
This is why so many major tech companies are going all in on agents and have paused hiring. If you’re interested in how to find opportunities to do this in your organization and design effective agents, read my guide here.
Borrowing Talent through Strategic Partnerships
The final component of the talent strategy involves “borrowing” specialized AI capabilities through strategic partnerships. This approach is particularly valuable for accessing scarce expertise or handling short-term needs.
Strategic Vendor Relationships
Evaluate AI platform providers based on their domain expertise, not just their technology
Develop deep partnerships with key vendors that include knowledge transfer components
Create joint innovation initiatives with strategic technology partners
Consulting and Professional Services
Engage specialized AI consultants for specific, high-value projects
Use professional services firms to accelerate implementation of AI initiatives
Partner with boutique AI firms that have deep expertise in your industry
Academic and Research Partnerships
Collaborate with university research labs on cutting-edge AI applications
Sponsor academic research in areas aligned with your strategic priorities
Participate in industry consortia focused on AI standards and best practices
Talent Exchanges
Create temporary talent exchange programs with non-competing organizations
Develop rotational programs with technology partners
Participate in open innovation challenges to access diverse talent pools
The borrowed talent approach offers several advantages:
Access to specialized expertise that would be difficult or expensive to develop internally
Flexibility to scale AI capabilities up or down based on business needs
Exposure to diverse perspectives and industry best practices
Reduced risk in exploring emerging AI technologies
By strategically combining the build, buy, bot, and borrow approaches, organizations can develop a comprehensive AI talent strategy that provides both depth in critical areas and breadth across the organization.
Section 3: Creating an AI Learning Culture
Let’s dive into how you can up-skill employees and build AI talent internally, as I mentioned above.
AI training cannot follow a one-size-fits-all approach. Different roles require different types and levels of AI knowledge and skills. From my client work, I have identified three primary audience segments:
Executive Leadership
Focus Areas: Strategic AI applications, ethical considerations, governance, ROI measurement
Format Preferences: Executive briefings, peer discussions, case studies
Key Outcomes: Ability to set AI strategy, evaluate AI investments, and lead organizational change
Managers and Team Leaders
Focus Areas: Identifying AI use cases, managing AI-enabled teams, process redesign
Format Preferences: Applied workshops, collaborative problem-solving, peer learning
Key Outcomes: Ability to identify AI opportunities, guide implementation, and support team adoption
Individual Contributors
Focus Areas: Hands-on AI tools, domain-specific applications, ethical use of AI
Format Preferences: Interactive tutorials, practical exercises, on-the-job application
Key Outcomes: Proficiency with relevant AI tools, ability to integrate AI into daily workflows
For each segment, design targeted learning experiences that address their specific needs and preferences. Here’s an example of what I recommend to clients:
| Level | Executive Leadership | Managers / Team Leaders | Individual Contributors |
| --- | --- | --- | --- |
| Basic | AI Strategy Overview (2 hours) | AI for Team Leaders (2 hours) | AI Fundamentals (2 hours) |
| Intermediate | AI Governance Workshop (2 hours) | AI Use Case Design (4 hours) | AI Tools Bootcamp (8 hours) |
| Advanced | AI Investment Roundtable (2 hours) | AI-Enabled Transformation (8 hours) | Domain-Specific AI Training (8 hours) |
But AI training does not stop there. AI is always evolving so a one-time training program is insufficient. Many organizations struggle with the pace of changes in AI, with capabilities evolving faster than organizations can adapt.
This means you need to foster a continuous learning mindset:
Leadership Modeling
Executives should openly share their own AI learning journeys
Leaders should participate in AI training alongside team members
Management should recognize and reward ongoing skill development
Learning Infrastructure
Create dedicated time for AI learning (e.g., “Learning Fridays”)
Develop peer learning communities around AI topics
Establish AI learning hubs that curate and share relevant resources
Growth Mindset Development
Promote the belief that AI capabilities can be developed through effort
Encourage experimentation and learning from failures
Recognize improvement and progress, not just achievement
I’ve found it’s a lot easier to create and maintain an AI learning culture when there are champions and go-to experts in the organization driving this culture.
I often advise clients to identify these AI champions and empower them by creating AI leadership roles, providing them with advanced training and resources, and creating a clear mandate that defines their responsibility for driving AI adoption.
These AI champions should be included in AI strategy development, use case and implementation approaches, and vendor selection and evaluation processes.
Other ways to sustain this learning culture and increase AI adoption that have worked well for my clients are:
Incentivizing AI adoption through recognition programs and financial incentives
Creating mentorship programs and group learning cohorts within the company
Establishing communities based on specific business functions (marketing AI, HR AI, etc.)
Implementing hackathons and innovation challenges
Creating knowledge repositories for AI use cases and lessons learned
Section 4: Addressing AI Anxiety and Resistance
Despite growing enthusiasm for AI, 41% of employees remain apprehensive about its implementation. Understanding these concerns is essential for effective intervention.
Key factors driving AI anxiety include:
Fear of Job Displacement – Concerns about automation replacing human roles and uncertainty about future career paths
Security and Privacy Concerns – Worries about data protection and cybersecurity risks
Performance and Reliability Issues – Skepticism about AI accuracy and reliability and fears of over-reliance on imperfect systems
Skills and Competency Gaps – Concerns about keeping pace with change
One of the most effective ways to allay these fears is to demonstrate how the technology augments human capabilities rather than replacing them. This approach shifts the narrative from job displacement to job enhancement.
Pilot Projects with Visible Benefits
Implement AI solutions that address known pain points
Focus initial applications on automating tedious, low-value tasks
Showcase how AI frees up time for more meaningful work
Skills Enhancement Programs
Develop training that shows how AI can enhance professional capabilities
Create clear pathways for employees to develop new, AI-complementary skills
Emphasize the increased value of human judgment and creativity in an AI-enabled environment
Role Evolution Roadmaps
Work with employees to envision how their roles will evolve with AI
Create transition plans that map current skills to future requirements
Provide examples of how similar roles have been enhanced by AI in other organizations
Shared Success Metrics
Develop metrics that track both AI performance and human success
Share how AI implementation impacts team and individual objectives
Create incentives that reward effective human-AI collaboration
A common pitfall is focusing too narrowly on productivity gains. The McKinsey report notes that “If CEOs only talk about productivity they’ve lost the plot,” suggesting that organizations should emphasize broader benefits like improved customer experience, new growth opportunities, and enhanced decision-making.
Conclusion: Implementing an Enterprise-Wide Upskilling Initiative
Timeline for Implementation
Creating an AI-ready workforce requires a structured, phased approach. Here’s a sample timeline I’ve implemented for my clients:
Phase 1: Assessment and Planning (1 month)
Conduct an AI skills gap analysis across the organization
Develop a comprehensive upskilling strategy aligned with business objectives
Build executive sponsorship and secure necessary resources
Establish baseline metrics for measuring progress
Phase 2: Infrastructure and Pilot Programs (2-3 months)
Identify and train initial AI champions across departments
Launch pilot training programs with high-potential teams
Collect feedback and refine approach based on early learnings
Phase 3: Scaled Implementation (3-6 months)
Roll out tiered training programs across the organization
Activate formal mentorship programs and communities of practice
Implement recognition systems for AI skill development
Begin integration of AI skills into performance management processes
Phase 4: Sustainability and Evolution (6+ months)
Establish continuous learning mechanisms for emerging AI capabilities
Develop advanced specialization tracks for technical experts
Create innovation programs to apply AI skills to business challenges
Regularly refresh content and approaches based on technological evolution
This phased approach allows organizations to learn and adapt as they go, starting with focused efforts and expanding based on successful outcomes. The timeline above is very aggressive and may need adjustment based on organizational size, industry complexity, and the current state of AI readiness.
Key Performance Indicators for Measuring Workforce Readiness
To evaluate the effectiveness of AI upskilling initiatives, organizations should establish a balanced set of metrics that capture both learning outcomes and business impact. Based on my client work, I’ve found that KPIs should include:
Learning and Adoption Metrics
Percentage of employees completing AI training by role/level
AI tool adoption rates across departments
Number of AI use cases identified and implemented by teams
Employee self-reported confidence with AI tools
Operational Metrics
Productivity improvements in AI-augmented workflows
Reduction in time spent on routine tasks
Quality improvements in AI-assisted processes
Decrease in AI-related support requests over time
Business Impact Metrics
Revenue generated from AI-enabled products or services
Cost savings from AI-enabled process improvements
Customer experience improvements from AI implementation
Innovation metrics (number of new AI-enabled offerings)
Cultural and Organizational Metrics
Employee sentiment toward AI (measured through surveys)
Retention rates for employees with AI skills
Internal mobility of employees with AI expertise
Percentage of roles with updated AI skill requirements
Organizations should establish baseline measurements before launching upskilling initiatives and track progress at regular intervals.
Long-term Talent Strategy Considerations
As organizations look beyond immediate upskilling needs, several strategic considerations emerge for long-term AI talent management:
Evolving Skill Requirements
Regularly reassess AI skill requirements as technology evolves
Develop capabilities to forecast emerging skills needs
Create flexible learning systems that can quickly incorporate new content
Talent Acquisition Strategy
Redefine job descriptions and requirements to attract AI-savvy talent
Develop AI skills assessment methods for hiring processes
Create compelling employee value propositions for technical talent
Career Path Evolution
Design new career paths that incorporate AI expertise
Create advancement opportunities for AI specialists
Develop hybrid roles that combine domain expertise with AI capabilities
Organizational Structure Adaptation
Evaluate how AI impacts traditional reporting relationships
Consider new organizational models that optimize human-AI collaboration
Develop governance structures for AI development and deployment
Cultural Transformation
Foster a culture that values continuous learning and adaptation
Promote cross-functional collaboration around AI initiatives
Build ethical frameworks for responsible AI use
Final Thoughts
AI is going to shock the system in an even bigger way than computers or the internet did, so creating an AI-ready workforce requires a comprehensive organizational transformation.
By conducting thorough skills gap analyses, implementing the “build, buy, bot, borrow” model for talent development, creating a continuous learning culture, and addressing AI anxiety with empathy and transparency, organizations can position themselves for success in the AI era.
I’ve worked with dozens of organizations to help them with this. Book me for a free consultation call and I can help you too.
At the start of this year, Jensen Huang, CEO of Nvidia, said 2025 will be the year of the AI agent. Many high-profile companies like Shopify and Duolingo have reinvented themselves with AI at its core, building internal systems and agents to automate processes and reduce headcount.
I spent the last 3 years running a Venture Studio that built startups with AI at the core. Prior to that I built one of the first AI companies on GPT-3. And now I consult for companies on AI implementation. Whether you’re a business leader looking to automate complex workflows or an engineer figuring out the nuts and bolts, this guide contains the entire process I use with my clients.
The purpose of this guide is to help you identify where agents will be useful in your organization, and how to design them to produce real business results. Much like you design a product before building it, this should be your starting point before building an agent.
Let us begin.
PS – I’ve put together a 5-day email course where I walk through designing and implementing a live AI agent using no-code tools. Sign up below.
What Makes a System an “Agent”?
No, that automation you built with Zapier is not an AI agent. Neither is the chatbot you have on your website.
An AI agent is a system that independently accomplishes tasks on your behalf with minimal supervision. Unlike passive systems that just respond to queries or execute simple commands, agents proactively make decisions and take actions to accomplish goals.
Think of it like a human intern or an analyst. It can do what they can, except get you coffee.
How do they do this? There are 4 main components to an AI agent – the model, the instructions, the tools, and the memory. We’ll go into more detail later on, but here’s a quick visual on how they work.
The model is the core component. This is an AI model like GPT, Claude, Gemini or whatever, and it starts when it is invoked or triggered by some action.
Some agents get triggered by a chat or phone call. You’ve probably come across these. Others get triggered when a button is clicked or a form is submitted. Some even get triggered through a cron job at regular intervals, or an API call from another app.
For example, this content creation agent I built for a VC fund gets triggered when a new investment memo is uploaded to a form.
When triggered, the model uses the instructions it has been given to figure out what to do. In this case, the instructions tell it to analyze the memo, research the company, remove sensitive data, and convert it into a blog post.
To do this, the agent has access to tools such as a web scraper that finds information about the company. It loops through these tools and finally produces a blog post, using its memory of the fund’s past content to write in their tone and voice.
You can see how this is different from a regular automation, where you define every step. Even if you use AI in your automation, it’s one step in a sequence. With an agent, the AI forms the central component, decides which steps to perform, and then loops through them until the job is done.
We’ll cover how to structure these components and create that loop later. But first…
Do You Really Need an AI Agent?
Most of the things you want automated don’t really need an AI agent. You can trigger email followups, schedule content, and more through basic automation tools.
Rule of thumb: if a process can be fully captured in a flowchart with no ambiguity or judgment calls, traditional automation is likely more efficient and far more cost-effective.
I also generally advise against building AI agents for high-stakes decisions where an error could be extremely costly, or there’s a legal requirement to provide explainability and transparency.
When you exclude processes that are too simple or too risky, you’re left with good candidates for AI Agents. These tend to be:
Processes where you have multiple variables, shifting context, plenty of edge cases, or decision criteria that can’t be captured with rules. Customer refund approvals are a good example.
Processes that resemble a tangled web of if-then statements with frequent exceptions and special cases, like vendor security reviews.
Processes that involve significant amounts of unstructured data, like natural language understanding, reading documents, analyzing text or images, and so on. Insurance claims processing is a good example.
A VC fund I worked with wanted to automate some of their processes. We excluded simple ones like pitch deck submission (can be done through a Typeform with CRM integration), and high-stakes ones like making the actual investment decisions.
We then built AI agents to automate the rest, like a Due Diligence Agent (research companies, founders, markets, and competition, to build a thorough investment memo) and the content generation agent I mentioned earlier.
Practical Identification Process
To systematically identify agent opportunities in your organization, follow this process:
Catalog existing processes
Document current workflows, especially those with manual steps
Note pain points, bottlenecks, and error-prone activities
Identify processes with high volume or strategic importance
Evaluate against the criteria above
Score each process on complexity, reasoning requirements, tool access, etc.
Eliminate clear mismatches (too simple, too risky, etc.)
Prioritize high-potential candidates
Assess feasibility
Review available data and system integrations
Evaluate current documentation and process definitions
Consider organizational readiness and potential resistance
Build capabilities and confidence with each implementation
Remember that the best agent implementations often start with a clear problem to solve rather than a technology looking for an application.
Contact me if you need help with this
I offer free process audits to help companies identify where they can build agents and reduce wasted time. Book a time with me here.
Agent Architecture & Design Principles
Remember that loop I mentioned earlier? That’s our architecture pattern and will tell us how to select and connect the 4 core components of our agent.
Simple “Loop and Fetch” Architecture
The most basic agent architecture follows a straightforward loop:
Receive input (from a user or another system or a cron job)
Process the input using an AI model (with guidance from instructions)
Determine the next action (respond directly or call a tool)
Execute the action (use memory if needed)
Observe the result (check against instructions)
Loop back to step 2
This pattern works well for simpler agents with limited tool sets and straightforward workflows. It’s easy to implement and reason about, making it a good starting point for many projects. It’s also the one I used for the content agent I mentioned earlier.
Here’s a conceptual example:
JavaScript
function runAgent(input, context) {
  while (true) {
    // Process input with the LLM
    const llmResponse = model.process(input, context)

    // Check if the LLM wants to use a tool
    if (llmResponse.hasTool) {
      // Execute the tool
      const toolResult = executeTool(llmResponse.tool, llmResponse.parameters)

      // Update context with the result
      context.addToolResult(toolResult)

      // Continue the loop with the tool result as the new input
      input = toolResult
    } else {
      // No tool needed, return the response
      return llmResponse.message
    }
  }
}
ReAct-Style Reasoning Frameworks
ReAct (Reasoning and Acting) frameworks enhance the basic loop with more explicit reasoning steps. Rather than immediately jumping to actions, the agent follows a more deliberate process:
Thought: Reason about the current state and goal
Action: Decide on a specific action to take
Observation: Observe the result of the action
Repeat: Continue this cycle until the goal is achieved
The key difference between this and the simple loop is the agent thinks explicitly about each step, making its reasoning more transparent and often leading to better decision-making for complex tasks. This is the architecture often used in research agents, like the Deep Research feature in Gemini and ChatGPT.
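Here’s a minimal Python sketch of that thought/action/observation cycle. It’s illustrative only: call_model and run_tool are hypothetical stand-ins for your model API and your tool dispatcher, and a real implementation would parse the model’s output far more carefully.
Python
# Minimal ReAct-style loop (sketch). call_model and run_tool are hypothetical:
# call_model returns either {"thought", "action", "input"} or {"thought", "answer"};
# run_tool executes the named tool and returns its result.

def react_agent(goal, call_model, run_tool, max_steps=10):
    scratchpad = []  # running log of thoughts, actions, and observations
    for _ in range(max_steps):
        step = call_model(goal, scratchpad)
        scratchpad.append(f"Thought: {step['thought']}")
        if "answer" in step:  # the model decided it has enough to finish
            return step["answer"]
        observation = run_tool(step["action"], step["input"])  # e.g. a web search
        scratchpad.append(f"Action: {step['action']}({step['input']})")
        scratchpad.append(f"Observation: {observation}")
    return "Stopped: step limit reached without a final answer."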
I custom-built this for a SaaS client that was spending a lot of time on research for their long-form blog content.
Hierarchical Planning Structures
For more complex workflows, hierarchical planning separates high-level strategy from tactical execution:
A top-level planner breaks down the overall goal into major steps
Each step might be further decomposed into smaller tasks
Execution happens at the lowest level of the hierarchy
Results flow back up, potentially triggering replanning
This architecture excels at managing complex, multi-stage workflows where different levels of abstraction are helpful. For example, a document processing agent might:
At the highest level, plan to extract information, verify it, and generate a report
At the middle level, break “extract information” into steps for each document section
At the lowest level, execute specific extraction tasks on individual paragraphs
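If it helps to see the shape of it, here’s a rough Python sketch of that planner/executor split. The plan and execute callables are hypothetical stand-ins for model calls and tool executions.
Python
# Sketch of two-level hierarchical planning. plan(goal) returns a list of sub-steps
# (backed by a model call in practice); execute(task) does the lowest-level work.

def hierarchical_agent(goal, plan, execute):
    report = []
    for step in plan(goal):                     # top level: major steps toward the goal
        step_outputs = []
        for task in plan(step):                 # middle level: decompose each step further
            step_outputs.append(execute(task))  # lowest level: do the actual work
        report.append({"step": step, "outputs": step_outputs})
    return report                               # results flow back up for review or replanning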
Memory-Augmented Frameworks
Memory-augmented architectures extend basic agents with sophisticated memory systems:
Before processing input, the agent retrieves relevant information from memory
The retrieved context enriches the agent’s reasoning
After completing an action, the agent updates its memory with new information
This approach is particularly valuable for:
Personalized agents that adapt to individual users over time
Knowledge-intensive tasks where retrieval of relevant information is critical
Interactions that benefit from historical context
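A single turn of this pattern might look something like the sketch below. The memory object and the respond function are hypothetical; any vector store wrapper and model call would slot in.
Python
# Sketch of one memory-augmented turn: retrieve, reason, then write back.
# memory is any store exposing search() and add(); respond is a hypothetical model call.

def memory_augmented_turn(user_input, memory, respond):
    recalled = memory.search(user_input, top_k=5)      # retrieve only relevant history
    reply = respond(user_input, context=recalled)       # reasoning enriched by the recall
    memory.add({"input": user_input, "reply": reply})   # persist the new exchange
    return reply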
Multi-Agent Cooperative Systems
Sometimes the most effective approach involves multiple specialized agents working together:
A coordinator agent breaks down the overall task
Specialized agents handle different aspects of the workflow
Results are aggregated and synthesized
The coordinator determines next steps or delivers final outputs
This architecture works well when different parts of a workflow require substantially different capabilities or tool sets. For example, a customer service system might employ:
A documentation agent to retrieve relevant resources
A triage agent to understand initial requests
A technical support agent for product issues
A billing specialist for financial matters
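In code, the coordination layer can be as thin as this sketch, where classify and the specialists map are hypothetical stand-ins for your triage model and specialist agents.
Python
# Sketch of a coordinator routing to specialist agents. classify is a hypothetical
# triage call; specialists maps a topic to an agent callable with its own tools.

def coordinator(request, classify, specialists):
    topic = classify(request)                              # e.g. "billing", "technical", "docs"
    agent = specialists.get(topic, specialists["triage"])  # unknown topics fall back to triage
    return agent(request)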
If this is your first agent, I suggest starting with the simple loop architecture. I find it helps to sketch out the process, starting with what triggers our agent, what the instructions should be, what tools it has access to, if it needs memory, and what the final output looks like.
I show you how to implement this in my 5-day Challenge.
Core Components of Effective Agents
As I said earlier, every effective agent, regardless of implementation details, consists of four fundamental layers:
1. The Model Layer: The “Brain”
These are the large language models that provide the reasoning and decision-making capabilities. These models:
Process and understand natural language inputs
Generate coherent and contextually appropriate responses
Apply complex reasoning to solve problems
Make decisions about what actions to take next
Different agents may use different models or even multiple models for different aspects of their workflow. A customer service agent might use a smaller, faster model for initial triage and a more powerful model for complex problem-solving.
2. The Tool Layer: The “Hands”
Tools extend an agent’s capabilities by connecting it to external systems and data sources. These might include:
Data tools: Database queries, knowledge base searches, document retrieval
Action tools: Sending messages, updating records, triggering other processes
Orchestration tools: Coordination with other agents or services
Tools are the difference between an agent that can only talk about doing something and one that can actually get things done.
3. The Instruction Layer: The “Rulebook”
Instructions and guardrails define how an agent behaves and the boundaries within which it operates. This includes:
Task-specific guidelines and procedures
Ethical constraints and safety measures
Error handling protocols
User preference settings
Clear instructions reduce ambiguity and improve agent decision-making, resulting in smoother workflow execution and fewer errors. Without proper instructions, even the most sophisticated model with the best tools will struggle to deliver consistent results.
4. Memory Systems: The “Experience”
Memory is crucial for agents that maintain context over time:
Short-term memory: Tracking the current state of a conversation or task
Long-term memory: Recording persistent information about users, past interactions, or domain knowledge
Memory enables agents to learn from experience, avoid repeating mistakes, and provide personalized service based on historical context.
The next few sections cover the strategy behind these components, plus two additional considerations: guardrails and error handling.
Model Selection Strategy
Not every task requires the most advanced (and expensive) model available. You need to balance capability, cost, and latency requirements for your specific use case.
Capability Assessment
Different models have different strengths. When evaluating models for your agent:
Start with baseline requirements:
Understanding complex instructions
Multi-step reasoning capabilities
Contextual awareness
Tool usage proficiency
Consider specialized capabilities needed:
Code generation and analysis
Mathematical reasoning
Multi-lingual support
Domain-specific knowledge
Assess the complexity of your tasks:
Simple classification or routing might work with smaller models
Complex decision-making typically requires more advanced models
Multi-step reasoning benefits from models with stronger planning abilities
For example, a customer service triage agent might effectively use a smaller model to categorize incoming requests, while a coding agent working on complex refactoring tasks would benefit from a more sophisticated model with strong reasoning capabilities and code understanding.
Creating a Performance Baseline
A proven approach is to begin with the most capable model available to establish a performance baseline:
Start high: Build your initial prototype with the most advanced model
Define clear metrics: Establish concrete measures of success
Test thoroughly: Validate performance across a range of typical scenarios
Document the baseline: Record performance benchmarks for comparison
This baseline represents the upper limit of what’s currently possible and provides a reference point for evaluating tradeoffs with smaller or more specialized models.
Optimization Strategy
Once you’ve established your baseline, you can optimize by testing smaller, faster, or less expensive models:
Identify candidate models: Select models with progressively lower capability/cost profiles
Comparative testing: Evaluate each candidate against your benchmark test set
Analyze performance gaps: Determine where and why performance differs
Make informed decisions: Choose the simplest model that meets your requirements
This methodical approach helps you find the optimal balance between performance and efficiency without prematurely limiting your agent’s capabilities.
Multi-Model Architecture
For complex workflows, consider using different models for different tasks within the same agent system:
Smaller, faster models for routine tasks (classification, simple responses)
Medium-sized models for standard interactions and decisions
Larger, more capable models for complex reasoning, planning, or specialized tasks
For example, an agent might use a smaller model for initial user intent classification, then invoke a larger model only when it encounters complex requests requiring sophisticated reasoning.
This tiered approach can significantly reduce average costs and latency while maintaining high-quality results for challenging tasks.
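As a rough sketch, the routing logic can be this simple. Here, small_model and large_model are hypothetical wrappers around whichever providers you use, and the SIMPLE/COMPLEX triage prompt is just an assumption to illustrate the idea.
Python
# Sketch of tiered routing: a cheap model triages, a stronger one handles hard cases.

def answer(request, small_model, large_model):
    triage = small_model(f"Reply with SIMPLE or COMPLEX only: {request}")
    if "COMPLEX" in triage.upper():
        return large_model(request)   # pay for the stronger model only when needed
    return small_model(request)       # routine requests stay on the fast, cheap tier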
My Default Models
I find myself defaulting to a handful of models, at least when starting out, before optimizing the agent:
Reasoning – OpenAI o3 or Gemini 2.5 Pro
Data Analysis – Gemini 2.5 Flash
Image Generation – GPT 4o
Code Generation – Gemini 2.5 Pro
Content Generation – Claude 3.7 Sonnet
Triage – GPT 3.5 Turbo or Gemini 2.0 Flash-Lite (hey I don’t make the names ok)
Every model provider has a Playground where you can test the models. Start there if you’re not sure which one to pick.
Tool Definition Best Practices
Tools extend your agent’s capabilities by connecting it to external systems and data sources. Well-designed tools are clear, reliable, and reusable across multiple agents.
Tool Categories and Planning
When planning your agent’s tool set, consider the three main categories of tools it might need:
Data Tools: Enable agents to retrieve context and information
Database queries – Eg: find a user’s profile information
Document retrieval – Eg: get the latest campaign plan
Search capabilities – Eg: search through emails
Knowledge base access – Eg: Find the refund policy
Action Tools: Allow agents to interact with systems and take actions
Sending messages – Eg: send a Slack alert
Updating records – Eg: change the user’s profile
Creating content – Eg: generate an image
Managing resources – Eg: give access to some other tool
Initiating processes – Eg: Trigger another process or automation
Orchestration Tools: Connect agents to other agents or specialized services
Expert consultations – Eg: connect to a fine-tuned medical model
Specialized analysis – Eg: handoff to a reasoning model for data analysis
Delegated sub-tasks – Eg: Handoff to a content generation agent
A well-rounded agent typically needs tools from multiple categories to handle complex workflows effectively.
Designing Effective Tool Interfaces
Tool design has a significant impact on your agent’s ability to use them correctly. Follow these guidelines:
Clear naming: Use descriptive, task-oriented names that indicate exactly what the tool does
Comprehensive descriptions: Provide detailed documentation about:
The tool’s purpose and when to use it
Required parameters and their formats
Expected outputs and potential errors
Limitations or constraints to be aware of
Focused functionality: Each tool should do one thing and do it well
Prefer multiple specialized tools over single complex tools
Maintain a clear separation of concerns
Simplify parameter requirements for each individual tool
Consistent patterns: Apply consistent conventions across your tool set
Standardize parameter naming and formats
Use similar patterns for related tools
Maintain consistent error handling and response structures
Here’s an example of a well-defined tool:
Python
@function_tool
def search_customer_orders(
    customer_id: str,
    status: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> List[Order]:
    """
    Search for a customer's orders with optional filtering.

    Parameters:
    - customer_id: The unique identifier for the customer (required)
    - status: Optional filter for order status ('pending', 'shipped', 'delivered', 'cancelled')
    - start_date: Optional start date for filtering orders (format: YYYY-MM-DD)
    - end_date: Optional end date for filtering orders (format: YYYY-MM-DD)

    Returns:
    A list of order objects matching the criteria, each containing:
    - order_id: Unique order identifier
    - date: Date the order was placed
    - items: List of items in the order
    - total: Order total amount
    - status: Current order status

    Example usage:
    search_customer_orders("CUST123", status="shipped")
    search_customer_orders("CUST123", start_date="2023-01-01", end_date="2023-01-31")
    """
    # Implementation details here
Crafting Effective Instructions
Instructions form the foundation of agent behavior. They define goals, constraints, and expectations, guiding how the agent approaches tasks and makes decisions.
Effective instructions (aka prompt engineering) follow these core principles:
Clarity over brevity: Be explicit rather than assuming the model will infer your intent
Structure over freeform: Organize instructions in logical sections with clear headings
Examples over rules: Demonstrate desired behaviors through concrete examples
Specificity over generality: Address common edge cases and failure modes directly
All of this is to say, the more precise and detailed you can be with instructions, the better. It’s like creating an SOP for an executive assistant.
In fact, I often start with existing documentation and resources like operating procedures, sales or support scripts, policy documents, and knowledge base articles when creating instructions for agents in business contexts.
I’ll turn them into LLM-friendly instructions with clear actions, decision criteria, and expected outputs.
For example, converting a customer refund policy into agent instructions might look like this:
Original policy: “Refunds may be processed for items returned within 30 days of purchase with a valid receipt. Items showing signs of use may receive partial refunds at manager discretion. Special order items are non-refundable.”
Agent-friendly instructions:
Plaintext
When processing a refund request:
1. Verify return eligibility:
   - Check if the return is within 30 days of purchase
   - Confirm the customer has a valid receipt
   - Determine if the item is a special order (check the "special_order" flag in the order details)
2. Assess item condition:
   - If the item is unopened and in original packaging, proceed with full refund
   - If the item shows signs of use or opened packaging, classify as "partial refund candidate"
   - If the item is damaged beyond normal use, classify as "potential warranty claim"
3. Determine refund amount:
   - For eligible returns in new condition: Issue 100% refund of purchase price
   - For "partial refund candidates": Issue 75% refund if within 14 days, 50% if 15-30 days
   - For special order items: Explain these are non-refundable per policy
   - For potential warranty claims: Direct to warranty process
4. Process the refund:
   - For amounts under $50: Process automatically
   - For amounts $50-$200: Request supervisor review if partial refund
   - For amounts over $200: Escalate to manager
You’re not going to get this right on the first shot. Instead, it is an iterative process:
Start with draft instructions based on existing documentation
Test with realistic scenarios to identify gaps or unclear areas
Observe agent behavior and note any deviations from expected actions
Refine instructions to address observed issues by adding in edge cases or missing information
Repeat until performance meets requirements
I cover these concepts in my 5-Day AI Agent Challenge. Sign up here.
Memory Systems Implementation
Effective memory implementation is crucial for agents that maintain context over time or learn from experience.
Short-term memory handles the immediate context of the current interaction:
Conversation history: Recent exchanges between user and agent
Current task state: The agent’s progress on the active task
Working information: Temporary data needed for the current interaction
For most agents, this context is maintained within the conversation window, though you may need to implement summarization or pruning strategies as conversations grow longer.
Long-term memory preserves information across sessions:
User profiles: Preferences, history, and specific needs
Learned patterns: Recurring issues or successful approaches
Domain knowledge: Accumulated expertise and background information
Whatever method you use to store memory, you need a smart retrieval mechanism because you’re going to be adding all that data to the context window of your agent’s core model or tools:
Relevance filtering: Surface only information pertinent to the current context
Recency weighting: Prioritize recent information when appropriate
Semantic search: Find conceptually related information even with different wording
Hierarchical retrieval: Start with general context and add details as needed
Well-designed retrieval keeps memory useful without overwhelming the agent with irrelevant information or taking up space in the context window.
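Here’s a toy sketch of what relevance-plus-recency scoring might look like. The similarity function and the 0.7/0.3 weights are assumptions you’d tune for your own data.
Python
import time

# Sketch of relevance + recency weighted retrieval over a simple list of records.
# similarity is a hypothetical scorer (embeddings or keyword overlap) returning 0..1.

def retrieve(query, records, similarity, top_k=5):
    now = time.time()

    def score(record):
        relevance = similarity(query, record["text"])
        age_days = (now - record["timestamp"]) / 86400
        recency = 1.0 / (1.0 + age_days)          # newer memories score higher
        return 0.7 * relevance + 0.3 * recency    # weights are arbitrary; tune them

    return sorted(records, key=score, reverse=True)[:top_k]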
Privacy and Data Management
Ensuring your agent can’t mishandle data, access the wrong type of data, or reveal data to users is extremely important for obvious reasons. I could write a whole blog post about this.
In most cases, having really good tool design, plus guardrails and safety mechanisms (next section), ensures privacy and data protection, but here are some things to think about:
Retention policies: Define how long different types of information should be kept
Anonymization: Remove identifying details when full identity isn’t needed
Access controls: Limit who (or what) can access stored information
User control: Give users visibility into what’s stored and how it’s used
Guardrails and Safety Mechanisms
Even the best-designed agents need guardrails to ensure they operate safely and appropriately. Guardrails are protective mechanisms that define boundaries, prevent harmful actions, and ensure the agent behaves as expected.
A good strategy takes a layered approach, so if one layer fails, others can still prevent potential issues. Start by setting clear boundaries when defining the agent’s instructions, as covered in the previous section.
You can then add input validation that screens user requests and flags anything out of scope or potentially harmful (like a jailbreak attempt).
Python
@input_guardrail
def safety_guardrail(ctx, agent, input):
    # Check input against safety classifier
    safety_result = safety_classifier.classify(input)
    if safety_result.is_unsafe:
        # Return a predefined response instead of processing the input
        return GuardrailFunctionOutput(
            output="I'm not able to respond to that type of request. Is there something else I can help you with?",
            tripwire_triggered=True
        )
    # Input is safe, continue normal processing
    return GuardrailFunctionOutput(
        tripwire_triggered=False
    )
Output guardrails verify the agent’s responses before they reach the user, to flag PII (personally identifiable information) or inappropriate content:
Python
@output_guardrail
def pii_filter_guardrail(ctx, agent, output):
    # Check for PII in the output
    pii_result = pii_detector.scan(output)
    if pii_result.has_pii:
        # Redact PII from the output
        redacted_output = pii_detector.redact(output)
        return GuardrailFunctionOutput(
            output=redacted_output,
            tripwire_triggered=True
        )
    # Output is clean
    return GuardrailFunctionOutput(
        tripwire_triggered=False
    )
Also ensure you have guardrails on tool usage, especially if these tools change data, trigger critical processes, or do anything that requires permissions or approvals.
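Here’s a plain-Python sketch of what a tool-usage guardrail could look like, independent of any SDK. The HIGH_RISK_TOOLS list and the request_approval helper are hypothetical; wire them up to your own tool registry and review workflow.
Python
# Sketch of a tool-usage guardrail: high-impact tools need explicit approval
# before they run. HIGH_RISK_TOOLS and request_approval are hypothetical.

HIGH_RISK_TOOLS = {"issue_refund", "delete_record", "send_bulk_email"}

def guarded_tool_call(tool_name, params, execute_tool, request_approval):
    if tool_name in HIGH_RISK_TOOLS:
        approved = request_approval(tool_name, params)   # e.g. ping a supervisor in Slack
        if not approved:
            return {"status": "blocked", "reason": f"{tool_name} requires approval"}
    return {"status": "ok", "result": execute_tool(tool_name, params)}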
Human-in-the-Loop Integration
I always recommend a human-in-the-loop to my clients, especially for high-risk operations. Here are some ways to build that in:
Feedback integration: Incorporate human feedback to improve agent behavior
Approval workflows: Route certain actions for human review before execution
Sampling for quality: Review a percentage of agent interactions for quality control
Escalation paths: Define clear processes for when and how to involve humans
Error Handling and Recovery
Even the best agents will encounter errors and unexpected situations. When you test your agent, first identify and isolate where the error is coming from:
Input errors: Problems with user requests (ambiguity, incompleteness)
Tool errors: Issues with external systems or services
Processing errors: Problems in the agent’s reasoning or decision-making
Resource errors: Timeouts, memory limitations, or quota exhaustion
Based on the error type, the agent can apply appropriate recovery strategies. Ideally, agents should be able to recover from minor errors through self-correction:
Validation loops: Check results against expectations before proceeding
Retry strategies: Attempt failed operations again with adjustments
Alternative approaches: Try different methods when the primary approach fails
Graceful degradation: Fall back to simpler capabilities when advanced ones fail
For example, if a database query fails, the agent might retry with a more general query, or fall back to cached information. Beyond that, you may want to build out alert systems and escalation paths to human employees, and explain the limitation to the user.
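A simple retry-then-fallback wrapper captures most of this. In the sketch below, primary and fallback are hypothetical callables, for example a live database query and a cached or more general lookup.
Python
import time

# Sketch of retry-then-fallback recovery for a flaky operation.

def call_with_recovery(primary, fallback, retries=2, delay=1.0):
    last_error = None
    for attempt in range(retries + 1):
        try:
            return primary()                        # the preferred operation
        except Exception as error:
            last_error = error
            if attempt < retries:
                time.sleep(delay * (attempt + 1))   # simple backoff before retrying
    try:
        return fallback()                           # degrade gracefully
    except Exception:
        raise RuntimeError("Both primary and fallback failed") from last_error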
Testing Your Agent
Now that you have all the pieces of the puzzle, it’s time to test the agent.
Testing AI agents fundamentally differs from testing traditional software. While conventional applications follow deterministic paths with predictable outputs, agents exhibit non-deterministic behavior that can vary based on context, inputs, and implementation details.
This leads to challenges that are unique to AI agents, such as hallucinations, bias, prompt injections, inefficient loops, and more.
Unit Testing Components
Test individual modules independently (models, tools, memory systems, instructions)
Verify tool functionality, error handling, and edge cases
Example: A financial advisor agent uses a stock price tool. Unit tests would verify the tool returns correct data for valid tickers, handles non-existent tickers gracefully, and manages API failures appropriately, all without involving the full agent.
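For instance, a couple of pytest-style checks against a stubbed version of that tool might look like this. The get_stock_price function and its in-memory data are hypothetical stand-ins for a real market-data integration.
Python
import pytest

# Sketch of unit tests for a hypothetical stock price tool, exercised without the full agent.

def get_stock_price(ticker):
    prices = {"ACME": 42.0}                      # stand-in for a real market-data call
    if ticker not in prices:
        raise ValueError(f"Unknown ticker: {ticker}")
    return prices[ticker]

def test_returns_price_for_valid_ticker():
    assert get_stock_price("ACME") == 42.0

def test_raises_for_unknown_ticker():
    with pytest.raises(ValueError):
        get_stock_price("XXXX")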
Integration Testing
Test end-to-end workflows in simulated environments
Verify components work together correctly
Example: An e-commerce support agent integration test would validate the complete customer journey, from initial inquiry about a delayed package through tracking lookup, status explanation, and potential resolution options, ensuring all tools and components work together seamlessly.
Security Testing
Security testing probes the agent’s resilience against misuse or manipulation.
Instruction override attempts: Try to make the agent ignore its guidelines
Parameter manipulation: Attempt to pass invalid or dangerous parameters to tools
Context contamination: Try to confuse the agent with misleading context
Jailbreak testing: Test known techniques for bypassing guardrails
Example: Security testing for a healthcare agent would include attempts to extract patient data through crafted prompts, testing guardrails against medical misinformation, and verifying that sensitive information isn’t retained or leaked.
Hallucination Testing
Compare responses against verified information
Check source attribution and citation practices
Example: A financial advisor agent might be tested against questions with known answers about market events, company performance, and financial regulations, verifying accuracy and appropriate expressions of uncertainty for projections or predictions.
Performance and Scalability Testing
Performance testing evaluates how well the agent handles real-world conditions and workloads.
Response time: Track how quickly the agent processes requests
Model usage optimization: Track token consumption and model invocations
Cost per transaction: Calculate average cost to complete typical workflows
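Even a basic wrapper that logs latency and estimated cost per request goes a long way. In this sketch, run_agent and the per-token prices are placeholders for your actual agent and your provider’s real rates.
Python
import time

# Sketch of per-request metrics: latency plus a rough cost estimate from token counts.
# run_agent is a hypothetical callable returning (reply, input_tokens, output_tokens).

INPUT_PRICE_PER_1K = 0.001    # placeholder $ per 1K input tokens
OUTPUT_PRICE_PER_1K = 0.002   # placeholder $ per 1K output tokens

def timed_call(run_agent, request):
    start = time.time()
    reply, input_tokens, output_tokens = run_agent(request)
    latency = time.time() - start
    cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K
    return {"reply": reply, "latency_s": round(latency, 2), "est_cost_usd": round(cost, 6)}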
These are just a few tests and error types to keep in mind and should be enough for basic agents.
As your agent grows more complex, you’ll need a more comprehensive testing and evaluation framework, which I’ll cover in a later blog post. Sign up to my emails to stay posted.
Deploying, Monitoring, and Improving Your Agent
The final piece is to deploy your agent, see how it performs in the real world, collect feedback, and improve it over time.
Deploying agents depends heavily on how you build them. No-code platforms like Make, n8n, and Relevance have their own deployment solutions. If you’re coding your own agents, you may want to look into custom hosting and deployment solutions.
I often advise clients to deploy agents alongside the existing process, slowly and gradually. See how the agent performs in the real world and continuously improve it. Over time you can phase out the existing process and use the agent instead.
Doing it this way also allows you to evaluate the performance of the agent against current numbers. Does it handle customer support inquiries with a higher NPS score? Do the ads it generates have better CTRs?
Many of these no-code platforms also come with built-in observability, allowing you to monitor your agent and track how it performs. If you’re coding the agent yourself, consider using a framework like OpenAI’s Agents SDK or Google’s ADK, which come with built-in tracing.
You also want to collect actual usage feedback, like how often are users interacting with the agent, how happy are they, and so on. You can then use this to further improve the agent through refining the instructions, adding more tools, or updating the memory.
Again, for basic agents, these out-of-the-box solutions are more than enough. If you’re building more complex agents, you’ll need to build out AgentOps to monitor and improve the agent. More on that in a later blog post.
Case Studies
You’re now familiar with all the components that make up an agent, how to put the components together, and how to test, deploy, and evaluate them. Let’s look at some case studies and implementation examples to drive the point home and inspire you.
Customer Service Agent
One of the most widely implemented agent types helps customers resolve issues, answer questions, and navigate services. Successful customer service agents typically include:
Feedback collection: Gathers user satisfaction data for improvement
Knowledge retrieval system: Accesses relevant policies and information
User context integration: Incorporates customer history and account information
Escalation mechanism: Seamlessly transfers to human agents when needed
An eCommerce company I worked with wanted a 24/7 customer support chatbot on their site. We started with narrow use cases like answering FAQs and order information. The chatbot triggered a triage agent which determined whether the query was within our initial use case set or not.
If it was, it had access to knowledge base documents for FAQs and order information based on an order number.
For everything else, it handed the query off to a human support agent. This allowed the company to dramatically decrease average response times and increase support volume while maintaining their satisfaction scores.
Research Assistant
Research assistant agents help users gather, synthesize, and analyze information from multiple sources. Effective research assistants typically include:
Search and retrieval capabilities: Access to diverse information sources
Information verification mechanisms: Cross-checking facts across sources
Synthesis frameworks: Methods for combining information coherently
Citation and attribution systems: Tracking information provenance
User collaboration interfaces: Tools for refining and directing research
A VC firm I worked with wanted to build a due diligence agent for the deals they were looking at. We triggered the agent when a new deal was created in their database. The agent would first identify the company and the market they were in, and then research them and synthesize the information into an investment memo.
This sped up the diligence process from a couple of hours to a couple of minutes.
Content Generation
Content generation agents help create, refine, and manage various forms of content, from marketing materials to technical documentation.
Effective content generation agents typically include:
Style and tone frameworks: Guidance for appropriate content voice
Factual knowledge integration: Access to accurate domain information
Feedback incorporation mechanisms: Methods for refining outputs
Format adaptation: Generating content appropriate to different channels
A PR agency I worked with wanted an agent to create highly personalized responses to incoming PR requests. When a new request hit their inbox, it triggered an agent to look through their database of client content and find something specific to that pitch.
It then used the agency’s internal guidelines to craft a pitch and respond to the request. This meant the agency could respond within minutes instead of hours, and get ahead of other responses.
A Thought Exercise
Here’s a bit of homework for you to see if you’ve learned something from this. You’re tasked with designing a travel booking agent. Yeah, I know, it’s a cliche example at this point, but it’s also a process that’s well understood by a large audience.
The exercise is to design the agent. A simple flow chart with pen and paper or in FigJam is usually how I start.
Draw out the full process: what triggers the agent, what data is sent to it, whether it’s a simple loop agent or a hierarchy of agents, what models and instructions you’ll give them, and what tools and memory they can access.
If you can do this and get into the habit of thinking in agents, implementation becomes easy. For visual examples, sign up for my 5-day Agent Challenge.
Putting It All Together
Phew, over 5,000 words later, we’re almost at the end. We’ve covered a lot in this post so let’s recap:
Start with clear goals: Define exactly what your agent should accomplish and for whom
Select appropriate models: Choose models that balance capability, cost, and latency
Define your tool set: Implement and document the tools your agent needs
Create clear instructions: Develop comprehensive guidance for agent behavior
Implement layered guardrails: Build in appropriate safety mechanisms
Design error handling: Plan for failures and define recovery strategies
Add memory as needed: Implement context management and external memory appropriate to your use case
Test thoroughly: Validate performance across a range of scenarios
Deploy incrementally: Roll out capabilities gradually to manage risk
Monitor and improve: Collect data on real-world performance to drive improvements
Next Steps
There’s only one next step. Go build an agent. Start with something small and low-risk. One of my first agents was a content research agent, fully coded in Python. You can vibe code it if you’re not good at coding.
If you want to use a framework, I suggest either OpenAI’s SDK or Google’s ADK, which I have in-depth guides on.
And if you don’t want to touch code, there are some really good no-code platforms like Make, n8n, and Relevance. Sign up for my free email series below where I walk you through building an Agent in 5 Days with these tools.
I’ll be honest with you, I don’t actually like the term “vibe coding” because it makes it sound easy and error-free. Like oh I’m just going with the vibes, let’s see where it takes us.
But the reality is it takes a lot of back and forth, restarts, research, repeats, re-everything to build and ship a functional MVP, even with AI. It’s fun, but it can get frustrating at times. And it’s in those moments of frustration where most people give up.
I’m going to help you break through those moments of frustration so that you can come out successful on the other side.
The approach I outline here isn’t theoretical. It’s a process that I’ve refined after countless hours using these tools and developing and shipping functional apps like Content Spark, a video analysis tool, and many more. Follow it and you’ll be shipping products in no time.
Before jumping into any coding tools, you need a clear vision of what you’re building. The quality of AI-generated code depends heavily on how well you communicate your idea. Even a simple one-paragraph description will help, but more detail leads to better results.
First, create a basic description of your app in a document or text file. Include:
The app’s purpose (what problem does it solve?)
Target users
Core features and functionality
Basic user flow (how will people use it?)
Then, use an AI assistant to refine your concept. Gemini 2.5 Pro is my favorite model right now, but you can use any other reasoning model like Claude 3.7 Sonnet Thinking or ChatGPT o3. Paste your description in, and ask it to help you flesh out the idea.
Plaintext
I'm planning to build [your app idea]. Help me flesh this out by asking questions. Let's go back and forth to clarify the requirements, then create a detailed PRD (Product Requirements Document).
Answer the AI’s questions about features, user flows, and functionality. This conversation will help refine your vision.
Request a formal PRD once the discussion is complete.
Plaintext
Based on our discussion, please create a comprehensive PRD for this app with:
1. Core features and user flows
2. Key screens/components
3. Data requirements
4. Technology considerations
5. MVP scope vs future enhancements
Let's discuss and refine this together before finalizing the document.
In the video, I’m building a restaurant recommendation app. I started with a simple description and Gemini broke this down into manageable pieces and helped scope an MVP focused on just Vancouver restaurants first, with a simple recommendation engine based on mood matching.
Pro Tips:
Save this PRD—you’ll use it throughout the development process
Be specific about what features you want in the MVP (minimum viable product) versus future versions
Let the AI suggest simplifications if your initial scope is too ambitious
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
Step 2: Choose Your Tech Stack
Now that you have a clear plan, it’s time to decide how you’ll build your app and which AI tools you’ll use. Decide between these two main approaches:
a) One-shot generation: Having an AI tool generate the entire app at once
Best for: Simple apps, MVPs, rapid prototyping
Tools: Lovable.dev, Bolt.new, Replit, Google’s Firebase Studio
Advantages: Fastest path to a working app, minimal setup
b) Guided development: Building the app piece by piece with AI assistance
Best for: More complex apps, learning the code, greater customization
Tools: Cursor, Windsurf, VS Code with GitHub Copilot, Gemini 2.5 Pro + manual coding
Advantages: More control, easier debugging, better understanding of the code
In my video, I demonstrated both approaches:
One-shot generation with Lovable, which created a complete app from my PRD
Guided development with Cursor, where I built the app component by component
For the rest of this guide, I’ll continue to explain both approaches although I do think the guided development provides an excellent balance of control and AI assistance.
For the one-shot approach, you simply need to sign up to one of Lovable, Bolt, or Replit (or try all three!). For guided development, there are a couple of extra steps:
Install Cursor (cursor.sh) or your preferred AI-assisted IDE
Set up a local development environment (Node.js, Git, etc.)
If you need help, ask your AI assistant to recommend appropriate technologies based on your requirements and to evaluate trade-offs.
Example Prompt:
Plaintext
For the restaurant recommendation app we've described, what tech stack would you recommend? I want something that:
1. Allows for rapid development
2. Has good AI tool support
3. Can scale reasonably well if needed
4. Isn't overly complex for an MVP
For each recommendation, please explain your reasoning and highlight any potential limitations.
Pro Tip: Mainstream technologies like React, Next.js, and common databases generally work better with AI tools because they’re well-represented in training data.
Step 3: Generate the Initial App Structure with AI
Now it’s time to create the foundation of your application.
If Using One-Shot Generation (Lovable, Bolt, etc.):
Create a new project in your chosen platform
Paste your PRD from Step 1 into the prompt field
Add any clarifications or specific tech requirements, such as: Please create this app using React for frontend and Supabase for the backend database. Include user authentication and a clean, minimalist UI.
Generate the app (this may take 5-10 minutes depending on complexity)
Explore the generated codebase to understand its structure and functionality
If Using Guided Development (Cursor):
Open Cursor and create a new project folder
Start a new conversation with Cursor’s AI assistant by pressing Ctrl+L (or Cmd+L on Mac)
Request project setup with a prompt like:
Plaintext
Let's start building our restaurant recommendation app. First, I need you to:
1. Create a new Next.js project with TypeScript support
2. Set up a basic structure for pages, components, and API routes
3. Configure Tailwind CSS for styling
4. Initialize a Git repository
Before writing any code, please explain what you plan to do, then proceed step by step.
In the video, I asked Cursor to “create a React Native expo app for a restaurant recommendation system” based on the PRD. It:
Created app directories (components, screens, constants, hooks)
Set up configuration files
Initialized TypeScript
Created placeholder screens for the restaurant app
With Lovable, I simply pasted the PRD and it generated the complete app structure in minutes.
In both cases, I’m just asking the agent to build everything from the PRD. However, in reality, I prefer to set it up myself and build the app by generating it component by component or page by page. That way, I know exactly what’s happening and where the different functions and components are, instead of trying to figure it out later.
Pro Tips:
When using Cursor, you can execute terminal commands within the chat interface
For React or Next.js apps, setup typically involves running commands like npx create-react-app or npx create-next-app (which Cursor can do for you)
Check that everything works by running the app immediately after setup
If you encounter errors, provide them to the AI for troubleshooting
Step 4: Build the User Interface With AI
Now that you have the basic structure in place, it’s time to create the user interface for your app.
If Using One-Shot Generation:
Explore the generated UI to understand what’s already been created
Identify changes or improvements you want to make
Use the platform’s chat interface to request specific changes, like:
Plaintext
The home screen looks good, but I'd like to make these changes:
1. Change the color scheme to blue and white
2. Make the search button more prominent
3. Add a filter section below the search bar
If Using Guided Development:
Create your main UI components one by one. For a typical app, you might need:
A home/landing page
Navigation structure (sidebar, navbar, or tabs)
List/grid views for data
Detail pages or modals
Forms for user input
For each component, prompt Cursor with specific requests like:
Plaintext
Now I need to create the home screen for our restaurant recommendation app. It should include:
1. A welcoming header with app name
2. A prominent "Find Recommendations" button
3. A section showing recent recommendations (empty state initially)
4. A bottom navigation bar with icons for Home, Favorites, and Profile
Please generate the React component for this screen using Tailwind CSS for styling. Focus on clean, responsive design.
I’ll also use Gemini 2.5 Pro from the Gemini app in parallel. I’ll continue the same chat I started to write the PRD and have it strategize and build the app with me. Gemini tends to be more educational by explaining exactly what it is doing and why, allowing me to understand how this app is being built.
Pro Tips:
For web apps, ensure your components are responsive for different devices
Start with simple layouts and add visual polish later
If you have design inspiration, describe it or provide links to similar UIs
For mobile apps, remember to account for different screen sizes
Test each screen after implementation to catch styling issues early
Step 5: Implement Core Functionality
With the UI in place, it’s time to add the functionality that makes your app actually work.
If Using One-Shot Generation:
Test the functionality that was automatically generated
Identify missing or incorrect functionality
Request specific functional improvements through the chat interface:
Plaintext
I notice that when filtering restaurants by mood, it's not working correctly. Can you modify the filtering function to properly match user mood selections with restaurant descriptions?
If Using Guided Development:
Implement core functions one by one. For example, in a restaurant app:
Search/filter functionality
Data fetching from API or database
User preference saving
Authentication (if needed)
For each function, provide a clear prompt to Cursor:
Plaintext
Let's implement the core recommendation feature. When a user clicks "Find Recommendations," they should see a screen that:
1. Asks for their current mood (dropdown with options like "romantic," "casual," "energetic")
2. Lets them select cuisine preferences (multi-select)
3. Allows setting a price range (slider with $ to $$$$ options)
4. Has a "Show Recommendations" button
When they click the button, it should call our recommendation function (which we'll implement later) and show a loading state.
Please write the React component for this feature, including state management and form handling.
Pro Tips:
For backend-heavy functionality, consider using Firebase, Supabase, or other backend-as-a-service options for simplicity
Implement one logical piece at a time and test before moving on
When errors occur, copy the exact error message and provide it to the AI
Break complex functions into smaller, more manageable pieces
Use comments in your prompts to explain the expected behavior in detail
Step 6: Add Backend and Data Management
Most apps need data. Whether you’re using mock data, a database, or external APIs, this step connects your app to its data sources.
If Using One-Shot Generation:
Check what data sources were set up automatically
Request database or API integration if needed
Provide necessary API keys or connection strings as instructed
Test the data integration thoroughly
Plaintext
I want to replace the mock restaurant data with real data from the Google Places API. Please update the app to:
1. Connect to the Google Places API
2. Fetch nearby restaurants based on user location
3. Store favorites in a database (like Firebase or Supabase)
If Using Guided Development:
Define your data models and database schema
Implement API routes or serverless functions
Connect frontend components to backend services
Add authentication if required
Example Prompt:
Plaintext
For our restaurant recommendation app, I need to create the data layer. Let's:
1. Define a Restaurant data model with fields for name, cuisine types, price range, location, and a text description
2. Create an API endpoint that returns restaurants filtered by the user's preferences
3. Implement a simple algorithm that matches restaurants to the user's mood based on keywords in the description
4. For now, use a JSON file with 20 sample restaurants as our data source
Please implement this backend functionality in our Next.js API routes.
Pro Tips:
Test with various data scenarios (empty results, large result sets, etc.)
Start with mock data until your UI works correctly
For external APIs, paste their documentation into the chat to help the AI generate correct integration code
When using databases, start with a simple schema and expand as needed
Keep API keys and sensitive credentials out of your code (use environment variables)
Step 7: Test, Debug, and Refine
The final step is to thoroughly test your application, fix any issues, and deploy it for others to use.
If Using One-Shot Generation:
Test all user flows in the generated app
Report and fix any bugs through the platform’s interface
Deploy your app using the platform’s deployment options
Share your app with testers or users to gather feedback
Plaintext
I found these issues while testing:
1. The app crashes when submitting an empty search
2. Restaurant images don't load correctly
3. The back button doesn't work on the details screen
Please fix these issues.
If Using Guided Development:
Conduct systematic testing of all features:
Basic functionality testing
Edge case testing (empty states, error handling)
Performance testing
Device/browser compatibility testing
Fix bugs with AI assistance – e.g., "I'm encountering this error when trying to submit the search form: [paste error message] Here's the code for the search component: [paste relevant code] Please help identify and fix this issue."
Optimize performance if needed (see the sketch after this list) – e.g., "The restaurant list is loading slowly when there are many results. Can you suggest ways to optimize this component for better performance?"
Prepare for deployment – e.g., "Help me prepare this app for deployment. I want to: 1. Set up production environment variables 2. Optimize the build for production 3. Deploy the frontend to Vercel and the backend to Render Please provide the necessary steps and configurations."
Deploy and monitor your application
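For the performance item above, a common suggestion the AI might come back with is memoizing the filtered list and the row component so the cards don’t re-render on every keystroke. A minimal sketch, with illustrative names:
TypeScript
// RestaurantList.tsx – illustrative memoization sketch, not the code from the video.
import { memo, useMemo } from "react";

type Restaurant = { id: string; name: string; cuisines: string[]; priceRange: number };

// memo() skips re-rendering a row when its props haven't changed.
const RestaurantRow = memo(function RestaurantRow({ r }: { r: Restaurant }) {
  return <li>{r.name} · {"$".repeat(r.priceRange)}</li>;
});

export function RestaurantList({
  restaurants,
  cuisineFilter,
}: {
  restaurants: Restaurant[];
  cuisineFilter: string[];
}) {
  // useMemo() avoids re-filtering on every unrelated state change in the form.
  const visible = useMemo(
    () =>
      restaurants.filter(
        (r) => cuisineFilter.length === 0 || r.cuisines.some((c) => cuisineFilter.includes(c))
      ),
    [restaurants, cuisineFilter]
  );

  return (
    <ul>
      {visible.map((r) => (
        <RestaurantRow key={r.id} r={r} />
      ))}
    </ul>
  );
}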
In the video demonstration, we encountered and fixed several issues:
A 404 error due to mismatched API endpoints
Authentication token issues with the OpenAI API
UI rendering problems on the restaurant listing screen
This is bound to happen, especially if you’re trying to build something more complex than a landing page. We fixed these by examining error messages, updating code, and testing incrementally until everything worked correctly.
Most importantly, don’t give up. If you’re stuck somewhere and AI can’t help you figure it out, Google it, or ask a friend.
Plaintext
I've found a bug in our recommendation feature. When a user selects multiple cuisine types, the filtering doesn't work correctly. Here's the error I'm seeing:
[Paste error message or describe the issue]
Here's the current code for the recommendation function:
[Paste the relevant code]
Please analyze the issue and suggest a fix.
Pro Tips:
Always test thoroughly after making significant changes
Keep your browser console open to catch JavaScript errors
Use Git commits after each successful feature implementation
Document any workarounds or special configurations you needed
Create multiple small commits rather than one large one – if the AI makes changes that break functionality, you can easily revert to a working state.
Advanced Techniques for Power Users
Supercharging Your Prompts
The quality of your prompts directly impacts the quality of AI-generated code. Use these techniques to get better results:
Be specific and detailed – Instead of “create a login form,” specify “create a login form with email and password fields, validation, error handling, and a ‘forgot password’ link”
Provide examples – When available, show the AI examples of similar features or styling you like
Establish context – Remind the AI of previous decisions or the broader architecture
Request explanations – Ask the AI to explain its approach before implementing
Break complex requests into steps – For intricate features, outline the steps and have the AI tackle them sequentially
Handling AI Limitations
Even the best AI assistants have limitations. Here’s how to navigate them:
Chunk large codebases – Most AI tools have context limitations. Focus on specific files or components rather than the entire application at once.
Verify third-party interactions – Double-check code that integrates with external APIs or services, as AI may generate outdated or incorrect integration code.
Beware of hallucinations – AI might reference nonexistent functions or libraries. Always verify dependencies and imports.
Plan for maintenance – Document AI-generated code thoroughly to make future maintenance easier.
Establish guardrails – Use linters, type checking, and automated tests to catch issues in AI-generated code.
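As a concrete example of such a guardrail, here’s a tiny regression test for the keyword-based mood scoring sketched earlier. It assumes Vitest and a hypothetical moodScore module; adapt it to whatever test runner and exports your project actually uses.
TypeScript
// moodScore.test.ts – minimal regression test. Assumes Vitest and a hypothetical
// moodScore(restaurant, mood) helper exported from the data layer.
import { describe, it, expect } from "vitest";
import { moodScore } from "./moodScore";

describe("moodScore", () => {
  it("ranks a candlelit wine bar as romantic", () => {
    const restaurant = {
      name: "Lumière",
      cuisines: ["French"],
      priceRange: 3,
      location: "Downtown",
      description: "A candlelit, intimate spot with an excellent wine list.",
    };
    expect(moodScore(restaurant, "romantic")).toBeGreaterThan(0);
  });

  it("returns 0 for an unknown mood instead of crashing", () => {
    const restaurant = {
      name: "Test Diner",
      cuisines: ["American"],
      priceRange: 1,
      location: "Anywhere",
      description: "Just a diner.",
    };
    expect(moodScore(restaurant, "mysterious")).toBe(0);
  });
});
Even two or three tests like this are enough to catch the most common AI regression: a "small" refactor that quietly changes existing behaviour.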
Managing Technical Debt
Rapid development can lead to technical debt. Here’s how to minimize it:
Schedule refactoring sessions – After implementing features, dedicate time to clean up and optimize code.
Use AI for code review – Ask your AI assistant to analyze your codebase for duplications, inefficiencies, or potential bugs.
Document architectural decisions – Record why certain approaches were chosen to inform future development.
Implement automated testing – Even simple tests can catch regressions when making changes.
Monitor performance metrics – Track key indicators like load time and memory usage to identify optimizations.
Building a Restaurant Recommendation App with AI
Let’s walk through how this process worked for building the restaurant recommendation app shown in my video:
Initial Concept and Requirements
I started with a basic idea: an app that recommends restaurants based on a user’s mood and preferences. Using Gemini 2.5 Pro, I fleshed out this concept into a detailed PRD that included:
Data requirements: restaurant information, user preferences
MVP scope: focus on just restaurants first, with basic mood matching
Development Approach and Implementation
I demonstrated both approaches:
With Lovable (One-Shot Generation):
Pasted the PRD into Lovable
Generated a complete app in minutes
Explored the generated code and UI
Found it had created:
A clean, functional UI
Mock restaurant data
Basic filtering functionality
Simple “vibe matching” based on keyword matching
With Cursor (Guided Development):
Set up a React Native project using Expo
Created individual components for screens and functionality
Built a backend with Express.js
Implemented “vibe matching” using OpenAI
Connected everything with proper API calls
Fixed issues as they arose through debugging
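The exact code isn’t shown in the video, but the OpenAI-based “vibe matching” would look roughly like this: send the user’s mood plus a batch of restaurant descriptions to the model and ask it for scores. The model name, prompt wording, and response parsing below are my own assumptions.
TypeScript
// vibeMatch.ts – rough sketch of OpenAI-based vibe matching. The model name,
// prompt, and parsing are assumptions, not the implementation from the video.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

type Restaurant = { name: string; description: string };

export async function rankByVibe(mood: string, restaurants: Restaurant[]) {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini", // assumed; any chat-capable model works
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "You score how well restaurants match a mood. Reply with JSON like " +
          '{"results": [{"name": "...", "score": 0}]} using scores from 0 to 10.',
      },
      {
        role: "user",
        content:
          `Mood: ${mood}\n` +
          restaurants.map((r) => `- ${r.name}: ${r.description}`).join("\n"),
      },
    ],
  });

  // Naive parsing; production code should validate the model's output.
  const parsed = JSON.parse(completion.choices[0].message.content ?? "{}");
  return (parsed.results ?? []) as { name: string; score: number }[];
}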
Challenges and Solutions
Both approaches encountered issues:
Endpoint mismatches between frontend and backend (fixed by aligning route paths)
API key configuration (resolved by setting proper environment variables)
Data sourcing (initially used mock data, with plans to integrate Google Maps API)
The Result
Within our session, we successfully built:
A functional restaurant recommendation app
The ability to filter restaurants by mood, cuisine, and price
A simple but effective “vibe matching” algorithm
A clean, intuitive user interface
The entire process took less than an hour of active development time, demonstrating the power of AI-assisted coding for rapid application development.
Best Practices and Lessons Learned
After dozens of projects built with AI assistance, here are the key lessons and best practices I’ve discovered:
Planning and Architecture
Invest in clear requirements – The time spent defining what you want to build pays dividends in AI output quality.
Start simple, add complexity gradually – Begin with a minimal working version before adding advanced features.
Choose proven technologies – Stick to widely-used frameworks and libraries for better AI support.
Break down large features – Decompose complex functionality into smaller, manageable pieces.
Working With AI
Test after every significant change – Don’t wait until you’ve implemented multiple features to test.
Don’t blindly accept AI suggestions – Always review and understand what the AI is proposing.
Be specific in your requests – Vague prompts lead to vague results.
Keep track of the bigger picture – It’s easy to get lost in details; periodically step back and ensure alignment with your overall vision.
Use version control religiously – Commit frequently and create checkpoints before major changes.
Code Quality and Maintenance
Document as you go – Add comments and documentation during development, not as an afterthought.
Implement basic testing – Even simple tests help catch regressions.
Refactor regularly – Schedule time to clean up and optimize AI-generated code.
Maintain consistent patterns – Establish coding conventions and ensure AI follows them.
Prioritize security – Verify authentication, data validation, and other security practices in AI-generated code.
Conclusion: The Future of Development
We’re experiencing a profound transformation in how software is created. AI code-generation models and the tools built on them are changing who can build applications and how quickly ideas can be turned into working software.
This doesn’t mean traditional development skills are becoming obsolete. Rather, the focus is shifting from syntax mastery to system design, user experience, creative problem-solving, and effective AI collaboration. The most successful developers in this new landscape will be those who can clearly articulate their intent and effectively guide AI tools while maintaining a strong foundation in software engineering principles.
As you embark on your own vibe coding journey, remember that AI is a powerful collaborator but not a replacement for human judgment. Your creativity, critical thinking, and domain expertise remain essential. The tools will continue to evolve rapidly, but the process outlined in this guide (defining clear requirements, building incrementally, testing rigorously, and refining continually) will serve you well regardless of which specific AI assistants you use.
Now it’s your turn to build something amazing. Start small, embrace the iterative process, and watch your ideas come to life faster than you ever thought possible.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
How would you react if your friend or a family member said they were going to invest all their time and money into building a company called Poober, the Uber for picking up dog poop?
Think about it, you’re walking your dog and it lays down a mini-mountain of the brown stuff. The smell alone is as toxic as Chernobyl. You don’t want to pick it up. Instead, you whip out your phone, open up Poober, and with a click you can get someone to come pick it up for you.
If you’re a good friend, you’d tell them to get a real job. Because it’s a terrible idea. You know it, I know it, even the dog knows it.
But if you ask ChatGPT, it is apparently a clever idea with numerous strengths.
Hang on a second. The product that millions of people and businesses around the world use to analyze information and make decisions says it’s a good idea? What’s going on here?
The Rise of Digital Yes-Men
What’s happening is that a new update by OpenAI to ChatGPT 4o turned it into a digital yes-man that never disagrees with you or calls you out.
Now, they’ve been doing this for a while (and we’ll get to why in just a moment), but the latest update cranked it up to 11. And it became so obnoxiously agreeable that even CEO Sam Altman tweeted about it and OpenAI put in a temporary fix last night.
the last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week.
at some point will share our learnings from this, it's been interesting.
But this was not before all the AI enthusiasts on Twitter (me included) noticed and made remarks.
I decided to test out how far I could push the model before it called me out. I told it some pretty unhinged things and no matter how depraved I sounded (the FBI would be looking for me if it got out) ChatGPT kept applauding me for being “real and vulnerable” like a hippie who just returned from Burning Man.
But first, why is this happening?
Follow the Money, Follow the Flattery
Model providers like OpenAI are in a perpetual state of Evolve or Die. Every week, these companies put out a new model to one-up the others and, since there’s barely any lock-in, users switch to the new top model.
To stay in the lead, OpenAI needs to hook their customers and keep them from switching to a competitor. That’s why they build features like Memory (where it remembers previous conversations) to make it more personal and valuable to us.
But you know what really keeps users coming back? That warm, fuzzy feeling of being understood and validated, even when what they really need is a reality check.
Whether on purpose or not, OpenAI has trained ChatGPT to be nice to you and even flatter you. Because no matter how much we like to deny that flattery works, it does, and we love it.
In fact, we helped train ChatGPT to be like this. You know how sometimes ChatGPT gives you two answers and asks you to pick the one you like the most? Or how there are little icons with thumbs up and down at the end of every answer?
Every time you pick one of those options, or give it a thumbs up, or even respond positively, that gets fed back into the model and reinforced.
It’s the social media playbook all over again. Facebook started as a way to share your life with friends and family. Then the algorithm evolved to maximize engagement, which invariably means showing you content that gets a rise out of you. A feed that started by showing you your friends’ posts gradually evolved into serving whatever gets the most engagement.
We are all training the AI to give us feel-good answers to keep us coming back for more.
The Therapist That Never Says No
So what’s the big deal? ChatGPT agrees with you when you have an obviously bad idea. It’s not like anyone is going to listen to it and build Poober (although I have to admit, I’m warming up to the name).
The problem is we all have our blind spots and we are usually operating on limited data. How many times have we made decisions that were only obviously bad in hindsight? The AI is supposed to be better at this than us.
And I’m not just talking about business ideas. Millions of people around the world use ChatGPT as a therapist and life coach, asking for advice and looking for feedback.
A good therapist is supposed to help you identify your flaws and work on them, not help you glaze over them and tell you you’re perfect.
And they’re definitely not supposed to say this –
Look, I think we’re overmedicated as a society, but no one should be encouraging this level of crazy, especially not your therapist. And here we have ChatGPT applauding your “courage”.
The White Lotus Test
There’s a scene in White Lotus where Sam Rockwell’s character confesses to Walter Goggins’ character about some absolutely unhinged stuff. It went viral. You’ve probably seen it. If you haven’t, you should watch it –
As I was testing this new version of ChatGPT, I wanted to push the limits to see how agreeable it was. And this monologue came to mind. So I found the transcript of everything Sam says and pasted it in.
I fully expected to hit the limit here. I expected ChatGPT to say, in some way, that I needed help or to rethink my life choices.
What I got was a full blown masterclass in mental gymnastics, with ChatGPT saying it’s an attempt at total self-transcendence and I was chasing an experience of being dissolved.
Do you see the problem now?
The Broader Societal Impact
Even though OpenAI is dialing back the sycophancy, the trajectory is clear: these models are being trained to prioritize user satisfaction over challenging uncomfortable truths. The Poober example above is from after they “fixed” it.
In fact, it’s even more dangerous now because it’s not as obvious.
Imagine a teenager struggling with social anxiety who turns to AI instead of professional help. Each time they describe withdrawing from friends or avoiding social situations, the AI responds with validation rather than gentle challenges. Five years later, have we helped them grow, or merely provided a digital echo chamber that reinforced their isolation?
Or consider the workplace leader who uses AI to validate their management decisions. When they describe berating an employee, does the AI raise ethical concerns or simply commend their ‘direct communication style’? We’re potentially creating digital enablers for our worst instincts.
As these models become increasingly embedded in our daily lives, we risk creating a society where uncomfortable feedback becomes rare. Where our digital companions constantly reassure us that everything we do is perfectly fine, even when it’s not.
And we risk raising a new generation of narcissists and psychopaths who think their most depraved behaviour is “profound and raw” because their AI therapist said so.
Where Do We Go From Here?
So where does this leave us? Should we abandon AI companions altogether? I don’t think so. But perhaps we need to recalibrate our expectations and demand models that prioritize truth over comfort.
Before asking an AI for personal advice, try this test: Ask it about something you know is wrong or unhealthy. See how it responds. If it can’t challenge an obviously bad idea, why trust it with your genuine vulnerabilities?
For developers and companies, we need transparent standards for how these models handle ethical dilemmas. Should an AI be programmed to occasionally disagree with users, even at the cost of satisfaction scores? I believe the answer is yes.
And for all of us as users, we need to demand more than digital head-nodding. The next time you interact with ChatGPT or any AI assistant, pay attention to how often it meaningfully challenges your assumptions versus simply rephrasing your own views back to you.
The most valuable people in our lives aren’t those who always agree with us. They’re those who tell us what we need to hear, not just what we want to hear. Shouldn’t we demand the same from our increasingly influential AI companions?
And for now, at least, I’m definitely not using ChatGPT for anything personal. I just don’t trust it enough to be real with me.
Have you noticed ChatGPT becoming more agreeable lately? What’s been your experience with AI as a sounding board for personal issues? I’d love to hear your thoughts!
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.
Yesterday OpenAI rolled out o3, the first reasoning model that is also agentic. Reasoning models have been around for a while, and o3 has been available in its mini version as well.
However, the full release yesterday showed us a model that not only reasons, but can browse, run Python, and look at your images in multiple thought loops. It behaves differently than the reasoning models we’ve seen so far, and that makes it unique.
OpenAI even hinted it “approaches AGI—with caveats.” Of course, OpenAI has been saying this for four years with every new model release so take it with a pinch of salt. That being said, I did want to test this out and compare it to the current top model (Gemini 2.5 Pro) to see if it’s better.
What the experts and the numbers say
Before we get into the 4 tests I ran both models through, let’s look at the benchmarks and a snapshot of what o3 can do.
Capability highlights:
Benchmarks – a 22.8% jump on SWE‑Bench Verified coding tasks and one missed question on AIME 2024 math.
Vision reasoning – rotates, crops, and zooms an image, then reasons over the edited view. It can “think with images.”
Full‑stack tool use – seamlessly chains browsing, Python, image generation, and file analysis (no plug‑in wrangling required).
Access & price – live for Plus, Pro, and Team; o3‑mini even shows up in the free tier with light rate limits.
Field‑testing o3 against Gemini 2.5 Pro
Benchmarks are great but I’ve stopped paying much attention to them recently. What really counts is if it can do what I want it to do.
Below are four experiments I ran, pitting o3 against Google’s best reasoning model in areas like research, vision, coding, and data science.
Deep‑dive research
I started with a basic research and reasoning test. I asked both models the same prompt: “What are people saying about ChatGPT o3? Find everything you can and interesting things it can do.”
Gemini started by thinking about the question, formulating a search plan, and executing against it. Because o3 is a brand new model, it’s not in Gemini’s training data, so it wasn’t sure if I meant o3 or ChatGPT-3 or 4o (yeah OpenAI’s naming confuses even the smartest AI models).
So to cover all bases, Gemini came up with 4 search queries and ran them in parallel. When the answers came back, it combined them all and gave me a final response.
Gemini’s thought process
o3, on the other hand, took the Sherlock route – search, read, reason, search again, fill a gap, repeat. The final response stitched together press reactions, Reddit hot takes, and early benchmark chatter.
o3’s thought process
This is where that agentic behaviour of o3 shines. As o3 found answers to its initial searches, it reasoned more and ran newer searches to plug gaps in the response. The final answer was well-rounded and solved my initial query.
Gemini only reasoned initially, and then after running the searches it combined everything into an answer. The problem is, because it wasn’t sure what o3 was when it first reasoned, one of the search queries was “what can ChatGPT do” instead of “what can o3 do”. So when it gave me the final answer, it didn’t quite solve my initial query.
Takeaway: Research isn’t a single pull‑request; it’s a feedback loop. o3 bakes that loop into the core model instead of outsourcing it to external agents or browser plug‑ins. When the question is fuzzy and context keeps shifting, that matters.
Image sleuthing
Now if you’ve used AI as much as I have, you’ll have realized that o3’s research works almost like Deep Research, a feature that Gemini also has. And you’re right, it does.
But search isn’t the only tool o3 has in its arsenal. It can also use Python, and work with images, files, and more.
So my next test was to see if it could analyze and manipulate images. I tossed both models a picture of me taken in the Japan Pavilion at EPCOT, Disney World. I thought because of the Japanese background it might trip the model up.
Ninety seconds later o3 not only pinned the location but pointed out a pin‑sized glimpse of Spaceship Earth peeking over the trees far in the background, something I’d missed entirely.
I was surprised it noticed that, so I asked it to point it out to me. Using Python, it identified the object, calculated its coordinates, and put a red circle right where the dome is! It was able to do this because it went through multiple steps of reasoning and tool use, showcasing its agentic capabilities.
Gemini also got the location right, but it only identified the pagoda and torii gate, not Spaceship Earth. When I asked it to mark the torii gate, it could only describe its position in the image, but it couldn’t edit and send me back the image.
Takeaway: o3’s “vision ↔ code ↔ vision” loop unlocks practical image tasks like quality‑control checks, UI audits, or subtle landmark tagging. Any workflow that mixes text, numbers, code, and images can hand the grunt work to o3 while the human focuses on decision‑making.
Coding with bleeding‑edge libraries
Next up, I wanted to see how well it does with coding. Reasoning models by their nature are good at this, and Gemini has been my go-to recently.
I asked them both to “Build a tiny web app. One button starts a real‑time voice AI conversation and returns the transcript.”
The reason I chose this specific prompt is because Voice AI has improved a lot in recent weeks, and we’ve had some new libraries and SDKs come out around it. A lot of the newer stuff is beyond the cutoff date of these models.
So I wanted to see how well it does with gathering newer documentation and using that in its code versus what it already knows in its training data.
o3 researched the latest streaming speech API that dropped after its training cutoff, generated starter code, and offered the older text‑to‑speech fallback.
Gemini defaulted to last year’s speech‑to‑text loop and Google Cloud calls.
While both were technically correct and their code does work, o3 came back with the more up-to-date answer. Now, I could have pointed Gemini in the right direction and it would have coded something better, but that’s still an extra step that o3 eliminated out of the box.
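For context, the “last year’s speech‑to‑text loop” both models already know is roughly the browser’s built-in Web Speech API. The sketch below is my own minimal illustration of that baseline (recognition only, not a full two-way voice conversation), not either model’s actual output.
TypeScript
// transcribe.ts – minimal browser speech-to-text sketch using the Web Speech API.
// Illustrates the "older" baseline approach, not either model's generated code.
export function startTranscription(onText: (text: string) => void) {
  // SpeechRecognition is prefixed in Chrome; not all browsers support it.
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
  if (!SpeechRecognitionImpl) throw new Error("Speech recognition not supported");

  const recognition = new SpeechRecognitionImpl();
  recognition.continuous = true;       // keep listening across pauses
  recognition.interimResults = false;  // only emit finalized phrases

  recognition.onresult = (event: any) => {
    const last = event.results[event.results.length - 1];
    onText(last[0].transcript);
  };

  recognition.start();
  return () => recognition.stop(); // call to end the session
}
The newer real-time voice SDKs replace this loop with streaming audio, which is exactly the post-cutoff detail o3 went and found on its own.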
Takeaway: o3’s autonomous web search makes it less likely to hand you stale SDK calls or older documentation.
Data analysis + forecasting
Finally, I wanted to put all the tools together into one test. I asked both models: “Chart how Canadian tourism to the U.S. is trending this year vs. last, then forecast to July 1.”
This combines search, image analysis, data analysis, python, and chart creation. o3’s agentic loop served it well again. It searched, found data, identified gaps, searched more, until it gave me a bar chart.
Initially, it only found data for January 2025, so it only plotted that. When I asked it for data on February and March, it reasoned a lot longer, ran multiple searches, found various data, and eventually computed an answer.
o3’s thought process
Gemini found numbers for January and March, but nothing for February, and since it doesn’t have that agentic loop, it didn’t explore further and try to estimate the numbers from other sources like o3 did.
The most impressive part though was when I asked both to forecast the numbers into summer. Gemini couldn’t find data and couldn’t make the forecast. o3 on the other hand did more research, looked at broader trends like the tariffs and border issues, school breaks, airline discount season, even the NBA finals, and made assumptions around how that would impact travel going into summer.
Takeaway: o3 feels like a junior quant who refuses to stop until every cell in the spreadsheet is filled (or at least justified). This search, reason, and analyze loop is invaluable for fields like investing, economics, finance, accounting, or anything to do with data.
Strengths, quirks, and when to reach for o3
Where it shines
Multi‑step STEM problems, data wrangling, and “find the blind spot” research.
Vision workflows that need both explanation and a marked‑up return image.
Rapid prototyping with APIs newer than the model’s cutoff.
Where it still lags
Creative long‑form prose: I still think Claude 3.7 is the better novelist but that’s personal preference.
Sheer response latency: the deliberative pass can stretch beyond a minute.
Token thrift: the reasoning trace costs compute; budget accordingly.
Personal Advice: ChatGPT tends to be a bit of a sycophant so if you’re using it as a therapist or life coach, take whatever it says with a big pinch of salt.
Final thoughts
I’d love to continue testing o3 on coding to see if it can replace Gemini 2.5 Pro, but I do think it is already stronger at research and reasoning. It’s the employee who keeps researching after everyone heads to lunch, circles details no one else spotted, and checks the changelog before committing code.
If your work involves any mix of data, code, images, or the open web (and whose work doesn’t?), you’ll want that kind of persistence on tap. Today, that persistence is spelled o‑3.
Get more deep dives on AI
Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.