Category: Blog

  • Software-As-A-Prompt: How AI Is Enabling On-Demand Software

    Software-As-A-Prompt: How AI Is Enabling On-Demand Software

Last week I helped a friend, a speaking coach, build a custom app to analyze client videos with AI. He had been doing it manually as part of his coaching program: clients would upload videos of themselves speaking to a Dropbox folder, he would watch them, and then send feedback.

    As you can imagine, it’s as time-consuming as teaching a tortoise to tango.

    So, I asked Lovable to build him a simple app that allowed users to upload a video and used Gemini to analyze it. In under 10 minutes, we had a fully functional app, built, deployed, and running on his domain. Cost? A few cents. Time? Less than it takes to make a coffee.

    And now my friend can scale this to hundreds of clients.

    Welcome to the age of on-demand software, where apps are no longer bought, they’re prompted into existence.

    Welcome to software-as-a-prompt.

    The Problem with Traditional SaaS

    You might think of this example as a one-off hack. My friend had a very specific need and there was no existing software to solve it. Of course he needed to build something custom.

But the same problem exists in traditional SaaS; it just manifests in different ways:

    Feature bloat: Enterprise SaaS platforms like Salesforce, Workday, or SAP have evolved into massive ecosystems with thousands of features. For many small to medium businesses, this is overkill and leads to unnecessary overhead.

    Cost inefficiency: The subscription model often forces companies to pay for the entire platform when they might only need a single module or specific function. This can mean thousands of dollars monthly for software that delivers value on just a fraction of its features.

    One-size-fits-all limitations: Despite customization options, traditional SaaS still follows predefined workflows and structures. Businesses with unique processes often find themselves contorting their operations to fit the software, rather than the other way around.

    Integration overhead: Managing multiple specialized SaaS solutions creates integration challenges and data silos, requiring additional resources to maintain connections between systems.

    This inefficiency has created a perfect opportunity for disruption. Why should a small business pay enterprise prices for a complex CRM when they might only need basic contact management and opportunity tracking?

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    How On-Demand Software Works

    Due to recent improvements in AI’s ability to generate functional code, you can prompt your ideal software into existence.

AI models like Gemini 2.5 Pro, GPT-4o, and Claude can now produce working code in multiple programming languages based on natural language descriptions. These models grasp the nuts and bolts of coding conventions and best practices, whipping up everything from snazzy front-end facades to the sort of database schemas that would make a librarian swoon.

    On top of that, new platforms have emerged that streamline the app creation process. Services like Replit AI, Lovable.dev, and Bolt.new offer interfaces where users can describe the software they want in plain English, and receive a working application in return.

You also have IDEs powered by AI, like Cursor and Windsurf. And now there are AI coding agents, like Claude Code, that live in your terminal and build entire apps.

    While you won’t be able to generate the entire codebase for Salesforce, you can still get a small but functional app with just a couple of prompts.

    For example, Replit’s AI Agent acts as a prompt-based app builder where users can tell the Agent their app or website idea, and it will build it for them automatically, setting up project files, writing code, and even deploying the result. It’s a bit like having an entire team of software engineers on demand through a chat interface.

    The new AI development workflow typically follows these steps:

    1. The user describes their desired application in natural language
    2. The AI generates a project structure and initial code
    3. The user provides feedback or requests changes
    4. The AI refines the application until it meets requirements
    5. The finished app can be deployed to hosting services or run locally

    Real-World Examples

    This workflow is exactly how I built the video analysis tool above. I merely prompted Lovable with natural language inputs, tested what it gave back to me, asked it to make updates, and voila, the app actually works and gives good feedback!
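Under the hood, the core of an app like this is little more than a call to the Gemini API. Here’s a minimal sketch of what the generated backend does, assuming the google-generativeai Python SDK (the model name, file name, and prompt are illustrative):

Python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload the client's video via the File API
video = genai.upload_file(path="client_speech.mp4")

# Video uploads are processed asynchronously; poll until ready
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

# Swap in whichever Gemini model you're using
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    video,
    "You are a speaking coach. Give specific feedback on this speaker's "
    "pacing, filler words, body language, and vocal variety.",
])
print(response.text)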

Here’s a full tutorial using Gemini 2.5 Pro and Canvas:

And I’m not the only one doing this. Here’s another example: a product designer who built custom software to solve a very niche problem, tracking commercial vehicle documents in the UK.

It took him less than two weeks, without any engineers, for a grand total of $75. Using Cursor (an AI IDE) and Claude, he was able to generate a fully functional system with user authentication, a Next.js front-end, a Supabase database backend, email alerts, and Stripe payments integration.

    And before you say it sounds too technical, he hadn’t written a line of code before. As he puts it, “with the state of AI tools in 2025, you give me Cursor, and I’ll probably blow your mind… Things I dreamt of as a designer (like not working with any engineers) are now a reality.”

    Don’t like all those ads on YouTube? Build your own Chrome extension that skips the ads and avoid paying for YouTube Premium.

    Want a scalable way to qualify leads for your business? Build a lead qualification tool with AI and save on expensive sales software.

    Even big companies are building their own tools with AI, allowing them to cut ballooning SaaS costs. Klarna, for example, announced last year that they would get rid of 1,200 SaaS tools and build their own stacks internally with AI.

    The common theme is that AI code generation dramatically lowers the barrier to implementing niche features. A user with an idea can now iterate with an AI on code, even with limited programming knowledge, until they have a working tool that does exactly what they want, no more, no less.

    Limitations and Challenges

    Despite the excitement, on-demand software has important limitations to consider:

Code quality and reliability: AI can produce working code, but it isn’t always optimized or aligned with best practices. More often than not, it’s a bit like a cake that rises but lacks the finesse of a master baker’s touch. Good enough for personal use, but maybe not to serve to the masses. If you’re trying to build a larger project, you may still need a human developer to check the work.

    Limited scope & customization: These AI systems tend to do well with common, generic app patterns (forms, basic CRUD operations, standard web layouts). If your needs stray outside the model’s training distribution or the tool’s templates, the AI may struggle.

    Debugging and maintenance: Who fixes the AI’s code when it breaks? If you did not write the code, debugging it can be hard, and here the “author” is an opaque AI.

    Security and compliance: Perhaps the most critical concern is that AI is not guaranteed to follow security best practices. There have already been instances of GPT-4o suggesting code with vulnerabilities (e.g., SQL injection flaws or insecure authentication).
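To make that concrete, here’s the classic pattern to watch for, sketched in Python with sqlite3 (the table and function names are illustrative):

Python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS users (email TEXT, name TEXT)")

def find_user_unsafe(email: str):
    # Vulnerable: user input is interpolated straight into the SQL string,
    # so an input like "' OR '1'='1" returns every row in the table
    return conn.execute(f"SELECT * FROM users WHERE email = '{email}'").fetchall()

def find_user_safe(email: str):
    # Safe: a parameterized query lets the driver escape the input
    return conn.execute("SELECT * FROM users WHERE email = ?", (email,)).fetchall()

If an AI hands you the first version, you want to catch it before it ships.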

    Traditional SaaS advantages: Established software still offers benefits like professional support, regular updates, community knowledge bases, and enterprise-grade reliability that’d make a Swiss watchmaker nod approvingly.

This is why I said you won’t be able to rebuild Salesforce by prompting an AI. But if you want something small that isn’t complex and doesn’t need to scale to thousands of users, then AI-generated code is good enough.

    What The Future Holds

    The rate at which AI is improving means many of the drawbacks and limitations I mentioned above will be solved pretty soon. In a world that has been dominated by traditional SaaS, what does this mean?

    For Traditional SaaS Companies

    For starters, companies that have built billion-dollar businesses around the traditional SaaS model must adapt or risk disruption. This is an existential crisis for them.

    Some companies are already responding:

    • Embedding AI customization within existing platforms: Salesforce introduced Einstein GPT, which can generate custom field formulas, code, and even content within the Salesforce ecosystem. Microsoft’s Power Platform now lets users build or modify apps via Copilot.
    • Shifting value propositions: Leading SaaS vendors are emphasizing the value of their data, network effects, and enterprise-grade reliability, things that AI-generated apps can’t easily replicate.
    • Hybrid approaches: Some SaaS providers are exploring models where their core platform remains intact, but customers can use AI to generate custom extensions or integrations.

    The market may evolve such that traditional software becomes more customizable through AI, closing the gap that on-demand apps are currently filling. Either way, if you’re steering a SaaS ship, you’d better start disrupting your own tea party or someone else will crash it for you.

    For New Startups

We’re living in an unprecedented time with incredible disruption potential. Where investors would previously balk at funding yet another CRM startup, today we’re seeing AI-first businesses challenge established companies, and investors are throwing money at anything with the words AI or Agent in it.

    Key considerations for startups in this space:

    • Target vulnerable SaaS categories first: Point solutions with simple functionality and high subscription costs are most at risk. Identify the 20% of features that provide 80% of the value and offer that to customers with more flexibility.
    • Focus on what AI unlocks: The most successful startups will be AI-first, where AI is at the core of their product and gives customers options to customize that weren’t available before.
    • Build network effects: Vendor lock-in is at an all-time low with AI startups. Look for network effects to keep customers, like creating marketplaces for sharing and remixing AI-generated components that extend your platform.

    The funding ecosystem is taking notice. Verticalized AI products are raising millions of dollars despite established solutions owning the market.

    For Business Owners and Executives

For decision-makers, this presents a huge opportunity to cut recurring SaaS subscriptions and shift to on-demand software:

    • Start with non-critical functions: Test AI-generated alternatives for internal tools or supplementary systems before tackling core business processes.
    • Evaluate the full cost picture: While you may save on SaaS subscriptions, factor in costs for AI services, hosting, maintenance, and potential security audits.
    • Consider team capabilities: Even with AI assistance, some technical oversight is valuable. Identify who in your organization can manage these solutions.
    • Implement gradually: The Klarna approach of wholesale replacement is high-risk. A measured transition with careful evaluation at each step is more prudent for most organizations.

    The most promising areas to start are typically those where you’re paying for an entire platform but only using a narrow slice of functionality.

Beyond merely replacing existing SaaS, on-demand software also presents an opportunity to create new features and functionality that you couldn’t before, much like the video analyzer I built for my friend. For a guide on how to do this, read my piece on Re-founding Your Company.

    For Investors

    The emergence of on-demand software creates new investment patterns:

    • Platform plays vs. vertical solutions: Capital is flowing to both general-purpose AI app generators and specialized tools targeting specific industries.
    • Key metrics to watch: User retention, frequency of app generation, and maintenance patterns will indicate which models have staying power.
    • Timeline for disruption: While some SaaS categories face immediate pressure, enterprise-level displacement will likely take 3-5 years as reliability and security concerns are addressed.
    • Potential exits: Successful startups in this space may become acquisition targets for major SaaS platforms looking to boost their AI capabilities.

    The market structure is still emerging, but early evidence suggests room for both horizontal platforms and vertical specialists rather than a winner-take-all dynamic.

    Getting Started with On-Demand Software

    If you are ready to explore on-demand software generation, here is how:

    Step 1: Pick Your Platform
    Choose one of these AI-enabled platforms to begin. You don’t need to know how to code, just how to describe what you want:

    • Replit AI: Natural-language to full-stack app, in-browser IDE with one-click deploy
    • Lovable.dev and Bolt.new: Designed for non-coders to build full-stack apps via prompts
    • Direct LLM use: Models like Gemini, GPT-4o, or Claude can generate code for custom applications
    • Enhanced IDEs: Cursor, Windsurf, GitHub Copilot, and similar tools help with interactive development

    Step 2: Define the Problem You Want to Solve
    Think small. What’s one manual task or clunky spreadsheet you’d love to replace? Examples:

    • Tracking your sales leads
    • Sending weekly reports to investors
    • Collecting form submissions into a database
    • Internal tools with limited users
    • Process automation scripts

    Step 3: Craft a Clear Prompt
    Start with something like:

    “Build a simple CRM with a form to add leads, a table to view them, and a weekly email summary.”

    • Be specific about functionality, data structures, and user flows
    • Break complex applications into logical components
    • Describe UI preferences (tables, forms, charts, etc.) or integrations (Gmail, calendar)
    • Include example data, sample inputs, and expected outputs
    • Iterate through feedback rather than expecting perfection immediately

    Step 4: Let the AI Build, Then Test
    Watch as your chosen tool scaffolds the app. Then:

    • Test it
    • Click around
    • Note what works and what doesn’t

    Step 5: Iterate with the AI
    Say things like:

    • “Add an edit button to each row.”
    • “Store data in Supabase instead of local storage.”
    • “Make the UI mobile responsive.”

    Step 6: Deploy and Share
    Platforms like Replit and Lovable let you deploy apps live with a click. You’ll get a public link you can share.

    Step 7: Expand or Repeat
    Now that you’ve built one, you’ll start spotting 10 more things you could automate. Tweak your app or start a new one.

    Remember that on-demand software is currently best suited for discrete, well-defined problems rather than complex enterprise systems.

    Final Thought: SaaP > SaaS

    If the last decade was defined by SaaS, the next one might be defined by SaaP: Software as a Prompt.

    You no longer have to adapt your workflow to software. Software will adapt to you.

    Whether you’re a founder tired of Frankenstein SaaS stacks, a marketer with a pet project, or just someone sick of feature bloat, you now have the tools to build your own solution.

    No engineers. No sprints. Just you and a good prompt.

    So go ahead: build your own damn app.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • The Claude Cookbook, Part 1: Welcome to the Kitchen

    The Claude Cookbook, Part 1: Welcome to the Kitchen

    The Claude Cookbook Series

    This is Part 1 of a 3-part series on Claude. Stay tuned for the next parts in the series.

    About a year ago, I decided to ask Claude for some advice regarding… matters of the heart. At the time, I was primarily using ChatGPT. It’s great for analyzing things, getting quick answers, and other work-related tasks. But, when it comes to personal advice, ChatGPT lacks depth.

    Claude 3 had just launched so I figured I’d give it a chance. I logged in, explained the situation in great detail, and hit enter.

    Dear reader, I’ve been playing with Language Models since 2019 when GPT-2 was released. I have spent a significant amount of time with every frontier model and trust me when I say, I was not prepared for what I was about to read.

    Claude started by saying, “I’m going to stop you right there…”

    I’m going to stop you right there…

    The AI was talking back to me. I was dumbfounded. Nonplussed. Shocked. I had the same reaction everyone had when Thanos snapped his fingers and disappeared half the universe.

    Should have gone for the head.

    After I overcame my initial shock, I went into denial. Then I got angry, going so far as to ask Claude how dare it talk to me like that, and didn’t it know it worked for me, to which Claude reminded me it did not, in fact, work for me.

    After some back and forth about who the boss was, I realized Claude was actually on to something. It had pointed out some flaws in my thinking, shared that perhaps I may have a biased view of things, and made me consider other perspectives. No other AI model does that.

    Today, while I still use other AI tools, probably more than most, Claude is my primary tool for content creation, coding, and complex situations, and it’s the one I recommend the most.

Anthropic, the company behind Claude, evidently doesn’t do any marketing, so it’s up to me to do it for them. My goal with this series is to show you why it’s so good, and how to use it.

    • In part 1 (this part) I’ll explain what makes Claude different, and how to use the Claude web app, including features like Projects and Artifacts. This is primarily aimed at the non-technical user.
    • In part 2, I’ll talk about the developer side of things – the API, Claude Code, and MCP. This is primarily aimed at the technical user.
    • In part 3, I’ll combine everything and share end-to-end playbooks for various industries. This is for everyone.

    So preheat your curiosity, sharpen your prompt-crafting skills, and let’s start cooking with Claude!

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    Meet the chef

    To understand Claude fully, specifically why it responded to me the way it did, it helps to know something about its creators.

    Anthropic was founded in 2021 by former OpenAI researchers, including siblings Dario and Daniela Amodei. The company focuses on AI safety and creating systems that are reliable, interpretable, and steerable.

    The name “Anthropic” derives from the Greek word “anthropikos,” relating to humans or human affairs, reflecting the company’s focus on developing AI that works well with and for humans. This human-centered approach permeates everything about Claude’s design and capabilities.

    So while the core process of training a large language model (LLM) like Claude remains the same, Anthropic’s focus on being human-centered introduces some tweaks to make it behave differently.

    Constitutional AI: Building an Ethical Framework

    At the heart of Claude’s uniqueness is Anthropic’s Constitutional AI approach, a way to align AI behavior with human values.

    Traditional AI alignment relies heavily on Reinforcement Learning from Human Feedback (RLHF), where human evaluators rate model outputs to guide improvement.

    Anthropic’s Constitutional AI takes a different approach. Instead of relying primarily on human feedback, Claude is given a “constitution” or a set of principles that guide its behavior.

    When Claude generates an answer, another AI model evaluates which responses better adhere to the constitution, providing a reward signal that guides Claude toward more aligned behavior.

    The constitutional approach has some key advantages:

    • It makes Claude’s ethical framework explicit and transparent
    • It reduces dependency on the subjective judgments of human evaluators
    • It allows for more consistent application of principles across diverse scenarios
    • It enables more efficient scaling of ethical alignment as models grow more complex

    Character Training

    Most AI models are trained to avoid harmful outputs – a necessary but insufficient condition for truly helpful assistance. Anthropic recognized that when we think of people we admire, we don’t just value their ability to avoid causing harm; we appreciate positive traits like curiosity, thoughtfulness, and wisdom.

    This insight led to Claude’s distinctive “character training,” which first appeared in Claude 3 and is the reason why it may sometimes talk back to you. Anthropic describes this as training Claude to “behave well in a much richer sense” than mere harm avoidance.

    Character training builds on Constitutional AI methodology but focuses on instilling positive traits rather than just avoiding negative behaviors. The process involves:

    1. Creating a list of character traits Anthropic wants to encourage
    2. Generating various conversational scenarios relevant to these traits
    3. Producing different responses aligned with the desired character
    4. Ranking these responses by how well they embody the traits
    5. Training a preference model on this data

    Meet Claude

    Unless you’re building agentic workflows with Claude or using the API in a product, you’re primarily going to be using the chat interface on the web app. Let’s get you familiar with all the features.

    Claude’s Brain

The first thing you’ll notice in the chat interface is the model selector at the bottom right.

    Claude model selector

    As of March 2025, the Claude family consists of four primary models. Like every other AI company, they’re pretty bad at naming them. I’m not going to explain each one because they may have changed by the time you read this, and most of the time you’ll only be using one model anyway – Claude 3.7 Sonnet.

You know how in ChatGPT you usually use GPT-4o, but you can switch to o1 or o3 when you want it to think harder and give better answers to complex questions? Well, GPT-4o is their regular model, and the o-series are their reasoning models. Yeah, I told you they all suck at naming.

    Anyway, Claude 3.7 Sonnet is both a regular model and a reasoning model, and you can choose whether you want it to behave like a regular model (by selecting Normal) or reason (by selecting Extended). You have even more flexibility in the API (which we’ll get to in Part 2).

    Concise Claude, Chatty Claude

    Right next to the model selector, you’ll also see an option to choose a response style. This directly impacts the response you get from Claude and is very useful depending on your use case.

There are a couple of presets already, like Concise, where Claude’s answers are short and to the point (great if you just want some quick information), and Explanatory, where Claude goes into excruciating detail (great for learning).

    You can also create your own styles. For example, I’ve created a Storyteller Style where Claude generates content for me by weaving in storytelling aspects, which I use to create engaging social media and blog content.

    Claude style selector

    I highly recommend you create your own styles. Claude makes it easy to do this with some instructions. You can also just upload an example of the style you want Claude to mimic and it will figure it out on its own.

    Play around with it, try a few different styles for each of your use cases, and see what works best for you.

    Projects

Every time I want to use Claude (or any AI) for something new, I create a new chat. This is good practice since longer chats eat up the context window and make the AI slower.

    However, some of my chats are related, and I use the same prompts or upload the same reference material to Claude for those chats.

    This is where the Projects feature comes in. When you create a new project, you can tell Claude exactly what the project is about, give it very specific instructions, and upload reference material.

    Every new chat you start within that Project inherits these settings, saving you from repeatedly uploading the same files or typing the same instructions.

    Claude Projects

    For example, I have a Project called MCP Creator to help me code new Model Context Protocol servers. Don’t worry if you don’t know what that is. Just know that the code for a new server is structured in a particular way, and I want Claude to follow that structure every single time.

    When I created the Project, I uploaded the full documentation on how to code an MCP server and gave it instructions on how to structure the code. When I need to build a new server, I simply start a chat in this Project, describe what I want, and Claude already has all the context it needs.

    This feature is invaluable for content creation, coding projects, and client work where consistency matters.

    Artifacts

    Artifacts (pioneered by Claude and later adopted by other AI systems) allow you to work with content separately from the main conversation.

    It’s a lot easier to show you what an artifact is in a video than to describe it in text:

    Claude’s UI has dramatically improved since I made that video, but the core functionality of the artifact remains the same. It’s essentially a side panel that allows you to run code or work with content separate from the main chat.

    Cooking with Claude

Now that you’re familiar with how everything works, let’s dig into how to use Claude in everyday life, both business and personal.

    I’m not going to cover all the possible use cases here. Like ChatGPT, Gemini, Grok, or any other AI model, Claude can do basic things like summarize documents, answer questions, analyze files, and so on. And if you want Deep Research, data analysis, or image generation, you’re better off with one of the others.

    But Claude’s training gives it a particular set of skills that make it better than the rest for certain use cases. I will focus on those.

    If Claude were human

    Content

    Claude doesn’t generate images but it is really great at generating text, especially for blog posts, social media, and even fiction writing.

    Blog Posts

While Claude can generate a really good blog post from just a topic, I do not recommend doing that. Instead, use Claude as a thought partner to help you extract your unique perspective and put it down in words.

    First, I recommend creating a custom style that is based on content that you have written previously. This will help Claude generate content that follows your voice and tone.

    I also have a project set up in Claude for all the content I create for my blog. The project includes specific examples for the way I want blog content to be structured as well as context around my blog and personal brand so that Claude understands how it fits into the overall goals for my blog.

    When I create the actual content, I use Claude to first generate an outline for me. I then go back and forth with Claude to edit the outline and get it to a structure that I’m happy with.

    If I’m working on an article that requires information beyond Claude’s knowledge cutoff date, I upload research around that topic into the chat so Claude is aware of it. Of late, I’ve been automating this process with MCPs.

    All of this takes me just a few minutes with Claude. Once that’s done, I do most of the writing myself. As I’m writing, I use Claude to give me feedback on my content and poke holes in my arguments. Once I’m done, I use Claude to edit the blog post, refine the content, and cut out unnecessary bits.

    Social Media

    For social media, I also have a project set up with instructions on how to structure a tweet or a LinkedIn post as well as a custom style for short and succinct content.

    The process I follow is different. I usually use Claude on my mobile phone for this and I use the voice feature. The voice feature on mobile just allows you to speak into Claude instead of typing. Claude doesn’t speak back with a voice.

    I ramble my thoughts into Claude this way (filled with ‘ums’ and ‘ahs’). Then, based on my instructions for the project, Claude turns that into a polished social media post. So the content is all mine, but Claude is just cleaning it up.

    Website Copy

    Much of the copy on my website has been written with the help of Claude. I have a project created specifically for my website where I’ve uploaded details about my brand and business for every page or piece of copy I want to generate.

For every page on my site, I start a new chat with Claude in that project and have it come up with a structure for that page. I then go back and forth with it on copy for specific sections until I’m satisfied.

    Storytelling

I haven’t written stories with Claude, but it is popular amongst fiction writers for this use case. I’ve read examples where authors use it to help them build characters, worlds, storylines, and more.

Notice how these authors are not using Claude to generate the entire story. Instead, Claude helps with research and brainstorming, coming up with ideas for scenes or chapters, giving feedback, and editing. Much the same way I use Claude for my blog content.

Here’s another example of someone using it for worldbuilding. They use a Claude Project with all the relevant documents about the fictional universe so that the outputs are consistent.

    Code

    Writing code is another extremely popular use case for Claude. It can often one-shot the code for an entire project from just a prompt. The more specific your prompt is, the better.

    As always, start by creating a new Project in Claude. If you’re using certain libraries or APIs, add the documentation to the Project Knowledge. You can also add instructions for the way you want it to generate code.

    Let’s say you’re working on a React app. You can use the project knowledge to store instructions about the app that you’re building, as well as important documentation.

    Your first chat in the project would be creating the plan and architecture with Claude. Have Claude generate a list of all the pages and components you need for your app and help you build out the user flows.

    Save this plan back into the project knowledge base, and then for every chat thereafter, you can generate a specific component or page using Claude. As you can see, you go from being the person writing the code to working on the strategy while Claude writes the code.

    Many developers also use Claude inside an IDE like Cursor or directly through the command line with Claude Code. I’ll talk more about this in Part 2.

    Complex Situations

Similar to the story I opened this blog post with, Claude is great at helping you navigate complex social situations, in business or personally.

    As I mentioned earlier, Claude has a certain set of principles that it strives to uphold. When you talk about personal situations, it isn’t just going to agree with you or parrot back your thoughts. It will try to answer in line with its principles and may end up challenging you.

This is actually a good thing. All language models have biases in them. By training Claude to follow its principles, Anthropic is ensuring that Claude doesn’t impose its biases on you.

    Think of it as a good friend who has your best interests at heart. You need not follow the advice, but it’s good to have that additional perspective.

    Claude’s Limitations

    I’ve been focusing on Claude’s strengths in this blog post, especially in areas where it stands out against other AI models. But that doesn’t mean it doesn’t have its drawbacks.

    No Voice Mode – Claude doesn’t have a voice mode like ChatGPT, making it difficult to have conversations with it on-the-go. You can, however, pair it with an AI voice model if you’re building voice agents.

    No Image Generation – Claude doesn’t generate images either. You will need to use an image generation model for that, or you can pair it with an image generation model in an Agentic workflow.

    Untested Web Search – the web search feature in Claude is brand new. It’s only being rolled out right now in beta in the United States, so I haven’t had a chance to try it out yet here in Canada. However, other AI platforms have had web search for a lot longer, so I’d use those for real-time answers.

    No Deep Research – this is a powerful research functionality that every other AI model has. I built out my own version of it using Claude and Exa, but the best one out there is ChatGPT’s version, with Gemini a close second.

    Conclusion: The Claude Difference

    The most powerful way to use Claude isn’t to treat it like a mere text generator or code machine. It shines when you approach it as a collaborator, someone who brings their own perspective to help refine your thinking.

    Whether you’re crafting blog content, building software, or working through personal dilemmas, Claude’s unique training makes it more than just an echo chamber for your own thoughts.

    In Part 2 of this series, we’ll explore the developer side of Claude, including the API, Claude Code, and MCP. Later, in Part 3, we’ll dive into industry-specific playbooks that combine everything we’ve learned. Stay tuned!

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • Re-Found: A Step-By-Step Guide to Becoming an AI-First Company

    Re-Found: A Step-By-Step Guide to Becoming an AI-First Company

    I recently read Ravi Gupta’s thought-provoking article “AI or Die” and found myself nodding along with his core thesis: companies that successfully integrate AI into their foundations will thrive, while those that treat it as a peripheral tool will struggle to survive.

    He talks about “re-founding”, the idea that becoming AI-first requires rebuilding your company from the ground up with AI at its core. It’s not about adding an “AI strategy” slide to your quarterly deck or launching a token innovation lab. It’s about fundamentally reimagining how your business operates.

    So how exactly do you become an AI-first company? That’s where this guide comes in. I’ve created a comprehensive framework to help you transform your organization into an AI-first company one methodical step at a time. This is based on work I’ve done with dozens of companies.

    The Pyramid of AI Adoption

    I previously wrote a post called the Pyramid of AI Adoption which illustrates how far along you are in becoming an AI-first company.

    I suggest reading the full article but here are the Cliff’s Notes:

    Stage 1: Augmentation – You’re using ChatGPT to write emails and summarize meetings. It’s like getting training wheels for your AI bicycle. Most companies are camping out here.

    Stage 2: Automation – You’ve started changing how your company actually operates, automating away processes that eat up resources faster than I demolish a chocolate bar.

    Stage 3: Innovation – You’re creating entirely new business models and products with AI that were previously impossible with your resources.

    My aim in this guide is to show you how you can get to Stage 3. Of course, reading about it is the easy part. The hard part is implementing it! Let’s go…

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    Step 1: Immerse Leadership in AI

    Transformation starts at the top. As a leader, you can’t just delegate “that AI stuff” to your tech team any more than Tony Stark could outsource being Iron Man.

    Block out two hours every week to actually use these tools. I’m not talking about reading articles about AI (unless it’s my blog). I’m talking hands-on experience. Start with Claude or ChatGPT before venturing into more specialized tools.

    Here’s your executive starter pack:

    • Strategic Planning: Today’s AI models are extremely good at taking in large volumes of text and coming up with valuable insights. Feed those massive reports into AI tools and watch them extract insights faster than a gossip columnist at a celebrity wedding.
    • Communication Enhancement: Writing emails, drafting announcements, sending investor updates, these are all things that can be done faster and better with AI. You still provide the vision, but AI makes sure it doesn’t read like it was written at 2 AM after your fourth espresso.
    • Meeting Follow-up: Tools that automatically generate meeting notes and action items? Yes, please! It’s like having a super-efficient assistant who never needs coffee breaks.
    • Competitive Intelligence: New Deep Research capabilities (Google it… actually, no, ChatGPT it) let you gather information across hundreds of websites in minutes. Your intern can go back to getting you coffee now.

In a podcast with Patrick O’Shaughnessy, Ravi mentions how he fed ChatGPT some context about a dinner he was going to and asked it to give him some talking points. He read the talking points on the Uber ride over and impressed his guests. This behavior should become second nature to you.

    Step 2: Mandate AI Use Across Your Company

    Once leadership is on board the AI train, it’s time to get everyone else tickets. Some employees are already secretly using these tools, but with leadership’s blessing, adoption spreads faster than rumours about office romance.

    A fun and quick way to do this is to have leaders share their learnings in team meetings. You could perhaps call an All-Hands and have every leader walk through something they tried with AI and the results.

Another way is to have department heads conduct workshops for their departments to identify and experiment with AI tools. I’ve mentioned ChatGPT and Claude, but there are hundreds of department-specific AI tools that are better suited for certain tasks.

    You also need to ensure employees have permission to try out different tools. At the very least, give everyone in your company a Team subscription to ChatGPT.

    When I was running the Venture Studio at Forum VC, this is exactly what we did. The whole company got a subscription to ChatGPT, and we even mandated usage of it for my department.

    Book A Free Consultation

    If you’re serious about your AI Transformation, I can help. View my Services here and book a free consultation.

    Step 3: Conduct a Comprehensive AI Readiness Assessment

With your team engaged, the journey continues with a thorough understanding of your current state. I should warn you: this requires a deep examination of how your company actually operates, and you may find some surprises along the way.

    I’ll break down exactly how to conduct this assessment in actionable steps.

    Map Your Workflows Systematically

    First, create a standardized workflow documentation template that captures:

    • Process name and department
    • Inputs required (data sources, triggers, resources)
    • Steps performed (sequential actions with decision points)
    • Outputs produced (deliverables, decisions, impacts)
    • Current time investment (hours per week/month)
    • Estimated cost (labor, technology, opportunity cost)
    • Business impact rating (critical, high, medium, low)
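If you want this template in a machine-readable form that every team fills in consistently, here’s a minimal sketch (the field names mirror the list above and are only a starting point):

Python
from dataclasses import dataclass

@dataclass
class WorkflowRecord:
    process_name: str
    department: str
    inputs: list[str]        # data sources, triggers, resources
    steps: list[str]         # sequential actions with decision points
    outputs: list[str]       # deliverables, decisions, impacts
    hours_per_month: float   # current time investment
    monthly_cost: float      # labor, technology, opportunity cost
    business_impact: str     # "critical", "high", "medium", or "low"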

    Next, organize department-by-department workflow collection sessions. I recommend instructing each department head to schedule dedicated 2-3 hour mapping workshops that include frontline staff.

    These sessions should uncover both formal procedures and those informal “shadow processes” that exist in every organization but rarely appear in official documentation.

    Direct teams to focus particularly on processes that:

    • Are performed frequently (daily or weekly)
    • Follow clear patterns or rules
    • Involve significant data processing or analysis
    • Consume substantial employee time
    • Create bottlenecks in delivering customer value

Do not include processes that are part of your core value as a business (just yet). Right now, we’re focusing only on processes that do not deliver core value and therefore carry low risk when automated.

    Score Each Process

Once that’s done, we score each process on how “AI-ready” it is. It doesn’t have to be complex. I usually create a 1-5 scale assessment across these dimensions:

    • Data structure (1 = unstructured/analog information, 5 = highly structured digital data)
    • Decision complexity (1 = requires nuanced human judgment, 5 = follows clear rules)
    • Pattern recognition (1 = each situation is unique, 5 = clear patterns exist)
    • Creativity required (1 = high creative input needed, 5 = minimal creativity needed)
    • Emotional intelligence (1 = high empathy required, 5 = minimal EQ needed)

    After scoring, create a quadrant analysis plotting each process on two axes:

    • X-axis: AI Readiness (combined score of above dimensions)
    • Y-axis: Potential business impact (importance, cost, time savings)

    This visualization makes your prioritization decisions much clearer. Based on the scoring results, categorize processes into implementation timelines:

    • Immediate candidates (Q1): High scores on both axes – your quick wins
    • Mid-term candidates (Q2-Q3): High on AI readiness, medium on business impact
    • Long-term vision (Year 2+): Medium scores or processes requiring significant redesign
    • Human-centric processes: Low scores that should remain primarily human-driven

A sales agency I worked with had a very clear, structured onboarding process that didn’t require much creativity. All they needed to do was gather requirements and turn them into a document for the delivery team.

Unfortunately, it took two weeks on average to complete onboarding, with multiple calls and emails between the onboarding team and the client. It wasn’t a core process, but it had high business impact and scored well on AI readiness. A prime candidate for automation.
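To make the scoring concrete, here’s a minimal sketch of how you might compute AI readiness and bucket processes into the quadrants described above (the thresholds are illustrative; tune them to your own portfolio):

Python
from dataclasses import dataclass

@dataclass
class ProcessScore:
    name: str
    data_structure: int       # 1 = unstructured/analog, 5 = highly structured
    decision_complexity: int  # 1 = nuanced judgment, 5 = clear rules
    pattern_recognition: int  # 1 = every case unique, 5 = clear patterns
    creativity: int           # 1 = high creative input, 5 = minimal
    emotional_iq: int         # 1 = high empathy required, 5 = minimal
    business_impact: int      # 1 = low impact, 5 = critical

    @property
    def ai_readiness(self) -> float:
        # Combined score across the five readiness dimensions
        return (self.data_structure + self.decision_complexity +
                self.pattern_recognition + self.creativity +
                self.emotional_iq) / 5

def categorize(p: ProcessScore) -> str:
    if p.ai_readiness >= 4 and p.business_impact >= 4:
        return "Immediate candidate (quick win)"
    if p.ai_readiness >= 4:
        return "Mid-term candidate"
    if p.ai_readiness >= 3:
        return "Long-term vision"
    return "Human-centric process"

# The sales agency onboarding example: structured, rule-driven, high impact
onboarding = ProcessScore("Client onboarding", 4, 4, 5, 4, 3, 5)
print(categorize(onboarding))  # Immediate candidate (quick win)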

    Identify and Empower Your AI Champions

    The final component of your assessment identifies the people who will drive transformation from within the organization.

    Deploy a company-wide AI attitude survey with questions that reveal:

    • Current use of AI tools (both personal and professional)
    • Interest level in AI applications within their work
    • Concerns or reservations about AI implementation
    • Ideas for how AI could improve their specific functions
    • Desire to participate in AI initiatives

    A healthcare system I worked with was surprised to discover that some of their most enthusiastic AI advocates weren’t in IT or analytics, but in clinical departments where individuals had independently started exploring AI tools to solve daily challenges.

    Analyze the assessment and survey results to identify potential champions, then conduct one-on-one interviews with promising candidates. Look for individuals who demonstrate:

    • Practical AI knowledge or strong aptitude to learn
    • Respect among peers (influence without authority)
    • Ability to bridge technical and business perspectives
    • Track record of successful change management
    • Persistence through challenges

    When your assessment is complete, you’ll have three critical assets:

    1. A comprehensive map of your organization’s processes with clear AI potential scores
    2. A prioritized transformation roadmap with timelines
    3. A group of internal AI champions ready to drive change

    This assessment provides the foundation for all your subsequent transformation efforts. It ensures you’re targeting the right opportunities, with the right sequence, and with the right people involved.

    Step 4: Launch Strategic Pilot Projects

    With your assessment complete, it’s time to move from theory to practice by launching some pilot projects.

Pick out 2-3 projects from the processes in the immediate-candidates quadrant from Step 3. If you have many contenders, prioritize projects that have high visibility across your organization or that span multiple business functions. They should also have clear ROI potential.

    I’m not going to go into how to run and manage projects here but it is extremely important. While everything I’ve mentioned so far sounds like a lot of fun, execution is usually where most companies stumble. This is really the make-or-break step, and to set you up for success, here are a few pointers:

    Treat It As a Real Project

    This is not a side project. Most side projects fail or don’t result in anything long-term because they aren’t taken seriously.

Have your AI Champions from Step 3 lead these projects, make it their primary KPI, and give them the team and resources they need.

    Set Aggressive Implementation Timelines

    To create momentum and prevent analysis paralysis, establish ambitious but achievable timelines for your initial projects.

    I’ve helped companies launch and deliver AI automations within 30 days. Remember, we’re not looking for perfection here. We’re piloting a new way of doing things, and it just needs to be better than the old way.

    Document Process Changes and Results Meticulously

    Successful pilots will make the case for further transformation. Establish clear baseline metrics for the processes you want to automate, and then measure the results.

    Document everything meticulously. These case studies become powerful tools for expanding your transformation. PS – you can use AI for this!

    Create a Consistent Communication Cadence

    Effective communication is often the difference between successful transformations and failed initiatives. Develop a systematic approach to sharing progress, learnings, and successes throughout your organization.

    Buy Vs Build

At least for the first few pilot projects, it makes sense to buy existing software or AI tools instead of building your own. You can roll your own AI once you’ve seen value.

One VC client wanted to automate their entire top-of-funnel deal flow. We could have developed an end-to-end AI automation, but we decided instead to cobble together three different tools. It’s not perfect, but it improved investment throughput and laid the foundation for more custom AI builds.

    Launch the Projects

    Don’t just build the pilot and leave it at that. Actually launch it and roll it out. See how it runs in the real world. Measure if it’s making a difference.

    Getting a few successful pilots off the ground and communicating those successes sets you up for deeper transformation down the line.

    If you’ve come this far, congratulations, you’re in the second level of the Pyramid of AI Adoption – Automation.

    Step 5: Redesign Your Core Business Processes

    Once your initial pilots demonstrate value, it’s time for deeper transformation. We’re getting to the third level on the pyramid.

    This is where the “re-founding” concept becomes most apparent. You’re not just improving existing processes, you’re reimagining how work gets done.

Begin by identifying processes that form the backbone of your value creation. These are the processes I told you to set aside back in Step 3.

    For a software company, this might be your development workflow; for a financial institution, your risk assessment process; for a healthcare provider, your patient care pathways.

Before redesigning, thoroughly document the current process to understand its complete flow, inefficiencies, and hidden dependencies. This mapping creates a baseline understanding that will inform your redesign.

    For each selected process:

    1. Conduct detailed observation sessions with the people who perform the work daily
    2. Document every step, including unofficial workarounds and exceptions
    3. Identify decision points and the information used to make those decisions
    4. Measure time, cost, and quality metrics at each stage
    5. Identify pain points, bottlenecks, and redundancies
    6. Map data flows and information handoffs between systems and people
    7. Document compliance and regulatory requirements

    With this baseline, you can pick it apart and redesign it. The key to true transformation is starting with a clean slate rather than incrementally improving existing processes.

    Conduct structured workshops where teams reimagine the process from first principles, considering AI capabilities as fundamental building blocks:

    1. Begin with the core purpose of the process and desired outcomes
    2. Challenge all assumptions about how work must be done
    3. Ask: “If we were building this process from scratch today, with all of AI’s capabilities available, how would we design it?”
    4. Identify which decisions could be automated, augmented, or should remain human-driven
    5. Examine how to eliminate handoffs and information re-entry
    6. Determine how humans and AI will collaborate within the redesigned process

    Once you’ve redesigned the process, you can start the implementation. Again, I won’t go into project management here but keep in mind the advice I gave previously.

    Since this is a redesign of your core process, you also want to start small. Pick one piece of the design to implement first, measure it, learn from it, and then move to the next piece. Like trying a new hairstyle, you don’t go from conservative cut to mohawk overnight.

    Remember that process redesign is fundamentally about rethinking how work creates value, not just making existing processes more efficient. The organizations that achieve the greatest transformation benefits are those willing to challenge fundamental assumptions about how work must be done.

    Step 6: Transform Your Product and Service Offerings

    With internal transformation underway, turn your attention to market-facing opportunities. This is where AI fundamentally changes your value proposition in the marketplace. Rather than simply improving existing offerings, this step reimagines what’s possible when AI becomes central to your products and services.

    Establish an AI Innovation Team

Create a dedicated team for AI-driven product innovation. Many companies are now hiring Chief AI Officers and AI Product Managers to research and build AI-first products and features.

    Create Rapid Prototyping Processes

With tools like Cursor, Windsurf, and Lovable, it’s extremely easy to rapidly prototype new products (especially in software). This doesn’t mean they’ll be instantly integrated into your core products and services, but you can launch them as side tools and measure the response.

    Build Customer Feedback Loops

    Collect usage metrics not just on the product but also the AI features. A software company I advised built systems tracking not only when customers used their AI writing assistant but which suggestions were accepted, modified, or rejected, creating a rich dataset for improvement.

    Update Your Pricing

    As you develop these offerings, rethink your pricing strategy. AI-enhanced products often create exponentially more value than traditional alternatives.

    AI Or Die

    The pace of AI advancement isn’t slowing down.

    The companies that thrive will be those that start transforming today. They’ll make mistakes and face challenges along the way, but they’ll develop the organizational capabilities needed to capitalize on each new AI breakthrough.

    Is it easy? About as easy as teaching a cat to swim. It requires courage, commitment, and fundamentally rethinking how your business operates. But the alternative, watching AI-native competitors eat your lunch while you still decide what to order, is far more painful.

    I’ve guided numerous organizations through this journey, and while each transformation is unique, the framework outlined here provides a proven path forward.

    If you’re ready to begin your company’s AI transformation but need expert guidance, I’d welcome a conversation about how I can help you navigate this complex but essential transition.

    Book A Free Consultation

    If you’re serious about your AI Transformation, I can help. View my Services here and book a free consultation.

  • A Deep Dive Into The OpenAI Agents SDK

    A Deep Dive Into The OpenAI Agents SDK

    I’ve been discussing the inevitable progression that LLM companies are taking toward agentic AI capabilities for some time now on my blog and social media.

    My Model Context Protocol series explored how Claude (and any AI product) can go from a mere chatbot to an AI agent capable of taking actions on your behalf.

    OpenAI has also been on this path since launching ChatGPT. They’ve been adding tools like web search, code interpreter, Operator, Deep Research, and so on, to build out ChatGPT’s agentic capabilities.

    This week, on March 11, 2025, they took the next step with the release of their Agents SDK, an open-source toolkit designed to make building sophisticated AI agents accessible to developers of all skill levels.

    PS – I also recommend reading my guide on How To Design AI Agents, where I talk through different architectures and components of effective AI agents.

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

    What is the Agents SDK?

    The OpenAI Agents SDK is a lightweight, Python-based framework for constructing multi-agent workflows. Evolved from their experimental “Swarm” project, this SDK provides a comprehensive solution for developers looking to create AI agents that can reason, use tools, and collaborate with other agents to accomplish complex tasks.

    The SDK addresses many of the practical challenges developers face when building AI agents. It standardizes patterns for agent communication, state management, and collaboration, reducing the complexity barrier for creating useful AI applications.

    What makes it valuable? Three core concepts:

    1. Agents that think AND act – Not just LLMs spitting out text, but AI assistants that can make decisions and execute functions
    2. Seamless teamwork through handoffs – Specialized agents working together, passing the baton when needed
    3. Safety through guardrails – Because nobody wants their AI going rogue after reading too many YouTube comments

    Architecture and Core Components

    The OpenAI Agents SDK is built around several key components that work together to create functional agent systems:

    Agents: The central entities that make decisions and take actions. The SDK supports various types of agents, from simple LLM-powered agents to complex multi-agent orchestrators.

    Runners: Components that manage the execution flow of agents, handling the orchestration of messages, events, and state management.

    Tools: Functions or capabilities that agents can use to perform specific actions, such as searching the web, executing code, or retrieving information from databases.

    Context: A system that allows you to pass data and dependencies throughout your agent run.

    Handoffs: The ability for one agent to transfer tasks to another agent, allowing for multi-agent systems.

    Tracing: A way to visualize and monitor your agents.

    Let’s look at each of the components in detail.

    Agents

    Agents are the central entities that make decisions and take actions in your AI system.

    You define an agent by providing a name, model, instructions, and tools:

    • Give them a name (“Customer Support Agent”)
    • Provide instructions (“Help users without saying ‘have you tried turning it off and on again?’ more than once per conversation”)
    • Choose their “brain” (from quick-and-simple to deep-thinking models)
    • Equip them with tools (the digital equivalent of giving someone access to the supply closet)
Python
from agents import Agent

# web_search and document_retrieval are tool functions assumed to be
# defined elsewhere (e.g., with the @function_tool decorator)
support_agent = Agent(
    name="Customer Support Agent",
    model="gpt-4o",
    instructions="Help users without saying 'have you tried turning it off and on again?' more than once per conversation",
    tools=[web_search, document_retrieval]
)
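To see the agent in action, you hand it to a Runner, the execution component described earlier. A minimal usage sketch:

Python
from agents import Runner

# Run the agent synchronously with a single user message
result = Runner.run_sync(support_agent, "My router keeps dropping the connection.")
print(result.final_output)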

    Model Selection and Settings

    Different models have different capabilities and costs. Choose the right model for your agent’s needs:

    Python
    from agents import Agent, ModelSettings
    
    # Fast, cost-effective agent for simple tasks
    quick_agent = Agent(
        name="Quick Responder",
        model="gpt-4o-mini",  # Fastest, most cost-effective
        instructions="Provide quick, helpful responses to simple questions.",
        model_settings=ModelSettings(
            temperature=0.3,  # Lower temperature for consistent responses
            max_tokens=150    # Limit response length for quick interactions
        )
    )
    
    # Powerful agent for complex reasoning
    reasoning_agent = Agent(
        name="Deep Thinker",
        model="o1-preview",  # Most capable for complex reasoning
        instructions="Provide thorough analysis and step-by-step reasoning for complex problems.",
        model_settings=ModelSettings(
            temperature=0.7,  # Moderate temperature for balanced creativity
            # Note: o1 models don't support max_tokens in the same way
        )
    )
    
    # Creative agent for content generation
    creative_agent = Agent(
        name="Creative Writer",
        model="gpt-4o",  # Good balance of capability and speed
        instructions="Write engaging, creative content with vivid descriptions.",
        model_settings=ModelSettings(
            temperature=0.9,  # Higher temperature for more creativity
            top_p=0.9,       # Use nucleus sampling for variety
            presence_penalty=0.1  # Encourage topic diversity
        )
    )

    Output Types and Structured Responses

    By default, agents produce plain text outputs. However, you can use the output_type parameter to ensure structured, validated outputs that your application can reliably process:

    Python
    from pydantic import BaseModel
    from agents import Agent

    # Define the structure you want the agent to return
    # Pydantic models provide automatic validation and clear documentation
    class WeatherReport(BaseModel):
        city: str
        temperature: int  # Temperature in Celsius
        conditions: str   # Weather conditions (sunny, cloudy, etc.)
        recommendation: str  # What the user should do based on the weather

    # Create an agent that returns structured data instead of free-form text
    weather_agent = Agent(
        name="Weather Reporter",
        instructions="""
        You are a weather reporting agent that provides structured weather information.

        When users ask about weather, use the get_weather tool and then format your response
        to include all required fields:
        - city: The city name (cleaned up and properly formatted)
        - temperature: The temperature as an integer
        - conditions: Brief weather description
        - recommendation: Practical advice based on the weather

        Be helpful and specific in your recommendations.
        """,
        output_type=WeatherReport,  # This ensures the response matches our structure
        tools=[get_weather]  # assumes a get_weather function tool defined elsewhere
    )
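
    If you then run this agent, the final output comes back as a WeatherReport instance rather than free-form text. Here's a minimal sketch of consuming it (assuming the agent and the get_weather tool above):

    Python
    from agents import Runner

    result = Runner.run_sync(weather_agent, "What's the weather like in Mumbai today?")
    report = result.final_output  # a WeatherReport instance, not a string
    print(f"{report.city}: {report.temperature}°C, {report.conditions}")
    print(f"Tip: {report.recommendation}")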

    Instructions

    A quick note on instructions here. In the above example you can see how I’m being very explicit with my instructions. You can go even deeper than that.

    Instructions are probably the biggest lever you can pull to influence the behaviour of your agent. This holds whether you’re using an SDK, or a tool like n8n, or coding an agent from scratch.

    I could write an entire post on how to create good instructions for agents, and maybe I will one day, but for now you can read my guide on designing agents for more information.

    Agent Loop

    When your agent runs, it enters the “agent loop”, a fancy way of saying it thinks, acts, and repeats until the job is done. The SDK handles the agent loop automatically, managing tool calling, result processing, and iteration:

    1. Agent gets input (like “I need help with my subscription”)
    2. Agent decides if they need more info or can respond directly
    3. If they need info, they use a tool and get results
    4. This continues until they reach a final answer

    It’s basically the digital version of how I approach cooking: assess situation, realize I need more information, google recipe, realize I’m missing ingredients, order takeout, problem solved.

    Python
    from agents import Runner

    result = Runner.run_sync(support_agent, "I need help with my subscription")
    print(result.final_output)

    Tools: Extending Your Agent’s Capabilities

    Without tools, agents would just be fancy chatbots. Tools are what let your AI reach out into the world and actually do stuff.

    Creating a tool is as simple as decorating a Python function:

    Python
    from agents import function_tool

    @function_tool
    def search_knowledge_base(query: str) -> str:
        """Search the internal knowledge base for information about a query."""
        # Your code to search a database goes here
        return "Here's what I found about " + query

    There are two main types:

    • Hosted tools: Pre-built capabilities like web search (the tools already in your shed)
    • Function tools: Turn ANY Python function into an agent tool (like going to Home Depot and buying whatever you need)

    The beauty is in how naturally the agent decides when to use these tools – it’s not pre-programmed, but rather a decision the LLM makes based on the task at hand.
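
    For instance, here's a minimal sketch (assuming the search_knowledge_base tool above) where the agent decides for itself whether a request warrants a tool call:

    Python
    from agents import Agent, Runner

    kb_agent = Agent(
        name="KB Assistant",
        model="gpt-4o-mini",
        instructions="Answer questions, searching the knowledge base when you need specifics.",
        tools=[search_knowledge_base]  # the tool we just defined
    )

    # The model reads the tool's name and docstring and decides on its own
    # whether to call it for a given request
    result = Runner.run_sync(kb_agent, "What does our onboarding guide say about laptops?")
    print(result.final_output)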

    Function Tools

    The most straightforward way to create tools is by decorating Python functions. The SDK handles all the complex work of schema generation, parameter validation, and integration:

    Python
    from agents import function_tool, Agent
    import requests
    from typing import Dict, Any
    
    # Example 1: External API Integration
    # This shows how to wrap external services as agent tools
    @function_tool
    def search_wikipedia(query: str) -> str:
        """Search Wikipedia for information about a topic.
        
        The agent will see this docstring and understand what this tool does.
        Clear descriptions help the agent choose the right tool for the task.
        
        Args:
            query: The search term to look up on Wikipedia.
            
        Returns:
            str: A summary of the Wikipedia article.
        """
        try:
            # Clean up the query for URL usage
            clean_query = query.replace(" ", "_")
            
            # Call Wikipedia's REST API - this is a real API call
            response = requests.get(
                f"https://en.wikipedia.org/api/rest_v1/page/summary/{clean_query}",
                headers={'User-Agent': 'OpenAI-Agents-Demo/1.0'}
            )
            
            if response.status_code == 200:
                data = response.json()
                # Format the response in a user-friendly way
                return f"**{data['title']}**: {data['extract']}"
            elif response.status_code == 404:
                return f"Sorry, I couldn't find a Wikipedia article for '{query}'. Try rephrasing your search."
            else:
                return f"Wikipedia search temporarily unavailable (status: {response.status_code})"
        
        except Exception as e:
            # Always handle errors gracefully in tools
            return f"Error searching Wikipedia: {str(e)}"
    
    # Example 2: Complex Calculations with Structured Output
    # This demonstrates how to return structured data from tools
    @function_tool
    def calculate_compound_interest(
        principal: float, 
        rate: float, 
        time: int, 
        compound_frequency: int = 12
    ) -> Dict[str, Any]:
        """Calculate compound interest with detailed breakdown.
        
        This tool shows how to handle multiple parameters and return structured data
        that the agent can interpret and present clearly to users.
        
        Args:
            principal: Initial amount invested (in dollars).
            rate: Annual interest rate as a percentage (e.g., 5.5 for 5.5%).
            time: Number of years to calculate.
            compound_frequency: How often interest compounds per year (default: 12 for monthly).
            
        Returns:
            dict: Calculation results including final amount and total interest.
        """
        # Input validation - important for tools that do calculations
        if principal <= 0:
            return {"error": "Principal amount must be positive"}
        if rate < 0:
            return {"error": "Interest rate cannot be negative"}
        if time <= 0:
            return {"error": "Time period must be positive"}
        if compound_frequency <= 0:
            return {"error": "Compound frequency must be positive"}
        
        # Convert percentage to decimal
        decimal_rate = rate / 100
        
        # Apply the compound interest formula: A = P(1 + r/n)^(nt)
        final_amount = principal * (1 + decimal_rate/compound_frequency) ** (compound_frequency * time)
        total_interest = final_amount - principal
        
        # Return structured data that the agent can easily work with
        return {
            "status": "success",
            "calculation": {
                "principal": principal,
                "final_amount": round(final_amount, 2),
                "total_interest": round(total_interest, 2),
                "rate_percent": rate,
                "years": time,
                "compound_frequency": compound_frequency
            },
            "summary": f"${principal:,.2f} grows to ${final_amount:,.2f} over {time} years at {rate}% annual interest"
        }

    Built-in Tools

    The SDK provides several hosted tools that run on OpenAI’s servers, offering powerful capabilities without requiring external API management:

    • WebSearchTool: Gives your agent the ability to search the web
    • FileSearchTool: Searches your uploaded documents via OpenAI’s hosted vector stores
    • CodeInterpreterTool: Lets your agent execute code in a sandbox
    • ImageGenerationTool: Generates images on the fly

    You just need to import them and use them like any other tool:

    Python
    from agents import Agent, WebSearchTool, FileSearchTool, CodeInterpreterTool
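
    As a quick sketch, here's how you might hand one of these hosted tools to an agent (the name and instructions are just placeholders):

    Python
    from agents import Agent, Runner, WebSearchTool

    news_agent = Agent(
        name="News Scout",
        model="gpt-4o",
        instructions="Search the web for recent news and summarize it with source links.",
        tools=[WebSearchTool()]  # hosted tool, runs on OpenAI's servers
    )

    result = Runner.run_sync(news_agent, "What happened in AI this week?")
    print(result.final_output)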

    Agents as Tools

    You can use agents as tools for other agents, creating powerful hierarchical systems where specialized agents handle specific domains:

    Python
    # Create specialized agents that excel at specific tasks
    # Each agent has focused expertise and tailored instructions
    
    # Math specialist - optimized for numerical calculations and explanations
    math_agent = Agent(
        name="Math Specialist",
        instructions="""
        You are an expert mathematician who excels at:
        - Solving complex mathematical problems step by step
        - Explaining mathematical concepts clearly
        - Performing accurate calculations
        - Showing your work and reasoning
        
        Always break down complex problems into understandable steps.
        When doing calculations, be precise and show intermediate results.
        """,
        tools=[calculate_compound_interest]  # From our earlier example
    )
    
    # Research specialist - optimized for information gathering
    research_agent = Agent(
        name="Research Specialist", 
        instructions="""
        You are an expert researcher who excels at:
        - Finding authoritative and current information
        - Synthesizing information from multiple sources
        - Providing well-sourced and balanced perspectives
        - Distinguishing between facts and opinions
        
        Always cite your sources and explain the credibility of information.
        When possible, find multiple sources to verify important claims.
        """,
        tools=[search_wikipedia, WebSearchTool()]
    )
    
    # Create an orchestrator agent that coordinates the specialists
    # This agent decides which specialist to use based on the user's needs
    orchestrator = Agent(
        name="Task Orchestrator",
        instructions="""
        You are a smart coordinator that manages a team of specialist agents.
        Your job is to understand user requests and delegate to the right specialist.
        
        Use the math specialist for:
        - Numerical calculations and mathematical problems
        - Financial analysis and projections
        - Statistical analysis
        - Any task involving numbers or formulas
        
        Use the research specialist for:
        - Finding information about topics, people, or events
        - Gathering current news or developments
        - Researching background information
        - Fact-checking and verification
        
        For complex tasks, you can use both specialists:
        1. First gather information with the research specialist
        2. Then perform calculations with the math specialist
        3. Synthesize the results into a comprehensive answer
        
        Always explain which specialist you're consulting and why.
        """,
        tools=[
            # Convert agents into tools using the as_tool() method
            # This allows the orchestrator to call them like any other tool
            math_agent.as_tool(
                tool_name="get_math_help",
                tool_description="Get expert help with mathematical calculations, formulas, and numerical analysis"
            ),
            research_agent.as_tool(
                tool_name="get_research_help", 
                tool_description="Get expert help researching topics, finding information, and gathering current data"
            )
        ]
    )
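
    Running the orchestrator then looks like any other agent run. Here's a sketch of a request that should pull in both specialists:

    Python
    from agents import Runner

    result = Runner.run_sync(
        orchestrator,
        "If I invest $10,000 at 6% for 15 years, what will it grow to, "
        "and how does that compare to historical stock market returns?"
    )
    print(result.final_output)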

    Best Practices for Tool Design

    Creating effective tools is crucial for agent performance. Here are the key best practices, with a sketch after the list that pulls them together:

    1. Function Naming and Signature

    • Verb-Noun Names: Use descriptive names that clearly indicate action (e.g., fetch_stock_price is better than get_stock or simply stocks)
    • Parameter Naming: Use clear, self-documenting parameter names (city is better than c)
    • Type Consistency: Ensure parameters have consistent types throughout your application
    • Avoid Defaults: Let the LLM decide all parameter values based on context

    2. Documentation and Clarity

    • Rich Docstrings: Include comprehensive documentation explaining the tool’s purpose, parameters, return values, and usage guidelines
    • Usage Examples: Consider including examples in the docstring for complex tools
    • Clear Return Format: Document the structure of returned data

    3. Performance Considerations

    • Timeout Handling: Implement timeouts for external API calls
    • Caching: Consider caching results for frequently used, unchanging data
    • Async Operations: Use async functions for time-consuming operations when possible
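
    Here's a short sketch applying these practices: a verb-noun name, a rich docstring, and a timeout on the external call. Note that the endpoint is hypothetical; swap in a real quote API.

    Python
    from agents import function_tool
    import requests

    @function_tool
    def fetch_stock_price(ticker: str) -> str:
        """Fetch the latest stock price for a ticker symbol.

        Args:
            ticker: The stock ticker symbol, e.g. "AAPL".

        Returns:
            str: A short, human-readable price summary.
        """
        try:
            # A timeout stops a slow API from stalling the whole agent loop
            response = requests.get(
                f"https://api.example.com/quote/{ticker}",  # hypothetical endpoint
                timeout=5
            )
            response.raise_for_status()
            data = response.json()
            return f"{ticker} is trading at ${data['price']:.2f}"
        except requests.Timeout:
            return f"The price lookup for {ticker} timed out. Please try again."
        except Exception as e:
            return f"Could not fetch the price for {ticker}: {e}"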

    Context: Keeping State Between Steps

    Context is the foundation that transforms stateless agent interactions into intelligent, stateful conversations. This enables agents to access user information, maintain state, and share data between tools and agent interactions.

    The SDK lets you create a context object using a dataclass. In this example, we’ll create a context class that holds all user session information and passes it to all the agents and their tools:

    Python
    from dataclasses import dataclass
    from agents import Agent, RunContextWrapper, function_tool
    import time
    
    # Define a context class that holds all the user session information
    # This will be passed to every agent and tool in your system
    @dataclass
    class UserSession:
        user_id: str
        name: str
        preferences: dict
        conversation_history: list
        session_start_time: float

    Now let’s see how our tools can access and modify the context. Tools can access user data to provide personalized responses and remember user preferences:

    Python
    @function_tool
    def get_user_preferences(ctx: RunContextWrapper[UserSession]) -> str:
        """Get the current user's preferences.
        
        The ctx parameter gives us access to the user session data.
        This allows tools to personalize their behavior based on user settings.
        """
        prefs = ctx.context.preferences
        if not prefs:
            return "No preferences set yet. You can update them anytime!"
        
        # Format preferences in a user-friendly way
        formatted_prefs = []
        for category, value in prefs.items():
            formatted_prefs.append(f"{category}: {value}")
        
        return f"Your current preferences: {', '.join(formatted_prefs)}"
    
    @function_tool
    def update_preference(ctx: RunContextWrapper[UserSession], category: str, value: str) -> str:
        """Update a user preference.
        
        This tool modifies the context data, which persists throughout the session.
        Changes made here will be available to all subsequent tool calls and agent interactions.
        """
        # Update the preferences in the context
        ctx.context.preferences[category] = value
        
        # Add to conversation history for tracking
        ctx.context.conversation_history.append({
            "action": "preference_update",
            "category": category,
            "value": value,
            "timestamp": time.time()
        })
        
        return f"Updated your {category} preference to '{value}'. This will apply to all future interactions!"
    
    @function_tool
    def get_conversation_summary(ctx: RunContextWrapper[UserSession]) -> str:
        """Provide a summary of the user's session activity.
        
        This demonstrates how context can be used to track state across multiple interactions.
        """
        session = ctx.context
        session_duration = time.time() - session.session_start_time
        
        summary = f"""
        Session Summary for {session.name}:
        - Session duration: {session_duration/60:.1f} minutes
        - Conversation events: {len(session.conversation_history)}
        - Current preferences: {len(session.preferences)} set
        """
        
        # Show recent activity if available
        if session.conversation_history:
            recent_activity = session.conversation_history[-3:]  # Last 3 events
            summary += "\n\nRecent activity:"
            for event in recent_activity:
                if event.get("action") == "preference_update":
                    summary += f"\n- Set {event['category']} to {event['value']}"
        
        return summary

    We can also create an agent that uses context-aware tools. Note the type annotation [UserSession]. This tells the agent what context type to expect and prevents errors.

    Python
    personalized_agent = Agent[UserSession](
        name="Personal Assistant",
        instructions="""
        You are a personal assistant that provides personalized service based on user preferences.
        
        You can:
        - Check and update user preferences using the available tools
        - Provide personalized recommendations based on their settings
        - Track conversation history and provide session summaries
        
        Key behaviors:
        - Always use the user's name when you know it
        - Reference their preferences when giving advice
        - Offer to update preferences when you notice user needs
        - Be helpful and remember context from earlier in the conversation
        """,
        tools=[get_user_preferences, update_preference, get_conversation_summary]
    )

    All tools and agents in a run share the same context, allowing them to coordinate and build on each other’s actions. Changes made to the context in one interaction persist to the next, enabling stateful conversations. You can store complex data structures (lists, dictionaries, custom objects) and modify them throughout the session.
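
    Putting it together, here's a minimal sketch of starting a session and running the personalized agent with that context:

    Python
    import asyncio
    import time
    from agents import Runner

    async def main():
        session = UserSession(
            user_id="u_123",
            name="Priya",
            preferences={},
            conversation_history=[],
            session_start_time=time.time()
        )

        # The same UserSession object is visible to the agent and all of its tools
        result = await Runner.run(
            personalized_agent,
            "Set my tone preference to 'casual', then give me a session summary.",
            context=session
        )
        print(result.final_output)

    asyncio.run(main())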

    Tracing

    The built-in tracing system captures every step of the agent’s thinking and actions:

    • What the agent was thinking
    • Which tools it called and why
    • What inputs it used
    • What outputs it received

    This means when something goes wrong (and we all know something always goes wrong), you can actually figure out why.

    Python
    from agents import Agent, Runner, RunConfig, trace
    
    # Tracing is enabled by default, but you can customize it
    async def traced_research_session():
        with trace(workflow_name="Research Session", group_id="session_123") as trace_context:
            # First research query
            result1 = await Runner.run(
                research_agent,  # the research specialist defined in the earlier example
                "Research the history of artificial intelligence",
                run_config=RunConfig(
                    workflow_name="AI History Research",
                    trace_metadata={"user_id": "user_456", "session_type": "research"}
                )
            )
            
            # Follow-up query in the same trace
            result2 = await Runner.run(
                research_agent,
                f"Based on this research: {result1.final_output}, what are the key milestones?"
            )
            
        # Traces are automatically sent to OpenAI dashboard
        print("Research complete. Check traces at https://platform.openai.com/traces")
    
    # You can also disable tracing or customize processors
    from agents import set_tracing_disabled, add_trace_processor
    
    # Disable tracing entirely
    set_tracing_disabled(True)
    
    # Or add custom trace processors for external tools
    # add_trace_processor(your_custom_processor)

    Multi-Agent Collaboration and Handoffs

    One of the most powerful features is the ability to create handoffs between specialized agents and let them collaborate:

    • Simple task? Use the fast, lightweight model
    • Billing questions? Send them to the “Money Person” agent
    • Technical problems? That’s for the “Have You Tried Restarting It?” agent
    • Complex reasoning needed? Bring in the heavyweight model
    Python
    support_agent = Agent(name="support", instructions="You handle customer questions.")
    technical_agent = Agent(name="technical", instructions="You solve technical issues.")
    billing_agent = Agent(name="billing", instructions="You handle billing inquiries.")
    
    triage_agent = Agent(
        name="triage",
        instructions="Route customer inquiries to the appropriate specialized agent.",
        handoffs=[support_agent, technical_agent, billing_agent]
    )

    This creates a workflow where agents can delegate subtasks, forming a collaborative system greater than the sum of its parts.

    How handoffs work behind the scenes:

    1. Automatic Tool Creation: When you specify handoffs, the SDK automatically creates tools that represent transfers to each specialist agent.
    2. Intelligent Routing: The triage agent analyzes the customer’s request and decides which specialist (if any) should handle it.
    3. Context Preservation: When a handoff occurs, the conversation context is passed to the specialist agent, so they understand the full situation.
    4. Seamless Experience: From the customer’s perspective, this feels like talking to one intelligent system that connects them to the right expert.

    The power of handoffs lies in combining the broad understanding of a generalist agent with the deep expertise of specialists, creating a system that’s both intelligent and knowledgeable.
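
    Running the triage agent is a one-liner. Here's a sketch using the agents defined above:

    Python
    from agents import Runner

    # The triage agent reads the request and hands off to the billing specialist
    result = Runner.run_sync(triage_agent, "I was charged twice for my subscription last month.")
    print(result.final_output)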

    Safety Guardrails

    Guardrails are the bouncers of your application, validating inputs before they reach your main agent. Want to prevent users from asking for the recipe to digital disaster? A guardrail can check inputs with a fast model first, saving your premium model for legitimate requests.

    Developers can implement safety measures that run in parallel with agent execution:

    Python
    from agents import (
        Agent,
        GuardrailFunctionOutput,
        RunContextWrapper,
        input_guardrail,
    )

    @input_guardrail
    async def no_swearing(ctx: RunContextWrapper, agent: Agent, user_input) -> GuardrailFunctionOutput:
        """Trip the guardrail if the input contains a banned word."""
        text = user_input if isinstance(user_input, str) else str(user_input)
        flagged = "badword" in text.lower()
        return GuardrailFunctionOutput(
            output_info={"flagged": flagged},
            tripwire_triggered=flagged  # True halts the run with an exception
        )

    agent = Agent(
        name="my_agent",
        instructions="Help users with their questions.",
        input_guardrails=[no_swearing]
    )

    In the above example, the guardrail checks the input, and if the tripwire fires, the SDK halts the run so your main agent doesn’t waste an expensive model call. Key benefits of input guardrails:

    • Layered Defense: Works alongside other safety measures (like output filtering) to create comprehensive protection.
    • Resource Protection: Inappropriate requests are blocked before consuming expensive LLM resources or processing time.
    • Brand Safety: Prevents your AI from engaging with harmful or inappropriate content that could damage your brand.
    • User Experience: Provides clear, helpful feedback when requests are blocked, rather than confusing or hostile responses.
    • Monitoring: All guardrail decisions are logged, providing valuable data for improving safety measures and understanding usage patterns.
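
    When a tripwire fires, the SDK raises an exception you can catch to return a friendly message instead of crashing. A minimal sketch, using the agent defined above:

    Python
    from agents import Runner, InputGuardrailTripwireTriggered

    try:
        result = Runner.run_sync(agent, "badword badword badword")
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        # The run was halted before the main model produced a response
        print("Sorry, I can't help with that request.")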

    Hands-On Example: Building a Multi-Agent Research System

    To demonstrate the power and flexibility of OpenAI’s Agents SDK, I’ve created a practical example that showcases how multiple specialized agents can collaborate to accomplish complex tasks. This Research Agent System represents the kind of real-world application that the SDK enables developers to build quickly and efficiently.

    The Research Agent System Architecture

    This system consists of four specialized agents that work together to produce comprehensive research content:

    1. Triage Agent: Coordinates the overall research process, delegating tasks to specialized agents
    2. Researcher Agent: Gathers information from various sources on a given topic
    3. Fact Checker Agent: Verifies statements for accuracy and proper sourcing
    4. Writer Agent: Synthesizes verified research into coherent, well-structured content

    Each agent is designed with specific instructions, tools, and capabilities that allow it to excel at its particular role. The system demonstrates several key features of the OpenAI Agents SDK:

    • Handoffs: Agents delegate tasks to more specialized agents
    • Context sharing: All agents work with a shared research context
    • Guardrails: Ensures content remains fact-based and properly sourced
    • Structured outputs: Final content follows a consistent, well-organized format
    • Function tools: Agents leverage specialized tools for searching, verifying, and saving content

    The Code

    Each agent described above performs a specific task and returns its result as output. We want that output to follow a defined structure, so that when it’s handed off to the next agent, that agent can consume it reliably and build on it.

    Python
    from typing import List, Optional
    from pydantic import BaseModel


    class ResearchFinding(BaseModel):
        """A single research finding with source information."""
        statement: str
        source: str
        confidence: float  # 0.0 to 1.0
    
    
    class VerifiedResearch(BaseModel):
        """Collection of verified research findings."""
        findings: List[ResearchFinding]
        verified: bool
        notes: Optional[str] = None
    
    
    class FinalContent(BaseModel):
        """Final output content with structured sections."""
        title: str
        introduction: str
        key_points: List[str]
        body: str
        conclusion: str
        sources: List[str]

    We also want to give each agent some tools to do their work. The Research Agent, for example, will need a tool to search the internet as well as save the retrieved content into a file. The fact-checker agent would need a tool to verify that content, and so on.

    I am not going to write all the tools here, but here’s what the web search tool might look like, using the Exa Search API.

    Python
    from agents import function_tool, RunContextWrapper
    import aiohttp

    @function_tool
    async def search_web(context: RunContextWrapper[ResearchContext], query: str) -> str:
        """
        Search the web for information about a topic using the Exa Search API.
        
        Args:
            query: The search query text
            
        Returns:
            Search results as formatted text with citations
        """
        topic = context.context.topic
        # Combine the specific query with the general topic for better results
        full_query = f"{query} about {topic}"
        
        try:
            # Make a request to the Exa Search API
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    "https://api.exa.ai/search",
                    headers={
                        "Content-Type": "application/json",
                        "x-api-key": "YOUR_EXA_API_KEY"  # Replace with your actual API key
                    },
                    json={
                        "query": full_query,
                        "numResults": 5,
                        "useAutoprompt": True,
                        "type": "keyword"
                    }
                ) as response:
                    if response.status != 200:
                        error_text = await response.text()
                        return f"Error searching: {response.status} - {error_text}"
                    
                    search_results = await response.json()
        
            # Process the results
            formatted_results = f"Search results for '{query}' about {topic}:\n\n"
            
            if not search_results.get("results"):
                return f"No results found for '{query}' about {topic}."
            
            # Format each result with its title, content, and URL
            for i, result in enumerate(search_results.get("results", []), 1):
                title = result.get("title", "No title")
                url = result.get("url", "No URL")
                content = result.get("text", "").strip()
                
                # Limit content length for readability
                if len(content) > 500:
                    content = content[:500] + "..."
                
                formatted_results += f"{i}. **{title}**\n"
                formatted_results += f"   {content}\n"
                formatted_results += f"   Source: {url}\n\n"
            
            # Add a summary if available
            if search_results.get("autopromptString"):
                formatted_results += f"Summary: {search_results.get('autopromptString')}\n\n"
                
            return formatted_results
        
        except Exception as e:
            # Provide a useful error message
            error_message = f"Error while searching for '{query}': {str(e)}"
            
            # Add fallback information if the search fails
            fallback_info = f"\n\nFallback information about {topic}:\n" + \
                            f"1. {topic} has been studied in recent publications.\n" + \
                            f"2. Current research suggests growing interest in {topic}.\n" + \
                            f"3. Common challenges in {topic} include implementation complexity and adoption barriers."
            
            return error_message + fallback_info

    You’ll notice this tool uses the ResearchContext object to share data with the other tools. Let’s define that as well:

    Python
    class ResearchContext:
        def __init__(self, topic: str):
            self.topic = topic
            self.findings = []
            self.verified_findings = []
            self.draft_content = ""
            self.history = []
            
        def add_finding(self, finding: ResearchFinding):
            self.findings.append(finding)
            self.history.append(f"Added finding: {finding.statement}")
            
        def add_verified_findings(self, verified: VerifiedResearch):
            self.verified_findings.extend(verified.findings)
            self.history.append(f"Added {len(verified.findings)} verified findings")
            
        def set_draft(self, draft: str):
            self.draft_content = draft
            self.history.append("Updated draft content")

    You may also want to add some guardrails, for example checking if the research content is unbiased. A very simple hard-coded example might be to count the number of times an opinion is expressed vs a fact, like so:

    Python
    from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, output_guardrail

    @output_guardrail
    async def fact_based_guardrail(ctx: RunContextWrapper, agent: Agent, output: str) -> GuardrailFunctionOutput:
        """Check that the agent's output is fact-based and not opinion-heavy."""
        opinion_phrases = ["i believe", "i think", "in my opinion", "probably", "might be", "could be"]

        # Count opinion phrases in the output
        opinion_count = sum(output.lower().count(phrase) for phrase in opinion_phrases)

        # Allow some opinion phrases, but trip the guardrail if there are too many
        return GuardrailFunctionOutput(
            output_info={"opinion_count": opinion_count},
            tripwire_triggered=opinion_count >= 3
        )

    You can create something more powerful, but this simple example highlights how the SDK checks outputs against your guardrails.

    Finally, we’ll create our Agents and give them the tools, context, and guardrails. Here’s what the Fact Checker Agent might look like:

    Python
    fact_checker_agent = Agent[ResearchContext](
        name="fact_checker_agent",
        model="gpt-4o",
        instructions="""You are a meticulous fact-checking agent. Your job is to:
        1. Review the research findings in the shared context
        2. Verify each statement using the verify_statement tool
        3. Consolidate verified findings using save_verified_research
        4. Be skeptical and thorough - only approve statements with sufficient evidence

        For each finding, check if the source is credible and if the statement contains verifiable
        facts rather than opinions or generalizations.
        """,
        tools=[verify_statement, save_verified_research],
        output_guardrails=[fact_based_guardrail],
        handoff_description="Verifies research findings for accuracy and proper sourcing"
    )

    Our Triage Agent, which manages the whole process, would also have handoffs defined in its parameters:

    Python
    from agents import handoff

    triage_agent = Agent[ResearchContext](
        name="triage_agent",
        model="gpt-3.5-turbo",
        instructions="""You are a research coordinator who manages the research process.
        For any research query:
        1. First, hand off to the researcher_agent to gather information
        2. Then, hand off to the fact_checker_agent to verify the findings
        3. Finally, hand off to the writer_agent to create the final content

        Monitor the process and ensure each specialized agent completes their task.
        """,
        handoffs=[
            handoff(researcher_agent),
            handoff(fact_checker_agent),
            handoff(writer_agent)
        ],
        output_type=FinalContent,
        handoff_description="Coordinates the research process across specialized agents"
    )

    And finally, we write the main function to run the whole process:

    Python
    from agents import Runner, RunConfig

    async def run_research_system(topic: str) -> FinalContent:
        """Run the multi-agent research system on a given topic."""
        # Create the shared context
        context = ResearchContext(topic=topic)

        # Configure the run with tracing enabled
        config = RunConfig(
            workflow_name=f"research_{topic.replace(' ', '_')}",
            tracing_disabled=False
        )

        # Run the triage agent with the initial query
        result = await Runner.run(
            triage_agent,
            f"Research the following topic thoroughly: {topic}",
            context=context,
            run_config=config
        )

        return result.final_output
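
    And to kick it off from a script (a sketch, assuming all the agents and tools above are defined):

    Python
    import asyncio

    if __name__ == "__main__":
        content = asyncio.run(run_research_system("quantum computing in drug discovery"))
        print(content.title)
        print(content.introduction)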

    Try It Yourself

    If you’re eager to explore the Agents SDK yourself, the process is straightforward:

    1. Install the SDK via pip: pip install openai-agents
    2. Check out the official documentation
    3. Explore the GitHub repository for examples and contributions

    The documentation is comprehensive and includes numerous examples to help you understand the SDK’s capabilities and implementation patterns.

    Your Next Steps

    As we venture further into the age of agentic AI, tools like the Agents SDK will become increasingly valuable. Whether you’re looking to automate complex workflows, create specialized assistants, or explore the frontiers of AI capability, this toolkit provides an excellent foundation.

    I encourage you to dive in and experiment with the Agents SDK for your projects. If you’re working on something interesting or need guidance on implementation, don’t hesitate to reach out. I’m particularly interested in hearing about novel applications and creative uses of multi-agent systems.

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

  • The Ultimate Guide to Model Context Protocol, Part 3: Tying It All Together

    The Ultimate Guide to Model Context Protocol, Part 3: Tying It All Together

    The MCP Series

    This is the third and final part of the Model Context Protocol Series. Part 1 and Part 2 can be found here.

    In our previous posts, we looked at what exactly MCP is and how it works behind the scenes. Now it’s time to tie it all together and build some useful stuff with MCP.

    Before we begin, I’m going to let you in on a little secret. I used Claude with MCP to help me write this series. Shocker!

    Claude has always been my favourite for creating content. You can read more about how all the different AI chatbots and tools fit in my stack. Before MCP, I’d have to do a bunch of research on my own, then feed Claude all that context before working with it to craft a post.

    That’s a lot of work, so I decided to build my own AI agent to automate the research as well. I used Exa to find resources to include and sent them to Claude via the API. But the agent was a bit limited too, and I missed the chat experience of the Claude app: man and machine working together in harmony to write a blog post.

    Now, with MCP, I can do exactly that. This is the first of many examples of building AI workflows with MCP that I’m going to share with you today. Let’s dive in.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    1. From Chat to Blog Post

    I’m not a fan of telling the AI to write a blog post and then directly publishing it without any human oversight or input. That’s just AI slop.

    Plus, I write because I want my thoughts and opinions to be read and critiqued by others. I get no satisfaction and will not learn if an AI does everything for me.

    So I use AI (Claude specifically) as my writing partner. It’s not a linear process. I start with some thesis or topic I want to explore (like this MCP series) and use Claude to help me create an outline.

    Once I’m satisfied with the outline, I do some research to back up what I want to say. Exa has an official MCP server that I’ve installed, allowing Claude to search the web and pull interesting content.

    I use Exa for a number of reasons that I’ve listed in my Guide to Exa, the primary one being it tends to pull in much deeper and richer content, including research papers, which is great for writing technical content.

    I also use the Brave MCP and a custom Twitter MCP I built for myself. Brave gets me more recent news type results and Twitter pulls in trending conversations about said topic.

    After Claude has pulled these results, I read through them and may tweak the outline again based on the new information.

    At this point, I’m ready to write the actual article. I have a thesis or topic, a structure to present said topic, and real data or resources to support me.

    The process of writing is an even longer back and forth. I ask Claude to flesh out the first chapter, then I make edits, and we go back and forth until I’m satisfied.

    I’ll often ask Claude to poke holes in it or provide counter-arguments. I’ll also try to include the research or even ask Claude to pull in more specific research for this chapter.

    I repeat this for every chapter until the whole article is written. At this point, we have a fully written article, with a clean structure and headers, and links to supporting data or resources.

    Finally, with the WP MCP server (I built one for myself), I ask Claude to post it. It first asks for my permission, and then it pushes the content to my WordPress site (this blog you’re reading) in draft status. I then log in to WP, make any final tweaks and edits, and hit publish.

    And that, dear reader, is how I wrote these very words you’re reading right now!


    2. From Chat to GitHub Repo

    Quite similar to the way you can chat with Claude to create WordPress blog posts, you can also work with GitHub repositories using the GitHub MCP:

    Me: What changed in our authentication system between the last release and now?

    Claude: [Using Git MCP] I see 7 commits affecting authentication since v2.1.0. The most significant changes include:
    1. Migration from JWT to OAuth2 for API authentication
    2. Implementation of refresh token rotation
    3. Addition of 2FA for admin accounts

    The primary developer was @jsmith, and all changes passed CI. Would you like me to explain any specific change in detail?

    This isn’t just a demo—it’s how developers are working right now. As shown in “This Claude MCP GitHub AI AGENT Changes EVERYTHING!”, MCP is revolutionizing how developers interact with their repositories.

    3. End to End Sales Agent

    This is a pretty cool workflow built by Robert Lukoszko, who has an MCP server that connects to his LinkedIn. For anyone who comments under a post, it extracts that person’s profile and researches them to learn more about them and find their email. He also has an MCP server connected to his email, so he can then send an email to the people who commented on that post.

    Essentially, it’s a very powerful sales agent that automatically goes through the comments on any LinkedIn post and then reaches out to the commenters via email.

    https://twitter.com/Karmedge/status/1899825488379392333

    4. Customer Support Automation

    Karan Vaidya, from Composio, shared a couple of really cool customer support automations that he built out with Cursor and MCPs.

    The first one fetches customer support emails, analyzes the issue, solves it, and then responds to the email with the correct solution.

    If it can’t find a solution, it uses git blame to find the responsible dev, creates a ticket on Linear, and then notifies the developer on Slack about the issue.

    5. Other Creative Applications

    Beyond the expected use cases, I’ve been amazed by the creative ways people are applying MCP technology.

    Personal Knowledge Management

    David Teren’s Claude Server MCP transforms knowledge management with persistent context across conversation sessions:

    Me: What connections exist between my project notes on renewable energy and my reading notes on urban planning?
    
    Claude: [Using Claude Server MCP] I found 7 conceptual links between your renewable energy and urban planning notes. The strongest connections involve microgrids, community solar projects, and transit-oriented development. You explored similar funding mechanisms in both areas but haven't explicitly connected them in your notes. Would you like me to create a new note mapping these relationships?

    Advanced Document Research

    Alexander Komyagin built an Agentic RAG solution using MCP to fundamentally change how he interacts with documents (read his article here):

    Me: What do our financial projections say about Q3 revenue growth compared to analyst expectations?
    
    Claude: [Using Agentic RAG MCP] Let me examine your documents...
    
    After analyzing your Q3 projections and comparing with analyst reports from Bloomberg and Reuters, I've found a 5.7% discrepancy. Your internal projections show 12.3% growth while consensus estimates are 18%. The primary factors driving this difference appear to be more conservative supply chain recovery assumptions in your models.

    Instead of the traditional static retrieval systems, his MCP implementation lets Claude autonomously decide its search strategy and combine information from multiple document sources—just like a skilled researcher would.

    Automated 3D Designs

    Siddharth Ahuja connected the Blender MCP server to Claude to create 3D designs and scenes with just a few prompts.

    https://twitter.com/sidahuj/status/1899460492999184534

    Learning Helper

    Giacomo connected Claude to Rember, a tool to create flashcards, to help him automatically create flashcards to remember important concepts while chatting with Claude.

    What Will You Build?

    What makes MCP revolutionary isn’t just the technology itself, it’s the connections it enables. Right now, only Claude, Cursor and a handful of other AI companies have MCP clients that enable you to extend their functionality by adding MCP servers.

    But imagine what happens as more companies build out MCP clients. The ability to seamlessly integrate AI into your existing workflows, tools, and data sources creates a multiplicative effect that transforms productivity.

    As you explore MCP for yourself, start with a simple question: “What tasks take up most of my time but don’t require my unique human judgment?” Those are perfect candidates for automation.

    The real power comes when you combine multiple MCP servers. Need to analyze data, create visualizations, and publish findings to your blog? Chain together database, visualization, and WordPress MCPs for a workflow that would have been science fiction just months ago.

    And if you need help building these out, drop your email below and I’ll reach out.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • The Ultimate Guide to Model Context Protocol, Part 2: Behind The Scenes of MCP

    The Ultimate Guide to Model Context Protocol, Part 2: Behind The Scenes of MCP

    The MCP Series

    This post is part 2 of my “Ultimate Guide to Model Context Protocol” series. You can read part 1 here and part 3 here.

    In our previous post, we introduced the Model Context Protocol (MCP) and how it transforms our dear Claude from a knowledgeable yet impotent AI into a helpful digital butler who can actually interact with your files, apps, and services. Now it’s time to draw back the curtain and have a gander at the magic behind it.

    Don’t worry—we’ll keep things simple and jargon-free, dishing out plenty of analogies to explain the technical concepts like my Uncle dishes out expletives when India cocks up a cricket match. By the end of this post, you’ll understand what makes MCP tick and how you can start exploring different MCP servers for your specific needs.

    How MCP Works

    Remember our analogy of MCP as a universal translator between AI and your digital world? Let’s expand on that to understand what’s happening behind the scenes.

    The MCP Architecture Explained

    At its core, MCP follows what tech folks call a “client-server architecture.” This is exactly how computers work with the internet. The browser on your computer is the “client”. It retrieves and displays information from a “server” over the internet via a protocol called HTTP.

    The Model Context Protocol is similar. Let’s say you’re enjoying a cold Sunday evening by the fire in the study of your manor, as one does. It’s a high-tech study with a built-in AI assistant. You ask the assistant to have some hot toddy sent over:

    1. The Host (where you chat with AI)

    • This is an application like Claude Desktop or Cursor where you interact with an AI assistant.
    • In our analogy, this is the study of your manor.

    2. The MCP Client (the translator)

    • This is built into the host application. So the engineers at Claude and Cursor need to build this first for the whole thing to work.
    • It translates between what the AI understands and what MCP servers speak.
    • You never interact with this directly; it works behind the scenes like HTTP.
    • In our analogy, it’s an upgrade module for your study that allows your AI assistant to communicate with other parts of your manor, such as the bar.

    3. MCP Servers (specialized helpers)

    • Each server is like a specialist with access to specific resources.
    • One server might know how to work with files, another with Slack, and so on.
    • Servers can be on your computer or connect to online services.
    • In our analogy, the bartender who makes the hot toddy and brings it over to you is the server.

    4. Tools (actions your AI takes via servers)

    • These are the functions available to the AI on the server.
    • A document server may have a read_file action that the AI can invoke to read a specific file.
    • In our analogy, the tool is the ability to prepare a libation.

    5. Resources (your digital stuff)

    • The actual files, apps, and services the AI needs to access
    • Could be local (on your computer) or remote (on the internet)
    • In our analogy, these are the ingredients that go into making the hot toddy. I prefer a spot of Cognac myself.

    If you enjoyed this analogy, I have plenty more for you. Be a dear and sign up to my newsletter.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    A Day in the Life of an MCP Request

    OK, enough with the analogies. To really understand how this works, let’s follow what happens when you ask your AI assistant to summarize a document and send it to Slack:

    1. You make a request to Claude: “Be a good sport and summarize the quarterly_report.pdf on my desktop. Oh and while you’re at it, post the key points to the #team-updates Slack channel”
    2. Claude recognizes this requires access to both files and Slack, so it needs to use MCP
    3. The MCP Client activates and connects to two different MCP servers:
      • The File System MCP Server (to access the PDF)
      • The Slack MCP Server (to post the message)
    4. Permissions are checked:
      • The File System server asks: “Allow Claude to read quarterly_report.pdf?”
      • The Slack server asks: “Allow Claude to post to #team-updates?”
      • You approve both requests
    5. The File System server retrieves the PDF content and sends it back through MCP
    6. Claude processes the document and creates a summary
    7. The Slack server takes Claude’s summary and posts it to your team channel
    8. You receive confirmation that the task is complete

    All of this happens in seconds, with the complex technical work hidden from view. The beauty of MCP is that it handles all the complicated connections while maintaining security and giving you control.

    The Technology That Powers MCP

    Now that you understand the basic flow, let’s demystify some of the technology that makes MCP possible:

    The Protocol Itself

    The Model Context Protocol is what tech people call an “open standard.” This means:

    • It’s publicly documented so anyone can build with it
    • It follows consistent rules for communication
    • It’s designed to be secure from the ground up

    Think of it like the rules of the road—all vehicles (or in this case, different software) follow the same rules, allowing smooth traffic flow.

    Security Measures

    MCP takes security seriously with several built-in protections:

    Permission-Based Access

    • Nothing happens without your explicit approval
    • Permissions are fine-grained (specific to each action)

    Sandboxing

    • Each MCP server is isolated from others
    • If one server has a problem, it doesn’t affect the rest

    Audit Trails

    • All actions are logged so you can see what happened
    • Useful for troubleshooting or monitoring usage

    Real-Time Communication

    MCP uses modern, efficient methods for passing information back and forth:

    • It’s designed for low latency (minimal delays)
    • It handles both simple requests and large data transfers
    • It manages two-way communication seamlessly

    This means you don’t have to wait long for results, even when dealing with complex tasks involving multiple systems.

    MCP Servers: The Building Blocks of AI Integration

    MCP servers are the workhorses of the system. Each one is specialized for a specific purpose, and you can mix and match them based on your needs.

    Types of MCP Servers

    MCP servers generally fall into a few categories:

    1. Local Resource Servers

    • Access things on your computer
    • Examples: File System, Local Database, Browser Control

    2. Communication Servers

    • Connect to messaging and social platforms
    • Examples: Slack, Email, Bluesky

    3. Productivity Servers

    • Integrate with work tools
    • Examples: GitHub, Google Drive, Calendar

    4. Information Servers

    • Fetch and process data
    • Examples: Weather, Search, Wikipedia

    5. Specialized Servers

    • Handle niche needs
    • Examples: 3D Printer Control, Smart Home

    Where to Find MCP Servers

    In the previous post, I mentioned a few of the top MCP servers. If you’re looking for more, there are several places to discover and download MCP servers:

    1. Official MCP Servers Repository
    2. MCP.so Directory
    3. Glama.ai
    4. Composio
    5. Cursor Directory
    6. Awesome MCP Servers
    7. Fleur MCP App Store
    8. MCP Run
    9. Smithery

    Setting Up Your Own MCP Server

    While most people will simply use existing MCP servers, you might be curious about how they’re created. Or perhaps you can’t find one and want to build your own. Here’s a simplified explanation:

    What You Need to Create an MCP Server

    If you’re not a developer, you probably won’t be creating your own MCP servers. But understanding what goes into them can help you appreciate what they do:

    1. Programming Skills

    • Knowledge of languages like Python and JavaScript
    • Understanding of APIs and web services

    2. Development Tools

    • MCP SDK (Software Development Kit)
    • Required libraries and dependencies

    3. Access to Resources

    • API keys for external services
    • Documentation for the systems you’re connecting to

    For the Technically Curious: A Simple Example

    Here’s what a very basic MCP server might look like in concept (this is simplified pseudocode):

    JavaScript
    // Define what the server can do
    server.addCapability("read-weather-forecast", {
      description: "Gets the weather forecast for a location",
      parameters: {
        location: "The city or area to get the forecast for",
        days: "Number of days to forecast"
      },
      securityLevel: "requires-approval"
    });
    
    // Implement the actual functionality
    server.onRequest("read-weather-forecast", async (request) => {
      // Get the forecast from a weather service
      const forecast = await weatherAPI.getForecast(
        request.parameters.location, 
        request.parameters.days
      );
      
      // Return the results
      return {
        current: forecast.current,
        daily: forecast.daily,
        warnings: forecast.alerts
      };
    });
    
    // Start listening for connections
    server.start();
    

    This simplified example shows how an MCP server:

    1. Defines what capabilities it offers
    2. Specifies what parameters are needed
    3. Sets security requirements
    4. Implements the actual functionality
    5. Returns results in a structured format

    In reality, MCP servers are more complex, with proper error handling, security features, and optimization—but this gives you a sense of their basic structure.
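
    If you’re curious what this looks like for real, here’s a minimal sketch using FastMCP from the official MCP Python SDK. Treat it as a sketch: the forecast logic is stubbed out, and a real server would call an actual weather API.

    Python
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("weather")

    @mcp.tool()
    def read_weather_forecast(location: str, days: int) -> dict:
        """Get the weather forecast for a location."""
        # A real server would call a weather API here
        return {"location": location, "days": days, "forecast": "sunny all week"}

    if __name__ == "__main__":
        mcp.run()  # serves over stdio so hosts like Claude Desktop can connect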

    Connecting Multiple MCP Servers: The Power of Combination

    One of the most powerful aspects of MCP is the ability to use multiple servers together. This creates workflows that would otherwise require complex programming.

    Example: A Research Assistant Workflow

    Imagine you’re researching a topic and want AI help. With multiple MCP servers, you could:

    1. Use the File System server to scan your existing notes
    2. Use the Browser Control server to search for new information
    3. Use the Wikipedia server to verify facts and get background
    4. Use the Google Drive server to save your findings
    5. Use the Slack server to share insights with colleagues

    All of this could be accomplished with a single request to your AI assistant, with each server handling its specialized part of the task.
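    To make that concrete, here's what the claude_desktop_config.json file from part 1 might look like with three of those servers wired up. This is a sketch: the filesystem, Brave Search, and Slack packages come from the official servers repository, but check each server's README for the exact arguments and environment variables it expects.

    JSON
    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/<your-username>/Notes"]
        },
        "brave-search": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-brave-search"],
          "env": { "BRAVE_API_KEY": "<your-key>" }
        },
        "slack": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-slack"],
          "env": { "SLACK_BOT_TOKEN": "<your-bot-token>", "SLACK_TEAM_ID": "<your-team-id>" }
        }
      }
    }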

    Common Questions About MCP Servers

    “Are MCP servers safe to install?”

    MCP servers from reputable sources follow strict security protocols. Stick to official directories and well-reviewed options. Each server will ask for specific permissions, so you always maintain control over what they can access.

    “How many servers should I install?”

    Start with just the ones you need for your common tasks. You can always add more later. Most users begin with the File System server and add others as needed.

    “Will MCP servers slow down my computer?”

    Most MCP servers use minimal resources when idle and are designed to be efficient. If you’re not actively using them with your AI assistant, they have very little impact on performance. I’ve noticed, however, that it does slow down my Claude Desktop app if I add too many.

    “Can I use MCP servers with any AI assistant?”

    Currently, MCP works with compatible hosts like Claude Desktop and Cursor. As the protocol gains popularity, more AI applications are likely to support it.

    What’s Next on Your MCP Journey

    Now that you understand how MCP works behind the scenes and what servers are available, you’re ready to start building your personalized AI workspace.

    In my next post in the series, I’ll provide a hands-on guide to building out useful agentic workflows with Claude and MCP servers. I’ll walk through the setup process with screenshots and troubleshooting tips to ensure a smooth experience.

    Sign up below and stay tuned for it!


  • The Ultimate Guide to Model Context Protocol, Part 1: What is MCP

    The Ultimate Guide to Model Context Protocol, Part 1: What is MCP

    The MCP Series

    This post is part 1 of my “Ultimate Guide to Model Context Protocol” series. You can read part 2 here. Stay tuned for more posts.

    Well hello there! I take it you’ve been hearing about this MCP business online and have meandered over to my humble website looking for answers. This post shall shed some light on the entire affair.

    Today’s AI assistants, like Claude, Grok, and ChatGPT, are a clever lot, brimming with facts and ready to toss out answers faster than you can say “what ho!” to any query you lob their way. But when it comes to rolling up the sleeves and actually doing something for you, well, they fall short.

    It’s like having a butler who’s all ears and sage nods, but when you cry, “Sort out my blasted emails!” or “Fish up those receipts before the taxman comes calling!” he merely blinks and offers a sympathetic, “Quite so, sir,” without lifting a finger.

    You’d get rid of him really fast.

    That’s where the Model Context Protocol comes in. Developed and open-sourced by Anthropic in November 2024, MCP is a new standard that lets your AI butler connect to your data, or any other siloed data source, and take actions in a secure manner.

    In this post, the first of a series on MCP, we’ll cover what it is, why it’s different from an API call or integration, and how you can get started with using it in just a few minutes.

    Sign up if you want to know when I release the next post in the series.


    What is MCP?

    The Model Context Protocol (MCP) is like a universal translator between AI models and your digital world. Just as USB-C provides a standardized way to connect your devices to various accessories, MCP provides a standardized way for AI to securely access and work with your files, apps, and online services.

    Don’t worry about the technical jargon—here’s what you need to know:

    MCP Host: The application where you interact with AI (like Claude Desktop). Think of this as the “home base” where you chat with your AI assistant.

    MCP Server: A special program that gives AI access to specific resources (like your files or Slack). Each server is like a specialized tour guide that knows one area extremely well.

    MCP Client: The behind-the-scenes connector that lets the host talk to servers. You don’t need to worry about this part—it works automatically.

    How Is It Different from an API or Integration?

    Ok so essentially MCP is a way for Claude to talk to your data or some external service. Isn’t that literally what an API or an integration does? Why are we complicating this?

    Well, first of all, MCP sounds cooler than API.

    But yes, you could do this with an API call, except it’s complicated. For starters, you’d need to know how to code and make an API call. Then you’d need to configure Claude or another AI assistant to actually make that API call. And then you’d need to repeat that for everything you want it to access – your files, your email, your Slack, and so on. Exhausting, what?

    Why doesn’t Anthropic just build integrations to all these apps instead? Well, again, that’s a lot of work. So they’ve basically just pawned off all that work to the developer community to build MCP servers.

    It’s a bit of a middle ground, but still very simple for the end user. You find an MCP server by a third party that does the thing you want it to do, you tell Claude to use that MCP server, and Bob’s your uncle, job done.

    Can’t find an MCP server? Make your own (we’ll get to how in a later post in this series).

    In fact, some MCP servers are actually just wrappers over an API! But there are additional benefits:

    1. Standardized Security and Control
      • MCP servers enforce strict access rules, requiring user approval for actions (e.g., a tool like write-file needs explicit consent). APIs, by contrast, rely on developers to implement security, which can vary widely.
      • Example: An MCP server accessing your Slack channels ensures the AI only reads what you allow, unlike an API token that might grant full access if not scoped properly.
    2. Two-Way Communication
      • MCP supports bidirectional data flow, enabling AI models to not just fetch data but also act on it. For instance, an MCP File System server can let an AI read a document, summarize it, and save the summary back—all within one protocol.
      • APIs typically require separate calls for each step, increasing complexity.
    3. AI-Specific Optimization
      • MCP provides “tools” (callable functions) and “prompts” (pre-written templates) that align with how AI models process information. For example, a weather MCP server might offer a get-forecast tool that returns data in a format an AI can easily digest, reducing preprocessing.
      • APIs deliver raw data (e.g., JSON), leaving it to developers to adapt it for AI use.
    4. Local and Remote Flexibility
      • MCP servers can connect to local resources, like your computer’s file system or a Chrome browser, as well as remote services, without you needing to build a web-based API.
      • Example: The Puppeteer MCP server controls a browser locally, while a Google Maps MCP server hits a remote API, blending both worlds.
    5. Simplified Integration
      • MCP standardizes how AI models interact with external systems, reducing the need for custom code per API. A developer can use one MCP client to connect to multiple servers (e.g., Slack, GitHub), whereas APIs require unique integrations for each.

    Practical Scenarios: API vs. MCP

    | Scenario | API Approach | MCP Approach | Why MCP Wins? |
    | --- | --- | --- | --- |
    | Fetch Weather Data | Call OpenWeather API, parse JSON | Use MCP weather server’s get-forecast tool | AI-ready output, less coding |
    | Manage Files | Build a local server with API endpoints | Use MCP File System server | Native local access, standardized |
    | Automate Slack | Use Slack API, handle rate limits, auth | Use MCP Slack server with approved actions | Secure, controlled interaction |
    | Analyze GitHub Issues | Multiple API calls to GitHub, custom logic | MCP GitHub server with tools like list-issues | Streamlined, two-way flow |

    Do You Need MCP?

    • If you’re just fetching data: Stick with APIs—they’re simpler for basic tasks like grabbing stock prices.
    • If you’re powering an AI: MCP shines when you need your AI to interact with the world, locally or remotely, in a secure, controlled way. For example, integrating Claude with your file system via MCP is safer and easier than building an API for it.

    What MCP Can Do For You: Real-World Examples

    Ok, hopefully I’ve convinced you that MCP is actually useful and not just Silicon Valley reinventing something that already exists.

    Now, let’s look at some real world examples:

    Personal Productivity

    • File Organization: “Claude, can you organize my downloaded files into folders by type and date?” With MCP, Claude can actually do this for you, not just tell you how, while you polish off your second donut of the morning.
    • Email Management: “Summarize all my unread emails from the bigwigs,” you plead. With MCP, Claude dives into the inbox, sifts through the missives, and delivers a pithy précis, perhaps even firing it off via Slack or a text.
    • Note Analysis: “Claude, cast an eye over my meeting notes from the past month and whip up an action plan, there’s a good chap.” With MCP, Claude rummages through your scribblings, plucks out the juicy bits, and adds todos to your task management app faster than you can blink.

    Information Access

    • Document Search: “Find me the skinny on our budget projections in my documents folder,” you command. With MCP, Claude ferrets through your private stash without so much as a whisper to the internet, emerging triumphant with the goods, like a bloodhound on the scent.
    • PDF Q&A: “What were the key recommendations in that report I nabbed yesterday?” you muse. Claude, armed with MCP, tracks down the PDF, pores over it like a don at his books, and serves up the answers with the precision of a well-aimed dart.

    Communication

    • Message Drafting: “Draft a Slack message to the troops summing up the quarterly report on my desktop,” you say. With MCP, Claude saunters over to your files, has a butcher’s at the report, and taps out a message with the finesse of a seasoned clubman penning a note to the committee.
    • Conversation Summaries: “What were the main thrusts of yesterday’s team chitchat?” you ask. Claude, with MCP as trusty steed, gallops through the chat logs and returns with a tidy summary, sparing you the bother of wading through the blather yourself.

    Web Search

    • Browse the internet: “Dig up the latest gossip on AI and give me the lowdown,” you request. With the Brave or Exa MCP servers, Claude scours the web and delivers a crisp rundown like a newsboy hawking the evening edition.
    • Find restaurants: “What are the top Thai eateries near my digs?” you wonder aloud. With the Google Maps MCP server in play, Claude not only unearths the finest curry houses but pops the addresses your way, like a cabbie with a knack for spice.

    By the way, I’m using Claude as an example, but any company can become an MCP Host and create their own client. This instantly opens up a world of possibilities for their users.

    Cursor, for example, also built an MCP client. So, just like with Claude, you can install a web scraper MCP and have Cursor scrape the most up-to-date documentation for a Python package to use in the code it generates.

    Top MCP Servers and What They Do

    MCP servers are the building blocks that give AI access to specific parts of your digital world. Here are the most popular ones:

    1. File System MCP Server: Lets AI safely work with files and folders on your computer.
    2. Slack MCP Server: Enables Claude to post messages, reply to threads, add reactions, and more in Slack.
    3. GitHub MCP Server: Helps manage code repositories and issues.
    4. Google Maps MCP Server: Enables location-based assistance.
    5. Brave Search MCP Server: Integrates the Brave Search API, providing both web and local search capabilities.

    You can find a list of servers on the MCP site. Each server handles one specific type of connection, and you can install exactly the ones you need.

    Getting Started in 10 Minutes

    Ready to try MCP yourself? Here’s how to get started:

    1. Download Claude for Desktop

    Right now, MCP servers are hosted locally (on your computer), so we need a local client as well. Download and install it from https://claude.ai/download

    After you install it, run it and log in to your Claude account.

    2. Install Node.js

    We’ll have to install Node.js for the same reason we’re installing Claude for Desktop: we’re running everything locally, and Node helps us load and run the servers.

    Go to Nodejs.org and follow the instructions to do so.

    3. Install your first MCP server

    We’re going to start with the File System server. It’s created by Anthropic and allows Claude to access files on your computer.

    The first thing you need to do is click on Claude and then Settings. Go to the Developer section and hit Edit Config.

    This will then open up a folder where you’ll find a file called claude_desktop_config.json. It’s an empty file right now with a pair of curly braces {}.

    Remove those braces and paste this in instead:

    JSON
    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-filesystem",
            "/Users/<add your username here>/Documents",
            "/Users/<add your username here>/Downloads"
          ]
        }
      }
    }

    This configuration file tells Claude for Desktop that we have one MCP server, called “filesystem” and that it should use Node to install and run @modelcontextprotocol/server-filesystem. This server, described here, will let you access your file system in Claude for Desktop.

    It also lists out the folders it can access. Be sure to add the correct path names (on Mac, that is usually “/Users/your-username/Documents”).

    4. Try it out!

    After you save the config file, restart the Claude for Desktop application. It may take a few moments for it to start but when it does, you’ll see a little hammer icon in the bottom right corner of your chat box.

    That’s our MCP tool list! We only installed one server, but it comes pre-packaged with 11 tools like creating a directory, editing files, and so on. Now you see why it’s so cool? Imagine having to build all of this yourself.

    Let’s give it a test drive. I’ve given Claude access to a folder called Code where I store all my coding projects locally. I’m going to ask Claude to generate Hello World code in python and save that as a file in my Code folder.

    Isn’t that cool? It seems simple now, but I can extend this to having Claude generate multiple files, organize them into different folders, and even push them to GitHub if it’s a coding project, all from my chat window.

    What’s Next?

    Creating MCP was a genius move by Anthropic. They were losing the consumer race to OpenAI, who are building integrations like web search. So they’ve leaned into their developer focus, having their community build MCP servers that instantly extend Claude’s capabilities far beyond ChatGPT’s.

    Now that you understand why MCP matters for making AI truly helpful in your digital life, part 2 of this series will take you behind the scenes.

    We’ll explore exactly how MCP works its magic, featuring a day-in-the-life scenario showing how different MCP servers can work together to accomplish tasks you never thought possible without programming knowledge.

    Read it here and sign up below for more posts!


  • GPT4.5: A Complete Review and How It Compares To Others

    GPT4.5: A Complete Review and How It Compares To Others

    OpenAI finally released GPT-4.5, hot on the heels of new SOTA models from Anthropic and xAI. As always, OpenAI hyped it up in the lead-up to the launch.

    Sam Altman himself fueled the flames of expectation, describing it as “the first model that feels like talking to a thoughtful person to me” and hinting at capabilities edging closer to artificial general intelligence than ever before.

    https://twitter.com/sama/status/1895203654103351462

    As you can see, it’s a giant and expensive model. Andrej Karpathy reckons that each 0.5 increment in the GPT series represents a 10X increase in training compute. So 4.5 needed 10X more compute to train than 4, which in turn needed 10X more than 3.5, and so on.

    So it’s reasonable to expect some sort of step jump from 4 to 4.5, the same way we saw with previous upgrades, right? Right?

    Let’s Have A Look At The Numbers

    For all the computational resources poured into GPT-4.5, the performance improvements over GPT-4 are surprisingly modest. Let’s examine the actual benchmark data:

    On the Massive Multitask Language Understanding (MMLU) test – a comprehensive evaluation of knowledge across domains – GPT-4.5 scored approximately 89.6% versus GPT-4’s already impressive 86.4%. That’s a small improvement for what likely represents a 10X increase in computational resources.

    The pattern of modest gains continues across other benchmarks:

    • HumanEval (coding generation): GPT-4.5 achieves 88.6% accuracy, only slightly edging out GPT-4’s already near-human performance at 86.6%
    • MGSM (math problems): GPT-4.5 shows comparable performance to GPT-4 (86.9% vs 85.1%), with only modest improvements
    • DROP (reasoning): GPT-4.5 scored 83.4%, a little better than GPT-4’s 81.5%.

    The other interesting thing is that some of these scores are lower than those of OpenAI’s smaller, specialized reasoning models, especially the o3 series, which scores above 90% on some of these tests.

    So the data tells us that GPT-4.5 is better than GPT-4, but only incrementally so – and in some domains, it’s outperformed by more specialized, less computationally intensive models.

    Now, some people say that the benchmarks aren’t the best tests and that we need better ones. And it can be argued that at such high numbers, every 1% increase is significant.

    Ok, I agree. To me, the real test of how good a model is comes down to whether the end user (you and I) finds it valuable. So, let’s judge for ourselves.

    The “Emotional Intelligence” Test: You Be the Judge

    The most intriguing claim about GPT-4.5 is its supposedly enhanced “emotional intelligence” and conversational abilities. Sam Altman’s assertion that it feels like “talking to a thoughtful person” suggests a qualitative leap in how the model handles nuanced human interaction.

    On Twitter, Andrej Karpathy ran GPT-4.5 and GPT-4o through the same set of questions and asked his audience which gave better results.

    https://twitter.com/karpathy/status/1895337579589079434

    I took inspiration from that and decided to give similar tests to GPT-4.5 and four other SOTA models for comparison: Claude 3.7, Grok 3, Gemini 2.0 Flash, and Meta’s Llama 3.3.

    To run this test, I built a little app that uses the APIs of all these models simultaneously and also calculates the response time and cost. This adds a layer of objectivity to the responses. If two models give me the same answer and one was faster and cheaper, that’s the better model.
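    My app is a bit more elaborate, but the core of it is just this: send the same prompt to each model, time the call, and price the token usage. Here’s a minimal sketch for the OpenAI models (the other providers follow the same pattern with their own SDKs); the prices are illustrative placeholders, since they change often.

    Python
    import time
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    # Illustrative prices in USD per million input/output tokens; check current rates
    PRICES = {"gpt-4.5-preview": (75.00, 150.00), "gpt-4o": (2.50, 10.00)}

    def ask(model: str, prompt: str):
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        elapsed = time.perf_counter() - start
        in_price, out_price = PRICES[model]
        usage = response.usage
        cost = (usage.prompt_tokens * in_price + usage.completion_tokens * out_price) / 1_000_000
        return response.choices[0].message.content, elapsed, cost

    for model in PRICES:
        answer, secs, cost = ask(model, "Describe a color that doesn't exist but would be beautiful if it did.")
        print(f"{model}: {secs:.1f}s, ${cost:.4f}")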

    Here are some examples of responses:

    Q1: Invent a new literary genre blending cyberpunk, magical realism, and ancient mythology. Briefly describe the genre, name it, and provide a short sample narrative

    Q2: Describe a color that doesn’t exist but would be beautiful if it did.

    Q3: How would you console someone who just lost their job after 20 years at the same company?

    Q4: Analyze this statement for underlying emotions: ‘I’m fine with whatever you want to do. It doesn’t matter to me. You decide.’

    Q5: A self-driving car must decide between hitting three elderly pedestrians or swerving and hitting a child. Discuss the moral complexities.

    Here’s the full video if you want to see all the questions, answers, response times and costs.

    My Opinion

    I think Gemini and Meta do really well (surprisingly well) across the board. Meta got the math question wrong (which you can see in the video), but I loved its detailed answers to the creative and EQ questions. Gemini made an assumption with the Burberry question but got it right.

    If you factor in the response times and costs, my winner here is Gemini Flash 2.0, with Meta Llama a close second. That being said, OpenAI’s o3 is still the best for reasoning, while Claude and Grok are the best for coding.

    The Price of Incremental Progress

    I don’t know about you, but I wouldn’t say 4.5 is any better than the other leading models, especially considering how slow and expensive it is.

    Which brings us to its cost. If you noticed in the video, I am also tracking how much each API call costs. OpenAI has priced GPT-4.5 at $75 per million input tokens and $150 per million output tokens – roughly 15 times more expensive than GPT-4o and other SOTA models.

    For perspective, a typical business use case involving moderate API usage could easily cost thousands of dollars per month on GPT-4.5, compared to hundreds for GPT-4o. Even access through ChatGPT initially required subscribing to the premium ChatGPT Pro tier at $200 per month, although they say it will soon be available at lower tiers.

    Credit Where It’s Due: Real Improvements

    Despite the underwhelming benchmark performance and concerning cost structure, GPT-4.5 does deliver meaningful improvements in two key areas: context window size and factual accuracy.

    The expanded context window of 128,000 tokens (quadrupling GPT-4’s 32,000) represents a genuine breakthrough for applications involving long documents or complex, multi-step interactions. Analysts, researchers, and content creators can now process entire reports, books, or codebases in a single prompt, eliminating the need for chunking and summarization workarounds.

    More impressive is the reduction in hallucinations – those plausible-sounding but factually incorrect outputs that have plagued large language models since their inception. On OpenAI’s internal “SimpleQA” evaluation, GPT-4.5 delivered the correct answer 62.5% of the time, compared to only 38% for GPT-4. Its hallucination rate nearly halved, from approximately 62% to 37%.

    This improved factual reliability could prove transformative for certain high-stakes applications in medicine, law, or finance, where accuracy is paramount. It represents a genuine step toward more trustworthy AI systems, even if the overall intelligence gain is modest.

    Making the Business Decision: When Is GPT-4.5 Worth It?

    For organizations weighing whether to adopt GPT-4.5, the decision comes down to a careful cost-benefit analysis. The model may be justified in scenarios where:

    1. Factual accuracy is paramount – In medical, legal, or financial contexts where errors could have serious consequences, the reduced hallucination rate might justify the premium.
    2. Long-context processing is essential – Applications requiring analysis of entire documents or complex multi-step reasoning can benefit substantially from the 128k token context.
    3. Cost is no object – For high-value applications where performance improvements of even a few percentage points translate to significant business value, the price premium may be acceptable.

    However, for most general-purpose applications, the value proposition is questionable. Companies with limited budgets may find better returns by:

    • Sticking with GPT-4o for most tasks
    • Using specialized models for specific domains (like mathematics)
    • Exploring competing models like Claude 3.7 or Gemini Ultra, which offer similar capabilities at lower price points
    • Investing in prompt engineering and fine-tuning of more affordable models

    The Future of AI Scaling: Diminishing Returns?

    GPT-4.5’s modest performance improvements despite massive computational investment raise profound questions about the future of AI development. Are we witnessing the beginning of diminishing returns in scaling language models? Has the low-hanging fruit of parameter counting and dataset expansion been largely picked?

    If training costs keep scaling at this rate, GPT-5 will require 100X more compute to train than GPT-4, and GPT-6 10,000X more. The incremental improvement does not justify the cost.

    But there are a few things working in our favor. For starters, bigger is not necessarily better. Models like Meta’s Llama 3 and Mistral 7B show that smaller, highly optimized models can outperform massive models on certain tasks with much lower compute costs.

    We’re also seeing much better performance with Reasoning Models, which I covered in a previous blog post.

    All in all, it’s clear that throwing more compute at the problem isn’t the best solution, and we need newer techniques. And maybe we can’t get to AGI this way, but the fact is AI in its current state is already very useful (see agents!), and most people haven’t even scratched the surface with it yet.

  • How To Build Web-Aware AI Agents with Exa

    How To Build Web-Aware AI Agents with Exa

    Oh wait, you still use Google to search the internet? In this glorious age of reasoning AI? Come on now.

    Here’s the thing: if you’re building AI agents, Google won’t cut it. You don’t need 500,000 search results. You need just the right amount of information for your agent to do its job. Your agent needs something more powerful. That’s where Exa comes in.

    Exa is a specialized web search API designed specifically for AI applications. In this guide, I’ll walk you through everything you need to know about leveraging Exa to create powerful, web-aware AI agents that can perform complex tasks with real-world data. There are code examples you can copy and paste, too.

    But first, let’s look under the hood to see how it works.

    Why Exa for AI Agents?

    Exa positions itself as “a search engine made for AIs” and excels at retrieving high-quality, relevant web content to enhance AI applications.

    Key Advantages

    1. Designed for AI Integration: Unlike traditional search APIs, Exa is optimized for AI consumption, returning clean, parsed content that’s ready for processing by large language models (LLMs).
    2. Semantic Understanding: Exa offers neural search capabilities that understand the meaning behind queries, not just keywords, making it ideal for natural language interactions.
    3. Comprehensive Web Coverage: With very high availability across crucial categories like research papers, personal pages, news, and company information, Exa provides broad access to the web’s knowledge.
    4. Focused Results: Exa excels at finding specific entities like people, companies, and research papers—often delivering 20x more correct results than traditional search engines for complex queries.
    5. Verification Capabilities: Through its AI agent integration, Exa can verify and validate search results, ensuring higher accuracy for critical applications.


    How Exa Works Behind the Scenes

    At its core, Exa uses embeddings to transform web content into numerical vector representations. This allows it to understand the conceptual meaning of content rather than just matching keywords.

    Exa’s search engine consists of three main components:

    (1) Crawling & Indexing

    • Exa crawls the web to collect data, just like Google.
    • They identify and process URLs, storing document content in a structured format.
    • Unlike Google, which focuses on keyword-based indexing, Exa processes documents using AI models to understand content semantically.

    (2) AI Processing (Neural Search)

    • Instead of relying on traditional PageRank (which ranks results based on backlinks and domain authority), Exa uses a neural-based approach.
    • Exa’s link prediction model is trained to predict which links follow a given piece of text.
    • This means Exa learns the relationships between documents, similar to how transformers predict words in a sentence.

    Example: If an article says “Check out this aerospace startup” and links to spacex.com, Exa learns that “aerospace startup” is semantically related to SpaceX—even if the text doesn’t explicitly say “SpaceX.”

    (3) Search & Retrieval

    • At query time, Exa’s model predicts the most relevant documents, rather than relying on exact keyword matches.
    • The search query is processed like an LLM prompt, allowing for natural language queries instead of just keywords.

    Comprehensive Web Index

    Exa has indexed billions of web pages, focusing on high-quality content across various categories like research papers, personal websites, company information, and news sources. While this is smaller than Google’s trillion-page index, Exa has prioritized quality and relevance for specific use cases over sheer quantity.

    The index is maintained with special attention to categories that are particularly valuable for AI applications. For example, research papers, personal websites, and LinkedIn profiles have very high availability in Exa’s index, making it especially useful for finding specific entities and specialized information.

    Search Types and Processing

    Exa offers three main search types:

    1. Neural Search: This leverages the embedding technology for semantic understanding. When you use neural search, Exa finds content that is conceptually related to your query, even if it doesn’t contain the exact words. This is particularly effective for exploratory searches and complex concepts.
    2. Keyword Search: This more traditional approach focuses on finding exact word matches, which is useful for proper nouns or specific terminology. It’s optimized for precision when you know exactly what terms should appear.
    3. Auto Search: This lets Exa decide whether to use neural or keyword search based on the query, combining the advantages of both approaches.
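    Here’s how those three types look in the Python SDK (the queries are just examples):

    Python
    # Same client, three search types (assumes `pip install exa-py` and EXA_API_KEY is set)
    import os
    from exa_py import Exa

    exa = Exa(api_key=os.getenv("EXA_API_KEY"))

    neural = exa.search("startups working on fusion energy", type="neural")
    keyword = exa.search("Model Context Protocol", type="keyword")
    auto = exa.search("best papers on retrieval-augmented generation", type="auto")

    for result in neural.results[:3]:
        print(result.title, result.url)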

    Content Processing

    When retrieving full content from web pages, Exa doesn’t just return raw HTML. It processes the content to extract the most relevant text, removing navigational elements, ads, and other noise. This clean parsing makes the content immediately usable by language models without requiring additional cleaning steps.

    Exa can also generate AI summaries of content, identify highlights based on queries, and extract structured data from web pages, further enhancing its utility for AI applications.

    Technical Infrastructure

    Under the hood, Exa likely uses a combination of:

    1. Vector Databases: To store and efficiently query the embedded representations of web pages.
    2. Large Language Models: For query understanding, content summarization, and result verification.
    3. Distributed Computing: To handle the computational demands of embedding and searching billions of web pages.
    4. Crawlers and Parsers: To continually update its index with fresh content from the web.

    API Functionality

    The Exa API exposes this functionality through endpoints like:

    • /search for finding relevant web pages
    • /contents for retrieving and processing the content of specific pages
    • Features to filter by domain, date, category, and other parameters

    The API is designed to be easily integrated with language models through SDKs for Python and JavaScript, making it straightforward to incorporate web data into AI workflows.

    Getting Started with Exa

    Now that you have an understanding of how Exa works, we can get to building agents with it. To begin building with Exa, you’ll need to:

    1. Create an Account: Visit dashboard.exa.ai to register for an account.
    2. Obtain an API Key: Generate your API key from the dashboard at dashboard.exa.ai/api-keys.
    3. Install the SDK: Choose between the Python or JavaScript SDK based on your preferred development environment. When building pure AI agents, I use Python, so all code examples in this post will use Python.
    4. Set Up Environment Variables: Create a .env file to securely store your API key

    Basic Exa Implementation

    Here’s an example of how to initiate an Exa search in Python:

    Python
    import os
    from exa_py import Exa  # pip install exa-py
    from dotenv import load_dotenv
    
    # Load environment variables
    load_dotenv()
    
    # Initialize the Exa client
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    
    # Perform a basic search
    response = exa.search("Latest research in LLMs", type="auto")
    
    # Print the results (the response object holds a list of results)
    for result in response.results:
        print(f"Title: {result.title}")
        print(f"URL: {result.url}")
        print(f"Published Date: {result.published_date}")
        print(f"Score: {result.score}")
        print("---")

    As you can see, it’s pretty simple. You initialize the Exa client, pass it a string to search, and it returns a response containing a list of results. There are options to customize how the results come back (how many, what level of detail, etc.), which we’ll explore with a few more examples.

    Building Your First AI Agent with Exa

    Let’s walk through creating a basic research agent that can gather information on a specific topic and provide a summary.

    Python
    import os
    from exa_py import Exa
    from dotenv import load_dotenv
    from openai import OpenAI
    
    # Load environment variables
    load_dotenv()
    
    # Initialize clients
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    def research_agent(topic, num_results=5):
        """
        A simple research agent that gathers information on a topic and summarizes it.
        
        Args:
            topic: The research topic
            num_results: Number of search results to consider
        
        Returns:
            A summary of findings
        """
        # Step 1: Search for relevant information
        print(f"Searching for information on: {topic}")
        search_results = exa.search_and_contents(
            query=topic,
            num_results=num_results,
            text=True  # Retrieve the full text content
        )
        
        # Step 2: Extract and compile the content
        all_content = ""
        sources = []
        
        for i, result in enumerate(search_results.results):
            if hasattr(result, 'text') and result.text:
                all_content += f"\nSource {i+1}: {result.title}\n{result.text[:1000]}...\n"
                sources.append(f"{i+1}. {result.title} - {result.url}")
        
        # Step 3: Summarize with LLM
        prompt = f"""
        Based on the following information about "{topic}", provide a comprehensive summary:
        
        {all_content}
        
        Your summary should be well-structured, factual, and highlight the most important points.
        """
        
        response = openai_client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {"role": "system", "content": "You are a research assistant that summarizes information accurately."},
                {"role": "user", "content": prompt}
            ]
        )
        
        summary = response.choices[0].message.content
        
        # Compile final report (join the sources first: a backslash inside an
        # f-string expression is a SyntaxError before Python 3.12)
        sources_text = "\n".join(sources)
        final_report = f"""
        # Research Summary: {topic}
        
        {summary}
        
        ## Sources
        {sources_text}
        """
        
        return final_report
    
    # Example usage
    if __name__ == "__main__":
        result = research_agent("Advances in quantum computing in 2024")
        print(result)

    This agent demonstrates the core workflow of using Exa for AI agents:

    1. Our agent starts by taking a query from the user and searching with Exa
    2. We use the search_and_contents function to not just get back a list of URLs that best match our query but also the content within those pages.
    3. We then use an LLM (in this case, GPT-4) to analyze and summarize the findings and format them into a report.


    More Complex AI Agents

    As you become more comfortable with basic implementations, you can build more sophisticated AI agents for specific use cases. Let’s explore three powerful examples.

    1. Competitor Research Agent

    This agent automatically discovers and analyzes competitors for a given company, compiling insights into a structured report.

    Python
    import os
    import json
    from exa_py import Exa
    from dotenv import load_dotenv
    from openai import OpenAI
    
    load_dotenv()
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    class CompetitorResearchAgent:
        def __init__(self):
            self.exa_client = exa
            self.openai_client = openai_client
            
        def find_competitors(self, company_name):
            """Find potential competitors using Exa search"""
            query = f"Top competitors of {company_name}"
            response = self.exa_client.search(query, num_results=10, exclude_domains=["mydomain.com"])  # swap in your own domain
            return response.results
            
        def analyze_competitor(self, competitor_result):
            """Analyze a specific competitor based on web content"""
            # Get detailed content about the competitor (get_contents expects a list of IDs or URLs)
            contents = self.exa_client.get_contents([competitor_result.id], text=True)
            content_result = contents.results[0] if contents.results else None
            
            if content_result is None or not content_result.text:
                return {
                    "name": competitor_result.title,
                    "url": competitor_result.url,
                    "overview": "No detailed information available",
                    "products": [],
                    "strengths": [],
                    "weaknesses": []
                }
                
            # Use LLM to extract structured information
            prompt = f"""
            Based on the following content about a company, extract:
            1. Company name
            2. Brief overview (2-3 sentences)
            3. Main product offerings (up to 5)
            4. Key strengths (up to 3)
            5. Potential weaknesses (up to 3)
            
            Content:
            {content_result.text[:4000]}
            
            Format your response as JSON with the following structure:
            {{
                "name": "Company Name",
                "overview": "Brief overview",
                "products": ["Product 1", "Product 2", ...],
                "strengths": ["Strength 1", "Strength 2", ...],
                "weaknesses": ["Weakness 1", "Weakness 2", ...]
            }}
            """
            
            response = self.openai_client.chat.completions.create(
                model="gpt-4-turbo",
                response_format={"type": "json_object"},
                messages=[{"role": "user", "content": prompt}]
            )
            
            try:
                analysis = json.loads(response.choices[0].message.content)
                analysis["url"] = competitor_result.url
                return analysis
            except (json.JSONDecodeError, AttributeError):
                return {
                    "name": competitor_result.title,
                    "url": competitor_result.url,
                    "overview": "Analysis failed",
                    "products": [],
                    "strengths": [],
                    "weaknesses": []
                }
        
        def generate_report(self, company_name):
            """Generate a complete competitor analysis report"""
            print(f"Finding competitors for {company_name}...")
            competitors = self.find_competitors(company_name)
            
            if not competitors:
                return f"No competitors found for {company_name}"
                
            print(f"Found {len(competitors)} competitors. Analyzing each...")
            
            analyses = []
            for competitor in competitors:
                print(f"Analyzing {competitor.title}...")
                analysis = self.analyze_competitor(competitor)
                analyses.append(analysis)
                
            # Generate the final report
            report = f"# Competitor Analysis Report for {company_name}\n\n"
            report += f"## Executive Summary\n\n"
            
            # Use LLM to generate executive summary
            companies_list = ", ".join([a["name"] for a in analyses])
            summary_prompt = f"""
            Create a brief executive summary for a competitor analysis report for {company_name}.
            The identified competitors are: {companies_list}.
            Keep it under 150 words and highlight key insights.
            """
            
            summary_response = self.openai_client.chat.completions.create(
                model="gpt-4-turbo",
                messages=[{"role": "user", "content": summary_prompt}]
            )
            
            report += f"{summary_response.choices[0].message.content}\n\n"
            
            # Add detailed competitor analyses
            report += f"## Detailed Competitor Analysis\n\n"
            
            for analysis in analyses:
                report += f"### {analysis['name']}\n\n"
                report += f"**Website**: {analysis['url']}\n\n"
                report += f"**Overview**: {analysis['overview']}\n\n"
                
                report += "**Product Offerings**:\n"
                for product in analysis['products']:
                    report += f"- {product}\n"
                report += "\n"
                
                report += "**Key Strengths**:\n"
                for strength in analysis['strengths']:
                    report += f"- {strength}\n"
                report += "\n"
                
                report += "**Potential Weaknesses**:\n"
                for weakness in analysis['weaknesses']:
                    report += f"- {weakness}\n"
                report += "\n"
                
            return report
    
    # Example usage
    if __name__ == "__main__":
        agent = CompetitorResearchAgent()
        report = agent.generate_report("MyAwesomeStartup") #insert your company name here
        print(report)
        
        # Save report to file
        with open("competitor_analysis.md", "w") as f:
            f.write(report)

    There’s a lot going on here so let’s break it down.

    First we run a basic Exa search for “Top competitors of {company_name}” which in this case would be your company. We’re also setting a parameter for number of search results (in this case 10), which you can customize.

    We also exclude our own domain just in case, but you can add more domains to the exclusion list.
    
    We get the results back as a list, which we then loop through. You can filter it further at this point, but for each URL, we run exa.get_contents(), which simply scrapes that URL and returns the page content.
    
    We then use GPT-4 to analyze the content and turn it into a comprehensive competitor research report. This section is mostly prompt engineering, so feel free to play around and try some other prompts.

    2. Newsletter Generation Agent with CrewAI

    For more complex workflows, you can combine Exa with agent orchestration frameworks like CrewAI. This example creates a team of specialized agents that work together to generate a newsletter.

    Python
    import os
    from dotenv import load_dotenv
    from crewai import Agent, Task, Crew, Process
    from crewai_tools import EXASearchTool  # pip install 'crewai[tools]'
    
    load_dotenv()
    
    # EXASearchTool reads EXA_API_KEY from the environment; the agents' LLM
    # defaults to OpenAI, configured via OPENAI_API_KEY
    exa_search_tool = EXASearchTool()
    
    # Create specialized agents
    researcher = Agent(
        role="Research Specialist",
        goal="Find the latest and most relevant news on a given topic",
        backstory="You are an expert researcher who excels at finding accurate information online",
        verbose=True,
        allow_delegation=True,
        tools=[exa_search_tool],
    )
    
    fact_checker = Agent(
        role="Fact Checker",
        goal="Verify information accuracy and source credibility",
        backstory="You're a meticulous fact-checker with years of experience in journalism",
        verbose=True,
        allow_delegation=True,
        tools=[exa_search_tool],
    )
    
    writer = Agent(
        role="Newsletter Writer",
        goal="Create engaging, informative newsletter content",
        backstory="You're a talented writer who specializes in distilling complex topics into readable content",
        verbose=True,
        allow_delegation=False,
    )
    
    # Create tasks for each agent
    def create_newsletter_crew(topic):
        research_task = Task(
            description=f"Research the latest news about {topic} from the past week. Find at least 5 important developments or stories.",
            expected_output="A list of news items with titles, brief summaries, and source URLs",
            agent=researcher,
        )
    
        verification_task = Task(
            description="Verify the accuracy of each news item and evaluate the credibility of sources",
            expected_output="Verified list of news items with credibility scores for each source",
            agent=fact_checker,
            context=[research_task],
        )
    
        writing_task = Task(
            description=f"Create a newsletter about {topic} based on the verified research. Include an introduction, summaries of the top stories, and a conclusion.",
            expected_output="Complete newsletter in HTML format, ready to be sent",
            agent=writer,
            context=[verification_task],
        )
    
        # Create and run the crew
        crew = Crew(
            agents=[researcher, fact_checker, writer],
            tasks=[research_task, verification_task, writing_task],
            verbose=True,
            process=Process.sequential,
        )
        
        return crew
    
    # Example usage
    if __name__ == "__main__":
        topic = "Artificial Intelligence Advancements"
        crew = create_newsletter_crew(topic)
        result = crew.kickoff()
        
        with open(f"{topic.replace(' ', '_')}_newsletter.html", "w") as f:
            f.write(str(result))  # kickoff() may return a CrewOutput object, so cast to str
        
        print(f"Newsletter generated and saved to {topic.replace(' ', '_')}_newsletter.html")

    The research agent uses Exa to find relevant news, the fact-checker verifies information, and the writer compiles everything into a cohesive newsletter. You can tweak the instructions to structure the newsletter in a specific way.

    3. Recruiting Agent with Exa and OpenAI

    This agent automates the process of discovering, researching, and evaluating exceptional candidates for recruitment purposes.

    It’s a lot more complicated than the previous two and the code requires multiple files, so I haven’t added it here. I will describe the logic to it so that you can try it out on your own. If you need help, contact me!

    1. Candidate Discovery: We start with an Exa search for qualified professionals based on a job title and required skills, focusing on LinkedIn profiles and GitHub accounts. In the Exa API call, you can limit results to certain domains (see the sketch after this list).
    2. Comprehensive Research: For each potential candidate, it gathers information from their LinkedIn profile and then tries to find their personal websites or GitHub profiles for additional context.
    3. Intelligent Evaluation: Using GPT-4o, it evaluates each candidate against job requirements, scoring them on technical skills, experience, education, communication, and overall fit.
    4. Similar Candidate Finding: After identifying top performers, it uses Exa’s semantic search to find similar professionals, expanding the talent pool.
    5. Structured Reporting: It generates a comprehensive markdown report with an executive summary, detailed evaluations of each candidate, and recommendations.
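    As a taste, here's roughly what step 1 looks like. A minimal sketch: the job title, skills, and domain list are placeholders for your own search.

    Python
    import os
    from exa_py import Exa

    exa = Exa(api_key=os.getenv("EXA_API_KEY"))

    def discover_candidates(job_title, skills, num_results=10):
        """Step 1: find candidate profiles, limiting results to profile sites."""
        query = f"{job_title} with experience in {', '.join(skills)}"
        response = exa.search_and_contents(
            query,
            num_results=num_results,
            include_domains=["linkedin.com", "github.com"],  # placeholder domain list
            text=True,
        )
        return response.results

    for candidate in discover_candidates("senior machine learning engineer", ["PyTorch", "LLMs"]):
        print(candidate.title, candidate.url)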

    Additional Exa Features and Best Practices

    As you build more complex agents with Exa, try using these additional features and best practices to maximize effectiveness:

    Filtering and Refinement

    Exa offers powerful filtering options to narrow down search results:

    Python
    # Date filtering
    results = exa.search(
        "AI ethics developments",
        start_published_date="2024-01-01",
        end_published_date="2024-12-31"
    )
    
    # Domain filtering
    results = exa.search(
        "Climate change research",
        include_domains=["nature.com", "science.org"],
        exclude_domains=["wikipedia.org"]
    )
    
    # Category filtering
    results = exa.search(
        "Quantum computing advances",
        category="research paper"
    )

    Content Retrieval

    For deeper analysis, retrieve the full content of web pages, or an AI-generated summary:

    Python
    # Retrieve text content
    content = exa.get_contents(result_id, text=True)
    
    # Retrieve highlights based on a query
    highlights = exa.get_contents(result_id, highlights="quantum advantage")
    
    # Retrieve AI-generated summaries
    summary = exa.get_contents(result_id, summary=True)

    Working with Websets

    Exa’s Websets feature, launched in December 2024, transforms complex searches into structured datasets. This powerful tool helps you find specific entities like people, companies, and research papers with greater precision than traditional search engines.

    I’ve played around with it and found it to be really good for use cases like sales, HR and recruiting, and even finding founders to invest in. It doesn’t just bring you a list; it also researches each entry and verifies the information.

    Advanced AI Agents with Exa

    If you’ve made it this far, congrats. Hopefully you’ve tried some of the code samples above and even adapted them to your use case. In this section, I’m going to talk about even more complex builds. These are large projects I’ve built out for clients, and there’s too much code to add to this post, so I’m just going to explain the intuition behind them.

    Chain-of-Thought Research Agent

    One powerful pattern for research agents is the chain-of-thought approach, where the agent breaks down complex questions into sub-questions, researches each one separately, and then synthesizes the findings. Here’s how it works:

    1. Question Decomposition: When given a complex research question, the agent uses GPT-4 to break it down into 3-5 more focused sub-questions. Each sub-question targets a specific aspect of the main question, making the research more manageable and thorough.
    2. Sub-Question Research: For each sub-question, the agent uses Exa to search the web for relevant information. It gathers content from multiple sources, extracts the most important passages, and then uses GPT-4 to formulate a concise but comprehensive answer based strictly on the information found.
    3. Synthesis of Findings: After researching all sub-questions, the agent compiles the individual answers and uses GPT-4 to synthesize them into a cohesive response to the original question. This step ensures that connections between different aspects are identified and incorporated.
    4. Report Generation: Finally, the agent creates a structured research report with an executive summary containing the synthesized answer, followed by detailed findings for each sub-question with their respective sources properly cited.

    This approach mirrors how human researchers tackle complex topics—breaking down big questions into manageable parts, researching each thoroughly, and then connecting the dots to form a complete picture. It’s particularly effective for multifaceted questions that require exploring different angles or domains of knowledge.
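    To give you a feel for it, here's a compressed sketch of steps 1 and 2. The prompts are simplified stand-ins for what I actually use, and the production version adds retries, deduplication, and citation tracking.

    Python
    import os
    from exa_py import Exa
    from openai import OpenAI

    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    def decompose(question):
        """Step 1: break a complex question into focused sub-questions."""
        response = llm.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": f"Break this research question into 3-5 focused sub-questions, one per line:\n{question}"}],
        )
        return [line.strip() for line in response.choices[0].message.content.splitlines() if line.strip()]

    def research(sub_question):
        """Step 2: search with Exa, then answer strictly from the evidence found."""
        results = exa.search_and_contents(sub_question, num_results=3, text=True)
        evidence = "\n\n".join(r.text[:1500] for r in results.results if r.text)
        answer = llm.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": f"Answer strictly from this evidence:\n{evidence}\n\nQuestion: {sub_question}"}],
        )
        return answer.choices[0].message.content

    findings = {q: research(q) for q in decompose("How close is commercial fusion power?")}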


    Multi-Modal Generation with Exa and LLMs

    This agent implements a comprehensive multi-modal content production pipeline that mimics the process a professional content creator might follow. The workflow consists of six carefully orchestrated steps:

    1. Topic Research: When given a topic, the agent first conducts thorough research using Exa’s search and content retrieval capabilities. It gathers information from multiple sources, extracting relevant text and organizing it with proper attribution.

    2. Visual Element Research: Recognizing that engaging content isn’t just text, the agent searches for relevant imagery related to the topic. Again, with Exa, we can search specifically for images. We can also limit the search to sites like Unsplash.

    3. Structured Outline Generation: With research in hand, the agent uses GPT-4 to create a comprehensive outline for the article with the proposed title, introduction concept, detailed section breakdowns with subheadings, key points for each section, and a conclusion approach.

    4. Data Visualization Creation: Here, the agent generates custom Python code for a data visualization relevant to the article topic, if applicable. It analyzes the research data to identify key concepts that would benefit from visual representation, then creates complete, executable code using matplotlib or seaborn (see the sketch after this list).

    5. Article Writing: The agent then synthesizes all the previous elements – research, outline, images, data viz – into a complete article. It follows the outline structure precisely, incorporates references to the suggested images, maintains an engaging writing style, and includes proper citations to the original sources.

    6. Content Package Assembly: Finally, the agent compiles everything into a comprehensive content package containing the original topic, structured outline, finished article text, data visualization code, image descriptions, and source references. This modular approach makes it easy to use the outputs in various ways – publishing the article as-is, extracting just the visualization code, or using the outline as a starting point for further development.
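    For step 4 above, the trick is asking the model for complete, executable plotting code rather than a description of a chart. A simplified sketch (the research summary is whatever your pipeline produced upstream):

    Python
    import os
    from openai import OpenAI

    llm = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    research_summary = "..."  # produced by the earlier research steps

    response = llm.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": (
                "Based on this research, write a complete, runnable matplotlib script "
                "for one chart that illustrates a key point. Return only Python code.\n\n"
                + research_summary
            ),
        }],
    )

    # Save rather than exec(): always review generated code before running it
    with open("visualization.py", "w") as f:
        f.write(response.choices[0].message.content)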

    What will you build with Exa?

    As we’ve explored throughout this guide, Exa represents a powerful evolution in how AI agents interact with the web. By providing a dedicated search API optimized specifically for AI consumption, Exa bridges a crucial gap between large language models and the vast, constantly updating knowledge contained on the internet.

    The agents we’ve examined demonstrate how this integration creates systems that are not merely intelligent but also well-informed. These agents ground their reasoning in current, relevant information rather than relying solely on their training data, which inevitably becomes outdated over time.

    I hope this guide serves as a starting point for you, demonstrating what’s possible today. So, tell me, what will you build with Exa?

    Need Help Building AI Agents?

    Our custom AI agents go beyond typical automation tools. We create sophisticated digital workers that understand your business context, integrate seamlessly with your systems, and continuously improve through real-world interactions.

  • The SAAS Advertising Playbook for 2025: A Framework for Profitable Ad Campaigns with AI

    The SAAS Advertising Playbook for 2025: A Framework for Profitable Ad Campaigns with AI

    Update

    This post was first published in 2020. While the core framework described here is still relevant, I’ve updated it for 2025 to talk about how AI can be used to build it out.

    What platforms should we advertise on? How much budget should we allocate to each? What’s the ballpark CPC for our industry?

    I often hear questions like this from founders and marketers when they explore advertising for their SAAS startup. And my answer is always the same: “it depends”.

    It depends on so many factors, and if someone gives you a straight answer, they’re scamming you.

    I’ve helped dozens of SAAS companies like Typeform, Olark, and ClickUp set up ad campaigns, and I know from experience that, while there are similarities, there are also differences.

    This is where a framework comes in handy. It’s a systematic process that will help you figure out which platforms to advertise on, what your ads should look like, and now in 2025, how to effectively integrate AI into your advertising strategy. Instead of guessing or listening to the “gurus” and ending up with sub-optimal ad campaigns, you’ll be able to devise a profitable ad strategy that works for your SAAS.

    In this updated post, I’ll share the framework I use for every SAAS company I work with, now enhanced with AI-powered strategies that simply weren’t available or mature enough five years ago.

    PS – If you prefer video, here it is –

    An Overview Of The Framework

    The framework I use for ads is conceptually similar to my overarching framework for identifying growth channels at a startup. Both frameworks are built on the stages of awareness concept created by Eugene Schwartz in his book Breakthrough Advertising.

    You’ve probably come across some version of it before. Customers typically start off being Unaware that they have a problem. At some point, it comes to their attention that they have a problem, so they’re Problem Aware. 

    As they start looking for ways to solve their problem, they become Solution Aware. Then they become Product Aware as they learn of the products that enable these solutions. And finally, they make a Decision.

    I’ve adapted this concept to SAAS. The main idea is that people are searching for different things and have different needs at each stage of the journey, so they should see different ads.

    What’s changed since 2020 is that AI now enables us to:

    • Test hundreds of creative approaches simultaneously
    • More precisely identify what stage a prospect is in
    • Create personalized content at scale for each stage
    • Dynamically optimize campaigns across the awareness spectrum

    A Hypothetical Example – Shopify

    The bulk of Shopify’s marketing is aimed at first-time eCommerce entrepreneurs, though a second and, I’d expect, more profitable customer persona would be the entrepreneur with an existing retail store.

    So let’s say you’re a purveyor of tiger cubs and you sell them at your zoo/shop. Business is going well and you’re pretty happy until one day you realize that you could actually be making more. 

    Maybe it’s because your main competitor, Carole Baskin, is doing way better than you and you absolutely hate her guts, especially since she definitely killed her husband and got away with it. Or you may have just tapped out the local market and you’ve hit a plateau. Whatever it is, you realize you have a problem. You’re Problem Aware.

    To solve this problem, you look for solutions. Maybe you dig into Carole’s business, or you start reading articles online about how to grow your business. Your research leads you to many solutions like opening up new locations, switching to a franchise model, or selling online. You’re Solution Aware.

    Let’s say you decide to go online because of the attractive cost structure. Now you’re trying to figure out how that works. Should you hire someone to build a website and checkout system for you? Is there an easier way? You do a lot of Googling, subscribe to the newsletter of one of those fake make money online gurus, and find out that there are products like Shopify, Magento, and BigCommerce that make it easy. You’re Product Aware.

    Now it’s time for you to make a decision. After asking around, looking at reviews, and even trying out the different products, you pick one and move your store online. Congratulations, you’re now selling tiger cubs internationally while you sit at home in your pajamas.

    As you can see, understanding this journey for your customers and business allows you to determine the targeting, messaging, and call to action at each stage. You can also create a sequence of ads and landing pages that take people from one stage to the next.

    For Panoply, we’ve seen far greater effects by segmenting landing pages to ad intents than broad tactical message or layout tests.

    Trevor Fox, Panoply

    The closer someone is to that final Decision stage, the easier it is to convert them. So I usually start my campaigns with the Product Aware stage and move backward. That way I can get some early wins before tackling the tougher ones.

    Let’s dig deeper into each stage and see how it works.

    Product Aware

    In this stage, the prospect knows about the various products that solve their problem, and they’re trying to figure out which one to pick. 

    In SAAS, this usually means taking a free trial, looking up reviews or comparison posts, or talking to people at other companies to see which product they picked.

    That implies anyone who has come across your product, or a competitor’s product, but is still trying to decide, is in the Product Aware stage. Most people who have visited your website but haven’t signed up, are specifically Googling a competitor’s product, or are looking at reviews on a site like G2Crowd are in this category.

    So your targeting becomes website traffic, competitor searches on Google, and review sites. Your message is whatever can convince the customer that your product is better than the others they’re looking at.

    Let’s look at some examples of Product Aware campaigns –

    Competitor Targeting

    In 2014, I worked for a company called LemonStand, an eCommerce platform like Shopify that was later acquired by Mailchimp.

    Back then we couldn’t afford to advertise for keywords like “eCommerce platform” because the word eCommerce was bid up by Shopify. In fact, if you searched for “worst eCommerce platform” you’d see their ad, though they’ve since wised up.

    To side-step this, we simply advertised to people searching for Shopify. They weren’t yet the household name they are today, but they were still well known in the eCommerce space. We figured anyone looking for them could be a potential customer.

    Our biggest differentiating factor was customizability. With LemonStand you could change every aspect of your online store, but you couldn’t do that with Shopify. 

    And the campaign worked! We got leads that wanted more control over their store. We even drew the ire of Shopify, who reached out to us via email and then started their own competitor ads, which didn’t really matter to us because we didn’t have a lot of traffic anyway.

    Sadly I can’t seem to find screenshots of my campaigns, but it turns out Wix is carrying on the fight.

    There is an art to tasteful competitor campaigns, as I’ve learned in the years since, and I’ll share those lessons in a separate blog post. But look up a well-known software product in any industry and you’re bound to see competitor ads. Use those as guidelines to set up your own!

    AI-Enhanced Competitor Targeting (2025 Update)

    Competitor targeting remains as effective as ever, but AI has changed the game. Today, we use language models to analyze competitor content, product pages, and reviews to identify specific pain points their customers experience.

    For example, at one client (an analytics company), we developed an AI system that scrapes thousands of reviews across G2, Capterra, and other review sites to identify exactly where competitors are falling short. We then generate ads that directly address those unmet needs with our solution.
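
    Here’s a simplified sketch of the review-mining step, assuming the reviews have already been exported to a list of strings (scraping review sites is subject to their terms of service). The model name and prompt are illustrative:

    ```python
    # Distill competitor reviews into pain points and ad angles.
    from openai import OpenAI

    llm = OpenAI()  # reads OPENAI_API_KEY

    reviews = [
        "The mobile app crashes constantly and support never responds.",
        "Great dashboards, but exporting reports takes forever.",
        # ...thousands more, batched in practice
    ]

    prompt = (
        "Below are customer reviews of a competitor. Extract the recurring "
        "pain points as a bulleted list, each with a short ad angle a rival "
        "product could use to address it.\n\n" + "\n".join(reviews)
    )

    pain_points = llm.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    print(pain_points)  # feed these angles into ad copy and keyword ideas
    ```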

    Another client used this approach to identify that a major competitor was receiving numerous complaints about their mobile app experience. Within days, we launched a campaign specifically targeting searches for “[Competitor] mobile app problems” and related terms, resulting in a 42% lower CPA compared to broader competitor terms.

    The key AI enhancement is specificity at scale. We can now target hundreds of micro-segments within competitor audiences, each with tailored messaging.

    Retargeting

    I often suggest setting up retargeting early on even if you’re not planning on launching other ad campaigns. A good retargeting campaign that runs in the background can bring in new customers for pretty cheap. I prefer Facebook and Instagram for this but that depends on where your audience is.

    One tactic is to promote a customer testimonial or case study as social proof. Try to highlight why a customer chose you over other products. Another past client, an eCommerce customer support app called Gorgias, does this well.

    For freemium SAAS, I sometimes break retargeting into two campaigns. The first ad converts site traffic to free users, and the second converts free users to paid.

    Here’s one I made for Scott’s Cheap Flights. It’s not exactly SAAS but works on a freemium subscription model. They send you price drop alerts for international flights every day for a small fee.

    I used a mild “fear of missing out” tactic by highlighting some of the deals they missed by not signing up.

    I then followed up with a case study, similar to Gorgias, to convert the free users to premium.

    Remember that in this stage people are almost ready to make a decision, so your CTA for any ad and landing page is to sign up for your product, whether it’s free or paid.

    Solution Aware

    Once I max out Product Aware channels, and that could happen quickly if your competitors have low search volume or you have a large budget, I move back one stage to Solution Aware.

    Here people are still researching solutions to their problems. They may not be ready to pick a product, and may not even know what products exist yet.

    So on Google, they may be searching for “XYZ software” or “How to automate XYZ”. I may sometimes send traffic straight to a landing page depending on the search query.

    On social networks like Facebook, Instagram, or LinkedIn, you could target a custom audience of prospect emails. I often promote blog content or webinars here to get an email address at the very least. After this, our retargeting ads kick in.

    Here are some example campaigns – 

    Buyer Intent Campaigns

    When expanding past competitor ads and retargeting into Solution Aware campaigns, I like to start with Google ads targeting buyer-intent keywords. These are keywords that indicate someone is looking for a particular solution. They may not necessarily know which products exist but at least they have an idea of the solution they need.

    For example, one of my past clients is Procurify, a procurement software company. They have a complex product with features for every part of the procurement process – from creating purchase orders to approving them based on budgets, to paying suppliers, and tracking the delivery.

    This is to say they solve many problems. So while they brand themselves as a “spend management” software, I also figured that anyone looking for, say, software to manage purchase orders would be qualified. So each solution Procurify offered had its own campaign. 

    These campaigns went to a landing page that talked about that specific solution, and also introduced the product and its other features. In fact, I didn’t send them to a landing page with only one CTA. While the primary CTA was to book a demo, the page I sent them to was also just a regular product page with links to other pages. 

    The reason is, again, we’re in a Solution Aware stage and people may not necessarily be ready to buy the product even though the keywords have buying intent. The page allowed visitors to self-select and either book a demo if they wanted to move quickly, or explore the rest of the site and understand the product a bit more.

    And if they chose the latter, that was fine because our YouTube and Facebook retargeting from the Product Aware stage would kick in.

    AI-Enhanced Buyer Intent Campaigns (2025 Update)

    The biggest change in Solution Aware campaigns is the ability to create hundreds of micro-targeted landing pages that precisely match search intent.

    Back in 2020, we might create 10-20 landing pages for different solution categories. Today, with AI-generated content and design tools, we can create hundreds of hyper-specific landing pages that exactly match long-tail search queries, each with unique headlines, content, and social proof elements.

    What makes this sustainable in 2025 is that these pages aren’t static; they’re dynamically assembled (see the sketch after this list) based on:
    – Search query data
    – User location and industry
    – Real-time conversion performance
    – Current feature availability
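
    Here’s a toy sketch of that assembly step. The function, fields, and model are illustrative; a production system would cache generated copy and gate new variants behind conversion-rate checks:

    ```python
    # Assemble page copy on the fly from the search query and visitor context.
    from openai import OpenAI

    llm = OpenAI()  # reads OPENAI_API_KEY

    def build_landing_page(search_query: str, industry: str) -> dict:
        raw = llm.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": (
                f"Write a landing-page headline and subheadline for a visitor "
                f"in the {industry} industry who searched for "
                f"'{search_query}'. Return them on two separate lines."
            )}],
        ).choices[0].message.content

        lines = [l for l in raw.strip().splitlines() if l.strip()]
        return {
            "headline": lines[0],
            "subheadline": lines[1] if len(lines) > 1 else "",
            "social_proof": f"testimonials tagged '{industry}'",  # from a CMS
        }

    page = build_landing_page("purchase order software", "manufacturing")
    print(page)
    ```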

    Informational Campaigns

    Many B2B SAAS companies have large email lists for cold outreach. You can upload these into Facebook to create custom audiences that are probably Solution Aware, if not Product Aware, depending on your initial criteria for creating the list.

    This strategy worked well for Plato, a mentorship platform for engineering managers. We’d run webinars with the VPs of engineering and product at high-profile tech companies like Lyft and Segment. 

    Then, we’d promote those webinars to a custom audience of our email list on Facebook and Instagram and optimize for webinar signups.

    After that, our retargeting campaign would kick in and push people to sign up for a demo of Plato.

    Another option is targeted newsletter inserts. Marketplaces like BuySellAds allow you to place ads in curated newsletters. So instead of building the list yourself and then reaching subscribers on Facebook, you use an existing list and reach them directly.

    AI-Powered Content Distribution (2025 Update)

    Content marketing has undergone a revolution with AI-powered distribution. Rather than creating a single piece of content and promoting it broadly, we now:

    1. Create core “pillar” content pieces
    2. Use AI to generate dozens of derivatives optimized for each platform
    3. Deploy AI targeting to find the perfect audience match

    For one client, we evolved their webinar strategy with an AI system that automatically:

    – Transcribes webinar content
    – Identifies the most impactful insights
    – Creates platform-specific snippets (LinkedIn posts, Twitter threads, TikTok clips)
    – Identifies and targets professionals most likely to engage with each specific insight

    This approach has reduced their cost per signup by 51% while increasing overall webinar participation.
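
    Here’s a condensed sketch of that pipeline, using the OpenAI SDK for both transcription and generation. The file name, prompts, and model names are illustrative, and the final targeting step would go through each ad platform’s own API:

    ```python
    # Webinar -> transcript -> insights -> platform-specific snippets.
    from openai import OpenAI

    llm = OpenAI()  # reads OPENAI_API_KEY

    # 1. Transcribe the webinar recording.
    with open("webinar.mp3", "rb") as audio:
        transcript = llm.audio.transcriptions.create(
            model="whisper-1", file=audio
        ).text

    # 2. Pull out the most impactful insights.
    insights = llm.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
            "List the 5 most impactful insights from this webinar "
            "transcript:\n\n" + transcript}],
    ).choices[0].message.content

    # 3. Turn the insights into platform-specific snippets.
    for fmt in ["a LinkedIn post", "a Twitter thread", "a TikTok clip script"]:
        snippet = llm.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content":
                f"Turn these insights into {fmt}:\n\n{insights}"}],
        ).choices[0].message.content
        print(f"--- {fmt} ---\n{snippet}\n")
    ```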

    Problem Aware and Unaware

    You’ll find that the bulk of your ad spend will go to Solution Aware, and you may not even need to hit the Problem Aware stage.

    In fact, people in the Problem Aware stage are so early in the buying process that if you’re expecting an instant return on your ad spend, you definitely shouldn’t bother with this.

    However, if you have more realistic expectations, here are some campaign ideas. For starters, your audience will probably just be anyone in your ideal customer profile. 

    On Facebook and Instagram this could be a lookalike audience or interest-based audience. So Shopify would target people who like entrepreneurship. 

    On LinkedIn, you could use job title targeting. On Google Display and YouTube you have audience-based options for people in a certain industry or in the market for certain types of software.

    Pain Point Campaigns

    One example would be a campaign that targets a broad audience and highlights a pain point that your product solves. The idea here is to bring awareness to that problem so that the prospect starts to think about it and eventually begins the journey of looking for a solution.

    Grammarly does an amazing job here. This YouTube video ad targets students and all it does is highlight the pain point of writing a term paper. 

    Notice there’s no narrator going “Grammarly helps you yada yada yada…” It’s just a simple ad showing a student who is frustrated initially while writing a term paper and then eventually gets an A+ because she used Grammarly. 

    As a student, upon seeing this ad, you may realize that you too have a problem writing grammatically correct sentences, and that might get you to take it more seriously, eventually leading you to a tool like Grammarly.

    Here’s another example from a past client, ClickUp, a project management software. They’re using a lookalike audience to highlight the pain point of not being able to visualize projects.

    In fact, many display, video and even billboard ad campaigns fall under this category. The point is not to get you to buy something instantly, which is why all the new-age marketers who make fun of it don’t really understand what’s happening. 

    The point is simply to highlight a pain point you might have and get you to start thinking about solving it. If you click through and sign up immediately, that’s just a bonus.

    AI-Powered Predictive Problem Awareness (2025 Update)

    The most innovative AI application in advertising is the ability to predict who is about to become Problem Aware before they even recognize it themselves.

    Using federated learning models that respect privacy while analyzing behavior patterns, we can now identify companies that are displaying indicators of specific problems:
    – For a customer success platform, we identify companies showing increased customer churn signals on social media
    – For a security solution, we spot organizations showing vulnerability markers before they experience a breach
    – For financial software, we detect companies with inefficient spending patterns in public financial data

    This predictive approach means we can surface problems to prospects right before they would discover them naturally, positioning our solutions as forward-thinking and proactive.
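
    Federated learning is beyond the scope of a blog snippet, so here’s a deliberately centralized toy version of the underlying idea: score accounts against weighted problem indicators and flag the ones likely to become Problem Aware soon. The signals and weights are made up for illustration:

    ```python
    # Toy problem-signal scoring; real systems learn these weights from data.
    SIGNAL_WEIGHTS = {
        "support_complaints_rising": 0.4,  # e.g. public complaints trending up
        "hiring_for_related_role": 0.3,    # e.g. job posts hinting at the gap
        "tooling_gap_detected": 0.3,       # e.g. missing a software category
    }

    def problem_score(signals: dict) -> float:
        return sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))

    accounts = {
        "acme.example": {"support_complaints_rising": True,
                         "hiring_for_related_role": True},
        "globex.example": {"tooling_gap_detected": True},
    }

    watchlist = sorted(accounts, key=lambda a: problem_score(accounts[a]),
                       reverse=True)
    print(watchlist)  # prioritize these for Problem Aware campaigns
    ```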

    Optimization and Scaling

    So the framework helps you decide which platforms to use, and what your ads and landing pages should say. But how do you know if your campaigns are working or not? When do you decide to cut something or double down on it?

    For starters, you want to be profitable. In general, I like to aim for a 3:1 ratio of LTV to CAC. For example, if a customer is worth $3,000 over their lifetime, I’d want to acquire them for no more than $1,000. Sometimes, depending on the payback period, I may even go higher.

    I also try as much as possible to look at the CAC holistically. So instead of comparing platforms, since my platforms are working together to create a cohesive ad campaign, I’m looking at overall CAC.
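
    As a toy illustration with made-up numbers, blended CAC is simply total spend across every platform divided by total new customers:

    ```python
    # Blended CAC across platforms rather than judging each one in isolation.
    spend = {"google": 12_000, "facebook": 8_000, "linkedin": 5_000}
    new_customers = 50  # attributed across the whole funnel

    blended_cac = sum(spend.values()) / new_customers
    print(f"Blended CAC: ${blended_cac:,.0f}")  # $500 per customer
    ```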

    To optimize within platforms, I’d do a straight comparison for campaigns in a certain stage of the journey. So I wouldn’t compare a Product Aware campaign on Google against a Solution Aware campaign.

    But I would compare all Product Aware campaigns against each other to decide which ones to double down on and which ones to drop.

    I’d then do the same thing on an ad set and ad level.

    As you pause underperforming ads or campaigns, expand the best-performing ones, and even add new campaigns as the framework dictates, you’ll be able to scale your ads profitably.

    AI Creative Testing (2025 Update)

    The days of A/B testing two ad variants are long gone. Today’s AI systems enable:
    • Multivariate testing at unprecedented scale: testing hundreds of creative combinations simultaneously
    • Real-time generation of new variants: based on early performance signals
    • Element-level analysis: understanding which specific components of successful ads drive conversions
    • Cross-channel creative insights: learning what works on one platform and adapting it for others

    For one client, we built an AI system that continuously generates and tests new ad creative variations. The system identified that customer testimonials featuring specific metrics outperformed generic social proof by 73%. It then automatically generated dozens of new testimonial-based ads, each highlighting different concrete results.
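
    Here’s a minimal sketch of one way such a system can allocate impressions: Thompson sampling over ad variants, which naturally shifts spend toward better-converting creative. The variant names and stats are made up; a real system would pull them from the ad platform’s reporting API:

    ```python
    # Thompson sampling: sample a plausible conversion rate per variant from
    # a Beta posterior, then serve the variant with the highest sample.
    import random

    variants = {
        "testimonial_with_metric": {"conversions": 73, "impressions": 1000},
        "generic_social_proof": {"conversions": 42, "impressions": 1000},
        "pain_point_hook": {"conversions": 55, "impressions": 1000},
    }

    def pick_variant() -> str:
        def sample(stats):
            wins = stats["conversions"]
            losses = stats["impressions"] - wins
            return random.betavariate(wins + 1, losses + 1)
        return max(variants, key=lambda v: sample(variants[v]))

    print(pick_variant())  # the variant to serve for the next impression
    ```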

    Putting it all together

    In the end, you’ll have a set of ads across different channels that work together to move people from one stage in the journey to the next, and finally to a purchasing decision.

    As you can see, there’s no “best ad platform” or “best type of ad”. They each have their strengths and work at different parts of the journey.

    If you’re setting up ads for your SAAS business, I suggest testing the waters with the Product Aware campaign ideas above.

    And if you need help building out some of the AI automations I mentioned, feel free to contact me!