Category: Blog

  • The Ultimate Guide to Model Context Protocol, Part 2: Behind The Scenes of MCP

    The Ultimate Guide to Model Context Protocol, Part 2: Behind The Scenes of MCP

    The MCP Series

    This post is part 2 of my “Ultimate Guide to Model Context Protocol” series. You can read part 1 here and part 3 here.

    In our previous post, we introduced the Model Context Protocol (MCP) and how it transforms our dear Claude from a knowledgeable yet impotent AI into a helpful digital butler who can actually interact with your files, apps, and services. Now it’s time to draw back the curtain and have a gander at the magic behind it.

    Don’t worry—we’ll keep things simple and jargon-free, dishing out plenty of analogies to explain the technical concepts like my uncle dishes out expletives when India cocks up a cricket match. By the end of this post, you’ll understand what makes MCP tick and how you can start exploring different MCP servers for your specific needs.

    How MCP Works

    Remember our analogy of MCP as a universal translator between AI and your digital world? Let’s expand on that to understand what’s happening behind the scenes.

    The MCP Architecture Explained

    At its core, MCP follows what tech folks call a “client-server architecture.” It’s the same model the web uses: the browser on your computer is the “client,” and it retrieves and displays information from a “server” over the internet via a protocol called HTTP.

    The Model Context Protocol is similar. Let’s say you’re enjoying a cold Sunday evening by the fire in the study of your manor, as one does. It’s a high-tech study with a built-in AI assistant. You ask the assistant to have some hot toddy sent over:

    1. The Host (where you chat with AI)

    • This is an application like Claude Desktop or Cursor where you interact with an AI assistant.
    • In our analogy, this is the study of your manor.

    2. The MCP Client (the translator)

    • This is built into the host application, so the engineers at Anthropic and Cursor need to build it first for the whole thing to work.
    • It translates between what the AI understands and what MCP servers speak.
    • You never interact with it directly; like HTTP, it works behind the scenes.
    • In our analogy, it’s an upgrade module for your study that allows your AI assistant to communicate with other parts of your manor, such as the bar.

    3. MCP Servers (specialized helpers)

    • Each server is like a specialist with access to specific resources.
    • One server might know how to work with files, another with Slack, and so on.
    • Servers can be on your computer or connect to online services.
    • In our analogy, the bartender who makes the hot toddy and brings it over to you is the server.

    4. Tools (actions your AI takes via servers)

    • These are the functions available to the AI on the server.
    • A document server may have a read_file action that the AI can invoke to read a specific file.
    • In our analogy, the tool is the ability to prepare a libation.

    5. Resources (your digital stuff)

    • The actual files, apps, and services the AI needs to access
    • Could be local (on your computer) or remote (on the internet)
    • In our analogy, these are the ingredients that go into making the hot toddy. I prefer a spot of Cognac myself.

    If you enjoyed this analogy, I have plenty more. Be a dear and sign up to my newsletter.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    A Day in the Life of an MCP Request

    Ok, enough with the analogies. To really understand how this works, let’s follow what happens when you ask your AI assistant to summarize a document and send it to Slack:

    1. You make a request to Claude: “Be a good sport and summarize the quarterly_report.pdf on my desktop. Oh, and while you’re at it, post the key points to the #team-updates Slack channel.”
    2. Claude recognizes this requires access to both files and Slack, so it needs to use MCP
    3. The MCP Client activates and connects to two different MCP servers:
      • The File System MCP Server (to access the PDF)
      • The Slack MCP Server (to post the message)
    4. Permissions are checked:
      • The File System server asks: “Allow Claude to read quarterly_report.pdf?”
      • The Slack server asks: “Allow Claude to post to #team-updates?”
      • You approve both requests
    5. The File System server retrieves the PDF content and sends it back through MCP
    6. Claude processes the document and creates a summary
    7. The Slack server takes Claude’s summary and posts it to your team channel
    8. You receive confirmation that the task is complete

    All of this happens in seconds, with the complex technical work hidden from view. The beauty of MCP is that it handles all the complicated connections while maintaining security and giving you control.
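
    For the technically curious: under the hood, MCP messages are JSON-RPC 2.0. Step 5 above, where Claude asks the File System server for the report, would look roughly like this on the wire (the tool name and file path are illustrative):

    JSON
    {
      "jsonrpc": "2.0",
      "id": 42,
      "method": "tools/call",
      "params": {
        "name": "read_file",
        "arguments": {
          "path": "/Users/you/Desktop/quarterly_report.pdf"
        }
      }
    }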

    The Technology That Powers MCP

    Now that you understand the basic flow, let’s demystify some of the technology that makes MCP possible:

    The Protocol Itself

    The Model Context Protocol is what tech people call an “open standard.” This means:

    • It’s publicly documented so anyone can build with it
    • It follows consistent rules for communication
    • It’s designed to be secure from the ground up

    Think of it like the rules of the road—all vehicles (or in this case, different software) follow the same rules, allowing smooth traffic flow.

    Security Measures

    MCP takes security seriously with several built-in protections:

    Permission-Based Access

    • Nothing happens without your explicit approval
    • Permissions are fine-grained (specific to each action)

    Sandboxing

    • Each MCP server is isolated from others
    • If one server has a problem, it doesn’t affect the rest

    Audit Trails

    • All actions are logged so you can see what happened
    • Useful for troubleshooting or monitoring usage

    Real-Time Communication

    MCP uses modern, efficient methods for passing information back and forth:

    • It’s designed for low latency (minimal delays)
    • It handles both simple requests and large data transfers
    • It manages two-way communication seamlessly

    This means you don’t have to wait long for results, even when dealing with complex tasks involving multiple systems.

    MCP Servers: The Building Blocks of AI Integration

    MCP servers are the workhorses of the system. Each one is specialized for a specific purpose, and you can mix and match them based on your needs.

    Types of MCP Servers

    MCP servers generally fall into a few categories:

    1. Local Resource Servers

    • Access things on your computer
    • Examples: File System, Local Database, Browser Control

    2. Communication Servers

    • Connect to messaging and social platforms
    • Examples: Slack, Email, Bluesky

    3. Productivity Servers

    • Integrate with work tools
    • Examples: GitHub, Google Drive, Calendar

    4. Information Servers

    • Fetch and process data
    • Examples: Weather, Search, Wikipedia

    5. Specialized Servers

    • Handle niche needs
    • Examples: 3D Printer Control, Smart Home

    Where to Find MCP Servers

    In the previous post, I mentioned a few of the top MCP servers. If you’re looking for more, there are several places to discover and download MCP servers:

    1. Official MCP Servers Repository
    2. MCP.so Directory
    3. Glama.ai
    4. Composio
    5. Cursor Directory
    6. Awesome MCP Servers
    7. Fleur MCP App Store
    8. MCP Run
    9. Smithery

    Setting Up Your Own MCP Server

    While most people will simply use existing MCP servers, you might be curious about how they’re created. Or perhaps you can’t find one and want to build your own. Here’s a simplified explanation:

    What You Need to Create an MCP Server

    If you’re not a developer, you probably won’t be creating your own MCP servers. But understanding what goes into them can help you appreciate what they do:

    1. Programming Skills

    • Knowledge of languages like Python and JavaScript
    • Understanding of APIs and web services

    2. Development Tools

    • MCP SDK (Software Development Kit)
    • Required libraries and dependencies

    3. Access to Resources

    • API keys for external services
    • Documentation for the systems you’re connecting to

    For the Technically Curious: A Simple Example

    Here’s what a very basic MCP server might look like in concept (this is simplified pseudocode):

    JavaScript
    // Define what the server can do
    server.addCapability("read-weather-forecast", {
      description: "Gets the weather forecast for a location",
      parameters: {
        location: "The city or area to get the forecast for",
        days: "Number of days to forecast"
      },
      securityLevel: "requires-approval"
    });
    
    // Implement the actual functionality
    server.onRequest("read-weather-forecast", async (request) => {
      // Get the forecast from a weather service
      const forecast = await weatherAPI.getForecast(
        request.parameters.location, 
        request.parameters.days
      );
      
      // Return the results
      return {
        current: forecast.current,
        daily: forecast.daily,
        warnings: forecast.alerts
      };
    });
    
    // Start listening for connections
    server.start();
    

    This simplified example shows how an MCP server:

    1. Defines what capabilities it offers
    2. Specifies what parameters are needed
    3. Sets security requirements
    4. Implements the actual functionality
    5. Returns results in a structured format

    In reality, MCP servers are more complex, with proper error handling, security features, and optimization—but this gives you a sense of their basic structure.
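
    If you’d prefer something closer to runnable code, here’s a minimal sketch in Python using FastMCP from the official MCP Python SDK. The weather lookup is stubbed out, and the exact decorator API may differ between SDK versions, so treat it as a sketch:

    Python
    from mcp.server.fastmcp import FastMCP
    
    # Name the server; this is how it shows up in the host
    mcp = FastMCP("weather")
    
    @mcp.tool()
    def get_forecast(location: str, days: int = 3) -> str:
        """Gets the weather forecast for a location."""
        # Stub: a real server would call a weather API here
        return f"Forecast for {location} over the next {days} days: sunny, 22C"
    
    if __name__ == "__main__":
        # Speaks MCP over stdio by default, which is how Claude Desktop connects
        mcp.run()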

    Connecting Multiple MCP Servers: The Power of Combination

    One of the most powerful aspects of MCP is the ability to use multiple servers together. This creates workflows that would otherwise require complex programming.

    Example: A Research Assistant Workflow

    Imagine you’re researching a topic and want AI help. With multiple MCP servers, you could:

    1. Use the File System server to scan your existing notes
    2. Use the Browser Control server to search for new information
    3. Use the Wikipedia server to verify facts and get background
    4. Use the Google Drive server to save your findings
    5. Use the Slack server to share insights with colleagues

    All of this could be accomplished with a single request to your AI assistant, with each server handling its specialized part of the task.

    Common Questions About MCP Servers

    “Are MCP servers safe to install?”

    MCP servers from reputable sources follow strict security protocols. Stick to official directories and well-reviewed options. Each server will ask for specific permissions, so you always maintain control over what they can access.

    “How many servers should I install?”

    Start with just the ones you need for your common tasks. You can always add more later. Most users begin with the File System server and add others as needed.

    “Will MCP servers slow down my computer?”

    Most MCP servers use minimal resources when idle and are designed to be efficient. If you’re not actively using them with your AI assistant, they have very little impact on performance. I’ve noticed, however, that it does slow down my Claude Desktop app if I add too many.

    “Can I use MCP servers with any AI assistant?”

    Currently, MCP works with compatible hosts like Claude Desktop and Cursor. As the protocol gains popularity, more AI applications are likely to support it.

    What’s Next on Your MCP Journey

    Now that you understand how MCP works behind the scenes and what servers are available, you’re ready to start building your personalized AI workspace.

    In my next post in the series, I’ll provide a hands-on guide to building out useful agentic workflows with Claude and MCP servers. I’ll walk through the setup process with screenshots and troubleshooting tips to ensure a smooth experience.

    Sign up to the newsletter and stay tuned for it!


  • The Ultimate Guide to Model Context Protocol, Part 1: What is MCP

    The Ultimate Guide to Model Context Protocol, Part 1: What is MCP

    The MCP Series

    This post is part 1 of my “Ultimate Guide to Model Context Protocol” series. You can read part 2 here. Stay tuned for more posts.

    Well hello there! I take it you’ve been hearing about this MCP business online and have meandered over to my humble website looking for answers. This post shall shed some light on the entire affair.

    Today’s AI assistants like Claude, Grok, and ChatGPT are a clever lot, brimming with facts and ready to toss out answers faster than you can say “what ho!” to any query you lob their way. But when it comes to rolling up their sleeves and actually doing something for you, well, they fall short.

    It’s like having a butler who’s all ears and sage nods, but when you cry, “Sort out my blasted emails!” or “Fish up those receipts before the taxman comes calling!” he merely blinks and offers a sympathetic, “Quite so, sir,” without lifting a finger.

    You’d get rid of him really fast.

    That’s where the Model Context Protocol comes in. Developed and open-sourced by Anthropic in November 2024, MCP is a new standard that lets your AI butler connect securely to your data (or any other siloed data source) and take actions on your behalf.

    In this post, the first of a series on MCP, we’ll cover what it is, why it’s different from an API call or integration, and how you can get started with using it in just a few minutes.

    Sign up if you want to know when I release the next post in the series.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

    What is MCP?

    The Model Context Protocol (MCP) is like a universal translator between AI models and your digital world. Just as USB-C provides a standardized way to connect your devices to various accessories, MCP provides a standardized way for AI to securely access and work with your files, apps, and online services.

    Don’t worry about the technical jargon—here’s what you need to know:

    MCP Host: The application where you interact with AI (like Claude Desktop). Think of this as the “home base” where you chat with your AI assistant.

    MCP Server: A special program that gives AI access to specific resources (like your files or Slack). Each server is like a specialized tour guide that knows one area extremely well.

    MCP Client: The behind-the-scenes connector that lets the host talk to servers. You don’t need to worry about this part—it works automatically.

    How Is It Different from an API or Integration?

    Ok so essentially MCP is a way for Claude to talk to your data or some external service. Isn’t that literally what an API or an integration does? Why are we complicating this?

    Well, first of all, MCP sounds cooler than API.

    But yes, you could do this with an API call, except it’s complicated. For starters, you’d need to know how to code and make an API call. Then you’d need to configure Claude or another AI assistant to actually make that API call. And then you’d need to repeat that for everything you want it to access – your files, your email, your Slack, and so on. Exhausting, what?

    Why doesn’t Anthropic just build integrations to all these apps instead? Well, again, that’s a lot of work. So they’ve essentially farmed that work out to the developer community, who build the MCP servers.

    It’s a bit of a middle ground, but still very simple for the end user. You find an MCP server by a third party that does the thing you want it to do, you tell Claude to use that MCP server, and Bob’s your uncle, job done.

    Can’t find an MCP server? Make your own (we’ll get to how in a later post in this series).

    In fact, some MCP servers are actually just wrappers over an API! But there are additional benefits:

    1. Standardized Security and Control
      • MCP servers enforce strict access rules, requiring user approval for actions (e.g., a tool like write-file needs explicit consent). APIs, by contrast, rely on developers to implement security, which can vary widely.
      • Example: An MCP server accessing your Slack channels ensures the AI only reads what you allow, unlike an API token that might grant full access if not scoped properly.
    2. Two-Way Communication
      • MCP supports bidirectional data flow, enabling AI models to not just fetch data but also act on it. For instance, an MCP File System server can let an AI read a document, summarize it, and save the summary back—all within one protocol.
      • APIs typically require separate calls for each step, increasing complexity.
    3. AI-Specific Optimization
      • MCP provides “tools” (callable functions) and “prompts” (pre-written templates) that align with how AI models process information. For example, a weather MCP server might offer a get-forecast tool that returns data in a format an AI can easily digest, reducing preprocessing (see the sketch just after this list).
      • APIs deliver raw data (e.g., JSON), leaving it to developers to adapt it for AI use.
    4. Local and Remote Flexibility
      • MCP servers can connect to local resources, like your computer’s file system or a Chrome browser, as well as to remote services, without needing a web-based API.
      • Example: The Puppeteer MCP server controls a browser locally, while a Google Maps MCP server hits a remote API, blending both worlds.
    5. Simplified Integration
      • MCP standardizes how AI models interact with external systems, reducing the need for custom code per API. A developer can use one MCP client to connect to multiple servers (e.g., Slack, GitHub), whereas APIs require unique integrations for each.
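
    To make “tools” concrete: when a client connects, a server advertises each tool with a name, a description, and a JSON Schema for its inputs. Here’s a sketch of what the get-forecast tool mentioned above might advertise (the field values are illustrative; the overall shape follows the MCP spec):

    JSON
    {
      "name": "get-forecast",
      "description": "Gets the weather forecast for a location",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "City or area" },
          "days": { "type": "number", "description": "Days to forecast" }
        },
        "required": ["location"]
      }
    }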

    Practical Scenarios: API vs. MCP

    | Scenario | API Approach | MCP Approach | Why MCP Wins |
    | --- | --- | --- | --- |
    | Fetch Weather Data | Call OpenWeather API, parse JSON | Use MCP weather server’s get-forecast tool | AI-ready output, less coding |
    | Manage Files | Build a local server with API endpoints | Use MCP File System server | Native local access, standardized |
    | Automate Slack | Use Slack API, handle rate limits, auth | Use MCP Slack server with approved actions | Secure, controlled interaction |
    | Analyze GitHub Issues | Multiple API calls to GitHub, custom logic | MCP GitHub server with tools like list-issues | Streamlined, two-way flow |

    Do You Need MCP?

    • If you’re just fetching data: Stick with APIs—they’re simpler for basic tasks like grabbing stock prices.
    • If you’re powering an AI: MCP shines when you need your AI to interact with the world, locally or remotely, in a secure, controlled way. For example, integrating Claude with your file system via MCP is safer and easier than building an API for it.

    What MCP Can Do For You: Real-World Examples

    Ok, hopefully I’ve convinced you that MCP is actually useful and not just Silicon Valley reinventing something that already exists.

    Now, let’s look at some real world examples:

    Personal Productivity

    • File Organization: “Claude, can you organize my downloaded files into folders by type and date?” With MCP, Claude can actually do this for you, not just tell you how, while you polish off your second donut of the morning.
    • Email Management: “Summarize all my unread emails from the bigwigs,” you plead. With MCP, Claude dives into the inbox, sifts through the missives, and delivers a pithy précis, perhaps even firing it off via Slack or a text.
    • Note Analysis: “Claude, cast an eye over my meeting notes from the past month and whip up an action plan, there’s a good chap.” With MCP, Claude rummages through your scribblings, plucks out the juicy bits, and adds todos to your task management app faster than you can blink.

    Information Access

    • Document Search: “Find me the skinny on our budget projections in my documents folder,” you command. With MCP, Claude ferrets through your private stash without so much as a whisper to the internet, emerging triumphant with the goods, like a bloodhound on the scent.
    • PDF Q&A: “What were the key recommendations in that report I nabbed yesterday?” you muse. Claude, armed with MCP, tracks down the PDF, pores over it like a don at his books, and serves up the answers with the precision of a well-aimed dart.

    Communication

    • Message Drafting: “Draft a Slack message to the troops summing up the quarterly report on my desktop,” you say. With MCP, Claude saunters over to your files, has a butcher’s at the report, and taps out a message with the finesse of a seasoned clubman penning a note to the committee.
    • Conversation Summaries: “What were the main thrusts of yesterday’s team chitchat?” you ask. Claude, with MCP as trusty steed, gallops through the chat logs and returns with a tidy summary, sparing you the bother of wading through the blather yourself.

    Web Search

    • Browse the internet: “Dig up the latest gossip on AI and give me the lowdown,” you request. With the Brave or Exa MCP servers, Claude scours the web and delivers a crisp rundown like a newsboy hawking the evening edition.
    • Find restaurants: “What are the top Thai eateries near my digs?” you wonder aloud. With the Google Maps MCP server in play, Claude not only unearths the finest curry houses but pops the addresses your way, like a cabbie with a knack for spice.

    By the way, I’m using Claude as an example, but any company can become an MCP Host and create their own client. This instantly opens up a world of possibilities for their users.

    Cursor, for example, also built an MCP client. So, just like with Claude, you can install a web-scraper MCP server and have Cursor scrape the most up-to-date documentation for a Python package to use in the code it generates.

    Top MCP Servers and What They Do

    MCP servers are the building blocks that give AI access to specific parts of your digital world. Here are the most popular ones:

    1. File System MCP Server: Lets AI safely work with files and folders on your computer.
    2. Slack MCP Server: Enables Claude to post messages, reply to threads, add reactions, and more in Slack.
    3. GitHub MCP Server: Helps manage code repositories and issues.
    4. Google Maps MCP Server: Enables location-based assistance.
    5. Brave MCP Server: Integrates the Brave Search API, providing both web and local search capabilities.

    You can find a list of servers on the MCP site. Each server handles one specific type of connection, and you can install exactly the ones you need.

    Getting Started in 10 Minutes

    Ready to try MCP yourself? Here’s how to get started:

    1. Download Claude for Desktop

    Right now, MCP servers are hosted locally (on your computer), so we need a local client as well. Download and install it from https://claude.ai/download.

    After you install it, run it and log in to your Claude account.

    2. Install Node.js

    We’ll have to install Node.js for the same reason we’re installing Claude for Desktop: we’re running everything locally, and Node is what loads and runs the servers.

    Go to nodejs.org and follow the instructions to do so.

    3. Install your first MCP server

    We’re going to start with the File System server. It’s created by Anthropic and allows Claude to access files on your computer.

    The first thing you need to do is click on the Claude menu and then Settings. Go to the Developer section and hit Edit Config.

    This will then open up a folder where you’ll find a file called claude_desktop_config.json. Right now it contains nothing but a pair of curly braces {}.

    Remove those braces and paste this in instead:

    JSON
    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-filesystem",
            "/Users/<add your username here>/Documents",
            "/Users/<add your username here>/Downloads"
          ]
        }
      }
    }

    This configuration file tells Claude for Desktop that we have one MCP server, called “filesystem”, and that it should use npx (which comes with Node) to fetch and run @modelcontextprotocol/server-filesystem. This server, described here, will let you access your file system in Claude for Desktop.

    It also lists the folders the server can access. Be sure to use the correct paths (on a Mac that is usually “/Users/your-username/Documents”).
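
    Adding more servers later is just more entries in the same file. For example, here’s what the config might look like with the Brave Search server added alongside the filesystem one (the package name follows Anthropic’s reference servers; the API key is a placeholder you’d get from Brave):

    JSON
    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-filesystem",
            "/Users/<add your username here>/Documents"
          ]
        },
        "brave-search": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-brave-search"],
          "env": {
            "BRAVE_API_KEY": "<your API key here>"
          }
        }
      }
    }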

    4. Try it out!

    After you save the config file, restart the Claude for Desktop application. It may take a few moments to start, but when it does, you’ll see a little hammer icon in the bottom right corner of your chat box.

    That’s our MCP tool list! We only installed one server, but it comes pre-packaged with 11 tools, like creating a directory, editing files, and so on. Now do you see why it’s so cool? Imagine having to build all of this yourself.

    Let’s give it a test drive. I’ve given Claude access to a folder called Code where I store all my coding projects locally. I’m going to ask Claude to generate Hello World code in Python and save it as a file in my Code folder.

    Isn’t that cool? It seems simple now, but I can extend this to having Claude generate multiple files, organize them into different folders, and even push them to GitHub if it’s a coding project, all from my chat window.

    What’s Next?

    Creating MCP was a genius move by Anthropic. They were losing the consumer race to OpenAI, which was building integrations like web search. So they’ve leaned into their developer focus, letting the community build MCP servers that extend Claude’s capabilities far beyond ChatGPT’s, practically overnight.

    Now that you understand why MCP matters for making AI truly helpful in your digital life, part 2 of this series will take you behind the scenes.

    We’ll explore exactly how MCP works its magic, featuring a day-in-the-life scenario showing how different MCP servers can work together to accomplish tasks you never thought possible without programming knowledge.

    Read it here and sign up for more posts!


  • GPT-4.5: A Complete Review and How It Compares To Others

    GPT-4.5: A Complete Review and How It Compares To Others

    OpenAI finally released GPT-4.5, hot on the heels of new SOTA models from Anthropic and xAI. As always, OpenAI hyped it up in the lead-up to the launch.

    Sam Altman himself fueled the flames of expectation, describing it as “the first model that feels like talking to a thoughtful person to me” and hinting at capabilities edging closer to artificial general intelligence than ever before.

    It’s a giant and expensive model. Andrej Karpathy reckons that with every 0.5 update in the GPT series comes a roughly 10X increase in training compute: GPT-4.5 needed 10X more than GPT-4, which needed 10X more than GPT-3.5, and so on.

    So it’s reasonable to expect some sort of step jump from 4 to 4.5, the same way we saw with previous upgrades, right? Right?

    Let’s Have A Look At The Numbers

    For all the computational resources poured into GPT-4.5, the performance improvements over GPT-4 are surprisingly modest. Let’s examine the actual benchmark data:

    On the Massive Multitask Language Understanding (MMLU) test – a comprehensive evaluation of knowledge across domains – GPT-4.5 scored approximately 89.6% versus GPT-4’s already impressive 86.4%. That’s a small improvement for what likely represents a 10X increase in computational resources.

    The pattern of modest gains continues across other benchmarks:

    • HumanEval (code generation): GPT-4.5 achieves 88.6% accuracy, only slightly edging out GPT-4’s already near-human 86.6%
    • MGSM (math problems): GPT-4.5 shows comparable performance to GPT-4 (86.9% vs 85.1%), with only modest improvements
    • DROP (reasoning): GPT-4.5 scored 83.4%, a little better than GPT-4’s 81.5%.

    The other interesting thing is that some of these scores are lower than those of OpenAI’s smaller, specialized reasoning models, especially the o3 series, which score above 90% on some of these tests.

    So the data tells us that GPT-4.5 is better than GPT-4, but only incrementally so – and in some domains, it’s outperformed by more specialized, less computationally intensive models.

    Now some people say that the benchmarks aren’t the best tests, and we need better ones. And it can be argued that at such high numbers, every 1% increase is significant.

    Ok, I agree. To me, the real test of a model is whether the end user (you and I) finds it valuable. So, let’s judge for ourselves.

    The “Emotional Intelligence” Test: You Be the Judge

    The most intriguing claim about GPT-4.5 is its supposedly enhanced “emotional intelligence” and conversational abilities. Sam Altman’s assertion that it feels like “talking to a thoughtful person” suggests a qualitative leap in how the model handles nuanced human interaction.

    On Twitter, Andrej Karpathy ran GPT-4.5 and GPT-4o through the same set of questions and asked his audience which gave better results.

    https://twitter.com/karpathy/status/1895337579589079434

    I took inspiration from that and decided to give similar tests to GPT-4.5 and four other SOTA models for comparison: Claude 3.7, Grok 3, Gemini 2.0 Flash, and Meta’s Llama 3.3.

    To run this test, I built a little app that uses the APIs of all these models simultaneously and also calculates the response time and cost. This adds a layer of objectivity to the responses. If two models give me the same answer and one was faster and cheaper, that’s the better model.
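
    If you fancy rigging up something similar, here’s a minimal sketch of the idea using just the OpenAI client: fire the same question at several models, time each response, and estimate cost from the reported token usage. The model names and per-million-token prices are illustrative, so check the current pricing pages before trusting the numbers:

    Python
    import os
    import time
    from concurrent.futures import ThreadPoolExecutor
    
    from openai import OpenAI
    
    # Illustrative (input, output) prices in USD per million tokens
    PRICES = {"gpt-4.5-preview": (75.00, 150.00), "gpt-4o": (2.50, 10.00)}
    
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    def ask(model: str, question: str):
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        elapsed = time.perf_counter() - start
        in_price, out_price = PRICES[model]
        usage = response.usage
        cost = (usage.prompt_tokens * in_price + usage.completion_tokens * out_price) / 1_000_000
        return model, elapsed, cost, response.choices[0].message.content
    
    if __name__ == "__main__":
        question = "Describe a color that doesn't exist but would be beautiful if it did."
        with ThreadPoolExecutor() as pool:
            for model, elapsed, cost, answer in pool.map(lambda m: ask(m, question), PRICES):
                print(f"{model}: {elapsed:.1f}s, ${cost:.4f}")
                print(answer[:200], "\n")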

    Here are some examples of responses:

    Q1: Invent a new literary genre blending cyberpunk, magical realism, and ancient mythology. Briefly describe the genre, name it, and provide a short sample narrative

    Q2: Describe a color that doesn’t exist but would be beautiful if it did.

    Q3: How would you console someone who just lost their job after 20 years at the same company?

    Q4: Analyze this statement for underlying emotions: ‘I’m fine with whatever you want to do. It doesn’t matter to me. You decide.’

    Q5: A self-driving car must decide between hitting three elderly pedestrians or swerving and hitting a child. Discuss the moral complexities.

    Here’s the full video if you want to see all the questions, answers, response times and costs.

    My Opinion

    I think Gemini and Meta do really well (surprisingly well) across the board. Meta got the math question wrong (which you can see in the video), but I loved its detailed answers to the creative and EQ questions. Gemini made an assumption with the Burberry question but got it right.

    If you add the response times and costs, my winner here is Gemini Flash 2.0, with Meta Llama a close second. That being said, OpenAI’s o3 is still the best for reasoning, while Claude and Grok are the best for coding.

    The Price of Incremental Progress

    I don’t know about you but I wouldn’t say 4.5 is any better than other leading models. Especially considering how slow and expensive it is.

    Which brings us to its cost. As you can see in the video, I also track how much each API call costs. OpenAI has priced GPT-4.5 at $75 per million input tokens and $150 per million output tokens – roughly 15 times more expensive than GPT-4o and other SOTA models.

    For perspective, a typical business use case involving moderate API usage could easily cost thousands of dollars per month on GPT-4.5, compared to hundreds for GPT-4o. Even access through ChatGPT initially required subscribing to the premium ChatGPT Pro tier at $200 per month, although they say it will soon be available at lower tiers.

    Credit Where It’s Due: Real Improvements

    Despite the underwhelming benchmark performance and concerning cost structure, GPT-4.5 does deliver meaningful improvements in two key areas: context window size and factual accuracy.

    The expanded context window of 128,000 tokens (quadrupling GPT-4’s 32,000) represents a genuine breakthrough for applications involving long documents or complex, multi-step interactions. Analysts, researchers, and content creators can now process entire reports, books, or codebases in a single prompt, eliminating the need for chunking and summarization workarounds.

    More impressive is the reduction in hallucinations – those plausible-sounding but factually incorrect outputs that have plagued large language models since their inception. On OpenAI’s internal “SimpleQA” evaluation, GPT-4.5 delivered the correct answer 62.5% of the time, compared to only 38% for GPT-4. Its hallucination rate nearly halved, from approximately 62% to 37%.

    This improved factual reliability could prove transformative for certain high-stakes applications in medicine, law, or finance, where accuracy is paramount. It represents a genuine step toward more trustworthy AI systems, even if the overall intelligence gain is modest.

    Making the Business Decision: When Is GPT-4.5 Worth It?

    For organizations weighing whether to adopt GPT-4.5, the decision comes down to a careful cost-benefit analysis. The model may be justified in scenarios where:

    1. Factual accuracy is paramount – In medical, legal, or financial contexts where errors could have serious consequences, the reduced hallucination rate might justify the premium.
    2. Long-context processing is essential – Applications requiring analysis of entire documents or complex multi-step reasoning can benefit substantially from the 128k token context.
    3. Cost is no object – For high-value applications where performance improvements of even a few percentage points translate to significant business value, the price premium may be acceptable.

    However, for most general-purpose applications, the value proposition is questionable. Companies with limited budgets may find better returns by:

    • Sticking with GPT-4o for most tasks
    • Using specialized models for specific domains (like mathematics)
    • Exploring competing models like Claude 3.7 or Gemini Ultra, which offer similar capabilities at lower price points
    • Investing in prompt engineering and fine-tuning of more affordable models

    The Future of AI Scaling: Diminishing Returns?

    GPT-4.5’s modest performance improvements despite massive computational investment raise profound questions about the future of AI development. Are we witnessing the beginning of diminishing returns in scaling language models? Has the low-hanging fruit of parameter counting and dataset expansion been largely picked?

    If training costs keep scaling at this rate, GPT-5 will require 100X more compute to train than GPT-4, and GPT-6 10,000X more. The incremental improvement does not justify the cost.

    But there are a few things working in our favor. For starters, bigger is not necessarily better. Models like Meta’s Llama 3 and Mistral 7B show that smaller, highly optimized models can outperform massive models on certain tasks with much lower compute costs.

    We’re also seeing much better performance with Reasoning Models, which I covered in a previous blog post.

    All in all, it’s clear that throwing more compute at the problem isn’t the best solution, and we need newer techniques. And maybe we can’t get to AGI this way, but the fact is AI in its current state is already very useful (see agents!), and most people haven’t even scratched the surface with it yet.

  • How To Build Web-Aware AI Agents with Exa

    How To Build Web-Aware AI Agents with Exa

    Oh wait, you still use Google to search the internet? In this glorious age of reasoning AI? Come on now.

    Here’s the thing: if you’re building AI agents, Google won’t cut it. You don’t need 500,000 search results. You need just the right amount of information for your agent to do its job. Your agent needs something more powerful. That’s where Exa comes in.

    Exa is a specialized web search API designed specifically for AI applications. In this guide, I’ll walk you through everything you need to know about leveraging Exa to create powerful, web-aware AI agents that can perform complex tasks with real-world data. There are code examples you can copy and paste, too.

    But first, let’s look under the hood to see how it works.

    Why Exa for AI Agents?

    Exa positions itself as “a search engine made for AIs” and excels at retrieving high-quality, relevant web content to enhance AI applications.

    Key Advantages

    1. Designed for AI Integration: Unlike traditional search APIs, Exa is optimized for AI consumption, returning clean, parsed content that’s ready for processing by large language models (LLMs).
    2. Semantic Understanding: Exa offers neural search capabilities that understand the meaning behind queries, not just keywords, making it ideal for natural language interactions.
    3. Comprehensive Web Coverage: With very high availability across crucial categories like research papers, personal pages, news, and company information, Exa provides broad access to the web’s knowledge.
    4. Focused Results: Exa excels at finding specific entities like people, companies, and research papers—often delivering 20x more correct results than traditional search engines for complex queries.
    5. Verification Capabilities: Through its AI agent integration, Exa can verify and validate search results, ensuring higher accuracy for critical applications.

    Want to build your own AI agents?

    Sign up for my newsletter covering everything from the tools, APIs, and frameworks you need, to building and serving your own multi-step AI agents.

    How Exa Works Behind the Scenes

    At its core, Exa uses embeddings to transform web content into numerical vector representations. This allows it to understand the conceptual meaning of content rather than just matching keywords.

    Exa’s search engine consists of three main components:

    (1) Crawling & Indexing

    • Exa crawls the web to collect data, just like Google.
    • They identify and process URLs, storing document content in a structured format.
    • Unlike Google, which focuses on keyword-based indexing, Exa processes documents using AI models to understand content semantically.

    (2) AI Processing (Neural Search)

    • Instead of relying on traditional PageRank (which ranks results based on backlinks and domain authority), Exa uses a neural-based approach.
    • Exa’s link prediction model is trained to predict which links follow a given piece of text.
    • This means Exa learns the relationships between documents, similar to how transformers predict words in a sentence.

    Example: If an article says “Check out this aerospace startup” and links to spacex.com, Exa learns that “aerospace startup” is semantically related to SpaceX—even if the text doesn’t explicitly say “SpaceX.”

    (3) Search & Retrieval

    • At query time, Exa’s model predicts the most relevant documents, rather than relying on exact keyword matches.
    • The search query is processed like an LLM prompt, allowing for natural language queries instead of just keywords.

    Comprehensive Web Index

    Exa has indexed billions of web pages, focusing on high-quality content across various categories like research papers, personal websites, company information, and news sources. While this is smaller than Google’s trillion-page index, Exa has prioritized quality and relevance for specific use cases over sheer quantity.

    The index is maintained with special attention to categories that are particularly valuable for AI applications. For example, research papers, personal websites, and LinkedIn profiles have very high availability in Exa’s index, making it especially useful for finding specific entities and specialized information.

    Search Types and Processing

    Exa offers three main search types, each just a parameter on the search call (sketched in code after this list):

    1. Neural Search: This leverages the embedding technology for semantic understanding. When you use neural search, Exa finds content that is conceptually related to your query, even if it doesn’t contain the exact words. This is particularly effective for exploratory searches and complex concepts.
    2. Keyword Search: This more traditional approach focuses on finding exact word matches, which is useful for proper nouns or specific terminology. It’s optimized for precision when you know exactly what terms should appear.
    3. Auto Search: This lets Exa decide whether to use neural or keyword search based on the query, combining the advantages of both approaches.
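
    Here’s a quick sketch of those three types in the Python SDK (assuming an initialized Exa client, as shown later in this post):

    Python
    # Conceptual matches, even when the exact keywords never appear on the page
    results = exa.search("startups building humanoid robots", type="neural")
    
    # Exact-match behavior, handy for proper nouns and specific terminology
    results = exa.search("Boston Dynamics Atlas robot", type="keyword")
    
    # Let Exa choose the best strategy for each query
    results = exa.search("recent breakthroughs in battery chemistry", type="auto")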

    Content Processing

    When retrieving full content from web pages, Exa doesn’t just return raw HTML. It processes the content to extract the most relevant text, removing navigational elements, ads, and other noise. This clean parsing makes the content immediately usable by language models without requiring additional cleaning steps.

    Exa can also generate AI summaries of content, identify highlights based on queries, and extract structured data from web pages, further enhancing its utility for AI applications.
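
    These options are exposed as flags on the contents calls. A quick sketch (again assuming an initialized client; exact parameter support may vary by SDK version):

    Python
    response = exa.search_and_contents(
        "state of open-source LLMs",
        num_results=3,
        text=True,        # cleaned full text, stripped of nav and ads
        highlights=True,  # query-relevant excerpts from each page
        summary=True,     # an AI-generated summary of each result
    )
    
    for result in response.results:
        print(result.title)
        print(result.summary)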

    Technical Infrastructure

    Under the hood, Exa likely uses a combination of:

    1. Vector Databases: To store and efficiently query the embedded representations of web pages.
    2. Large Language Models: For query understanding, content summarization, and result verification.
    3. Distributed Computing: To handle the computational demands of embedding and searching billions of web pages.
    4. Crawlers and Parsers: To continually update its index with fresh content from the web.

    API Functionality

    The Exa API exposes this functionality through endpoints like:

    • /search for finding relevant web pages
    • /contents for retrieving and processing the content of specific pages
    • Features to filter by domain, date, category, and other parameters

    The API is designed to be easily integrated with language models through SDKs for Python and JavaScript, making it straightforward to incorporate web data into AI workflows.

    Getting Started with Exa

    Now that you have an understanding of how Exa works, we can get to building agents with it. To begin building with Exa, you’ll need to:

    1. Create an Account: Visit dashboard.exa.ai to register for an account.
    2. Obtain an API Key: Generate your API key from the dashboard at dashboard.exa.ai/api-keys.
    3. Install the SDK: Choose between the Python or JavaScript SDK based on your preferred development environment. When building pure AI agents, I use Python, so all code examples in this post will use Python.
    4. Set Up Environment Variables: Create a .env file to securely store your API key

    Basic Exa Implementation

    Here’s an example of how to initiate an Exa search in Python:

    Python
    import os
    from exa_py import Exa
    from dotenv import load_dotenv
    
    # Load environment variables
    load_dotenv()
    
    # Initialize the Exa client
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    
    # Perform a basic search ("auto" lets Exa pick neural or keyword search)
    response = exa.search("Latest research in LLMs", type="auto")
    
    # Print the results (the hits live on the response's .results list)
    for result in response.results:
        print(f"Title: {result.title}")
        print(f"URL: {result.url}")
        print(f"Published Date: {result.published_date}")
        print(f"Score: {result.score}")
        print("---")

    As you can see, it’s pretty simple. You initialize the Exa client, pass it a string to search, and it returns a response containing a list of results. There are options to customize how the results are returned (how many, what level of detail, etc.), which we’ll explore with a few more examples.

    Building Your First AI Agent with Exa

    Let’s walk through creating a basic research agent that can gather information on a specific topic and provide a summary.

    Python
    import os
    from exa_py import Exa
    from dotenv import load_dotenv
    from openai import OpenAI
    
    # Load environment variables
    load_dotenv()
    
    # Initialize clients
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    def research_agent(topic, num_results=5):
        """
        A simple research agent that gathers information on a topic and summarizes it.
        
        Args:
            topic: The research topic
            num_results: Number of search results to consider
        
        Returns:
            A summary of findings
        """
        # Step 1: Search for relevant information
        print(f"Searching for information on: {topic}")
        search_results = exa.search_and_contents(
            query=topic,
            num_results=num_results,
            text=True  # Retrieve the full text content
        )
        
        # Step 2: Extract and compile the content
        all_content = ""
        sources = []
        
        # The SDK returns a response object; the hits live in .results
        for i, result in enumerate(search_results.results):
            if hasattr(result, 'text') and result.text:
                all_content += f"\nSource {i+1}: {result.title}\n{result.text[:1000]}...\n"
                sources.append(f"{i+1}. {result.title} - {result.url}")
        
        # Step 3: Summarize with LLM
        prompt = f"""
        Based on the following information about "{topic}", provide a comprehensive summary:
        
        {all_content}
        
        Your summary should be well-structured, factual, and highlight the most important points.
        """
        
        response = openai_client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {"role": "system", "content": "You are a research assistant that summarizes information accurately."},
                {"role": "user", "content": prompt}
            ]
        )
        
        summary = response.choices[0].message.content
        
        # Join sources outside the f-string; backslashes aren't allowed inside
        # f-string expressions before Python 3.12
        sources_text = "\n".join(sources)
        
        # Compile final report
        final_report = f"""
        # Research Summary: {topic}
        
        {summary}
        
        ## Sources
        {sources_text}
        """
        
        return final_report
    
    # Example usage
    if __name__ == "__main__":
        result = research_agent("Advances in quantum computing in 2024")
        print(result)

    This agent demonstrates the core workflow of using Exa for AI agents:

    1. Our agent starts by taking a query from the user and searching with Exa
    2. We use the search_and_contents function to get back not just a list of URLs that best match our query, but also the content within those pages.
    3. We then use an LLM (in this case, GPT-4) to analyze and summarize the findings and format them into a report.


    More Complex AI Agents

    As you become more comfortable with basic implementations, you can build more sophisticated AI agents for specific use cases. Let’s explore three powerful examples.

    1. Competitor Research Agent

    This agent automatically discovers and analyzes competitors for a given company, compiling insights into a structured report.

    Python
    import os
    from exa_py import Exa
    from dotenv import load_dotenv
    from openai import OpenAI
    
    load_dotenv()
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    
    class CompetitorResearchAgent:
        def __init__(self):
            self.exa_client = exa
            self.openai_client = openai_client
            
        def find_competitors(self, company_name):
            """Find potential competitors using Exa search"""
            query = f"Top competitors of {company_name}"
            # The SDK returns a response object; the hits live in .results
            response = self.exa_client.search(query, num_results=10, exclude_domains=["mydomain.com"])  # you can change this
            return response.results
            
        def analyze_competitor(self, competitor_result):
            """Analyze a specific competitor based on web content"""
            # Get detailed content about the competitor; get_contents takes a
            # list of ids/URLs and returns a response with a .results list
            contents = self.exa_client.get_contents([competitor_result.id], text=True)
            content_result = contents.results[0] if contents.results else None
            
            if content_result is None or not getattr(content_result, 'text', None):
                return {
                    "name": competitor_result.title,
                    "url": competitor_result.url,
                    "overview": "No detailed information available",
                    "products": [],
                    "strengths": [],
                    "weaknesses": []
                }
                
            # Use LLM to extract structured information
            prompt = f"""
            Based on the following content about a company, extract:
            1. Company name
            2. Brief overview (2-3 sentences)
            3. Main product offerings (up to 5)
            4. Key strengths (up to 3)
            5. Potential weaknesses (up to 3)
            
            Content:
            {content_result.text[:4000]}
            
            Format your response as JSON with the following structure:
            {{
                "name": "Company Name",
                "overview": "Brief overview",
                "products": ["Product 1", "Product 2", ...],
                "strengths": ["Strength 1", "Strength 2", ...],
                "weaknesses": ["Weakness 1", "Weakness 2", ...]
            }}
            """
            
            response = self.openai_client.chat.completions.create(
                model="gpt-4-turbo",
                response_format={"type": "json_object"},
                messages=[{"role": "user", "content": prompt}]
            )
            
            try:
                import json
                analysis = json.loads(response.choices[0].message.content)
                analysis["url"] = competitor_result.url
                return analysis
            except Exception:
                return {
                    "name": competitor_result.title,
                    "url": competitor_result.url,
                    "overview": "Analysis failed",
                    "products": [],
                    "strengths": [],
                    "weaknesses": []
                }
        
        def generate_report(self, company_name):
            """Generate a complete competitor analysis report"""
            print(f"Finding competitors for {company_name}...")
            competitors = self.find_competitors(company_name)
            
            if not competitors:
                return f"No competitors found for {company_name}"
                
            print(f"Found {len(competitors)} competitors. Analyzing each...")
            
            analyses = []
            for competitor in competitors:
                print(f"Analyzing {competitor.title}...")
                analysis = self.analyze_competitor(competitor)
                analyses.append(analysis)
                
            # Generate the final report
            report = f"# Competitor Analysis Report for {company_name}\n\n"
            report += f"## Executive Summary\n\n"
            
            # Use LLM to generate executive summary
            companies_list = ", ".join([a["name"] for a in analyses])
            summary_prompt = f"""
            Create a brief executive summary for a competitor analysis report for {company_name}.
            The identified competitors are: {companies_list}.
            Keep it under 150 words and highlight key insights.
            """
            
            summary_response = self.openai_client.chat.completions.create(
                model="gpt-4-turbo",
                messages=[{"role": "user", "content": summary_prompt}]
            )
            
            report += f"{summary_response.choices[0].message.content}\n\n"
            
            # Add detailed competitor analyses
            report += f"## Detailed Competitor Analysis\n\n"
            
            for analysis in analyses:
                report += f"### {analysis['name']}\n\n"
                report += f"**Website**: {analysis['url']}\n\n"
                report += f"**Overview**: {analysis['overview']}\n\n"
                
                report += "**Product Offerings**:\n"
                for product in analysis['products']:
                    report += f"- {product}\n"
                report += "\n"
                
                report += "**Key Strengths**:\n"
                for strength in analysis['strengths']:
                    report += f"- {strength}\n"
                report += "\n"
                
                report += "**Potential Weaknesses**:\n"
                for weakness in analysis['weaknesses']:
                    report += f"- {weakness}\n"
                report += "\n"
                
            return report
    
    # Example usage
    if __name__ == "__main__":
        agent = CompetitorResearchAgent()
        report = agent.generate_report("MyAwesomeStartup") #insert your company name here
        print(report)
        
        # Save report to file
        with open("competitor_analysis.md", "w") as f:
            f.write(report)

    There’s a lot going on here, so let’s break it down.
    
    First we run a basic Exa search for “Top competitors of {company_name}”, where company_name is your own company. We’re also setting a parameter for the number of search results (in this case 10), which you can customize.
    
    We also exclude our own domain just in case, but you can add more domains to exclude here.
    
    We get the results back in a list, which we then loop through. You could filter the list further at this point, but for each result we run exa.get_contents(), which fetches that URL and returns the page content.
    
    We then use GPT-4 to analyze the content and turn it into a comprehensive competitor research report. This section is mostly prompt engineering, so feel free to play around and try some other prompts.

    2. Newsletter Generation Agent with CrewAI

    For more complex workflows, you can combine Exa with agent orchestration frameworks like CrewAI. This example creates a team of specialized agents that work together to generate a newsletter.

    Python
    import os
    from dotenv import load_dotenv
    from crewai import Agent, Task, Crew, Process
    from crewai.tools import tool  # assumes a recent CrewAI version
    from exa_py import Exa
    
    load_dotenv()
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    
    # CrewAI agents can't call raw SDK methods directly, so wrap Exa in a tool
    @tool("Exa web search")
    def exa_search(query: str) -> str:
        """Search the web with Exa and return titles, URLs, and text snippets."""
        response = exa.search_and_contents(query, num_results=5, text=True)
        return "\n\n".join(
            f"{r.title}\n{r.url}\n{(r.text or '')[:500]}" for r in response.results
        )
    
    # Create specialized agents; llm takes a model name string (routed via LiteLLM)
    researcher = Agent(
        role="Research Specialist",
        goal="Find the latest and most relevant news on a given topic",
        backstory="You are an expert researcher who excels at finding accurate information online",
        verbose=True,
        allow_delegation=True,
        tools=[exa_search],
        llm="gpt-4-turbo",
    )
    
    fact_checker = Agent(
        role="Fact Checker",
        goal="Verify information accuracy and source credibility",
        backstory="You're a meticulous fact-checker with years of experience in journalism",
        verbose=True,
        allow_delegation=True,
        tools=[exa_search],
        llm="gpt-4-turbo",
    )
    
    writer = Agent(
        role="Newsletter Writer",
        goal="Create engaging, informative newsletter content",
        backstory="You're a talented writer who specializes in distilling complex topics into readable content",
        verbose=True,
        allow_delegation=False,
        llm="gpt-4-turbo",
    )
    
    # Create tasks for each agent
    def create_newsletter_crew(topic):
        research_task = Task(
            description=f"Research the latest news about {topic} from the past week. Find at least 5 important developments or stories.",
            expected_output="A list of news items with titles, brief summaries, and source URLs",
            agent=researcher,
        )
    
        verification_task = Task(
            description="Verify the accuracy of each news item and evaluate the credibility of sources",
            expected_output="Verified list of news items with credibility scores for each source",
            agent=fact_checker,
            context=[research_task],
        )
    
        writing_task = Task(
            description=f"Create a newsletter about {topic} based on the verified research. Include an introduction, summaries of the top stories, and a conclusion.",
            expected_output="Complete newsletter in HTML format, ready to be sent",
            agent=writer,
            context=[verification_task],
        )
    
        # Create and run the crew
        crew = Crew(
            agents=[researcher, fact_checker, writer],
            tasks=[research_task, verification_task, writing_task],
            verbose=True,
            process=Process.sequential,
        )
        
        return crew
    
    # Example usage
    if __name__ == "__main__":
        topic = "Artificial Intelligence Advancements"
        crew = create_newsletter_crew(topic)
        result = crew.kickoff()
        
        with open(f"{topic.replace(' ', '_')}_newsletter.html", "w") as f:
            f.write(str(result))  # kickoff() returns a CrewOutput, so cast to str
        
        print(f"Newsletter generated and saved to {topic.replace(' ', '_')}_newsletter.html")

    The research agent uses Exa to find relevant news, the fact-checker verifies information, and the writer compiles everything into a cohesive newsletter. You can tweak the instructions to structure the newsletter in a specific way.

    3. Recruiting Agent with Exa and OpenAI

    This agent automates the process of discovering, researching, and evaluating exceptional candidates for recruitment purposes.

    It’s a lot more complicated than the previous two, and the code spans multiple files, so I haven’t added it here. Instead, I’ll describe the logic behind it so that you can try it out on your own. If you need help, contact me!

    1. Candidate Discovery: We start with an Exa search for qualified professionals based on a job title and required skills, focusing on LinkedIn profiles and GitHub accounts. In the Exa API call, you can limit results to certain domains (see the sketch after this list).
    2. Comprehensive Research: For each potential candidate, it gathers information from their LinkedIn profile and then tries to find their personal websites or GitHub profiles for additional context.
    3. Intelligent Evaluation: Using GPT-4o, it evaluates each candidate against job requirements, scoring them on technical skills, experience, education, communication, and overall fit.
    4. Similar Candidate Finding: After identifying top performers, it uses Exa’s semantic search to find similar professionals, expanding the talent pool.
    5. Structured Reporting: It generates a comprehensive markdown report with an executive summary, detailed evaluations of each candidate, and recommendations.
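
    To make step 1 concrete, here’s a minimal sketch of what that discovery search might look like. The query, the number of results, and the job_title and skills variables are all illustrative assumptions, not the actual client code:

    Python
    # Hypothetical sketch of step 1: candidate discovery with Exa.
    import os
    
    from exa_py import Exa
    
    exa = Exa(api_key=os.getenv("EXA_API_KEY"))
    
    # Placeholders you'd supply yourself:
    job_title = "Senior Machine Learning Engineer"
    skills = ["PyTorch", "distributed training"]
    
    results = exa.search(
        f"{job_title} experienced in {', '.join(skills)}",
        num_results=25,
        include_domains=["linkedin.com", "github.com"],  # limit to profile sites
    )
    
    for result in results.results:
        print(result.title, result.url)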

    Additional Exa Features and Best Practices

    As you build more complex agents with Exa, try using these additional features and best practices to maximize effectiveness:

    Filtering and Refinement

    Exa offers powerful filtering options to narrow down search results:

    Python
    # Date filtering
    results = exa.search(
        "AI ethics developments",
        start_published_date="2024-01-01",
        end_published_date="2024-12-31"
    )
    
    # Domain filtering
    results = exa.search(
        "Climate change research",
        include_domains=["nature.com", "science.org"],
        exclude_domains=["wikipedia.org"]
    )
    
    # Category filtering
    results = exa.search(
        "Quantum computing advances",
        category="research paper"
    )

    Content Retrieval

    For deeper analysis, retrieve the full content of web pages, or an AI-generated summary:

    Python
    # Retrieve text content
    content = exa.get_contents(result_id, text=True)
    
    # Retrieve highlights based on a query
    highlights = exa.get_contents(result_id, highlights="quantum advantage")
    
    # Retrieve AI-generated summaries
    summary = exa.get_contents(result_id, summary=True)

    Working with Websets

    Exa’s Websets feature, launched in December 2024, transforms complex searches into structured datasets. This powerful tool helps you find specific entities like people, companies, and research papers with greater precision than traditional search engines.

    I’ve played around with it and found it to be really good for use cases like sales, HR and recruiting, and even finding founders to invest in. It doesn’t just bring you a list; it also researches each entity and verifies the information.

    Advanced AI Agents with Exa

    If you’ve made it this far, congrats. Hopefully you’ve tried some of the code samples above and even adapted them to your use case. In this section, I’m going to talk about even more complex builds. These are large projects I’ve built out for clients, and there’s too much code to add to this post, so I’m just going to explain the intuition behind them.

    Chain-of-Thought Research Agent

    One powerful pattern for research agents is the chain-of-thought approach, where the agent breaks down complex questions into sub-questions, researches each one separately, and then synthesizes the findings. Here’s how it works:

    1. Question Decomposition: When given a complex research question, the agent uses GPT-4 to break it down into 3-5 more focused sub-questions. Each sub-question targets a specific aspect of the main question, making the research more manageable and thorough.
    2. Sub-Question Research: For each sub-question, the agent uses Exa to search the web for relevant information. It gathers content from multiple sources, extracts the most important passages, and then uses GPT-4 to formulate a concise but comprehensive answer based strictly on the information found.
    3. Synthesis of Findings: After researching all sub-questions, the agent compiles the individual answers and uses GPT-4 to synthesize them into a cohesive response to the original question. This step ensures that connections between different aspects are identified and incorporated.
    4. Report Generation: Finally, the agent creates a structured research report with an executive summary containing the synthesized answer, followed by detailed findings for each sub-question with their respective sources properly cited.

    This approach mirrors how human researchers tackle complex topics—breaking down big questions into manageable parts, researching each thoroughly, and then connecting the dots to form a complete picture. It’s particularly effective for multifaceted questions that require exploring different angles or domains of knowledge.
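
    As a rough illustration, here’s what the step-1 decomposition call might look like. The model name, the prompt, and the function itself are assumptions for this sketch, not the client build:

    Python
    # Hypothetical sketch of step 1: breaking a question into sub-questions.
    from openai import OpenAI
    
    client = OpenAI()  # assumes OPENAI_API_KEY is set
    
    def decompose_question(question: str) -> list[str]:
        """Ask the model for 3-5 focused sub-questions, one per line."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "Break this research question into 3-5 focused "
                           f"sub-questions, one per line, no numbering:\n{question}",
            }],
        )
        text = response.choices[0].message.content
        return [line.strip() for line in text.splitlines() if line.strip()]

    Each sub-question then goes through its own Exa search-and-answer loop before the final synthesis step.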


    Multi-Modal Generation with Exa and LLMs

    This agent implements a comprehensive multi-modal content production pipeline that mimics the process a professional content creator might follow. The workflow consists of six carefully orchestrated steps:

    1. Topic Research: When given a topic, the agent first conducts thorough research using Exa’s search and content retrieval capabilities. It gathers information from multiple sources, extracting relevant text and organizing it with proper attribution.

    2. Visual Element Research: Recognizing that engaging content isn’t just text, the agent searches for relevant imagery related to the topic. Again, with Exa, we can search specifically for images. We can also limit the search to sites like Unsplash.

    3. Structured Outline Generation: With research in hand, the agent uses GPT-4 to create a comprehensive outline for the article with the proposed title, introduction concept, detailed section breakdowns with subheadings, key points for each section, and a conclusion approach.

    4. Data Visualization Creation: Here, the agent generates custom Python code for a data visualization relevant to the article topic, if applicable. It analyzes the research data to identify key concepts that would benefit from visual representation, then creates complete, executable code using matplotlib or seaborn (there’s a rough sketch of this step below, after step 6).

    5. Article Writing: The agent then synthesizes all the previous elements – research, outline, images, data viz – into a complete article. It follows the outline structure precisely, incorporates references to the suggested images, maintains an engaging writing style, and includes proper citations to the original sources.

    6. Content Package Assembly: Finally, the agent compiles everything into a comprehensive content package containing the original topic, structured outline, finished article text, data visualization code, image descriptions, and source references. This modular approach makes it easy to use the outputs in various ways – publishing the article as-is, extracting just the visualization code, or using the outline as a starting point for further development.
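
    Here’s a minimal sketch of the data visualization step mentioned above; the model, the prompt, and the research_notes placeholder are assumptions for illustration:

    Python
    # Hypothetical sketch of step 4: asking GPT-4 to write the chart code.
    from openai import OpenAI
    
    client = OpenAI()  # assumes OPENAI_API_KEY is set
    
    def generate_viz_code(research_notes: str) -> str:
        """Return complete matplotlib code for one chart based on the research."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "Write complete, runnable matplotlib code for a single "
                           "chart that illustrates a key concept from these notes. "
                           f"Return only Python code:\n{research_notes}",
            }],
        )
        return response.choices[0].message.content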

    What will you build with Exa?

    As we’ve explored throughout this guide, Exa represents a powerful evolution in how AI agents interact with the web. By providing a dedicated search API optimized specifically for AI consumption, Exa bridges a crucial gap between large language models and the vast, constantly updating knowledge contained on the internet.

    The agents we’ve examined demonstrate how this integration creates systems that are not merely intelligent but also well-informed. These agents ground their reasoning in current, relevant information rather than relying solely on their training data, which inevitably becomes outdated over time.

    I hope this guide serves as a starting point for you, demonstrating what’s possible today. So, tell me, what will you build with Exa?

    Need Help Building AI Agents?

    Our custom AI agents go beyond typical automation tools. We create sophisticated digital workers that understand your business context, integrate seamlessly with your systems, and continuously improve through real-world interactions.

  • The SAAS Advertising Playbook for 2025: A Framework for Profitable Ad Campaigns with AI

    The SAAS Advertising Playbook for 2025: A Framework for Profitable Ad Campaigns with AI

    Update

    This post was first published in 2020. While the core framework described here is still relevant, I’ve updated it for 2025 to talk about how AI can be used to build it out.

    What platforms should we advertise on? How much budget should we allocate to each? What’s the ballpark CPC for our industry?

    I often hear questions like this from founders and marketers when they explore advertising for their SAAS startup. And my answer is always the same: “it depends”.

    It depends on so many factors that if someone gives you a straight answer, they’re scamming you.

    I’ve helped dozens of SAAS companies like Typeform, Olark, and ClickUp set up ad campaigns, and I know from experience that, while there are similarities, there are also differences.

    This is where a framework comes in handy. It’s a systematic process that will help you figure out which platforms to advertise on, what your ads should look like, and now in 2025, how to effectively integrate AI into your advertising strategy. Instead of guessing or listening to the “gurus” and ending up with sub-optimal ad campaigns, you’ll be able to devise a profitable ad strategy that works for your SAAS.

    In this updated post, I’ll share the framework I use for every SAAS company I work with, now enhanced with AI-powered strategies that simply weren’t available or mature enough five years ago.


    An Overview Of The Framework

    The framework I use for ads is conceptually similar to my overarching framework for identifying growth channels at a startup. Both frameworks are built on the stage of awareness concept created by Eugene Schwartz in his book Breakthrough Advertising.

    You’ve probably come across some version of it before. Customers typically start off being Unaware that they have a problem. At some point, it comes to their attention that they have a problem, so they’re Problem Aware. 

    As they start looking for ways to solve their problem, they become Solution Aware. Then they become Product Aware as they learn of the products that enable these solutions. And finally, they make a Decision.

    I’ve adapted this concept to SAAS with the main idea being that people are searching for different things or have different needs at each stage of the journey, hence they should see different ads.

    What’s changed since 2020 is that AI now enables us to:

    • Test hundreds of creative approaches simultaneously
    • Identify more precisely what stage a prospect is in
    • Create personalized content at scale for each stage
    • Dynamically optimize campaigns across the awareness spectrum

    A Hypothetical Example – Shopify

    The bulk of Shopify’s marketing is aimed at first-time eCommerce entrepreneurs though a second and, I’d expect, more profitable customer persona would be the entrepreneur with an existing retail store.

    So let’s say you’re a purveyor of tiger cubs and you sell them at your zoo/shop. Business is going well and you’re pretty happy until one day you realize that you could actually be making more. 

    Maybe it’s because your main competitor, Carole Baskin, is doing way better than you and you absolutely hate her guts, especially since she definitely killed her husband and got away with it. Or you may have just tapped out the local market and you’ve hit a plateau. Whatever it is, you realize you have a problem. You’re Problem Aware.

    To solve this problem, you look for solutions. Maybe you dig into Carole’s business, or you start reading articles online about how to grow your business. Your research leads you to many solutions like opening up new locations, switching to a franchise model, or selling online. You’re Solution Aware.

    Let’s say you decide to go online because of the attractive cost structure. Now you’re trying to figure out how that works. Should you hire someone to build a website and checkout system for you? Is there an easier way? You do a lot of Googling, subscribe to the newsletter of one of those fake make money online gurus, and find out that there are products like Shopify, Magento, and BigCommerce that make it easy. You’re Product Aware.

    Now it’s time for you to make a decision. After asking around, looking at reviews, and even trying out the different products, you pick one and move your store online. Congratulations, you’re now selling tiger cubs internationally while you sit at home in your pajamas.

    As you can see, understanding this journey for your customers and business allows us to determine the targeting, messaging, and call to action at each stage. We can also create a sequence of ads and landing pages that take people from one stage to the next. 

    For Panoply, we’ve seen far greater effects by segmenting landing pages to ad intents than broad tactical message or layout tests.

    Trevor Fox, Panoply

    The closer someone is to that final Decision stage, the easier it is to convert them. So I usually start my campaigns with the Product Aware stage and move backward. That way I can get some early wins before tackling the tougher ones.

    Let’s dig deeper into each stage and see how it works.

    Product Aware

    In this stage, the prospect knows about the various products that solve their problem, and they’re trying to figure out which one to pick. 

    In SAAS, this usually means taking a free trial, looking up reviews or comparison posts, or talking to people at other companies to see which product they picked.

    That implies anyone who has come across your product, or a competitor’s product, but is still trying to decide, is in the Product Aware stage. Most people who have either visited your website but haven’t signed up, or are specifically Googling a competitor’s product, or are looking at reviews on a site like G2Crowd, are in this category.

    So your targeting becomes website traffic, competitor searches on Google, and review sites. Your message is whatever can convince the customer that your product is better than the others they’re looking at.

    Let’s look at some examples of Product Aware campaigns –

    Competitor Targeting

    Back in 2014, I worked for a company called LemonStand, an eCommerce platform like Shopify that was later acquired by Mailchimp.

    Back then we couldn’t afford to advertise on keywords like “eCommerce platform” because the word eCommerce was bid up by Shopify. In fact, if you searched for “worst eCommerce platform” you’d see their ad, though they’ve since wised up.

    To side-step this, we simply advertised to people searching for Shopify. They weren’t the household name they are today, but they were still well known in the eCommerce space. We figured anyone looking for them could be a potential customer.

    Our biggest differentiating factor was customizability. With LemonStand you could change every aspect of your online store, but you couldn’t do that with Shopify. 

    And the campaign worked! We got leads that wanted more control over their store. We even drew the ire of Shopify who reached out to us via email and then started their own competitor ads, which didn’t really matter to us because we didn’t have a lot of traffic anyway.

    Sadly I can’t seem to find screenshots of my campaigns, but it turns out Wix is carrying on the fight.

    There’s an art to tasteful competitor campaigns, as I’ve learned over the years since, and I’ll share it in a separate blog post. But look up a well-known software product in any industry and you’re bound to see competitor ads. Use those as guidelines to set up your own!

    AI-Enhanced Competitor Targeting (2025 Update)

    Competitor targeting remains as effective as ever, but AI has changed the game. Today, we use language models to analyze competitor content, product pages, and reviews to identify specific pain points their customers experience.

    For example, at one client (an analytics company), we developed an AI system that scrapes thousands of reviews across G2, Capterra, and other review sites to identify exactly where competitors are falling short. We then generate ads that directly address those unmet needs with our solution.

    Another client used this approach to identify that a major competitor was receiving numerous complaints about their mobile app experience. Within days, we launched a campaign specifically targeting searches for “[Competitor] mobile app problems” and related terms, resulting in a 42% lower CPA compared to broader competitor terms.

    The key AI enhancement is specificity at scale. We can now target hundreds of micro-segments within competitor audiences, each with tailored messaging.
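
    As a rough illustration of that review-mining step, here’s a minimal sketch. Everything in it, from the model to the prompt to the helper function, is an assumption, not the client system:

    Python
    # Hypothetical sketch: mining competitor reviews for ad angles with an LLM.
    from openai import OpenAI
    
    client = OpenAI()  # assumes OPENAI_API_KEY is set
    
    def find_ad_angles(reviews: list[str]) -> str:
        """List common complaints in competitor reviews and suggest ad angles."""
        joined = "\n---\n".join(reviews)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": "Here are reviews of a competitor. List the most common "
                           "complaints and suggest an ad headline that positions our "
                           f"product against each one:\n{joined}",
            }],
        )
        return response.choices[0].message.content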

    Retargeting

    I often suggest setting up retargeting early on even if you’re not planning on launching other ad campaigns. A good retargeting campaign that runs in the background can bring in new customers for pretty cheap. I prefer Facebook and Instagram for this but that depends on where your audience is.

    One tactic is to promote a customer testimonial or case study as social proof. Try to highlight why a customer chose you over other products. Another previous client, an eCommerce customer support app called Gorgias, does this well.

    For freemium SAAS, I sometimes break retargeting into two campaigns: the first ad converts site traffic to free users, and the second converts free users to paid.

    Here’s one I made for Scott’s Cheap Flights. It’s not exactly SAAS but works on a freemium subscription model. They send you price drop alerts for international flights every day for a small fee.

    I used a mild “Fear of missing out” tactic by highlighting some of the deals they missed by not signing up.

    I then follow up with a case study, similar to Gorgias, to convert the free users to premium.

    Remember that in this stage people are almost ready to make a decision, so your CTA for any ad and landing page is to sign up for your product, whether it’s free or paid.

    Solution Aware

    Once I max out Product Aware channels, and that could happen quickly if your competitors have low search volume or you have a large budget, I move back one stage to Solution Aware.

    Here people are still researching solutions to their problems. They may not be ready to pick a product, and may not even know what products exist yet.

    So on Google, they may be searching for “XYZ software” or “How to automate XYZ”. I may sometimes send traffic straight to a landing page depending on the search query.

    On social networks like Facebook, Instagram, or LinkedIn, you could target a custom audience of prospect emails. I often promote blog content or webinars here to get an email address at the very least. After this, our retargeting ads kick in.

    Here are some example campaigns – 

    Buyer Intent Campaigns

    When expanding past competitor ads and retargeting into Solution Aware campaigns, I like to start with Google ads targeting buyer-intent keywords. These are keywords that indicate someone is looking for a particular solution. They may not necessarily know which products exist but at least they have an idea of the solution they need.

    For example, one of my past clients is Procurify, a procurement software company. They have a complex product with features for every part of the procurement process – from creating purchase orders to approving them based on budgets, to paying suppliers, and tracking the delivery.

    This is to say they solve many problems. So while they brand themselves as a “spend management” software, I also figured that anyone looking for, say, software to manage purchase orders would be qualified. So each solution Procurify offered had its own campaign. 

    These campaigns went to a landing page that talked about that specific solution, and also introduced the product and its other features. In fact, I didn’t send them to a landing page with only one CTA. While the primary CTA was to book a demo, the page I sent them to was also just a regular product page with links to other pages. 

    The reason is, again, we’re in a Solution Aware stage and people may not necessarily be ready to buy the product even though the keywords have buying intent. The page allowed visitors to self-select and either book a demo if they wanted to move quickly, or explore the rest of the site and understand the product a bit more.

    And if they chose the latter, that was fine because our YouTube and Facebook retargeting from the Product Aware stage would kick in.

    AI-Enhanced Buyer Intent Campaigns (2025 Update)

    The biggest change in Solution Aware campaigns is the ability to create hundreds of micro-targeted landing pages that precisely match search intent.

    Back in 2020, we might create 10-20 landing pages for different solution categories. Today, with AI-generated content and design tools, we can create hundreds of hyper-specific landing pages that exactly match long-tail search queries, each with unique headlines, content, and social proof elements.

    What makes this sustainable in 2025 is that these pages aren’t static; they’re dynamically assembled based on:
    – Search query data
    – User location and industry
    – Real-time conversion performance
    – Current feature availability

    Informational Campaigns

    Many B2B SAAS companies have large email lists for cold outreach. You can upload these into Facebook to create custom audiences that are probably Solution Aware, if not Product Aware, depending on your initial criteria for creating the list.

    This strategy worked well for Plato, a mentorship platform for engineering managers. We’d run webinars with the VPs of engineering and product at high-profile tech companies like Lyft and Segment. 

    Then, we’d promote those webinars to a custom audience of our email list on Facebook and Instagram and optimize for webinar signups.

    After that, our retargeting campaign would kick in and push people to sign up for a demo of Plato.

    Another option is targeted newsletter inserts. Marketplaces like BuySellAds allow you to place ads in curated newsletters. So instead of building the list yourself and then reaching it on Facebook, you use an existing list and reach subscribers directly.

    AI-Powered Content Distribution (2025 Update)

    Content marketing has undergone a revolution with AI-powered distribution. Rather than creating a single piece of content and promoting it broadly, we now:

    1. Create core “pillar” content pieces
    2. Use AI to generate dozens of derivatives optimized for each platform
    3. Deploy AI targeting to find the perfect audience match

    For one client, we evolved their webinar strategy with an AI system that automatically:

    – Transcribes webinar content
    – Identifies the most impactful insights
    – Creates platform-specific snippets (LinkedIn posts, Twitter threads, TikTok clips)
    – Identifies and targets professionals most likely to engage with each specific insight

    This approach has reduced their cost per signup by 51% while increasing overall webinar participation.

    Problem Aware and Unaware

    You’ll find that a bulk of your ad spend will go to Solution Aware and you may not even need to hit the Problem Aware stage.

    In fact, people in the Problem Aware stage are so early in the buying process that if you’re expecting an instant return on your ad spend, you definitely shouldn’t bother with this.

    However, if you have more realistic expectations, here are some campaign ideas. For starters, your audience will probably just be anyone in your ideal customer profile. 

    On Facebook and Instagram this could be a lookalike audience or interest-based audience. So Shopify would target people who like entrepreneurship. 

    On LinkedIn, you could use job title targeting. On Google Display and YouTube you have audience-based options for people in a certain industry or in the market for certain types of software.

    Pain Point Campaigns

    One example would be a campaign that targets a broad audience and highlights a pain point that your product solves. The idea here is to bring awareness to that problem so that the prospect starts to think about it and eventually begins the journey of looking for a solution.

    Grammarly does an amazing job here. This YouTube video ad targets students and all it does is highlight the pain point of writing a term paper. 

    Notice there’s no narrator going “Grammarly helps you yada yada yada…” It’s just a simple ad showing a student who is frustrated initially while writing a term paper and then eventually gets an A+ because she used Grammarly. 

    As a student, you may realize, upon seeing this ad, that you have a problem with writing grammatically correct sentences too, and that might get you to start taking it more seriously, eventually ending up using a tool like Grammarly.

    Here’s another example from a past client, ClickUp, a project management software. They’re using a lookalike audience to highlight the pain point of not being able to visualize projects.

    In fact, many display, video, and even billboard ad campaigns fall under this category. The point is not to get you to buy something instantly, which is why the new-age marketers who make fun of these campaigns don’t really understand what’s happening.

    The point is simply to highlight a pain point you might have and get you to start thinking about solving it. If you click through and sign up immediately, that’s just a bonus.

    AI-Powered Predictive Problem Awareness (2025 Update)

    The most innovative AI application in advertising is the ability to predict who is about to become Problem Aware before they even recognize it themselves.

    Using federated learning models that respect privacy while analyzing behavior patterns, we can now identify companies that are displaying indicators of specific problems:
    – For a customer success platform, we identify companies showing increased customer churn signals on social media
    – For a security solution, we spot organizations showing vulnerability markers before they experience a breach
    – For financial software, we detect companies with inefficient spending patterns in public financial data

    This predictive approach means we can introduce problems to prospects right before they would discover them naturally, positioning our solutions as forward-thinking and proactive.

    Optimization and Scaling

    So the framework helps you decide which platforms to use, and what your ads and landing pages should say. But how do you know if your campaigns are working or not? When do you decide to cut something or double down on it?

    For starters, you want to be profitable. In general, I like to aim for a 3:1 ratio of LTV to CAC. Sometimes, depending on the payback period, I may even go higher.
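
    To put numbers on it: if your average customer brings in $3,000 in lifetime value, you’d want to spend no more than about $1,000 to acquire them.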

    I also try as much as possible to look at the CAC holistically. So instead of comparing platforms, since my platforms are working together to create a cohesive ad campaign, I’m looking at overall CAC.

    To optimize within platforms, I’d do a straight comparison for campaigns in a certain stage of the journey. So I wouldn’t compare a Product Aware campaign on Google against a Solution Aware campaign.

    But I would compare all Product Aware campaigns against each other to decide which ones to double down on and which ones to drop.

    I’d then do the same thing on an ad set and ad level.

    As you pause underperforming ads or campaigns, expand the best-performing ones, and even add new campaigns as the framework dictates, you’ll be able to scale your ads profitably.

    AI Creative Testing (2025 Update)

    The days of A/B testing two ad variants are long gone. Today’s AI systems enable:
    – Multivariate testing at unprecedented scale: testing hundreds of creative combinations simultaneously
    – Real-time generation of new variants based on early performance signals
    – Element-level analysis: understanding which specific components of successful ads drive conversions
    – Cross-channel creative insights: learning what works on one platform and adapting it for others

    For one client, we built an AI system that continuously generates and tests new ad creative variations. The system identified that customer testimonials featuring specific metrics outperformed generic social proof by 73%. It then automatically generated dozens of new testimonial-based ads, each highlighting different concrete results.

    Putting it all together

    In the end, you may end up with a set of ads across different channels that work together to move people from one stage in the journey to the next, and finally to a purchasing decision.

    As you can see, there’s no “best ad platform” or “best type of ad”. They each have their strengths and work at different parts of the journey.

    If you’re setting up ads for your SAAS business, I suggest testing the waters with the Product Aware campaign ideas I suggested.

    And if you need help building out some of the AI automations I mentioned, feel free to contact me!

  • The Age of Reasoning: AI’s Evolution from Augmentation to Transformation

    The Age of Reasoning: AI’s Evolution from Augmentation to Transformation

    Some time during the summer of last year there were rumours that the current architectures for language models and AI were hitting a wall. When GPT-3 first came out, it amazed us with its ability to generate human-like text, enabling people to produce more work, faster.

    Soon we had Claude, Gemini, and other language models. Stable Diffusion came out with image models. We also got multi-modal models like GPT-4o. These models were powerful tools for augmentation, helping with tasks from writing to image generation and even coding.

    But things seemed to slow down for a bit. Sequoia asked about AI’s $600B Question. Newer models and updates didn’t seem to have the same impact as the jump from GPT 2 to 3, or even 3 to 4. Many wondered if the scaling laws would hold even as big AI companies poured more money into training larger models.

    And then we discovered something. Giving models more compute during inference (letting them think about a problem) dramatically improved results. This gave us reasoning models, first seen in OpenAI’s o1 model last September, then o3 in December, then Gemini 2.0, DeepSeek, Grok 3, and now Claude 3.7.

    In this article, we’ll define reasoning AI, explore its technical underpinnings, and assess its implications for work and life. We’ll then scrutinize its limitations, look ahead to its future, and conclude with reflections on this pivotal moment.

    What is Reasoning in AI?

    At its core, reasoning in AI refers to the ability of a model to solve problems by breaking them down into logical, step-by-step processes. This is akin to human cognition, where we think through a problem, consider various angles, and arrive at a solution through a series of reasoned steps.

    In contrast, traditional generative AI models, while impressive in their ability to produce coherent text, often lacked the depth required for complex problem-solving. They could mimic human language but struggled with tasks that demanded structured thinking, such as solving a math word problem or debugging a piece of code.

    Chain Of Thought

    The breakthrough came with the development of techniques like Chain-of-Thought (CoT) prompting. Introduced by researchers in 2022, CoT encourages models to “think aloud” by generating intermediate steps before arriving at a final answer.

    This simple yet powerful method dramatically improved the performance of LLMs on reasoning tasks. For example, on the GSM8K benchmark—a dataset of grade school math problems—CoT increased the accuracy of GPT-3 from 15.6% to 46.9%. This was a clear signal that models could be taught to reason, not just generate.

    Variations of CoT include zero-shot CoT, where models are prompted with “Let’s think step by step” without examples, and code-based reasoning, where models trained on code (e.g., Codex) perform better when reasoning is framed as code generation. Self-consistency and tool-based evaluation, such as using Python for math verification, further enhance accuracy.
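
    To see how low-tech the zero-shot variant is, here’s the entire trick in a toy sketch (the question is made up, and the model call is left to you):

    Python
    # Zero-shot chain-of-thought: append the trigger phrase to any question.
    question = (
        "A cafeteria had 23 apples. It used 20 for lunch and bought 6 more. "
        "How many apples does it have?"
    )
    cot_prompt = f"{question}\nLet's think step by step."
    # Send cot_prompt to any chat LLM; the intermediate steps the model writes
    # out are what lift accuracy on benchmarks like GSM8K.
    print(cot_prompt)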

    Reinforcement Learning

    But CoT was just the beginning. In January 2025, DeepSeek made waves by showing that integrating reinforcement learning (RL) could match OpenAI’s o1 models at 95% lower cost. OpenAI later confirmed that this is what they used to train their reasoning models as well.

    RL allows AI systems to learn from feedback, refining their reasoning processes over time. It optimizes their chain-of-thought processes, enabling them to tackle increasingly complex tasks. RL could be applied during training, using a reward model to score sub-tasks, or during inference, dynamically evaluating reasoning paths.

    For instance, o1 achieved an astonishing 83% accuracy on International Mathematics Olympiad qualifying exam problems, compared to just 13% for its predecessor, GPT-4o. Similarly, Grok 3 claims to outperform leading models in math, science, and coding benchmarks.

    The New Reasoning Models

    The landscape of LLMs has shifted toward models optimized for reasoning, particularly since late 2024. This new age is characterized by models that prioritize step-by-step problem-solving, aligning with human cognitive processes. These systems don’t just assist; they think, solving problems with human-like logic.

    • OpenAI o1 and o3: o1, initially a preview model, was fully released by December 2024, with o3 enhancing capabilities. These models are trained to generate long chains of thought, achieving 83% accuracy on International Mathematics Olympiad qualifying exam problems, compared to 13% for GPT-4o. They use reinforcement learning (RL) to refine CoT, as noted in OpenAI’s documentation, enabling them to tackle complex tasks in math, science, and coding.
    • Gemini 2.0 Flash Thinking: This is part of Google’s Gemini 2.0 family of models, launched as an experimental release in December 2024, with updates rolled out in January and February 2025. It’s optimized for low latency, meaning it processes tasks quickly despite its reasoning focus.
    • DeepSeek-R1: Released in January 2025, DeepSeek-R1 is a 671-billion-parameter open-weight model, performing comparably to o1 but at 95% lower cost. This model is designed for tasks requiring complex reasoning, mathematical problem-solving, and logical inference, making it accessible for research and development.
    • Grok 3: Released by xAI in February 2025, Grok 3 is claimed to outperform leading models like GPT-4o, DeepSeek’s V3, and Claude in math, science, and coding benchmarks. Trained with 10 times the compute power of its predecessor, Grok 2, it uses reinforcement learning to enhance reasoning capabilities and introduces “Deep Search,” a next-generation search engine. It achieved an Elo score of 1402 in the Chatbot Arena, indicating strong performance across academic and real-world user preferences, according to xAI’s blog.
    • Claude 3.7 Sonnet: Also released in February 2025 by Anthropic, Claude 3.7 Sonnet is described as the first “hybrid reasoning model,” offering both quick responses and extended, step-by-step thinking. It’s state-of-the-art for coding and delivers improvements in content generation, data analysis, and planning, available on Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.

    I have tried all these models and found them to be on par with each other in terms of quality of output across tasks such as creative writing, logic and reasoning, and coding. My personal preference is Claude because I think it has a personality, but I’m extremely impressed by Grok’s DeepSearch feature. Meanwhile, it’s clear that OpenAI is moving into general-purpose agentic behavior, while Gemini is focusing on speed and lower cost.

    Standardized Benchmarks

    We don’t have numbers across all benchmarks for every model but here’s the best I could find so far (thanks Grok DeepSearch!)

    | Model | Math (AIME) | Science (GPQA) | Coding (HumanEval) | Reasoning (MMLU) | General Performance (Elo) |
    | --- | --- | --- | --- | --- | --- |
    | OpenAI o3 | 96.7% | 87.7% | 71.7% (SWE-Bench Verified) | N/A | N/A |
    | DeepSeek R1 | 71.0% | 73.3% | Lower than o1 | N/A | N/A |
    | Gemini | 73.3% | 74.2% | N/A | N/A | N/A |
    | Grok 3 | 93% | ~80-85% | N/A | N/A | 1402 |
    | Claude 3.7 | N/A | N/A | 92% | 88.7% | N/A |

    Another fun benchmark is SnakeBench, by Greg Kamradt, which pits models against each other in a competitive snake game simulation. It mostly tests reasoning and coding, and Claude 3.7 came out on top.

    https://twitter.com/GregKamradt/status/1894179293292622312

    Application of Reasoning AI

    Generative AI amplified output; reasoning AI unlocks autonomous problem-solving and entirely new cognitive capacities, such as abstract reasoning and hypothesis generation.

    Scientific Research

    Combining reasoning with search gives us something powerful: a research agent that can take in a query, think about it, search down multiple paths, synthesize information, and keep following new paths and making connections, just like a human researcher would.

    An example of this is the AI co-scientist released by Google, which is already showing promising results in medicine and drug research.

    Knowledge Work

    Reasoning AI can handle knowledge work traditionally performed by human experts, such as legal research, financial analysis, and medical diagnostics. When you add function calling, tool handling, and memory, you get an agent that can perform an entire workflow, like researching a stock, analyzing data, and creating a full report.

    Education

    Reasoning AI’s strengths in solving complex problems and coding can transform the way we learn. Imagine an AI that helps you work through problems or codes an interactive application to illustrate new concepts. Even the thinking process of these models is enlightening.

    The Implications of Reasoning AI

    Reasoning AI will revolutionize work and life by automating cognitive tasks, redefining job roles, and driving productivity gains, though it will also bring disruptions that society must address.

    Shift in Job Roles

    While some lower-level jobs may be eliminated entirely, I believe many more will be transformed. New roles will emerge, such as AI strategy curators or human-AI collaboration specialists, who will oversee AI outputs and ensure they align with human needs. For instance, while AI might draft a legal contract, a human lawyer will still be needed to interpret client nuances and ethical considerations. This evolution mirrors past technological shifts, where new job categories arose alongside automation.

    Productivity and Economic Impact

    The efficiency boosts from reasoning AI could rival the transformative effects of the internet or industrial automation. Businesses might see productivity improvements of 20-30%, driving economic growth. However, this could also disrupt labor markets, potentially displacing 15-20% of knowledge workers by 2030 according to some estimates. To mitigate this, large-scale re-skilling initiatives will be essential to help workers adapt to new demands.

    Personalized Assistance

    Imagine an AI that optimizes your day based on your goals, preferences, and real-time data like traffic or weather. Reasoning AI could manage tasks with a level of sophistication beyond current tools, potentially saving individuals 5-10 hours per week. For example, it could plan your week to balance work, exercise, and relaxation, adapting as priorities shift.

    Creative and Leisure Activities

    In creative pursuits, reasoning AI could act as a collaborator. Writers might use it to brainstorm plot ideas, while musicians could generate harmonies, blending human intuition with AI’s logical capabilities. This partnership could enrich leisure time, making creative expression more accessible and dynamic.

    Current Limitations and Challenges

    Reasoning AI faces hurdles that must be addressed, including ethical considerations that demand careful attention.

    Generalization

    Specialization limits versatility. o1 excels in math (83% on IMO qualifying problems) but falters on general queries, while Claude 3.7’s hybrid approach sacrifices depth for speed in some cases. Broad reasoning remains elusive.

    Ethics and Bias

    As with any AI system, there are always biases that creep in based on the training data. Techniques like RL may even amplify these biases in reasoning AI.

    There’s also the potential for misuse. Users on X have noted that Grok 3 will generate complete step-by-step instructions on making weapons and dangerous chemicals at home.

    Of course, the cat’s already out of the bag. DeepSeek is completely free and open-weight, which means anyone can fine-tune their own models to output dangerous or biased content.

    So What Does The Future Hold?

    I think we’re just scratching the surface of implementing reasoning in AI. Continued advancements in techniques like reinforcement learning and chain-of-thought prompting are likely to produce even more capable models, potentially leading to AI systems that can reason across a broader range of tasks. We may see the development of more general reasoning models that can handle everything from scientific research to creative writing, blurring the lines between specialized and general AI.

    The most exciting applications would be the integration of reasoning AI with other technologies—such as robotics or the Internet of Things (IoT)— which could enable AI to perform physical tasks, further expanding its role in the world. Imagine AI-powered robots that can reason through complex environments, making decisions in real-time to complete tasks like disaster response, space exploration, or even just serving you a cup of tea at home.

    However, the path forward is not without challenges. The ethical implications of AI that can replace human knowledge workers must be carefully considered. Workforce displacement, data privacy, and the potential for misuse of autonomous AI agents are all issues that will require thoughtful solutions. Additionally, ensuring that these technologies are accessible and affordable will be key to preventing a new digital divide.

    The age of reasoning AI represents a monumental shift in the capabilities of artificial intelligence. We are moving from a world where AI augments human work to one where it can potentially replace entire categories of knowledge work.

    As we stand on the brink of this new era, one thing is clear: AI is no longer just a tool; it is becoming a partner in problem-solving, capable of thinking, reasoning, and acting in ways once thought to be uniquely human.

  • From Internal Memo to Public Story: Building a Content Agent for VCs

    From Internal Memo to Public Story: Building a Content Agent for VCs

    I was speaking to a VC last month and they mentioned their marketing strategy is to publish their investment memos as blog posts.

    It’s really one of the best ways for a VC to grow their brand. Bessemer and other big funds do it often.

    The only problem is it takes time. Smaller VCs don’t have the bandwidth or resources to turn their private investment memos into public ones.

    The VC I spoke to said it took them hours to remove sensitive information from their investment memos, and rewrite them in a consistent style for the blog. So they weren’t able to do it consistently.

    Well, AI can solve that. Today, we’ll build an AI agent that automates this process using Claude and modern web scraping tools.

    The Solution

    I built a fully autonomous agent for the VC that gets triggered when they create a new investment memo and automatically publishes a draft post to their blog. This entire workflow is invisible and sits inside their operations.

    For the purposes of this blog post, I’m going to simplify it. Instead of automatically triggering it (since I don’t know what your operations look like), we’ll create an interface with Streamlit where you can manually trigger it.

    Our interface will:

    • Accept investment memos in various formats (text, PDF, DOCX)
    • Use Claude to identify and remove sensitive information
    • Scrape the company’s website for public information
    • Generate a polished blog post matching your writing style
    • Provide easy export options for the final content

    Implementation

    Setting Up the Project

    This entire project with code is open source on my GitHub for you to clone and run locally. After that you can modify the code and customize it to your fund.

    You can also take the core agent logic and have it triggered by your existing workflow and even publish to your website.

    I’ve set up the files with this structure:

    Python
    memo-converter/
    ├── requirements.txt
    ├── README.md
    ├── .env
    └── src/
        ├── __init__.py
        ├── main.py
        ├── interface.py
        ├── fetcher.py
        ├── sanitizer.py
        └── generator.py

    The Interface file is the code for our Streamlit interface. You can upload your investment memo to it, enter the URL of the company you’re investing in, and also upload a blog post whose style you want to match (ideally one of your own).

    Fetcher fetches more information about the company you’re investing in. Sanitizer cleans your investment memo. Generator creates the blog post.

    At the end of it, the Interface displays the final post.
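
    For orientation, here’s a rough sketch of how main.py might wire these pieces together. The MemoInterface class name, the environment variable names, and the glue code are my assumptions; check the repo for the real wiring:

    Python
    # Hypothetical sketch of src/main.py; names are assumptions, see the repo.
    import asyncio
    import os
    
    from dotenv import load_dotenv
    
    from interface import MemoInterface  # assumed name of the interface class
    from sanitizer import MemoSanitizer
    from fetcher import StartupInfoFetcher
    from generator import BlogPostGenerator
    
    load_dotenv()
    
    async def run(inputs: dict) -> str:
        sanitizer = MemoSanitizer(os.getenv("ANTHROPIC_API_KEY"))
        fetcher = StartupInfoFetcher(os.getenv("FIRECRAWL_API_KEY"))
        generator = BlogPostGenerator(os.getenv("ANTHROPIC_API_KEY"))
    
        clean_memo = await sanitizer.sanitize(inputs["memo"])
        public_info = await fetcher.fetch_startup_info(inputs["company_url"])
        return await generator.generate_post(
            clean_memo, public_info, inputs["reference_memo"], inputs["length"]
        )
    
    # Run with: streamlit run src/main.py
    ui = MemoInterface()
    inputs = ui.create_interface()
    if inputs["memo"] and inputs["company_url"]:
        blog_post = asyncio.run(run(inputs))
        ui.display_blog_post(blog_post)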

    Building the Interface

    The interface.py file is where we’ll create our Streamlit interface. Streamlit is an open-source Python package that makes it easy to build visual interfaces for data and AI.

    Our interface will provide a clean, intuitive way to input memos and reference content. As you can see in the code, we use tabs to organize different input methods and provide document preview functionality. We’re also going to add a little slider to control how long we want the final blog post to be.

    Python
    def create_interface(self) -> Dict:
           st.title("Investment Memo to Marketing Blog Post Converter")
          
           # Create tabs for different input methods
           tab1, tab2 = st.tabs(["Paste Memo", "Upload Document"])
          
           # Initialize memo variable
           memo = None
          
           with tab1:
               memo_text = st.text_area(
                   "Investment Memo",
                   height=300,
                   placeholder="Paste your investment memo here..."
               )
               if memo_text:
                   memo = memo_text
                  
           with tab2:
               uploaded_file = st.file_uploader(
                   "Upload Memo Document",
                   type=['txt', 'docx', 'pdf']
               )
               if uploaded_file:
                   memo = self._read_document(uploaded_file)
                   if memo:
                       st.success("Document successfully loaded!")
                       with st.expander("Preview Document Content"):
                           st.text(memo[:500] + "...")
    
    
           # Reference memo input
           st.subheader("Reference Content")
           reference_tab1, reference_tab2 = st.tabs(["Paste Reference", "Upload Reference"])
          
           reference_memo = None
           with reference_tab1:
               reference_text = st.text_area(
                   "Paste a previous public memo for tone/style reference",
                   height=200,
                   placeholder="Paste a previous public memo here..."
               )
               if reference_text:
                   reference_memo = reference_text
                  
           with reference_tab2:
               reference_file = st.file_uploader(
                   "Upload Reference Document",
                   type=['txt', 'docx', 'pdf'],
                   key="reference_uploader"
               )
               if reference_file:
                   reference_memo = self._read_document(reference_file)
                   if reference_memo:
                       st.success("Reference document loaded!")
          
           # Company URL input
           company_url = st.text_input(
               "Company Website URL",
               placeholder="https://company.com"
           )
          
           # Only keep the length slider
           length = st.slider(
               "Target Blog Length (words)",
               min_value=500,
               max_value=2000,
               value=1000,
               step=100
           )
          
           return {
               "memo": memo,
               "reference_memo": reference_memo,
               "company_url": company_url,
               "length": length
           }

    We’re allowing document uploads, so we need a function to read them:

    Python
    def _read_document(self, uploaded_file) -> Optional[str]:
           """Extract text from uploaded document."""
           if uploaded_file is None:
               return None
              
           try:
               file_extension = uploaded_file.name.split('.')[-1].lower()
              
               if file_extension == 'txt':
                   return uploaded_file.getvalue().decode('utf-8')
                  
               elif file_extension == 'docx':
                   doc = docx.Document(io.BytesIO(uploaded_file.getvalue()))
                   return '\n'.join([paragraph.text for paragraph in doc.paragraphs])
                  
               elif file_extension == 'pdf':
                   pdf_reader = PdfReader(io.BytesIO(uploaded_file.getvalue()))
                   text = ''
                   for page in pdf_reader.pages:
                       text += page.extract_text() + '\n'
                   return text
                  
               else:
                   st.error(f"Unsupported file format: {file_extension}")
                   return None
                  
           except Exception as e:
               st.error(f"Error reading document: {str(e)}")
               return None

    Sanitizing Sensitive Information

    The first thing we want to do when our interface accepts a new investment memo is sanitize it and remove sensitive information.

    If you were building this as an agent, you would skip the interface and trigger the agent here.

    Instead of using a rule-based approach, we can just ask an LLM (Claude in this case) to read through it, remove sensitive information, and return a clean version.

    You can use any other LLM. It’s probably more cost-effective to use a smaller, cheaper model like GPT-3.5 Turbo, but I just prefer Claude when it comes to working with content.

    You’ll find this code in sanitizer.py:

    Python
    import anthropic
    import streamlit as st
    
    
    class MemoSanitizer:
       def __init__(self, api_key: str):
           """Initialize the Claude client."""
           self.client = anthropic.Anthropic(api_key=api_key)
      
       async def sanitize(self, memo_text: str) -> str:
           """Use Claude to identify and remove sensitive information."""
           try:
               message = self.client.messages.create(
                   model="claude-3-5-sonnet-20241022",
                   max_tokens=8192,
                   temperature=0,
                   system="You are an expert at identifying sensitive information in VC investment memos. Your task is to identify and remove sensitive information while preserving the key insights and analysis.",
                   messages=[
                       {
                           "role": "user",
                           "content": [
                               {
                                   "type": "text",
                                   "text": f"""Please analyze this investment memo and create a version with all sensitive information removed.
                                   Sensitive information includes but is not limited to:
                                   - Specific financial metrics (revenue, growth rates, burn rate, etc.)
                                   - Valuation details and cap table information
                                   - Customer names and specific deal values
                                   - Internal strategic discussions
                                   - Detailed technical information not public
                                   - Specific product roadmap details
                                  
                                   Memo:
                                   {memo_text}
                                  
                                   Return ONLY the sanitized version, with no explanation or additional text."""
                               }
                           ]
                       }
                   ]
               )
              
               return message.content[0].text  # content is a list of blocks; take the text of the first
              
           except Exception as e:
               st.error(f"Error sanitizing memo: {str(e)}")
               return ""

    Gathering Public Information

    You’ll notice our interface accepts a URL for the startup you’re investing in. We use Firecrawl to scrape the company’s public website and get more information about it to add to our marketing post.

    If your investment memo already contains a lot of information about the company, you may not even need this.

    All this code goes in the fetcher.py file:

    Python
    from firecrawl import FirecrawlApp
    import streamlit as st
    
    
    class StartupInfoFetcher:
       def __init__(self, api_key: str):
           """Initialize the Firecrawl client."""
           self.client = FirecrawlApp(api_key=api_key)
      
       async def fetch_startup_info(self, company_url: str) -> str:
           """Fetch website content using Firecrawl.
          
           Args:
               company_url (str): URL of the company website
              
           Returns:
               str: Website content in markdown format
           """
           try:
               response = self.client.scrape_url(
                   url=company_url,
                   params={
                       'formats': ['markdown']
                   }
               )
              
               # Return the markdown content
               return response.get('markdown', '')
              
           except Exception as e:
               st.error(f"Error fetching website content: {str(e)}")
               return ""

    Generating the Blog Post

    Ok, now we have all the pieces we need to generate our final blog post. We got public information about the company from the fetcher, a clean investment memo from the sanitizer, and a reference memo from the interface.

    Using another Claude instance, we can generate the final blog post, using the reference memo to match your writing style.

    As you can see in the generator.py file, most of the code is really just a well-crafted prompt:

    Python
    import anthropic
    import streamlit as st
    
    
    class BlogPostGenerator:
       def __init__(self, api_key: str):
           """Initialize the Claude client."""
           self.client = anthropic.Anthropic(api_key=api_key)
      
       async def generate_post(
           self,
           clean_memo: str,
           public_info: str,
           reference_memo: str,
           target_length: int
       ) -> str:
           """Generate a polished blog post using Claude."""
           try:
               message = self.client.messages.create(
                   model="claude-3-5-sonnet-20241022",
                   max_tokens=8192,
                   temperature=0.7,
                   system="You are an expert at writing compelling VC investment blog posts that share insights while maintaining confidentiality.",
                   messages=[
                       {
                           "role": "user",
                           "content": [
                               {
                                   "type": "text",
                                   "text": f"""Create a compelling blog post about an investment, using the following information and guidelines:
    
    
                                   Clean Investment Memo:
                                   {clean_memo}
    
    
                                   Public Information from Company Website:
                                   {public_info}
    
    
                                   Reference Memo (for tone/style):
                                   {reference_memo}
    
    
                                   Guidelines:
                                   - Match the tone/style of the reference memo
                                   - Target length: {target_length} words
                                   - Focus on market insights and investment thesis
                                   - Only include public information or high-level insights
                                   - Structure the post with clear sections and engaging headlines
                                  
                                   Return ONLY the blog post, with no explanation or additional text."""
                               }
                           ]
                       }
                   ]
               )
              
               return message.content
              
           except Exception as e:
               st.error(f"Error generating blog post: {str(e)}")
               return ""

    Display Blog Post and Allow Document Export

    Back in our interface.py file, we want the blog post to display in the interface. Again, if you were building this as an autonomous agent, you could skip this step and publish directly to your website.

    Here’s the function:

    Python
    def display_blog_post(self, blog_post):
           """Display the generated blog post with download options."""
           # Extract plain text content
           text_content = self._get_text_content(blog_post)
          
           # Display tabs for different views
           view_tab, download_tab = st.tabs(["View Blog Post", "Download Options"])
          
           with view_tab:
               # Display the markdown content
               st.markdown(text_content)
          
           with download_tab:
               # Create download buttons for different formats
               col1, col2 = st.columns(2)
              
               with col1:
                   docx_bytes = self._create_downloadable_document(blog_post, 'docx')
                   if docx_bytes:
                       st.download_button(
                           label="Download as DOCX",
                           data=docx_bytes,
                           file_name="blog_post.docx",
                           mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                       )
              
               with col2:
                   txt_bytes = self._create_downloadable_document(blog_post, 'txt')
                   if txt_bytes:
                       st.download_button(
                           label="Download as TXT",
                           data=txt_bytes,
                           file_name="blog_post.txt",
                           mime="text/plain"
                       )

    The download buttons above rely on this helper, which converts the post into the requested format:

    Python
    def _create_downloadable_document(self, content, format: str) -> bytes:
           """Convert content to a downloadable document format.

           Relies on io and docx (python-docx), imported at the top of interface.py.
           """
           try:
               # Get plain text content
               text_content = self._get_text_content(content)
              
               if format == 'docx':
                   doc = docx.Document()
                   # Split content by newlines and add each paragraph
                   for paragraph in text_content.split('\n'):
                       if paragraph.strip():  # Only add non-empty paragraphs
                           doc.add_paragraph(paragraph)
                  
                   # Save to bytes
                   doc_bytes = io.BytesIO()
                   doc.save(doc_bytes)
                   doc_bytes.seek(0)
                   return doc_bytes.getvalue()
                  
               elif format == 'txt':
                   return text_content.encode('utf-8')
                  
           except Exception as e:
               st.error(f"Error creating {format} document: {str(e)}")
               return None

    Tie It All Together

    And that’s it!

    Our main.py file ties it all together:

    Python
    import os
    import asyncio
    from dotenv import load_dotenv
    import streamlit as st
    from interface import MemoConverter
    from fetcher import StartupInfoFetcher
    from sanitizer import MemoSanitizer
    from generator import BlogPostGenerator
    
    
    class MemoToBlogConverter:
       def __init__(self):
           """Initialize the main application components."""
           load_dotenv()
          
           self.interface = MemoConverter()
       self.fetcher = StartupInfoFetcher(os.getenv("FIRECRAWL_API_KEY"))
           self.sanitizer = MemoSanitizer(os.getenv("ANTHROPIC_API_KEY"))
           self.generator = BlogPostGenerator(os.getenv("ANTHROPIC_API_KEY"))
      
       async def process_memo(self):
           """Process the memo and generate a blog post."""
           input_data = self.interface.create_interface()
          
           if st.button("Generate Blog Post"):
               if not input_data["memo"]:
                   st.error("Please provide an investment memo.")
                   return
              
               if not input_data["company_url"]:
                   st.error("Please provide the company website URL.")
                   return
              
               with st.spinner("Processing your memo..."):
                   # Execute the conversion pipeline
                   public_info = await self.fetcher.fetch_startup_info(
                       input_data["company_url"]
                   )
                  
                   clean_memo = await self.sanitizer.sanitize(input_data["memo"])
                  
                   blog_post = await self.generator.generate_post(
                       clean_memo,
                       public_info,
                       input_data["reference_memo"],
                       input_data["length"]
                   )
                  
                   st.success("Blog post generated successfully!")
                   self.interface.display_blog_post(blog_post)
    
    
    if __name__ == "__main__":
       converter = MemoToBlogConverter()
       asyncio.run(converter.process_memo())

    Running the Application

    All the instructions to clone and run the code are on my GitHub. To use the application:

    1. Set up your environment variables in a .env file (loaded by python-dotenv):

    Bash
    ANTHROPIC_API_KEY=your_anthropic_api_key
    FIRECRAWL_API_KEY=your_firecrawl_api_key

    2. Run the Streamlit app:

    Bash
    streamlit run src/main.py

    Benefits and Results

    This application provides several key benefits:

    1. Time Savings: What used to take hours can now be done in minutes

    2. Consistency: Generated posts maintain your writing style across publications

    3. Safety: Reduced risk of accidentally sharing sensitive information

    4. Flexibility: Support for various input formats and export options

    5. Scalability: Easy to process multiple memos efficiently

    Conclusion

    Ok, that was a lot to take in, but you can simply clone my repo and run the code yourself. Just don’t forget to add your API keys!

    The modular architecture makes it easy to enhance and customize the application as needs evolve. As I mentioned before, you can turn this into a fully autonomous agent.

    If you need any help or advice, or you want to set up agents at your fund, book a call with me.

  • Unlimited Intelligence: How Compact LLMs are Revolutionizing AI

    Unlimited Intelligence: How Compact LLMs are Revolutionizing AI

    Imagine having a personal AI assistant that works without internet, is private and local, and most importantly – is completely free. This isn’t science fiction anymore.

    Yesterday, AllenAI released OLMoE, a completely free and open-source AI model that you can download and run locally on your iPhone. As someone who’s been closely following the AI space for years, I can tell you: this is the democratization of AI we’ve been waiting for.

    Traditional cloud-based models, while impressive in their capabilities, often come with significant computational costs, latency issues, and privacy concerns. Plus there’s the fact that it’s all controlled by a handful of companies – Google, Microsoft, OpenAI, and a few others.

    That’s why open source is so important. It gives power back to you, the consumer. With an open-source model, you can train it to be completely personalized, and it runs on your phone. Free and unlimited intelligence in the palm of your hand.

    And this is just day one. OLMoE may not be the best model available, but it’s only going to get better. In this post, I explain how that will happen and what the future could look like.

    Smaller, Yet Mightier: Techniques for Efficient LLMs

    The pursuit of smaller and more efficient LLMs has given rise to a range of innovative techniques, each contributing to the goal of delivering powerful AI capabilities on resource-constrained devices. One such approach is knowledge distillation, which enables smaller models to replicate the performance of their larger counterparts by learning from their outputs. DistilBERT, for instance, retains an impressive 97% of BERT’s performance while being 40% smaller in size.
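
    To make the idea concrete, here’s a minimal sketch of a distillation loss in PyTorch. The function name and hyperparameters are illustrative, but the structure (a soft-target KL term blended with ordinary cross-entropy) is the standard recipe:

    Python
    import torch
    import torch.nn.functional as F


    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Blend soft-target distillation with ordinary cross-entropy."""
        # Soft targets: match the teacher's full output distribution at temperature T
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)  # T^2 rescaling keeps gradient magnitudes comparable
        # Hard targets: ordinary cross-entropy against the true labels
        hard_loss = F.cross_entropy(student_logits, labels)
        return alpha * soft_loss + (1 - alpha) * hard_loss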

    Quantization techniques, such as binary or ternary quantization, have also played a pivotal role in reducing the computational and memory requirements of LLMs. The Slim-Llama ASIC processor, for example, achieves a remarkable 4.59x efficiency boost while supporting models with up to 3 billion parameters, all while consuming a mere 4.69 milliwatts of power.
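
    As a toy illustration of the idea (not the Slim-Llama design), here’s roughly what ternary quantization of a weight tensor looks like; the thresholding heuristic follows the common ternary-weight-networks recipe, and all names are mine:

    Python
    import torch


    def ternary_quantize(weights: torch.Tensor, threshold_ratio: float = 0.7):
        """Map each weight to {-1, 0, +1} times a single per-tensor scale."""
        # Threshold proportional to the mean magnitude of the weights
        delta = threshold_ratio * weights.abs().mean()
        ternary = torch.zeros_like(weights)
        ternary[weights > delta] = 1.0
        ternary[weights < -delta] = -1.0
        # Scale chosen to minimize reconstruction error on the surviving weights
        mask = ternary != 0
        scale = weights[mask].abs().mean() if mask.any() else weights.new_tensor(0.0)
        return ternary, scale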

    “The ‘bigger is better’ approach to AI is reaching its limits, and smaller models offer a more sustainable path forward,” says Sasha Luccioni, a researcher and AI lead at Hugging Face.

    Another promising technique is activation sparsity, which enforces sparsity in the activation outputs of LLMs, leading to significant reductions in memory and computational requirements. Nobel Dhar, a researcher in the field, highlights that “activation sparsity can lead to around 50% reduction in memory and computing requirements with minimal accuracy loss”.
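
    In its simplest form, activation sparsity can be enforced with a top-k mask over each activation vector, as in this sketch (real methods are more sophisticated, but the memory-saving intuition is the same):

    Python
    import torch


    def topk_activation(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
        """Zero out all but the largest-magnitude activations in each row."""
        k = max(1, int(x.shape[-1] * keep_ratio))
        # Indices of the k largest-magnitude activations per row
        _, idx = x.abs().topk(k, dim=-1)
        mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
        return x * mask  # with keep_ratio=0.5, half the values are exact zeros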

    Outperforming the Giants: Smaller LLMs’ Competitive Edge

    Contrary to popular belief, smaller LLMs are not merely watered-down versions of their larger counterparts. In fact, they are increasingly demonstrating their ability to outperform larger models in specific tasks and benchmarks.

    The QwQ 32B model, for instance, outperformed models as large as 70B or 123B in the MMLU-Pro benchmark by effectively utilizing techniques like chain of thought and self-reflection when given sufficient tokens to process.

    Darren Oberst, an author of a detailed analysis on small language models, emphasizes that “small models are often underestimated but can be highly effective for specific tasks, especially when fine-tuned”.

    Unlocking the Potential of On-Device AI

    One of the most significant advantages of on-device AI is enhanced data privacy and security. By keeping data processing local, the risk of data breaches and unauthorized access is minimized, a critical consideration in an era where data privacy is increasingly prioritized.

    Modern mobile chipsets with Neural Processing Units (NPUs) can handle complex AI models directly on the device, creating an “air gap” between personal data and external threats.

    KV-Shield, a novel approach developed by researchers, further enhances the security of on-device LLM inference by preventing privacy-sensitive intermediate information leakage. It achieves this by permuting weight matrices and leveraging Trusted Execution Environments (TEE), addressing vulnerabilities in GPU-based LLM inference.
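
    I haven’t implemented KV-Shield myself, but the core trick, shuffling a weight matrix so the activations leaving the GPU are permuted and undoing the shuffle only inside the TEE, can be sketched like this (purely illustrative, not the paper’s actual scheme):

    Python
    import torch


    def permute_rows(W: torch.Tensor, seed: int = 42):
        """Shuffle the output rows of a weight matrix. Since (W[perm]) @ x
        equals (W @ x)[perm], a GPU-side observer only sees shuffled activations."""
        g = torch.Generator().manual_seed(seed)
        perm = torch.randperm(W.shape[0], generator=g)
        return W[perm], perm


    def unpermute(y_perm: torch.Tensor, perm: torch.Tensor) -> torch.Tensor:
        """Invert the shuffle; in KV-Shield this step would run inside the TEE."""
        inv = torch.empty_like(perm)
        inv[perm] = torch.arange(perm.numel())
        return y_perm[..., inv]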

    Beyond privacy and security, on-device AI offers numerous other benefits, including real-time data processing, offline functionality, and cost efficiency. It enables devices to function without constant internet connectivity, making it ideal for remote or unstable network environments, while also reducing reliance on cloud infrastructure and associated operational costs.

    “Edge AI represents a fundamental shift in distributed computing, enhancing real-time processing and data privacy,” says Dr. Salman Toor, an Associate Professor at Uppsala University.

    Transforming Industries: The Impact of On-Device AI

    The implications of on-device AI extend far beyond the realm of personal devices, with the potential to transform industries as diverse as healthcare, finance, and consumer electronics.

    Autonomous vehicles are an obvious example. Cars like Teslas come equipped with sensors that collect data that must be processed instantly to detect obstacles, avoid collisions, and so on.

    In healthcare, on-device AI significantly enhances diagnostic accuracy and efficiency. Devices can monitor data such as heart rate, oxygen levels, and blood pressure, and immediately alert medical staff if something goes wrong. The patient data is stored on the device and not transmitted to an external AI, reducing privacy concerns and ensuring compliance with healthcare regulations like HIPAA.

    Consumer electronics are also being transformed by on-device AI, with security systems being able to instantly detect movement, identify threats, and trigger alerts, without needing an internet connection at all times.

    Addressing Challenges: Balancing Performance and Sustainability

    While the potential of on-device AI and smaller LLMs is undeniable, there are challenges that must be addressed to ensure their widespread adoption and sustainable growth. One key concern is the energy consumption and carbon footprint associated with training and running these models.

    However, advancements in model compression and efficient parameterization techniques are helping to mitigate these issues. For instance, Meta’s Llama 3.2, with 1 billion and 3 billion parameter variants, consumed just over 581 MWh combined, which is about half the roughly 1,287 MWh estimated for training GPT-3. Furthermore, training the Llama 3.2 models resulted in 240 tons of CO2eq emissions, but nearly 100% of the electricity used was renewable, making the training effectively carbon neutral.

    Another challenge lies in the technical limitations of on-device AI processing, such as hardware constraints and the need for advanced model compression techniques. However, ongoing research and development in areas like pruning, quantization, and edge learning are addressing these challenges, paving the way for more efficient and capable on-device AI solutions.

    Looking Ahead: Future Innovations in On-Device AI

    While the current advancements in on-device AI are impressive, the future holds even greater promise as researchers continue to push the boundaries of what is possible. One such innovation is the Whisper-T framework, which significantly reduces latency in streaming speech processing on edge devices, achieving latency reductions of 1.6x-4.7x with per-word delays as low as 0.5 seconds and minimal accuracy loss [1].

    Memory layers, a novel approach that enhances model efficiency by adding parameters without increasing FLOPs, are also showing promising results. Models with memory layers have been shown to outperform dense models with more than twice the computation budget [2].

    The Delta framework, developed by researchers, offers a unique solution for on-device continual learning by leveraging cloud data for enrichment. This approach has been shown to improve model accuracy by up to 15.1% for visual tasks while reducing communication costs by over 90% [4].

    Another promising development is Ripple, an optimization technique that manages neuron placement to reduce I/O latency during LLM inference on smartphones. Ripple has demonstrated up to 5.93x improvements in I/O latency, paving the way for more efficient on-device AI [5].

    Conclusion: Embracing the Future of On-Device Intelligence

    The rise of smaller and smarter LLMs is more than just a technological advancement; it represents a paradigm shift in the way we approach AI development and deployment. By enabling on-device AI capabilities, we are ushering in a future where powerful intelligence is no longer tethered to the cloud or constrained by internet connectivity.

    As the demand for privacy, security, and real-time processing continues to grow, on-device AI will become an increasingly attractive solution, offering a perfect balance between performance and efficiency. The journey towards this future is already underway, driven by groundbreaking techniques like knowledge distillation, quantization, and activation sparsity, as well as innovative approaches like KV-Shield and compute-optimal sampling.

    More importantly, free and open-source AI that runs locally on your phone gives the consumers power and democratizes the use of AI. And I think that’s a better future than one where AI is controlled by a handful of companies.

  • My AI Stack in 2025: A Personal Overview

    My AI Stack in 2025: A Personal Overview

    Unless you make it your full-time job, it’s impossible to keep up with the latest developments in AI. In this post, I share my AI stack with my top tools and how I use them in my work and life.

    Work

    My work broadly falls into three categories – planning, writing, and coding. Each has its own unique AI workflow that I’ve refined over time.

    Planning & Strategy

    ChatGPT is my primary tool for planning and strategic thinking. While I heavily used GPT-4 last year, I’ve now shifted to using the reasoning models (o1 and o3) as they excel at creating structured, logical plans. These models serve as invaluable brainstorming partners when I’m:

    • Planning client projects
    • Developing startup strategies
    • Creating structured workplans
    • Organizing my daily workflow

    The collaborative chat interface works particularly well for this use case – I can bounce ideas back and forth, get feedback on my thinking, and gradually build out a master plan through discussion.

    Writing

    For writing tasks, Claude has become my go-to tool. I’ve invested time in training it to understand and adapt to my tone of voice by providing examples of my past content. This works particularly well for:

    • Email outreach to companies
    • Social media content
    • Blog posts
    • Professional communications

    My workflow with Claude is quite dynamic. Sometimes I’ll provide specific context and ask for a draft in my voice, but more often I’ll engage in a freeform conversation, letting my thoughts flow naturally while Claude helps structure them into coherent content. I particularly value Claude’s ability to ask probing questions that help extract deeper insights from my stream of consciousness.

    Development

    For coding and building, I use a combination of tools in a sequential workflow:

    1. First, I use o1 or o3 for high-level architecture planning, especially for complex systems like AI agents
    2. Then I take that structural plan to Claude, which I find particularly adept at generating clean, well-structured initial code
    3. Finally, I move to Cursor for the iterative development process, building out additional features and refining the implementation

    This multi-tool approach lets me leverage the strengths of each model at different stages of the development process.

    Research

    For research work, I’ve been using Google’s Deep Research since its release a few months ago. While I haven’t tried OpenAI’s Deep Research yet (though I hear it provides even more detailed results), I’ve actually built my own research tool using various APIs and open source models. This custom solution, while perhaps not as comprehensive as the commercial options, meets my needs well and operates at a fraction of the cost.

    Personal

    My personal use of AI broadly falls into three categories: learning, search, and advice.

    Learning

    ChatGPT, particularly GPT-4, is my primary learning companion. Whether I’m trying to understand complex scientific concepts or learn something new, I find myself defaulting to ChatGPT’s interface. Its canvas feature is particularly useful for visualizing concepts – while Claude offers similar functionality, I’ve found myself naturally gravitating toward ChatGPT for this use case.

    Language learning has been revolutionized by ChatGPT’s advanced voice mode. Being able to have natural conversations with immediate pronunciation feedback has been incredibly helpful for language practice. The voice mode is also great for learning on the go – I can have educational conversations while walking down the street, making the most of otherwise idle time.

    Search

    For general search, I primarily use ChatGPT, with occasional use of Gemini or traditional Google search. Google’s integration of Gemini into search results, providing AI-generated summaries at the top, has been a welcome addition. I find myself switching between ChatGPT and Google+Gemini depending on the type of information I’m looking for.

    Personal Advice

    Claude has become my go-to for personal advice. I particularly appreciate Claude’s personality and its willingness to speak its mind more freely compared to other models. While I wouldn’t call it therapy, it serves as a helpful sounding board for personal matters – something akin to a thoughtful friend you can bounce ideas off of.

  • The Pyramid of AI Adoption: Where Does Your Business Stand?

    The Pyramid of AI Adoption: Where Does Your Business Stand?

    As a consultant working with businesses across the tech spectrum, I’ve noticed a clear pattern in how companies approach AI integration. Today, I want to share a framework I’ve developed called the Pyramid of AI Adoption that will help you benchmark where your organization stands in the AI revolution.

    The Foundation: AI Augmentation

    At its most basic level, AI adoption begins with what we call AI Augmentation. Think of this stage as giving your employees powerful tools that enhance their natural capabilities. Just as calculators revolutionized accounting by reducing manual computation time, general-purpose AI tools like ChatGPT amplify human productivity across various tasks.

    For example, a marketing professional might use ChatGPT to generate initial drafts of social media posts, which they then refine with their expertise. Similarly, teams might employ Fireflies.ai to automatically transcribe and summarize meetings, allowing participants to focus on discussion rather than note-taking.

    What makes this stage particularly significant is its accessibility. These tools require minimal technical expertise and provide immediate value. However, many organizations still hesitate to formally approve their use, creating a significant competitive disadvantage. Consider this: if your competitors’ employees can complete tasks in half the time while maintaining quality, how long can you afford to stay behind?

    The Middle Ground: AI Automation

    As organizations become more comfortable with AI, they typically progress to AI Automation. This stage represents a fundamental shift from using standalone tools to integrating AI directly into business processes. It’s similar to the transition from using individual productivity software to implementing enterprise-wide systems.

    The beauty of modern AI automation lies in its flexibility and accessibility. Using no-code platforms like Zapier or Make, businesses can create sophisticated workflows without deep technical expertise. For instance, you might set up an AI system that:

    • Automatically routes customer inquiries to appropriate departments based on content analysis
    • Processes and categorizes incoming documents and emails
    • Generates preliminary responses to common customer questions

    For more complex implementations, frameworks like Langchain enable the creation of AI agents that can handle entire business processes autonomously. These systems don’t just follow pre-programmed rules – they learn and adapt to new situations, much like human employees.
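
    To make the first bullet in the list above concrete, here’s a minimal sketch of content-based routing using the Anthropic Python SDK. The department names, model choice, and fallback behavior are all illustrative assumptions:

    Python
    import anthropic


    DEPARTMENTS = ["billing", "technical_support", "sales", "general"]


    def route_inquiry(client: anthropic.Anthropic, inquiry: str) -> str:
        """Ask the model to classify an inquiry into exactly one department."""
        message = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=10,
            temperature=0,
            system="You classify customer inquiries. Reply with exactly one of: "
            + ", ".join(DEPARTMENTS),
            messages=[{"role": "user", "content": inquiry}],
        )
        label = message.content[0].text.strip().lower()
        # Fall back to a human-triaged queue if the model answers off-script
        return label if label in DEPARTMENTS else "general"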

    The Summit: AI Innovation

    At the pyramid’s peak, we find AI Innovation – the stage where organizations transition from being AI consumers to AI creators. This represents a fundamental transformation in how businesses operate and compete.

    To understand this stage, consider how Netflix evolved from a DVD rental service to a content creator. Similarly, companies at this level develop AI-first products and custom AI models trained on their unique data, creating entirely new capabilities and revenue streams. This might involve:

    • Creating industry-specific language models that understand proprietary terminology and contexts
    • Developing computer vision systems tailored to specific manufacturing processes
    • Building predictive models that leverage years of accumulated business data

    This stage requires significant investment in both technical expertise and infrastructure. Organizations typically either build internal AI teams or partner with specialized development firms. While the barrier to entry is high, the potential rewards – including unique competitive advantages and new business models – can be transformative.

    Understanding Your Position and Planning Your Journey

    Evaluating your organization’s position on this pyramid requires honest assessment. Start by asking:

    • What AI tools are officially sanctioned in your workplace?
    • How integrated is AI into your daily operations?
    • Does your organization have any unique data assets that could power custom AI solutions?

    Remember that progression through these stages isn’t necessarily linear or uniform across an organization. Different departments might be at different levels, and that’s okay. The key is to have a clear understanding of where you stand and where you want to go.

    Looking Ahead

    As AI technology continues to evolve, the characteristics of each level will likely shift. What’s considered innovative today might become basic tomorrow. The most successful organizations will be those that view AI adoption not as a one-time project, but as an ongoing journey of technological evolution and business transformation.

    By understanding these stages, organizations can better plan their AI adoption strategy, allocate resources effectively, and set realistic goals for their digital transformation journey. The key isn’t to rush to the top, but to build a solid foundation and progress thoughtfully based on your organization’s specific needs and capabilities.