Category: Blog

  • AI Killed the CMS: How I Ditched Mine for Code and Markdown

    AI Killed the CMS: How I Ditched Mine for Code and Markdown

    Lee Robinson (an important guy at Cursor) wrote a banger post about migrating Cursor from a CMS back to raw code and markdown. He did it in three days, spent $260 in tokens, and deleted 322K lines of code.

    Reading it, I found myself nodding along. A lot.

    My WordPress site had become a slog. Not just for visitors (though yes, my PageSpeed Insights scores were embarrassing) but for me. Most of my blog posts run over 3,000 words, and for some reason, the Gutenberg editor gets very sluggish at that length.

    I’d been thinking of rolling my own CMS, but what Lee is advocating for is something much simpler: just Markdown files.

    So I (Claude Code, I mean) rebuilt my entire blog from scratch with Astro + MDX (which lets you use JSX components in Markdown) and hosted it on Cloudflare Pages.

    We (CC and I) did it in 3 hours and $0 in tokens (OK, fine, I’m on the Claude Code Max plan, but that cost is spread across all my projects, so it’s effectively zero). Here’s how it went.

    The Problem with a CMS

    I think we can all agree that CMSes are… bloated. My WordPress setup, for example, had accumulated layers:

    • Yoast SEO (a plugin to manage meta tags… which are just HTML)
    • WP Super Cache (a plugin to make WordPress fast… which shouldn’t be slow)
    • Various security plugins (to protect against attacks… that target WordPress specifically)
    • A theme I’d customized so heavily it was basically unmaintainable
    • Overpriced GoDaddy hosting. And GoDaddy is trash.

    I believe we are moving to a world where humans will primarily interact with AI agents that do work on their behalf. And to Lee’s point, if you’re using AI agents in your workflows, then every layer between your agent and the final work output is an additional layer of complexity your agent has to navigate.

    When I wanted to update a post, I couldn’t just tell my agent “update the code in my latest tutorial”. I had to log into WordPress, navigate to the post, wait for the editor, update the code snippet manually, and save. The AI-powered workflow I used for everything else in my life didn’t apply to my own content.

    Wait, Do We Need a CMS?

    Sanity, which is the CMS Cursor used, wrote a response to Lee’s post, and they weren’t entirely wrong. Their core arguments:

    1. What Lee built is still a proto-CMS – just distributed across git, GitHub permissions, and custom scripts
    2. Git isn’t a content collaboration tool – merge conflicts on prose are miserable

    These are valid concerns for teams. For a media company with 200 writers, approval chains, and content appearing across web, mobile, and email? Yeah, you probably need a real CMS.

    And maybe you can vibe-code your own, but the edge cases will pile up over time.

    But some of their other arguments don’t hold up as well:

    “Markdown files are denormalized data” – Sanity argues that if your pricing lives on three pages, you’re updating three files. But this is exactly what components solve. In Astro, you create a <Pricing /> component that pulls from a single source. Change it once, it updates everywhere it’s used.

    “Agents can grep, but can’t query” – They say you can’t grep for “all posts mentioning feature X published after September.” But AI agents don’t just grep. They use semantic search. Claude Code can read frontmatter dates, understand content contextually, and reason about what you’re asking for.

    In fact, most of Sanity’s arguments really only make sense for massive enterprise companies with complex content processes. And at that scale, the CMS isn’t the issue. It’s the process itself.

    For everyone else (everyone on WordPress, Webflow, Framer, Wix, Weebly, what have you), the “you’ll end up building a CMS” argument assumes we actually need CMS features.

    And my biggest revelation from this project is… maybe not? I personally need:

    • A place to write text
    • A newsletter signup form
    • Images
    • Tracking scripts (Google Analytics, etc.)
    • Custom CTAs and components

    That’s it! Everything else is overhead.

    So yeah, I agree with Sanity. Don’t build your own CMS. But the real question isn’t whether you should build your own CMS (you shouldn’t); it’s whether you actually need a CMS in the first place.

    OK, No CMS, but Why Astro?

    Most frameworks like Next.js or Gatsby ship JavaScript to the browser for everything, even if a page is just static text. That’s because they hydrate the entire page as a React app.

    Astro does the opposite. By default, it renders your components to HTML at build time and ships zero JavaScript. A blog post with just text and images is just pure HTML and it loads instantly.

    When you actually need interactivity, like a newsletter form, or an interactive demo, you add a client: directive to that specific component:

    Astro
    ---
    import NewsletterForm from '../components/NewsletterForm.jsx';
    ---
    
    <article>
      <p>This is just HTML, no JS shipped...</p>
      
      <!-- Only this component gets JavaScript -->
      <NewsletterForm client:visible />
    </article>
    

    This is called Islands Architecture. Your page is a sea of static HTML with small islands of interactivity where you need them.

    For my blog, that means 95% of every page is static HTML. The only JavaScript is Kit’s embed script for newsletter signups. My pages went from 200KB+ on WordPress to under 50KB on Astro.

    What About Images?

    This was my first concern. WordPress handles image optimization automatically (with enough plugins). Would I be manually running everything through TinyPNG?

    Nope. Astro has built-in image optimization. You drop images in src/assets/, reference them in your markdown, and Astro handles the rest at build time – WebP conversion, responsive srcset, lazy loading, proper aspect ratios, and other technical terms I had to learn when trying to optimize WordPress.

    What About SEO?

    Yoast SEO is really just a GUI for meta tags. In Astro, your layout handles it:

    Astro
    ---
    const { title, description, image } = Astro.props;
    ---
    <head>
      <title>{title}</title>
      <meta name="description" content={description} />
      <meta property="og:title" content={title} />
      <meta property="og:description" content={description} />
      <meta property="og:image" content={image} />
      <link rel="canonical" href={Astro.url} />
    </head>
    

    Your blog post frontmatter provides the values:

    Astro
    ---
    title: "Building AI Agents from Scratch"
    description: "A step-by-step guide to building coding agents with Python..."
    image: "./cover.png"
    ---
    

    For sitemaps: npx astro add sitemap. It auto-generates on build.

    What About Caching?

    This was my favorite realization: you don’t need caching plugins when your site is already static.

    WP Super Cache exists because WordPress dynamically generates pages on every request. That’s expensive, so you cache the output.

    Astro pre-generates everything at build time. Your pages are already HTML files sitting on a CDN. There’s nothing to cache dynamically. Cloudflare serves them from edge locations worldwide with automatic caching headers.

    My WordPress site with caching plugins was slower than my Astro site with no caching at all.

    What About Security?

    Another plugin category that becomes irrelevant. WordPress security plugins exist because WordPress has attack surfaces – a database, PHP execution, a login page, plugin vulnerabilities.

    Static HTML files can’t be hacked the same way. There’s no database to SQL inject. No PHP to exploit. No admin panel to brute force. The attack surface basically doesn’t exist.

    What About Analytics?

    On WordPress, I had a plugin for this. On Astro, it’s one line in your base layout:

    Astro
    <script defer data-domain="siddharthbharath.com" src="your-analytics-script-goes-here"></script>

    That’s it. Every page inherits from the layout, so every page gets tracking. Swap in Google Analytics, a Meta pixel, or whatever you use; it’s all the same.

    What About Newsletter Forms?

    My blog has Kit signup forms scattered throughout posts. Would I lose the ability to embed forms easily?

    Turns out it’s simpler than WordPress. I created a reusable component:

    Astro
    ---
    const { formId, title } = Astro.props;
    ---
    
    <div class="newsletter-form">
      {title && <h3>{title}</h3>}
      <script async src={`https://f.kit.com/${formId}/script.js`}></script>
    </div>
    

    Now in any MDX post, I just drop in:

    Astro
    Some content about AI agents...
    
    <KitForm formId="abc123" title="Get the free cheat sheet" />
    
    More content here...
    

    Claude Code knows about this component. When I say “add a newsletter signup after the introduction,” it adds the <KitForm /> tag in the right place. No shortcode syntax to remember, no plugin conflicts.

    What About Code Blocks?

    Almost all of my blog posts contain code. In WordPress I needed a plugin. In Astro, syntax highlighting is built in via Shiki. You write fenced code blocks in markdown:

    Markdown
    ```python
    def hello():
        print("Hello, world!")
    ```
    

    Astro renders them as beautifully styled HTML at build time.

    You can configure the theme in astro.config.mjs:

    Astro
    import { defineConfig } from 'astro/config';

    export default defineConfig({
      markdown: {
        shikiConfig: {
          theme: 'one-dark-pro',
          wrap: true,
        },
      },
    });

    What About Video Embeds?

    YouTube and Vimeo embeds are just iframes. They work in markdown out of the box. But if you want something cleaner, you can create a component:

    Astro
    ---
    const { url, title } = Astro.props;
    const videoId = url.match(/(?:youtube\.com\/watch\?v=|youtu\.be\/)([^&]+)/)?.[1];
    ---
    
    <div class="video-embed">
      <iframe
        src={`https://www.youtube.com/embed/${videoId}`}
        title={title}
        loading="lazy"
        allowfullscreen
      />
    </div>
    

    Then in your MDX:

    Astro
    <VideoEmbed url="https://youtube.com/watch?v=abc123" title="How to build AI agents" />
    

    The loading="lazy" means videos don’t load until the user scrolls to them. Better performance than any WordPress embed plugin I’ve used.

    What About Custom Pages?

    Astro isn’t just for blogs. It’s a full static site generator. I have a homepage, about page, services page, and contact page, all custom designed.

    Each page is just a file in src/pages/:

    Astro
    src/pages/
    ├── index.astro        → Homepage with hero, featured posts, services
    ├── about.astro        → Bio, experience, personal stuff
    ├── services.astro     → Consulting offerings with CTAs
    ├── contact.astro      → Contact form
    └── blog/              → Blog listing and posts
    

    You can share layouts across pages, create reusable components, and style everything with Tailwind or plain CSS. It’s as flexible as any custom WordPress theme, but without the PHP spaghetti.

    How I Built It

    Ok, hopefully I’ve sold you on why you should roll your own non-CMS. Let me walk through exactly what I did.

    Step 1: Create the PRD

    Before touching any code, I opened Claude (the regular chat, not Claude Code) and described what I wanted:

    Markdown
    I want to rebuild my WordPress blog as a static site with Astro. 
    Here's what I need:
    
    Pages:
    - Homepage with hero, featured posts, services overview, newsletter signup
    - About page with my background and experience
    - Services page for my consulting offerings
    - Contact page with a form
    - Blog listing and individual post pages
    
    Features:
    - MDX for blog posts so I can embed components
    - Tailwind for styling
    - Kit newsletter forms
    - Syntax-highlighted code blocks
    - Image optimization
    
    I want to deploy on Cloudflare Pages. Can you create a detailed PRD 
    I can give to Claude Code?
    

    Claude generated a comprehensive PRD with the site structure, component specs, frontmatter schema, SEO requirements, and deployment steps. I refined it over a few back-and-forths until it covered everything.

    Spend time on the PRD. The better your spec, the better Claude Code’s output. I probably spent 30 minutes getting the PRD right before writing any code.

    Step 2: Scaffold the Project

    With the PRD ready, I set up the project infrastructure. Open your terminal:

    Bash
    # Create the Astro project with the blog template
    npm create astro@latest my-blog -- --template blog
    cd my-blog
    
    # Add the integrations we need
    npx astro add tailwind
    npx astro add mdx
    npx astro add sitemap
    
    # Initialize git
    git init
    git add .
    git commit -m "Initial Astro scaffold"
    
    # Create a GitHub repo and push
    gh repo create my-blog --public --source=. --push
    

    If you don’t have the GitHub CLI (gh), you can create the repo manually on GitHub and push with:

    Bash
    git remote add origin https://github.com/yourusername/my-blog.git
    git push -u origin main
    

    Step 3: Start Building with Claude Code

    Now the fun part. Navigate to your project folder and start Claude Code:

    Bash
    cd my-blog
    claude
    

    I pasted in my PRD and told Claude Code to start building:

    Markdown
    In this project, we're rebuilding my blog. Read the PRD and start working on it - PRD.md
    
    Start with the base layout, navigation, and footer. Then build out 
    the homepage. Don't import all my content yet - just create 3 sample 
    blog posts so we can nail the design first.
    

    The initial PRD included my core pages plus just three representative blog posts. This was intentional: I didn’t want to import dozens of posts before getting the design right. Better to iterate on the design with sample content, then bulk import once the templates are solid.

    Claude Code scaffolded the layouts, created the component structure, and built out the homepage. The initial output was functional but generic. That “Tailwind AI slop” look you’ve seen a thousand times.

    Step 4: Nail the Design

    This is where things got interesting. I’ve been using a specific art style for my blog images, a retro-futuristic aesthetic inspired by 70s European sci-fi comics. I wanted the site design to complement that.

    I gave Claude Code this prompt:

    Bash
    Use the frontend-design skill, then redesign the site to match this art style I use for my blog images:
    "Retro-futuristic comic illustration in ligne claire style, Moebius-inspired. Clean linework, flat bold colors, highly detailed sci-fi architecture with intricate panels and cables. Cinematic perspective, surreal atmosphere, blending futuristic technology with timeless themes. Minimal shading, vibrant but muted tones, graphic novel aesthetic, 1970s European sci-fi magazine art."
    Translate this aesthetic into web design: clean lines, bold but muted color palette, plenty of whitespace, subtle geometric accents. The typography should feel slightly retro but readable.

    Claude Code one-shotted it.

    The frontend-design skill gave it the principles for high-quality UI work, and the specific art direction gave it a target aesthetic. It’s an impressive upgrade from my current (soon to be old) website, which I painstakingly created in WordPress.

    Oh and it’s fully mobile optimized.

    Use Skills. Read my Claude Skills post to learn more.

    Migrating the Content

    With the core pages and the design done, I just need to import the rest of my blog content and I can launch this.

    This is where I am right now. The migration is still in progress as I publish this post, but here’s the process.

    WordPress gives you an XML export (Tools → Export → All content). The file is… not pretty. I used wordpress-export-to-markdown to convert posts:

    Bash
    npx wordpress-export-to-markdown --input export.xml --output src/content/blog
    

    Then I had Claude Code clean up the output:

    Bash
    Go through each markdown file in src/content/blog and:
    1. Fix image paths to point to src/assets/images
    2. Remove WordPress shortcodes
    3. Convert any embedded forms to <KitForm /> components
    4. Ensure frontmatter matches our schema
    5. Rename to .mdx extension
    

    It handles about 80% automatically. The remaining 20% is edge cases: weird formatting, broken embeds, posts that reference WordPress-specific features.

    The images are the tedious part. WordPress stores them in wp-content/uploads with date-based folders. I’m downloading them all, dropping them in src/assets/images/, and letting Claude Code update the paths in each post.

    Pro tip: Don’t try to migrate everything at once. I’m doing it in batches of 10 posts at a time: verify they render correctly, commit, repeat. It’s slower, but you catch issues early instead of debugging 50 broken posts at once.

    I’ll update this post once the full migration is complete. For now, the old WordPress site is still live (you’re reading it!) at the original domain while I work through the backlog.

    Deploying to Cloudflare Pages

    I looked at three options: Netlify, Vercel, and Cloudflare Pages. All three work great with Astro and have generous free tiers. Here’s why I went with Cloudflare.

    Netlify has a credit-based system that charges for builds, bandwidth, form submissions, and serverless function invocations. Each resource consumes credits from your monthly allowance. It’s flexible, but hard to predict what you’ll actually pay.

    Vercel is more straightforward. The main limit is bandwidth (100GB/month on free). That’s probably fine for most blogs, but it’s still a limit you have to think about.

    Cloudflare Pages has unlimited bandwidth on the free tier. I like unlimited and free. Great combination of words. The only limits are 500 builds per month and 100 custom domains, neither of which I’ll hit.

    Setting Up Cloudflare Pages

    1. Go to Cloudflare Pages
    2. Click “Create a project” → “Connect to Git”
    3. Select your GitHub repo
    4. Cloudflare auto-detects Astro. Build settings should be:
      • Build command: npm run build
      • Build output directory: dist
    5. Click “Save and Deploy”

    Your site is now live at your-project.pages.dev. Every push to main triggers a new deployment automatically.

    To add your custom domain:

    1. In Cloudflare Pages, go to your project → Custom domains
    2. Add your domain
    3. Update your DNS to point to Cloudflare (they’ll walk you through it)

    That’s it. Every git push triggers a build that deploys globally in about 20 seconds.

    The Speed Difference

    Here are my PageSpeed Insights scores:

    WordPress (before):

    • Performance: 67
    • First Contentful Paint: 1.7s
    • Largest Contentful Paint: 7.2s
    • Total Blocking Time: 290ms
    • Speed Index: 4.6s

    Astro on Cloudflare (after):

    • Performance: 100
    • First Contentful Paint: 0.2s
    • Largest Contentful Paint: 0.5s
    • Total Blocking Time: 0ms
    • Speed Index: 0.5s

    A perfect 100 performance score! And LCP went from 7.2 seconds to half a second. That’s a 14x improvement.

    The site just… loads. Instantly. On mobile. On slow connections. Everywhere.

    And I’m not even trying that hard. I haven’t done any advanced optimization. This is just what happens when you ship static HTML instead of a PHP application with twelve plugins.

    My New Content Workflow

    This is the part I’m most excited about.

    Now that my blog is just code, Claude Code becomes my content creation partner.

    Writing With Claude Code

    I open my blog project in the terminal, start Claude Code, and say “let’s work on a new blog post about X.” Claude Code has access to every post I’ve ever written. It understands my writing style, my formatting patterns, which topics I’ve covered, how I structure tutorials versus opinion pieces.

    I can ask it to suggest topic ideas based on what’s resonating in the AI space, or I can come in with a specific idea and flesh it out together. The conversation might go:

    Bash
    Me: "I want to write about building AI agents from scratch. 
        What angle would complement my existing content?"
    
    Claude: [reviews my posts] "You have the Claude Code guide and the 
        context engineering piece, but nothing on the fundamentals. 
        A 'build a baby coding agent in Python' tutorial would fill 
        that gap and give readers a foundation before your advanced stuff."
    
    Me: "Perfect. Let's outline it."
    

    We go back and forth. I provide direction, Claude Code drafts sections, I course-correct, it refines. When we’re happy with the draft, it creates the MDX file in src/content/blog/ with proper frontmatter.

    Editing and Refining

    The output is a markdown file. I can open it in VS Code, Obsidian, or any text editor and make changes directly. Claude Code’s draft is a starting point, not the final product. I’ll tighten the prose, add personal anecdotes, cut sections that feel fluffy.

    Read more about how I create content with AI while avoiding slop here: My AI Writing Process

    When I’m done editing, I save the file and tell Claude Code to push:

    Bash
    claude> Commit and push the new blog post
    

    Done. Live on my site in 20 seconds.

    The Speed Difference in Practice

    From idea to published post:

    Old workflow (WordPress): 5-6 hours minimum. Lots of context switching between tools, waiting for editors to load, copying embed codes, previewing.

    New workflow (Claude Code + Astro): 2-3 hours for a substantial tutorial like this one. One tool, one conversation, one push. And most of that time is me editing.

    Should You Do This?

    This whole process took me an afternoon. And at the end of it, I realized that I’m not rolling my own CMS. I don’t need any of the standard CMS features.

    This approach works if:

    • You’re a solo blogger or small team
    • You’re comfortable with code (or have an AI coding agent)
    • Your content has one primary destination (your website)
    • You don’t need approval workflows, role-based permissions, or audit trails

    This approach may not work if:

    • You have non-technical team members who need to edit content (although maybe you can train them on this workflow)
    • You need content to flow to multiple destinations like apps and emails (although Claude Code can handle this)
    • You need structured data with relationships and queries
    • You have compliance requirements that need proper audit trails
    • You’re publishing at high velocity with multiple contributors

    Sanity’s right that large teams need a CMS. But for everyone else, the markdown + git + coding agent workflow is so much better.

    My blog is now:

    • Faster – Perfect 100 PageSpeed score, sub-second loads
    • Cheaper – $0/month vs $20/month
    • Simpler – No plugins, no database, no security updates
    • AI-native – Claude Code can create, edit, and publish content directly

    In fact, I’m also in the process of launching my new AI consultancy, Hiyaku Labs, and one of our partners, an expert designer, has been struggling with Framer. After I showed her what I did with my personal blog, we decided to do the same for Hiyaku Labs and built the full site in 2 hours.

    Check it out at hiyakulabs.com

    So, if your CMS-hosted site feels sluggish, or you’re struggling to get the design right, or all the extra steps and complexity are frustrating you, it might be worth a shot.

    Your content is just code now.


    I’ll update this post once the full migration is complete with any lessons learned along the way.

  • I Built an AI Chief of Staff That Runs Entirely on My Laptop

    I Built an AI Chief of Staff That Runs Entirely on My Laptop

    I spend way too much time context-switching between client emails, project notes, meeting transcripts, and CRM data. Every morning, I’m asking myself the same questions: What’s urgent? Who needs a response? What deadlines am I forgetting?

    So I built an AI assistant that answers all of this and it runs 100% locally on my MacBook.

    No API costs. No data leaving my machine. Just my documents, my AI, completely private.

    I’m calling it Vault. Not a very creative name but it works!

    Why Local AI Matters

    Here’s the thing about cloud AI: every query you send is data you’re handing over to someone else. When you’re dealing with client information, financial data, or sensitive business documents, that’s a problem.

    And then there’s cost. I’ve burned through hundreds of dollars in OpenAI API credits on projects before. For a personal knowledge base I’m querying dozens of times a day? That adds up fast.

    Local inference solves both problems. Your data stays on your machine, and once you’ve got the model running, queries are essentially free.

    The tradeoff used to be performance. Local models were slow and dumb compared to closed-source SOTA. But that’s changing fast.

    Enter Parallax

    I’m using Parallax from Gradient Network for local inference. It’s a fully decentralized inference engine for local AI models, and the setup is dead simple:

    Bash
    git clone https://github.com/GradientHQ/parallax.git
    cd parallax
    
    # Enter Python virtual environment
    python3 -m venv ./venv
    source ./venv/bin/activate
    
    pip install -e '.[mac]'

    Once done, just run parallax run and you can start setting up your AI cluster on localhost:3001. Follow the instructions and you should soon be able to pick one of many LLMs and chat with it!

    The Architecture

    Of course, we’re going to do more with our local AI than just chat. Vault is a RAG (Retrieval-Augmented Generation) system and a personal AI Chief of Staff (glorified executive assistant). The idea is simple:

    1. Ingest documents from my Gmail and Google Drive into a local vector database
    2. Search for relevant chunks when the user asks a question about a project
    3. Generate an answer using the retrieved context and the local AI

    Here’s the high-level flow:

    Python
    Documents (PDF, Email, DOCX, CSV)
            ↓
    [Chunking & Embedding]
            ↓
    ChromaDB Vector Store
            ↓
    Semantic Search
            ↓
    Parallax LLM
            ↓
    Contextual Answer
    

    Let me walk through each component.

    Document Ingestion

    The first challenge: getting all my messy data into a format the AI can work with. I’ve got PDFs, Word docs, email chains, CSV exports from my CRM, JSON files with meeting notes, the works.

    Python
    class DocumentLoader:
        """Load documents from various file formats."""
    
        SUPPORTED_EXTENSIONS = {
            '.pdf', '.md', '.txt', '.docx',   # Documents
            '.eml',                           # Emails
            '.csv', '.json'                   # Data exports
        }
    
        def __init__(self, chunk_size: int = 500, chunk_overlap: int = 50):
            self.chunk_size = chunk_size
            self.chunk_overlap = chunk_overlap
    

    The chunk_size and chunk_overlap parameters are crucial. Too large, and you waste context window space. Too small, and you lose coherence. I landed on 500 characters with 50-character overlap after some experimentation.
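
    The chunking itself is nothing fancy. Here’s a simplified sketch of what a chunk_text helper with overlap can look like; the real implementation may differ, and the chunk dictionary fields here are illustrative, chosen to match what the vector store needs later:

    Python
    def chunk_text(self, text: str, source: str) -> list[dict]:
        """Split text into overlapping chunks (simplified sketch; fields are illustrative)."""
        chunks = []
        step = self.chunk_size - self.chunk_overlap  # e.g. 500 - 50 = 450 characters per step
        for i, start in enumerate(range(0, len(text), step)):
            piece = text[start:start + self.chunk_size]
            if piece.strip():
                chunks.append({
                    "content": piece,
                    "source": source,
                    "chunk_index": i,
                })
        return chunks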

    For emails, I extract the metadata into a structured format the AI can reason about:

    Python
    def _load_email(self, path: Path) -> str:
        """Extract content and metadata from .eml files."""
        with open(path, 'rb') as f:
            msg = email.message_from_binary_file(f, policy=policy.default)
    
        parts = [
            f"EMAIL",
            f"From: {msg['From']}",
            f"To: {msg['To']}",
            f"Subject: {msg['Subject']}",
            f"Date: {msg['Date']}",
        ]
    
        # Extract body
        body = ""
        if msg.is_multipart():
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    body = part.get_content()
                    break
        else:
            body = msg.get_content()
    
        parts.append(f"\n{body}")
        return "\n".join(parts)
    

    This way, when I ask “what emails need my response?”, the AI has all the metadata it needs to give a useful answer.

    Gmail and Google Drive Integration

    I didn’t want to manually export emails and documents every time. So I built integrations for Gmail and Google Drive that sync directly into the knowledge base.

    The Gmail client uses OAuth 2.0 and the Gmail API to fetch messages:

    Python
    class GmailClient:
        """Gmail API client for syncing emails to Vault."""
    
        SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']
    
        def fetch_messages(self, days_back: int = 30, query: str = "",
                           max_results: int = 500) -> list[dict]:
            """Fetch messages from Gmail."""
            after_date = (datetime.now() - timedelta(days=days_back)).strftime('%Y/%m/%d')
            search_query = f"after:{after_date}"
            if query:
                search_query += f" {query}"
    
            results = self.service.users().messages().list(
                userId='me', q=search_query, maxResults=max_results
            ).execute()
    
            messages = []
            for msg_info in results.get('messages', []):
                msg = self.service.users().messages().get(
                    userId='me', id=msg_info['id'], format='full'
                ).execute()
                messages.append(self._parse_message(msg))
    
            return messages
    

    Google Drive works similarly. It fetches documents, exports Google Docs to plain text, and downloads supported file types:

    Python
    class DriveClient:
        """Google Drive client for syncing documents to Vault."""
    
        SUPPORTED_MIME_TYPES = {
            'application/pdf': '.pdf',
            'text/plain': '.txt',
            'text/markdown': '.md',
        'application/vnd.google-apps.document': '.gdoc',  # Export as text
        }
    
        def sync_to_vault(self, vectorstore, loader, folder_id=None,
                          days_back: int = 30) -> int:
            """Sync Drive files to the knowledge base."""
            files = self.list_files(folder_id=folder_id, days_back=days_back)
    
            for file_info in files:
                content = self.download_file(file_info['id'], file_info['mimeType'])
                chunks = loader.chunk_text(content, source=f"drive://{file_info['name']}")
                vectorstore.add_chunks(chunks)
    
            return len(files)
    

    Now I can run python main.py sync gmail and have the last 30 days of emails indexed in seconds.
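
    The command wiring in main.py is plain argparse. A simplified sketch of the sync path (the import paths and the keys on the parsed Gmail message dict are illustrative, not the exact code):

    Python
    # main.py (simplified sketch; module paths and message dict keys are illustrative)
    import argparse

    from loader import DocumentLoader                    # adjust to your module layout
    from store import VectorStore                        # adjust to your module layout
    from integrations import DriveClient, GmailClient    # adjust to your module layout


    def main():
        parser = argparse.ArgumentParser(description="Vault: local AI chief of staff")
        sub = parser.add_subparsers(dest="command", required=True)

        sync = sub.add_parser("sync", help="Sync a source into the knowledge base")
        sync.add_argument("source", choices=["gmail", "drive"])
        sync.add_argument("--days-back", type=int, default=30)

        args = parser.parse_args()
        loader = DocumentLoader()
        store = VectorStore()

        if args.command == "sync":
            if args.source == "gmail":
                # assumes the parsed message dict exposes 'subject' and 'body' keys
                for msg in GmailClient().fetch_messages(days_back=args.days_back):
                    chunks = loader.chunk_text(msg["body"], source=f"gmail://{msg['subject']}")
                    store.add_chunks(chunks)
            else:
                DriveClient().sync_to_vault(store, loader, days_back=args.days_back)


    if __name__ == "__main__":
        main()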

    Vector Storage with ChromaDB

    ChromaDB handles embedding and similarity search. The nice thing is it includes a default embedding model, so you don’t need to set up a separate embedding service:

    Python
    class VectorStore:
        """ChromaDB-based vector store for document chunks."""
    
        def __init__(self, persist_dir: str = "./data/chromadb"):
            self.persist_dir = Path(persist_dir)
            self.persist_dir.mkdir(parents=True, exist_ok=True)
    
            self.client = chromadb.PersistentClient(
                path=str(self.persist_dir),
                settings=Settings(anonymized_telemetry=False)
            )
    
            self.collection = self.client.get_or_create_collection(
                name="knowledge_base",
                metadata={"hnsw:space": "cosine"}
            )
    

    I’m using cosine similarity for the HNSW index. It works well for semantic search and is the standard choice for most text embedding models.
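
    Adding chunks is just as simple, because ChromaDB embeds the documents for you when you call collection.add. A simplified sketch of add_chunks (the ID scheme here is illustrative; it just needs to be unique per chunk):

    Python
    def add_chunks(self, chunks: list[dict]) -> None:
        """Add document chunks to the collection (simplified sketch)."""
        if not chunks:
            return
        self.collection.add(
            ids=[f"{c['source']}::{c['chunk_index']}" for c in chunks],
            documents=[c["content"] for c in chunks],   # ChromaDB embeds these automatically
            metadatas=[{"source": c["source"]} for c in chunks],
        )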

    Searching is a single method call:

    Python
    def search(self, query: str, n_results: int = 5) -> list[dict]:
        """Search for similar chunks."""
        results = self.collection.query(
            query_texts=[query],
            n_results=n_results,
            include=["documents", "metadatas", "distances"]
        )
        # ... process results
    

    ChromaDB returns the top N most similar chunks, along with their similarity scores. We pass these to the LLM as context.
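
    The elided processing step is just flattening ChromaDB’s column-oriented response into a flat list of dicts with content, source, and a similarity score. A rough sketch of that flattening:

    Python
    # inside search(), after the query: flatten ChromaDB's column-oriented response
    chunks = []
    for doc, meta, dist in zip(
        results["documents"][0],    # one inner list per query text
        results["metadatas"][0],
        results["distances"][0],
    ):
        chunks.append({
            "content": doc,
            "source": meta.get("source", "unknown"),
            "similarity": 1 - dist,  # cosine distance -> similarity
        })
    return chunks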

    The RAG Engine

    This is where it all comes together. The RAG engine:

    1. Takes a user question
    2. Retrieves relevant context from the vector store
    3. Builds a prompt with the context
    4. Sends it to Parallax for generation
    5. Streams the response back
    Python
    class RAGEngine:
        """Retrieval-Augmented Generation engine with conversation memory."""
    
        SYSTEM_PROMPT = """You are Vault, a helpful AI assistant with access to a personal knowledge base.
    Your role is to answer questions based on the provided context from the user's documents.
    
    Guidelines:
    - Answer based primarily on the provided context
    - If the context doesn't contain enough information, say so clearly
    - Cite the source documents when relevant
    - Be concise but thorough"""
    
        def build_context(self, query: str) -> tuple[str, list[dict]]:
            """Retrieve relevant context for a query."""
            results = self.vectorstore.search(query, n_results=self.n_results)
    
            if not results:
                return "", []
    
            context_parts = []
            for i, result in enumerate(results, 1):
                source = result['source'].split('/')[-1]
                context_parts.append(
                    f"[{i}] Source: {source}\n{result['content']}"
                )
    
            return "\n\n---\n\n".join(context_parts), results
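
    The generation step then stitches everything together: system prompt, recent history, retrieved context, question. A simplified sketch of what that method can look like (the llm wrapper and its chat signature are illustrative; the actual call goes to Parallax, which speaks the same chat-completions format as OpenAI, as the streaming code below shows):

    Python
    def answer(self, query: str):
        """Answer a question using retrieved context plus recent history (simplified sketch)."""
        context, sources = self.build_context(query)
        history = self.session.get_history_summary()  # assumes the engine holds a ChatSession

        user_prompt = (
            f"Context from the knowledge base:\n{context or '(no relevant documents found)'}\n\n"
            f"Recent conversation:\n{history or '(none)'}\n\n"
            f"Question: {query}"
        )

        messages = [
            {"role": "system", "content": self.SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ]

        # self.llm is assumed to be a thin client around Parallax's
        # OpenAI-compatible chat endpoint (see the streaming code below)
        return self.llm.chat(messages, stream=True), sources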

    Streaming Responses

    Nobody wants to wait 30 seconds staring at a blank screen. Parallax supports streaming, so we can show tokens as they’re generated:

    Python
    def _stream_chat(self, payload: dict) -> Generator[str, None, None]:
        """Streaming chat completion."""
        with httpx.Client(timeout=120.0) as client:
            with client.stream("POST", self.api_url, json=payload) as response:
                for line in response.iter_lines():
                    if line.startswith("data: "):
                        data_str = line[6:]
                        if data_str.strip() == "[DONE]":
                            break
                        data = json.loads(data_str)
                        content = data['choices'][0].get('delta', {}).get('content', '')
                        if content:
                            yield content
    

    This is the same SSE (Server-Sent Events) format that OpenAI uses, so if you’ve worked with their streaming API before, this will look familiar.

    Conversation Memory

    A knowledge base isn’t very useful if you have to re-explain context every query. I added a simple conversation history that gets included in subsequent prompts:

    Python
    @dataclass
    class ChatSession:
        """Maintains conversation history for multi-turn chat."""
        messages: list[ChatMessage] = field(default_factory=list)
        max_history: int = 10
    
        def get_history_summary(self) -> str:
            """Get a summary of recent conversation for context."""
            summary_parts = []
            for msg in self.messages[-6:]:  # Last 3 exchanges
                prefix = "User" if msg.role == "user" else "Assistant"
                content = msg.content[:200] + "..." if len(msg.content) > 200 else msg.content
                summary_parts.append(f"{prefix}: {content}")
            return "\n".join(summary_parts)
    

    Now I can ask “what’s the deadline for Mike’s project?” and follow up with “what did we discuss in the last meeting?” without re-specifying who Mike is.

    The Chief of Staff Experience

    This is where it gets interesting. With all my client data indexed (emails, meeting notes, CRM exports, project briefs) I can just ask natural questions:

    Python
    > What are my priorities for today?
    
    Based on your documents, here are today's priorities:
    
    1. **Acme Corp** - API integration deadline tomorrow. John emailed about
       OAuth token issues that need resolving before the Phase 2 delivery.
    
    2. **TechStart Inc** - Proposal due Friday for the mobile app project.
       Sarah wants a follow-up on the architecture recommendations.
    
    3. **GlobalBank** - URGENT: Mike emailed about the compliance review.
       They need the security documentation before end of week.
    
    4. **RetailMax** - Payment gateway integration due next week.
       Lisa asked about supporting Apple Pay in addition to Stripe.
    

    This is just semantic search + LLM reasoning over my actual documents.

    I can drill down:

    Python
    > What did Mike say about the compliance issue?
    
    According to your email from Mike (Dec 4), GlobalBank's compliance team
    has approved the project to move forward. He needs you to confirm Q1
    availability by December 10th and will send updated security requirements
    by December 15th.
    

    Or get synthesis across sources:

    Python
    > What's the status of all my active projects?
    
    You have 4 active projects:
    
    1. **Acme Corp** - API Integration, in progress, $150,000 (deadline Dec 15)
    2. **TechStart Inc** - Website Redesign, complete, awaiting final invoice
    3. **GlobalBank** - Mobile App, on hold until January, $280,000
    4. **RetailMax** - E-commerce Platform, in progress, $95,000

    Next Steps

    There are lots of directions I can take this. For starters, I’m using a pretty small model, and it still works great for answering questions about my projects, identifying my priorities and keeping me on track, and even writing emails.

    But I can build it into something more powerful that can help me with actually executing on projects, delivering client work, and so on. I just need to add more nodes and use a bigger model.

    As I continue building it out, I’ll add to this post. Sign up below to stay tuned!

  • Gemini 3 Pro: The Best AI Model, by a Mile

    Gemini 3 Pro: The Best AI Model, by a Mile

    I’m really excited by this one. When Gemini 2.5 Pro came out months ago, it was incredible, but Anthropic and OpenAI caught up quickly.

    Gemini 3 is something else altogether. It’s head and shoulders above the rest.

    I built a personal AI boxing coach that tracks my hand movements through my computer’s camera in real-time and gives me feedback on my punching combinations. It generated the entire working app in about two minutes from a single vague prompt.

    I’ll show you exactly how that works later in this post. But first, let’s look at what makes Gemini 3 different from previous models, and how it compares to Claude and GPT-5.

    PS: If you like videos, see my full breakdown here – https://youtu.be/2XKJPurzyFs

    Benchmarks

    OK, benchmarks can be gamed and shouldn’t be used as the only metric for model selection, but they still give us a directionally correct view of a model’s capabilities and how it compares with others.

    Gemini 3 Pro hit 37.5% on Humanity’s Last Exam without using any tools. This benchmark tests PhD-level reasoning across science, math, and complex problem-solving. A score of 37.5% means it’s solving problems that would stump most humans with advanced degrees. For context, GPT-5 scored 26.5% on the same test.

    The GPQA Diamond benchmark tells us even more about the model’s capabilities. Gemini 3 scored 91.9% on questions requiring graduate-level knowledge in physics, chemistry, and biology, putting it well ahead of the others.

    The 23.4% score on MathArena Apex is particularly impressive because this benchmark specifically tests advanced mathematical reasoning. Other models struggled to break single digits on this test.

    This matters more than you might think. Mathematical reasoning underlies so much of what we ask AI to do, from analyzing data to writing algorithms to solving optimization problems. A model that can handle complex math can handle the logical reasoning required for most technical tasks.

    But the benchmark that matters most for my work is coding performance. Gemini 3 Pro scored 54.2% on Terminal-Bench 2.0, which tests a model’s ability to operate a computer via the terminal, putting it far ahead of the next best model. This benchmark is about understanding how to navigate file systems, run commands, debug errors, install dependencies, and actually operate like a developer would.

    How It Compares to Claude and GPT-5

    Before Gemini 3, my workflow was split between models based on their strengths. Claude 4.5 Sonnet was my primary coding and writing model. The reasoning was solid, the code quality was reliable, and it rarely needed multiple iterations to get things right. It understood context well and made reasonable architectural decisions.

    GPT-5 handled everything else. Data analysis, structured tasks, anything that required processing large amounts of information quickly and presenting it in organized formats.

    Now with Gemini 3, I’m testing whether I can consolidate to a single model. The early signs are promising. The coding quality matches or exceeds Claude for the tests I’ve run so far. The reasoning feels tighter and more consistent than GPT-5. The multimodal understanding (working with images, video, and text simultaneously) is better than either competitor.

    And it’s cheaper.

    I’ll spend the next few days pushing it harder to see if these early positive impressions hold up under sustained use, but this is the first model in months that feels like it might be genuinely all-in-one capable rather than best-in-class for specific tasks.

    What I Actually Built With It

    To properly test Gemini 3’s capabilities, I needed to move beyond simple prompts and build something with real complexity. I wanted to see how it handled tasks that require understanding vague requirements, making architectural decisions, and implementing features that involve multiple moving parts.

    The Boxing Coach Demo

    I gave it this prompt: “Build me an app that is a boxing teacher, use my computer’s camera to track my hands, display on the screen some image to tell me what combination to throw, maybe paddles, and then track my hand hitting the objects.”

    This is a deliberately vague prompt. I’m describing the outcome I want without specifying the implementation details. I’m not telling it which computer vision library to use, how to structure the tracking logic, what the UI should look like, or how to handle the timing of combinations.

    Gemini 3 understood what I was asking for and went several steps further. It built real-time computer vision tracking using the computer’s camera, which is non-trivial to implement correctly. It overlaid target indicators on the screen that show where to punch.

    But it also recognized that this was meant to be a training tool, not just a detection system, so it added a complete scoring system to track accuracy, a streak counter to gamify the experience and keep you motivated, estimated calorie burn based on the activity, and multiple difficulty levels labeled “light,” “fighter,” and “champion.”

    The entire implementation took about two minutes and worked on the first try. No debugging. No iteration. It one-shotted a complex implementation that involved computer vision, real-time tracking, UI overlay, game logic, scoring mechanics, and even some basic exercise physiology calculations.

    The Personal Finance Tracker

    For the second test, I wanted to see how it handled a more practical business application. I asked it to build a personal finance expense tracker that uses AI to look at screenshots or uploaded receipts and automatically categorizes expenses.

    Gemini 3 figured out the architecture it would need (frontend interface for uploading receipts, backend processing to handle the files, AI integration for optical character recognition and categorization logic), and started building all the components.

    The receipt scanning hit an edge case during my demo. I uploaded an Apple HEIC image format and the code expected JPEG. So it’s not a God model but it’s also the kind of thing that’s trivial to fix once you identify it.

    When I uploaded a JPEG instead, it worked correctly. The model extracted the merchant name, the amount, the date, and made a reasonable guess at categorizing the expense.

    This tells me something important about the current state of AI coding assistants. Gemini 3 can build production-quality architecture and implement complex features correctly. It understands the problem domain well enough to make good decisions about structure and flow. But it still makes assumptions about inputs and edge cases that you’d catch in code review. It’s not replacing careful testing and validation, but it’s dramatically reducing the time from idea to working prototype.

    City Building Game

    The final one was a longer project, and by longer I mean it took maybe an hour of me going back and forth with Gemini 3.

    The amazing thing is that Gemini generated the base game right out of the box with one prompt – “Build me a threejs medieval city building game”

    That’s it. Most of what you see in the video above was generated from that one prompt. After that, it was mostly fine-tuning work, with me giving it some direction on the design, telling it to add new building types, add a season system, a population and happiness system, and so on.

    And the best part: all my time was spent adding new features and systems or updating the UI/UX instead of fighting bugs, because there were none!

    I cannot express to you how incredible it is that Gemini could build and expand on a codebase like this without it falling apart or breaking, even in some minor way.

    The Five Ways to Access Gemini 3

    Google being Google, there are like six or seven different apps and platforms from which you can access the model, and some of them share the same name, which is confusing as hell. But I digress.

    AI Mode in Google Search

    This is the first time Google has shipped a new Gemini model directly into Google Search on day one. That’s a significant shift in strategy. Previous models launched in limited betas, gradually rolling out to small groups of users while Google monitored for problems. This is a full production deployment to billions of users immediately, which signals a level of confidence in the model’s reliability that wasn’t there for previous releases.

    AI Mode introduces “generative interfaces” that automatically design customized user experiences based on your prompt. Upload a PDF about DNA replication and it might generate both a text explanation and an interactive simulation showing how base pairs split and replicate. Ask about travel planning and it generates a magazine-style interface with photos, modules, and interactive prompts asking about your preferences for activities and dining.

    The model is making UI decisions on the fly. It’s deciding “this question would be better answered with an interactive calculator” or “this needs a visual timeline” and then building those interfaces in real-time. This is something that Perplexity has been trying to do for a while, and Google just came out and nailed it.

    The Gemini App

    This is the ChatGPT-equivalent interface available at gemini.google.com. You’ll want to select “Thinking” mode to use Gemini 3 Pro rather than the faster but less capable Flash model.

    I tested the creative writing capabilities by asking it to write about Gemini 3 in the style of a science fiction novel. The output started with “The whispers began as a faint hum, a resonance in the deep network…” and maintained that tone throughout several paragraphs.

    What struck me was how it avoided the typical AI writing tells. You know the ones I’m talking about. The “it’s not just X, it’s Y” construction that appears in every ChatGPT essay. The overuse of em-dashes that no human writer actually uses that frequently. The breathless hype that creeps into every topic, making even mundane subjects sound like earth-shattering revelations.

    Gemini 3’s output felt notably cleaner. More measured. Less like it was trying to convince me how excited I should be about the topic.

    I still wouldn’t publish it without editing (it’s AI-generated prose, not literature) but it doesn’t immediately announce itself as AI-written the way GPT outputs tend to do. That matters if you’re using AI as part of your writing process rather than as a complete replacement for human writing.

    AI Studio for Rapid Prototyping

    This is Google’s developer playground with a “Build Mode” that’s particularly useful for quick prototyping. If you’re a product manager who needs to see three variations of a feature before your next standup, or a designer who wants to test an interaction pattern before committing to a full implementation, this is where you go.

    Everything runs in the browser. You can test it immediately, see what works and what doesn’t, modify the code inline, and then download the result or push it directly to GitHub. The iteration loop is fast enough that you can explore multiple approaches in the time it would normally take to carefully code one version.

    This is where I built the boxing coach demo. I pasted in my prompt, it generated all the code, and I could immediately test it in the browser to see the camera tracking and UI overlays working in real-time.

    Gemini CLI for Development Work

    The Gemini CLI is similar to Claude Code, a command-line interface where you can ask it to build applications and it creates all the necessary files, writes the code, and sets up the project structure.

    This is where I built the personal finance tracker. I gave it one prompt describing what I wanted, and it figured out the requirements, came up with an implementation plan, asked for my Google Gemini API key (which it would need for the receipt processing functionality), and started generating files.

    The CLI is better than the AI Studio for anything beyond frontend prototypes. If you need backend services, database schemas, API integrations, or multi-file projects with proper separation of concerns, this is the right tool for the job. It understands project structure and can scaffold out complete applications rather than single-file demos.

    Google Antigravity

    Antigravity is Google’s new agentic development platform where AI agents can autonomously plan and execute complex software tasks across your editor, terminal, and browser.

    It looks like a Visual Studio Code fork: file explorer on the left, code editor in the middle, agent chat panel on the right. The interface is familiar if you’ve used any modern IDE. You can power it with Gemini 3, Anthropic’s Claude Sonnet 4.5, or OpenAI’s GPT-OSS models, which is an interesting choice. Google built an IDE and made it model-agnostic rather than locking it to their own models.

    The feature that sets Antigravity apart is Agent Manager mode. Instead of working directly in the code editor with AI assistance responding to your prompts, you can spin up multiple independent agents that run tasks in parallel. You could have one agent researching best practices for building personal finance apps, another working on the frontend implementation, and a third handling backend architecture, all running simultaneously without you needing to context-switch between them.

    This isn’t drastically different from running multiple tasks sequentially in the CLI. The underlying capability is similar. The value is in the interface. You can see what’s happening across all the agents in one view, manage them from a single place, and stay in the development environment instead of switching between terminal windows. It’s the same core capability wrapped in significantly better user experience.

    I’m planning a full deep dive on Antigravity because there’s more to explore here. Subscribe below to read it.

    Where This Fits in the AI Race

    The AI model race is now operating on a cadence where major releases from all three companies happen within weeks of each other. Each release raises the baseline for what’s expected from frontier models. Features that were impressive and novel six months ago are now table stakes that every competitive model needs to match.

    What’s interesting about Gemini 3 is that it’s not just incrementally better in one dimension. It’s showing meaningful improvements across multiple dimensions simultaneously. Better reasoning, better coding, better multimodal understanding, better interface generation.

    That’s rare. Usually you get big improvements in one area at the cost of regressions elsewhere, or small improvements across the board. Genuine leaps across multiple capabilities at once are uncommon.

    What I’m Testing Next

    I’m planning to use Gemini 3 as my primary model for the next week to see if the early positive impressions hold up under sustained use. The areas I’m specifically testing are code quality on complex refactoring tasks, reasoning performance on strategic planning problems, and reliability when building multi-file projects with proper architecture.

    I’m also diving deeper into Antigravity to understand how the multi-agent system handles coordination, how it resolves conflicts when multiple agents are working on related code, and how reliable the agents are when running unsupervised for extended periods.

    The boxing coach and finance tracker were quick tests to see if it could handle real-time complexity and practical business logic. Next I want to see how it performs on the kind of work I do daily, building AI agents, writing technical documentation, debugging production issues, and architecting new systems from scratch.

    If it holds up across these more demanding tests, this might actually become the all-in-one model I’ve been waiting for. The real test is whether it’s still impressive after a week of daily use when the novelty has worn off.

    Have you tried Gemini 3 yet? What are you planning to build with it?

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.

  • Building an AI-Powered Market Research Agent With Parallel AI

    Building an AI-Powered Market Research Agent With Parallel AI

    When Parallel AI first announced their Deep Research API, I was intrigued. I played around with it and thought it did a great job. Of course, I pay for ChatGPT, Claude, and Gemini so I don’t really have need for another Deep Research product.

    And I’ve already built my own.

    So I set Parallel aside… until last week when they announced their Search API. My go-to for search so far has been Exa but I decided to test Parallel out for a new project I’m working on with a VC client, and I’m very impressed.

    The client wants a fully automated due diligence system. This isn’t to say they won’t do their own research, but doing it all manually is tedious and takes dozens of analyst hours. In fact, many VCs skip this step altogether (which is why we see such insane funding rounds).

    Part of that system is market research: identifying the competitors in the space, where the gaps are, and how the startup they’re interested in is positioned.

    So the build I’m going to show you is a simplified version of that. Enter any startup URL and get a complete competitive analysis in a couple of minutes.

    What We’re Building

    In this tutorial, you’ll build a VC Market Research Agent that automates the entire due diligence process. Give it a startup URL, and it will:

    1. Understand the target – Extract what the company does, who they serve, and how they position themselves

    2. Find the competitors – Discover all players in the market using AI-powered search (not just the obvious ones)

    3. Analyze the landscape – Deep dive into each competitor’s strengths, weaknesses, and positioning

    4. Identify opportunities – Find the whitespaces where no one is competing effectively

    5. Generate a report – Create an investment-ready markdown document with actionable insights

    Why Parallel?

    Parallel has a proprietary web index which they’ve apparently been building for two years. The Search API is built on top of that and designed for AI. This means it isn’t optimizing for content that a human might click on (like Google does) but for content that will fully answer the task the AI is given.

    So their search architecture goes beyond keyword matching into semantic meaning, prioritizing the pages most relevant to the objective rather than the ones optimized for human engagement.

    Exa is built this way too but according to the performance benchmarks, Parallel has the highest accuracy for the lowest cost.

    This is why we’re using Parallel AI. It’s specifically built for complex research tasks like this, with 47% accuracy compared to 45% for GPT-5 and 30% for Perplexity, and at a much lower cost.

    The Architecture: How This Agent Works

    Here’s the mental model for what we’re building:

    Bash
    Startup URL → Analyze → Discover Competitors → Analyze Each → Find Gaps → Report

    Simple, right? But each step needs careful orchestration. Let me break down the key components:

    1. Market Discovery — The detective that finds competitors

    • Uses Parallel AI’s Search API to find articles mentioning competitors
    • Extracts company names from those articles
    • Verifies each company has a real website (no hallucinations!)

    2. Competitor Analysis — The analyst that digs deep

    • Visits each competitor’s website
    • Uses Parallel AI’s Extract API to pull structured information
    • Identifies strengths, weaknesses, and positioning

    3. Opportunity Finder — The strategist that spots whitespace

    • Compares all competitors to find patterns
    • Identifies what everyone does well (table stakes)
    • Identifies what everyone struggles with (opportunities)

    4. Report Generator — The writer that synthesizes everything

    • Takes all the raw data and creates a readable narrative
    • Adds an executive summary for busy partners
    • Includes actionable recommendations

    Now let’s build each piece.

    What You’ll Need

    Before we start building, grab these:

    • Parallel AI API key – You get a bunch of free credits when you sign up
    • OpenAI API key – We’re using GPT-4o-mini to keep costs down on this POC, since we make quite a few API calls.
    • 30-60 minutes – Grab coffee, this is fun

    Let’s start with the basics. Create a new directory and set up a clean Python environment, then install dependencies:

    Bash
    pip install requests openai python-dotenv

    Now create the project structure:

    Bash
    mkdir data reports
    touch main.py market_discovery.py competitor_analysis.py report_generator.py

    Here’s what each file does:

    • main.py – The orchestrator that runs everything
    • market_discovery.py – Uses Parallel AI’s Search API to find articles mentioning competitors, and extracts them
    • competitor_analysis.py – Uses Parallel AI’s Extract API to analyze each competitor’s strengths/weaknesses
    • report_generator.py – Creates the final markdown report
    • data/ – Stores intermediate JSON files (useful for debugging)
    • reports/ – Where your final reports go
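
    One note before we dive into the steps: the Python snippets below live inside a class in market_discovery.py (with a similar setup in competitor_analysis.py) and assume some shared scaffolding: an OpenAI client plus a headers dict holding your Parallel key. Here’s a rough sketch of what that might look like; the class name and the x-api-key header are my assumptions, so double-check Parallel’s docs for the exact auth header:

    Python
    import os

    import requests
    from dotenv import load_dotenv
    from openai import OpenAI

    load_dotenv()

    class MarketDiscovery:
        def __init__(self):
            # Assumed auth header name; verify against Parallel's API docs
            self.headers = {
                "x-api-key": os.getenv("PARALLEL_API_KEY"),
                "Content-Type": "application/json"
            }
            self.openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))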

    STEP 1: Understanding the Target Startup

    Now let’s get into the code. Before we can find competitors, we need to understand what the target company actually does. Sounds obvious, but you can’t just scrape the homepage and call it a day.

    Companies describe themselves in marketing speak. “We’re transforming the future of enterprise cloud infrastructure with AI-powered solutions” tells you nothing useful for finding competitors.

    What we need:

    • Actual product description – What do they sell?
    • Target market – Who buys it?
    • Category – How would you search for alternatives?
    • Keywords – What terms would appear on competitor sites?

    Now here’s where Parallel AI’s Extract API shines. Instead of writing complex HTML parsers, we just tell it what we want and it figures out the rest.

    Open `market_discovery.py` and add this:

    Python
    response = requests.post(
            "https://api.parallel.ai/v1beta/extract",
            headers=self.headers,
            json={
                "urls": [url],
                "objective": """Extract company information:
                - Company name
                - Product/service offering
                - Target market/customers
                - Key features or value propositions
                - Industry category""",
                "excerpts": True,
                "full_content": True
            }
        )

    Let’s then have GPT-4o-mini turn that raw content into structured JSON:

    Python
    structured_response = self.openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"""Extract structured info from this website content:
    {content}
    
    Return JSON with:
    - name: company name
    - description: 2-3 sentences about what they do
    - category: market category (e.g., "AI Infrastructure")
    - target_market: who their customers are
    - key_features: list of main features
    - keywords: 5-10 keywords for finding competitors
    
    Respond ONLY with valid JSON."""
            }],
            temperature=0.3
        )
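
    The model usually returns clean JSON, but it sometimes wraps it in markdown fences, so it’s worth stripping those before parsing. A minimal sketch (the variable names are mine):

    Python
    import json

    raw = structured_response.choices[0].message.content.strip()
    # Strip markdown code fences if the model added them
    if raw.startswith("```"):
        raw = raw.strip("`").removeprefix("json").strip()
    startup_info = json.loads(raw)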

    STEP 2: Finding the Competitors

    Now for the hard part: discovering every competitor in the market. This is where most tools fail. They either:

    • Return blog posts instead of companies
    • Miss important players
    • Include tangentially related companies
    • Hallucinate companies that don’t exist

    We’re going to solve this with a four-step verification process. It’s more complex than a single API call, but it works reliably.

    First, we’ll use the Parallel Search endpoint to find articles that talk about products in the space we’re exploring. The authors of those articles have already done the research, so we’ll piggyback on it.

    We just need to give Parallel a search objective and it figures out how to do searches. Play around with the prompt here until you find something that works:

    Python
    search_objective = f"""Find articles about {category} products like {name}
    Focus on list articles and product comparison reviews. EXCLUDE general industry overviews."""
    
    search_response = requests.post(
        "https://api.parallel.ai/v1beta/search",
        headers=self.headers,
        json={
                "objective": search_objective,
                "search_queries": keywords,
                "max_results": 5,
                "mode": "one-shot"
        }
    )
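
    The next call needs a list of article URLs. Assuming the Search API returns a results array with url fields (the same shape the verification step below reads), you can collect them like this:

    Python
    article_urls = [result["url"] for result in search_response.json()["results"]]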

    Then, we extract company names from those articles using Parallel AI’s Extract API, which pulls out relevant snippets mentioning competitors.

    Python
    extract_response = requests.post(
            "https://api.parallel.ai/v1beta/extract",
            headers=self.headers,
            json={
                "urls": article_urls,
                "objective": f"""Extract company names mentioned as competitors in {category}.
    
    For each company: name, brief description, website URL if mentioned.
    Focus on actual companies, not blog posts.""",
                "excerpts": True
            }
        )

    As before, we use GPT-4o-mini to parse the information out:

    Python
    companies_response = self.openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "system",
                "content": "You are a market analyst identifying DIRECT competitors. Be strict."
            }, {
                "role": "user",
                "content": f"""Extract ONLY direct competitors to: {description}
    
    CONTENT:
    {combined_content}
    
    Return JSON array:
    [{{"name": "Company", "description": "what they do", "likely_domain": "example.com"}}]
    
    RULES:
    - Only companies with THE SAME product type
    - Exclude tangentially related companies
    - Limit to {max_competitors} most competitive
    
    Respond ONLY with valid JSON."""
            }],
            temperature=0.3
        )

    And finally, we verify each company has a real website. We skip LinkedIn, Crunchbase, and Wikipedia because we want the actual company website, not their company profile.

    Python
    competitors = []
    seen_domains = set()
    
    for company in company_list[:max_competitors]:
        search_query = f"{company['name']} {company.get('likely_domain', '')} official website"
        website_search = requests.post(
            "https://api.parallel.ai/v1beta/search",
            headers=self.headers,
            json={
                    "search_queries": [search_query],
                    "max_results": 3,
                    "mode": "agentic"
            }
        )
    
        for result in website_search.json()["results"]:
            url = result["url"]
            domain = url.split("//")[1].split("/")[0].replace("www.", "")
    
            # Skip non-company sites
            if any(skip in domain for skip in ["linkedin", "crunchbase", "wikipedia"]):
                continue
    
            if domain not in seen_domains:
                seen_domains.add(domain)
                competitors.append({
                    "name": company["name"],
                    "website": url,
                    "description": company.get("description", "")
                })
                break
    
    return competitors

    STEP 3: Analyzing Each Competitor

    Now that we have a list of competitors, we need to analyze each one. This is where we dig deep into strengths, weaknesses, and positioning.

    For each competitor, we want to know:

    • What do they offer?
    • Who are their customers?
    • What are they good at? (strengths)
    • Where do they fall short? (weaknesses)
    • How do they position themselves?
    • How do they compare to our target startup?

    Add this to `competitor_analysis.py`:

    Python
    response = requests.post(
            f"{self.base_url}/v1beta/extract",
            headers=self.headers,
            json={
                "urls": [url],
                "objective": f"""Extract information about {name}:
                - Product offerings and features
                - Target market and customers
                - Pricing (if available)
                - Unique selling points
                - Technology stack (if mentioned)
                - Case studies or testimonials""",
                "excerpts": True,
                "full_content": True
            }
        )

    By now you know the drill. We call GPT-4o to parse it out:

    Python
    analysis = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "You are a VC analyst conducting competitive analysis."
            }, {
                "role": "user",
                "content": f"""Analyze this competitor relative to our target startup.
    
    TARGET STARTUP:
    Name: {target_startup['name']}
    Description: {target_startup['description']}
    Category: {target_startup['category']}
    
    COMPETITOR: {name}
    WEBSITE CONTENT:
    {content[:6000]}
    
    Provide JSON with:
    - product_overview: what they offer
    - target_customers: who they serve
    - key_features: array of main features
    - strengths: array of 3-5 competitive advantages
    - weaknesses: array of 3-5 gaps or weak points
    - pricing_model: how they charge (if known)
    - market_position: their positioning (e.g., "Enterprise leader")
    - comparison_to_target: 2-3 sentences comparing to target
    
    Be objective. Identify both what they do well AND poorly.
    
    Respond ONLY with valid JSON."""
            }],
            temperature=0.4
        )

    STEP 4: Finding Market Whitespace

    The hard work is done. Now we just need to pass all the information we’ve collected to OpenAI to analyze it and find whitespace. This is really just prompt engineering:

    Python
    competitor_summary = [{
            "name": comp["name"],
            "strengths": comp.get("strengths", []),
            "weaknesses": comp.get("weaknesses", []),
            "market_position": comp.get("market_position", ""),
            "features": comp.get("key_features", [])
        } for comp in competitors if comp.get("strengths")]
    
        # Identify patterns and gaps
        analysis = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "You are a senior VC analyst identifying market opportunities."
            }, {
                "role": "user",
                "content": f"""Analyze this market for opportunities.
    
    TARGET STARTUP:
    {target_startup}
    
    COMPETITORS:
    {json.dumps(competitor_summary, indent=2)}
    
    Return JSON with:
    
    1. market_overview:
       - total_competitors: count
       - market_maturity: "emerging"/"growing"/"mature"
       - key_trends: array of trends
    
    2. competitor_patterns:
       - common_strengths: what most do well
       - common_weaknesses: shared gaps
       - positioning_clusters: how they group
    
    3. whitespaces: array of opportunities:
       - opportunity: the gap
       - why_exists: why unfilled
       - potential_value: business impact
       - difficulty: "low"/"medium"/"high"
    
    4. target_startup_positioning:
       - competitive_advantages: what target does better
       - vulnerability_areas: where at risk
       - recommended_strategy: how to win (2-3 sentences)
    
    Respond ONLY with valid JSON."""
            }],
            temperature=0.5
        )

    STEP 5: Generating the Report and Tying It All Together

    Our fifth and final step is putting together the report for the Investment Committee. We already have all the content we need, so it’s really just a matter of formatting it the right way:

    Python
    for comp in competitors:
        if not comp.get("strengths"):
            continue

        report += f"### {comp['name']}\n\n"
        report += f"**Website:** {comp['website']}\n\n"
        report += f"**Overview:** {comp.get('product_overview', 'N/A')}\n\n"

        report += "**Strengths:**\n"
        for strength in comp.get('strengths', []):
            report += f"- ✓ {strength}\n"

        report += "\n**Weaknesses:**\n"
        for weakness in comp.get('weaknesses', []):
            report += f"- ✗ {weakness}\n"

        report += f"\n**Comparison:** {comp.get('comparison_to_target', 'N/A')}\n\n"
        report += "---\n\n"

    # Add market whitespaces
    report += "## Market Opportunities\n\n"
    for opportunity in analysis.get('whitespaces', []):
        report += f"### {opportunity.get('opportunity', 'Unknown')}\n\n"
        report += f"**Why it exists:** {opportunity.get('why_exists', 'N/A')}\n\n"
        report += f"**Potential value:** {opportunity.get('potential_value', 'N/A')}\n\n"
        report += f"**Difficulty:** {opportunity.get('difficulty', 'Unknown')}\n\n"

    # Add strategic recommendations
    positioning = analysis.get('target_startup_positioning', {})
    report += "## Strategic Recommendations\n\n"
    report += f"**Recommended Strategy:** {positioning.get('recommended_strategy', 'N/A')}\n\n"
    report += "### Competitive Advantages\n\n"
    for adv in positioning.get('competitive_advantages', []):
        report += f"- {adv}\n"

    report += "\n### Areas of Vulnerability\n\n"
    for vuln in positioning.get('vulnerability_areas', []):
        report += f"- {vuln}\n"

    We can also add an executive summary at the top, which GPT-4o can generate for us from all the content.

    Python
    response = self.openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": "You are a VC analyst writing executive summaries."
            }, {
                "role": "user",
                "content": f"""Write a 4-5 paragraph executive summary.
    
    STARTUP: {startup['name']} - {startup['description']}
    COMPETITORS: {len(competitors)} analyzed
    ANALYSIS: {analysis.get('market_overview', {})}
    
    Cover:
    1. Market opportunity
    2. Competitive dynamics
    3. Key whitespaces
    4. Target positioning
    5. Investment implications
    
    Professional, data-driven tone for VCs."""
            }],
            temperature=0.6
        )
    
        return response.choices[0].message.content
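
    To finish off report_generator.py, generate_report can prepend the executive summary and write the assembled markdown into the reports/ folder, returning the path that main.py prints. A minimal sketch (the function signature and filename are my assumptions):

    Python
    def save_report(report: str, summary: str, startup: dict, timestamp: str) -> str:
        """Prepend the executive summary and write the report to the reports/ folder."""
        header = f"# Market Research Report: {startup['name']}\n\n## Executive Summary\n\n{summary}\n\n"
        report_path = f"reports/market_research_{timestamp}.md"
        with open(report_path, "w") as f:
            f.write(header + report)
        return report_path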

    And finally, we can create a main.py file that calls each step sequentially and passes data along. We also save our data to a folder in case something goes wrong along the way.

    Python
    # Step 1: Analyze target
        print("STEP 1: Analyzing target startup")
        startup_info = discovery.analyze_startup(startup_url, startup_name)
        print(f"✓ Analyzed {startup_info['name']} ({startup_info['category']})\n")
    
        # Save intermediate data
        with open(f"data/startup_info_{timestamp}.json", "w") as f:
            json.dump(startup_info, f, indent=2)
    
        # Step 2: Discover competitors
        print("STEP 2: Discovering competitors")
        competitors = discovery.discover_competitors(
            startup_info['category'],
            startup_info['description'],
            startup_info.get('keywords', [])
        )
        print(f"✓ Found {len(competitors)} competitors\n")
    
        with open(f"data/competitors_{timestamp}.json", "w") as f:
            json.dump(competitors, f, indent=2)
    
        # Step 3: Analyze each competitor
        print("STEP 3: Analyzing competitors")
        competitor_details = []
        for i, comp in enumerate(competitors, 1):
            print(f"  {i}/{len(competitors)}: {comp['name']}")
            details = analyzer.analyze_competitor(
                comp['website'],
                comp['name'],
                startup_info
            )
            competitor_details.append(details)
    
        print(f"✓ Completed competitor analysis\n")
    
        with open(f"data/competitor_analysis_{timestamp}.json", "w") as f:
            json.dump(competitor_details, f, indent=2)
    
        # Step 4: Identify market gaps
        print("STEP 4: Identifying market opportunities")
        market_analysis = analyzer.identify_market_gaps(
            startup_info,
            competitor_details
        )
        print(f"✓ Identified {len(market_analysis.get('whitespaces', []))} opportunities\n")
    
        with open(f"data/market_analysis_{timestamp}.json", "w") as f:
            json.dump(market_analysis, f, indent=2)
    
        # Step 5: Generate report
        print("STEP 5: Generating report")
        report_path = reporter.generate_report(
            startup_info,
            competitor_details,
            market_analysis
        )
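
    The snippet above assumes the modules are imported and instantiated further up in main.py. Here’s roughly what that top section might look like (the class names are my assumptions; use whatever you called yours):

    Python
    import json
    from datetime import datetime

    from market_discovery import MarketDiscovery
    from competitor_analysis import CompetitorAnalyzer
    from report_generator import ReportGenerator

    startup_url = "https://parallel.ai"
    startup_name = "Parallel"
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    discovery = MarketDiscovery()
    analyzer = CompetitorAnalyzer()
    reporter = ReportGenerator()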

    When I ran it on Parallel itself, I got a really good research report along with competitors like Exa and Firecrawl, plus gaps in the market.

    Extending Our System

    This POC is fairly basic so I encourage you to try other things, starting with better prompts for Parallel.

    For my client I’m extending this system by:

    • Adding more searches. Right now I’m looking for articles but I also want to search sites like ProductHunt and Reddit, PR announcements, and more
    • Enriching with founder information and funding data
    • Adding visualizations to create market maps and competitive matrices

    And while this is specific to VCs, there are so many other use cases that need search built-in, from people search for hiring to context retrieval for AI agents.

    You can try their APIs in the playground, and if you need any help, reach out to me!

  • Cartesia AI Tutorial: Build an AI Podcast Generator

    Cartesia AI Tutorial: Build an AI Podcast Generator

    I was talking to a friend recently about an idea he had for generating AI podcasts in the format of How I Built This. He wanted to be able to just enter the name of a company and get a podcast on all the details of how it was started, on demand.

    One way I’d build a system like this is first running deep research on the company, then turning it all into an engaging podcast script, and then finally converting that into a podcast with a voice AI.

    The weakest link of that system is the voice AI. More specifically, how do you generate a voice that can keep listeners engaged for an hour? And how do you do it cost-effectively?

    That’s what drew me to Cartesia. Their most recent model sounds very lifelike (especially in English; the other languages feel a bit flat), with the ability to play with speed and emotion. And after meeting the CEO at a recent meetup, I decided to play around with it.

    This project is a simplified version of my friend’s idea where you can put in the URL to a blog post and it generates a podcast based on that. I’m going to be generating them in my voice so that I can turn this blog into a podcast.

    What We’re Building

    The system has three distinct stages:

    Content Extraction → Scrape and clean article text from any URL

    Script Generation → Use AI to reformat content for spoken delivery

    Voice Synthesis → Convert the script to ultra-realistic speech with Cartesia

    Each stage has a single, well-defined responsibility. This separation matters because it makes the system testable, debuggable, and extensible. Want to add multi-voice support? Just modify the voice synthesis stage. Need better content extraction? Swap out the scraper without touching anything else.

    The data flow looks like this:

    Bash
    URL → ContentFetcher → {title, content} → ContentProcessor → {script} → AudioGenerator → audio.wav

    Let’s build it.

    Setting Up The Project

    You’ll need API keys for:

    • Cartesia (get one here) – The star of the show
    • OpenAI (get one here) – For script generation
    • Firecrawl (get one here) – Optional but recommended for better content extraction

    Store these in a .env file:

    Bash
    CARTESIA_API_KEY=your_key_here
    OPENAI_API_KEY=your_key_here
    FIRECRAWL_API_KEY=your_key_here  # optional

    And then install dependencies:

    Bash
    pip install cartesia openai python-dotenv requests beautifulsoup4 firecrawl-py

    Now let’s build the pipeline, starting with content extraction.

    Stage 1: Content Extraction

    The first challenge is getting clean article text from arbitrary URLs. This is harder than it sounds because every website structures content differently. Some use <article> tags, others use <div class="content">, and some wrap everything in JavaScript frameworks that require browser rendering.

    I use Firecrawl for all scraping needs. It’s an AI-powered scraper that intelligently identifies main content and handles all the other messy stuff out of the box.

    It’s a paid product so if you want a free alternative, BeautifulSoup works.

    I won’t go into how either of these works as I’ve covered them before. The main implementation of our ContentFetcher, which fetches and extracts content from the input URL, lives in content_fetcher.py:

    Python
    class ContentFetcher:
        def __init__(self):
            self.firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
            self.firecrawl_client = None
            if FIRECRAWL_AVAILABLE and self.firecrawl_api_key:
                self.firecrawl_client = FirecrawlApp(api_key=self.firecrawl_api_key)
                print("Using Firecrawl for enhanced content extraction")
        def fetch(self, url: str) -> Dict[str, str]:
            """Fetch content from URL with automatic fallback."""
            print(f"Fetching content from: {url}")
            # Try Firecrawl first if available
            if self.firecrawl_client:
                try:
                    return self._fetch_with_firecrawl(url)
                except Exception as e:
                    print(f"Firecrawl failed: {e}, falling back to basic scraping")

    Stage 2: Script Generation with OpenAI

    Now we have article text, but it’s not podcast-ready yet. Written content and spoken content are fundamentally different mediums:

    • Written: Can reference images (“As shown in Figure 1…”)
    • Spoken: Must describe everything verbally
    • Written: Readers can re-read complex sentences
    • Spoken: Listeners need shorter, clearer phrasing
    • Written: Acronyms like “API” are fine
    • Spoken: Need to be spelled out or expanded

    This is where AI comes in. Rather than manually rewriting articles, we can use Sonnet 4.5 or GPT-5 (although I’m using GPT-4o here because it’s cheaper) to automatically transform content into podcast-friendly scripts.

    Python
    class ContentProcessor:
        """Processes content into podcast format using AI."""
        def __init__(self, config: PodcastConfig):
            self.config = config
            self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        def process(self, title: str, content: str) -> dict:
            summary = self._generate_summary(title, content)
            # Format main content as podcast script
            main_script = self._format_for_podcast(title, content)
            # Build full script with intro/outro
            intro = INTRO_TEMPLATE.format(title=title, summary=summary)
            full_script = f"{intro}\n\n{main_script}\n\n{OUTRO_TEMPLATE}"
            return {
                'full_script': full_script,
                'word_count': len(full_script.split())
            }
        def _generate_summary(self, title: str, content: str) -> str:
            """Create engaging 2-3 sentence summary."""
            prompt = f"Create a 2-3 sentence summary of this article:\n\nTitle: {title}\n\n{content[:3000]}"
            response = self.client.chat.completions.create(
                model=self.config.ai_model,
                messages=[
                    {"role": "system", "content": "You create engaging podcast introductions."},
                    {"role": "user", "content": prompt}
                ],
                temperature=self.config.temperature,
                max_tokens=200
            )
            return response.choices[0].message.content.strip()
        def _format_for_podcast(self, title: str, content: str) -> str:
            """Format article as podcast script."""
            word_count = self.config.estimated_word_count
            prompt = CONTENT_FORMATTING_PROMPT.format(
                word_count=word_count,
                title=title,
                content=content
            )
            response = self.client.chat.completions.create(
                model=self.config.ai_model,
                messages=[
                    {"role": "system", "content": "You are an expert podcast script writer."},
                    {"role": "user", "content": prompt}
                ],
                temperature=self.config.temperature,
                max_tokens=word_count * 2
            )
            return response.choices[0].message.content.strip()

    Aside from the main script, we’re generating a summary that acts as our intro. Most of this is boilerplate OpenAI calls.
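
    The processor also references a PodcastConfig and intro/outro templates that aren’t shown. Here’s a minimal sketch of what they might look like; every name and default here is an assumption, so tune them to taste:

    Python
    from dataclasses import dataclass

    @dataclass
    class PodcastConfig:
        ai_model: str = "gpt-4o"
        temperature: float = 0.7
        estimated_word_count: int = 1500  # roughly a 10-minute episode
        output_dir: str = "output"

    INTRO_TEMPLATE = "Welcome to the show. Today we're talking about {title}. {summary}"
    OUTRO_TEMPLATE = "That's all for this episode. Thanks for listening!"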

    The heavy lifting is done by the prompt. We’re asking OpenAI to convert the article into a script and also insert SSML (Speech Synthesis Markup Language) tags like [laughter], plus pauses and breaks.

    I’ll explain more about this below. For now just use this sample prompt:

    Python
    CONTENT_FORMATTING_PROMPT = """
    You are a podcast script writer. Convert the following article into an engaging podcast script with natural emotional expression and pacing.
    
    Requirements:
    - Target length: approximately {word_count} words
    - Write in a conversational, engaging tone suitable for audio
    - Remove references to images, videos, or visual elements
    - Spell out acronyms on first use
    - Use natural speech patterns and transitions
    - Break complex ideas into digestible segments
    - Maintain the key insights and takeaways from the original content
    - Do not add meta-commentary about being a podcast
    - Write ONLY the words that should be spoken aloud
    - Use short sentences and natural paragraph breaks for pacing
    - Vary sentence length to create rhythm and emphasis
    
    SSML TAGS - Use these inline tags to enhance delivery and pacing (Cartesia TTS will interpret them):
    
    EMOTION TAGS - Add natural emotional expression at key moments:
    - [laughter] - For genuine humor or lighthearted moments
    - <emotion value="excited" /> - When discussing impressive achievements or breakthroughs
    - <emotion value="curious" /> - When posing intriguing questions or exploring unknowns
    - <emotion value="surprised" /> - For unexpected findings or revelations
    - <emotion value="contemplative" /> - During reflective or contemplative passages
    
    PAUSE/BREAK TAGS - Add dramatic pauses for emphasis:
    - <break time="0.5s"/> - Short pause (half second) for brief emphasis
    - <break time="1s"/> - Medium pause (one second) before important points
    - <break time="1.5s"/> - Longer pause for dramatic effect or topic transitions
    - Use pauses sparingly (1-3 per script) at natural transition points
    
    Cartesia also supports other SSML tags like <speed ratio="1.2"/> and  <volume ratio="0.8"/> to vary the tone for added engagement.
    
    Guidelines:
    - Use emotion tags sparingly (2-5 times per script) at natural inflection points
    - Use breaks for dramatic pauses before revealing key insights
    - Place them where a human speaker would naturally pause or change tone
    - They should feel organic, not forced
    - Example: "And then something unexpected happened <break time="0.5s"/> <emotion value="surprised" /> the results exceeded all predictions."
    - Example: "But here's the fascinating part <break time="1s"/> <emotion value="curious" /> what if we could do this at scale?"
    - Example: "After months of research, they discovered <break time="1s"/> a completely new approach."
    
    Article Title: {title}
    Article Content: {content}
    
    Generate only the podcast script below, ready to be read aloud:
    """

    Stage 3: Voice Synthesis with Cartesia

    We finally get to the fun part. Cartesia’s API is straightforward to use, but it offers some powerful features that aren’t immediately obvious from the documentation.

    First, let’s make a custom voice. Cartesia comes with plenty of voices but they also have the option to clone yours with a 10 second audio sample. And it’s quite good!

    Once we do that, we get back an ID which we pass through as a parameter (along with a number of other params) when we call the Cartesia API in audio_generator.py:

    Python
    with open(output_path, "wb") as audio_file:
        bytes_iter = self.client.tts.bytes(
            model_id="sonic-3",
            transcript=script,
            voice={
                "mode": "id",
                "id": "your-custom-voice-id",  # enter your custom voice ID here
            },
            language="en",
            generation_config={
                "volume": 1.0,  # Volume level (0.5 to 2.0)
                "speed": 0.9,   # Speed multiplier (0.6 to 1.5)
                "emotion": "excited"
            },
            output_format={
                "container": CONTAINER,
                "sample_rate": SAMPLE_RATE,
                "encoding": ENCODING,
            },
        )

        for chunk in bytes_iter:
            audio_file.write(chunk)
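
    The CONTAINER, SAMPLE_RATE, and ENCODING constants aren’t shown above. Here’s one plausible set of values for a WAV output (these are my assumptions, so check Cartesia’s output_format docs for what your plan supports):

    Python
    # Assumed output settings for a WAV file; adjust per Cartesia's output_format docs
    CONTAINER = "wav"
    SAMPLE_RATE = 44100
    ENCODING = "pcm_f32le"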

    Model Selection: sonic-3 vs sonic-turbo

    Cartesia offers two models with different trade-offs:

    • sonic-3: 90ms latency, highest quality, most emotional range
    • sonic-turbo: 40ms latency, faster generation, still excellent quality

    For podcast generation, I use sonic-3 because emotional range matters more than latency.

    Voice and Generation Parameters

    We also pass in our custom voice ID if we have cloned our voice. Cartesia also comes with a number of other voices, each with their own characteristics. Try them out, see which ones you like, and enter those IDs instead.

    The more interesting parameters are the volume, speed, and emotion controls. What we’re passing through here are the voice defaults. In the config above I’m making the voice slightly slower than normal, and also giving it an “excited” emotion. Cartesia has dozens of different emotions that you can play with.

    But podcast hosts don’t speak in a monotone. They vary their speed and emotion. They pause, they laugh, and more. That’s why we had our script generator insert SSML tags directly in the script.

    Example script output:
    “And then something unexpected happened <break time="0.5s"/> [surprise] the results exceeded all predictions.”
    “But here’s the fascinating part <break time="1s"/> [curiosity] what if we could do this at scale?”

    Cartesia’s TTS engine automatically interprets these tags when generating audio. This creates podcast audio that sounds like a human narrator reacting to the material with natural pauses and emotional inflection, rather than just reading prepared text.

    And that’s how we get our engaging podcast host sound.

    Moment Of Truth

    And now we get to our moment of truth. Does it work? How does it sound?

    You’ll want to create a main.py that takes in a URL as an argument and then passes it through our system:

    Python
    try:
        # Generate podcast
        result = generate_podcast(args.url, config, args)
    
        # Print success summary
        print("\n" + "="*70)
        print("PODCAST GENERATION COMPLETE!")
        print("="*70)
        print(f"\nTitle: {result['title']}")
        print(f"Audio file: {result['audio_path']}")
        print(f"Script length: {result['word_count']} words")
    
        if args.save_script:
            script_path = os.path.join(config.output_dir, f"{result['output_name']}_script.txt")
            print(f"Script file: {script_path}")
    
        print(f"\nYour podcast is ready to share!")
        print()
    
        return 0
    
    except KeyboardInterrupt:
        print("\n\nOperation cancelled by user.")
        return 130

    You can then call this from the command line in your terminal and you’ll get a WAV file as output.
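
    Something like this, for example (the exact flags depend on how you wire up argparse; --save-script lines up with the args.save_script check above):

    Bash
    python main.py https://yourblog.com/your-post-url --save-script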

    I ran this through my recent blog post on Claude Skills and here’s what I got back:

    Not bad right? I think the initial voice sample I recorded to train the custom voice could have been better (clearer, more consistent). And there are some minor script issues that can be sorted out with a better prompt, or perhaps using a better model like GPT-5 or Sonnet 4.5.

    But for a POC this is quite good. And Cartesia works out to around 4c per minute, which is a lot cheaper than ElevenLabs and other TTS providers.

    What Else Can You Build

    I’m just scratching the surface of Cartesia’s offerings. They have a platform to build end-to-end voice agents that can be deployed in customer support, healthcare, finance, education, and more.

    Even with the use case I just showed you, you can build out different types of applications. One way to extend this, for example, is to go back to the original idea of taking in a topic, doing deep research and gathering a ton of content, and then turning all of that into a script and generating a two-person podcast.

    Some other TTS ideas:

    • Audiobook generation – Convert long-form content to audio
    • Accessibility tools – Make written content accessible to visually impaired users
    • Language learning – Generate pronunciation examples
    • Voice assistants – Create custom voice responses
    • Content localization – Generate audio in multiple languages (Cartesia supports 100+ languages)

    The three-stage pipeline (extract → process → synthesize) is a general-purpose pattern for text-to-speech automation.

    And if you need help building this, let me know!

  • Claude Skills Tutorial: Give your AI Superpowers

    Claude Skills Tutorial: Give your AI Superpowers

    In the Matrix, there’s a scene where Morpheus is loading training programs into Neo’s brain and he wakes up from it and says, “I know Kung Fu.”

    That’s basically what Claude skills are.

    They’re a set of instructions that teach Claude how to do a certain thing. You explain it once in a document, like a training manual, and hand that to Claude. The next time you ask Claude to do that thing, it reaches for this document, reads the instructions, and does the thing.

    You never need to explain yourself twice.

    In this article, I’ll go over everything Claude Skills related, how it works, where to use it, and even how to build one yourself.

    Got Skills?

    A Skill is essentially a self-contained “plugin” (also called an Agent Skill) packaged as a folder containing custom instructions, optional code scripts, and resource files that Claude can load when performing specialized tasks.

    In effect, a Skill teaches Claude how to handle a particular workflow or domain with expert proficiency, on demand. For example, Anthropic’s built-in Skills enable Claude to generate Excel spreadsheets with formulas, create formatted Word documents, build PowerPoint presentations, or fill PDF forms, all tasks that go beyond Claude’s base training.

    Skills essentially act as on-demand experts that Claude “calls upon” during a conversation when it recognizes that the user’s request matches the Skill’s domain. Crucially, Skills run in a sandboxed code execution environment for safety, meaning they operate within clearly defined boundaries and only perform actions you’ve allowed.

    Teach Me Sensei

    At minimum, a Skill is a folder containing a primary file named SKILL.md (along with any supplementary files or scripts). This primary file contains the Skill’s name and description.

    This is followed by a Markdown body containing the detailed instructions, examples, or workflow guidance for that Skill. The Skill folder can also include additional Markdown files (reference material, templates, examples, etc.) and code scripts (e.g. Python or JavaScript) that the Skill uses.

    The technical magic happens through something called “progressive disclosure” (which sounds like a therapy technique but is actually good context engineering).

    At startup, Claude scans every skill’s metadata for the name and description. So in context it knows that there’s a PDF skill that can extract text.

    When you’re chatting with Claude and you ask it to analyze a PDF document, it realizes it needs the PDF skill and reads the rest of the primary file. And if you uploaded any supplementary material, Claude decides which ones it needs and loads only that into context.

    So this way, a Skill can encapsulate a large amount of knowledge or code without overwhelming the context window. And if multiple Skills seem relevant, Claude can load and compose several Skills together in one session.

    Code Execution

    One powerful aspect of Skills is that they can include executable code as part of their toolkit. Within a Skill folder, you can provide scripts (Python, Node.js, Bash, etc.) that Claude may run to perform deterministic operations or heavy computation.

    For example, Anthropic’s PDF Skill comes with a Python script that can parse a PDF and extract form field data. When Claude uses that Skill to fill out a PDF, it will choose to execute the Python helper script (via the sandboxed code tool) rather than attempting to parse the PDF purely in-token.

    To maintain safety, Skills run in a restricted execution sandbox with no persistence between sessions.

    Wait, But Why?

    If you’ve used Claude and Claude Code a lot, you may be thinking that you’ve already come across similar features. So let’s clear up the confusion, because Claude’s ecosystem is starting to look like the MCU. Lots of cool characters but not clear how they all fit together.

    Skills vs Projects

    In Claude, Projects are bounded workspaces where context accumulates. When you create a project, you can set project level instructions, like “always use the following brand guidelines”. You can also upload documents to the project.

    Now every time you start a new chat in that project, all those instructions and documents are loaded in for context. Over time Claude even remembers past conversations in that Project.

    So, yes, it does sound like Skills because within the scope of a Project you don’t need to repeat instructions.

    The main difference though is that Skills work everywhere. Create it once, and use it in any conversation, any project, or any chat. And with progressive disclosure, it only uses context when needed. You can also string multiple Skills together.

    In short, use Projects for broad behavior customization and persistent context, and use Skills for packaging repeatable workflows and know-how. Project instructions won’t involve coding or file management, whereas Skills require a bit of engineering to build and are much more powerful for automating work.

    Skills vs MCP

    If you’re not already familiar with Model Context Protocol, it’s just a way for Claude to connect with external data and APIs in a secure manner.

    So if you wanted Claude to be able to write to your WordPress blog, you can set up a WordPress MCP and now Claude can push content to it.

    Again, this might sound like a Skill but the difference here is that Skills are instructions that tell Claude how to do tasks, while MCP is what allows Claude to take the action. They’re complementary.

    You can even use them together, along with Projects!

    Let’s say you have a Project for writing blog content where you have guidelines on how to write. You start a chat with a new topic you want to write about and Claude writes it following your instructions.

    When the post is ready, you can use a Skill to extract SEO metadata, as well as turn the content into tweets. Finally, use MCPs to push this content to your blog and various other channels.

    Skills vs Slash Commands (Claude Code Only)

    If you’re a Claude Code user, you may have come across custom slash commands that allow you to define a certain process and then call that whenever you need.

    This is actually the closest existing Claude feature to a Skill. The main difference is that you, the user, trigger a custom slash command when you want it, while Skills can be invoked by Claude whenever it determines it needs them.

    Skills also allow for more complexity, whereas custom slash commands are for simpler tasks that you repeat often (like running a code review).

    Skills vs Subagents (Also Claude Code Only)

    Sub-agents in Claude Code refer to specialized AI agent instances that can be spawned to help the main Claude agent with specific sub-tasks. They have their own context window and operate independently.

    A sub-agent is essentially another AI persona/model instance running in parallel or on-demand, whereas a Skill is not a separate AI. It’s more like an add-on for the main Claude.

    So while a Skill can greatly expand what the single Claude instance can do, it doesn’t provide the parallel processing or context isolation benefits that sub-agents do.

    You already have skills

    It turns out you’ve been using Skills without realizing it. Anthropic built four core document skills:

    • DOCX: Word documents with tracked changes, comments, formatting preservation
    • PPTX: PowerPoint presentations with layouts, templates, charts
    • XLSX: Excel spreadsheets with formulas, data analysis, visualization
    • PDF: PDF creation, text extraction, form filling, document merging

    These skills contain highly optimized instructions, reference libraries, and code that runs outside Claude’s context window. They’re why Claude can now generate a 50-slide presentation without gasping for context tokens like it’s running a marathon.

    These are available to everyone automatically. You don’t need to enable them. Just ask Claude to create a document, and the relevant skill activates.

    Additionally, they’ve added a bunch of other skills and open-sourced them so you can see how they’re built and how it works. Just go to the Capabilities section in your Settings and toggle them on.

    How To Build Your Own Skill

    Of course, the real value of Skills comes from building your own, something that suits the work you do. Fortunately, it’s not too hard. There’s even a pre-built skill, in the same Capabilities section mentioned above, that builds skills.

    But let’s walk through it manually so you understand what’s happening. On your computer, create a folder called team-report. Inside, create a file called SKILL.md:

    Markdown
    ---
    name: team-report #no capital letters allowed here.
    description: Creates standardized weekly team updates. Use when the user wants a team status report or weekly update.
    ---
    
    # Weekly Team Update Skill
    
    ## Instructions
    
    When creating a weekly team update, follow this structure:
    
    1. **Wins This Week**: 3-5 bullet points of accomplishments
    2. **Challenges**: 2-3 current blockers or concerns  
    3. **Next Week's Focus**: 3 key priorities
    4. **Requests**: What the team needs from others
    
    ## Tone
    - Professional but conversational
    - Specific with metrics where possible
    - Solution-oriented on challenges
    
    ## Example Output
    
    **Wins This Week:**
    - Shipped authentication refactor (reduced login time 40%)
    - Onboarded 2 new engineers successfully
    - Fixed 15 critical bugs from backlog
    
    **Challenges:**
    - Database migration taking longer than expected
    - Need clearer specs on project X
    
    **Next Week's Focus:**
    - Complete migration
    - Start project Y implementation  
    - Team planning for Q4
    
    **Requests:**
    - Design review for project Y by Wednesday
    - Budget approval for additional testing tools

    That’s it. That’s the skill. Zip it up and upload this to Claude (Settings > Capabilities > Upload Skill), and now Claude knows how to write your team updates.

    Leveling Up: Adding Scripts and Resources

    For more complex skills, you can add executable code. Let’s say you want a skill that validates data:

    Bash
    data-validator-skill/
    ├── SKILL.md
    ├── schemas/
    │   └── customer-schema.json
    └── scripts/
        └── validate.py

    Your SKILL.md references the validation script. When Claude needs to validate data, it runs validate.py with the user’s data. The script executes outside the context window. Only the output (“Validation passed” or “3 errors found”) uses context.
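
    If you’re curious what that script might look like, here’s one way to write validate.py using the jsonschema package (the file paths and output format are my assumptions, not Anthropic’s implementation):

    Python
    import json
    import sys

    from jsonschema import validate, ValidationError

    # Usage: python scripts/validate.py <data.json>
    with open("schemas/customer-schema.json") as f:
        schema = json.load(f)

    with open(sys.argv[1]) as f:
        data = json.load(f)

    records = data if isinstance(data, list) else [data]
    errors = []
    for i, record in enumerate(records):
        try:
            validate(instance=record, schema=schema)
        except ValidationError as e:
            errors.append(f"Record {i}: {e.message}")

    if errors:
        print(f"{len(errors)} errors found:")
        for err in errors:
            print(f"- {err}")
    else:
        print("Validation passed")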

    Best Practices

    1. Description is Everything

    Bad description: “Processes documents”

    Good description: “Extracts text and tables from PDF files. Use when working with PDF documents or when user mentions PDFs, forms, or document extraction.”

    Claude uses the description to decide when to invoke your skill. Be specific about what it does and when to use it.

    2. Show, Don’t Just Tell

    Include concrete examples in your skill. Show Claude what success looks like:

    Markdown
    ## Example Input
    "Create a Q3 business review presentation"
    
    ## Example Output
    A 15-slide PowerPoint with:
    - Executive summary (slides 1-2)
    - Key metrics dashboard (slide 3)
    - Performance by segment (slides 4-7)
    - Challenges and opportunities (slides 8-10)
    - Q4 roadmap (slides 11-13)
    - Appendix with detailed data (slides 14-15)

    3. Split When It Gets Unwieldy

    If your SKILL.md starts getting too long, split it:

    Bash
    financial-modeling-skill/
    ├── SKILL.md              # Core instructions
    ├── DCF-MODELS.md         # Detailed DCF methodology  
    ├── VALIDATION-RULES.md   # Validation frameworks
    └── examples/
        └── sample-model.xlsx

    4. Test With Variations

    Don’t just test your skill once. Try:

    • Different phrasings of the same request
    • Edge cases
    • Combinations with other skills
    • Both explicit mentions and implicit triggers

    Security (do not ignore this)

    We’re going to see an explosion of AI gurus touting their Skill directory and asking you to comment “Skill” to get access.

    The problem is Skills can execute code, and if you don’t know what this code does, you may be in for a nasty surprise. A malicious skill could:

    • Execute harmful commands
    • Exfiltrate your data
    • Misuse file operations
    • Access sensitive information
    • Make unauthorized API calls (in environments with network access)

    Anthropic’s guidelines are clear: Only use skills from trusted sources. This means:

    1. You created it (and remember creating it)
    2. Anthropic created it (official skills)
    3. You thoroughly audited it (read every line, understand every script)

    So if you found it on GitHub or some influencer recommended it, stay away. At the very least, be skeptical and:

    • Read the entire SKILL.md file
    • Check all scripts for suspicious operations
    • Look for external URL fetches (big red flag)
    • Verify tool permissions requested
    • Check for unexpected network calls

    Treat skills like browser extensions or npm packages: convenient when trustworthy, catastrophic when compromised.

    Use Cases and Inspiration

    The best Skills are focused on solving a specific, repeatable task that you do in your daily life or work. This is different for everyone. So ask yourself: What do I want Claude to do better or automatically?

    I’ll give you a few examples from my work to inspire you.

    Meeting Notes and Proposals

    We all have our AI notetakers and they each give us summaries and transcripts that we don’t read. What matters to me is taking our conversation and extracting the client’s needs and requirements, and then turning that into a project proposal.

    Without Skills, I would have to upload the transcript to Claude and give it the same instructions every time to extract the biggest pain points, turn it into a proposal, and so on.

    With Skills, I can define that once, describing exactly how I want it, and upload that to Claude as my meeting analyzer skill. From now on, all I have to do is tell Claude to “analyze this meeting” and it uses the Skill to do it.

    Report Generator

    When I run AI audits for clients, I often hear people say that creating reports is very time consuming. Every week they have to gather data from a bunch of sources and then format it into a consistent report structure with graphs and summaries and so on.

    Now with Claude skills they can define that precisely, even adding scripts to generate graphs and presentation slides. All they have to do is dump the data into a chat and have it generate a report using the skill.

    Code Review

    If you’re a Claude Code user, building a custom code review skill might be worth your time. I had a custom slash command for code reviews but Skills offer a lot more customization with the ability to run scripts.

    Content Marketing

    I’ve alluded to this earlier in the post, but there are plenty of areas where I repeat instructions to Claude while co-creating content, and Skills allow me to abstract and automate that away.

    Practical Next Steps

    If you made it this far (seriously, thanks for reading 3,000 words about AI file management), here’s what to do:

    Immediate Actions:

    1. Enable Skills: Go to Settings > Capabilities > Skills
    2. Try Built-In Skills: Ask Claude to create a PowerPoint or Excel file
    3. Identify One Pattern: What do you ask Claude to do repeatedly?
    4. Create Your First Skill: Use the team report example as template
    5. Test and Iterate: Use it 5 times, refine based on results

    If you thought MCP was big, I think Skills have the potential to be bigger. If you need help with building more Skills, subscribe below and reach out to me.

  • Building a Competitor Intelligence Agent with Browserbase

    Building a Competitor Intelligence Agent with Browserbase

    In a previous post, I wrote about how I built a competitor monitoring system for a marketing team. We used Firecrawl to detect changes on competitor sites and blog content, and alert the marketing team with a custom report. That was the first phase of a larger project.

    The second phase was tracking the competitors’ ads and adding it to our report. The good folks at LinkedIn and Meta publish all the ads running on their platforms in a public directory. You simply enter the company name and it shows you all the ads they run. That’s the easy part.

    The tough part is automating visiting the ad libraries on a regular basis and looking for changes. Or, well, it would have been tough if I weren’t using Browserbase.

    In this tutorial, I’ll show you how I built this system, highlighting the features of Browserbase that saved me a lot of time. Whether you’re building a competitor monitoring agent, a web research tool, or any AI agent that needs to interact with real websites, the patterns and techniques here will apply.

    Why Browserbase?

    Think of Browserbase as AWS Lambda, but for browsers. Instead of managing your own browser infrastructure with all the pain that entails, you get an API that spins up browser instances on demand, with features you need to build reliable web agents.

    Want to persist authentication across multiple scraping sessions? There’s a Contexts API for that. Need to debug why your scraper failed? Every session is automatically recorded and you can replay it like a DVR. Running into bot detection? Built-in stealth mode and residential proxies make your automation look human.

    For this project, I’m using Browserbase to handle all the browser orchestration while I focus on the actual intelligence layer: what to monitor, how to analyze it, and what insights to extract. This separation of concerns is what makes the system maintainable.

    What We’re Building: Architecture Overview

    This agent monitors competitor activity across multiple dimensions and generates actionable intelligence automatically.

    The system has five core components working together. First, there’s the browser orchestration layer using Browserbase, which handles session management, authentication, and stealth capabilities. This is the foundation that lets us reliably access ad platforms.

    Second, we have platform-specific scrapers for LinkedIn ads, Facebook ads, and landing pages. Each scraper knows how to navigate its platform, handle pagination, and extract structured data.

    Third, there’s a change detection system that tracks what we’ve seen before and identifies what’s new or different.

    Fourth, we have an analysis engine that processes the raw data to identify patterns, analyze creative themes, and detect visual changes using perceptual hashing.

    Finally, there’s an intelligence reporter that synthesizes everything and generates strategic insights using GPT-4.

    Each component is independent and can be improved or replaced without affecting the others. Want to add a new platform? Write a new scraper module. Want better AI insights? Swap out the analysis prompts. Want to store data differently? Replace the storage layer.

    Setting Up Your Environment

    First, you’ll need accounts for a few services. Sign up for Browserbase at browserbase.com and grab your API key and project ID from the dashboard. The free tier gives you enough sessions to build and test this system. If you want the AI insights feature, you’ll also need an OpenAI API key.

    Create a new project directory, set up a Python virtual environment, and install the key dependencies:

    Bash
    # Create and activate virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
    # Install dependencies
    pip install browserbase playwright pillow imagehash openai python-dotenv requests
    playwright install chromium

    Create a .env file to store the keys you got from Browserbase, along with your OpenAI key.

    Bash
    # .env file
    BROWSERBASE_API_KEY=your-api-key-here
    BROWSERBASE_PROJECT_ID=your-project-id-here
    OPENAI_API_KEY=sk-your-key-here

    Building the Browser Manager: Your Gateway to Browserbase

    The browser manager is the foundation of everything. This class encapsulates all the Browserbase interaction and provides a clean interface for the rest of the system. It handles session lifecycle, connection management, and proper cleanup.

    Python
    from typing import Any, Dict, Optional

    from browserbase import Browserbase
    from playwright.sync_api import sync_playwright


    class BrowserManager:
        def __init__(self, api_key: str, project_id: str, context_id: Optional[str] = None):
            self.api_key = api_key
            self.project_id = project_id
            self.context_id = context_id
            
            # Initialize the Browserbase SDK client
            # This handles all API communication with Browserbase
            self.bb = Browserbase(api_key=api_key)
            
            # These will hold our active resources
            # We track them as instance variables so we can clean up properly
            self.session = None
            self.playwright = None
            self.browser = None
            self.context = None
            self.page = None

    Next, let’s add a method that creates a new Browserbase session with custom configuration.

    We’ll enable stealth to make our agent look like a real human and not trip up the bot detectors. And we’ll set up a US proxy.

    You can also set session timeouts, or keep sessions alive even if your code crashes (though we aren’t doing that here).

    Python
    def create_session(self,
                       timeout: int = 300,
                       enable_stealth: bool = True,
                       enable_proxy: bool = True,
                       proxy_country: str = "us",
                       keep_alive: bool = False) -> Dict[str, Any]:

        session_config = {
            "projectId": self.project_id,
            "browserSettings": {
                "stealth": enable_stealth,
                "proxy": {
                    "enabled": enable_proxy,
                    "country": proxy_country
                } if enable_proxy else None
            },
            "timeout": timeout,
            "keepAlive": keep_alive
        }

        # If we have a context ID, include it to reuse authentication state
        # This is the secret sauce for avoiding repeated logins
        if self.context_id:
            session_config["contextId"] = self.context_id

        self.session = self.bb.sessions.create(**session_config)
        session_id = self.session.id
        connect_url = self.session.connectUrl
        replay_url = f"https://www.browserbase.com/sessions/{session_id}"

        return {
            "session_id": session_id,
            "connect_url": connect_url,
            "replay_url": replay_url
        }

    You’ll notice we get back a replay URL. This is where we can actually watch the browser sessions and debug what went wrong.

    Next, we connect to the browser session using Playwright, an open-source browser automation library by Microsoft. This is where the magic happens: we’re attaching to a real Chrome browser running in Browserbase’s infrastructure, and from here on it’s just standard Playwright code with all of Browserbase’s superpowers.

    Python
    def connect_browser(self):
        # We need an active session to attach to
        if not self.session:
            raise RuntimeError("No session created. Call create_session() first.")

        self.playwright = sync_playwright().start()

        # Connect to the remote browser over CDP (Chrome DevTools Protocol)
        self.browser = self.playwright.chromium.connect_over_cdp(
            self.session.connectUrl
        )

        # Get the default context and page that Browserbase created
        # If you passed a context_id, your saved auth state is already loaded
        self.context = self.browser.contexts[0]
        self.page = self.context.pages[0]

        return self.page

    Finally, we want a cleanup method (I’m calling it close here) that releases all resources and closes the browser session:

    Python
    def close(self):
        """Release everything we opened, in reverse order of creation."""
        if self.page:
            self.page.close()
        if self.context:
            self.context.close()
        if self.browser:
            self.browser.close()
        if self.playwright:
            self.playwright.stop()

    So basically you create a session with specific settings, then connect to it, do some work, disconnect, and connect again later.

    The configuration parameters I exposed are the ones I found most useful in production. Stealth mode is almost always on because modern platforms are too good at detecting automation. Proxy support is optional but recommended for platforms that rate-limit by IP.
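
    Putting those pieces together, a typical run looks something like this. This is a sketch that assumes the BrowserManager class above, the environment variables we loaded earlier, and the close() cleanup method we just defined:

    Python
    # Sketch: one full session lifecycle with the BrowserManager above
    manager = BrowserManager(BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID)

    session_info = manager.create_session(enable_stealth=True, enable_proxy=True)
    print(f"Watch this run at: {session_info['replay_url']}")

    page = manager.connect_browser()
    page.goto("https://www.linkedin.com/ad-library", wait_until="networkidle")
    print(page.title())

    manager.close()  # Always release the page, context, browser, and Playwright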

    Creating and Managing Browserbase Contexts

    Before we build the scrapers, I want to show you one of Browserbase’s most powerful features: Contexts.

    A Context in Browserbase is like a reusable browser profile. It stores cookies, localStorage, session storage, and other browser state.

    You can create a context once with all your authentication, then reuse it across multiple browser sessions. This means you log into LinkedIn once, save that authenticated state to a context, and every future session can reuse those credentials without going through the login flow again.

    We don’t actually need this feature for scraping the LinkedIn Ads Library because it’s public, but if you want to scrape another ad library that requires a login, it’s very useful. Here’s a sample function that handles the one-time authentication flow for a platform and saves the resulting authenticated state to a reusable context.

    Python
    def create_authenticated_context(api_key: str, project_id: str,
                                     platform: str, credentials: Dict[str, str]) -> str:

        # Create a new context
        bb = Browserbase(api_key=api_key)
        context = bb.contexts.create(projectId=project_id)
        context_id = context.id

        # Create a session using this context
        # Any cookies or state we save will be persisted to the context
        # (this assumes BrowserManager implements __enter__/__exit__ for cleanup)
        with BrowserManager(api_key, project_id, context_id=context_id) as mgr:
            mgr.create_session(timeout=300)
            page = mgr.connect_browser()

            if platform == "linkedin":
                page.goto("https://www.linkedin.com/login", wait_until="networkidle")
                page.fill('input[name="session_key"]', credentials['email'])
                page.fill('input[name="session_password"]', credentials['password'])
                page.click('button[type="submit"]')
                page.wait_for_url("https://www.linkedin.com/feed/", timeout=30000)

            elif platform == "facebook":
                # Similar flow for Facebook's login form
                pass

        return context_id

    Authentication state is saved to the context ID which you can then reuse to avoid future logins.
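
    Reusing that context later is just a matter of passing the ID back into the browser manager. A quick sketch (the email and password values are placeholders):

    Python
    # One-time login; the authenticated state is persisted to the context
    context_id = create_authenticated_context(
        BROWSERBASE_API_KEY,
        BROWSERBASE_PROJECT_ID,
        platform="linkedin",
        credentials={"email": "you@example.com", "password": "load-this-from-an-env-var"},
    )

    # Every later session created with this context_id starts already logged in
    manager = BrowserManager(BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID, context_id=context_id)
    manager.create_session()
    page = manager.connect_browser()
    page.goto("https://www.linkedin.com/feed/")  # No login wall this time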

    Building Platform-Specific Scrapers

    Now we get to the interesting part: actually scraping data from ad platforms. I’m only going to show you the LinkedIn ad scraper because it demonstrates several important patterns and the concepts are the same across all platforms.

    It’s really just one function that takes a Browserbase page object and returns structured data. This separation means the browser management is completely isolated from the scraping logic, which makes everything more testable and maintainable.

    First we navigate to the ad library and wait until the network is idle, since it loads data dynamically. We then fill the company name into the search box, add a small delay to mimic human behavior, then press Enter.

    Python
    import time
    from typing import Any, Dict, List

    from playwright.sync_api import Page


    def scrape_linkedin_ads(page: Page, company_name: str, max_ads: int = 20) -> List[Dict[str, Any]]:
        ad_library_url = "https://www.linkedin.com/ad-library"
        page.goto(ad_library_url, wait_until="networkidle")

        search_box = page.locator('input[aria-label*="Search"]')
        search_box.fill(company_name)
        time.sleep(1)  # Human-like pause
        search_box.press("Enter")

        # Wait for results to load
        # LinkedIn's ad library is a SPA that loads content dynamically
        time.sleep(3)

        ads_data = []
        scroll_attempts = 0
        max_scroll_attempts = 10

    The LinkedIn ads library is a SPA that loads content dynamically so we wait for it to load before we start our scraping.

    We’re going to implement infinite scroll to load more ads. First we find the ad cards currently visible, using multiple selectors in case LinkedIn changes its markup.

    Python
        while len(ads_data) < max_ads and scroll_attempts < max_scroll_attempts:
            ad_cards = page.locator('[data-test-id*="ad-card"], .ad-library-card, [class*="AdCard"]').all()
                    
            for card in ad_cards:
                if len(ads_data) >= max_ads:
                    break
                    
                try:
                    ad_data = {
                        "platform": "linkedin",
                        "company": company_name,
                        "scraped_at": time.strftime("%Y-%m-%d %H:%M:%S")
                    }
                    
                    try:
                        headline = card.locator('h3, [class*="headline"], [data-test-id*="title"]').first
                        ad_data["headline"] = headline.inner_text(timeout=2000)
                    except:
                        ad_data["headline"] = None
                    
                    # Extract body text/description
                    try:
                        body = card.locator('[class*="description"], [class*="body"], p').first
                        ad_data["body"] = body.inner_text(timeout=2000)
                    except:
                        ad_data["body"] = None
                    
                    # Extract CTA button text if present
                    try:
                        cta = card.locator('button, a[class*="cta"], [class*="button"]').first
                        ad_data["cta_text"] = cta.inner_text(timeout=2000)
                    except:
                        ad_data["cta_text"] = None
                    
                    # Extract image URL if available
                    try:
                        img = card.locator('img').first
                        # Scroll image into view to trigger lazy loading
                        img.scroll_into_view_if_needed()
                        time.sleep(0.5)  # Give it time to load
                        ad_data["image_url"] = img.get_attribute('src')
                    except:
                        ad_data["image_url"] = None
                    
                    # Extract landing page URL
                    try:
                        link = card.locator('a[href*="http"]').first
                        ad_data["landing_url"] = link.get_attribute('href')
                    except:
                        ad_data["landing_url"] = None
                    
                    # Extract any visible metadata (dates, impressions, etc)
                    try:
                        metadata = card.locator('[class*="metadata"], [class*="stats"]').all_inner_texts()
                        ad_data["metadata"] = metadata
                    except:
                        ad_data["metadata"] = []
                    
                    # Only add the ad if we extracted meaningful data
                    if ad_data.get("headline") or ad_data.get("body"):
                        ads_data.append(ad_data)
                      
                except Exception as e:
                    print(f"Error extracting ad card: {e}")
                    continue
            
            # Scroll to load more ads
            page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            time.sleep(2)  # Wait for new content to load
            
            scroll_attempts += 1
        
        return ads_data

    I’m limiting scroll attempts to prevent infinite loops on platforms that don’t load additional content.

    I’m also adding small delays that mimic human behavior. The time.sleep() calls between actions aren’t strictly necessary for functionality, but they make the automation look more natural to bot detection systems. Real humans don’t type instantly or scroll at superhuman speeds.

    You can reuse these same patterns to scrape other ad libraries, landing pages, and so on.
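
    The scraper also captures an image_url for each ad, which is what feeds the visual change detection mentioned in the architecture. Here’s a minimal sketch of that piece using the pillow and imagehash packages we installed earlier (the distance threshold is illustrative; tune it on your own data):

    Python
    import io

    import imagehash
    import requests
    from PIL import Image

    def creative_hash(image_url: str) -> str:
        """Download an ad creative and return its perceptual hash as a hex string."""
        response = requests.get(image_url, timeout=30)
        image = Image.open(io.BytesIO(response.content))
        return str(imagehash.phash(image))

    def creative_changed(old_hash: str, new_hash: str, threshold: int = 5) -> bool:
        """A Hamming distance above the threshold means the creative was redesigned."""
        distance = imagehash.hex_to_hash(old_hash) - imagehash.hex_to_hash(new_hash)
        return distance > threshold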

    Building the Change Tracking Database

    Now we need persistence to track what we’ve seen before and identify what’s new. We’ll create a SQLite database with two main tables: one for ad snapshots, and one for tracking detected changes. Each table has the fields we need for analysis, plus a snapshot date so we can track things over time.

    I’m not going to share the code here because it’s just a bunch of SQL commands to set up the tables, like this:

    SQL
    CREATE TABLE IF NOT EXISTS ads (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        competitor_id TEXT NOT NULL,
        platform TEXT NOT NULL,
        ad_identifier TEXT,
        headline TEXT,
        body TEXT,
        cta_text TEXT,
        image_url TEXT,
        landing_url TEXT,
        metadata TEXT,
        snapshot_date DATETIME NOT NULL,
        UNIQUE(competitor_id, platform, ad_identifier, snapshot_date)
    )

    For every ad we scrape, we simply store it in the table. We also give each ad a unique identifier. Normally I would suggest hashing the data so that any change in a word or pixel gives us a new identifier, but a basic implementation can be something like this:

    Python
    ad_identifier = f"{ad.get('headline', '')}:{ad.get('body', '')}"[:200]

    So if the headline or body changes, it is a new ad. We can then do something like:

    Python
    new_ads = []

    for ad in current_ads:
        ad_identifier = f"{ad.get('headline', '')}:{ad.get('body', '')}"[:200]
        cursor.execute("""
            SELECT COUNT(*) FROM ads
            WHERE competitor_id = ? AND platform = ? AND ad_identifier = ?
        """, (competitor_id, platform, ad_identifier))

        count = cursor.fetchone()[0]

        if count == 0:
            new_ads.append(ad)

    # Log once, after every ad has been checked
    if new_ads:
        self._log_change(
            competitor_id=competitor_id,
            change_type="new_ads",
            platform=platform,
            change_description=f"Detected {len(new_ads)} new ads on {platform}",
            severity="high" if len(new_ads) > 5 else "medium",
            data={"ad_count": len(new_ads), "headlines": [ad.get('headline') for ad in new_ads[:5]]}
        )

    return new_ads

    The _log_change function stores each change in our changes table, which we then use to generate a report.
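
    If you want the stricter hashing approach I mentioned above, where any change to the ad text produces a new identifier, a minimal sketch looks like this:

    Python
    import hashlib

    def ad_fingerprint(ad: dict) -> str:
        """Hash the fields we care about so any edit produces a new identifier."""
        raw = "|".join([
            ad.get("headline") or "",
            ad.get("body") or "",
            ad.get("cta_text") or "",
        ])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()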

    Generating AI-Powered Intelligence Reports

    Now we take all this raw data and turn it into actionable insights using AI. Most of this is just prompt engineering. We pass in all the data we collected and the changes we’ve detected, and ask GPT-5 to analyze it and generate a report:

    Python
    prompt = f"""Generate an executive summary of competitive intelligence findings.
    
    High Priority Changes ({len(high_severity)}):
    {json.dumps([{k: v for k, v in c.items() if k in ['competitor_id', 'change_type', 'change_description']} for c in high_severity[:10]], indent=2)}
    
    Medium Priority Changes ({len(medium_severity)}):
    {json.dumps([{k: v for k, v in c.items() if k in ['competitor_id', 'change_type', 'change_description']} for c in medium_severity[:10]], indent=2)}
    
    Please provide:
    
    1. **TL;DR**: A two to three sentence summary of the most important findings
    2. **Key Threats**: Competitive moves we should be concerned about and why
    3. **Opportunities**: Gaps or weaknesses we could exploit to gain advantage
    4. **Recommended Actions**: Top three strategic priorities based on this intelligence
    
    Keep it concise and focused on actionable insights. Format in markdown."""
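
    For reference, here’s a minimal sketch of sending that prompt through the OpenAI Python SDK we installed earlier. The model name is an assumption; swap in whichever model you’re actually running:

    Python
    from openai import OpenAI

    client = OpenAI(api_key=OPENAI_API_KEY)

    completion = client.chat.completions.create(
        model="gpt-4o",  # Assumption: replace with the model you actually use
        messages=[
            {"role": "system", "content": "You are a competitive intelligence analyst."},
            {"role": "user", "content": prompt},
        ],
    )

    report_markdown = completion.choices[0].message.content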

    Running Our System

    And that’s our competitive analysis system! You can write a main.py file that coordinates all the components we’ve built into a cohesive workflow.

    I’ve only shown you how to scrape the LinkedIn ads library but you can use similar code to do it for other platforms.

    If anything goes wrong, the Session Replays are your friends. This is where you can watch the system navigate each page and see what the DOM looked like at every step.

    So, for example, if you’re trying to click on an element and there’s an error, you can check the session replay and see that the element didn’t load. Then you try to add a delay to let it load, and run it again.

    Browserbase also has a playground where you can run browser sessions and iterate rapidly until you figure out what works.

    Next Steps

    As I mentioned, this is part of a larger project for my client. There are so many directions you could take this.

    You could add more platforms like Twitter ads or the Google Display Network; each platform is just another scraper function using the same browser management infrastructure. You could implement trend analysis that tracks how competitor strategies evolve over months. You could create a dashboard for visualizing the intelligence using something like Streamlit.

    More importantly, these same patterns work for any AI agent that needs to interact with the web. With Browserbase, you can build:

    • Research assistants that gather information from multiple sources and synthesize it into reports.
    • Data collection agents that extract structured data from websites at scale for analysis.
    • Workflow automation that bridges systems without APIs by mimicking human browser interactions.

    If you need help, reach out to me!

  • Factory.ai: A Guide To Building A Software Development Droid Army

    Factory.ai: A Guide To Building A Software Development Droid Army

    Last week, Factory gave us a masterclass in how to launch a product in a crowded space. While every major AI company and their aunt already has a CLI coding agent, all I kept hearing about was Factory and their Droid agents.

    So, is it just another CLI coding agent, or is there some sauce behind the hype? In this article, I’m going to do a deep dive into how to set up Factory, build (or fix) apps with it, and all the features that make it stand out in this crowded space.

    Quick note – I’ve previously written about Claude Code and Amp, which have been my two coding agents of choice, so I’ll naturally make comparisons to them or reference some of their features in this as contrast. I’ve also written about patterns to use when coding with AI, which is model/agent/provider agnostic, so I won’t be covering them again in this post.

    Let’s dive in.

    Are These The Droids You’re Looking For?

    Fun fact: Factory incorporated as The San Francisco Droid Company but were forced to change their name because Lucasfilm took offence. But yes, it’s a Star Wars reference and they kept the droids, so you’ll be seeing more Star Wars references throughout this post. Don’t say I didn’t warn you.

    The Droids seem to be one of the main differentiators. The core philosophy here is that software development is more than just coding and code gen. There are a bunch of tasks that many software engineers don’t particularly enjoy doing. In Factory, you just hand it off to a Droid that specializes in that task.

    They’re really just specialized agents. You can set your own up in Claude Code and Amp, but in Factory they come pre-built with optimized system prompts, specialized tools, and an appropriate model.

    Code Droid: Your main engineering Droid. Handles feature development, refactoring, bug fixes, and implementation work. This is the Droid you’ll interact with most for actual coding tasks.

    Knowledge Droid: Research and documentation specialist. Searches your codebase, docs, and the entire internet to answer complex questions. Writes specs, generates documentation, and helps you understand legacy systems.

    Reliability Droid: Your on-call specialist. Triages production alerts, performs root cause analysis, troubleshoots incidents, and documents the resolution. Saves your sleep schedule.

    Product Droid: Ticket and PM work automation. Manages your backlog, prioritizes tickets, handles assignment, and transforms rambling Slack threads into coherent product specs.

    Tutorial Droid: Helps you learn Factory itself. Think of it as your onboarding assistant.

    Installing the CLI: Getting Your Droid Army Ready

    Factory has a web interface and an IDE extension, but I’m going to focus on the CLI as it’s what most developers use these days. It’s pretty easy to install:

    Bash
    # Install droid
    curl -fsSL https://app.factory.ai/cli | sh
    
    # Navigate to your project
    cd your-project
    
    # Start your development session
    droid

    On first launch, you’ll see Droid’s welcome screen in a full-screen terminal interface. If prompted, sign in via your browser to authenticate. You start off with a bunch of free tokens, so you can use it right away.

    If you’ve used Claude Code, Amp, or any other coding CLI, you’ll find the interface familiar. In fact, it has the same “multiple modes” feature as Claude Code where you can cycle through default, automatic, and planning using shift-tab.

    If you’re in a project with existing code, start by asking droid to explain it to you. It will read your codebase and respond with insights about your project structure, test frameworks, conventions, and how everything connects.

    Specification Mode: Planning Before Building

    Now switch to Spec mode by hitting Shift-Tab and explain what you want it to do.

    Bash
    > Add a feature for users to export their personal data as JSON.
    > Include proper error handling and rate limiting to prevent abuse.
    > Follow our existing patterns for API endpoints.

    Droid generates a complete specification that includes:

    • Acceptance Criteria: What “done” looks like
    • Implementation Plan: Step-by-step approach
    • Technical Details: Libraries, patterns, security considerations
    • File Changes: Which files will be created/modified
    • Testing Strategy: What tests need to be written

    Build Mode

    You review the spec. If something’s wrong or missing, you can hit Escape and correct it. Once you’re satisfied, you have multiple options. You can accept the spec and let it run in default mode, where it asks for permission for every change. Or you can proceed with one of three levels of autonomy:

    • Proceed, manual approval (Low): Allow file edits but approve every other change
    • Proceed, allow safe commands (Medium): Droid handles reversible changes automatically, asks for risky ones
    • Proceed, allow all commands (High): Full autonomy, Droid handles everything

    Start with low autonomy and as you build trust with the tool, work your way up. Follow my patterns to ensure that if anything goes wrong, it can always be saved.

    Spec Files Are Saved

    One really interesting feature is that Droid saves approved specs as markdown files in .factory/docs/. You can toggle this on or off and specify the save directory in the settings (using the /settings command). This means:

    • You have documentation of decisions
    • New team members can understand why things were built certain ways
    • Future Droid sessions can reference these decisions

    When using Claude Code I often ask it to save the plan as a markdown, so I love that this is an automatic feature in Factory.

    Roger, Roger: Context For Your Droids

    Another differentiating feature of Factory is the way it manages context. I’ve written about this before in how to build your own coding agent, but giving your agent the right context is what makes or breaks its performance.

    Think about it, all these agents use the same underlying models, right? So why does one perform better? It’s the way they handle context. And Factory has multiple layers to it.

    Layer 1: The AGENTS.md File

    The primary context file is AGENTS.md, a standard file that tells AI agents how to work with your project. If you’re coming from Claude Code, it’s basically the same as the CLAUDE.md file. It gets ingested at the start of every conversation.

    Your codebase has conventions that aren’t in the code itself, like how to run tests, code style preferences, security requirements, PR guidelines, and build/deployment processes. AGENTS.md documents these for Droids (and other AI coding tools). It’s something you should set up for every project at the start.

    If you have a CLAUDE.md file already, just duplicate it and rename it to AGENTS.md. Or you can ask Droid to write one for you. It should look something like this:

    Markdown
    # MyProject
    
    Brief overview of what this project does.
    
    ## Build & Commands
    
    - Install dependencies: `pnpm install`
    - Start dev server: `pnpm dev`
    - Run tests: `pnpm test --run`
    - Run single test: `pnpm test --run <path>.test.ts`
    - Type-check: `pnpm check`
    - Auto-fix style: `pnpm check:fix`
    - Build for production: `pnpm build`
    
    ## Project Layout
    
    ├─ client/      → React + Vite frontend
    ├─ server/      → Express backend
    ├─ shared/      → Shared utilities
    └─ tests/       → Integration tests
    
    - Frontend code ONLY in `client/`
    - Backend code ONLY in `server/`
    - Shared code in `shared/`
    
    ## Development Patterns
    
    **Code Style**:
    - TypeScript strict mode
    - Single quotes, trailing commas, no semicolons
    - 100-character line limit
    - Use functional patterns where possible
    - Avoid `@ts-ignore` - fix the type issue instead
    
    **Testing**:
    - Write tests FIRST for bug fixes
    - Visual diff loop for UI changes
    - Integration tests for API endpoints
    - Unit tests for business logic
    
    **Never**:
    - Never force-push `main` branch
    - Never commit API keys or secrets
    - Never introduce new dependencies without team discussion
    - Never skip running `pnpm check` before committing
    
    ## Git Workflow
    
    1. Branch from `main` with descriptive name: `feature/<slug>` or `bugfix/<slug>`
    2. Run `pnpm check` locally before committing
    3. Force-push allowed ONLY on feature branches using `git push --force-with-lease`
    4. PR title format: `[Component] Description`
    5. PR must include:
       - Description of changes
       - Testing performed
       - Screenshots for UI changes
    
    ## Security
    
    - All API endpoints must validate input
    - Use parameterized queries for database operations
    - Never log sensitive data
    - API keys and secrets in environment variables only
    - Rate limiting on all public endpoints
    
    ## Performance
    
    - Images must be optimized before committing
    - Frontend bundles should stay under 500KB
    - API endpoints should respond in under 200ms
    - Use lazy loading for routes
    
    ## Common Commands
    
    **Reset database**:
    ```bash
    pnpm db:reset
    ```

    You can also set up multiple AGENTS.md files to manage context better:

    /AGENTS.md ← Repository-level conventions
    /packages/api/AGENTS.md ← API-specific conventions
    /packages/web/AGENTS.md ← Frontend-specific conventions

    Layer 2: Dynamic Code Context

    When you submit a query, Droid’s first move is usually to search for the most relevant files without you manually specifying them. You can of course @-mention files, but it’s best to let it figure things out on its own and step in only when needed.

    Since it already has an understanding of your repository from the AGENTS.md file, it knows where to go looking. It picks out the right sections of code, makes sure it isn’t duplicating context, and lazy-loads context (only pulling it in when necessary).

    Factory also captures build outputs, test results, and so on as you execute commands to add to the context.

    Layer 3: Tool Integrations

    One big friction point in development is dealing with context scattered across code, docs, tickets, etc.

    When you go through the sign up process in the Factory web app, the first thing it will prompt you to do is integrate your development tools, so the Droids have the context they need.

    The most essential integration is your source code repository. You can connect Factory to your GitHub or GitLab account (cloud or self-hosted) so it can access your codebase. This is required because the Droids need to read and write code on your projects.

    But the real differentiator is the integrations to other tools where context lives:

    Observability & Logs (Sentry, Datadog):

    • Error traces from production
    • Performance metrics
    • Incident history
    • Stack traces

    Documentation (Notion, Google Docs):

    • Architecture decision records (ADRs)
    • Design documents
    • Onboarding guides
    • API specifications

    Project Management (Jira, Linear):

    • Ticket descriptions and requirements
    • Acceptance criteria
    • Related issues and dependencies
    • Discussion threads

    Communication (Slack):

    • Technical discussions
    • Decisions made in channels
    • Problem-solving threads
    • Team conventions established in chat

    Version Control (GitHub, GitLab):

    • Branch strategies
    • Commit history and messages
    • Pull request discussions
    • Code review feedback

    If you connect these tools, your Droid can understand your entire project. It can see your code, read design docs, check Jira tickets, review logs from Sentry, and more, all to give you better help.

    Layer 4: Organizational Memory

    Factory maintains two types of persistent memory that survive across sessions:

    User Memory (Private to you):

    • Your development environment setup (OS, containers, tools)
    • Your work history (repos you’ve edited, features you’ve built)
    • Your preferences (diff view style, explanation depth, testing approach)
    • Your common patterns (how you structure code, naming conventions you prefer)

    Organization Memory (Shared across team):

    • Company-wide style guides and conventions
    • Security requirements and compliance rules
    • Architecture patterns and anti-patterns
    • Onboarding procedures

    How Memory Works:

    As you interact with Droids, Factory quietly records stable facts. If you say “Remember that our staging environment is at staging.company.com”, Factory saves this. Next session, Droid already knows.

    If your teammate says “Always use snake_case for API endpoints”, that goes into Org Memory. Now every developer’s Droid follows this convention automatically.

    Context In Action

    Let’s say you’re implementing a new feature and need to follow the architecture defined in a design doc.

    Bash
    > Implement the notification system described in this Notion doc:
    > https://notion.so/team/notification-system-architecture

    Behind the Scenes:

    1. Droid fetches Notion document content
    2. Parses architecture decisions and requirements
    3. Search finds existing notification patterns
    4. Org Memory recalls team’s event-driven architecture conventions
    5. AGENTS.md shows where notification code should live

    Droid implements according to:

    • Architecture specified in the doc
    • Existing patterns in your codebase
    • Team conventions from Org Memory
    • Your project structure

    Customizing Factory

    Factory.ai becomes even more powerful when you hook it into the broader ecosystem of tools and services your project uses. We’ve already discussed integrations like source control, project trackers, and knowledge bases for providing context.

    Here we’ll focus on tips for integrating external APIs or data sources into your Factory workflows, and using custom AI models or agents.

    Connecting APIs & External Data

    Suppose your project needs data from a third-party API (e.g., a weather service or your company’s internal API). While building your project, you can certainly have the AI write code to call those APIs (it’s quite good at using SDKs or HTTP requests if you provide the API docs).

    Another approach is using the web access tool if enabled: Factory’s Droids can have a web browsing tool to fetch content from URLs. You could give the AI a link to API documentation or an external knowledge source and it can then fetch and read it to inform its actions (with your permission).

    Always ensure you’re not exposing sensitive credentials in the chat. Use environment variables for any secrets.

    Using Slack and Chats

    Factory integrates with communication platforms like Slack, which means you can interact with your Droids through chat channels.

    For instance, you can mention it with questions or commands. Type “@factory summarize the changes in release 1.2” and the AI will respond in thread with answers or code suggestions.

    Ask it to fix an error with “@factory help debug this error: <paste error log>” and it will go off and do it on its own.

    Customizing and Extending Agents

    You can also create Custom Droids (essentially custom sub-agents), much like you do in Claude Code. For example, you could create a “Security Auditor” droid that has a system prompt instructing it to only focus on security concerns, with tools set to read-only mode.

    You define these in .factory/droids/ as markdown files with some YAML frontmatter (name, which model to use, which tools it’s allowed to use, etc.) and instructions. Once enabled, your main Droid (the primary assistant) can delegate tasks to these sub-droids.
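
    Here’s a hypothetical example of what such a file might look like. The frontmatter keys below (name, model, tools) come from the description above, but the exact schema is Factory-specific, so check their docs before copying this:

    Markdown
    ---
    # Illustrative frontmatter: confirm the exact field names in Factory's docs
    name: security-auditor
    model: your-preferred-model
    tools: read-only
    ---

    You are a security auditor. Review changes only for security issues:
    injection risks, secrets committed to the repo, missing input validation,
    and unsafe dependencies. Do not suggest refactors unrelated to security.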

    Custom Slash Commands

    In a similar vein, you can create your own slash commands to automate routine actions or prompts. For example, you might have a /run-test command that triggers a shell script to run your test suite and returns results to the chat. The AI could then monitor those logs and alert if something looks wrong.

    Factory allows you to define these commands either as static markdown templates (the content gets injected into the conversation) or as executable scripts that actually run in your environment.

    Bring Your Own Model Key

    While Factory comes with all the latest coding models (which you can select using /model), you can also use your own key. The benefit is you still get Factory’s orchestration, memory, and interface, but with the model of your choice. You would pay your own API costs but get to use Factory for free.

    Droid Exec

    Droid Exec is Factory’s headless CLI mode: instead of an interactive chat, you run a single, non-interactive command that does the work and exits. It’s built for automation like CI pipelines, cron jobs, pre-commit hooks, and one-off batch scripts.

    So you can say something like:

    Bash
    droid exec --auto high "run tests, commit all changes, and push to main"

    And just walk away. Your droid will follow your commands and complete the task on its own.

    There’s Three Of Us and One Of Him

    As I mentioned earlier, Factory also has a web app and an IDE integration.

    The web application provides an interactive chat-based environment for your AI development assistant. On your first login, you’ll typically see a default session with Code Droid selected (the agent specialized in coding) and an empty workspace ready to connect to your code.

    You can connect directly to a remote repository on GitHub or to your local repository via the Factory Bridge app. And once you do that, you can run Factory as a cloud agent!

    The UI here is pretty much a chat interface, so you’d use it just like the terminal. You still have @ commands to select certain files or even a Google doc or Linear ticket.

    You can also upload files directly into the chat if you want the AI to consider some code, data, and even screenshots not already in the repository.

    Sessions and Collaboration

    Each chat corresponds to a session, which can be project-specific. Factory is designed for teams, so sessions can potentially be shared or revisited by your team members (for example, an ongoing “incident response” session in Slack, or a brainstorming session for a design doc).

    In the web app, you can create multiple sessions (e.g., one per feature or task) and switch between them. You can also see any sessions you started from the CLI. Useful if you want to catch up on a previous session or share with a teammate.

    Guess I’m The Commander Now

    Factory has actually been around for a couple of years, but they’ve been focused mostly on enterprise deployments. This is obvious from its team features and integrations.

    With the recent launch, it looks like they’re trying to enter the broader market, and their message seems to be that they’re a platform to deploy agents not just for code generation, but across the software development lifecycle and the tools your company uses to build and manage products.

    So if you’re a solo developer, you probably won’t notice much of a difference switching from Claude Code or Codex, aside from how the agent works in your terminal or IDE.

    But if you’re part of a larger engineering team with an existing codebase, Factory is a much different experience, especially if you plug in all your tools and set up automations where your droids can run in the background and get tasks done.

    And at that point, you can focus on the big picture while the droid army executes your vision.

    Kinda like a commander.

  • Automating Competitor Research with Firecrawl: A Comprehensive Tutorial

    Automating Competitor Research with Firecrawl: A Comprehensive Tutorial

    I recently worked with a company to help their marketing team set up a custom competitive intelligence system. They’re in a hyper-competitive space and with new AI products sprouting up in their industry every day, the list of companies they keep tabs on is multiplying.

    While the overall project is part of a larger build to eventually generate sales enablement content, BI dashboards, and competitive landing pages, I figured I’d share how I built the core piece here.

    In this deep-dive tutorial, I’ll show you how to build an automated competitor monitoring system using Firecrawl that not only tracks changes but provides actionable intelligence, with just basic Python code.

    Why Firecrawl?

    You can absolutely build your own web scraping tool. There are some packages like Beautiful Soup that make it easier. But it’s just annoying. You have to parse complex HTML and handle JS rendering. Your selectors break. You fight anti-bot measures.

    And that doesn’t even count the cleaning and structuring of extracted data. Basically, you spend more time maintaining your scraping infrastructure than actually analyzing competitive data.

    Firecrawl flips this equation. Instead of battling technical complexity, you describe what you want in plain English. Firecrawl’s AI understands context, handles the technical heavy lifting, and returns clean, structured data.

    Out of the box, it provides:

    • Automatic JavaScript rendering: No need for Selenium or Puppeteer
    • AI-powered extraction: Describe what you want in natural language
    • Clean markdown output: No HTML parsing needed
    • Built-in rate limiting: Respectful scraping by default
    • Structured data extraction: Get JSON data with defined schemas

    Think of Firecrawl as having a smart assistant who visits websites for you, understands what’s important, and returns exactly the data you need.

    The Solution Architecture

    The system has four core components working together.

    • The Data Extractor acts like a research librarian, systematically gathering information from target sources and organizing it consistently.
    • The Change Detector functions like an analyst, comparing new information against historical data to identify what’s different and why it matters.
    • The Report Generator serves as a communications specialist, transforming technical changes into business insights that inform decision-making.
    • The Storage Layer works like an institutional memory, maintaining historical context that enables trend analysis and pattern recognition.

    We’re just going to build this as a one-directional, pre-defined process, but if you wanted to make this agentic, each of these components would become a sub-agent.

    For this tutorial, we’ll monitor Firecrawl’s own website as our “competitor.” This gives us a real, working example that you can run immediately while learning the concepts. The techniques transfer directly to monitoring actual competitors.

    Prerequisites and Setup

    Before we start coding, let’s ensure you have everything needed:

    Bash
    # Check Python version (need 3.9+)
    python --version
    
    # Create project directory
    mkdir competitor-research
    cd competitor-research
    
    # Create virtual environment (recommended)
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
    # Install dependencies
    pip install firecrawl-py python-dotenv deepdiff

    Understanding Our Dependencies

    Each dependency serves a specific purpose in our intelligence pipeline.

    • firecrawl-py provides the official Python SDK for Firecrawl’s API, abstracting away the complexity of web scraping and data extraction.
    • python-dotenv manages environment variables securely, ensuring API keys never end up in your codebase.
    • deepdiff offers intelligent comparison of complex data structures, understanding that changing the order of items in a list might not be meaningful while changing their content definitely is.

    Create a .env file for your API key:

    Bash
    FIRECRAWL_API_KEY=fc-your-api-key-here

    Get your free API key at firecrawl.dev. The free tier provides 500 pages per month, which is plenty for experimentation and learning the system.

    Step 1: Configuration Design

    Let’s start by defining what we want to monitor. This configuration is the brain of our system. It tells our extractor what to look for and how to interpret it. Think of this as programming your research assistant’s knowledge about what matters in competitive intelligence.

    We’re hard-coding in Firecrawl’s pages for the purposes of this demo, but you can of course extend this to dynamically take in other competitor URLs.

    Create config.py:

    Python
    MONITORING_TARGETS = {
        "pricing": {
            "url": "https://firecrawl.dev/pricing",
            "description": "Pricing plans and tiers",
            "extract_schema": {
                "type": "object",
                "properties": {
                    "plans": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "price": {"type": "string"},
                                "pages_per_month": {"type": "string"},
                                "features": {"type": "array", "items": {"type": "string"}}
                            }
                        }
                    }
                }
            }
        },
        "blog": {
            "url": "https://firecrawl.dev/blog",
            "description": "Latest blog posts",
            "extract_prompt": "Extract the titles, dates, and summaries of the latest blog posts"
        }
    }

    Design Decision: Schema vs Prompt Extraction

    Notice we’re using two different extraction methods. Each approach serves different competitive intelligence needs and understanding when to use which method is crucial for effective monitoring.

    Schema-based extraction (for the pricing page) works like filling out a standardized form. You define exactly what fields you expect and what types of data they should contain. This approach provides consistent structure across extractions, guarantees specific fields will be present or explicitly null, enables reliable numerical comparisons for metrics like prices, and works best when you know exactly what data structure to expect.

    Prompt-based extraction (for the blog) operates more like asking a smart assistant to summarize what they observe. You describe what you’re looking for in natural language, and the AI adapts to whatever it finds. This approach offers flexibility for varied content, adapts to different page layouts without breaking, handles content that might have varying formats, and uses natural language understanding to capture nuanced information.

    The choice between these methods depends on your competitive intelligence goals. Use schema extraction when you need to track specific metrics over time, compare numerical data across competitors, or ensure consistency for automated analysis. Use prompt extraction when monitoring diverse content types, tracking qualitative changes, or exploring new areas where you’re not sure what data might be valuable.
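
    In practice, adding a real competitor is just another entry in the same dict. For example, a prompt-based target (the URL and prompt below are placeholders):

    Python
    MONITORING_TARGETS["competitor_pricing"] = {
        "url": "https://competitor.example.com/pricing",
        "description": "Competitor pricing page",
        "extract_prompt": "Extract each plan's name, monthly price, and key features"
    }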

    Step 2: Building the Data Extraction Engine

    Now let’s build the component that actually fetches our competitive intelligence data. First, we define how we want to store our data:

    Python
    def _setup_database(self):
            """Create database and tables if they don't exist."""
            os.makedirs(os.path.dirname(DATABASE_PATH), exist_ok=True)
    
            conn = sqlite3.connect(DATABASE_PATH)
            cursor = conn.cursor()
    
            cursor.execute('''
                CREATE TABLE IF NOT EXISTS snapshots (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    target_name TEXT NOT NULL,
                    url TEXT NOT NULL,
                    data TEXT NOT NULL,
                    markdown TEXT,
                    extracted_at TIMESTAMP NOT NULL,
                    UNIQUE(target_name, extracted_at)
                )
            ''')
    
            conn.commit()
            conn.close()
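
    The extractor below also calls a _save_snapshot helper that we haven’t shown. Here’s a minimal sketch of it, matching the schema above:

    Python
    import json
    import sqlite3

    def _save_snapshot(self, target_name, url, data, markdown, timestamp):
        """Persist one extraction as a row in the snapshots table."""
        conn = sqlite3.connect(DATABASE_PATH)
        cursor = conn.cursor()

        cursor.execute('''
            INSERT OR IGNORE INTO snapshots (target_name, url, data, markdown, extracted_at)
            VALUES (?, ?, ?, ?, ?)
        ''', (target_name, url, json.dumps(data), markdown, timestamp.isoformat()))

        conn.commit()
        conn.close()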

    Database Design Philosophy

    The database design prioritizes simplicity for the purposes of this tutorial. SQLite requires zero configuration, creates a portable single-file database, provides sufficient capability for learning and prototyping, and comes built into Python without additional dependencies.

    Our schema intentionally focuses on snapshots rather than normalized relational data. We store both structured data as JSON and raw markdown for maximum flexibility. Timestamps enable historical analysis and trend identification. The unique constraint prevents accidental duplicate snapshots during development.

    This design works well for understanding competitive monitoring concepts and prototyping systems with moderate data volumes. However, it has limitations we’ll address in our production considerations section.

    The Extraction Logic

    Let’s now define the logic to extract data from the targets we set up in our config earlier.

    Python
    def extract_all_targets(self) -> Dict[str, Any]:
            """Extract data from all configured targets."""
            results = {}
            timestamp = datetime.now()
    
            for target_name, target_config in MONITORING_TARGETS.items():
                print(f"Extracting {target_name}...")
    
                try:
                    # Extract data based on configuration (with change tracking enabled)
                    if "extract_schema" in target_config:
                        # Use schema-based extraction
                        response = self.firecrawl.scrape(
                            target_config["url"],
                            formats=[
                                "markdown",
                                {
                                    "type": "json",
                                    "schema": target_config["extract_schema"]
                                }
                            ]
                        )
                        extracted_data = response.get("json", {})
                    elif "extract_prompt" in target_config:
                        # Use prompt-based extraction
                        response = self.firecrawl.scrape(
                            target_config["url"],
                            formats=[
                                "markdown",
                                {
                                    "type": "json",
                                    "prompt": target_config["extract_prompt"]
                                }
                            ]
                        )
                        extracted_data = response.get("json", {})
                    else:
                        # Just get markdown
                        response = self.firecrawl.scrape(
                            target_config["url"],
                            formats=["markdown"]
                        )
                        extracted_data = {}
    
                    markdown_content = response.get("markdown", "")
    
                    # Store in results
                    results[target_name] = {
                        "url": target_config["url"],
                        "data": extracted_data,
                        "markdown": markdown_content,
                        "extracted_at": timestamp.isoformat()
                    }
    
                    # Save to database
                    self._save_snapshot(
                        target_name,
                        target_config["url"],
                        extracted_data,
                        markdown_content,
                        timestamp
                    )
    
                    print(f"✓ Extracted {target_name}")
    
                except Exception as e:
                    print(f"✗ Error extracting {target_name}: {str(e)}")
                    results[target_name] = {
                        "url": target_config["url"],
                        "error": str(e),
                        "extracted_at": timestamp.isoformat()
                    }
    
            return results

    Key Design Patterns for Reliable Extraction

    The extraction logic implements several patterns that make the system robust for real-world use.

    • Graceful degradation ensures that if one target fails to extract, monitoring continues for other targets. This prevents a single problematic website from breaking your entire competitive intelligence pipeline.
    • Multiple format extraction captures both structured data and clean markdown text. The structured data enables automated analysis and comparison, while the markdown provides human-readable context and serves as a backup when structured extraction encounters unexpected page layouts.
    • Consistent timestamps ensure all targets in a single monitoring run share the same timestamp, creating coherent snapshots for historical analysis. This prevents timing discrepancies that could confuse change detection.
    • Error context preservation stores error information for debugging without crashing the system. This helps you understand why specific extractions fail and improve your monitoring configuration over time.

    Understanding Firecrawl’s Response

    When Firecrawl processes a page, it returns:

    Python
    {
        "markdown": "# Clean markdown of the page...",
        "extract": {
            # Your structured data based on schema/prompt
        },
        "metadata": {
            "title": "Page title",
            "statusCode": 200,
            # ... other metadata
        }
    }

    The markdown output represents the page content cleaned of navigation elements, advertisements, and other visual clutter. This is what makes Firecrawl superior to basic HTML scraping: you get the actual content without the noise. The json field contains your structured data, formatted according to your schema or prompt. The metadata provides technical details about the extraction process.

    Step 3: Intelligent Change Detection

    Change detection is where our system provides real value. The goal is to understand which differences matter for competitive decision making.

    Python
    from deepdiff import DeepDiff
    
    class ChangeDetector:
        def detect_changes(self, current, previous):
            """
            Compare current snapshot with previous snapshot.
    
            This is where the magic happens - DeepDiff intelligently
            compares nested structures and gives us actionable insights.
            """
            if not previous:
                # First run - establish baseline
                return {
                    "is_first_run": True,
                    "message": "First extraction - no previous data to compare",
                    "current_data": current
                }
    
            changes = {
                "is_first_run": False,
                "changes_detected": False,
                "summary": [],
                "details": {}
            }
    
            # Compare structured data if available
            if current.get("data") and previous.get("data"):
                data_diff = DeepDiff(
                    previous["data"],
                    current["data"],
                    ignore_order=True,  # Order changes aren't usually significant
                    verbose_level=2,    # Get detailed change information
                    exclude_paths=["root['timestamp']"]  # Ignore expected changes
                )
    
                if data_diff:
                    changes["changes_detected"] = True
                    changes["details"]["data_changes"] = self._parse_deepdiff(data_diff)
    
            # Also check for significant content changes
            if current.get("markdown") and previous.get("markdown"):
                current_len = len(current["markdown"])
                previous_len = len(previous["markdown"])
    
                # Threshold of 100 chars filters out minor changes
                if abs(current_len - previous_len) > 100:
                    changes["changes_detected"] = True
                    changes["details"]["content_change"] = {
                        "previous_length": previous_len,
                        "current_length": current_len,
                        "difference": current_len - previous_len
                    }
    
            return changes

    Why DeepDiff?

    Firecrawl does have a built-in change detection feature but it’s still in beta and I didn’t want to take the risk of trying something new with my client. I might update this in the future after I’ve tried it out but for now DeepDiff is a good, free alternative.

    It understands the semantic meaning of differences rather than just identifying that something changed. So instead of flagging every tiny modification, creating noise that obscures important signals, it:

    • Handles Nested Structures: Pricing plans often have nested features, tiers, etc.
    • Ignores Irrelevant Changes: Array order changes don’t trigger false positives
    • Provides Change Context: Tells us not just what changed, but where in the structure
    • Makes Type-Aware Comparisons: Knows that the string “100” and the integer 100 might represent the same value in different contexts
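
    A tiny illustration with made-up data shows why this matters: the reordering below is ignored, and only the genuine price change is reported.

    Python
    from deepdiff import DeepDiff

    old = {"plans": [{"name": "Free", "price": "$0"}, {"name": "Pro", "price": "$49"}]}
    new = {"plans": [{"name": "Pro", "price": "$59"}, {"name": "Free", "price": "$0"}]}

    diff = DeepDiff(old, new, ignore_order=True)
    # Only the Pro plan's price change shows up under 'values_changed';
    # the swapped list order doesn't trigger anything on its own.
    print(diff)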

    Parsing DeepDiff Output

    DeepDiff returns changes in categories that we need to interpret and parse:

    • values_changed: Modified values (price changes, text updates)
    • iterable_item_added: New items in lists (new features, plans)
    • iterable_item_removed: Removed items (discontinued features)
    • dictionary_item_added: New fields (new data points)
    • dictionary_item_removed: Removed fields (deprecated info)

    Python
    def _parse_deepdiff(self, diff):
        parsed = {}
    
        # Value modifications - most common and important
        if "values_changed" in diff:
            parsed["modified"] = []
            for path, change in diff["values_changed"].items():
                parsed["modified"].append({
                    "path": self._clean_path(path),
                    "old_value": change["old_value"],
                    "new_value": change["new_value"]
                })
    
        # New items - often indicates new features or products
        if "iterable_item_added" in diff:
            parsed["added"] = []
            for path, value in diff["iterable_item_added"].items():
                parsed["added"].append({
                    "path": self._clean_path(path),
                    "value": value
                })
    
        # Removed items - could indicate discontinued offerings
        if "iterable_item_removed" in diff:
            parsed["removed"] = []
            for path, value in diff["iterable_item_removed"].items():
                parsed["removed"].append({
                    "path": self._clean_path(path),
                    "value": value
                })
    
        return parsed
    
    def _clean_path(self, path):
        """
        Convert DeepDiff's technical paths to readable descriptions.
    
        Example: "root['plans'][2]['price']" becomes "plans.2.price"
        """
        path = path.replace("root", "")
        path = path.replace("[", ".").replace("]", "")
        path = path.replace("'", "")
        return path.strip(".")

    The Importance of Thresholds

    Notice the 100-character threshold for content changes. This is intentional because not all changes are worth acting on. Small modifications like fixing typos or adjusting formatting create noise that distracts from meaningful signals. Significant changes like new sections, removed features, or substantial content additions indicate strategic shifts worth investigating.

    Setting appropriate thresholds requires understanding your competitive landscape. In fast-moving markets, you might want lower thresholds to catch early signals. In stable industries, higher thresholds prevent alert fatigue from minor updates.
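
    If you want to tune this per competitor, a simple option is a lookup table of thresholds. The names and values below are hypothetical placeholders, not part of the code above:

    Python
    # Hypothetical per-target thresholds; tune these to how fast each market moves
    CONTENT_CHANGE_THRESHOLDS = {
        "default": 100,                 # characters of markdown difference
        "fast_moving_competitor": 50,   # catch smaller edits in volatile markets
        "stable_competitor": 500,       # only flag substantial rewrites
    }

    def content_threshold(target_name):
        """Look up the threshold for a target, falling back to the default."""
        return CONTENT_CHANGE_THRESHOLDS.get(target_name, CONTENT_CHANGE_THRESHOLDS["default"])

    # Then swap the hard-coded 100 in the length check for content_threshold(target_name)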

    Step 4: Creating Actionable Reports

    While our change detection system identifies what’s different, the reporter system explains what those differences mean for your competitive position and what actions you should consider taking.

    All we’re doing here is sending the information we’ve gathered to OpenAI (or the LLM of your choice) and asking it to turn that into a report. On the first run, we ask it to generate a baseline profile of the competitor; on subsequent runs, we ask it to analyze the diffs in that context and produce an actionable report.

    Most of this is just prompt engineering. Here are some basic prompts you can start with; feel free to tweak them for your use case:

    Python
    system_prompt = """You are a competitive intelligence analyst. Your job is to analyze competitor data and changes, then generate actionable business insights.
    
    Given competitor monitoring data with DETECTED CHANGES, create a professional markdown report that includes:
    
    1. **Executive Summary** - High-level insights and key takeaways
    2. **Critical Changes** - Most important changes that require immediate attention
    3. **Strategic Implications** - What these changes mean for competitive positioning
    4. **Recommended Actions** - Specific steps the business should consider
    5. **Market Intelligence** - Broader patterns and trends observed
    
    Focus on business impact, not technical details. Be concise but insightful. Use markdown formatting with appropriate headers and bullet points."""
    
    user_prompt = f"""Analyze this competitor monitoring data and generate a competitive intelligence report focused on CHANGES DETECTED:
    
    **Date:** {timestamp.strftime('%B %d, %Y')}
    
    **Data Overview:**
    - Targets monitored: {len(analysis_data['targets_analyzed'])}
    - Changes detected: {analysis_data['changes_detected']}
    
    **Detailed Data with Changes:**
    ```json
    {json.dumps(analysis_data, indent=2, default=str)}
    ```
    Please generate a professional competitive intelligence report based on the changes detected. Focus on actionable business insights rather than technical details."""
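
    From there, the prompts just get sent to the model. Here’s a minimal sketch of that call, assuming the official openai Python SDK (v1+); swap the model name for whatever you’re using:

    Python
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: use whichever model you prefer
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    report_markdown = response.choices[0].message.content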
    

    Running the System

    And those are our four components! As I mentioned earlier, I’m building this as part of a larger system for my client, so it’s set up to run automatically at regular intervals. Besides generating a report (which gets posted to Slack automatically), it also updates other competitive positioning material like landing pages and sales enablement content.

    But for the purposes of this demo, we can run it manually from the command line. Create a main.py file to orchestrate the full system:

    Python
    # Standard library and dotenv imports; CompetitorExtractor, ChangeDetector,
    # AIReporter, and MONITORING_TARGETS are imported from the modules built in the earlier steps
    import os
    import sys
    from datetime import datetime

    from dotenv import load_dotenv


    def main():
        """Main execution function."""
        print("=" * 60)
        print("Competitor Research Automation with Firecrawl")
        print("=" * 60)
    
        # Load environment variables
        load_dotenv()
        api_key = os.getenv("FIRECRAWL_API_KEY")
    
        if not api_key:
            print("\nError: FIRECRAWL_API_KEY not found in environment variables")
            print("Please set your API key in a .env file or as an environment variable")
            print("Example: export FIRECRAWL_API_KEY='fc-your-key-here'")
            sys.exit(1)
    
        print(f"\nRun started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        print(f"Monitoring {len(MONITORING_TARGETS)} targets\n")
    
        # Initialize components
        extractor = CompetitorExtractor(api_key)
        detector = ChangeDetector()
        reporter = AIReporter()
    
        # Extract current data
        print("Extracting current data from targets...\n")
        current_results = extractor.extract_all_targets()
    
        # Get previous snapshots for comparison
        previous_snapshots = {}
        for target_name in MONITORING_TARGETS.keys():
            previous = extractor.get_previous_snapshot(target_name)
            if previous:
                previous_snapshots[target_name] = previous
    
        # Detect changes
        print("\nAnalyzing changes...")
        all_changes = detector.detect_all_changes(current_results, previous_snapshots)
    
        # Generate summary
        change_summary = detector.summarize_changes(all_changes)
    
        # Display summary in console
        print("\nSummary of Changes:")
        print("-" * 40)
        if change_summary:
            for summary_item in change_summary:
                print(summary_item)
        else:
            print("No targets monitored yet.")
    
        # Generate report
        print("\nGenerating report...")
        report_path = reporter.generate_report(current_results, all_changes, change_summary)
    
        # Final status
        print("\n" + "=" * 60)
        print("Monitoring Complete!")
        print(f"Report saved to: {report_path}")
    
        # Check if this is the first run
        if all([changes.get("is_first_run") for changes in all_changes["targets"].values()]):
            print("\nThis was the first run - baseline data has been captured.")
            print("   Run the script again later to detect changes!")
    
        print("=" * 60)

    The initial run serves as the foundation for all future competitive analysis. During this run, the system captures baseline data for each target, establishes the data structure for comparison, creates the storage schema, and validates that extraction works correctly for your chosen targets.

    After establishing your baseline, subsequent runs focus on identifying and analyzing changes that inform competitive strategy.

    Production Considerations: Understanding System Limitations

    While this tutorial creates a functional competitive monitoring system, it’s designed for demonstration and learning rather than enterprise deployment. Understanding these limitations helps you recognize when and how to evolve the system for production use.

    Database and Storage Limitations

    The SQLite database is wonderfully simple for learning and prototyping, but it has constraints that affect production scalability. SQLite handles concurrent reads well but serializes writes, which becomes a bottleneck once multiple workers are extracting and saving data simultaneously. And while a single file is easy to back up, it doesn’t give you the replication and failover options critical business systems usually need.

    For production systems, consider PostgreSQL or MySQL for better concurrency handling and enterprise features. Cloud databases like AWS RDS or Google Cloud SQL provide managed infrastructure, automated backups, and scaling capabilities.

    API Rate Limiting and Cost Management

    The current system makes API calls sequentially without sophisticated rate limiting or cost optimization. Firecrawl’s pricing scales with usage, so uncontrolled extraction could become expensive quickly. The system doesn’t implement intelligent scheduling based on page change frequency, meaning it might waste API calls on static content.

    Production systems should implement adaptive scheduling that checks high-priority targets more frequently, uses exponential backoff for rate limiting, implements cost monitoring and alerts, and caches results when appropriate to reduce redundant API calls.
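
    Even a tiny rate limiter that enforces a minimum gap between requests is a reasonable starting point. This is a sketch with a made-up class name and interval; adjust it to your plan’s limits:

    Python
    import time

    class MinIntervalLimiter:
        """Enforce a minimum delay between API calls (e.g. Firecrawl extractions)."""

        def __init__(self, interval_seconds):
            self.interval = interval_seconds
            self._last_call = 0.0

        def wait(self):
            elapsed = time.time() - self._last_call
            if elapsed < self.interval:
                time.sleep(self.interval - elapsed)
            self._last_call = time.time()

    # Usage: limiter = MinIntervalLimiter(2.0), then limiter.wait() before each extraction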

    Error Recovery and Resilience

    The current error handling is basic and suitable for development but insufficient for production reliability. Network failures, API timeouts, and parsing errors need more sophisticated handling. The system doesn’t implement retry logic with exponential backoff or distinguish between temporary and permanent failures.

    Production systems require comprehensive logging for debugging and monitoring, retry mechanisms for transient failures, circuit breakers to prevent cascading failures, and health checks to monitor system status.
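
    A simple retry wrapper with exponential backoff and jitter covers the most common transient failures. This is a rough sketch with hypothetical defaults, not a full circuit breaker:

    Python
    import random
    import time

    def with_retries(fn, max_attempts=4, base_delay=2.0):
        """Call fn(), retrying transient failures with exponential backoff plus jitter."""
        for attempt in range(1, max_attempts + 1):
            try:
                return fn()
            except Exception as exc:
                if attempt == max_attempts:
                    raise
                delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
                print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
                time.sleep(delay)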

    Data Quality and Validation

    The tutorial system assumes extracted data is reliable and correctly formatted, but real-world web scraping encounters many data quality issues. Websites change their structure, introduce temporary errors, or modify content in ways that break extraction logic.

    Production systems need data validation pipelines that verify extracted data meets expected formats, detect and handle parsing failures gracefully, implement data quality scoring to identify unreliable extractions, and provide alerts when data quality degrades.
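
    Even a lightweight validation pass catches most of these issues before they pollute your reports. The required fields below are assumptions; match them to whatever schema you defined for extraction:

    Python
    REQUIRED_FIELDS = ["plans", "features"]  # assumption: adjust to your extraction schema

    def validate_snapshot(data):
        """Return a list of data-quality problems; an empty list means the snapshot looks usable."""
        if not isinstance(data, dict):
            return ["extraction did not return structured data"]
        problems = []
        for field in REQUIRED_FIELDS:
            if not data.get(field):
                problems.append(f"missing or empty field: {field}")
        return problems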

    Customizing and Extending The System

    I’ve only shown you the core functionality of scraping competitors and identifying changes. With this in place as your foundation, there’s a lot you can do to turn this into a powerful competitive intelligence system for your company:

    • Alerting system: Integrate with Slack or email to send notifications to different people or teams in your organization based on the type of change.
    • Track patterns: Extend the system to store changes over longer periods of time so you can spot trends.
    • Add more data sources: Scrape their ads, social media, and other properties for more insights into their GTM and positioning.
    • Integrate with BI: Incorporate competitive data into executive dashboards, combine it with internal metrics, and support strategic planning processes.
    • Multi-competitor dashboards: Instead of just generating reports, you can create an interactive dashboard to visualize changes.
    • Auto-update your assets: As I’m doing with my client, you can automatically update your competitive positioning assets like landing pages if there’s a significant product or pricing update.

    Conclusion: From Monitoring to Intelligence

    With tools like Firecrawl, we can abstract away the scraping and monitoring infrastructure and focus on building an actual intelligence system, one that suggests actions and can even take them for us.

    Firecrawl also has a dashboard where you can experiment with the different scraping options and see what comes back. Give it a try and implement the code in your app.

    And if you want more tutorials on building useful AI agents, sign up below.

  • How People Really Use ChatGPT, and What It Means for Businesses

    How People Really Use ChatGPT, and What It Means for Businesses

    Every week, 700 million people fire up ChatGPT and send more than 18 billion messages. That’s about 10% of the world’s adults, collectively talking to a chatbot at a rate of 29,000 messages per second.

    The question is: what on earth are they talking about?

    OpenAI and a team of economists recently released a fascinating paper that digs into exactly that. It’s the first time we’ve seen a systematic breakdown of how people actually use ChatGPT in the wild.

    There’s one important caveat though: the study only looks at consumer accounts (Free, Plus, and Pro). No Teams, no Enterprise, no API. That means all the numbers you’re about to see skew toward personal usage rather than business use.

    But even with that limitation, the trends are clear. And when you combine the consumer data with what we know about enterprise usage, a bigger story emerges about how AI is reshaping both work and daily life.

    Work vs. Non-Work: AI Moves Into Daily Life

    In mid-2024, about half of consumer ChatGPT messages were work-related. Fast forward a year and non-work usage dominates: 73% of messages are about personal life, curiosity, or hobbies.

    Some of this is skew: Enterprise data isn’t in here, and yes, plenty of serious work happens on corporate accounts. But I don’t think that fully explains it. There is a real trend of people bringing ChatGPT into their everyday lives.

    That tracks with my own journey. Back in 2020, when I first used the GPT-3 API, it was strictly work. I was building a startup on top of it, so it was all product development, copywriting, and business experiments.

    When ChatGPT launched, I still had a “work-only” account. Over time, I started asking it for personal things too. Today? I’m about 50-50. And that’s exactly what the data shows at scale.

    The paper also shows that each cohort of users increases their usage over time. Early adopters send more messages than newer ones, but even the new cohorts ramp up the longer they stick around.

    That also reflects my personal experience. The more I played with ChatGPT, the more I discovered new ways to use it, from drafting a proposal to planning a weekend trip. It went from a tool I used for certain activities to something I turn to almost immediately for any activity.

    The Big Three Use Cases

    When you zoom out, almost 80% of all usage falls into three buckets:

    1. Practical Guidance (29%): tutoring, how-to advice, creative ideation.
    2. Seeking Information (24%, up from 14%): essentially, ChatGPT-search.
    3. Writing (24%, down from 36%): drafting, editing, translating, summarizing.

    What’s fascinating here is the growth of seeking information. The move from Google to ChatGPT is real. People are asking it for information, advice, even recommendations for specific products. Personally, I’ve used it for everything from planning a trip to Barcelona to asking why so many Japanese restaurants feature a waving maneki-neko cat statue.

    There’s also a very big opportunity here in the education space. If we break it down further, 10.2% of all ChatGPT messages are tutoring or teaching requests. That’s one in every ten messages, making ChatGPT one of the world’s largest educational platforms.

    Now, when you look at work-related queries only, writing is still king: 40% of all work-related usage is writing. And that makes sense. Everyone deals with emails and business communications.

    Interestingly, two-thirds of writing requests are edits to user-provided text (“improve this email”) rather than net-new generation (“write a blog post for me”). AI is acting more as a co-writer and editor than a ghostwriter.

    Where’s Coding?

    One surprise: only 4.2% of consumer ChatGPT usage is programming-related. Compare that to Claude, where 30%+ of conversations are coding.

    But that doesn’t mean coding with AI isn’t happening. It’s just happening elsewhere (in the API, in GitHub Copilot, in Cursor, in Claude Code). Developers don’t want to pop into a chatbot window; they want AI integrated into their IDEs and workflows.

    So the consumer product underrepresents coding’s real importance.

    Self-Expression: Smaller Than Expected

    Another surprise: “self-expression” (role play, relationships, therapy-like use) is only 4.3% of usage. That’s far smaller than some surveys had suggested.

    Part of me wonders if some of these conversations were misclassified. But if the data’s accurate, I’m actually glad. We already know AI has a sycophancy problem. The last thing we need is people turning it into their therapist en masse.

    Further on in the research, there’s a data point that supports this concern: self-expression had the highest satisfaction scores of any category. The good-to-bad ratio was almost 8:1, way higher than writing or coding. People seem happiest when using it for therapy.

    Asking vs. Doing

    The researchers also classified queries into three intents:

    • Asking: seeking info or advice (“What’s a good health plan?”).
    • Doing: asking ChatGPT to produce an output (“Rewrite this email”).
    • Expressing: sharing feelings or views (“I’m feeling stressed”).

    Across consumer usage:

    • 49% Asking
    • 40% Doing
    • 11% Expressing

    Here’s what’s interesting: Asking is growing faster than Doing, and Asking gets higher satisfaction.

    Why? Because asking for advice or information is pretty straightforward. There’s not a lot that can go wrong if you ask the AI what the capital of Canada is.

    But when people ask ChatGPT to do something, they often don’t provide enough context for a great output. In writing, for example, “write me a blog post on fitness” usually gives you generic AI slop. Having worked with multiple companies and trained professionals on how to use ChatGPT, I often see people try to get an output without adding any context or prompting the AI well.

    But, as models get better at handling sparse instructions, and as people get better at prompting, Doing will likely grow. Especially with OpenAI layering on more agentic capabilities. Today, ChatGPT is an advisor. Tomorrow, it will be a doer too.

    Who’s Using ChatGPT?

    Some demographic shifts worth noting:

    • Age: Nearly half of usage comes from people under 26.
    • Gender: Early adopters were 80% male; now, usage is slightly female-majority.
    • Geography: Fastest growth is in low- and middle-income countries.
    • Education/Occupation: More educated professionals use it for work; managers lean on it for writing, technical users for debugging/problem-solving.

    That international growth story is remarkable. We’re witnessing the birth of the first truly global intelligence amplification tool. A software developer in Lagos now has access to the same AI coding assistant as someone in San Francisco.

    For businesses, this matters. Tomorrow’s workforce is AI-native, global, and diverse. Employees (and customers) are going to bring consumer AI habits into the workplace whether enterprises are ready or not.

    ChatGPT as Decision Support

    When you look at work-related usage specifically, the majority of queries cluster around two functions:

    1. Obtaining, documenting, and interpreting information
    2. Making decisions, giving advice, solving problems, and thinking creatively

    This is the essence of decision support. And in my consulting work, it’s where I see the biggest ROI. Companies want automation, but the biggest unlock is AI that helps people make smarter, faster decisions.

    The Big Picture

    So what does all this tell us?

    For consumers: ChatGPT is increasingly a part of daily life, not just work.

    For businesses: Don’t just track “what consumers are doing with AI.” Track how those habits bleed into the workplace. Adoption starts at home, then shows up in the office.

    For the future: AI at work will center on decision support, not pure automation. The companies that understand this earliest will unlock the most value.

    The intelligence revolution is already here, 29,000 messages per second at a time. The question is whether your organization is ready for what comes next.

    Get more deep dives on AI

    Like this post? Sign up for my newsletter and get notified every time I do a deep dive like this one.