Building a Deep Research Agent with LangGraph

I was talking to a VC fund recently about their investment process. Part of their due diligence is doing thorough and deep research about the market, competition, even the founders, for every startup pitch they receive.

They use OpenAI’s Deep Research for the core research (Claude and Gemini have these features too) but there’s still a lot of manual work to give it the right context, guide the research, incorporate their previous research and data, and edit the final output to match their memo formats.

They wanted a way to integrate it into their workflow and automate the process, and that’s why they approached me.

It turns out there’s no magic to OpenAI’s Deep Research features. It’s all about solid Agent Design Principles.

And since I recently wrote a tutorial on how to build a coding agent, I figured I’d do one for Deep Research!

In this tutorial, you’ll learn how to create a Deep Research Agent similar to OpenAI’s, using LangGraph.

Why Roll Your Own Deep Research?

As I mentioned, OpenAI, Claude, and Gemini all have their own Deep Research products. They’re good for general-purpose use, but when you get into specific enterprise workflows or domains like law and finance, there are other factors to think about:

  1. Customization & Control: You may want control over which sources are trusted, how evidence is weighed, what gets excluded. You may also want to add your own heuristics, reasoning loops, and custom output styles.
  2. Source Transparency & Auditability: You may need to choose and log sources and also store evidence trails for compliance, legal defensibility, or investor reporting.
  3. Data Privacy & Security: You may want to keep sensitive queries inside your environment, or use your own private data sources to enrich the research.
  4. Workflow Integration: Instead of copying and pasting to a web app, you can embed your own research agent in your existing workflow and trigger it automatically via an API call.
  5. Scale and Extensibility: Finally, rolling your own means you can use open source models to reduce costs at scale, and also extend it into your broader agent stack and other types of work.

I actually think there’s a pretty big market for custom deep research agents, much like we have a massive custom RAG market.

Think about how many companies spend billions of dollars on McKinsey and the like for market research. Corporations will cut those $10M retainers if an in-house deep research agent produces 80% of the same work.

Why LangGraph?

We could just code this in pure Python but I wanted to use an agent framework to abstract away some of the context management stuff. And since I’ve already explored other frameworks on this blog, like Google’s ADK, I figured I’d give LangGraph a shot.

LangGraph works a bit differently from other frameworks: it lets us model any workflow as a state machine where data flows through specialized nodes, each handling one aspect of the workflow.

This gives us some important advantages:

  • State management made simple. Every step in our deep research pipeline passes along and updates a shared state object. This makes it easy to debug and extend.
  • Graph-based execution. Instead of linear scripts, LangGraph lets you build an explicit directed graph of nodes and edges. That means you can retry, skip, or expand nodes later without rewriting your whole pipeline.
  • Reliability and observability. Built-in support for retries, checkpoints, and inspection makes it easier to trust your agent when it runs for minutes and touches dozens of APIs.
  • Future-proofing. When you want to expand from a linear flow to something collaborative, you can do it by just adding nodes and edges to the graph.

Understanding the Research Pipeline

To keep this simple, our deep research agent will follow a linear pipeline that mirrors a basic research workflow. So it’s not really an “agent”, because it follows a pre-defined flow, but I’ll explain how you can make it more agentic later.

Think about how you research a complex topic manually:

  1. You start by breaking down the big question into smaller, focused questions
  2. You search for information on each sub-topic
  3. You collect and read through multiple sources
  4. You evaluate which information is most reliable and relevant
  5. You synthesize everything into a coherent narrative

Our agent will work the same way:

Markdown
Research Question → Planner → Searcher → Fetcher → Ranker → Writer → Final Report

Each node has a specific responsibility:

Planner Node: Takes your research question and breaks it into 3-7 focused sub-questions. Also generates optimized search queries for each sub-question. If your question is vague, it asks clarifying questions first.

Searcher Node: Uses the Exa API to find relevant web sources for each search query. Smart enough to filter out low-quality sources and prioritize recent content for time-sensitive queries.

Fetcher Node: Downloads web pages and extracts clean, readable text. Handles modern JavaScript-heavy websites using Crawl4AI, removes navigation menus and ads, and splits content into manageable passages.

Ranker Node: Takes all the text passages and ranks them by relevance to the original research question. Uses neural reranking with Cohere to find the most valuable information.

Writer Node: Takes all the information and compiles it into a comprehensive executive report with proper citations, executive summary, and strategic insights.
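
Every node follows the same contract: an async callable that takes the shared research state and returns an updated copy. Here’s a minimal sketch of that shape; the real base classes in the repo (you’ll see `NodeBase` and `AsyncContextNode` in later snippets) carry more than this, so treat it as illustrative.

Python
# Sketch: the shared contract every node follows (simplified)
class NodeBase:
    def __init__(self, progress_callback=None):
        self.progress_callback = progress_callback

    def _report_progress(self, message: str, step: str) -> None:
        # Hook so the CLI (or any caller) can display progress while the graph runs
        if self.progress_callback:
            self.progress_callback(step, message)

    async def run(self, state: "ResearchState") -> "ResearchState":
        raise NotImplementedError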

Setting Up the Foundation

Aside from LangGraph, we’re using a few other tools to build out our app:

  • Exa: Exa is awesome for a deep research agent because of its AI-optimized search API, which understands semantic meaning rather than just keywords.
  • Crawl4AI: This is a free library for web scraping that handles modern JavaScript-heavy websites traditional scrapers can’t process.
  • GPT-4o: We’re going to be using GPT-4o as our main model for planning our search and writing the final report. You can use GPT-5 but it’s overkill.
  • Cohere: Finally, we use Cohere to provide specialized neural reranking to identify the most relevant content that we get back from our searches.

Feel free to switch out any of these tools for something else. That’s the beauty of rolling your own deep research.

Designing the Data Models

As I mentioned earlier, LangGraph models a workflow as a state machine. So we need to start with data models that define the shared state that flows through the workflow.

Think of this state as a growing research folder that each node adds to – the planner adds sub-questions, the searcher adds sources, the fetcher adds content, and so on.

The most important model is `ResearchState`, which acts as our central data container:

Python
# src/deep_research/models/core.py
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field

# ResearchQuestion, SubQuestion, Source, Passage, ResearchReport, and
# ResearchStatus are the supporting models described below
class ResearchState(BaseModel):
    # Input
    research_question: Optional[ResearchQuestion] = None

    # Intermediate states
    sub_questions: List[SubQuestion] = Field(default_factory=list)
    search_queries: List[str] = Field(default_factory=list)
    sources: List[Source] = Field(default_factory=list)
    passages: List[Passage] = Field(default_factory=list)  # filled in by the fetcher

    # Final output
    research_report: Optional[ResearchReport] = None

    # Processing metadata
    status: ResearchStatus = ResearchStatus.PENDING
    current_step: str = "initialized"
    error_message: Optional[str] = None
    processing_stats: Dict[str, Any] = Field(default_factory=dict)

This state object starts with just a research question and gradually accumulates data as it moves through the pipeline. Each field represents a different stage of processing – from the initial question to sub-questions, then sources, and finally a complete report.

We also need supporting models for individual data types like `ResearchQuestion` (the input), `Source` (web pages we find), `Passage` (chunks of text from those pages), and `ResearchReport` (the final output). Each uses Pydantic for validation and includes metadata like timestamps and confidence scores.

The implementation follows the same pattern as `ResearchState` with proper field validation and default values.
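
To give a flavor of what those look like, here’s a minimal sketch of the `Source` and `Passage` models. The field names are based on how the later snippets use them; the exact definitions in the repo may differ slightly.

Python
# Illustrative sketch of two supporting models (field names are assumptions)
from datetime import datetime
from typing import Optional
from uuid import uuid4

from pydantic import BaseModel, Field, HttpUrl


class Source(BaseModel):
    """A web page discovered by the searcher."""
    id: str = Field(default_factory=lambda: str(uuid4()))
    url: HttpUrl
    title: Optional[str] = None
    content: Optional[str] = None            # filled in by the fetcher
    publication_date: Optional[datetime] = None


class Passage(BaseModel):
    """A chunk of text extracted from a Source."""
    source_id: str
    content: str
    rerank_score: Optional[float] = None     # set by the ranker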

Building the LangGraph Workflow

Now let’s build the core workflow that orchestrates our research pipeline. This means defining a state graph where each node can modify shared state and edges determine the flow between nodes.

Here’s how we set up our workflow structure:

Python
from langgraph.graph import END, StateGraph

# Create the state graph
workflow = StateGraph(ResearchState)

# Add our five research nodes (the *_node callables are defined in the next section)
workflow.add_node("planner", planner_node)
workflow.add_node("searcher", searcher_node) 
workflow.add_node("fetcher", fetcher_node)
workflow.add_node("ranker", ranker_node)
workflow.add_node("writer", writer_node)

# Define the linear flow
workflow.set_entry_point("planner")
workflow.add_edge("planner", "searcher")
workflow.add_edge("searcher", "fetcher") 
workflow.add_edge("fetcher", "ranker")
workflow.add_edge("ranker", "writer")
workflow.add_edge("writer", END)

# Compile into executable graph
graph = workflow.compile()

The LangGraph workflow orchestrates our research pipeline. Think of it as the conductor of an orchestra – it knows which instrument (node) should play when and ensures they all work together harmoniously.

The workflow class does three main things:

  • Graph Construction: Creates a LangGraph StateGraph and connects our five nodes in sequence.
  • Node Wrapping: Each node gets wrapped with error handling and progress reporting (see the sketch after this list).
  • Execution Management: Runs the graph and handles any failures gracefully.
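
The wrapping step is worth a quick sketch. This is an assumption about how you might implement it rather than the exact code from the repo: record which step is running on the state, and turn exceptions into an error status so one flaky source doesn’t kill the whole run.

Python
# Sketch: wrap each node with progress reporting and error handling (names are assumptions)
def wrap_node(name: str, node_fn):
    """Record progress and convert node failures into state errors instead of crashing the graph."""
    async def wrapped(state: ResearchState) -> ResearchState:
        state.current_step = name
        try:
            return await node_fn(state)
        except Exception as e:
            # ResearchStatus.FAILED is assumed; use whatever failure status your enum defines
            state.status = ResearchStatus.FAILED
            state.error_message = f"{name} failed: {e}"
            return state
    return wrapped

# Used when building the graph, e.g.:
# workflow.add_node("planner", wrap_node("planner", planner_node))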

Implementing the Research Nodes

Now let’s build each node in our research pipeline. I’ll show you the key concepts and implementation strategies for each one, focusing on the interesting architectural decisions.

Node 1: The Planner – Breaking Down Complex Questions

The planner is the strategist of our system. It takes a potentially vague research question and transforms it into a structured research plan:

Context Clarification: If someone asks “What’s happening with AI?”, that’s too broad to research effectively. The planner detects this and generates clarifying questions:

  • “Are you interested in recent AI breakthroughs, business developments, or regulatory changes?”
  • “What’s your intended use case – research, investment, or staying informed?”
  • “Any specific AI domains of interest (like generative AI, robotics, or safety)?”
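
Under the hood this can be as simple as one extra LLM call before decomposition. Here’s a hedged sketch of what such a check might look like as a method on the PlannerNode class; the prompt wording and method name are illustrative, not the exact ones from the repo.

Python
# Illustrative sketch of a vagueness check (prompt and method name are assumptions)
from langchain_core.messages import HumanMessage

async def _clarify_if_needed(self, question: ResearchQuestion) -> list[str]:
    """Return clarifying questions if the research question is too broad, else an empty list."""
    prompt = (
        "You are a research planning expert. If the following research question is too broad "
        "or ambiguous to research effectively, reply with up to 3 clarifying questions, one per "
        "line. If it is specific enough, reply with the single word OK.\n\n"
        f"Question: {question.question}"
    )
    response = await self.llm.ainvoke([HumanMessage(content=prompt)])
    text = str(response.content).strip()
    if text.upper().startswith("OK"):
        return []
    return [line.strip("-• ").strip() for line in text.splitlines() if line.strip()]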

Question Decomposition: Once it has enough context, it breaks the main question into 3-7 focused sub-questions. For “latest AI safety developments,” it might generate:

  1. “What are the most recent AI safety research papers and findings from 2025?”
  2. “What regulatory developments in AI safety have occurred recently?”
  3. “What are the latest industry initiatives and standards for AI safety?”
Python
# Imports shown for context
from datetime import datetime

from langchain_core.messages import HumanMessage, SystemMessage


class PlannerNode:
    async def plan(self, state: ResearchState) -> ResearchState:
        self._report_progress("Analyzing research question", "planning")

        # Generate sub-questions from the main research question
        sub_questions = await self._decompose_question(state.research_question)
        state.sub_questions = sub_questions

        self._report_progress(f"Generated {len(sub_questions)} sub-questions", "planning")
        return state

    async def _decompose_question(self, question: ResearchQuestion) -> List[SubQuestion]:
        current_date = datetime.now().strftime("%B %Y")  # e.g. "August 2025"

        system_prompt = f"""You are a research planning expert.
        Current date: {current_date}

        Decompose this research question into 3-7 focused sub-questions that together
        will comprehensively answer the main question. If the question asks for
        "latest" or "recent" information, focus on finding up-to-date content."""

        response = await self.llm.ainvoke([
            SystemMessage(content=system_prompt),
            HumanMessage(content=question.question),
        ])
        # ... parsing logic to create SubQuestion objects

Node 2: The Searcher – Finding Relevant Sources

The searcher takes our optimized queries and finds relevant web sources. It uses the Exa API, which is specifically designed for AI applications and provides semantic search capabilities beyond traditional keyword matching.

The Exa API also allows us to customize our searches:

  • Source type detection: Automatically categorizes sources as academic papers, news articles, blog posts, etc.
  • Quality filtering: Filters out low-quality sources and duplicate content
  • Temporal prioritization: For time-sensitive queries, prioritizes recent sources
  • Domain filtering: Can focus on specific domains if specified
Python
from exa_py import Exa


class SearcherNode:
    def __init__(self, exa_api_key: str, max_sources_per_query: int = 10):
        self.exa = Exa(api_key=exa_api_key)
        self.max_sources_per_query = max_sources_per_query
    
    async def search_for_subquestion(self, subquestion: SubQuestion) -> List[Source]:
        results = []
        for query in subquestion.search_queries:
            # Use Exa's semantic search with temporal filtering
            # (the default exa_py client is synchronous, so no await here)
            search_results = self.exa.search(
                query=query,
                num_results=self.max_sources_per_query,
                include_domains=["gov", "edu", "arxiv.org"],  # Prioritize authoritative sources
                start_published_date="2025-01-01",  # Recent content for temporal queries
            )
            # Convert Exa results to our Source objects and append them to results
        return results
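
The conversion step at the end is mostly bookkeeping. A minimal sketch, assuming the `Source` model sketched earlier:

Python
# Sketch: map Exa results onto our Source model
# (exa_py's SearchResponse exposes .results items with .url, .title, .published_date)
for r in search_results.results:
    results.append(Source(url=r.url, title=r.title))
# Parsing r.published_date into Source.publication_date is omitted here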

Node 3: The Fetcher – Extracting Clean Content

The fetcher downloads web pages and extracts clean, readable text. This is more complex than it sounds because modern websites are full of navigation menus, ads, cookie banners, and JavaScript-generated content.

We use Crawl4AI because it handles JavaScript-heavy sites and provides intelligent content extraction. It can distinguish between main content and page chrome (navigation, sidebars, etc.).

Python
class FetcherNode(AsyncContextNode):
    async def fetch(self, state: ResearchState) -> ResearchState:
        self._report_progress("Starting content extraction", "fetching")
        
        all_passages = []
        for source in state.sources:
            try:
                # Extract clean content using Crawl4AI
                result = await self.crawler.arun(
                    url=str(source.url),
                    word_count_threshold=10,
                    exclude_tags=['nav', 'footer', 'aside', 'header'],
                    remove_overlay_elements=True,
                )
                
                if result.success and result.markdown:
                    # Split content into manageable passages
                    passages = self._split_into_passages(result.markdown, source.id)
                    all_passages.extend(passages)
            except Exception:
                continue  # Skip failed sources
        
        state.passages = all_passages
        return state
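
The `_split_into_passages` helper isn’t shown above. A minimal sketch of an overlapping word-window splitter; the chunk and overlap sizes here are assumptions, not the values from the repo:

Python
# Sketch: overlapping word-window splitter (chunk/overlap sizes are assumptions)
def _split_into_passages(self, text: str, source_id: str,
                         chunk_words: int = 200, overlap_words: int = 50) -> List[Passage]:
    words = text.split()
    passages = []
    step = chunk_words - overlap_words
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_words])
        if chunk.strip():
            passages.append(Passage(source_id=source_id, content=chunk))
    return passages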

Node 4: The Ranker – Finding the Most Relevant Information

After fetching, we might have hundreds of articles. The ranker’s job is to identify the most relevant ones for our research question.

We first cut up all the articles into overlapping passages. We then pass those passages to Cohere’s reranking API and re-rank them against the original queries. We can then take the top x% of passages and pass them on to the next node.

By doing it this way, we eliminate a lot of the fluff that many articles tend to have and extract only the meat.

Python
class RankerNode(NodeBase):
    async def rerank_with_cohere(self, passages: List[Passage], query_text: str) -> List[Passage]:
        """Optionally rerank passages using Cohere's rerank API."""
        if not self.cohere_client or not passages:
            return passages

        try:
            # Prepare documents for reranking
            documents = [p.content for p in passages]

            # Use Cohere rerank
            rerank_response = self.cohere_client.rerank(
                model="rerank-english-v3.0",
                query=query_text,
                documents=documents,
                top_n=min(len(documents), self.rerank_top_k),
                return_documents=False,
            )

            # Reorder passages based on Cohere ranking
            reranked_passages = []
            for result in rerank_response.results:
                if result.index < len(passages):
                    passage = passages[result.index]
                    passage.rerank_score = result.relevance_score
                    reranked_passages.append(passage)

            return reranked_passages

        except Exception as e:
            print(f"Cohere reranking failed: {e}")
            # Fallback: return original passages
            return passages
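
Keeping the “top x%” mentioned above is then just a slice in the ranker’s entry-point method, which might look roughly like this (the method name and the 30% cutoff are assumptions):

Python
    async def rank(self, state: ResearchState) -> ResearchState:
        # Rerank all passages against the main question, then keep the top slice
        # (the 30% cutoff is an assumption; tune it to your context budget)
        query = state.research_question.question if state.research_question else ""
        ranked = await self.rerank_with_cohere(state.passages, query)
        keep = max(1, int(len(ranked) * 0.30))
        state.passages = ranked[:keep]
        return state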

Node 5: The Writer – Synthesizing the Final Report

The writer takes all the information and compiles it into a comprehensive executive report. It’s optimized for strategic decision-making with executive summaries, clear findings, and proper citations.

At the simplest level we just need to pass the original query and all the passages to an LLM (I’m using GPT-4o in this example but any LLM should do) and have it turn that into a research report.

This node is mostly prompt engineering.

Python
async def generate_research_content(
    self, state: ResearchState
) -> tuple[str, List[ExecutiveSummaryPoint]]:
    # Only use sources that actually returned content
    sources_with_content = [s for s in state.sources if s.content]

    # Build context about sources
    recent_sources = len(
        [
            s
            for s in sources_with_content
            if s.publication_date
            and (datetime.now() - s.publication_date).days < 180
        ]
    )
    source_context = f"Based on analysis of {len(sources_with_content)} sources ({recent_sources} recent)."

    system_prompt = """You are a research analyst writing a comprehensive, readable research report from web sources.
        
Your task:
1. Analyze the provided source content and synthesize key insights
2. Create a natural, flowing report that reads well
3. Organize information logically with clear sections and headings
4. Write in an engaging, accessible style suitable for executives
5. Include proper citations using [Source: URL] format
6. Identify key themes, trends, and important findings
7. Note any contradictions or conflicting information

IMPORTANT: Structure your response as follows:
---EXECUTIVE_SUMMARY---
[Write 3-5 concise bullet points that capture the key insights from your research]

---FULL_REPORT---
[Write the detailed research report with proper sections, analysis, and citations]

This format allows me to extract both the executive summary and full report from your response."""

    # Prepare source content for the LLM
    source_texts = []
    for i, source in enumerate(sources_with_content, 1):
        # Truncate very long content to fit in context window
        content = source.content or ""
        if len(content) > 8000:  # Reasonable limit per source
            content = content[:8000] + "...[truncated]"
            
        source_info = f"Source {i}: {source.title or 'Untitled'}\nURL: {source.url}\n"
        if source.publication_date:
            source_info += f"Published: {source.publication_date.strftime('%Y-%m-%d')}\n"
        source_info += f"Content:\n{content}\n"
        source_texts.append(source_info)
        
    sources_text = "\n---\n".join(source_texts)

    research_question = state.research_question
    if not research_question:
        return "No research question provided.", []

    human_prompt = f"""Research Question: {research_question.question}

Context: {research_question.context or 'General research inquiry'}

{source_context}

Source Materials:
{sources_text}

Please write a comprehensive, well-structured research report that analyzes these sources and answers the research question:"""

    try:
        messages = [
            SystemMessage(content=system_prompt),
            HumanMessage(content=human_prompt),
        ]

        response = await self.llm.ainvoke(messages)
        if isinstance(response.content, str):
            content = response.content.strip()
        else:
            content = str(response.content).strip()

        # Parse the structured response
        return self._parse_llm_response(content)

    except Exception as e:
        print(f"Research content generation failed: {e}")
        return "Unable to generate research content at this time.", []

The Final Output

Here’s what it looks like when everything comes together. If you have a look at the full source code on my GitHub, you’ll see that I’ve added in a CLI, but you could trigger this from any other workflow.

Bash
# Install and setup
pip install -e .
export OPENAI_API_KEY="your-key"
export EXA_API_KEY="your-key"

# Run a research query
deep-research research "What are the latest developments in open source LLMs?"

When you run this command, here’s what happens behind the scenes:

  1. Context Analysis: The planner analyzes your question. If it’s vague, it presents clarifying questions:
    • “Are you interested in recent breakthroughs, regulatory developments, or industry initiatives?”
    • “What’s your intended use case – research, investment, or staying informed?”
  2. Research Planning: Based on your answers, it generates focused sub-questions:
    • “What are the most recent open source AI research papers and findings from 2025?”
    • “What developments in open source AI have occurred recently?”
  3. Intelligent Search: For each sub-question, it executes multiple searches using Exa’s semantic search, finding 50-100 relevant sources.
  4. Content Extraction: Downloads and extracts clean text from all sources using Crawl4AI, handling JavaScript and filtering out navigation/ads.
  5. Relevance Ranking: Ranks hundreds of text passages to find the most valuable information.
  6. Report Generation: Synthesizes everything into a comprehensive executive report with strategic insights.
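
You don’t have to go through the CLI, either. Since the compiled graph is just a Python object, you can trigger the same pipeline from an API handler or a scheduled job. A minimal sketch; the exact `ResearchQuestion` constructor arguments depend on your model definitions:

Python
# Sketch: invoke the compiled graph directly from your own code
import asyncio

async def run_research(question: str):
    state = ResearchState(research_question=ResearchQuestion(question=question))
    # Compiled LangGraph graphs expose ainvoke(); depending on your LangGraph
    # version the result may come back as a dict of state fields.
    return await graph.ainvoke(state)

result = asyncio.run(run_research("What are the latest developments in open source LLMs?"))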

Next Steps

As I said at the start, this is a simple linear workflow and not really an agent.

To make it more agentic, we can redesign the system around a reasoning model, with each node being a tool it can use, and a ReAct loop.

The cool thing about LangGraph is that, since we’ve already defined our tools (individual nodes), we don’t really need to change much. We simply change the graph from a linear one to a hub-and-spoke model.

So instead of one node leading to the next, we have a central LLM node, and it has two-way connections to other nodes. We send our request to the central LLM node, and it decides what tools it wants, and in which order. It can call tools multiple times, and it can also respond back to the user to check in or clarify the direction, before executing more tool calls.

This system is much more powerful because the user and the LLM can change direction during the research process as new information comes in. In the example above, let’s say we pick up on both the Vicuna model and GPT-OSS. We may determine that, since GPT-OSS is a trending topic, we should focus on that direction and drop Vicuna.

Similarly, if we’re not satisfied with the final report, we may go back and forth with our LLM to run a few more queries, verify a source, or fine-tune the structure.

And if we want to add new tools, like a source verification tool, we simply define a new node and add a two-way connection to our central node.
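
In LangGraph terms, that rewiring mostly means swapping the linear edges for conditional ones. Here’s a rough sketch of what the hub-and-spoke graph could look like; `agent_node`, `verifier_node`, and the `next_action` state field are placeholders you’d need to define yourself:

Python
# Rough sketch: hub-and-spoke wiring (agent_node, verifier_node, and the
# next_action field are placeholders, not part of the original code)
workflow = StateGraph(ResearchState)
workflow.add_node("agent", agent_node)  # reasoning LLM that picks the next action

for name, node in [("planner", planner_node), ("searcher", searcher_node),
                   ("fetcher", fetcher_node), ("ranker", ranker_node),
                   ("writer", writer_node), ("verifier", verifier_node)]:
    workflow.add_node(name, node)
    workflow.add_edge(name, "agent")  # every tool reports back to the central node

def route(state: ResearchState) -> str:
    # agent_node records its chosen tool on the state; "done" ends the run
    return state.next_action or "done"

workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", route, {
    "planner": "planner", "searcher": "searcher", "fetcher": "fetcher",
    "ranker": "ranker", "writer": "writer", "verifier": "verifier", "done": END,
})
graph = workflow.compile()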

Conclusion

By combining LangGraph’s workflow capabilities with specialized APIs like Exa and Crawl4AI, we created a system that automates the research process from question to comprehensive report.

While the big AI labs have built impressive research products, you now have the blueprint to build something equally powerful (and more customized) for your specific needs.
