From Internal Memo to Public Story: Building a Content Agent for VCs

I was speaking to a VC last month, and they mentioned that their marketing strategy is to publish their investment memos as blog posts.

It’s really one of the best ways for a VC to grow their brand. Bessemer and other big funds do it often.

The only problem is it takes time. Smaller VCs don’t have the bandwidth or resources to turn their private investment memos into public ones.

The VC I spoke to said it took them hours to remove sensitive information from each investment memo and rewrite it in a consistent style for the blog, so they weren’t able to publish regularly.

Well, AI can solve that. Today, we’ll build an AI agent that automates this process using Claude and modern web scraping tools.

The Solution

I built a fully autonomous agent for the VC that gets triggered whenever they create a new investment memo and automatically publishes a draft post to their blog. The entire workflow runs invisibly inside their existing operations.

For the purposes of this blog post, I’m going to simplify it. Instead of automatically triggering it (since I don’t know what your operations look like), we’ll create an interface with Streamlit where you can manually trigger it.

Our interface will:

  • Accept investment memos in various formats (text, PDF, DOCX)
  • Use Claude to identify and remove sensitive information
  • Scrape the company’s website for public information
  • Generate a polished blog post matching your writing style
  • Provide easy export options for the final content

Implementation

Setting Up the Project

The entire project is open source on my GitHub for you to clone and run locally. From there, you can modify the code and customize it for your fund.

You can also take the core agent logic, trigger it from your existing workflow, and even publish directly to your website.

I’ve set up the files with this structure:

Plain text
memo-converter/
├── requirements.txt
├── README.md
├── .env
└── src/
    ├── __init__.py
    ├── main.py
    ├── interface.py
    ├── fetcher.py
    ├── sanitizer.py
    └── generator.py

The Interface file is the code for our Streamlit interface. You can upload your investment memo to it, enter the URL of the company you’re investing in, and also upload a blog post whose style you want to match (ideally one of your own).

Fetcher fetches more information about the company you’re investing in. Sanitizer cleans your investment memo. Generator creates the blog post.

At the end of it all, the interface displays the final post.
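
The file tree includes a requirements.txt. Based on the libraries used in the snippets below, it will look something like this (check the repo for the exact list and version pins):

Plain text
streamlit
anthropic
firecrawl-py
python-docx
pypdf
python-dotenv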

Building the Interface

The interface.py file is where we’ll create our Streamlit interface. Streamlit is an open-source Python package that makes it easy to build visual interfaces for data and AI.

Our interface will provide a clean, intuitive way to input memos and reference content. As you can see in the code, we use tabs to organize different input methods and provide document preview functionality. We’re also going to add a little slider to control how long we want the final blog post to be.

Python
# interface.py — create_interface is a method on the MemoConverter class.
# These module-level imports are what the interface snippets rely on:
import io
from typing import Dict, Optional

import docx  # python-docx
import streamlit as st
from pypdf import PdfReader  # PyPDF2 works too


def create_interface(self) -> Dict:
       st.title("Investment Memo to Marketing Blog Post Converter")
      
       # Create tabs for different input methods
       tab1, tab2 = st.tabs(["Paste Memo", "Upload Document"])
      
       # Initialize memo variable
       memo = None
      
       with tab1:
           memo_text = st.text_area(
               "Investment Memo",
               height=300,
               placeholder="Paste your investment memo here..."
           )
           if memo_text:
               memo = memo_text
              
       with tab2:
           uploaded_file = st.file_uploader(
               "Upload Memo Document",
               type=['txt', 'docx', 'pdf']
           )
           if uploaded_file:
               memo = self._read_document(uploaded_file)
               if memo:
                   st.success("Document successfully loaded!")
                   with st.expander("Preview Document Content"):
                       st.text(memo[:500] + "...")


       # Reference memo input
       st.subheader("Reference Content")
       reference_tab1, reference_tab2 = st.tabs(["Paste Reference", "Upload Reference"])
      
       reference_memo = None
       with reference_tab1:
           reference_text = st.text_area(
               "Paste a previous public memo for tone/style reference",
               height=200,
               placeholder="Paste a previous public memo here..."
           )
           if reference_text:
               reference_memo = reference_text
              
       with reference_tab2:
           reference_file = st.file_uploader(
               "Upload Reference Document",
               type=['txt', 'docx', 'pdf'],
               key="reference_uploader"
           )
           if reference_file:
               reference_memo = self._read_document(reference_file)
               if reference_memo:
                   st.success("Reference document loaded!")
      
       # Company URL input
       company_url = st.text_input(
           "Company Website URL",
           placeholder="https://company.com"
       )
      
       # Slider to control the target length of the final post
       length = st.slider(
           "Target Blog Length (words)",
           min_value=500,
           max_value=2000,
           value=1000,
           step=100
       )
      
       return {
           "memo": memo,
           "reference_memo": reference_memo,
           "company_url": company_url,
           "length": length
       }
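
A note on the design: Streamlit reruns the whole script from top to bottom on every interaction, so create_interface doesn’t need callbacks or state machinery. It simply returns whatever inputs are currently filled in, and main.py decides when to act on them.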

Since we allow document uploads, we need a helper function to read them:

Python
def _read_document(self, uploaded_file) -> Optional[str]:
       """Extract text from uploaded document."""
       if uploaded_file is None:
           return None
          
       try:
           file_extension = uploaded_file.name.split('.')[-1].lower()
          
           if file_extension == 'txt':
               return uploaded_file.getvalue().decode('utf-8')
              
           elif file_extension == 'docx':
               doc = docx.Document(io.BytesIO(uploaded_file.getvalue()))
               return '\n'.join([paragraph.text for paragraph in doc.paragraphs])
              
           elif file_extension == 'pdf':
               pdf_reader = PdfReader(io.BytesIO(uploaded_file.getvalue()))
               text = ''
               for page in pdf_reader.pages:
                   text += page.extract_text() + '\n'
               return text
              
           else:
               st.error(f"Unsupported file format: {file_extension}")
               return None
              
       except Exception as e:
           st.error(f"Error reading document: {str(e)}")
           return None

Sanitizing Sensitive Information

The first thing we want to do when our interface accepts a new investment memo is sanitize it and remove sensitive information.

If you were building this as an agent, you would skip the interface and trigger the agent here.

Instead of using a rule-based approach, we can just ask an LLM (Claude in this case) to read through it, remove sensitive information, and return a clean version.

You can use any other LLM here. A smaller, cheaper model like GPT-3.5 Turbo would probably be more cost-efficient, but I just prefer Claude when it comes to working with content.
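
For contrast, here’s roughly what a rule-based pass would look like. The patterns below are illustrative (not from the repo), and they show the core weakness of this approach: regexes catch formats like dollar amounts and percentages, but they miss meaning, like a customer name or an internal strategy discussion.

Python
import re


def rule_based_scrub(text: str) -> str:
    """Naive redaction via regex: catches formats, not context."""
    patterns = [
        (r"\$\s?\d[\d,.]*\s?(?:[KMB]|million|billion)?", "[AMOUNT]"),  # dollar figures
        (r"\b\d+(?:\.\d+)?\s?%", "[PERCENT]"),                         # percentages
        (r"\b\d+(?:\.\d+)?x\b", "[MULTIPLE]"),                         # revenue multiples
    ]
    for pattern, replacement in patterns:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text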

You’ll find this code in sanitizer.py:

Python
import anthropic
import streamlit as st


class MemoSanitizer:
   def __init__(self, api_key: str):
       """Initialize the Claude client."""
       self.client = anthropic.Anthropic(api_key=api_key)
  
   async def sanitize(self, memo_text: str) -> str:
       """Use Claude to identify and remove sensitive information."""
       try:
           message = self.client.messages.create(
               model="claude-3-5-sonnet-20241022",
               max_tokens=8192,
               temperature=0,
               system="You are an expert at identifying sensitive information in VC investment memos. Your task is to identify and remove sensitive information while preserving the key insights and analysis.",
               messages=[
                   {
                       "role": "user",
                       "content": [
                           {
                               "type": "text",
                               "text": f"""Please analyze this investment memo and create a version with all sensitive information removed.
                               Sensitive information includes but is not limited to:
                               - Specific financial metrics (revenue, growth rates, burn rate, etc.)
                               - Valuation details and cap table information
                               - Customer names and specific deal values
                               - Internal strategic discussions
                               - Detailed technical information not public
                               - Specific product roadmap details
                              
                               Memo:
                               {memo_text}
                              
                               Return ONLY the sanitized version, with no explanation or additional text."""
                           }
                       ]
                   }
               ]
           )
          
           # message.content is a list of content blocks; extract the text
           return message.content[0].text
          
       except Exception as e:
           st.error(f"Error sanitizing memo: {str(e)}")
           return ""

Gathering Public Information

You’ll notice our interface accepts a URL for the startup you’re investing in. We use Firecrawl to scrape the company’s public website and get more information about it to add to our marketing post.

If your investment memo already contains a lot of information about the company, you may not even need this.

All this code goes in the fetcher.py file:

Python
from firecrawl import FirecrawlApp
import streamlit as st


class StartupInfoFetcher:
   def __init__(self, api_key: str):
       """Initialize the Firecrawl client."""
       self.client = FirecrawlApp(api_key=api_key)
  
   async def fetch_startup_info(self, company_url: str) -> str:
       """Fetch website content using Firecrawl.
      
       Args:
           company_url (str): URL of the company website
          
       Returns:
           str: Website content in markdown format
       """
       try:
           response = self.client.scrape_url(
               url=company_url,
               params={
                   'formats': ['markdown']
               }
           )
          
           # Return the markdown content
           return response.get('markdown', '')
          
       except Exception as e:
           st.error(f"Error fetching website content: {str(e)}")
           return ""

Generating the Blog Post

Ok, now we have all the pieces we need to generate the final blog post: public information about the company from the fetcher, a clean investment memo from the sanitizer, and a reference memo from the interface.

With one more Claude call, we can generate the final blog post, using the reference memo to match your writing style.

As you can see in the generator.py file, most of the code is really just a well-crafted prompt:

Python
import anthropic
import streamlit as st


class BlogPostGenerator:
   def __init__(self, api_key: str):
       """Initialize the Claude client."""
       self.client = anthropic.Anthropic(api_key=api_key)
  
   async def generate_post(
       self,
       clean_memo: str,
       public_info: str,
       reference_memo: str,
       target_length: int
   ) -> str:
       """Generate a polished blog post using Claude."""
       try:
           message = self.client.messages.create(
               model="claude-3-5-sonnet-20241022",
               max_tokens=8192,
               temperature=0.7,
               system="You are an expert at writing compelling VC investment blog posts that share insights while maintaining confidentiality.",
               messages=[
                   {
                       "role": "user",
                       "content": [
                           {
                               "type": "text",
                               "text": f"""Create a compelling blog post about an investment, using the following information and guidelines:


                               Clean Investment Memo:
                               {clean_memo}


                               Public Information from Company Website:
                               {public_info}


                               Reference Memo (for tone/style):
                               {reference_memo}


                               Guidelines:
                               - Match the tone/style of the reference memo
                               - Target length: {target_length} words
                               - Focus on market insights and investment thesis
                               - Only include public information or high-level insights
                               - Structure the post with clear sections and engaging headlines
                              
                               Return ONLY the blog post, with no explanation or additional text."""
                           }
                       ]
                   }
               ]
           )
          
           # As in the sanitizer, extract the text from the content block list
           return message.content[0].text
          
       except Exception as e:
           st.error(f"Error generating blog post: {str(e)}")
           return ""

Displaying and Exporting the Blog Post

Back in our interface.py file, we want to display the generated blog post in the interface. Again, if you were building this as an autonomous agent, you could skip this step and publish directly to your website.

Here’s the function:

Python
def display_blog_post(self, blog_post):
       """Display the generated blog post with download options."""
       # Extract plain text content
       text_content = self._get_text_content(blog_post)
      
       # Display tabs for different views
       view_tab, download_tab = st.tabs(["View Blog Post", "Download Options"])
      
       with view_tab:
           # Display the markdown content
           st.markdown(text_content)
      
       with download_tab:
           # Create download buttons for different formats
           col1, col2 = st.columns(2)
          
           with col1:
               docx_bytes = self._create_downloadable_document(blog_post, 'docx')
               if docx_bytes:
                   st.download_button(
                       label="Download as DOCX",
                       data=docx_bytes,
                       file_name="blog_post.docx",
                       mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                   )
          
           with col2:
               txt_bytes = self._create_downloadable_document(blog_post, 'txt')
               if txt_bytes:
                   st.download_button(
                       label="Download as TXT",
                       data=txt_bytes,
                       file_name="blog_post.txt",
                       mime="text/plain"
                   )

The download buttons above rely on a helper that converts the post into the requested document format:

Python
def _create_downloadable_document(self, content, format: str) -> Optional[bytes]:
       """Convert content to downloadable document format."""
       try:
           # Get plain text content
           text_content = self._get_text_content(content)
          
           if format == 'docx':
               doc = docx.Document()
               # Split content by newlines and add each paragraph
               for paragraph in text_content.split('\n'):
                   if paragraph.strip():  # Only add non-empty paragraphs
                       doc.add_paragraph(paragraph)
              
               # Save to bytes
               doc_bytes = io.BytesIO()
               doc.save(doc_bytes)
               doc_bytes.seek(0)
               return doc_bytes.getvalue()
              
           elif format == 'txt':
               return text_content.encode('utf-8')
              
       except Exception as e:
           st.error(f"Error creating {format} document: {str(e)}")
           return None
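
Both functions above call _get_text_content, which isn’t shown in the post. If you’re rebuilding this from scratch, a minimal version (my assumption of its behavior; the original lives in the repo) just needs to handle plain strings and the Anthropic SDK’s content-block lists:

Python
def _get_text_content(self, content) -> str:
    """Normalize a Claude response (or a plain string) to text."""
    if isinstance(content, str):
        return content
    # Anthropic responses are lists of content blocks with a .text attribute
    return "\n".join(
        block.text for block in content if hasattr(block, "text")
    )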

Tying It All Together

And that’s it!

Our main.py file ties it all together:

Python
import os
import asyncio
from dotenv import load_dotenv
import streamlit as st
from interface import MemoConverter
from fetcher import StartupInfoFetcher
from sanitizer import MemoSanitizer
from generator import BlogPostGenerator


class MemoToBlogConverter:
   def __init__(self):
       """Initialize the main application components."""
       load_dotenv()
      
       self.interface = MemoConverter()
       self.fetcher = StartupInfoFetcher(os.getenv("FIRECRAWL_API_KEY"))
       self.sanitizer = MemoSanitizer(os.getenv("ANTHROPIC_API_KEY"))
       self.generator = BlogPostGenerator(os.getenv("ANTHROPIC_API_KEY"))
  
   async def process_memo(self):
       """Process the memo and generate a blog post."""
       input_data = self.interface.create_interface()
      
       if st.button("Generate Blog Post"):
           if not input_data["memo"]:
               st.error("Please provide an investment memo.")
               return
          
           if not input_data["company_url"]:
               st.error("Please provide the company website URL.")
               return
          
           with st.spinner("Processing your memo..."):
               # Execute the conversion pipeline
               public_info = await self.fetcher.fetch_startup_info(
                   input_data["company_url"]
               )
              
               clean_memo = await self.sanitizer.sanitize(input_data["memo"])
              
               blog_post = await self.generator.generate_post(
                   clean_memo,
                   public_info,
                   input_data["reference_memo"],
                   input_data["length"]
               )
              
               st.success("Blog post generated successfully!")
               self.interface.display_blog_post(blog_post)


if __name__ == "__main__":
   converter = MemoToBlogConverter()
   asyncio.run(converter.process_memo())

Running the Application

All the instructions to clone and run the code are on my GitHub. To use the application:

1. Set up your environment variables in a .env file:

Bash
ANTHROPIC_API_KEY=your_anthropic_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
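
2. Install the dependencies:

Bash
pip install -r requirements.txt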

3. Run the Streamlit app:

Bash
streamlit run src/main.py

Benefits and Results

This application provides several key benefits:

1. Time Savings: What used to take hours can now be done in minutes

2. Consistency: Generated posts maintain your writing style across publications

3. Safety: Reduced risk of accidentally sharing sensitive information

4. Flexibility: Support for various input formats and export options

5. Scalability: Easy to process multiple memos efficiently

Conclusion

Ok, that was a lot to take in, but you can simply clone my repo and run the code yourself. Just don’t forget to add your API keys!

The modular architecture makes it easy to enhance and customize the application as needs evolve. As I mentioned before, you can turn this into a fully autonomous agent.

If you need any help or advice, or you want to set up agents at your fund, book a call with me.