I was speaking to a VC last month and they mentioned their marketing strategy is to publish their investment memos as blog posts.
It’s really one of the best ways for a VC to grow their brand. Bessemer and other big funds do it often.
The only problem is it takes time. Smaller VCs don’t have the bandwidth or resources to turn their private investment memos into public ones.
The VC I spoke to said it took hours to strip sensitive information from each memo and rewrite it in a consistent style for the blog. As a result, they weren’t able to publish regularly.
Well, AI can solve that. Today, we’ll build an AI agent that automates this process using Claude and modern web scraping tools.
The Solution
I built a fully autonomous agent for the VC that gets triggered whenever they create a new investment memo and automatically publishes a draft post to their blog. The entire workflow runs invisibly inside their existing operations.
For the purposes of this blog post, I’m going to simplify it. Instead of automatically triggering it (since I don’t know what your operations look like), we’ll create an interface with Streamlit where you can manually trigger it.
Our interface will:
- Accept investment memos in various formats (text, PDF, DOCX)
- Use Claude to identify and remove sensitive information
- Scrape the company’s website for public information
- Generate a polished blog post matching your writing style
- Provide easy export options for the final content
Implementation
Setting Up the Project
The full project is open source on my GitHub for you to clone and run locally. From there, you can modify the code and customize it for your fund.
You can also take the core agent logic and have it triggered by your existing workflow and even publish to your website.
I’ve set up the files with this structure:
memo-converter/
├── requirements.txt
├── README.md
├── .env
└── src/
    ├── __init__.py
    ├── main.py
    ├── interface.py
    ├── fetcher.py
    ├── sanitizer.py
    └── generator.py
interface.py contains the code for our Streamlit interface. You can upload your investment memo, enter the URL of the company you’re investing in, and upload a blog post whose style you want to match (ideally one of your own).
fetcher.py fetches public information about the company you’re investing in, sanitizer.py cleans your investment memo, and generator.py creates the blog post.
At the end, the interface displays the final post.
Building the Interface
The interface.py file is where we’ll create our Streamlit interface. Streamlit is an open-source Python package that makes it easy to build visual interfaces for data and AI.
Our interface will provide a clean, intuitive way to input memos and reference content. As you can see in the code, we use tabs to organize different input methods and provide document preview functionality. We’re also going to add a little slider to control how long we want the final blog post to be.
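Before we get to that, a quick note on imports. The interface.py snippets below assume the following at the top of the file, and the methods themselves live on the MemoConverter class that main.py imports. (I’m using pypdf here for PDF parsing; PyPDF2 exposes the same PdfReader class if you prefer it.)

import io
from typing import Dict, Optional

import docx  # python-docx
import streamlit as st
from pypdf import PdfReader  # or: from PyPDF2 import PdfReader

Now, the create_interface method: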
def create_interface(self) -> Dict:
    st.title("Investment Memo to Marketing Blog Post Converter")

    # Create tabs for different input methods
    tab1, tab2 = st.tabs(["Paste Memo", "Upload Document"])

    # Initialize memo variable
    memo = None

    with tab1:
        memo_text = st.text_area(
            "Investment Memo",
            height=300,
            placeholder="Paste your investment memo here..."
        )
        if memo_text:
            memo = memo_text

    with tab2:
        uploaded_file = st.file_uploader(
            "Upload Memo Document",
            type=['txt', 'docx', 'pdf']
        )
        if uploaded_file:
            memo = self._read_document(uploaded_file)
            if memo:
                st.success("Document successfully loaded!")
                with st.expander("Preview Document Content"):
                    st.text(memo[:500] + "...")

    # Reference memo input
    st.subheader("Reference Content")
    reference_tab1, reference_tab2 = st.tabs(["Paste Reference", "Upload Reference"])

    reference_memo = None

    with reference_tab1:
        reference_text = st.text_area(
            "Paste a previous public memo for tone/style reference",
            height=200,
            placeholder="Paste a previous public memo here..."
        )
        if reference_text:
            reference_memo = reference_text

    with reference_tab2:
        reference_file = st.file_uploader(
            "Upload Reference Document",
            type=['txt', 'docx', 'pdf'],
            key="reference_uploader"
        )
        if reference_file:
            reference_memo = self._read_document(reference_file)
            if reference_memo:
                st.success("Reference document loaded!")

    # Company URL input
    company_url = st.text_input(
        "Company Website URL",
        placeholder="https://company.com"
    )

    # Slider to control the target length of the generated post
    length = st.slider(
        "Target Blog Length (words)",
        min_value=500,
        max_value=2000,
        value=1000,
        step=100
    )

    return {
        "memo": memo,
        "reference_memo": reference_memo,
        "company_url": company_url,
        "length": length
    }
Since we allow document uploads, we need a helper function that extracts the text from them:
def _read_document(self, uploaded_file) -> Optional[str]:
    """Extract text from uploaded document."""
    if uploaded_file is None:
        return None
    try:
        file_extension = uploaded_file.name.split('.')[-1].lower()
        if file_extension == 'txt':
            return uploaded_file.getvalue().decode('utf-8')
        elif file_extension == 'docx':
            doc = docx.Document(io.BytesIO(uploaded_file.getvalue()))
            return '\n'.join([paragraph.text for paragraph in doc.paragraphs])
        elif file_extension == 'pdf':
            pdf_reader = PdfReader(io.BytesIO(uploaded_file.getvalue()))
            text = ''
            for page in pdf_reader.pages:
                text += page.extract_text() + '\n'
            return text
        else:
            st.error(f"Unsupported file format: {file_extension}")
            return None
    except Exception as e:
        st.error(f"Error reading document: {str(e)}")
        return None
Sanitizing Sensitive Information
The first thing we want to do when our interface accepts a new investment memo is sanitize it and remove sensitive information.
If you were building this as an agent, you would skip the interface and trigger the agent here.
Instead of using a rule-based approach, we can just ask an LLM (Claude in this case) to read through it, remove sensitive information, and return a clean version.
You can use any other LLM. A smaller model like GPT-3.5 Turbo would probably be cheaper and faster, but I just prefer Claude when it comes to working with content.
You’ll find this code in sanitizer.py:
import anthropic
import streamlit as st

class MemoSanitizer:
    def __init__(self, api_key: str):
        """Initialize the Claude client."""
        self.client = anthropic.Anthropic(api_key=api_key)

    async def sanitize(self, memo_text: str) -> str:
        """Use Claude to identify and remove sensitive information."""
        try:
            message = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=8192,
                temperature=0,
                system="You are an expert at identifying sensitive information in VC investment memos. Your task is to identify and remove sensitive information while preserving the key insights and analysis.",
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": f"""Please analyze this investment memo and create a version with all sensitive information removed.

Sensitive information includes but is not limited to:
- Specific financial metrics (revenue, growth rates, burn rate, etc.)
- Valuation details and cap table information
- Customer names and specific deal values
- Internal strategic discussions
- Detailed technical information not public
- Specific product roadmap details

Memo:
{memo_text}

Return ONLY the sanitized version, with no explanation or additional text."""
                            }
                        ]
                    }
                ]
            )
            # message.content is a list of content blocks; return the text
            # of the first block so callers get a plain string
            return message.content[0].text
        except Exception as e:
            st.error(f"Error sanitizing memo: {str(e)}")
            return ""
Gathering Public Information
You’ll notice our interface accepts a URL for the startup you’re investing in. We use Firecrawl to scrape the company’s public website and get more information about it to add to our marketing post.
If your investment memo already contains a lot of information about the company, you may not even need this.
All this code goes in the fetcher.py file:
from firecrawl import FirecrawlApp
import streamlit as st

class StartupInfoFetcher:
    def __init__(self, api_key: str):
        """Initialize the Firecrawl client."""
        self.client = FirecrawlApp(api_key=api_key)

    async def fetch_startup_info(self, company_url: str) -> str:
        """Fetch website content using Firecrawl.

        Args:
            company_url (str): URL of the company website

        Returns:
            str: Website content in markdown format
        """
        try:
            response = self.client.scrape_url(
                url=company_url,
                params={
                    'formats': ['markdown']
                }
            )
            # Return the markdown content
            return response.get('markdown', '')
        except Exception as e:
            st.error(f"Error fetching website content: {str(e)}")
            return ""
Generating the Blog Post
Ok, now we have all the pieces we need to generate our final blog post. We got public information about the company from the fetcher, a clean investment memo from the sanitizer, and a reference memo from the interface.
Using another Claude instance, we can generate the final blog post, using the reference memo to match your writing style.
As you can see in the generator.py file, most of the code is really just a well-crafted prompt:
import anthropic
import streamlit as st

class BlogPostGenerator:
    def __init__(self, api_key: str):
        """Initialize the Claude client."""
        self.client = anthropic.Anthropic(api_key=api_key)

    async def generate_post(
        self,
        clean_memo: str,
        public_info: str,
        reference_memo: str,
        target_length: int
    ) -> str:
        """Generate a polished blog post using Claude."""
        try:
            message = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=8192,
                temperature=0.7,
                system="You are an expert at writing compelling VC investment blog posts that share insights while maintaining confidentiality.",
                messages=[
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": f"""Create a compelling blog post about an investment, using the following information and guidelines:

Clean Investment Memo:
{clean_memo}

Public Information from Company Website:
{public_info}

Reference Memo (for tone/style):
{reference_memo}

Guidelines:
- Match the tone/style of the reference memo
- Target length: {target_length} words
- Focus on market insights and investment thesis
- Only include public information or high-level insights
- Structure the post with clear sections and engaging headlines

Return ONLY the blog post, with no explanation or additional text."""
                            }
                        ]
                    }
                ]
            )
            # As in the sanitizer, pull the text out of the first content block
            return message.content[0].text
        except Exception as e:
            st.error(f"Error generating blog post: {str(e)}")
            return ""
Display Blog Post and Allow Document Export
Back in our interface.py file, we want the blog post to display in the interface. Again, if you were building this as an autonomous agent, you could skip this step and publish directly to your website.
Here’s the function:
def display_blog_post(self, blog_post):
    """Display the generated blog post with download options."""
    # Extract plain text content
    text_content = self._get_text_content(blog_post)

    # Display tabs for different views
    view_tab, download_tab = st.tabs(["View Blog Post", "Download Options"])

    with view_tab:
        # Display the markdown content
        st.markdown(text_content)

    with download_tab:
        # Create download buttons for different formats
        col1, col2 = st.columns(2)

        with col1:
            docx_bytes = self._create_downloadable_document(blog_post, 'docx')
            if docx_bytes:
                st.download_button(
                    label="Download as DOCX",
                    data=docx_bytes,
                    file_name="blog_post.docx",
                    mime="application/vnd.openxmlformats-officedocument.wordprocessingml.document"
                )

        with col2:
            txt_bytes = self._create_downloadable_document(blog_post, 'txt')
            if txt_bytes:
                st.download_button(
                    label="Download as TXT",
                    data=txt_bytes,
                    file_name="blog_post.txt",
                    mime="text/plain"
                )
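This method leans on a _get_text_content helper that isn’t shown above. A minimal sketch might look like this, assuming the input is either a plain string or a list of Claude content blocks (each carrying a .text attribute):

def _get_text_content(self, content) -> str:
    """Normalize content to a plain string (minimal sketch)."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Join the text of any Claude content blocks
        return '\n'.join(
            block.text for block in content if hasattr(block, 'text')
        )
    return str(content)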
The download buttons above rely on another helper, which converts the content into a downloadable document:
def _create_downloadable_document(self, content, format: str) -> Optional[bytes]:
    """Convert content to downloadable document format."""
    try:
        # Get plain text content
        text_content = self._get_text_content(content)

        if format == 'docx':
            doc = docx.Document()
            # Split content by newlines and add each paragraph
            for paragraph in text_content.split('\n'):
                if paragraph.strip():  # Only add non-empty paragraphs
                    doc.add_paragraph(paragraph)
            # Save to bytes
            doc_bytes = io.BytesIO()
            doc.save(doc_bytes)
            doc_bytes.seek(0)
            return doc_bytes.getvalue()
        elif format == 'txt':
            return text_content.encode('utf-8')
    except Exception as e:
        st.error(f"Error creating {format} document: {str(e)}")
    return None
Tie It All Together
And that’s it!
Our main.py file ties it all together:
import os
import asyncio
from dotenv import load_dotenv
import streamlit as st
from interface import MemoConverter
from fetcher import StartupInfoFetcher
from sanitizer import MemoSanitizer
from generator import BlogPostGenerator

class MemoToBlogConverter:
    def __init__(self):
        """Initialize the main application components."""
        load_dotenv()
        self.interface = MemoConverter()
        self.fetcher = StartupInfoFetcher(os.getenv("FIRECRAWL_API_KEY"))
        self.sanitizer = MemoSanitizer(os.getenv("ANTHROPIC_API_KEY"))
        self.generator = BlogPostGenerator(os.getenv("ANTHROPIC_API_KEY"))

    async def process_memo(self):
        """Process the memo and generate a blog post."""
        input_data = self.interface.create_interface()

        if st.button("Generate Blog Post"):
            if not input_data["memo"]:
                st.error("Please provide an investment memo.")
                return
            if not input_data["company_url"]:
                st.error("Please provide the company website URL.")
                return

            with st.spinner("Processing your memo..."):
                # Execute the conversion pipeline
                public_info = await self.fetcher.fetch_startup_info(
                    input_data["company_url"]
                )
                clean_memo = await self.sanitizer.sanitize(input_data["memo"])
                blog_post = await self.generator.generate_post(
                    clean_memo,
                    public_info,
                    input_data["reference_memo"],
                    input_data["length"]
                )

            st.success("Blog post generated successfully!")
            self.interface.display_blog_post(blog_post)

if __name__ == "__main__":
    converter = MemoToBlogConverter()
    asyncio.run(converter.process_memo())
Running the Application
All the instructions to clone and run the code are on my GitHub. To use the application:
1. Set up your environment variables:
ANTHROPIC_API_KEY=your_anthropic_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
2. Run the Streamlit app:
streamlit run src/main.py
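If you’re setting up the environment by hand rather than cloning the repo, a requirements.txt along these lines should cover the imports used in this post (check the repo for the exact pinned versions):

# requirements.txt -- inferred from the imports above
streamlit
anthropic
firecrawl-py
python-docx
pypdf
python-dotenv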
Benefits and Results
This application provides several key benefits:
1. Time Savings: What used to take hours can now be done in minutes
2. Consistency: Generated posts maintain your writing style across publications
3. Safety: Reduced risk of accidentally sharing sensitive information
4. Flexibility: Support for various input formats and export options
5. Scalability: Easy to process multiple memos efficiently
Conclusion
Ok, that was a lot to take in, but you can simply clone my repo and run the code yourself. Just don’t forget to add your API keys!
The modular architecture makes it easy to enhance and customize the application as needs evolve. As I mentioned before, you can turn this into a fully autonomous agent.
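To give you a head start on that, here’s a rough sketch of what the headless version could look like: the same three components, minus Streamlit, wrapped in one function that your trigger (a webhook, a cron job, an internal tool) can call. Treat it as a starting point, not a drop-in module from the repo; you’d also want to swap the st.error calls in the components for proper logging.

# agent.py -- hypothetical headless entry point (not in the repo as-is).
# Reuses the fetcher/sanitizer/generator classes without the Streamlit UI.
import asyncio
import os
from dotenv import load_dotenv
from fetcher import StartupInfoFetcher
from sanitizer import MemoSanitizer
from generator import BlogPostGenerator

async def memo_to_post(memo_text: str, company_url: str,
                       reference_memo: str, length: int = 1000) -> str:
    """Run the full pipeline and return the blog post as a string."""
    load_dotenv()
    fetcher = StartupInfoFetcher(os.getenv("FIRECRAWL_API_KEY"))
    sanitizer = MemoSanitizer(os.getenv("ANTHROPIC_API_KEY"))
    generator = BlogPostGenerator(os.getenv("ANTHROPIC_API_KEY"))

    public_info = await fetcher.fetch_startup_info(company_url)
    clean_memo = await sanitizer.sanitize(memo_text)
    return await generator.generate_post(
        clean_memo, public_info, reference_memo, length
    )

# Example: call this from your webhook handler or scheduler
# post = asyncio.run(memo_to_post(memo, "https://company.com", reference))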
If you need any help or advice, or you want to set up agents at your fund, book a call with me.