SurfSense: Your Personal AI Research Agent That Never Forgets a Webpage

If you’ve ever found yourself drowning in a sea of bookmarks, scattered notes, and random screenshots while trying to recall that one article or tutorial you swore you saved, SurfSense might just be the lifebuoy you’ve been waiting for. Positioned as a hybrid between Google’s NotebookLM and the AI-powered search assistant Perplexity, SurfSense aims to deliver a powerful, customizable research assistant that integrates seamlessly with your personal knowledge base — and it’s open source and self-hostable, to boot.

The dream here is simple: surf the web as usual, save any content you want (be it a social media chat, calendar invite, email, or that elusive recipe), and then ask SurfSense questions about what you’ve saved. No more "brain freeze" when trying to remember where you saw that snippet or who mentioned that crucial fact. As one Reddit community member put it, SurfSense acts like a knowledge graph brain for your entire web browsing history.

Unlike cloud-only solutions, SurfSense is designed to be self-hosted and privacy-conscious. It works with local large language models (LLMs), including models served through Ollama, which means you don’t have to send your sensitive data to the cloud if you don’t want to. It supports over 150 LLMs and 6000+ embedding models, letting you tailor your AI research agent to your exact needs and budget constraints.

The backend tech stack reads like a developer’s dream: FastAPI for speedy API services, PostgreSQL enhanced with the pgvector extension for efficient vector similarity search, SQLAlchemy and Alembic for ORM and migrations, and LangChain for orchestrating LLM workflows. The frontend combines Next.js, React, TypeScript, and a slew of UI libraries like Tailwind CSS and Framer Motion, delivering a snappy and modern user experience.
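To make "vector similarity search" a little more concrete, here is a minimal pure-Python sketch of the idea. It is not SurfSense's code: the tiny three-dimensional "embeddings", the DOCS list, and the function names are all made up for illustration. In a real deployment the vectors would come from an embedding model, and pgvector would do the ranking inside PostgreSQL (its `<=>` operator computes cosine distance).

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, documents, k=2):
    """Return the titles of the k documents closest to the query vector."""
    ranked = sorted(documents, key=lambda doc: cosine_distance(query, doc[1]))
    return [title for title, _ in ranked[:k]]

# Hypothetical saved pages with toy 3-dimensional embeddings.
DOCS = [
    ("pasta recipe", [0.9, 0.1, 0.0]),
    ("fastapi tutorial", [0.1, 0.9, 0.2]),
    ("postgres indexing notes", [0.0, 0.8, 0.6]),
]

# A query vector that points roughly in the "backend development" direction.
print(nearest([0.0, 1.0, 0.3], DOCS))
# → ['fastapi tutorial', 'postgres indexing notes']
```

The recipe lands last because its vector points in a different direction from the query, which is the whole trick: pages are ranked by meaning-as-geometry rather than by matching keywords.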

The SurfSense browser extension, built on the Plasmo framework, is a key part of the experience. It lets you save snapshots of webpages, even those behind authentication walls, directly into your knowledge base. Rather than fetching pages from a server the way a typical scraper does, the extension reads content from the DOM of the page already open in your browser, so what gets captured is what you actually see. You can dynamically bookmark content from any page, clear inactive history sessions, and batch-save your browsing history.

SurfSense is not a one-click cloud service; it’s a DIY research assistant for serious users who want control and customization. The recommended deployment method is via Docker and Docker Compose, which ensures consistency across environments. You’ll need to:

Install the PostgreSQL pgvector extension to handle vector search.
Obtain API keys, including a Google OAuth client ID/secret for authentication (SurfSense currently requires Google OAuth).
Get an Unstructured.io API key to parse uploaded documents.
Set up Firecrawl API keys for web crawling (with Playwright support coming soon).
Configure various environment variables to specify embedding models, reranker models, and LLMs routed through LiteLLM.
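With that many keys and settings in play, a quick pre-flight check can save a confusing first boot. The snippet below is only a sketch: the variable names in REQUIRED_SETTINGS are hypothetical placeholders chosen to mirror the checklist above, not the names SurfSense actually reads, so swap in whatever the project's example environment file specifies.

```python
import os

# Hypothetical names that mirror the setup checklist; the real project
# may use different ones. Check its example environment file.
REQUIRED_SETTINGS = [
    "GOOGLE_OAUTH_CLIENT_ID",
    "GOOGLE_OAUTH_CLIENT_SECRET",
    "UNSTRUCTURED_API_KEY",
    "FIRECRAWL_API_KEY",
    "EMBEDDING_MODEL",
]

def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are absent or blank in the given mapping."""
    return [name for name in required if not env.get(name, "").strip()]

# Check the current process environment and report anything left unset.
missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

Because missing_settings takes any mapping, the same function can be pointed at a dictionary parsed from a file instead of os.environ.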

SurfSense is still marked as “actively being developed” and not quite production-ready, but it already offers a compelling toolkit that blends the best of NotebookLM’s private knowledge management and Perplexity’s AI-powered search. It stands out for its:

Privacy-first approach with local LLM compatibility.
Rich integrations across popular productivity platforms and content types.
Advanced AI research features like hierarchical RAG and hybrid search.
Open source community-driven development, with a Discord channel welcoming feedback and contributions.
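"Hybrid search" generally means running a semantic (vector) search and a keyword (full-text) search side by side, then merging the two ranked lists. One widely used way to merge them is reciprocal rank fusion, sketched below. This illustrates the general technique under assumptions, not SurfSense's implementation: the document IDs are invented, and k=60 is simply a conventional default for the smoothing constant.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by more than one retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from two retrievers for the same question.
semantic_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]

print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
# → ['b', 'a', 'd', 'c']
```

Documents "a" and "b" appear in both lists and so outrank "c" and "d", which each impressed only one retriever. Since the method only looks at rank positions, it does not need the two searches to produce comparable scores.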

If you’re a researcher, developer, or power user tired of losing track of your digital breadcrumbs, SurfSense is worth a look. Its GitHub repo is open for exploration and contributions, and with active development and community engagement, it promises to grow into a robust, privacy-respecting AI research companion.

Never forget what you see on the internet again. SurfSense might just be the personal AI librarian your brain has been begging for.