Turn your documents into agent memory. MCP server with hybrid semantic + keyword search over emails, PDFs, notes, and pa
Your second brain, wired for AI.
Textrawl is a personal knowledge server with persistent memory, searchable documents, and proactive insights. Import your emails, PDFs, and notes — then ask questions, recall past conversations, and discover connections you missed. Access it all through MCP, the web dashboard, or the REST API.
┌──────────────┐         ┌───────────────────────────────────────────┐
│              │         │             Your Second Brain             │
│  MCP Client  │◄───────►│                                           │
│  (Claude,    │   MCP   │  Documents    Memory    Conversations     │
│   ChatGPT)   │         │  ┌────────┐  ┌───────┐  ┌────────────┐    │
└──────────────┘         │  │ Emails │  │ Facts │  │Past sessions│   │
                         │  │ PDFs   │  │People │  │ Summaries  │    │
┌──────────────┐         │  │ Notes  │  │ Links │  │ Context    │    │
│  Dashboard   │◄───────►│  └───┬────┘  └──┬────┘  └─────┬──────┘    │
│  (Web UI)    │  REST   │      └──────────┼─────────────┘           │
└──────────────┘         │                 ▼                         │
                         │   ┌───────────────────────────┐           │
                         │   │  Hybrid Search + Fusion   │           │
                         │   └───────────────────────────┘           │
                         │                 │                         │
                         │                 ▼                         │
                         │     Insights · Daily Briefing             │
                         └───────────────────────────────────────────┘
                                               ▲
                                               │
                                    ┌──────────┴──────────┐
                                    │                     │
                               Desktop App            CLI Tools
                              (drag & drop)        (batch import)
Beyond keyword search. Most search tools only match exact words. Textrawl combines semantic understanding (finds "automobile" when you search "car") with traditional keyword matching — so you get relevant results without missing exact phrases.
Your data, your choice. Use OpenAI's embeddings for best accuracy, Google AI for multimodal support, or run locally with Ollama and local Postgres to keep document text and embeddings on your machine.
Import everything. Emails from Gmail exports, PDFs from your research, saved web pages, images, audio files, Google Takeout archives — Textrawl converts them into searchable knowledge where the relevant converter/provider is configured.
| Feature | Description |
|---|---|
| Hybrid Search | Vector similarity + full-text search with Reciprocal Rank Fusion |
| Persistent Memory | Remember facts about people, projects, and concepts across sessions |
| Conversation Recall | Save and query past conversation context across sessions |
| Proactive Insights | Automatically discover connections, patterns, and outliers in your knowledge |
| Daily Briefing | Summary of recent additions, new insights, and resurfaced knowledge |
| Unified RAG | ask tool searches documents, memory, and conversations in one query |
| Web Dashboard | Command center with knowledge explorer, timeline, agent orchestration, and applets |
| Multimodal | Process images (Claude vision) and audio (Whisper transcription) alongside documents |
| Desktop App | Drag-and-drop file conversion and upload (macOS, Windows, Linux) |
| Multi-Format | PDF, DOCX, XLSX, PPTX, HTML, MBOX/EML emails, Google Takeout |
| MCP + REST + WebSocket | MCP tools, REST API, and real-time WebSocket events |
| Agent Discovery | A2A protocol at /.well-known/agent.json for agent-to-agent interaction |
| Flexible Embeddings | OpenAI, Google AI, or Ollama (free, local) |
| Smart Chunking | Paragraph-aware splitting with overlap for context |
| CLI Tools | Batch processing for large archives |
| Cloud Ready | Deploy to Docker, Cloud Run, or any container platform; large uploads require GCS/Cloud Tasks configuration |
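The "Smart Chunking" row can be pictured with a short sketch. This is not Textrawl's implementation: the function name `chunk_paragraphs`, the character budget, and the choice to overlap by whole paragraphs are all assumptions made for illustration.

```python
def chunk_paragraphs(text: str, max_chars: int = 1000, overlap: int = 1) -> list:
    """Group paragraphs into chunks of roughly max_chars, repeating the last
    `overlap` paragraphs of each chunk at the start of the next one."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward as shared context.
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting only at paragraph boundaries keeps sentences intact, and the repeated paragraph gives each chunk some of the context that preceded it. A single paragraph longer than `max_chars` is kept whole in this sketch rather than split further.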
Textrawl is self-hosted, but data leaves your machine when you configure cloud services. Document text, chunks, embeddings, extracted memories, conversation summaries, images, or audio may be sent to providers such as OpenAI/Google embeddings, Anthropic/OpenAI/Google extraction, Neon/Supabase/RDS, Cloud Run, or GCS. For sensitive data, prefer Ollama/local Postgres and disable cloud LLM extraction/insights.
git clone https://github.com/jeffgreendesign/textrawl.git
cd textrawl
pnpm install
pnpm setup # Interactive setup for credentials
pnpm dev # Start the server
Create a free project at neon.tech
Copy the pooled connection string from the Neon dashboard into DATABASE_URL in .env
Run the base schema against your database:
psql $DATABASE_URL -f scripts/setup-db.sql
(Use setup-db-ollama.sql for Ollama / setup-db-ollama-v2.sql for nomic-embed-text-v2-moe / setup-db-google.sql for Google AI)
(Optional) For memory tools: psql $DATABASE_URL -f scripts/setup-db-memory.sql
(Optional) For conversation tools: psql $DATABASE_URL -f scripts/setup-db-conversation.sql
(Optional) For Row Level Security hardening: psql $DATABASE_URL -f scripts/security-rls.sql
Add to your Claude config (~/Library/Application Support/Claude/claude_desktop_config.json). Create this file if it doesn't exist:
{
"mcpServers": {
"textrawl": {
"command": "npx",
"args": [
"mcp-remote",
"http://localhost:3000/mcp",
"--header",
"Accept: application/json, text/event-stream"
]
}
}
}
Note: Requires Node.js 22+. If using nvm, ensure your default is set: nvm alias default 22
If you've set API_BEARER_TOKEN in .env, add the auth header:
"--header",
"Authorization: Bearer <your-token>"
Restart Claude Desktop. Textrawl's tools should now be available.
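If you prefer to generate the config entry rather than edit JSON by hand, a small script along these lines could work. The helper name `textrawl_mcp_entry` is hypothetical; the command, arguments, and headers are the ones shown above.

```python
import json
from typing import Optional


def textrawl_mcp_entry(url: str = "http://localhost:3000/mcp",
                       token: Optional[str] = None) -> dict:
    """Build the mcpServers entry, adding the auth header only when a token is given."""
    args = ["mcp-remote", url, "--header", "Accept: application/json, text/event-stream"]
    if token:
        args += ["--header", f"Authorization: Bearer {token}"]
    return {"mcpServers": {"textrawl": {"command": "npx", "args": args}}}


# Print JSON that can be merged into claude_desktop_config.json.
print(json.dumps(textrawl_mcp_entry(token="<your-token>"), indent=2))
```

The script only prints the entry. Merging it into an existing config file that already lists other servers is left to you.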
ChatGPT Desktop supports MCP servers natively (Pro/Plus required):
- Server URL: http://localhost:3000/mcp
- Authorization: Bearer <your-token> header (if API_BEARER_TOKEN is set)

See OpenAI MCP documentation for details.
Imported documents, extracted memories, and conversation summaries are stored in your configured database/storage until deleted. Treat a Textrawl server as single-tenant unless you have added your own user isolation. Set API_BEARER_TOKEN, restrict CORS with ALLOWED_ORIGINS, and avoid importing third-party or private data without consent. Use forget_entity and delete_conversation to remove memory/conversation data, and list_documents/update_document to audit imported documents.
Option A: Desktop App (easiest)
pnpm desktop:dev
Drag files onto the window to convert and upload.
Option B: CLI (for batch imports)
pnpm convert -- mbox ~/Mail/archive.mbox
pnpm upload -- ./converted/
| Guide | Description |
|---|---|
| Database Sizing | Vector dimensions, index counts, and storage estimates by embedding provider |
| CLI Tools | Batch conversion and upload from command line |
| Security | Row Level Security and access controls |
| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | Neon (or any PostgreSQL) pooled connection string |
| DATABASE_URL_UNPOOLED | No | Direct connection for schema migrations (optional) |
| EMBEDDING_PROVIDER | No | openai (default), ollama, or google |
| OPENAI_API_KEY | If OpenAI | For text-embedding-3-small (1536d) |
| OLLAMA_BASE_URL | If Ollama | Default: http://localhost:11434 |
| OLLAMA_MODEL | If Ollama | Default: nomic-embed-text |
| GOOGLE_AI_API_KEY | If Google | For gemini-embedding-2-preview (3072d) |
| GOOGLE_EMBEDDING_MODEL | If Google | Default: gemini-embedding-2-preview |
| API_BEARER_TOKEN | Prod only | Min 32 chars (openssl rand -hex 32) |
| PORT | No | Default: 3000 |
| LOG_LEVEL | No | debug, info, warn, error |
| ALLOWED_ORIGINS | No | Comma-separated CORS origins |
| ENABLE_MEMORY | No | Enable memory tools (default: true); requires setup-db-memory.sql |
| ENABLE_CONVERSATIONS | No | Enable conversation memory tools (default: true); requires setup-db-conversation.sql |
| ENABLE_INSIGHTS | No | Enable proactive insight tools (default: true) |
| ENABLE_MEMORY_EXTRACTION | No | Enable LLM-based memory extraction (default: false) |
| ANTHROPIC_API_KEY | If extraction | Required for extract_memories tool |
| EXTRACTION_MODEL | No | Model for extraction (default: claude-haiku-4-5-20251001) |
| INSIGHT_MODEL | No | Model for insight synthesis (default: claude-sonnet-4-6) |
| COMPACT_RESPONSES | No | Token-efficient responses (default: true) |
| CHUNKING_MODE | No | fixed (default) or semantic (embedding-based splits) |
| SEMANTIC_SIMILARITY_THRESHOLD | No | Semantic split sensitivity 0–1 (default: 0.5) |
| REDIS_URL | No | Redis URL for shared rate limiting across instances |
| GOOGLE_CLIENT_ID | OAuth | OAuth 2.0 client ID (all four OAuth vars required together) |
| GOOGLE_CLIENT_SECRET | OAuth | OAuth 2.0 client secret |
| OAUTH_JWT_SECRET | OAuth | Min 32-char secret for JWT signing |
| OAUTH_ALLOWED_EMAILS | OAuth | Comma-separated email allowlist (optional) |
| OAUTH_SERVER_URL | OAuth | Public server URL for OAuth redirect |
| PG_REPORT_DIR | No | pg_analyze report directory (default: ./reports/pg-analysis) |
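A quick pre-flight check can catch the most common misconfigurations from the table above before starting the server. The sketch below is a standalone helper, not part of Textrawl: `validate_env` and `new_bearer_token` are hypothetical names, and the way boolean flags are parsed is an assumption.

```python
import secrets

VALID_PROVIDERS = ("openai", "ollama", "google")
# API key each provider needs, per the table above (Ollama needs none).
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "google": "GOOGLE_AI_API_KEY"}


def validate_env(env: dict) -> list:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = []
    if not env.get("DATABASE_URL"):
        problems.append("DATABASE_URL is required")

    provider = env.get("EMBEDDING_PROVIDER", "openai")
    if provider not in VALID_PROVIDERS:
        problems.append("EMBEDDING_PROVIDER must be one of " + ", ".join(VALID_PROVIDERS))
    elif provider in PROVIDER_KEYS and not env.get(PROVIDER_KEYS[provider]):
        problems.append(f"{PROVIDER_KEYS[provider]} is required for EMBEDDING_PROVIDER={provider}")

    token = env.get("API_BEARER_TOKEN")
    if token is not None and len(token) < 32:
        problems.append("API_BEARER_TOKEN must be at least 32 characters")

    # Assumption: the flag is compared case-insensitively against "true".
    extraction_on = env.get("ENABLE_MEMORY_EXTRACTION", "false").lower() == "true"
    if extraction_on and not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is required when ENABLE_MEMORY_EXTRACTION=true")
    return problems


def new_bearer_token() -> str:
    """64 hex characters, the same shape as the output of `openssl rand -hex 32`."""
    return secrets.token_hex(32)
```

Calling `validate_env(dict(os.environ))` after loading `.env` would report anything missing. The OAuth variables are not checked here.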
Textrawl advertises one of three tool surfaces via the MCP_TOOLSET environment
variable. The default normal surface is a small set of workflow tools — the
recommended interface for personal/family assistants. This follows the current MCP
maintainer guidance (Anthropic, Writing effective tools for AI agents): fewer
tools, consolidated by workflow into distinct, typed, well-named tools — not a
single "intent" dispatcher (which would discard per-tool schemas, annotations, and
the name signals models use to pick tools). The MCP spec defines no standard
tool-filtering primitive, so MCP_TOOLSET is a server-local convention; host/harness
lazy tool loading remains the canonical way to surface fewer tools at a time.
| MCP_TOOLSET | Tools advertised |
|---|---|
| normal (default) | Workflow tools: ask, search, get_document, capture, daily_briefing, timeline (+ remember when ENABLE_MEMORY=true). Set EXPOSE_ADMIN_TOOLS=true to also expose read-only diagnostics (health_check, get_stats, insight + Postgres tools). |
| full | Workflow tools + diagnostics + all original granular tools (backward compatible). |
| legacy | Exactly the original tool set (no workflow tools). |
Recommended exposure: personal/family bots → MCP_TOOLSET=normal and
EXPOSE_ADMIN_TOOLS=false (so destructive forget_entity/delete_conversation are
never reachable); admin/dev contexts → MCP_TOOLSET=full.
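To make the normal surface concrete, the sketch below models which tool names it would advertise for a given pair of flags. It mirrors the table rather than the server's code: `normal_surface` is a hypothetical function, the `False` default for the admin flag is an assumption, and only the two diagnostics the table names explicitly are included.

```python
WORKFLOW_TOOLS = ["ask", "search", "get_document", "capture", "daily_briefing", "timeline"]
# Only the diagnostics named explicitly in the table; the insight and Postgres
# tools that EXPOSE_ADMIN_TOOLS also exposes are left out of this sketch.
NAMED_DIAGNOSTICS = ["health_check", "get_stats"]


def normal_surface(enable_memory: bool = True, expose_admin_tools: bool = False) -> list:
    """Tool names advertised when MCP_TOOLSET=normal, as described in the table."""
    tools = list(WORKFLOW_TOOLS)
    if enable_memory:
        tools.append("remember")
    if expose_admin_tools:
        tools.extend(NAMED_DIAGNOSTICS)
    return tools
```

Note that `forget_entity` and `delete_conversation` never appear in the result, which is the property the recommended exposure relies on.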
Model guidance (GPT-5.5, Claude Sonnet, Claude Haiku): express intent through the
workflow tool that matches the job — ask to answer a question across sources,
search for raw matches, capture to save a note or URL, remember to store facts,
timeline/daily_briefing for temporal/briefing views. Use audience to scope who
an answer is for; family_shared/public_safe automatically exclude private memory,
conversations, and insights.
Read-only tools (search, get_document, list_documents, query_memory, query_conversations, get_stats, health_check) include outputSchema and return structuredContent for programmatic consumption alongside the text content response.
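A client consuming these results might prefer `structuredContent` and fall back to the text blocks when it is absent. The sketch below assumes the general shape of an MCP tool result (a `content` list of typed blocks plus an optional `structuredContent` object); the helper name `tool_payload` is hypothetical, and no particular fields inside the payload are assumed.

```python
import json
from typing import Any


def tool_payload(result: dict) -> Any:
    """Prefer structuredContent; otherwise join text blocks and try to parse them as JSON."""
    if result.get("structuredContent") is not None:
        return result["structuredContent"]
    texts = [block["text"] for block in result.get("content", [])
             if block.get("type") == "text"]
    joined = "\n".join(texts)
    try:
        return json.loads(joined)
    except json.JSONDecodeError:
        # Plain prose rather than JSON: hand the text back unchanged.
        return joined
```

Tools that do not declare an `outputSchema` would take the fallback path, so callers should be ready for either a parsed object or a plain string.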
Workflow tools (normal surface)

| Tool | Description |
|---|---|
| capture | Save content to the knowledge base: kind="note" (title+content) or kind="url" (fetch + clip). Consolidates add_note + save_url. |
| remember | Write structured knowledge to the memory graph: facts and/or relations. Consolidates remember_fact + build_knowledge + relate_entities. |
ask, search, get_document, daily_briefing, and timeline (documented below)
complete the workflow surface. remember is only advertised when ENABLE_MEMORY=true
(omitted otherwise). ask/search accept audience + allowCrossProfile for
privacy scoping.
| Tool | Description |
|---|---|
| search | Hybrid semantic + full-text search. Set includeMemories/includeConversations for cross-source fusion. audience scopes private sources. |
| get_document | Retrieve document by ID |
| list_documents | List with pagination and filtering |
| update_document | Update title and/or tags |
| add_note | Add markdown note to knowledge base |
Enable with ENABLE_MEMORY=true (default). Requires scripts/setup-db-memory.sql or setup-db-memory-ollama.sql.
| Tool | Description |
|---|---|
| remember_fact | Store facts about entities (people, projects, concepts) |
| build_knowledge | Store multiple facts and relations in a single batch call |
| query_memory | Query the memory graph (mode: 'search' \| 'entity' \| 'list') |
| relate_entities | Create relationships between entities |
| forget_entity | Delete an entity and all its memories |
| extract_memories | Extract entities and facts from text using LLM |
Enable with ENABLE_CONVERSATIONS=true (default). Requires running one of the conversation schema scripts:
- scripts/setup-db-conversation.sql (OpenAI embeddings, 1536d)
- scripts/setup-db-conversation-ollama.sql (Ollama v1 - nomic-embed-text, 1024d)
- scripts/setup-db-conversation-ollama-v2.sql (Ollama v2 - nomic-embed-text-v2-moe, 768d)
- scripts/setup-db-conversation-google.sql (Google AI - gemini-embedding-2-preview, 3072d)

| Tool | Description |
|---|---|
| save_conversation_context | Save conversation summary and turns for recall |
| query_conversations | Query past conversations (mode: 'search' \| 'get' \| 'list') |
| delete_conversation | Delete a conversation session |
Enable with ENABLE_INSIGHTS=true (default).
| Tool | Description |
|---|---|
| get_insights | View discovered cross-source connections and patterns |
| discover_connections | Trigger an insight scan across the knowledge base |
| dismiss_insight | Dismiss an insight from the queue |
| Tool | Description |
|---|---|
| get_stats | Statistics across all features (scope: 'all' \| 'knowledge' \| 'memory' \| 'conversations' \| 'insights') |
| health_check | Check the health of all subsystems; returns pass/fail per component with overall status (verbose for latency and counts) |
| Tool | Description |
|---|---|
| ask | Unified RAG search across all knowledge sources (documents, memory, conversations, insights) |
| daily_briefing | Generate a daily briefing with recent additions, new insights, and resurfaced knowledge |
| save_url | Fetch a URL, convert HTML to markdown, extract metadata, and save as a document |
| timeline | Browse knowledge chronologically within a date range, optionally filtered by topic |
Enabled when DATABASE_URL is configured. Connects directly to Postgres.
| Tool | Description |
|---|---|
| pg_analyze | Run comprehensive Postgres health analysis with table stats, index health, vacuum status, bloat estimates, and Textrawl-specific checks |
| pg_recommendations | Get actionable optimization recommendations filtered by severity |
| pg_report_history | View past analysis reports and compare trends over time |
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query |
| limit | number | 10 | Max results (1-50) |
| fullTextWeight | number | 1.0 | Keyword weight (0-2) |
| semanticWeight | number | 1.0 | Semantic weight (0-2) |
| minScore | number | 0 | Min relevance threshold (0-1) |
| tags | string[] | - | Filter by tags (AND logic) |
| sourceType | string | - | note, file, or url |
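One way to picture how fullTextWeight and semanticWeight could shape a fused ranking is weighted Reciprocal Rank Fusion, where each result list contributes `weight / (k + rank)` per document. This is an illustrative sketch, not Textrawl's query: the function name `rrf_fuse` and the constant `k = 60` (a common choice for RRF) are assumptions, and minScore is not modeled.

```python
def rrf_fuse(keyword_ids: list, semantic_ids: list,
             full_text_weight: float = 1.0, semantic_weight: float = 1.0,
             k: int = 60) -> list:
    """Fuse two ranked ID lists into (doc_id, score) pairs, best first."""
    scores = {}
    for weight, ranking in ((full_text_weight, keyword_ids), (semantic_weight, semantic_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents found by both searches accumulate score from each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
```

Because the fusion uses ranks rather than raw scores, the two searches do not need comparable score scales. Setting one weight to 0 reduces the result to the other search's ordering.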
curl -X POST http://localhost:3000/api/upload \
-H "Authorization: Bearer <your-token>" \
-F "file=@document.pdf" \
-F "title=Optional Title" \
-F "tags=tag1,tag2"
Limits: 10MB max file size, 10 uploads/min
Formats: .pdf, .docx, .txt, .md
Response:
{
"success": true,
"documentId": "uuid",
"title": "Document Title",
"tags": ["tag1", "tag2"],
"chunksCreated": 12
}
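Checking a file against the documented limits before sending it avoids a wasted request. The sketch below is a client-side helper under stated assumptions: `upload_problems` is a hypothetical name, and "10MB" is read as 10 × 1024 × 1024 bytes, which may differ from how the server counts.

```python
from pathlib import Path

# Assumption: "10MB" means 10 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024
ALLOWED_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def upload_problems(filename: str, size_bytes: int) -> list:
    """Return reasons the upload endpoint would likely reject this file."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

For a real file, `upload_problems(path.name, path.stat().st_size)` supplies both arguments. Formats outside this list, such as MBOX archives, go through the CLI converter instead.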
- GET /health - Basic health
- GET /health/ready - Readiness probe (checks DB)
- GET /health/live - Liveness probe

docker-compose up -d
docker-compose logs -f
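After starting the containers, a script can wait for the readiness probe before running imports. This sketch assumes the probe answers HTTP 200 once the database is reachable, which the source does not state explicitly; `is_ready` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """True if the URL answers with HTTP 200; connection errors count as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(check: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical call would be `wait_until_ready(lambda: is_ready("http://localhost:3000/health/ready"))`. Passing the check as a callable keeps the retry loop easy to test without a running server.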
# Create secrets in Secret Manager first
export GCP_PROJECT_ID=your-project-id
./scripts/deploy.sh
pnpm dev # Watch mode
pnpm build # Production build
pnpm start # Run production
pnpm typecheck # Type check
pnpm lint # Biome lint check
pnpm quality # Lint + typecheck combined
pnpm inspector # MCP Inspector
pnpm setup # Generate .env with secure token
pnpm desktop:dev # Run desktop app
pnpm docs:dev # Run docs site
Run PostgreSQL + pgvector locally:
# Start local Postgres with pgvector
docker-compose -f docker-compose.local.yml up -d
# Initialize the database schema
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db.sql
# Optional: Start pgAdmin at http://localhost:5050
docker-compose -f docker-compose.local.yml --profile tools up -d
Run embeddings locally with Ollama instead of OpenAI:
# Start Postgres + Ollama
docker-compose -f docker-compose.local.yml --profile ollama up -d
# Pull the embedding model (~274MB)
docker exec textrawl-ollama ollama pull nomic-embed-text
# Use the Ollama-specific schema (1024 dimensions)
docker exec -i textrawl-postgres psql -U postgres -d textrawl < scripts/setup-db-ollama.sql
Set in .env:
EMBEDDING_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
Supported Ollama models: nomic-embed-text (1024d), nomic-embed-text-v2-moe (768d, recommended for new installs), mxbai-embed-large (1024d)
Note: Each provider uses different embedding dimensions: OpenAI 1536d, Ollama 1024d (or 768d for v2-moe), Google AI 3072d. Use the matching schema:
setup-db.sql (OpenAI), setup-db-ollama.sql (Ollama 1024d), setup-db-ollama-v2.sql (Ollama 768d), or setup-db-google.sql (Google AI). You cannot mix providers without re-embedding all documents.
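The provider, model, dimension, and schema pairings above can be kept in one lookup so a setup script picks the right file. The table below restates the values from this document; the function `schema_for` is hypothetical, and pairing mxbai-embed-large with setup-db-ollama.sql is an inference from its 1024d size rather than something stated outright.

```python
# (provider, model) -> (embedding dimensions, base schema script)
SCHEMAS = {
    ("openai", "text-embedding-3-small"): (1536, "scripts/setup-db.sql"),
    ("ollama", "nomic-embed-text"): (1024, "scripts/setup-db-ollama.sql"),
    # Inferred from the 1024d size, not stated explicitly in the docs.
    ("ollama", "mxbai-embed-large"): (1024, "scripts/setup-db-ollama.sql"),
    ("ollama", "nomic-embed-text-v2-moe"): (768, "scripts/setup-db-ollama-v2.sql"),
    ("google", "gemini-embedding-2-preview"): (3072, "scripts/setup-db-google.sql"),
}


def schema_for(provider: str, model: str) -> tuple:
    """Return (dimensions, schema script) or raise ValueError for unknown pairs."""
    try:
        return SCHEMAS[(provider, model)]
    except KeyError:
        raise ValueError(f"no known schema for {provider}/{model}") from None
```

Failing loudly on an unknown pair is deliberate: a schema with the wrong vector size would only surface later as insert or search errors.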
| Issue | Solution |
|---|---|
| Can't connect to database | Check DATABASE_URL is set to your Neon pooled connection string |
| No search results | Check chunks table has embeddings; lower minScore |
| MCP tools not in Claude | Restart Claude Desktop; check curl http://localhost:3000/health |
| Rate limit exceeded | Wait and retry; limits are API: 100/min, Upload: 10/min |
| CodeQL Analyze job fails with SARIF/default setup error | This repo uses advanced CodeQL workflow (.github/workflows/codeql.yml); keep GitHub Code Scanning Default Setup disabled in repo settings |
Contributions welcome! See CONTRIBUTING.md for guidelines.
MIT - see LICENSE