A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
AI-powered documentation RAG system with MCP server for Claude Code. Search and retrieve technical documentation on-dema
A lightweight, installable Python package that provides RAG (Retrieval Augmented Generation) access to technical documentation through an MCP (Model Context Protocol) server. This enables LLMs to search and retrieve relevant documentation on-demand.
# Install globally with pipx in editable mode (keeps dependencies isolated)
pipx install -e /opt/claude-ops/doc-rag
# Verify installation
docrag --help
# Optional: Install Playwright browsers (for scraping)
pipx runpip docrag install playwright
pipx run --spec docrag playwright install chromium
Note: The -e flag installs in "editable" mode, which means changes to the source code are immediately reflected without reinstalling.
# Clone or navigate to the project directory
cd /opt/claude-ops/doc-rag
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install in development mode
pip install -e ".[dev]"
# Install Playwright browsers (for scraping)
playwright install chromium
cd /opt/claude-ops/doc-rag
./update.sh
This script will:
cd /opt/claude-ops/doc-rag
make update
For editable installs (installed with -e):
cd /opt/claude-ops/doc-rag
git pull origin main
# No reinstall needed - changes are already active!
For regular installs (installed without -e):
cd /opt/claude-ops/doc-rag
git pull origin main
pipx uninstall docrag && pipx install -e .
# or for pip: pip install -e . --force-reinstall
# Check git status
cd /opt/claude-ops/doc-rag
git log -1 --oneline
# Test the installation
docrag --version
docrag --help
docrag init
This creates the configuration directory at ~/.docrag/ with the following structure:
~/.docrag/
├── config.json # Global configuration
├── collections/ # Documentation collections
└── vectordb/ # LanceDB storage
# Add documentation from a local directory
docrag add brightsign --source /path/to/brightsign/docs --description "BrightSign player documentation"
# Or add without source initially
docrag add venafi --description "Venafi TPP API documentation"
docrag list
# Search across all active collections
docrag search "how to initialize the player"
# Search a specific collection
docrag search "authentication methods" --collection venafi --limit 10
docrag serve
The server will listen on stdio for connections from Claude Code.
docrag initInitialize DocRAG configuration directory.
docrag add <name>Add a new documentation collection.
Options:
-s, --source PATH - Source directory containing documentation-d, --description TEXT - Description of the collectionExample:
docrag add qumu --source ~/docs/qumu --description "Qumu video platform docs"
docrag listList all documentation collections with their status.
docrag update <name> <source>Update an existing collection with new documents.
Example:
docrag update brightsign ~/docs/brightsign/updated
docrag remove <name>Remove a documentation collection (with confirmation).
docrag search <query>Search documentation from the CLI for testing.
Options:
-c, --collection TEXT - Specific collection to search-l, --limit INTEGER - Number of results (default: 5)Example:
docrag search "websocket connection" --collection brightsign
docrag serveStart the MCP server for Claude Code integration.
docrag scrape <url>Scrape documentation from websites.
Options:
-o, --output PATH - Output directory (required)--smart, --use-crawl4ai - Use AI-powered Crawl4AI scraper (recommended)--no-llm - Disable LLM extraction (faster, still better than basic)--llm-provider TEXT - LLM provider (default: openai/gpt-4o-mini)--playwright - Use Playwright for dynamic content (basic scraper)--max-pages INTEGER - Maximum pages to scrape (default: 1000)Examples:
# Basic scraping
docrag scrape https://docs.example.com --output ./docs
# Smart scraping with AI (recommended)
docrag scrape https://docs.example.com --output ./docs --smart
# Smart scraping without LLM (faster, no API key needed)
docrag scrape https://docs.example.com --output ./docs --smart --no-llm
# Limit pages
docrag scrape https://docs.example.com --output ./docs --max-pages 100
Smart Scraping Features:
To enable smart scraping:
# Install Crawl4AI
pipx inject docrag crawl4ai
# Optional: Set OpenAI API key for LLM-powered extraction
export OPENAI_API_KEY='your-key-here'
Add DocRAG to your Claude Code MCP configuration (~/.config/claude-code/mcp_settings.json or similar):
{
"mcpServers": {
"docrag": {
"command": "docrag",
"args": ["serve"],
"env": {}
}
}
}
If using the full path:
{
"mcpServers": {
"docrag": {
"command": "/home/claude-admin/.local/bin/docrag",
"args": ["serve"],
"env": {}
}
}
}
After adding the configuration, restart Claude Code to load the MCP server.
Once connected, Claude Code can use two tools:
search_docs: Search through indexed documentation collections
Query: "how to handle authentication in BrightSign"
Collection: (optional) "brightsign"
Limit: (optional) 5
list_collections: List all available documentation collections
Claude will automatically use these tools when working on projects that need documentation access.
config.py) - Manages configuration and collection metadataembeddings.py) - Generates embeddings using sentence-transformersvectordb.py) - LanceDB wrapper for vector storage and searchindexer.py) - Intelligent document chunking and indexingserver.py) - MCP server implementationcli.py) - Command-line interface~/.docrag/
├── config.json # Global configuration
│ └── {
│ "active_collections": ["brightsign", "venafi"],
│ "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
│ "chunk_size": 512,
│ "chunk_overlap": 50
│ }
├── collections/
│ ├── brightsign/
│ │ ├── metadata.json # Collection metadata
│ │ └── source_docs/ # Original documents
│ ├── venafi/
│ └── qumu/
└── vectordb/
└── lancedb/ # Vector storage (one table per collection)
Global configuration is stored in ~/.docrag/config.json:
{
"active_collections": ["brightsign", "venafi"],
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"chunk_size": 512,
"chunk_overlap": 50
}
Collection metadata is stored in ~/.docrag/collections/<name>/metadata.json:
{
"name": "brightsign",
"source_type": "local",
"source_path": "/path/to/docs",
"created_at": "2025-10-28T10:00:00",
"updated_at": "2025-10-28T10:00:00",
"doc_count": 150,
"description": "BrightSign player documentation"
}
docrag/
├── docrag/
│ ├── __init__.py
│ ├── cli.py # CLI commands
│ ├── server.py # MCP server
│ ├── indexer.py # Document indexing
│ ├── vectordb.py # Vector database
│ ├── embeddings.py # Embeddings
│ ├── config.py # Configuration
│ └── scrapers/ # Web scrapers
│ ├── __init__.py
│ ├── base.py
│ └── generic.py
├── tests/
├── pyproject.toml
├── README.md
└── DOCRAG_MVP_BUILD_GUIDE.md
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Format with black
black docrag/
# Lint with ruff
ruff check docrag/
Run docrag init first to create the configuration directory.
Add a collection with docrag add <name> --source <path>.
The first time you run DocRAG, it will download the sentence-transformers model (~100MB). Ensure you have internet connectivity.
If using scrapers, run playwright install chromium.
MIT
Ryan - Built for homelab and Claude Code integration
Run Claude Code as an MCP server so any agent can delegate coding tasks to it
mcp-language-server gives MCP enabled clients access semantic tools like get definition, references, rename, and diagnos
Browser automation using accessibility snapshots instead of screenshots
MCP server integration for DaVinci Resolve Studio