A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
🕷️ A lightweight Model Context Protocol (MCP) server that exposes Crawl4AI web scraping and crawling capabilities as to
🕷️ A lightweight Model Context Protocol (MCP) server that exposes Crawl4AI web scraping and crawling capabilities as tools for AI agents.
Similar to Firecrawl's API but self-hosted and free. Perfect for integrating web scraping into your AI workflows with OpenAI Agents SDK, Cursor, Claude Code, and other MCP-compatible tools.
scrape, crawl, crawl_site, crawl_sitemap via stdio MCP serverChoose between Docker (recommended) or manual installation:
Docker eliminates all setup complexity and provides a consistent environment:
# No setup required! Just pull and run the published image
docker pull uysalsadi/crawl4ai-mcp-server:latest
# Test it works
python test-config.py
# Use directly in MCP configurations (see examples below)
# Clone the repository
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
# Quick build and test (simplified)
docker build -f Dockerfile.simple -t crawl4ai-mcp .
echo '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test", "version": "1.0"}}}' | docker run --rm -i crawl4ai-mcp
# Or use helper script (full build with Playwright)
./docker-run.sh build
./docker-run.sh test
./docker-run.sh run
Docker Quick Commands:
./docker-run.sh build - Build the image./docker-run.sh run - Run MCP server (stdio mode)./docker-run.sh test - Run smoke tests./docker-run.sh dev - Development mode with shell access# Clone and setup
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Install Playwright browsers
python -m playwright install chromium
# Test basic functionality
python -m crawler_agent.smoke_client
# Test adaptive crawling
python test_adaptive.py
# Set your OpenAI API key
export OPENAI_API_KEY="your-key-here"
# Docker: Run the example agent
docker-compose run --rm -e OPENAI_API_KEY crawl4ai-mcp python -m crawler_agent.agents_example
# Manual: Run the example agent
python -m crawler_agent.agents_example
The MCP server exposes 4 production-ready tools with content hiding features:
scrapeFetch a single URL and return markdown content.
Arguments:
url (required): The URL to scrapeoutput_dir (optional): If provided, persists content to disk and returns metadata onlycrawler: Optional Crawl4AI crawler config overridesbrowser: Optional Crawl4AI browser config overridesscript: Optional C4A-Script for page interactiontimeout_sec: Request timeout (default: 45s, max: 600s)Returns (without output_dir):
{
"url": "https://example.com",
"markdown": "# Page Title\n\nContent...",
"links": ["https://example.com/page1", "..."],
"metadata": {}
}
Returns (with output_dir):
{
"run_id": "scrape_20250815_122811_4fb188",
"file_path": "/path/to/output/pages/example.com_index.md",
"manifest_path": "/path/to/output/manifest.json",
"bytes_written": 230
}
crawlMulti-page breadth-first crawling with filtering and adaptive stopping.
Arguments:
seed_url (required): Starting URL for the crawlmax_depth: Maximum link depth to follow (default: 1, max: 4)max_pages: Maximum pages to crawl (default: 5, max: 100)same_domain_only: Stay within the same domain (default: true)include_patterns: Regex patterns URLs must matchexclude_patterns: Regex patterns to exclude URLsadaptive: Enable adaptive crawling (default: false)output_dir (optional): If provided, persists content to disk and returns metadata onlycrawler, browser, script, timeout_sec: Same as scrapeReturns (without output_dir):
{
"start_url": "https://example.com",
"pages": [
{
"url": "https://example.com/page1",
"markdown": "Content...",
"links": ["..."]
}
],
"total_pages": 3
}
Returns (with output_dir):
{
"run_id": "crawl_20250815_122828_464944",
"pages_ok": 3,
"pages_failed": 0,
"manifest_path": "/path/to/output/manifest.json",
"bytes_written": 690
}
crawl_siteComprehensive site crawling with persistence (always requires output_dir).
Arguments:
entry_url (required): Starting URL for site crawloutput_dir (required): Directory to persist resultsmax_depth: Maximum crawl depth (default: 2, max: 6)max_pages: Maximum pages to crawl (default: 200, max: 5000)Returns:
{
"run_id": "site_20250815_122851_0e2455",
"output_dir": "/path/to/output",
"manifest_path": "/path/to/output/manifest.json",
"pages_ok": 15,
"pages_failed": 2,
"bytes_written": 45672
}
crawl_sitemapSitemap-based crawling with persistence (always requires output_dir).
Arguments:
sitemap_url (required): URL to sitemap.xmloutput_dir (required): Directory to persist resultsmax_entries: Maximum sitemap entries to process (default: 1000)Returns:
{
"run_id": "sitemap_20250815_123006_667d71",
"output_dir": "/path/to/output",
"manifest_path": "/path/to/output/manifest.json",
"pages_ok": 25,
"pages_failed": 0,
"bytes_written": 123456
}
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
params = StdioServerParameters(
command="python",
args=["-m", "crawler_agent.mcp_server"]
)
async with stdio_client(params) as (read, write):
async with ClientSession(read, write) as session:
await session.initialize()
# Scrape a single page
result = await session.call_tool("scrape", {
"url": "https://example.com"
})
# Crawl with adaptive stopping
result = await session.call_tool("crawl", {
"seed_url": "https://docs.example.com",
"max_pages": 10,
"adaptive": True
})
from agents import Agent, Runner
from agents.mcp.server import MCPServerStdio
async with MCPServerStdio(
params={
"command": "python",
"args": ["-m", "crawler_agent.mcp_server"]
},
cache_tools_list=True
) as server:
agent = Agent(
name="Research Assistant",
instructions="Use scrape and crawl tools to research topics.",
mcp_servers=[server],
mcp_config={"convert_schemas_to_strict": True}
)
result = await Runner.run(
agent,
"Research the latest AI safety papers"
)
This project supports both Cursor and Claude Code with synchronized configuration:
.cursorrules for AI assistant behavior and project guidanceCLAUDE.md for comprehensive project context and memory bankmcp__crawl4ai-mcp__* tool calls~/.claude/claude_desktop_config.json or project config: .mcp.jsonBoth Cursor and Claude Code can use the Dockerized MCP server:
// .mcp.json (project-scoped configuration)
{
"mcpServers": {
"crawl4ai-mcp": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"--volume", "./crawls:/app/crawls",
"uysalsadi/crawl4ai-mcp-server:latest"
],
"env": {
"CRAWL4AI_MCP_LOG": "INFO"
}
}
}
}
The Docker approach eliminates all manual setup and provides a consistent environment:
# 1. Clone and build
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
./docker-run.sh build
# 2. Test the installation
./docker-run.sh test
# 3. Run MCP server
./docker-run.sh run
| Command | Description |
|---|---|
./docker-run.sh build | Build the Docker image |
./docker-run.sh run | Run MCP server in stdio mode |
./docker-run.sh test | Run smoke tests |
./docker-run.sh dev | Development mode with shell access |
./docker-run.sh stop | Stop running containers |
./docker-run.sh clean | Remove containers and images |
./docker-run.sh logs | Show container logs |
The docker-compose.yml provides multiple service configurations:
crawl4ai-mcp: Production MCP servercrawl4ai-mcp-dev: Development container with shell accesscrawl4ai-mcp-test: Test runner for smoke testsAdd to ~/.claude/claude_desktop_config.json:
{
"mcpServers": {
"crawl4ai-mcp": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"--volume", "/tmp/crawl4ai-crawls:/app/crawls",
"uysalsadi/crawl4ai-mcp-server:latest"
],
"env": {
"CRAWL4AI_MCP_LOG": "INFO"
}
}
}
}
✅ Copy this exact config - it uses the published Docker image!
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"crawl4ai-mcp": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"--volume", "/tmp/crawl4ai-crawls:/app/crawls",
"uysalsadi/crawl4ai-mcp-server:latest"
],
"env": {
"CRAWL4AI_MCP_LOG": "INFO"
}
}
}
}
✅ This configuration is tested and working!
Add to your project's .mcp.json:
{
"mcpServers": {
"crawl4ai-mcp": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"--volume", "./crawls:/app/crawls",
"uysalsadi/crawl4ai-mcp-server:latest"
],
"env": {
"CRAWL4AI_MCP_LOG": "INFO"
}
}
}
}
✅ This project already includes this configuration - see .mcp.json
After setting up any MCP configuration:
Test the Docker image works:
python test-config.py
Restart your editor (Cursor/Claude Code) to reload MCP configuration
Verify tools are available:
crawl4ai-mcp in the MCP tools panelscrape, crawl, crawl_site, crawl_sitemapIf tools don't appear, check:
✅ Zero Setup: No need for Python venv, pip, or Playwright installation
✅ Consistent Environment: Same behavior across all platforms
✅ Isolated Dependencies: No conflicts with your system Python
✅ Easy Updates: docker pull to get latest version
✅ Portable: Works anywhere Docker runs
✅ Volume Persistence: Crawl outputs saved to host filesystem
Set these when running Docker containers:
# Using docker-compose
OPENAI_API_KEY=your-key docker-compose up crawl4ai-mcp
# Using docker run directly
docker run --rm -i \
-e OPENAI_API_KEY=your-key \
-e CRAWL4AI_MCP_LOG=DEBUG \
-v ./crawls:/app/crawls \
crawl4ai-mcp-server:latest
TARGET_URL: Default URL for smoke testing (default: https://modelcontextprotocol.io/docs)RESEARCH_TASK: Custom research task for agents exampleOPENAI_API_KEY: Required for OpenAI Agents SDKThe server blocks these URL patterns by default:
localhost, 127.0.0.1, ::1file:// schemes.local, .internal, .lan domainsWhen adaptive: true is set, the crawler uses a simple content-based stopping strategy:
Use regex patterns to control which URLs are crawled:
await session.call_tool("crawl", {
"seed_url": "https://docs.example.com",
"include_patterns": [r"/docs/", r"/api/"],
"exclude_patterns": [r"/old/", r"\.pdf$"]
})
Pass custom browser/crawler settings:
await session.call_tool("scrape", {
"url": "https://example.com",
"browser": {"headless": True, "viewport": {"width": 1280, "height": 720}},
"crawler": {"verbose": True}
})
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ AI Agent │───▶│ MCP Server │───▶│ Crawl4AI │
│ (Cursor/Agents) │ │ (stdio/stdio) │ │ (Playwright) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
▼
┌──────────────────┐
│ Safety Guards │
│ (URL validation) │
└──────────────────┘
For maintainers who want to publish updates to the Docker registry:
# 1. Login to Docker Hub (one time setup)
./docker-push.sh login
# 2. Build, push, and test everything
./docker-push.sh all
# Or do steps individually:
./docker-push.sh build # Build and tag image
./docker-push.sh push # Push to Docker Hub
./docker-push.sh test # Test the published image
The image is published at: uysalsadi/crawl4ai-mcp-server
uysalsadi/crawl4ai-mcp-server:latestuysalsadi/crawl4ai-mcp-server:v1.0.0Users can pull and use the image without any local setup:
docker pull uysalsadi/crawl4ai-mcp-server:latest
crawler_agent/
├── __init__.py
├── mcp_server.py # Main MCP server
├── safety.py # URL safety validation
├── adaptive_strategy.py # Adaptive crawling logic
├── smoke_client.py # Basic testing client
└── agents_example.py # OpenAI Agents SDK example
# Test MCP server
python -m crawler_agent.smoke_client
# Test adaptive crawling
python test_adaptive.py
# Test with agents (requires OPENAI_API_KEY)
python -m crawler_agent.agents_example
.venv before development work.cursorrules for Cursor or CLAUDE.md for Claude Codepython -m crawler_agent.smoke_client for validationMIT License - see LICENSE file for details.
This project uses Crawl4AI by UncleCode for web scraping capabilities. Crawl4AI is an excellent open-source LLM-friendly web crawler and scraper.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
If this project helped you, please give it a star! It helps others discover the project.
Found a bug or have a feature request? Please open an issue.
Run Claude Code as an MCP server so any agent can delegate coding tasks to it
Browser automation using accessibility snapshots instead of screenshots
MCP server integration for DaVinci Resolve Studio
mcp-language-server gives MCP enabled clients access semantic tools like get definition, references, rename, and diagnos