Crawl4AI MCP Server

🕷️ A lightweight Model Context Protocol (MCP) server that exposes Crawl4AI web scraping and crawling capabilities as tools for AI agents.

Similar to Firecrawl's API but self-hosted and free. Perfect for integrating web scraping into your AI workflows with OpenAI Agents SDK, Cursor, Claude Code, and other MCP-compatible tools.

Features

🔧 MCP Tools: Exposes 4 powerful tools: scrape, crawl, crawl_site, crawl_sitemap via stdio MCP server
🌐 Web Scraping: Single-page scraping with markdown extraction
🕷️ Web Crawling: Multi-page breadth-first crawling with depth control
🧠 Adaptive Crawling: Smart crawling that stops when sufficient content is gathered
🛡️ Safety: Blocks internal networks, localhost, and private IPs
📱 Agent Ready: Works with OpenAI Agents SDK, Cursor, and Claude Code
⚡ Fast: Powered by Playwright and Crawl4AI's optimized extraction

🚀 Quick Start

Choose between Docker (recommended) or manual installation:

Option A: Docker (Recommended) 🐳

Docker eliminates all setup complexity and provides a consistent environment:

Option A1: Use Pre-built Image (Fastest) ⚡

hljs language-bash

# No setup required! Just pull and run the published image
docker pull uysalsadi/crawl4ai-mcp-server:latest

# Test it works
python test-config.py

# Use directly in MCP configurations (see examples below)

Option A2: Build Yourself

hljs language-bash

# Clone the repository
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server

# Quick build and test (simplified)
docker build -f Dockerfile.simple -t crawl4ai-mcp .
echo '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test", "version": "1.0"}}}' | docker run --rm -i crawl4ai-mcp

# Or use helper script (full build with Playwright)
./docker-run.sh build
./docker-run.sh test
./docker-run.sh run

Docker Quick Commands:

./docker-run.sh build - Build the image
./docker-run.sh run - Run MCP server (stdio mode)
./docker-run.sh test - Run smoke tests
./docker-run.sh dev - Development mode with shell access

Option B: Manual Installation

hljs language-bash

# Clone and setup
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Install Playwright browsers
python -m playwright install chromium

# Test basic functionality
python -m crawler_agent.smoke_client

# Test adaptive crawling
python test_adaptive.py

Use with OpenAI Agents SDK

hljs language-bash

# Set your OpenAI API key
export OPENAI_API_KEY="your-key-here"

# Docker: Run the example agent
docker-compose run --rm -e OPENAI_API_KEY crawl4ai-mcp python -m crawler_agent.agents_example

# Manual: Run the example agent
python -m crawler_agent.agents_example

🛠️ Tools Reference

The MCP server exposes 4 production-ready tools with content hiding features:

`scrape`

Fetch a single URL and return markdown content.

Arguments:

url (required): The URL to scrape
output_dir (optional): If provided, persists content to disk and returns metadata only
crawler: Optional Crawl4AI crawler config overrides
browser: Optional Crawl4AI browser config overrides
script: Optional C4A-Script for page interaction
timeout_sec: Request timeout (default: 45s, max: 600s)

Returns (without output_dir):

hljs language-json

{
  "url": "https://example.com",
  "markdown": "# Page Title\n\nContent...",
  "links": ["https://example.com/page1", "..."],
  "metadata": {}
}

Returns (with output_dir):

hljs language-json

{
  "run_id": "scrape_20250815_122811_4fb188",
  "file_path": "/path/to/output/pages/example.com_index.md",
  "manifest_path": "/path/to/output/manifest.json",
  "bytes_written": 230
}

`crawl`

Multi-page breadth-first crawling with filtering and adaptive stopping.

Arguments:

seed_url (required): Starting URL for the crawl
max_depth: Maximum link depth to follow (default: 1, max: 4)
max_pages: Maximum pages to crawl (default: 5, max: 100)
same_domain_only: Stay within the same domain (default: true)
include_patterns: Regex patterns URLs must match
exclude_patterns: Regex patterns to exclude URLs
adaptive: Enable adaptive crawling (default: false)
output_dir (optional): If provided, persists content to disk and returns metadata only
crawler, browser, script, timeout_sec: Same as scrape

Returns (without output_dir):

hljs language-json

{
  "start_url": "https://example.com",
  "pages": [
    {
      "url": "https://example.com/page1",
      "markdown": "Content...",
      "links": ["..."]
    }
  ],
  "total_pages": 3
}

Returns (with output_dir):

hljs language-json

{
  "run_id": "crawl_20250815_122828_464944",
  "pages_ok": 3,
  "pages_failed": 0,
  "manifest_path": "/path/to/output/manifest.json",
  "bytes_written": 690
}

`crawl_site`

Comprehensive site crawling with persistence (always requires output_dir).

Arguments:

entry_url (required): Starting URL for site crawl
output_dir (required): Directory to persist results
max_depth: Maximum crawl depth (default: 2, max: 6)
max_pages: Maximum pages to crawl (default: 200, max: 5000)
Additional config options for filtering and performance

Returns:

hljs language-json

{
  "run_id": "site_20250815_122851_0e2455",
  "output_dir": "/path/to/output",
  "manifest_path": "/path/to/output/manifest.json",
  "pages_ok": 15,
  "pages_failed": 2,
  "bytes_written": 45672
}

`crawl_sitemap`

Sitemap-based crawling with persistence (always requires output_dir).

Arguments:

sitemap_url (required): URL to sitemap.xml
output_dir (required): Directory to persist results
max_entries: Maximum sitemap entries to process (default: 1000)
Additional config options for filtering and performance

Returns:

hljs language-json

{
  "run_id": "sitemap_20250815_123006_667d71",
  "output_dir": "/path/to/output",
  "manifest_path": "/path/to/output/manifest.json", 
  "pages_ok": 25,
  "pages_failed": 0,
  "bytes_written": 123456
}

💡 Usage Examples

Standalone MCP Server

hljs language-python

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="python",
    args=["-m", "crawler_agent.mcp_server"]
)

async with stdio_client(params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()
        
        # Scrape a single page
        result = await session.call_tool("scrape", {
            "url": "https://example.com"
        })
        
        # Crawl with adaptive stopping
        result = await session.call_tool("crawl", {
            "seed_url": "https://docs.example.com",
            "max_pages": 10,
            "adaptive": True
        })

OpenAI Agents SDK Integration

hljs language-python

from agents import Agent, Runner
from agents.mcp.server import MCPServerStdio

async with MCPServerStdio(
    params={
        "command": "python", 
        "args": ["-m", "crawler_agent.mcp_server"]
    },
    cache_tools_list=True
) as server:
    
    agent = Agent(
        name="Research Assistant",
        instructions="Use scrape and crawl tools to research topics.",
        mcp_servers=[server],
        mcp_config={"convert_schemas_to_strict": True}
    )
    
    result = await Runner.run(
        agent, 
        "Research the latest AI safety papers"
    )

Cursor/Claude Code Integration

This project supports both Cursor and Claude Code with synchronized configuration:

Cursor Setup

Uses .cursorrules for AI assistant behavior and project guidance
Automatic MCP tool detection and integration
Project-specific rules and workflow guidance

Claude Code Setup

Uses CLAUDE.md for comprehensive project context and memory bank
Native MCP integration via mcp__crawl4ai-mcp__* tool calls
Global config: ~/.claude/claude_desktop_config.json or project config: .mcp.json

Docker Integration for Both Environments

Both Cursor and Claude Code can use the Dockerized MCP server:

hljs language-json

// .mcp.json (project-scoped configuration)
{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--volume", "./crawls:/app/crawls",
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

Dual Environment Features

Synchronized Documentation: Changes to documentation are maintained across both environments
Shared MCP Configuration: Both use the same MCP server and tool schemas
Docker Compatibility: Consistent environment across both Cursor and Claude Code
Cross-Compatible: Projects work seamlessly whether using Cursor or Claude Code

🐳 Docker Usage

Quick Start with Docker

The Docker approach eliminates all manual setup and provides a consistent environment:

hljs language-bash

# 1. Clone and build
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
./docker-run.sh build

# 2. Test the installation
./docker-run.sh test

# 3. Run MCP server
./docker-run.sh run

Docker Commands Reference

Command	Description
`./docker-run.sh build`	Build the Docker image
`./docker-run.sh run`	Run MCP server in stdio mode
`./docker-run.sh test`	Run smoke tests
`./docker-run.sh dev`	Development mode with shell access
`./docker-run.sh stop`	Stop running containers
`./docker-run.sh clean`	Remove containers and images
`./docker-run.sh logs`	Show container logs

Docker Compose Services

The docker-compose.yml provides multiple service configurations:

crawl4ai-mcp: Production MCP server
crawl4ai-mcp-dev: Development container with shell access
crawl4ai-mcp-test: Test runner for smoke tests

MCP Integration with Docker

Global MCP Configuration (Claude Code)

Add to ~/.claude/claude_desktop_config.json:

hljs language-json

{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i", 
        "--volume", "/tmp/crawl4ai-crawls:/app/crawls",
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

✅ Copy this exact config - it uses the published Docker image!

Global MCP Configuration (Cursor)

Add to ~/.cursor/mcp.json:

hljs language-json

{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--volume", "/tmp/crawl4ai-crawls:/app/crawls", 
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

✅ This configuration is tested and working!

Project-Scoped Configuration

Add to your project's .mcp.json:

hljs language-json

{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--volume", "./crawls:/app/crawls",
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

✅ This project already includes this configuration - see .mcp.json

Configuration Validation

After setting up any MCP configuration:

Test the Docker image works:
hljs language-bash
```
python test-config.py
```
Restart your editor (Cursor/Claude Code) to reload MCP configuration
Verify tools are available:
- Look for crawl4ai-mcp in the MCP tools panel
- Should see 4 tools: scrape, crawl, crawl_site, crawl_sitemap

If tools don't appear, check:

Docker is running and image is accessible
MCP configuration file syntax is valid JSON
Editor has been restarted after config changes

Docker Advantages

✅ Zero Setup: No need for Python venv, pip, or Playwright installation
✅ Consistent Environment: Same behavior across all platforms
✅ Isolated Dependencies: No conflicts with your system Python
✅ Easy Updates: docker pull to get latest version
✅ Portable: Works anywhere Docker runs
✅ Volume Persistence: Crawl outputs saved to host filesystem

Environment Variables

Set these when running Docker containers:

hljs language-bash

# Using docker-compose
OPENAI_API_KEY=your-key docker-compose up crawl4ai-mcp

# Using docker run directly
docker run --rm -i \
  -e OPENAI_API_KEY=your-key \
  -e CRAWL4AI_MCP_LOG=DEBUG \
  -v ./crawls:/app/crawls \
  crawl4ai-mcp-server:latest

⚙️ Configuration

Environment Variables

TARGET_URL: Default URL for smoke testing (default: https://modelcontextprotocol.io/docs)
RESEARCH_TASK: Custom research task for agents example
OPENAI_API_KEY: Required for OpenAI Agents SDK

Safety Settings

The server blocks these URL patterns by default:

localhost, 127.0.0.1, ::1
Private IP ranges (RFC 1918)
file:// schemes
.local, .internal, .lan domains

🚀 Advanced Features

Adaptive Crawling

When adaptive: true is set, the crawler uses a simple content-based stopping strategy:

Stops when total content exceeds 5,000 characters
Prevents over-crawling for information gathering tasks
More sophisticated LLM-based strategies can be added

Custom Filtering

Use regex patterns to control which URLs are crawled:

hljs language-python

await session.call_tool("crawl", {
    "seed_url": "https://docs.example.com",
    "include_patterns": [r"/docs/", r"/api/"],
    "exclude_patterns": [r"/old/", r"\.pdf$"]
})

Browser Configuration

Pass custom browser/crawler settings:

hljs language-python

await session.call_tool("scrape", {
    "url": "https://example.com",
    "browser": {"headless": True, "viewport": {"width": 1280, "height": 720}},
    "crawler": {"verbose": True}
})

🏗️ Architecture

hljs language-scss

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   AI Agent      │───▶│   MCP Server     │───▶│   Crawl4AI      │
│ (Cursor/Agents) │    │ (stdio/stdio)    │    │ (Playwright)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │  Safety Guards   │
                       │ (URL validation) │
                       └──────────────────┘

📦 Publishing to Docker Hub

For maintainers who want to publish updates to the Docker registry:

Publishing Process

hljs language-bash

# 1. Login to Docker Hub (one time setup)
./docker-push.sh login

# 2. Build, push, and test everything
./docker-push.sh all

# Or do steps individually:
./docker-push.sh build    # Build and tag image
./docker-push.sh push     # Push to Docker Hub
./docker-push.sh test     # Test the published image

Docker Hub Repository

The image is published at: uysalsadi/crawl4ai-mcp-server

Latest: uysalsadi/crawl4ai-mcp-server:latest
Versioned: uysalsadi/crawl4ai-mcp-server:v1.0.0

Usage Statistics

Users can pull and use the image without any local setup:

hljs language-bash

docker pull uysalsadi/crawl4ai-mcp-server:latest

🔧 Development

Project Structure

hljs language-graphql

crawler_agent/
├── __init__.py
├── mcp_server.py         # Main MCP server
├── safety.py            # URL safety validation
├── adaptive_strategy.py  # Adaptive crawling logic
├── smoke_client.py       # Basic testing client
└── agents_example.py     # OpenAI Agents SDK example

Testing

hljs language-bash

# Test MCP server
python -m crawler_agent.smoke_client

# Test adaptive crawling
python test_adaptive.py

# Test with agents (requires OPENAI_API_KEY)
python -m crawler_agent.agents_example

Contributing

Environment Setup: Always activate .venv before development work
Documentation: Follow the Documentation Memory Rule - synchronize changes across CLAUDE.md, .cursorrules, and README.md
Code Style: Follow .cursorrules for Cursor or CLAUDE.md for Claude Code
Testing: Use python -m crawler_agent.smoke_client for validation
Safety: Ensure security guards are maintained for all new features
Dual Compatibility: Verify changes work in both Cursor and Claude Code environments

📚 References

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

This project uses Crawl4AI by UncleCode for web scraping capabilities. Crawl4AI is an excellent open-source LLM-friendly web crawler and scraper.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

⭐ Support

If this project helped you, please give it a star! It helps others discover the project.

🐛 Issues

Found a bug or have a feature request? Please open an issue.

Crawl4AI MCP Server

🕷️ A lightweight Model Context Protocol (MCP) server that exposes Crawl4AI web scraping and crawling capabilities as tools for AI agents.

Similar to Firecrawl's API but self-hosted and free. Perfect for integrating web scraping into your AI workflows with OpenAI Agents SDK, Cursor, Claude Code, and other MCP-compatible tools.

Features

🔧 MCP Tools: Exposes 4 powerful tools: scrape, crawl, crawl_site, crawl_sitemap via stdio MCP server
🌐 Web Scraping: Single-page scraping with markdown extraction
🕷️ Web Crawling: Multi-page breadth-first crawling with depth control
🧠 Adaptive Crawling: Smart crawling that stops when sufficient content is gathered
🛡️ Safety: Blocks internal networks, localhost, and private IPs
📱 Agent Ready: Works with OpenAI Agents SDK, Cursor, and Claude Code
⚡ Fast: Powered by Playwright and Crawl4AI's optimized extraction

🚀 Quick Start

Choose between Docker (recommended) or manual installation:

Option A: Docker (Recommended) 🐳

Docker eliminates all setup complexity and provides a consistent environment:

Option A1: Use Pre-built Image (Fastest) ⚡

hljs language-bash

# No setup required! Just pull and run the published image
docker pull uysalsadi/crawl4ai-mcp-server:latest

# Test it works
python test-config.py

# Use directly in MCP configurations (see examples below)

Option A2: Build Yourself

hljs language-bash

# Clone the repository
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server

# Quick build and test (simplified)
docker build -f Dockerfile.simple -t crawl4ai-mcp .
echo '{"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {"protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {"name": "test", "version": "1.0"}}}' | docker run --rm -i crawl4ai-mcp

# Or use helper script (full build with Playwright)
./docker-run.sh build
./docker-run.sh test
./docker-run.sh run

Docker Quick Commands:

./docker-run.sh build - Build the image
./docker-run.sh run - Run MCP server (stdio mode)
./docker-run.sh test - Run smoke tests
./docker-run.sh dev - Development mode with shell access

Option B: Manual Installation

hljs language-bash

# Clone and setup
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

# Install Playwright browsers
python -m playwright install chromium

# Test basic functionality
python -m crawler_agent.smoke_client

# Test adaptive crawling
python test_adaptive.py

Use with OpenAI Agents SDK

hljs language-bash

# Set your OpenAI API key
export OPENAI_API_KEY="your-key-here"

# Docker: Run the example agent
docker-compose run --rm -e OPENAI_API_KEY crawl4ai-mcp python -m crawler_agent.agents_example

# Manual: Run the example agent
python -m crawler_agent.agents_example

🛠️ Tools Reference

The MCP server exposes 4 production-ready tools with content hiding features:

`scrape`

Fetch a single URL and return markdown content.

Arguments:

url (required): The URL to scrape
output_dir (optional): If provided, persists content to disk and returns metadata only
crawler: Optional Crawl4AI crawler config overrides
browser: Optional Crawl4AI browser config overrides
script: Optional C4A-Script for page interaction
timeout_sec: Request timeout (default: 45s, max: 600s)

Returns (without output_dir):

hljs language-json

{
  "url": "https://example.com",
  "markdown": "# Page Title\n\nContent...",
  "links": ["https://example.com/page1", "..."],
  "metadata": {}
}

Returns (with output_dir):

hljs language-json

{
  "run_id": "scrape_20250815_122811_4fb188",
  "file_path": "/path/to/output/pages/example.com_index.md",
  "manifest_path": "/path/to/output/manifest.json",
  "bytes_written": 230
}

`crawl`

Multi-page breadth-first crawling with filtering and adaptive stopping.

Arguments:

seed_url (required): Starting URL for the crawl
max_depth: Maximum link depth to follow (default: 1, max: 4)
max_pages: Maximum pages to crawl (default: 5, max: 100)
same_domain_only: Stay within the same domain (default: true)
include_patterns: Regex patterns URLs must match
exclude_patterns: Regex patterns to exclude URLs
adaptive: Enable adaptive crawling (default: false)
output_dir (optional): If provided, persists content to disk and returns metadata only
crawler, browser, script, timeout_sec: Same as scrape

Returns (without output_dir):

hljs language-json

{
  "start_url": "https://example.com",
  "pages": [
    {
      "url": "https://example.com/page1",
      "markdown": "Content...",
      "links": ["..."]
    }
  ],
  "total_pages": 3
}

Returns (with output_dir):

hljs language-json

{
  "run_id": "crawl_20250815_122828_464944",
  "pages_ok": 3,
  "pages_failed": 0,
  "manifest_path": "/path/to/output/manifest.json",
  "bytes_written": 690
}

`crawl_site`

Comprehensive site crawling with persistence (always requires output_dir).

Arguments:

entry_url (required): Starting URL for site crawl
output_dir (required): Directory to persist results
max_depth: Maximum crawl depth (default: 2, max: 6)
max_pages: Maximum pages to crawl (default: 200, max: 5000)
Additional config options for filtering and performance

Returns:

hljs language-json

{
  "run_id": "site_20250815_122851_0e2455",
  "output_dir": "/path/to/output",
  "manifest_path": "/path/to/output/manifest.json",
  "pages_ok": 15,
  "pages_failed": 2,
  "bytes_written": 45672
}

`crawl_sitemap`

Sitemap-based crawling with persistence (always requires output_dir).

Arguments:

sitemap_url (required): URL to sitemap.xml
output_dir (required): Directory to persist results
max_entries: Maximum sitemap entries to process (default: 1000)
Additional config options for filtering and performance

Returns:

hljs language-json

{
  "run_id": "sitemap_20250815_123006_667d71",
  "output_dir": "/path/to/output",
  "manifest_path": "/path/to/output/manifest.json", 
  "pages_ok": 25,
  "pages_failed": 0,
  "bytes_written": 123456
}

💡 Usage Examples

Standalone MCP Server

hljs language-python

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(
    command="python",
    args=["-m", "crawler_agent.mcp_server"]
)

async with stdio_client(params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()
        
        # Scrape a single page
        result = await session.call_tool("scrape", {
            "url": "https://example.com"
        })
        
        # Crawl with adaptive stopping
        result = await session.call_tool("crawl", {
            "seed_url": "https://docs.example.com",
            "max_pages": 10,
            "adaptive": True
        })

OpenAI Agents SDK Integration

hljs language-python

from agents import Agent, Runner
from agents.mcp.server import MCPServerStdio

async with MCPServerStdio(
    params={
        "command": "python", 
        "args": ["-m", "crawler_agent.mcp_server"]
    },
    cache_tools_list=True
) as server:
    
    agent = Agent(
        name="Research Assistant",
        instructions="Use scrape and crawl tools to research topics.",
        mcp_servers=[server],
        mcp_config={"convert_schemas_to_strict": True}
    )
    
    result = await Runner.run(
        agent, 
        "Research the latest AI safety papers"
    )

Cursor/Claude Code Integration

This project supports both Cursor and Claude Code with synchronized configuration:

Cursor Setup

Uses .cursorrules for AI assistant behavior and project guidance
Automatic MCP tool detection and integration
Project-specific rules and workflow guidance

Claude Code Setup

Uses CLAUDE.md for comprehensive project context and memory bank
Native MCP integration via mcp__crawl4ai-mcp__* tool calls
Global config: ~/.claude/claude_desktop_config.json or project config: .mcp.json

Docker Integration for Both Environments

Both Cursor and Claude Code can use the Dockerized MCP server:

hljs language-json

// .mcp.json (project-scoped configuration)
{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--volume", "./crawls:/app/crawls",
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

Dual Environment Features

Synchronized Documentation: Changes to documentation are maintained across both environments
Shared MCP Configuration: Both use the same MCP server and tool schemas
Docker Compatibility: Consistent environment across both Cursor and Claude Code
Cross-Compatible: Projects work seamlessly whether using Cursor or Claude Code

🐳 Docker Usage

Quick Start with Docker

The Docker approach eliminates all manual setup and provides a consistent environment:

hljs language-bash

# 1. Clone and build
git clone https://github.com/uysalsadi/crawl4ai-mcp-server.git
cd crawl4ai-mcp-server
./docker-run.sh build

# 2. Test the installation
./docker-run.sh test

# 3. Run MCP server
./docker-run.sh run

Docker Commands Reference

Command	Description
`./docker-run.sh build`	Build the Docker image
`./docker-run.sh run`	Run MCP server in stdio mode
`./docker-run.sh test`	Run smoke tests
`./docker-run.sh dev`	Development mode with shell access
`./docker-run.sh stop`	Stop running containers
`./docker-run.sh clean`	Remove containers and images
`./docker-run.sh logs`	Show container logs

Docker Compose Services

The docker-compose.yml provides multiple service configurations:

crawl4ai-mcp: Production MCP server
crawl4ai-mcp-dev: Development container with shell access
crawl4ai-mcp-test: Test runner for smoke tests

MCP Integration with Docker

Global MCP Configuration (Claude Code)

Add to ~/.claude/claude_desktop_config.json:

hljs language-json

{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i", 
        "--volume", "/tmp/crawl4ai-crawls:/app/crawls",
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

✅ Copy this exact config - it uses the published Docker image!

Global MCP Configuration (Cursor)

Add to ~/.cursor/mcp.json:

hljs language-json

{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--volume", "/tmp/crawl4ai-crawls:/app/crawls", 
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

✅ This configuration is tested and working!

Project-Scoped Configuration

Add to your project's .mcp.json:

hljs language-json

{
  "mcpServers": {
    "crawl4ai-mcp": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "--volume", "./crawls:/app/crawls",
        "uysalsadi/crawl4ai-mcp-server:latest"
      ],
      "env": {
        "CRAWL4AI_MCP_LOG": "INFO"
      }
    }
  }
}

✅ This project already includes this configuration - see .mcp.json

Configuration Validation

After setting up any MCP configuration:

Test the Docker image works:
hljs language-bash
```
python test-config.py
```
Restart your editor (Cursor/Claude Code) to reload MCP configuration
Verify tools are available:
- Look for crawl4ai-mcp in the MCP tools panel
- Should see 4 tools: scrape, crawl, crawl_site, crawl_sitemap

If tools don't appear, check:

Docker is running and image is accessible
MCP configuration file syntax is valid JSON
Editor has been restarted after config changes

Docker Advantages

Environment Variables

Set these when running Docker containers:

hljs language-bash

# Using docker-compose
OPENAI_API_KEY=your-key docker-compose up crawl4ai-mcp

# Using docker run directly
docker run --rm -i \
  -e OPENAI_API_KEY=your-key \
  -e CRAWL4AI_MCP_LOG=DEBUG \
  -v ./crawls:/app/crawls \
  crawl4ai-mcp-server:latest

⚙️ Configuration

Environment Variables

TARGET_URL: Default URL for smoke testing (default: https://modelcontextprotocol.io/docs)
RESEARCH_TASK: Custom research task for agents example
OPENAI_API_KEY: Required for OpenAI Agents SDK

Safety Settings

The server blocks these URL patterns by default:

localhost, 127.0.0.1, ::1
Private IP ranges (RFC 1918)
file:// schemes
.local, .internal, .lan domains

🚀 Advanced Features

Adaptive Crawling

When adaptive: true is set, the crawler uses a simple content-based stopping strategy:

Stops when total content exceeds 5,000 characters
Prevents over-crawling for information gathering tasks
More sophisticated LLM-based strategies can be added

Custom Filtering

Use regex patterns to control which URLs are crawled:

hljs language-python

await session.call_tool("crawl", {
    "seed_url": "https://docs.example.com",
    "include_patterns": [r"/docs/", r"/api/"],
    "exclude_patterns": [r"/old/", r"\.pdf$"]
})

Browser Configuration

Pass custom browser/crawler settings:

hljs language-python

await session.call_tool("scrape", {
    "url": "https://example.com",
    "browser": {"headless": True, "viewport": {"width": 1280, "height": 720}},
    "crawler": {"verbose": True}
})

🏗️ Architecture

hljs language-scss

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   AI Agent      │───▶│   MCP Server     │───▶│   Crawl4AI      │
│ (Cursor/Agents) │    │ (stdio/stdio)    │    │ (Playwright)    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │  Safety Guards   │
                       │ (URL validation) │
                       └──────────────────┘

📦 Publishing to Docker Hub

For maintainers who want to publish updates to the Docker registry:

Publishing Process

hljs language-bash

# 1. Login to Docker Hub (one time setup)
./docker-push.sh login

# 2. Build, push, and test everything
./docker-push.sh all

# Or do steps individually:
./docker-push.sh build    # Build and tag image
./docker-push.sh push     # Push to Docker Hub
./docker-push.sh test     # Test the published image

Docker Hub Repository

The image is published at: uysalsadi/crawl4ai-mcp-server

Latest: uysalsadi/crawl4ai-mcp-server:latest
Versioned: uysalsadi/crawl4ai-mcp-server:v1.0.0

Usage Statistics

Users can pull and use the image without any local setup:

hljs language-bash

docker pull uysalsadi/crawl4ai-mcp-server:latest

🔧 Development

Project Structure

hljs language-graphql

crawler_agent/
├── __init__.py
├── mcp_server.py         # Main MCP server
├── safety.py            # URL safety validation
├── adaptive_strategy.py  # Adaptive crawling logic
├── smoke_client.py       # Basic testing client
└── agents_example.py     # OpenAI Agents SDK example

Testing

hljs language-bash

# Test MCP server
python -m crawler_agent.smoke_client

# Test adaptive crawling
python test_adaptive.py

# Test with agents (requires OPENAI_API_KEY)
python -m crawler_agent.agents_example

Contributing

Environment Setup: Always activate .venv before development work
Documentation: Follow the Documentation Memory Rule - synchronize changes across CLAUDE.md, .cursorrules, and README.md
Code Style: Follow .cursorrules for Cursor or CLAUDE.md for Claude Code
Testing: Use python -m crawler_agent.smoke_client for validation
Safety: Ensure security guards are maintained for all new features
Dual Compatibility: Verify changes work in both Cursor and Claude Code environments

📚 References

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

This project uses Crawl4AI by UncleCode for web scraping capabilities. Crawl4AI is an excellent open-source LLM-friendly web crawler and scraper.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

⭐ Support

If this project helped you, please give it a star! It helps others discover the project.

🐛 Issues

Found a bug or have a feature request? Please open an issue.

crawl4ai-mcp-server

Crawl4AI MCP Server

Features

🚀 Quick Start

Option A: Docker (Recommended) 🐳

Option A1: Use Pre-built Image (Fastest) ⚡

Option A2: Build Yourself

Option B: Manual Installation

Use with OpenAI Agents SDK

🛠️ Tools Reference

scrape

crawl

crawl_site

crawl_sitemap

💡 Usage Examples

Standalone MCP Server

OpenAI Agents SDK Integration

Cursor/Claude Code Integration

Cursor Setup

Claude Code Setup

Docker Integration for Both Environments

Dual Environment Features

🐳 Docker Usage

Quick Start with Docker

Docker Commands Reference

Docker Compose Services

MCP Integration with Docker

Global MCP Configuration (Claude Code)

Global MCP Configuration (Cursor)

Project-Scoped Configuration

Configuration Validation

Docker Advantages

Environment Variables

⚙️ Configuration

Environment Variables

Safety Settings

🚀 Advanced Features

Adaptive Crawling

Custom Filtering

Browser Configuration

🏗️ Architecture

📦 Publishing to Docker Hub

Publishing Process

Docker Hub Repository

Usage Statistics

🔧 Development

Project Structure

Testing

Contributing

📚 References

📄 License

🙏 Acknowledgments

🤝 Contributing

⭐ Support

🐛 Issues

Similar Packages

crawl4ai-mcp-server

Crawl4AI MCP Server

Features

🚀 Quick Start

Option A: Docker (Recommended) 🐳

Option A1: Use Pre-built Image (Fastest) ⚡

Option A2: Build Yourself

Option B: Manual Installation

Use with OpenAI Agents SDK

🛠️ Tools Reference

scrape

crawl

crawl_site

crawl_sitemap

💡 Usage Examples

Standalone MCP Server

OpenAI Agents SDK Integration

Cursor/Claude Code Integration

Cursor Setup

Claude Code Setup

Docker Integration for Both Environments

Dual Environment Features

🐳 Docker Usage

Quick Start with Docker

`scrape`

`crawl`

`crawl_site`

`crawl_sitemap`

`scrape`

`crawl`

`crawl_site`

`crawl_sitemap`