Fast, local-first web content extraction for LLMs. Scrape, crawl, and extract structured data, all from Rust.
Turn websites into clean markdown, JSON, and LLM-ready context.
CLI, MCP server, REST API, and SDKs for AI agents and RAG pipelines.
Most web scraping tools give your agent one of two bad outputs:
webclaw.io is the hosted web extraction API for webclaw. This repo contains the open-source CLI, MCP server, extraction engine, and self-hostable server.
webclaw turns a URL into clean content your tools can actually use.
webclaw https://example.com --format markdown
# Example Domain
This domain is for use in illustrative examples in documents.
You may use this domain in literature without prior coordination or asking for permission.
Use it from the terminal, wire it into Claude/Cursor through MCP, call the hosted API from your app, or self-host the OSS server.
The fastest way to connect webclaw to Claude Code, Claude Desktop, Cursor, Windsurf, OpenCode, Codex CLI, and other MCP-compatible tools:
npx create-webclaw
The installer detects supported clients and configures the MCP server for you.
brew tap 0xMassi/webclaw
brew install webclaw
Download macOS and Linux binaries from GitHub Releases.
docker run --rm ghcr.io/0xmassi/webclaw https://example.com
cargo install --git https://github.com/0xMassi/webclaw.git webclaw-cli
cargo install --git https://github.com/0xMassi/webclaw.git webclaw-mcp
If building from source fails because native build tools are missing, install the platform prerequisites:
| OS | Command |
|---|---|
| Debian / Ubuntu | sudo apt install -y pkg-config libssl-dev cmake clang git build-essential |
| Fedora / RHEL | sudo dnf install -y pkg-config openssl-devel cmake clang git make gcc |
| Arch | sudo pacman -S pkg-config openssl cmake clang git base-devel |
| macOS | xcode-select --install |
webclaw https://stripe.com --format markdown
webclaw https://docs.anthropic.com --format llm
webclaw https://example.com/blog/post --only-main-content
webclaw https://example.com \
  --include "article, main, .content" \
  --exclude "nav, footer, .sidebar, .ad"
webclaw https://docs.rust-lang.org --crawl --depth 2 --max-pages 50
webclaw https://github.com --brand
webclaw https://example.com/pricing --format json > pricing-old.json
webclaw https://example.com/pricing --diff-with pricing-old.json
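The commands above can also be driven from a script. The following is a minimal Python sketch, assuming the `webclaw` binary is on `PATH` and accepts exactly the flags shown in the examples above. The helper names `build_command` and `run_webclaw` are hypothetical and not part of webclaw; the format names are the ones this README lists.

```python
import shutil
import subprocess

# Output formats named in this README.
FORMATS = {"markdown", "llm", "text", "json", "html"}


def build_command(url, fmt=None, crawl=False, depth=None, max_pages=None, diff_with=None):
    """Build an argv list using only flags that appear in the CLI examples."""
    cmd = ["webclaw", url]
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown format: {fmt}")
        cmd += ["--format", fmt]
    if crawl:
        cmd.append("--crawl")
        if depth is not None:
            cmd += ["--depth", str(depth)]
        if max_pages is not None:
            cmd += ["--max-pages", str(max_pages)]
    if diff_with is not None:
        cmd += ["--diff-with", diff_with]
    return cmd


def run_webclaw(cmd):
    """Run the CLI and return stdout; raises if the binary is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("webclaw is not on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_webclaw(build_command("https://example.com", fmt="markdown"))` would return whatever the CLI prints for that page. Passing the arguments as a list avoids shell quoting issues with URLs and selectors.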
webclaw ships with an MCP server for AI agents.
npx create-webclaw
Manual config:
{
  "mcpServers": {
    "webclaw": {
      "command": "~/.webclaw/webclaw-mcp"
    }
  }
}
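If you prefer to script the manual config rather than edit JSON by hand, the sketch below merges the entry shown above into an existing config file without touching other servers. It is only an illustration: the helper name `add_webclaw_server` is hypothetical, the config file location depends on your MCP client and is left to the caller, and writing an absolute path instead of `~` is an assumption made in case a client does not expand the tilde.

```python
import json
import os


def add_webclaw_server(config_path, binary="~/.webclaw/webclaw-mcp"):
    """Merge a webclaw entry into an MCP config file, keeping other servers."""
    config = {}
    if os.path.exists(config_path):
        with open(config_path, encoding="utf-8") as fh:
            config = json.load(fh)
    servers = config.setdefault("mcpServers", {})
    # Assumption: store an absolute path in case the client does not expand "~".
    servers["webclaw"] = {"command": os.path.expanduser(binary)}
    with open(config_path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")
    return config
```

Call it with the path to your client's config file; existing `mcpServers` entries are preserved and only the `webclaw` key is added or overwritten.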
Then ask your agent things like:
Scrape these competitor pricing pages and summarize the differences.
Crawl this documentation site and prepare clean context for a RAG index.
Extract the brand colors, fonts, and logos from this company website.
| Tool | What it does | Local |
|---|---|---|
| scrape | Extract one URL as markdown, text, JSON, LLM format, or HTML | Yes |
| crawl | Follow same-origin links and extract discovered pages | Yes |
| map | Discover URLs without extracting every page | Yes |
| batch | Scrape multiple URLs in parallel | Yes |
| extract | Convert page content into structured data | Yes, with local or configured LLM |
| summarize | Summarize a page | Yes, with local or configured LLM |
| diff | Compare page content snapshots | Yes |
| brand | Extract colors, fonts, logos, and metadata | Yes |
| search | Search the web and scrape results | Hosted API |
| research | Multi-source research workflow | Hosted API |
npm install @webclaw/sdk
pip install webclaw
go get github.com/0xMassi/webclaw-go
import { Webclaw } from "@webclaw/sdk";

const client = new Webclaw({ apiKey: process.env.WEBCLAW_API_KEY! });

const page = await client.scrape({
  url: "https://example.com",
  formats: ["markdown"],
  only_main_content: true,
});

console.log(page.markdown);
from webclaw import Webclaw

client = Webclaw(api_key="wc_your_key")

page = client.scrape(
    "https://example.com",
    formats=["markdown"],
    only_main_content=True,
)

print(page.markdown)
curl -X POST https://api.webclaw.io/v1/scrape \
  -H "Authorization: Bearer $WEBCLAW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"],
    "only_main_content": true
  }'
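The same request can be made without the SDK using only the Python standard library. This sketch mirrors the curl call above (same endpoint, headers, and body); the function names are hypothetical, and the shape of the JSON response is not assumed here, so `scrape` simply returns the decoded body.

```python
import json
import os
import urllib.request

API_URL = "https://api.webclaw.io/v1/scrape"  # endpoint taken from the curl example


def build_scrape_request(url, api_key, formats=("markdown",), only_main_content=True):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({
        "url": url,
        "formats": list(formats),
        "only_main_content": only_main_content,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(url, api_key=None, timeout=30):
    """Send the request and return the decoded JSON body as-is."""
    api_key = api_key or os.environ["WEBCLAW_API_KEY"]
    req = build_scrape_request(url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the network call in one place and makes the payload easy to inspect before anything leaves your machine.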
| Format | Use it when you need |
|---|---|
| markdown | Clean page content with structure preserved |
| llm | Compact context for agents and RAG pipelines |
| text | Plain text with minimal formatting |
| json | Structured metadata, links, images, and extracted fields |
| html | Cleaned HTML for custom processing |
The CLI and MCP server work locally without an account for the core extraction path.
Use the hosted API at webclaw.io when you need:
export WEBCLAW_API_KEY=wc_your_key
webclaw https://example.com --cloud
| Use case | Example |
|---|---|
| AI agent web access | Give Claude, Cursor, or another MCP client clean page context |
| RAG ingestion | Crawl docs, help centers, blogs, and knowledge bases |
| Competitor monitoring | Track pricing pages, changelogs, docs, and product pages |
| Structured extraction | Turn messy pages into typed JSON for automations |
| Research workflows | Search, scrape, summarize, and cite multiple sources |
| Brand intelligence | Extract logos, colors, fonts, and social metadata |
webclaw/
  crates/
    webclaw-core     HTML to markdown, text, JSON, and LLM-ready output
    webclaw-fetch    Fetching, crawling, batching, and mapping
    webclaw-llm      Local and hosted LLM provider support
    webclaw-pdf      PDF text extraction
    webclaw-mcp      MCP server for AI agents
    webclaw-cli      Command-line interface
webclaw-core is pure extraction logic: no network I/O, small surface area, and usable independently from the fetching layer.
| Variable | Description |
|---|---|
| WEBCLAW_API_KEY | Hosted API key |
| OLLAMA_HOST | Ollama URL for local LLM features |
| OPENAI_API_KEY | OpenAI-compatible LLM provider key |
| OPENAI_BASE_URL | OpenAI-compatible base URL |
| ANTHROPIC_API_KEY | Anthropic-compatible LLM provider key |
| ANTHROPIC_BASE_URL | Anthropic-compatible base URL |
| WEBCLAW_PROXY | Single proxy URL |
| WEBCLAW_PROXY_FILE | Proxy pool file |
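When debugging a setup, it can help to see which of these variables are actually set in the current environment. The sketch below only reports what is present; it does not reproduce how webclaw itself chooses between providers or proxies, and the helper name `configured_vars` is hypothetical.

```python
import os

# Variable names copied from the table above.
WEBCLAW_VARS = [
    "WEBCLAW_API_KEY",
    "OLLAMA_HOST",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
    "ANTHROPIC_API_KEY",
    "ANTHROPIC_BASE_URL",
    "WEBCLAW_PROXY",
    "WEBCLAW_PROXY_FILE",
]


def configured_vars(env=None):
    """Return the names of webclaw-related variables that are set and non-empty."""
    env = os.environ if env is None else env
    return [name for name in WEBCLAW_VARS if env.get(name)]
```

The function returns names only, never values, so its output is safe to paste into an issue.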
The most useful contributions right now are practical and small:
Good first places to start:
If a page extracts badly, include:
URL:
Command or API request:
Expected output:
Actual output:
Format used: markdown / llm / text / json / html
CLI, MCP, SDK, or API:
Please remove secrets, cookies, private tokens, and customer data from logs before posting.
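A quick pass with a few regular expressions can catch the most common leaks before you paste a log. This is a rough sketch, not a guarantee: the patterns are assumptions (keys starting with `wc_` as in the examples above, `Authorization: Bearer` headers, and `Cookie` / `Set-Cookie` headers), the helper name `redact` is hypothetical, and you should still review the output by eye.

```python
import re

# Assumed patterns; extend them for whatever secrets your logs may contain.
_PATTERNS = [
    # API keys in the "wc_..." style used in this README's examples.
    (re.compile(r"\bwc_[A-Za-z0-9_\-]+"), "wc_REDACTED"),
    # Bearer tokens in Authorization headers.
    (re.compile(r"(?i)(authorization:\s*bearer\s+)\S+"), r"\1REDACTED"),
    # Entire Cookie / Set-Cookie header values.
    (re.compile(r"(?i)((?:set-)?cookie:\s*).+"), r"\1REDACTED"),
]


def redact(text):
    """Replace likely secrets in a log excerpt with placeholders."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```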
ColdProxy supports webclaw as an Infrastructure Partner, providing residential IPv4, residential IPv6, and datacenter IPv6 proxy infrastructure across 195+ countries for public data collection, regional testing, monitoring, and web scraping workflows. Explore ColdProxy's latest plans and available offers directly on the website.
Quantum Proxies provides fast, reliable residential and ISP proxy infrastructure for developers running large-scale extraction workloads. Get 20% off any plan with code WEBCLAW20 at quantumproxies.net.
Proxy-Seller maintains a global network of residential and datacenter proxies optimized for web extraction at scale. The service supports high-volume concurrent scraping, geographic rotation, and integration with web extraction tools. Use code WBC15 for 15% off IPv4, IPv6, ISP, and Residential proxies, and 10% off Mobile at proxy-seller.com.
RapidProxy delivers fast, reliable proxy infrastructure for large-scale data collection. With 90M+ residential IPs, smart rotation, high concurrency, AI-powered CAPTCHA bypass, and non-expiring traffic, it helps keep scraping workflows stable at scale. Use code webclaw for 10% off, or try it free.
Third-party plugins that integrate webclaw with AI agent platforms:
| Plugin | Platform | What it does |
|---|---|---|
| openclaw-webclaw | OpenClaw | Native webclaw v1 API plugin with 9 tools: scrape, search, crawl, extract, summarize, diff, map, batch, brand |
| hermes-webclaw | Hermes Agent | Web search provider and 9 dedicated tools for the full v1 API surface. Install with hermes plugins install jal-co/hermes-webclaw |
Built a webclaw integration? Open a PR to add it here.
Thanks to everyone improving webclaw through issues, examples, docs, bug reports, and pull requests.