An MCP (Model Context Protocol) server that can scrape web pages and extract content using CSS selectors. Built with deno-dom for fast HTML parsing.
Most LLM clients already have some HTTP fetching capabilities, but fetching a page directly often returns a lot of unnecessary content. This not only confuses the LLM, but also quickly fills up the context window.
That's where this MCP server comes in: it enables targeted scraping using CSS selectors, so you extract only the content you actually need.
See ZedExample.md for a real-world usage example.
deno run --allow-net jsr:@sigma/scrap-mcp
You can also run this with Bun and Node.js using bunx and npx respectively:
bunx rjsr @sigma/scrap-mcp
npx rjsr @sigma/scrap-mcp
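Most MCP clients start a server from a command plus an argument list. As a rough sketch, the entry below uses the `mcpServers` layout found in several clients' JSON config files. The `scrap` key is a placeholder name, and the layout itself is an assumption: check your client's documentation for the exact file and schema it expects.

```typescript
// Hypothetical client configuration for launching the server with Deno.
// The "mcpServers" layout and the "scrap" key are assumptions, not something
// this project documents.
interface ServerEntry {
  command: string;
  args: string[];
}

// Same command line as the Deno invocation shown above, split into parts.
const scrapEntry: ServerEntry = {
  command: "deno",
  args: ["run", "--allow-net", "jsr:@sigma/scrap-mcp"],
};

const mcpConfig = {
  mcpServers: {
    scrap: scrapEntry,
  },
};

// Print the JSON you would paste into the client's config file.
console.log(JSON.stringify(mcpConfig, null, 2));
```

For Bun or Node.js, the same shape should work with `command` set to `bunx` or `npx` and `args` set to `["rjsr", "@sigma/scrap-mcp"]`.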
scrape_page
The main tool for scraping web pages and extracting content.
Parameters:
url (string, required): The URL of the page to scrape
query_selector (string, required): CSS selector to query elements
Return Format:
Found X elements matching selector "SELECTOR" on URL:
Element 1: TEXT_CONTENT
Element 2: TEXT_CONTENT
...
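Because the result is plain text, a caller that wants structured data has to parse it. The helper below is a minimal sketch, not part of this project: `parseScrapeOutput` and `ScrapeResult` are hypothetical names. It assumes the header and each `Element N:` entry sit on their own line, and that an element's text contains no line breaks.

```typescript
// Parsed form of the scrape_page text output (hypothetical type).
interface ScrapeResult {
  count: number;
  selector: string;
  elements: string[];
}

// Assumption: one header line, then one "Element N: text" line per match.
// Blank lines are ignored; lines that do not start with "Element N:" are skipped.
function parseScrapeOutput(output: string): ScrapeResult | null {
  const lines = output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const header = lines[0]?.match(/^Found (\d+) elements? matching selector "(.*)" on /);
  if (!header) return null; // Not the expected format (for example, an error message).

  const elements: string[] = [];
  for (const line of lines.slice(1)) {
    const match = line.match(/^Element \d+: ?(.*)$/);
    if (match) elements.push(match[1] ?? "");
  }

  return { count: Number(header[1]), selector: header[2] ?? "", elements };
}

// Sample text written by hand to follow the format above.
const sample = [
  'Found 2 elements matching selector "h1, h2" on https://example.com:',
  "",
  "Element 1: Example Domain",
  "Element 2: More information",
].join("\n");

const parsed = parseScrapeOutput(sample);
console.log(parsed);
```

Returning `null` for unrecognized text lets the caller fall back to showing the raw message, which is useful when the tool returns an error instead of a result list.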
Extract all headings:
{
"url": "https://example.com",
"query_selector": "h1, h2, h3"
}
Extract all paragraphs:
{
"url": "https://example.com",
"query_selector": "p"
}
Extract content from specific classes:
{
"url": "https://news.ycombinator.com",
"query_selector": ".titleline > a"
}
Extract all links:
{
"url": "https://example.com",
"query_selector": "a"
}
Extract navigation items:
{
"url": "https://deno.land",
"query_selector": "nav a"
}
Extract elements with specific attributes:
{
"url": "https://example.com",
"query_selector": "a[href^='https://']"
}
Extract form inputs:
{
"url": "https://example.com",
"query_selector": "input[type='text'], input[type='email']"
}
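The JSON objects above are the tool's arguments. An MCP client normally wraps them in a JSON-RPC 2.0 `tools/call` request for you, but a rough sketch of that envelope may help when debugging. `buildScrapeRequest` and `ScrapePageArgs` are hypothetical names, and the request id is arbitrary.

```typescript
// Arguments accepted by scrape_page, per the parameter list in this README.
interface ScrapePageArgs {
  url: string;
  query_selector: string;
}

// Build a JSON-RPC 2.0 "tools/call" request as defined by the MCP specification.
// The id is chosen by the caller; the server echoes it back in its response.
function buildScrapeRequest(id: number, args: ScrapePageArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "scrape_page", arguments: args },
  };
}

const request = buildScrapeRequest(1, {
  url: "https://news.ycombinator.com",
  query_selector: ".titleline > a",
});

// Assuming a stdio transport, each message is sent as one JSON object per line.
console.log(JSON.stringify(request));
```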
h1 - All H1 headings
.className - All elements with class "className"
#elementId - Element with ID "elementId"
* - All elements
div p - All paragraphs inside div elements
div > p - Direct paragraph children of div elements
h1 + p - Paragraphs immediately following H1 elements
h1 ~ p - All paragraphs that are siblings after H1 elements
[href] - All elements with href attribute
a[title] - All links with title attribute
a[href^="https://"] - Links starting with "https://"
a[href$=".pdf"] - Links ending with ".pdf"
a[href*="github"] - Links containing "github"
li:first-child - First list item
li:last-child - Last list item
li:nth-child(2n) - Even-numbered list items
p:not(.special) - Paragraphs without "special" class
.article-content p, .article-content h2 - Paragraphs and H2s in article content
nav ul li a - Navigation links
table tr:nth-child(odd) td - Cells in odd table rows
form input[required] - Required form inputs

@modelcontextprotocol/sdk@1.8.0 - MCP SDK for server implementation
@b-fuze/deno-dom@^0.1.49 - Fast DOM parser for HTML content
zod@3.24.2 - Runtime type validation and schema definition

The server provides comprehensive error handling for:
All errors are returned as readable text messages through the MCP protocol.
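To illustrate what "readable text messages" can look like on the wire, here is a minimal sketch of turning a thrown error into an MCP text result. It is not taken from this server's source: `toErrorResult` and `checkUrl` are hypothetical helpers, and the message wording and the `isError` flag are assumptions about how such a server might report failures.

```typescript
// Shape of an MCP tool result that carries plain text.
interface TextToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Convert any thrown value into a readable text result.
// Assumption: the real server's wording and use of isError may differ.
function toErrorResult(context: string, err: unknown): TextToolResult {
  const message = err instanceof Error ? err.message : String(err);
  return {
    content: [{ type: "text", text: `Error ${context}: ${message}` }],
    isError: true,
  };
}

// Example failure source: the URL constructor throws on malformed input.
function checkUrl(input: string): TextToolResult {
  try {
    const parsed = new URL(input);
    return { content: [{ type: "text", text: `OK: ${parsed.href}` }] };
  } catch (err) {
    return toErrorResult("parsing URL", err);
  }
}

console.log(checkUrl("not a url").content[0]?.text);
console.log(checkUrl("https://example.com").content[0]?.text);
```

Keeping the failure inside the result, rather than throwing, means the calling model sees the message and can retry with a corrected URL or selector.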
--allow-net - To fetch web pages from the internet
"Permission denied" errors:
# Ensure all required permissions are granted
deno run --allow-net jsr:@sigma/scrap-mcp
"No elements found" with valid selector:
MIT License - see LICENSE file for details