A community-driven registry for the Claude Code ecosystem. Not affiliated with Anthropic.
Are you the author? Sign in to claim
Retio-pagemap
PageMap converts raw HTML (100K+ tokens) into structured, AI-readable page maps (2-5K tokens) — a 97% token reduction. It works as an MCP server, Python SDK, and CLI, supporting 16 page types and 30+ e-commerce sites. Agents can read, click, type, and navigate any web page.
"Give your agent eyes and hands on the web."
Playwright MCP dumps 50-540KB accessibility snapshots per page, overflowing context windows after 2-3 navigations. Firecrawl and Jina convert HTML to markdown — read-only, no interaction.
PageMap gives your agent a compressed, actionable view of any web page:
| PageMap | Playwright MCP | Firecrawl | Jina Reader | |
|---|---|---|---|---|
| Tokens / page | 2-5K | 6-50K | 10-50K | 10-50K |
| Interaction | click / type / select / hover | Raw tree parsing | Read-only | Read-only |
| Multi-page sessions | Unlimited | Breaks at 2-3 pages | N/A | N/A |
| Task success (94 tasks) | 84.7% | 61.5% | 64.5% | 57.8% |
| Avg tokens / task | 2,710 | 13,737 | 13,888 | 11,424 |
| Cost / 94 tasks | $1.06 | $4.09 | $3.98 | $2.26 |
Benchmarked across 11 e-commerce sites, 94 static tasks, 7 conditions. 8,100+ tests passing.
Chromium is auto-installed on first use — no manual playwright install needed.
pip install retio-pagemap
Add to Claude Code, Cursor, Windsurf, or Claude Desktop:
{
"mcpServers": {
"pagemap": {
"command": "uvx",
"args": ["retio-pagemap"]
}
}
}
Claude Desktop (macOS): Use the absolute path to
uvx— runwhich uvx(e.g./opt/homebrew/bin/uvx).
VS Code (Copilot): Use
"servers"instead of"mcpServers"in.vscode/mcp.json.
docker run -p 8000:8000 retio1001/pagemap --transport http
Not just reading — your agent can click buttons, fill forms, select options, manage tabs, and navigate across pages. 13 tools cover the full browsing workflow:
get_page_map · execute_action · fill_form · scroll_page · wait_for · take_screenshot · get_page_state · navigate_back · batch_get_page_map · open_tab · switch_tab · list_tabs · close_tab
PageMap automatically classifies pages and applies optimized extraction for each type:
product_detail · listing · search_results · article · news · video · login · form · checkout · dashboard · help_faq · settings · error · documentation · landing · blocked
Built-in support for 30+ major e-commerce sites across 4 tiers:
Structured extraction of prices, options (size/color), ratings, availability — with automatic cookie consent handling and login barrier detection.
PageMap detects problems and tells your agent what to do:
barrier field with the diagnosis and suggested next stepsrole="dialog" + HTML regex 2-phase detection. Promotional popups (newsletter, exit-intent) auto-dismissedto_delta_packet() serializer emits digest-bound evidence units, claim candidates, provenance, and authority flags for downstream memory/review systems without changing the default MCP outputLocale auto-detected from URL. Token budgets adjusted for CJK scripts.
| Language | Locale | Language | Locale |
|---|---|---|---|
| English | en | Chinese | zh |
| Korean | ko | Spanish | es |
| Japanese | ja | Italian | it |
| French | fr | Portuguese | pt |
| German | de | Dutch | nl |
Default mode. Runs as a local MCP server — no server setup needed.
retio-pagemap
docker run -p 8000:8000 retio1001/pagemap --transport http
Multi-architecture images (amd64/arm64) available on Docker Hub and GitHub Container Registry.
import asyncio
from pagemap.browser_session import BrowserSession
from pagemap.delta_serializer import to_delta_packet
from pagemap.page_map_builder import build_page_map_live
from pagemap.serializer import to_agent_prompt, to_json
async def main():
async with BrowserSession() as session:
page_map = await build_page_map_live(session, "https://example.com/product/123")
print(to_agent_prompt(page_map)) # Agent-optimized text format
print(to_json(page_map)) # Structured JSON
print(to_delta_packet(page_map)) # Digest-bound evidence packet
print(page_map.page_type) # "product_detail"
print(page_map.interactables) # [Interactable(ref=1, role="button", ...)]
print(page_map.metadata) # {"name": "...", "price": "..."}
asyncio.run(main())
For offline processing (no browser):
from pagemap.page_map_builder import build_page_map_offline
page_map = build_page_map_offline(open("page.html").read(), url="https://example.com/product/123")
PageMap treats all web content as untrusted input:
--ignore-robots opt-out flagLocal development: Private IPs are blocked by default. Use --allow-local or PAGEMAP_ALLOW_LOCAL=1.
Users are responsible for complying with the terms of service of target websites and all applicable laws when using PageMap.
"spawn uvx ENOENT" (Claude Desktop on macOS) — Claude Desktop does not inherit your shell PATH. Run which uvx and use the absolute path in your config.
First page takes a long time — Chromium cold start takes ~10-30s on first navigation. Subsequent pages load in 1-3 seconds.
Localhost blocked — Use --allow-local flag or set PAGEMAP_ALLOW_LOCAL=1.
Chromium not found — Run pip install retio-pagemap && playwright install chromium to install manually.
Have a question or idea? Join the conversation in GitHub Discussions.
git clone https://github.com/Retio-ai/Retio-pagemap.git
cd Retio-pagemap
uv sync --group dev
playwright install chromium
uv run pytest --tb=short -q
Local (STDIO) — Free forever. Self-hosted, open source under AGPL-3.0.
Cloud API — Hosted multi-tenant server with auth, rate limiting, and credit-based billing. Contact retio1001@retio.ai for access.
AGPL-3.0-only — see LICENSE for the full text.
For commercial licensing options, contact retio1001@retio.ai.
This section is written for AI agents using PageMap as an MCP tool.
| Tool | When to use |
|---|---|
get_page_map | Start here. Navigate to a URL and get a full structured map with numbered refs. |
execute_action | Click, type, select, or hover using a ref number from the last get_page_map. |
fill_form | Fill multiple form fields in one call. More efficient than sequential execute_action calls. |
get_page_state | Check current URL and title without a full rebuild. Use after actions that may navigate. |
scroll_page | Scroll to reveal lazy-loaded content before calling get_page_map again. |
wait_for | Wait for dynamic content to appear (e.g. after a search or form submit). |
take_screenshot | Capture the visual state when the PageMap alone is ambiguous. |
navigate_back | Go back one step in browser history. |
open_tab | Open a new browser tab and navigate to a URL. |
switch_tab | Switch to a different open tab by index. |
list_tabs | List all open tabs with their URLs and titles. |
close_tab | Close a tab by index. |
batch_get_page_map | Fetch multiple URLs in parallel. Use for comparison tasks. |
URL: https://example.com/product/123
Title: Product Name
Type: product_detail # auto-detected page type
## Actions
[1] button: Add to cart (click)
[2] select: Size (select) — options: S, M, L, XL
[3] link: See all reviews (click)
...
## Info
Price: $49.99
Rating: 4.5 / 5 (128 reviews)
Description: ...
## Images
[1] https://cdn.example.com/product.jpg
## Meta
Tokens: ~1,800 | Interactables: 24 | Generation: 380ms
## Actions — Every interactive element on the page with a stable ref number.## Info — Key page content extracted from HTML: prices, titles, ratings, descriptions.## Images — Product/content image URLs.## Meta — Token count, interactable count, generation time.When PageMap encounters a page-level obstacle, it includes a barrier field in the response:
State:
barrier: login_required
barrier_hint: "Login form detected with email + password fields. Use fill_form to authenticate."
Possible barriers: cookie_consent, login_required, bot_blocked, out_of_stock, empty_results, error_page, age_verification, region_restricted, popup_overlay.
When you see a barrier: follow the barrier_hint guidance. For bot_blocked, wait and retry. For login_required, use fill_form with credentials.
Refs are assigned by get_page_map and remain valid until the page state changes.
Refs are invalidated when:
execute_action causes a page-level changeWhen you get a stale ref error: call get_page_map again to get fresh refs before retrying.
When a page exceeds the token budget, content is pruned in this order:
## Actions and ## Info are always preservedIf key content seems missing, try scroll_page to load lazy content, then get_page_map again.
1. get_page_map(url) → read Actions + Info, pick refs
2. execute_action(ref, ...) → interact
3. get_page_state() → confirm navigation occurred
4. get_page_map(new_url) → get fresh refs for next step
For pages with dynamic content (search results, filters):
1. get_page_map(url)
2. execute_action(ref, "click") → trigger search/filter
3. wait_for(text="results") → wait for content
4. get_page_map(url) → get updated map
--allow-local flag.PageMap — Structured Web Intelligence for the Agent Era.
Run Claude Code as an MCP server so any agent can delegate coding tasks to it
Browser automation using accessibility snapshots instead of screenshots
Secure MCP server for MySQL database interaction, queries, and schema management
English-first Korean equity intelligence MCP — DART filings, foreign-holder 5%-rule flows, activist filings, KRX news. F