Your AI research assistant that cites real sources and stays honest.
Search the entire web or narrow it down to just the sites you trust:
medical journals, court databases, news outlets, academic papers.
Analyze the full source, not just snippets. Links that work, citations you can trust,
no made-up, walled-garden, pre-synthesized results.
macOS (Homebrew):
brew install zoharbabin/tap/web-researcher-mcp
claude mcp add --scope user web-researcher -- web-researcher-mcp
macOS / Linux (no package manager):
curl -fsSL https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.sh | sh
Windows (PowerShell):
powershell -ExecutionPolicy Bypass -c "irm https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.ps1 | iex"
No dev tools needed: each method downloads the binary, verifies its checksum, and puts it on your PATH. The curl/PowerShell installers also register it with Claude Code automatically when the claude CLI is present. Homebrew installs only the binary, so run the claude mcp add line shown above to connect it.
Using a different MCP client (Claude Desktop, Cursor, …) or want to pass API keys? See Connect to Your AI Assistant for the per-app config, and Configuration to pick a search provider.
Your AI can now search the web, read full articles, find academic papers, look up patents, and run multi-step research — only from sources you pick.
Perplexity gets its citations wrong over a third of the time. It links to papers that don't exist, invents DOIs, and presents SEO spam with the same confidence as peer-reviewed research. ChatGPT's web search isn't much better — it can't tell a blog post from a court filing.
If your work gets cited, published, submitted to a court, or shown to a client — you can't afford "probably real" sources.
This tool fixes the root cause: instead of searching the entire web and hoping, you tell your AI exactly which sources to search. We call these "search lenses" — curated lists of trusted sites for each field.
| What you get | What that means for you |
|---|---|
| Search lenses — choose your sources by field | Your AI only sees the sites you trust (PubMed, SEC.gov, arXiv — not random blogs) |
| Research tools for every source type | Papers, patents, SEC filings, US court records, economic data, news, web pages, images, full-text reading, grounded answers with citations, structured extraction, and multi-step deep research |
| Always has a backup | Multiple search engines working together — if one has issues, the others pick up automatically |
| Reads full articles | Doesn't just give you snippets — extracts and reads entire pages, PDFs, Word docs, even YouTube transcripts |
| Real citations, formatted | Every source comes with a proper APA/MLA citation and a link that actually works |
| Your queries stay private | Runs on your machine — nobody sees what you're researching. Not us, not anyone. |
| Paper trail | Every search is logged so you can reproduce your research process months later |
Works with Claude, Claude Desktop, Cursor, and any AI assistant that supports tool use.
https://github.com/user-attachments/assets/1d17af8e-1ec4-4a37-b42b-f26712ebe860
| | web-researcher-mcp | Perplexity | Scite.ai | Elicit |
|---|---|---|---|---|
| You pick which sources are searched | Yes (built-in + custom lenses) | No | No | No |
| Makes up citations | Never — every link is real | ~37% incorrect | Rare (journals only) | Rare |
| Works across all fields | Yes — legal, medical, news, patents, everything | Yes | Journals only | Papers only |
| Keeps your research private | Yes — runs on your machine | No (they see everything) | No | No |
| Works inside your existing AI (Claude, Cursor, etc.) | Yes | No (separate app) | Partially | No (separate app) |
| Can read full articles, not just snippets | Yes — pages, PDFs, Word docs, YouTube | No | No | Limited |
| Cost | Free forever (open source) | $20/mo | $20/mo | $10-49/mo |
| Tool | What it does |
|---|---|
| web_search | Search the web, optionally restricted to the sources you trust via lenses |
| scrape_page | Read any URL in full: web pages, PDFs, Word docs, slideshows, YouTube transcripts. Supports mode: raw for verbatim, unsanitized source (e.g. inspecting JSON or HTML) |
| search_and_scrape | Search and then read the best results, with quality scoring to surface the most reliable sources |
| image_search | Find images by size, type, color, or format |
| news_search | Search recent news with date controls and source filtering |
| academic_search | Find real papers with real DOIs: authors, citation counts, open-access links |
| citation_graph | Walk a paper's citation neighborhood: works it cites and works that cite it, with intent/influence signals |
| patent_search | Search patent offices (US, Europe, international) with classification codes |
| filing_search | Search SEC EDGAR for US public-company filings (10-K, 10-Q, 8-K, …), or pull structured XBRL company facts |
| legal_search | Search US court opinions and dockets via CourtListener: real cases with real citations |
| econ_search | Look up economic time series from FRED (Federal Reserve): GDP, CPI, unemployment, rates |
| answer | Ask a factual question and get one synthesized answer with citations: the direct answer, not a reading list |
| structured_search | Search and extract structured JSON per result (supply a schema), or pull entities by category (company, people, …) |
| sequential_search | Multi-step deep research: your AI remembers what it already found and builds on it |
| get_research_session | Recover a research session after context loss and pick up right where you left off |
| research_export | Export a research session as a shareable report (markdown or JSON), with full per-step provenance |
| format_bibliography | Turn collected sources into a formatted bibliography: APA, MLA, or BibTeX |
These are the core tools, and most are always on. answer and structured_search are not tied to a single provider, but they activate only when a capable provider (e.g. Exa) is configured. Operators can also enable opt-in, consent-gated tools (per-user analytics, long-term memory, shared workspaces) that appear only when their feature is turned on. See docs/TOOLS.md for the authoritative, CI-verified tool list and full schemas.
The server also ships guided prompt templates your AI assistant can pull in with one click — they walk it through a proven, multi-step process so you don't have to spell out every instruction:
| Template | What it guides your AI to do |
|---|---|
| comprehensive-research | Run a structured, multi-step deep dive on a topic |
| fact-check | Verify a claim against multiple independent sources |
| competitive-analysis | Size up a company and its market (news, patents, web) |
| literature-review | Systematically review academic literature on a topic |
In most AI apps these show up wherever you pick a prompt or "/" command. The server exposes live status resources too (stats://tools, stats://sessions, stats://rate-limits, stats://providers) so you — or your AI — can check usage, limits, and which providers are active. See docs/DEPLOYMENT.md for the full list.
brew install zoharbabin/tap/web-researcher-mcp
claude mcp add --scope user web-researcher -- web-researcher-mcp
Homebrew handles trust, updates, and PATH for you — no signing warnings.
macOS / Linux:
curl -fsSL https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.sh | sh
Windows (PowerShell):
powershell -ExecutionPolicy Bypass -c "irm https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.ps1 | iex"
Downloads the binary, verifies its SHA-256 checksum against the signed release, puts it on your PATH, and registers it with Claude Code if installed. Customize the install location:
INSTALL_DIR=/opt/tools curl -fsSL https://raw.githubusercontent.com/zoharbabin/web-researcher-mcp/main/install.sh | sh
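The checksum verification the installers perform can be pictured with a short Python sketch. It is illustrative only: the function name verify_sha256 is hypothetical, and how install.sh obtains the expected digest from the signed release is not shown here.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of `data` matches `expected_hex`."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# The SHA-256 of empty input is a well-known constant, used here as a
# stand-in for a real release checksum (an assumption for illustration).
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(verify_sha256(b"", EMPTY_SHA256))          # True
print(verify_sha256(b"tampered", EMPTY_SHA256))  # False
```

A mismatch should be treated as a failed download: discard the file rather than running it.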
Scoop (Windows):
scoop bucket add zoharbabin https://github.com/zoharbabin/scoop-bucket
scoop install web-researcher-mcp
Homebrew Cask (macOS — Developer ID-signed + notarized binary):
brew install --cask zoharbabin/tap/web-researcher-mcp
The cask ships the notarized darwin binary (Gatekeeper-clean). Most users want the formula above (brew install zoharbabin/tap/web-researcher-mcp), which the bare name resolves to; pass --cask explicitly for the notarized artifact.
Go install (if you have Go):
go install github.com/zoharbabin/web-researcher-mcp/cmd/web-researcher-mcp@latest
claude mcp add --scope user web-researcher -- web-researcher-mcp
Docker:
# STDIO mode needs -i so the container's stdin stays attached for MCP JSON-RPC
docker run -i --rm \
-e GOOGLE_CUSTOM_SEARCH_API_KEY=YOUR_KEY \
-e GOOGLE_CUSTOM_SEARCH_ID=YOUR_CX \
docker.io/zoharbabin/web-researcher-mcp:latest
Build from source:
git clone https://github.com/zoharbabin/web-researcher-mcp.git
cd web-researcher-mcp
go build -o web-researcher-mcp ./cmd/web-researcher-mcp
The install script registers with Claude Code automatically. For other apps, add to your AI's config file:
{
"mcpServers": {
"web-researcher": {
"command": "web-researcher-mcp",
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "YOUR_GOOGLE_API_KEY",
"GOOGLE_CUSTOM_SEARCH_ID": "YOUR_SEARCH_ENGINE_ID"
}
}
}
}
Any provider works — pick one and set its key. For example, Brave (no Google keys needed):
{
"mcpServers": {
"web-researcher": {
"command": "web-researcher-mcp",
"env": {
"SEARCH_PROVIDER": "brave",
"BRAVE_API_KEY": "YOUR_BRAVE_API_KEY"
}
}
}
}
Swap in any provider from the Configuration table by setting SEARCH_PROVIDER and that provider's key. Done — your AI assistant now has access to all research tools.
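If you manage client configs with a script, the entry above can be generated rather than hand-edited. The sketch below is only an illustration: add_web_researcher is a hypothetical helper, and it builds the JSON structure shown above in memory without touching any file.

```python
import json


def add_web_researcher(config, provider, keys, command="web-researcher-mcp"):
    """Add or replace the web-researcher entry, leaving other servers untouched."""
    env = {"SEARCH_PROVIDER": provider}
    env.update(keys)  # provider-specific key variable(s) from the Configuration table
    config.setdefault("mcpServers", {})["web-researcher"] = {
        "command": command,
        "env": env,
    }
    return config


# "other-server" is a made-up entry standing in for whatever is already configured.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = add_web_researcher(existing, "brave", {"BRAVE_API_KEY": "YOUR_BRAVE_API_KEY"})
print(json.dumps(updated, indent=2))
```

Merging into the existing mcpServers object, instead of overwriting the file, keeps any other servers you have already registered.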
No API key required. DuckDuckGo is the built-in zero-config fallback — install and go. To raise result quality and unlock image/news search, add any one of the providers below. They're all optional and interchangeable — pick whichever you already use or prefer; the server treats them equally.
Set SEARCH_PROVIDER=<name> and supply that provider's key. Every provider works with search lenses, and any of them can be combined for automatic failover (see Search Providers).
| Provider | SEARCH_PROVIDER | Key variable(s) | Get a key |
|---|---|---|---|
| DuckDuckGo | duckduckgo | none | Built in — zero config |
| Google PSE | google | GOOGLE_CUSTOM_SEARCH_API_KEY + GOOGLE_CUSTOM_SEARCH_ID | cloud console + engine |
| Brave | brave | BRAVE_API_KEY | brave.com/search/api |
| Serper | serper | SERPER_API_KEY | serper.dev |
| SearchAPI.io | searchapi | SEARCHAPI_API_KEY | searchapi.io |
| SearXNG | searxng | SEARXNG_URL | self-hosted |
| Tavily | tavily | TAVILY_API_KEY | app.tavily.com |
| Exa | exa | EXA_API_KEY | dashboard.exa.ai |
Each provider has its own free tier, signup flow, and capability mix (images, news, freshness). See docs/API_SETUP.md for step-by-step setup of every provider and a capability comparison. Set up more than one and the server fails over automatically — see Search Providers.
When SEARCH_PROVIDER is unset, the server uses Google if its keys are present and otherwise falls back to the zero-config DuckDuckGo provider — so it always works out of the box, with or without keys.
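The default-selection rule above can be sketched in a few lines of Python. This is an illustration, not the server's code (which is written in Go): resolve_provider is a hypothetical name, and treating "keys are present" as "both Google variables are non-empty" is an assumption.

```python
def resolve_provider(env):
    """Pick a search provider from an environment mapping (illustrative only)."""
    explicit = env.get("SEARCH_PROVIDER", "")
    if explicit:
        return explicit  # an explicit choice always wins
    # Assumption: Google counts as configured only when both variables are set.
    has_google_keys = bool(env.get("GOOGLE_CUSTOM_SEARCH_API_KEY")) and bool(
        env.get("GOOGLE_CUSTOM_SEARCH_ID")
    )
    return "google" if has_google_keys else "duckduckgo"


print(resolve_provider({}))
print(resolve_provider({"GOOGLE_CUSTOM_SEARCH_API_KEY": "k", "GOOGLE_CUSTOM_SEARCH_ID": "cx"}))
print(resolve_provider({"SEARCH_PROVIDER": "brave", "BRAVE_API_KEY": "k"}))
```

The three calls print duckduckgo, google, and brave respectively.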
| Variable | What to put | Why |
|---|---|---|
| OPENALEX_EMAIL | Your email address | Unlocks faster access to OpenAlex's full catalog of scholarly works; no registration, just an email |
| CROSSREF_EMAIL | Your email address | Same: faster access to DOI metadata for citations |

With these set, academic_search returns real papers with DOIs, authors, citation counts, and open-access PDF links. Without them, it still works but uses web search as a fallback.
| Variable | What it is | Where to get it |
|---|---|---|
| EPO_OPS_CONSUMER_KEY | European Patent Office key | developers.epo.org (free) |
| EPO_OPS_CONSUMER_SECRET | EPO secret | Same as above |
| USPTO_API_KEY | US patent office key | developer.uspto.gov (free) |
| LENS_API_TOKEN | The Lens (patents + scholarly) | lens.org |

With these, patent_search returns structured patent data with classification codes, dates, and inventors. Without them, it falls back to web search.
| Variable | Description | Default |
|---|---|---|
PORT | Run as a web server (for team/shared setups) | Off (runs locally) |
OAUTH_ISSUER_URL | Authentication server URL (for team access control) | |
| OAUTH_AUDIENCE | Expected audience claim | |
See docs/DEPLOYMENT.md for the complete list of all settings (cache, rate limiting, scraping, observability, etc.).
web-researcher-mcp/
├── cmd/web-researcher-mcp/ # Entry point (wiring only)
├── internal/
│ ├── config/ # Env-based strongly-typed configuration
│ ├── server/ # MCP server lifecycle + signal handling
│ ├── tools/ # Tool handlers (one file per tool)
│ ├── search/ # Pluggable search providers + router + lens routing
│ ├── scraper/ # Tiered scraping pipeline (markdown → stealth → HTML → browser; + optional paid Exa tier)
│ ├── documents/ # PDF, DOCX, PPTX parsing
│ ├── cache/ # Hybrid cache (memory + AES-encrypted disk)
│ ├── auth/ # OAuth 2.1 middleware + JWKS
│ ├── audit/ # Structured audit logging
│ ├── session/ # Per-tenant session persistence (memory index + encrypted disk)
│ ├── content/ # Sanitize, dedup, truncate, quality score
│ ├── metrics/ # Prometheus metrics + per-tool stats
│ ├── ratelimit/ # Three-tier rate limiting
│ ├── circuit/ # Circuit breaker for external APIs
│ ├── persist/ # TTL key/value store (memory or encrypted disk) for token revocation + rate quotas
│ └── resources/ # MCP Resources + Prompts
├── lenses/ # Search lens JSON files
└── docs/ # Extended documentation
The full layered diagram (MCP transports → tool dispatch → service layer → infrastructure) and the per-package map live in ARCHITECTURE.md — kept in one place to avoid drift.
You choose which search engine powers your research. All of them work with lenses.
| Provider | Whole-Web | Images | News | Notes |
|---|---|---|---|---|
| DuckDuckGo | Yes | — | — | Zero-config default (no API key needed); rate-limited for heavy use |
| Google PSE | Yes | Yes | Yes | Programmable Search Engine; free tier: 100 queries/day |
| Brave Search | Yes | Yes | Yes | Independent index; free tier available |
| Serper.dev | Yes | Yes | Yes | Google-identical results |
| SearXNG | Yes | Yes | Yes | Self-hosted, privacy-first, air-gapped deployments |
| SearchAPI.io | Yes | Yes | Yes | Unified API with multiple engine backends |
| Tavily | Yes | — | Yes | AI-agent search; clean, LLM-ready content |
| Exa | Yes | — | Yes | Neural/semantic search; also backs answer & structured_search and the optional paid scrape tier |
Set up multiple search engines so if one has issues, your research doesn't stop:
export SEARCH_ROUTING=brave,google,serper
If Brave is down, it automatically tries Google. If Google is rate-limited, it falls through to Serper. Your research just works.
See docs/DEPLOYMENT.md for advanced routing options (per-topic routing, patent-specific providers, etc.).
If you only have one search API key, that works too — just set it up and go.
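The ordered fall-through described above can be pictured with a small Python sketch. It is a simplified illustration under stated assumptions: ProviderError, search_with_failover, and the three stub providers are hypothetical, and the real server (written in Go) may add behavior this sketch omits.

```python
class ProviderError(Exception):
    """Raised by a provider that is down or rate-limited (hypothetical)."""


def search_with_failover(query, routing, providers):
    """Try each provider named in `routing`, in order; return the first success."""
    errors = {}
    for name in [n.strip() for n in routing.split(",") if n.strip()]:
        try:
            return name, providers[name](query)
        except ProviderError as exc:
            errors[name] = str(exc)  # remember why this provider was skipped
    raise ProviderError(f"all providers failed: {errors}")


# Stub providers standing in for real API clients.
def brave(query):
    raise ProviderError("service unavailable")


def google(query):
    raise ProviderError("rate limited")


def serper(query):
    return [f"result for {query}"]


used, results = search_with_failover(
    "sglt2 inhibitors",
    "brave,google,serper",
    {"brave": brave, "google": google, "serper": serper},
)
print(used, results)  # serper ['result for sglt2 inhibitors']
```

The order of names in SEARCH_ROUTING is the order of preference, so put the provider you trust most (or have the largest quota for) first.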
Multi-provider routing (recommended):
export SEARCH_ROUTING=brave,google,serper
export BRAVE_API_KEY=BSAxxxxxxxxxx
export GOOGLE_CUSTOM_SEARCH_API_KEY=AIza...
export GOOGLE_CUSTOM_SEARCH_ID=017...
export SERPER_API_KEY=...
Single provider — Brave Search:
export SEARCH_PROVIDER=brave
export BRAVE_API_KEY=BSAxxxxxxxxxx
Single provider — SearXNG (self-hosted, privacy-first):
export SEARCH_PROVIDER=searxng
export SEARXNG_URL=http://localhost:8080
Single provider — Exa (also unlocks the answer & structured_search tools):
export SEARCH_PROVIDER=exa
export EXA_API_KEY=...
Single provider — Google PSE:
export SEARCH_PROVIDER=google
export GOOGLE_CUSTOM_SEARCH_API_KEY=AIza...
export GOOGLE_CUSTOM_SEARCH_ID=017...
Any provider from the Configuration table works the same way — set SEARCH_PROVIDER and its key(s).
Search lenses let you control which websites your AI is allowed to search. Instead of searching the entire web (and getting blogs, spam, and AI-generated junk), a lens restricts results to only the sources you trust for that topic.
| Lens | Focus |
|---|---|
| docs | Official documentation and API references only |
| academic | Preprint servers, repositories, open-access journals |
| academic-extended | Preprint servers, OA aggregators, and repositories beyond core journal indexes |
| clinical | Clinical trials, drug safety, evidence-based medicine |
| security | CVEs, advisories, vulnerability research |
| journalism | Public records, corporate filings, FOIA |
| programming | Code docs, tutorials, Q&A |
| devops | Infrastructure and operations: Kubernetes, Docker, Terraform, cloud, CI/CD |
| news | Current events, journalism |
| tech | Technology industry |
| legal | Law, cases, statutes |
| medical | Health, medicine |
| finance | Markets, filings |
| science | Research, papers |
| government | Policy, regulations |
You can also create your own lenses for any field — just list the domains you trust.
When you (or your AI) use a lens, results come only from the sites in that lens. For example, using the medical lens means your AI searches PubMed, WHO, NIH, and other clinical sources — never health blogs or supplement ads.
Your AI uses lenses automatically when you ask it to. For example: "Search for recent findings on SGLT2 inhibitors using the clinical lens."
Add a JSON file to the lenses/ directory with the sites you trust:
{
"name": "my-industry",
"description": "Only searches sources I trust for my field",
"domains": [
"trusted-source.com",
"industry-journal.org",
"official-database.gov"
],
"cx": "",
"routing": ""
}
That's it. Now your AI will only search those sites when you use this lens. You can add up to ~10 domains per lens.
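One way to picture what a lens does is a domain allow-list applied to result URLs. The Python sketch below is an assumption about the concept, not the server's implementation: how the server actually enforces a lens is not described here (it may restrict the query at the provider instead), allowed_by_lens is a hypothetical name, and accepting subdomains of a listed domain is a design choice made for the example.

```python
import json
from urllib.parse import urlparse

# Same shape as the custom lens file shown above.
LENS_JSON = """
{
  "name": "my-industry",
  "description": "Only searches sources I trust for my field",
  "domains": ["trusted-source.com", "industry-journal.org", "official-database.gov"],
  "cx": "",
  "routing": ""
}
"""


def allowed_by_lens(url, lens):
    """True if the URL's host is a lens domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in lens["domains"])


lens = json.loads(LENS_JSON)
urls = [
    "https://trusted-source.com/report",
    "https://www.industry-journal.org/issue/12",
    "https://random-blog.example/post",
    "https://not-trusted-source.com/spam",
]
kept = [u for u in urls if allowed_by_lens(u, lens)]
print(kept)
```

Matching on the parsed hostname, with a leading dot for subdomains, keeps look-alike hosts such as not-trusted-source.com out of the results.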
Advanced options (optional; most users can ignore these): the cx and routing fields, left empty in the example above.

Your research queries go directly from your machine to the search provider you chose. They never pass through our servers (we don't have servers). The tool runs entirely on your computer.
For the full threat model, see docs/SECURITY.md.
Add to your MCP config (~/.claude.json). Set SEARCH_PROVIDER and the matching key for whichever provider you use (see the Configuration table) — this example uses Google:
{
"mcpServers": {
"web-researcher": {
"command": "/path/to/web-researcher-mcp",
"env": {
"SEARCH_PROVIDER": "google",
"GOOGLE_CUSTOM_SEARCH_API_KEY": "AIza...",
"GOOGLE_CUSTOM_SEARCH_ID": "017..."
}
}
}
}
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"web-researcher": {
"command": "/path/to/web-researcher-mcp",
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "AIza...",
"GOOGLE_CUSTOM_SEARCH_ID": "017..."
}
}
}
}
Add to .cursor/mcp.json in your project root:
{
"mcpServers": {
"web-researcher": {
"command": "/path/to/web-researcher-mcp",
"env": {
"GOOGLE_CUSTOM_SEARCH_API_KEY": "AIza...",
"GOOGLE_CUSTOM_SEARCH_ID": "017..."
}
}
}
}
For teams that want one shared instance everyone connects to:
PORT=3000 \
OAUTH_ISSUER_URL=https://auth.example.com \
OAUTH_AUDIENCE=https://api.example.com \
./web-researcher-mcp
Then connect any AI app to http://localhost:3000/mcp/.
services:
  web-researcher:
    image: zoharbabin/web-researcher-mcp
    ports:
      - "3000:3000"
    environment:
      PORT: "3000"
      SEARCH_PROVIDER: brave
      BRAVE_API_KEY: ${BRAVE_API_KEY}
Note: Tool behavior is identical across all connection modes (STDIO and HTTP). The only differences are auth (HTTP requires OAuth) and rate limiting (HTTP enforces per-tenant limits; STDIO has only upstream API quotas). See docs/DEPLOYMENT.md for details.
Searches come back in under a second. Previously-seen results are cached so repeats are instant. Full article extraction works on 95%+ of the web — including sites that try to block bots. Heavy JavaScript sites get a real browser behind the scenes (automatic, no setup needed).
go build -o web-researcher-mcp ./cmd/web-researcher-mcp # Build
go test -race ./... # Test (with race detector)
make verify # Full gate: fmt, vet, lint, gosec, govulncheck, tests, E2E, build
The lint, gosec, and govulncheck tools are pinned as go.mod tool directives, so make verify runs them at the exact versions CI uses (no global installs needed). Branch protection requires the Lint, Test, Security, and E2E checks to pass.
See CONTRIBUTING.md for the full development workflow, code style guide, and PR process.
The server starts even with missing credentials (to allow MCP handshake). Set your API keys in the env block of your MCP client config, not in your shell profile.
For JavaScript-heavy sites, the tool uses a real browser (Chromium). With the binary install it auto-downloads on first use (~200MB). If you already have Chrome installed, set CHROME_PATH to point to it. The Docker image ships with Chromium bundled (CHROME_PATH preset), so JavaScript rendering works out of the box — no download.
The disk cache lives at your OS cache directory (e.g., ~/Library/Caches/web-researcher-mcp/ on macOS, ~/.cache/web-researcher-mcp/ on Linux). Delete that directory to clear it, or set CACHE_DIR to a custom path.
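If you script cache cleanup, the lookup order above can be expressed as a small Python helper. This is a sketch under stated assumptions: cache_dir is a hypothetical function, it covers only the two platforms named above, and the server's real resolution may honor other variables (for example XDG_CACHE_HOME on Linux).

```python
from pathlib import PurePosixPath


def cache_dir(env, platform, home):
    """Resolve the disk-cache directory for the two platforms named above."""
    if env.get("CACHE_DIR"):
        return PurePosixPath(env["CACHE_DIR"])  # explicit override wins
    if platform == "darwin":
        return PurePosixPath(home) / "Library" / "Caches" / "web-researcher-mcp"
    if platform.startswith("linux"):
        return PurePosixPath(home) / ".cache" / "web-researcher-mcp"
    raise ValueError(f"cache location not documented for platform {platform!r}")


# /Users/alex and /home/alex are placeholder home directories.
print(cache_dir({}, "darwin", "/Users/alex"))
print(cache_dir({}, "linux", "/home/alex"))
print(cache_dir({"CACHE_DIR": "/tmp/wr-cache"}, "linux", "/home/alex"))
```

In a real script you would pass os.environ, sys.platform, and the user's home directory, then delete the returned directory to clear the cache.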
If your provider's free tier runs out (e.g. Google PSE allows 100 searches/day):
- Switch SEARCH_PROVIDER to any other option (see Configuration); each has its own free tier
- Set up multi-provider routing (e.g. SEARCH_ROUTING=brave,google); if one provider is rate-limited, the server automatically falls through to the next

On macOS, the binary can get killed at launch after an update. This happens only if you replaced the binary by copying new bytes over the existing file in place (cp new /path/to/web-researcher-mcp). On Apple Silicon, macOS caches the binary's ad-hoc code signature against the file, and overwriting it in place can make the next launch get killed before it starts. The official installers (Homebrew, the one-command install.sh, and the Claude Code plugin) avoid this by installing to a fresh file. To fix a manual install, replace it cleanly and re-sign:
rm -f /path/to/web-researcher-mcp
cp /path/to/new-build /path/to/web-researcher-mcp
codesign --force -s - /path/to/web-researcher-mcp # ad-hoc re-sign
Then reconnect your client. (Re-running install.sh does this correctly for you.)
Contributions are welcome. Please see CONTRIBUTING.md for code style guidelines, development workflow, and how to submit pull requests.
| Document | Description |
|---|---|
| ARCHITECTURE.md | Design decisions, technology stack, dependencies |
| CONTRIBUTING.md | Development setup, code style, PR workflow |
| docs/TOOLS.md | Tool specifications and parameter schemas |
| docs/EXAMPLES.md | Usage examples with JSON tool calls |
| docs/API_SETUP.md | Search provider API key setup for all providers |
| docs/SECURITY.md | Threat model, SSRF, auth, compliance (SOC2/GDPR/FedRAMP) |
| docs/PRIVACY.md | What data goes where, third-party processors, retention |
| docs/DEPLOYMENT.md | Build, Docker, Kubernetes, client configs, scaling |
| docs/LESSONS_LEARNED.md | Node.js to Go migration story and lessons |
| docs/SESSION_PERSISTENCE.md | How sessions survive context loss — design, data flow, citations |
| docs/MIGRATION.md | Migrating from the deprecated google-researcher-mcp |
Built with Go and the Model Context Protocol
If you're tired of AI making things up, give this a try — and a ⭐ if it helps.