📄🐹⚡ Go MCP server for multi-format document access — PDF, TXT, MD, DOCX, CSV, images. Install and Go.
Install and Go. One command, single binary. Your AI reads any document — PDF, text, Markdown, DOCX, images.
MCP server for multi-format document access — read, search, extract images, OCR, and fetch documents from URLs via the Model Context Protocol. 13 tools, 6 formats, zero configuration.
```bash
go install github.com/drolosoft/go-docs-mcp@latest
# That's it. Single binary, starts in milliseconds.
```
For a deeper look at why an MCP server beats a direct tool, see Why MCP?
Most document MCP servers handle a single format: a PDF server for PDFs, a DOCX server for DOCX. Reading three formats means running three separate servers.
| | Go-Docs MCP | Others |
|---|---|---|
| Single binary, no runtime | Yes | Need Node/Python |
| `go install` one-liner | Yes | npm+deps or pip+venv |
| Multi-format (6 types) | Yes | One format each |
| Full-text search | Yes | Partial or none |
| OCR (scanned PDFs + images) | Yes | Rare |
| Image & table extraction | Yes | Partial |
| Document outline | Yes | Rare |
| Fetch from URL | Yes | Rare |
| Dir-locked, read-only | Yes | Varies |
| Smart caching | Yes | No |
| Fully offline | Yes | Yes |
Go-Docs MCP reads them all from a single fast, read-only binary with no language runtime to install. PDF, DOCX, and OCR support rely on the external command-line tools listed below.
| Category | Tool | Description |
|---|---|---|
| Discovery | list_documents | List all documents with metadata (format, pages, size) |
| Discovery | list_formats | List supported formats and dependency status |
| Reading | read_document | Full text, specific page, or page ranges from any format |
| Reading | read_url | Download from URL and extract text (50MB max) |
| Reading | get_document_summary | First 3 pages as a quick overview |
| Search | search_document | Case-insensitive full-text search with context |
| Analysis | get_document_metadata | Title, author, dates, version, page count |
| Analysis | get_document_outline | Table of contents / bookmarks |
| Analysis | extract_tables | Tables as structured data |
| Analysis | extract_images | Images as base64 (max 10 per call) |
| OCR | ocr_document | Force OCR on scanned/image-based PDFs |
| OCR | read_image | Extract text from PNG, JPG, TIFF via OCR |
| Export | convert_to_markdown | Convert any document to clean Markdown |
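The README does not describe how `convert_to_markdown` renders each format. As one plausible illustration for CSV input, here is a minimal Go sketch that turns a CSV file into a Markdown table using only the standard library. `csvToMarkdown` is a hypothetical helper, not taken from the project's source, and treating the first record as the header row is an assumption.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// csvToMarkdown renders CSV input as a Markdown table, treating the first
// record as the header row. Hypothetical helper for illustration only.
func csvToMarkdown(input string) (string, error) {
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return "", err
	}
	if len(records) == 0 {
		return "", nil
	}
	var b strings.Builder
	writeRow := func(cells []string) {
		escaped := make([]string, len(cells))
		for i, c := range cells {
			// An unescaped pipe would split the cell in Markdown.
			escaped[i] = strings.ReplaceAll(c, "|", `\|`)
		}
		b.WriteString("| " + strings.Join(escaped, " | ") + " |\n")
	}
	writeRow(records[0])
	sep := make([]string, len(records[0]))
	for i := range sep {
		sep[i] = "---"
	}
	writeRow(sep)
	for _, rec := range records[1:] {
		writeRow(rec)
	}
	return b.String(), nil
}

func main() {
	md, err := csvToMarkdown("name,pages\nguide.pdf,42\nnotes.md,3\n")
	if err != nil {
		panic(err)
	}
	fmt.Print(md)
}
```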
| Format | Dependencies | Notes |
|---|---|---|
| PDF | poppler (pdftotext, pdfinfo, pdfimages, pdftoppm) | Full support: text, images, metadata, OCR fallback |
| TXT, MD, CSV | None | Native, zero dependencies |
| DOCX | pandoc (optional) | Word document extraction |
| Images (PNG, JPG, TIFF) | tesseract (optional) | OCR text extraction |
```bash
# macOS
brew install poppler
brew install tesseract   # optional: OCR
brew install pandoc      # optional: DOCX

# Debian/Ubuntu
apt install poppler-utils
apt install tesseract-ocr   # optional: OCR
apt install pandoc          # optional: DOCX

# Fedora/RHEL
dnf install poppler-utils
dnf install tesseract   # optional: OCR
dnf install pandoc      # optional: DOCX
```
Note: TXT, MD, and CSV work out of the box with zero dependencies. Install only what you need.
```bash
go install github.com/drolosoft/go-docs-mcp@latest

# Or build from source:
git clone https://github.com/drolosoft/go-docs-mcp.git
cd go-docs-mcp
make build     # produces ./go-docs-mcp
make install   # installs to /usr/local/bin/
```
Go-Docs MCP reads documents from a configured directory. Set DOCS_MCP_DIR to change it:
| Variable | Default | Description |
|---|---|---|
| `DOCS_MCP_DIR` | `~/.docs-mcp/documents/` | Directory containing documents to serve |
| `PDF_MCP_DIR` | (legacy alias) | Backward-compatible alias for `DOCS_MCP_DIR` |
Place your documents in the directory and the server finds them automatically. All supported formats are detected.
Add to your .claude/settings.json:
```json
{
  "mcpServers": {
    "docs": {
      "command": "go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}
```
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
```json
{
  "mcpServers": {
    "docs": {
      "command": "/usr/local/bin/go-docs-mcp",
      "env": {
        "DOCS_MCP_DIR": "/path/to/your/documents"
      }
    }
  }
}
```
The server communicates over stdio using JSON-RPC 2.0:
```bash
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | go-docs-mcp
```
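Beyond `tools/list`, an MCP client invokes a tool with the `tools/call` method, passing the tool name and an `arguments` object. The Go sketch below only builds one newline-terminated request for the stdio transport; the `request` and `toolCall` types and `buildToolCall` are hypothetical names. The MCP lifecycle also expects an `initialize` exchange before other requests, so a one-line pipe like the one above is best treated as a quick smoke test.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a JSON-RPC 2.0 request envelope.
type request struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params"`
}

// toolCall holds the params of an MCP "tools/call" request.
type toolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// buildToolCall encodes one request followed by a newline, the framing
// used when messages are exchanged line by line over stdio.
func buildToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	b, err := json.Marshal(request{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCall{Name: tool, Arguments: args},
	})
	if err != nil {
		return nil, err
	}
	return append(b, '\n'), nil
}

func main() {
	line, err := buildToolCall(2, "read_document", map[string]any{
		"filename": "architecture-guide.pdf",
		"pages":    "1-3",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(line))
}
```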
`list_documents` lists all documents in the configured directory with format detection.
Parameters: None
Example output:
```json
[
  {
    "filename": "architecture-guide.pdf",
    "format": "pdf",
    "title": "architecture-guide",
    "pages": 42,
    "size_bytes": 1048576
  },
  {
    "filename": "notes.md",
    "format": "markdown",
    "title": "notes",
    "size_bytes": 4096
  }
]
```
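A Go client could decode the `list_documents` output above into a typed slice. `DocumentInfo` and `parseDocumentList` are a hypothetical sketch derived from the example fields; `Pages` is a pointer because the Markdown entry in the example has no `pages` field, which lets a client tell "no pages" apart from zero.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DocumentInfo mirrors one entry of the example list_documents output.
type DocumentInfo struct {
	Filename  string `json:"filename"`
	Format    string `json:"format"`
	Title     string `json:"title"`
	Pages     *int   `json:"pages,omitempty"` // nil for formats without pages
	SizeBytes int64  `json:"size_bytes"`
}

func parseDocumentList(data []byte) ([]DocumentInfo, error) {
	var docs []DocumentInfo
	if err := json.Unmarshal(data, &docs); err != nil {
		return nil, err
	}
	return docs, nil
}

func main() {
	docs, err := parseDocumentList([]byte(`[
	  {"filename": "architecture-guide.pdf", "format": "pdf", "title": "architecture-guide", "pages": 42, "size_bytes": 1048576},
	  {"filename": "notes.md", "format": "markdown", "title": "notes", "size_bytes": 4096}
	]`))
	if err != nil {
		panic(err)
	}
	for _, d := range docs {
		if d.Pages != nil {
			fmt.Printf("%s (%s, %d pages)\n", d.Filename, d.Format, *d.Pages)
		} else {
			fmt.Printf("%s (%s)\n", d.Filename, d.Format)
		}
	}
}
```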
`list_formats` lists all supported document formats and their dependency status.
Parameters: None
`read_document` reads the extracted text content of a document. If the document is image-based or scanned and pdftotext returns empty text, it automatically falls back to OCR.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The document filename to read |
| page | number | No | Single page number (1-based). Omit for full text. |
| pages | string | No | Page ranges, e.g. "1-5", "10", "1-3,7,10-12". Overrides page. |
Example input:
```json
{
  "filename": "architecture-guide.pdf",
  "pages": "1-3,10-12"
}
```
`search_document` searches within a document for lines matching a query. It returns matches with 2 lines of context and approximate page numbers.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The document filename to search |
| query | string | Yes | Search query (case-insensitive) |
Example output:
```text
Found 3 matches for 'microservice' in architecture-guide.pdf:
--- Match 1 (page ~2, line 45) ---
  The system is composed of several
> microservice components that communicate
  via gRPC and message queues.
```
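The search behavior above can be approximated in Go as a case-insensitive substring match over lines, keeping one line before and one after each hit as in the example output. The page estimate assumes the extracted text separates pages with form feeds (`\f`), which pdftotext emits by default; whether the server derives `page ~N` this way is an assumption, and `searchText` and `Match` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Match is one search hit with its surrounding lines.
type Match struct {
	Page    int      // 1-based page estimated from form-feed separators
	Line    int      // 1-based line number within the whole text
	Context []string // previous line, matching line, next line (when present)
}

// searchText finds lines containing query, ignoring case.
func searchText(text, query string) []Match {
	lines := strings.Split(text, "\n")
	q := strings.ToLower(query)
	var matches []Match
	page := 1
	for i, line := range lines {
		// Each form feed marks the start of a new page.
		page += strings.Count(line, "\f")
		if !strings.Contains(strings.ToLower(line), q) {
			continue
		}
		from, to := i-1, i+2
		if from < 0 {
			from = 0
		}
		if to > len(lines) {
			to = len(lines)
		}
		ctx := make([]string, 0, to-from)
		for _, l := range lines[from:to] {
			ctx = append(ctx, strings.ReplaceAll(l, "\f", ""))
		}
		matches = append(matches, Match{Page: page, Line: i + 1, Context: ctx})
	}
	return matches
}

func main() {
	text := "Intro\n\fThe system is composed of several\nMicroservice components that communicate\nvia gRPC and message queues."
	for _, m := range searchText(text, "microservice") {
		fmt.Printf("page ~%d, line %d:\n%s\n", m.Page, m.Line, strings.Join(m.Context, "\n"))
	}
}
```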
`get_document_summary` returns the text from the first 3 pages of a document as a quick summary.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The document filename to summarize |
`get_document_metadata` returns full document metadata.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The document filename to get metadata for |
Example output:
```json
{
  "title": "Architecture Guide",
  "author": "Jane Doe",
  "subject": "System Design",
  "creator": "LaTeX",
  "producer": "pdfTeX",
  "creation_date": "Thu May 15 10:30:00 2025",
  "modification_date": "Thu May 15 10:30:00 2025",
  "pages": 42,
  "file_size_bytes": 1048576,
  "pdf_version": "1.5"
}
```
`get_document_outline` extracts the document outline (table of contents / bookmarks) as a structured list.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The document filename to extract outline from |
`extract_tables` extracts tables from a document as structured data.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The document filename to extract tables from |
| page | number | No | Specific page to extract from. Omit for all pages. |
`extract_images` extracts images from a document as base64-encoded data. It returns up to 10 images per call.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The document filename to extract images from |
| page | number | No | Specific page to extract from. Omit for all pages. |
Example output:
```json
[
  {
    "page": 1,
    "index": 0,
    "format": "jpeg",
    "width": 800,
    "height": 600,
    "data_base64": "/9j/4AAQSkZJRg..."
  }
]
```
`read_url` downloads a document from a URL and extracts its text content. Maximum file size: 50MB.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL of the document to download and read |
| pages | string | No | Page ranges to extract, e.g. "1-5". Omit for full text. |
Example input:
```json
{
  "url": "https://example.com/report.pdf",
  "pages": "1-3"
}
```
`ocr_document` forces OCR on a PDF document using tesseract. Useful for scanned/image-based PDFs or when pdftotext returns garbled text. Requires tesseract and pdftoppm.
Note: `read_document` already auto-detects image-based PDFs and falls back to OCR. Use `ocr_document` when you want to force OCR regardless, or need to specify a non-English language.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The PDF filename to OCR |
| page | number | No | Specific page to OCR (1-based). Omit for all pages. |
| language | string | No | Tesseract language code (default: eng). Use spa, fra, etc. |
Example input:
```json
{
  "filename": "scanned-contract.pdf",
  "page": 1,
  "language": "spa"
}
```
`read_image` extracts text from an image file using OCR. Supports PNG, JPG, and TIFF. Requires tesseract.
Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | The image filename to read (PNG, JPG, TIFF) |
| language | string | No | Tesseract language code (default: eng). |
Example input:
```json
{
  "filename": "receipt.png",
  "language": "eng"
}
```
Only files inside `DOCS_MCP_DIR` are accessible, and paths containing `../` are rejected.

```bash
make build   # Build the binary
make test    # Run tests with race detector
make clean   # Remove build artifacts
```
```text
go-docs-mcp/
  main.go          # MCP server setup, 13 tool registrations
  internal/
    pdf/
      reader.go    # Document extraction, caching, search, metadata, images, OCR
  Makefile         # Build targets
  go.mod           # Module definition
```
MIT - Copyright 2026 Drolosoft
Drolosoft — Tools we wish existed