LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
Unstract uses LLMs to extract structured JSON from documents — PDFs, images, scans, you name it. Define what you want to extract using natural language prompts, and deploy as an API or ETL pipeline.
Built for teams in finance, insurance, healthcare, KYC/compliance, and much more.
| Task | Without Unstract | With Unstract |
|---|---|---|
| Schema definition | Write regex, build templates per vendor | Write a prompt once, handles variations |
| New document type | Days of development | Minutes in Prompt Studio |
| LLM integration | Build your own pipeline | Plug in any provider (OpenAI, Anthropic, Bedrock, Ollama) |
| Deployment | Custom infrastructure | ./run-platform.sh or managed cloud |
| Output | Unstructured text blobs | Clean JSON, ready for your database |
⭐ If Unstract helps you, star this repo!
Prompt Studio: Define document extraction schemas with natural language.
API Deployment: Send a document over the REST API, get JSON back.
ETL Pipeline: Pull documents from a folder, process them, load them into your warehouse.
MCP Server: Connect to AI agents (Claude, etc.) via Model Context Protocol.
n8n Node: Drop into existing automation workflows.
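From a Python client, the API Deployment flow (send a document, get JSON back) might be sketched as below. This is an illustration, not Unstract's documented client: the URL you pass in, the `files` form field name, and the Bearer-token header are all assumptions, so check your own deployment's details before relying on them. The function only builds the request and does not send it.

```python
import uuid
import urllib.request


def build_extraction_request(
    url: str, api_key: str, filename: str, content: bytes
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one document.

    Assumptions (not confirmed by the README): the form field is named
    "files" and authentication uses an "Authorization: Bearer" header.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned request to `urllib.request.urlopen` and decoding the response body with `json.load` would complete the round trip.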
# Clone and start
git clone https://github.com/Zipstack/unstract.git
cd unstract
./run-platform.sh
That's it!
Default login: username unstract, password unstract.
# Pull and run entire Unstract platform with default env config.
./run-platform.sh
# Pull and run docker containers with a specific version tag.
./run-platform.sh -v v0.1.0
# Upgrade existing Unstract platform setup by pulling the latest available version.
./run-platform.sh -u
# Upgrade existing Unstract platform setup by pulling a specific version.
./run-platform.sh -u -v v0.2.0
# Build docker images locally as a specific version tag.
./run-platform.sh -b -v v0.1.0
# Build docker images locally from working branch as `current` version tag.
./run-platform.sh -b -v current
# Display the help information.
./run-platform.sh -h
# Only set up the environment files.
./run-platform.sh -e
# Only pull docker images for a specific version tag.
./run-platform.sh -p -v v0.1.0
# Only obtain docker images, building them locally as a specific version tag.
./run-platform.sh -p -b -v v0.1.0
# Upgrade existing Unstract platform setup with docker images built locally from working branch as `current` version tag.
./run-platform.sh -u -b -v current
# Pull and run docker containers in detached mode.
./run-platform.sh -d -v v0.1.0
[!WARNING] This key encrypts adapter credentials — losing it makes existing adapters inaccessible!
Copy the value of ENCRYPTION_KEY from backend/.env or platform-service/.env to a secure location.
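Backing up the key can be scripted. The following is a minimal sketch, assuming the `.env` file uses plain `KEY=VALUE` lines; the helper names and the backup file format are hypothetical and not part of Unstract.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value assigned to `key` in KEY=VALUE text, or None."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Strip whitespace and any surrounding quote characters.
            return value.strip().strip("'\"")
    return None


def backup_encryption_key(env_path: str, backup_path: str) -> bool:
    """Copy ENCRYPTION_KEY from env_path into backup_path; True on success."""
    value = read_env_value(Path(env_path).read_text(), "ENCRYPTION_KEY")
    if value is None:
        return False
    backup = Path(backup_path)
    backup.write_text(f"ENCRYPTION_KEY={value}\n")
    backup.chmod(0o600)  # restrict to owner read/write
    return True
```

For example, `backup_encryption_key("backend/.env", "/path/to/secure/encryption_key.bak")` would write the key to a location of your choosing; the destination path here is a placeholder.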
┌────────────────────────────────────────────────────────────┐
│                          Unstract                          │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│  Frontend   │   Backend   │   Worker    │ Platform Service │
│   (React)   │  (Django)   │  (Celery)   │    (FastAPI)     │
├─────────────┴─────────────┴─────────────┴──────────────────┤
│                       Cache (Redis)                        │
├────────────────────────────────────────────────────────────┤
│                  Message Queue (RabbitMQ)                  │
├────────────────────────────────────────────────────────────┤
│                   Database (PostgreSQL)                    │
├────────────────────────────────────────────────────────────┤
│   LLM Adapters    │    Vector DBs     │  Text Extractors   │
│  (OpenAI, etc.)   │  (Qdrant, etc.)   │   (LLMWhisperer)   │
└────────────────────────────────────────────────────────────┘
Also see architecture.
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, ODT, TXT, CSV, JSON |
| Spreadsheets | XLSX, XLS, ODS |
| Presentations | PPTX, PPT, ODP |
| Images | PNG, JPG, JPEG, TIFF, BMP, GIF, WEBP |
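A client could pre-filter files against the table above before submitting them. This is a minimal sketch that checks only the file extension; `SUPPORTED_FORMATS` and `format_category` are hypothetical names, and this is not how Unstract itself validates input.

```python
from pathlib import Path
from typing import Optional

# Extensions transcribed from the supported-formats table, lowercased.
SUPPORTED_FORMATS = {
    "Documents": {"pdf", "docx", "doc", "odt", "txt", "csv", "json"},
    "Spreadsheets": {"xlsx", "xls", "ods"},
    "Presentations": {"pptx", "ppt", "odp"},
    "Images": {"png", "jpg", "jpeg", "tiff", "bmp", "gif", "webp"},
}


def format_category(filename: str) -> Optional[str]:
    """Return the table category for a filename, or None if unsupported."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return category
    return None
```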
| Provider | Status | Provider | Status |
|---|---|---|---|
| OpenAI | ✅ | Azure OpenAI | ✅ |
| OpenAI Compatible | ✅ | Anthropic Claude | ✅ |
| AWS Bedrock | ✅ | Google Gemini | ✅ |
| Ollama (local) | ✅ | Mistral AI | ✅ |
| Anyscale | ✅ | | |
| Provider | Status | Provider | Status |
|---|---|---|---|
| Qdrant | ✅ | Pinecone | ✅ |
| Weaviate | ✅ | PostgreSQL | ✅ |
| Milvus | ✅ | | |
| Provider | Status |
|---|---|
| LLMWhisperer | ✅ |
| Unstructured.io | ✅ |
| LlamaIndex Parse | ✅ |
Sources: AWS S3, MinIO, Google Cloud Storage, Azure Blob, Google Drive, Dropbox, SFTP
Destinations: Snowflake, Amazon Redshift, Google BigQuery, PostgreSQL, MySQL, MariaDB, SQL Server, Oracle
Follow these steps to change the default username and password.
# Install pre-commit hooks
./dev-env-cli.sh -p
# Run pre-commit checks
./dev-env-cli.sh -r
Finance & Banking → | Insurance → | Healthcare → | Income Tax →
For teams that need managed infrastructure, advanced accuracy features, or compliance certifications.
We welcome contributions! The easiest way to start is with an issue labeled good first issue.
Report Bug → | Request Feature →
Unstract integrates PostHog to track minimal usage analytics. To disable it, set REACT_APP_ENABLE_POSTHOG=false in the frontend's .env file.
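If you manage `.env` files with a script, flipping this flag could be sketched as follows, assuming plain `KEY=VALUE` lines. `set_env_value` is a hypothetical helper, not part of Unstract.

```python
def set_env_value(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key` set to `value`, replacing or appending."""
    new_line = f"{key}={value}"
    lines = env_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare the text before the first "=" against the key.
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Reading the frontend's `.env` file, passing its text through `set_env_value(text, "REACT_APP_ENABLE_POSTHOG", "false")`, and writing the result back would apply the change.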
Unstract is released under the AGPL-3.0 License.
Built with ❤️ by Zipstack