Pydantic Deep Agents

The deep agent that forks itself.
Split one task into N parallel branches, let an AI judge merge the winner —
in your terminal, or in one function call. 100% type-safe. Any model. Self-hosted.

Pydantic Deep Agents CLI demo

Docs · PyPI · Forking · Why · CLI · Framework · Examples

Most agents give you one shot at a task. They pick an approach, commit to it, and if it's wrong you start over.

Pydantic Deep Agents can fork mid-run. One agent.run() splits into several branches that each try a different approach in parallel — isolated filesystems, separate budgets, independent reasoning. An AI judge (or you) picks the winner, and its history becomes the run's continuation. It's git branch for an agent's thinking.

That's one feature. There are forty more — planning, multi-agent swarms, persistent memory, sandboxed execution, skills, MCP, checkpoints, cost tracking — all batteries-included, all behind a single function call, all 100% type-safe.

hljs language-bash

# Terminal AI assistant — no Python setup required
curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash
pydantic-deep

hljs language-bash

# Or build your own agent
pip install pydantic-deep

⑂ Live Run Forking — the feature no one else has

Claude Code can't do this. Aider can't. LangGraph and CrewAI can't. It's the reason to use pydantic-deep.

When an agent hits a fork in the road — "should I refactor this with a decorator or a context manager?" — most tools force one bet. Pydantic Deep Agents lets the run branch:

hljs language-less

                                  ┌──  branch A: "use a decorator"      ── tests: 8/8 ✓  conf 0.71
   agent.run("refactor auth") ──┬─┼──  branch B: "use a context manager" ── tests: 6/8 ✗  conf 0.42
       (shared history)         │ └──  branch C: "extract a base class"   ── tests: 8/8 ✓  conf 0.55
                                │
                                └──►  ⚖️  AI judge weighs quality + tests + consistency
                                          → adopts branch A, continues the run

Each branch is fully isolated: a copy-on-write filesystem overlay (reads fall through to the parent, writes stay local), its own steering message, and its own budget_usd cap. The coordinator resolves the fork with one of four acceptance modes — manual, auto, auto_with_fallback (default), or vote — and the winning branch's history is adopted as the parent run's continuation.

Framework — opt in with one flag:

hljs language-python

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    forking=True,                 # gives the agent: fork_run, inspect_branches,
)                                 # merge_or_select, diff_branches, fork_cost, terminate_branch

Or run a real test command against every branch and let exit codes decide the winner:

hljs language-python

from pydantic_deep import LiveForkCapability

agent = create_deep_agent(
    forking=LiveForkCapability(test_command="pytest -q", test_timeout_s=120),
)
# confidence = quality_spread·0.4 + test_pass_ratio·0.4 + internal_consistency·0.2

CLI — fork an in-flight conversation, watch branches stream live, merge the best:

hljs language-css

/fork                 # split the current run into N parallel branches
>>A try a decorator   # steer branch A
>>B use a contextmgr  # steer branch B
/merge                # resolve — manual picker, AI judge, or vote

Live per-branch panels stream each approach side by side; a judge screen scores them; you accept, review the diff, or decline. Configure branch count, budgets, per-branch models, and merge strategy with /fork-config.

📖 Full reference: docs/capabilities/live-fork.md

🆚 Why pydantic-deep?

The only tool that is a terminal assistant and a Python framework and can fork its own runs — without giving up type safety or your choice of model.

	Pydantic Deep	Claude Code	Aider	LangGraph	CrewAI
Terminal TUI assistant	✅	✅	✅	—	—
Python framework / library	✅	—	~	✅	✅
Live run forking + AI judge	✅	—	—	—	—
Multi-agent swarm + message bus	✅	~	—	✅	✅
Any model / any provider	✅	Anthropic	✅	✅	✅
Sandboxed Docker execution	✅	—	~	DIY	DIY
Persistent memory + skills	✅	✅	—	DIY	~
Type-safe structured output	✅	—	—	~	~
MCP servers	✅	✅	—	~	~
Self-hosted, open source	✅ MIT	—	✅	✅	✅

_{✅ first-class · ~ partial / via extensions · — not available · DIY you wire it yourself. Comparison reflects each project as of 2026-06; corrections welcome via PR.}

What's New

2026-06-01 v0.3.24 — Live Run Forking — split an in-flight agent.run() into N parallel branches with copy-on-write isolation, per-branch budgets, a test-runner hook, and four merge modes (manual / auto / auto_with_fallback / vote). Opt in with forking=True.
2026-06-01 v0.3.23 — MCP client support (framework + CLI). Connect GitHub, Figma (OAuth), Context7, DeepWiki, or any custom server. Import servers straight from Claude Code. New interactive /mcp command. Plus a full CLI presentation pass: clipboard image paste, real +/- diffs, tool icons, turn summaries.
2026-06-01 v0.3.23 — Automatic fallback-model retry — fallback_model= wraps your primary in a FallbackModel chain; fires on API errors but never on auth errors. Plus a batteries-included security hook preset (default_security_hook()) and three new output styles (markdown, json-only, bullet).
2026-04-22 v0.3.17 — LiteParse document parsing (include_liteparse=True) — PDFs, DOCX, XLSX, PPTX, and images with optional OCR, all local.
2026-04-10 v0.3.5 — Headless runner (pydantic-deep run), Docker sandbox with named workspaces, browser automation via Playwright.

Full history: CHANGELOG.md

The Agent Harness

Pydantic Deep Agents is an agent harness — the complete infrastructure that wraps an LLM and makes it a functional autonomous agent. The model provides intelligence; the harness provides planning, tools, memory, sandboxed execution, unlimited context, and — uniquely — the ability to fork.

⑂ Live run forking	Split a run into N isolated branches, each trying a different approach. AI judge or test results pick the winner. No other agent framework has this.
🔧 Tool-calling	File read/write/edit, shell execution, glob, grep, web search, web fetch, browser automation — wired up and ready.
🤝 Multi-agent / swarm	Spawn subagents for parallel workstreams. Shared TODO lists with claiming. Peer-to-peer message bus. Full team coordination.
🧠 Persistent memory	MEMORY.md persists across sessions. Auto-injected into the system prompt. Each agent has isolated memory by default.
♾️ Unlimited context	Auto-summarization when approaching the token budget. LLM-based or zero-cost sliding window. Never hits a context wall.
🐳 Sandboxed execution	Docker sandbox with named workspaces. Installed packages persist between sessions. Project dir mounted at /workspace.
🗂️ Plan Mode	Dedicated planner subagent asks clarifying questions and structures the work before execution begins. Headless-compatible.
🔖 Checkpoints	Save conversation state at any point. Rewind to any checkpoint. Fork sessions to explore alternative approaches.
📚 Skills system	Domain-specific knowledge loaded on demand from SKILL.md files. Built-in: code-review, refactor, test-writer, git-workflow, and more.
📄 Document parsing	Parse PDFs, DOCX, XLSX, PPTX, and images with optional OCR via LiteParse. Runs locally — no cloud services required.
🔌 MCP	Connect any Model Context Protocol server — GitHub, Figma (OAuth), Context7, DeepWiki, or custom. Import straight from Claude Code.
⚡ Lifecycle hooks + security preset	Claude Code-style PRE/POST_TOOL_USE hooks. Shell or Python handlers. `default_security_hook()` blocks destructive commands out of the box.
📐 Structured output	Type-safe Pydantic model responses via `output_type`. No JSON parsing. No `dict["key"]`. Full IDE autocomplete.
🔁 Fallback models	Primary model fails? `fallback_model=` hops to the next in the chain — on API errors, never on auth errors.
🔄 Stuck loop detection	Detects repeated identical tool calls, A-B-A-B alternating patterns, and no-op calls. Warns the model or stops the run.
💰 Cost tracking	Real-time token and USD cost tracking per run and cumulative. Hard budget limits with `BudgetExceededError`.
✨ Self-improving	`/improve` analyzes past sessions and proposes updates to MEMORY.md, SOUL.md, and AGENTS.md.
🏷️ 100% type-safe	Pyright strict + MyPy strict. 100% test coverage. Every public API is fully typed — safe to use in production.

Built natively on pydantic-ai — uses the Capabilities API directly, inherits all pydantic-ai streaming, multi-model support, and Pydantic validation automatically.

🖥️ CLI — Terminal AI Assistant

A Claude Code-style terminal AI assistant that works with any model and any provider — and forks.

Install (macOS & Linux)

hljs language-bash

curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash

No Python setup required — the script installs uv and the CLI automatically. Then:

hljs language-bash

export ANTHROPIC_API_KEY=sk-ant-...
pydantic-deep

Windows / manual: pip install "pydantic-deep[cli]" · Update: pydantic-deep update

Model & Provider Support

Works with any model that supports tool-calling:

Provider	Example models
Anthropic	`anthropic:claude-opus-4-6`, `claude-sonnet-4-6`
OpenAI	`openai:gpt-5.4`, `gpt-4.1`
OpenRouter	`openrouter:anthropic/claude-opus-4-6` (200+ models)
Google Gemini	`google-gla:gemini-2.5-pro`
Ollama (local)	`ollama:qwen3`, `ollama:llama3.3`
Any OpenAI-compatible	Custom base URL via env

Switch model anytime: pydantic-deep config set model openai:gpt-5.4 or /model in the TUI.

What you get in the TUI

	Feature
⑂	Live run forking — split a run into branches, stream them side by side, merge the winner
💬	Streaming chat with tool call visualization, icons, and real `+/-` diffs
📁	File read / write / edit, shell execution, glob, grep
🤝	Task planning, plan mode, and subagent delegation
🧠	Persistent memory and self-improvement across sessions
♾️	Context compression for unlimited conversations
🔖	Checkpoints — save, rewind, and fork any session
🔌	MCP servers via `/mcp` — GitHub, Figma (OAuth), and more; import from Claude Code
🌐	Web search & fetch built-in · 🖥️ browser automation via Playwright (`--browser`)
🐳	Docker sandbox — sandboxed execution with named workspaces
💭	Extended thinking — `minimal` / `low` / `medium` / `high` / `xhigh`
📋	Clipboard image paste (`Ctrl+V` / `/paste`) — multimodal prompts
💰	Real-time cost and token tracking per session
🛡️	Tool approval dialogs — approve, auto-approve, or deny per tool call
@	`@filename` file references · `!command` shell passthrough
✨	`/fork`, `/merge`, `/improve`, `/skills`, `/mcp`, `/model`, `/theme`, `/compact`, and more

Usage

hljs language-bash

# Interactive TUI (default)
pydantic-deep
pydantic-deep tui --model openrouter:anthropic/claude-opus-4-6

# Headless deep agent — benchmarks, CI/CD, scripted automation
pydantic-deep run "Fix the failing test in test_auth.py"
pydantic-deep run --task-file task.md --json

# Docker sandbox — sandboxed execution, project dir mounted at /workspace
pydantic-deep tui --sandbox docker
pydantic-deep tui --workspace ml-env     # named workspace, packages persist

# Browser automation (requires pydantic-deep[browser])
pydantic-deep tui --browser

# Config & skills
pydantic-deep config set model anthropic:claude-sonnet-4-6
pydantic-deep skills list
pydantic-deep update                     # update to latest version

See CLI docs for the full reference.

🐍 Framework — Build Your Own Agent

hljs language-bash

pip install pydantic-deep

One function call gives you a production deep agent with planning, tool-calling, multi-agent delegation, persistent memory, unlimited context, forking, and cost tracking. Everything is a toggle:

hljs language-python

from pydantic_ai_backends import StateBackend
from pydantic_deep import create_deep_agent, create_default_deps

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    forking=True,               # ⑂ split a run into parallel branches + AI judge
    include_todo=True,          # Task planning with subtasks and dependencies
    include_subagents=True,     # Multi-agent swarm — delegate to subagents
    include_skills=True,        # Domain-specific skills from SKILL.md files
    include_memory=True,        # Persistent memory across sessions
    include_plan=True,          # Structured planning before execution
    include_teams=True,         # Agent teams with shared TODO lists + message bus
    include_liteparse=True,     # Document parsing — PDF, DOCX, XLSX + OCR
    web_search=True,            # Tool-calling: web search
    thinking="high",            # Extended thinking / reasoning effort
    context_manager=True,       # Unlimited context via auto-summarization
    cost_tracking=True,         # Token/USD budget enforcement
    fallback_model="openai:gpt-5.4",   # auto-retry if the primary model fails
    include_checkpoints=True,   # Save, rewind, and fork conversations
)

deps = create_default_deps(StateBackend())
result = await agent.run("Build a REST API for user auth", deps=deps)

Structured Output

Type-safe responses with Pydantic models — no JSON parsing, no dict["key"]:

hljs language-python

from pydantic import BaseModel

class CodeReview(BaseModel):
    summary: str
    issues: list[str]
    score: int

agent = create_deep_agent(output_type=CodeReview)
result = await agent.run("Review the auth module", deps=deps)
print(result.output.score)  # fully typed

Multi-Agent Swarm

Spawn isolated subagents for parallel workstreams. Each subagent is a full deep agent with its own tool-calling, memory, and context:

hljs language-python

agent = create_deep_agent(
    subagents=[
        {
            "name": "researcher",
            "description": "Researches topics using web search",
            "instructions": "Search the web, synthesize findings, cite sources.",
        },
        {
            "name": "code-reviewer",
            "description": "Reviews code for quality, security, and performance",
            "instructions": "Check for security issues, N+1 queries, missing tests...",
        },
    ],
)
# Main agent delegates: task(description="Review auth.py", subagent_type="code-reviewer")

Claude Code-Style Lifecycle Hooks + Security Preset

hljs language-python

from pydantic_deep import create_deep_agent, default_security_hook, Hook, HookEvent

agent = create_deep_agent(
    hooks=[
        *default_security_hook(),   # blocks destructive shell, path traversal, secret leaks
        Hook(
            event=HookEvent.PRE_TOOL_USE,
            command="echo 'Tool: $TOOL_NAME args: $TOOL_INPUT' >> /tmp/audit.log",
        ),
    ],
)

MCP Servers

Connect GitHub, Figma (OAuth), Context7, DeepWiki, or any custom server — auth handled for you:

hljs language-python

from pydantic_deep import create_deep_agent, build_mcp_server, MCPServerConfig

deepwiki = build_mcp_server(
    MCPServerConfig(name="deepwiki", transport="http", url="https://mcp.deepwiki.com/mcp")
)

agent = create_deep_agent(mcp_servers=[deepwiki])   # curated defaults via builtin_mcp_servers()

Context Files

Pydantic Deep Agents auto-discovers and injects project-specific context into every conversation:

File	Purpose	Who Sees It
`AGENTS.md`	Project conventions, architecture, instructions	Main agent + all subagents
`CLAUDE.md`	Claude Code project instructions	Main agent + all subagents
`SOUL.md`	Agent personality, style, communication preferences	Main agent only
`.cursorrules`	Cursor editor conventions	Main agent only
`MEMORY.md`	Persistent memory — read/write/update tools	Per-agent (isolated)

Compatible with Claude Code, Cursor, GitHub Copilot, and other agent frameworks. AGENTS.md follows the agents.md spec.

See the full API reference for all options.

🔬 DeepResearch — Reference App

A full-featured research deep agent with web UI — built entirely on Pydantic Deep Agents.

Plan Mode — planner asks clarifying questions	Multi-Agent Swarm — 5 subagents researching in parallel
Excalidraw Canvas — live diagrams synced with agent	File Browser — workspace files with inline preview

Web search (Tavily, Brave, Jina), sandboxed code execution, Excalidraw diagrams, plan mode, report export.

hljs language-bash

cd apps/deepresearch && uv sync && cp .env.example .env
uv run deepresearch    # → http://localhost:8080

See apps/deepresearch/README.md for full setup.

Architecture

Pydantic Deep Agents uses pydantic-ai's native Capabilities API for all cross-cutting concerns — forking, hooks, memory, skills, context files, teams, and plan mode are all first-class pydantic-ai capabilities.

hljs language-sql

                         Pydantic Deep Agents
+---------------------------------------------------------------------+
|                                                                     |
|   +----------+ +----------+ +----------+ +----------+ +---------+   |
|   | Planning | |Filesystem| | Subagents| |  Skills  | |  Teams  |   |
|   +----+-----+ +----+-----+ +----+-----+ +----+-----+ +----+----+   |
|        |            |            |            |            |        |
|        +------------+-----+------+------------+------------+        |
|                           |                                         |
|                           v                                         |
|  Forking       --> +------------------+ <-- Capabilities            |
|  Summarization --> |    Deep Agent    | <-- Hooks                   |
|  Checkpointing --> |   (pydantic-ai)  | <-- Memory                  |
|  Cost Tracking --> |                  | <-- MCP                     |
|                    +--------+---------+                             |
|                             |                                       |
|           +-----------------+-----------------+                     |
|           v                 v                 v                     |
|    +------------+    +------------+    +------------+               |
|    |   State    |    |   Local    |    |   Docker   |               |
|    |  Backend   |    |  Backend   |    |  Sandbox   |               |
|    +------------+    +------------+    +------------+               |
|                                                                     |
+---------------------------------------------------------------------+

Modular Packages

Every component is a standalone package — use only what you need:

Package	What It Does
pydantic-ai-backend	File storage, Docker sandbox, console toolset
pydantic-ai-todo	Task planning with subtasks and dependencies
subagents-pydantic-ai	Sync/async delegation, background tasks, cancellation
summarization-pydantic-ai	LLM summaries or zero-cost sliding window
pydantic-ai-shields	Cost tracking, input/output/tool blocking

Full Feature List

Expand

Live Run Forking

Split an in-flight agent.run() into N parallel branches sharing history up to the fork point
Copy-on-write BranchOverlay filesystem isolation — reads fall through to parent, writes stay local
Per-branch steering messages, per-branch budget_usd caps, aggregate budget enforcement
Four merge modes: manual, auto, auto_with_fallback (default), vote
Autonomous JudgeAgent with structured JudgeVerdict; compute_confidence blends quality, test pass ratio, and consistency
Test-runner hook — run a shell command against each branch's snapshot; exit code feeds the judge
Agent tools: fork_run, inspect_branches, merge_or_select, terminate_branch, diff_branches, fork_cost
CLI: /fork, /merge, /fork-config, live per-branch streaming panels, judge screen, merge acceptance gate

Tool-Calling

ls, read_file, write_file, edit_file, glob, grep, execute — full filesystem access
Docker sandbox with named workspaces — sandboxed execution, packages persist between sessions
Web search (DuckDuckGo, Tavily, Brave) and web fetch
Browser automation via Playwright — navigate, click, type_text, screenshot, execute_js, and more

Deep Agent Architecture

Planning — Task tracking with subtasks, dependencies, and cycle detection
Subagents / Multi-agent swarm — Sync/async delegation, background task management, soft/hard cancellation
Agent Teams — Shared TODO lists with claiming and dependency tracking, peer-to-peer message bus
Plan Mode — Dedicated planner subagent for structured planning before execution
Persistent memory — MEMORY.md that persists across sessions, auto-injected into system prompt
Self-improving — /improve analyzes past sessions, proposes updates to context files

Context & Memory

Unlimited context — Auto-summarization when approaching token budget (LLM-based or sliding window)
Context limit warnings — Model receives URGENT/CRITICAL messages when approaching 70% context usage
Eviction capability — Intercepts large tool outputs via after_tool_execute before they enter history
Context files — Auto-discover AGENTS.md, CLAUDE.md, SOUL.md, .cursorrules, copilot-instructions, and more
Checkpoints — Save state, rewind or fork conversations. In-memory and file-based stores

Production Features

MCP — Connect any Model Context Protocol server; import from Claude Code; OAuth + keystore auth
Lifecycle hooks — Claude Code-style PRE/POST_TOOL_USE. Shell commands or Python handlers
Security preset — default_security_hook() blocks destructive commands, path traversal, secret leaks
Fallback models — fallback_model= chains; fires on API errors, never on auth errors
Structured output — Type-safe responses with Pydantic models via output_type
Cost tracking — Token/USD budgets with automatic enforcement and real-time callbacks
Output styles — Built-in (concise, explanatory, formal, conversational, markdown, json-only, bullet) or custom
Streaming · Image support · Human-in-the-loop confirmation workflows

CLI

Interactive TUI (Textual) with streaming, tool visualization, live fork panels, session management
Headless runner (pydantic-deep run) for CI/CD, benchmarks, scripted automation
25+ slash commands: /fork, /merge, /mcp, /improve, /compact, /diff, /model, /skills, /theme, and more
@filename file references, !command shell passthrough, clipboard image paste
Tool approval dialogs with auto-approve · debug logging per session

Contributing

hljs language-bash

git clone https://github.com/vstorm-co/pydantic-deep.git
cd pydantic-deep
make install
make test   # 100% coverage required
make all    # lint + typecheck + test

See CONTRIBUTING.md. Good first issues are labeled here.

Vstorm OSS Ecosystem

pydantic-deep is part of a broader open-source ecosystem for production AI agents:

Project	Description	Stars
full-stack-ai-agent-template	Zero to production AI app in 30 minutes. FastAPI + Next.js 15, 6 AI frameworks (incl. pydantic-deep), RAG pipeline, 75+ config options.
pydantic-ai-shields	Drop-in guardrails for Pydantic AI agents. 5 infra + 5 content shields.
pydantic-ai-subagents	Declarative multi-agent orchestration with token tracking.
pydantic-ai-summarization	Smart context compression for long-running agents.
pydantic-ai-backend	Sandboxed execution for AI agents. Docker + Daytona.
content-skills	Claude Code content studio — blog, social, slides, video, infographics — all brand-aware.
production-stack-skills	Claude Code skills for production-grade FastAPI, PostgreSQL, Docker, and observability.

Want the full stack? Use full-stack-ai-agent-template — it ships pydantic-deep integrated with FastAPI, Next.js, auth, WebSocket streaming, and RAG out of the box.

Browse all projects at oss.vstorm.co

Star History

If pydantic-deep saved you from wiring an agent harness by hand — give it a ⭐. It's the single biggest thing that helps the project grow.

License

MIT — see LICENSE

Need help shipping AI agents in production?

We're Vstorm — an Applied Agentic AI Engineering Consultancy
with 30+ production agent implementations. Pydantic Deep Agents is what we build them with.

Made with care by Vstorm

Pydantic Deep Agents

Pydantic Deep Agents CLI demo

Docs · PyPI · Forking · Why · CLI · Framework · Examples

Most agents give you one shot at a task. They pick an approach, commit to it, and if it's wrong you start over.

hljs language-bash

# Terminal AI assistant — no Python setup required
curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash
pydantic-deep

hljs language-bash

# Or build your own agent
pip install pydantic-deep

⑂ Live Run Forking — the feature no one else has

Claude Code can't do this. Aider can't. LangGraph and CrewAI can't. It's the reason to use pydantic-deep.

When an agent hits a fork in the road — "should I refactor this with a decorator or a context manager?" — most tools force one bet. Pydantic Deep Agents lets the run branch:

hljs language-less

                                  ┌──  branch A: "use a decorator"      ── tests: 8/8 ✓  conf 0.71
   agent.run("refactor auth") ──┬─┼──  branch B: "use a context manager" ── tests: 6/8 ✗  conf 0.42
       (shared history)         │ └──  branch C: "extract a base class"   ── tests: 8/8 ✓  conf 0.55
                                │
                                └──►  ⚖️  AI judge weighs quality + tests + consistency
                                          → adopts branch A, continues the run

Framework — opt in with one flag:

hljs language-python

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    forking=True,                 # gives the agent: fork_run, inspect_branches,
)                                 # merge_or_select, diff_branches, fork_cost, terminate_branch

Or run a real test command against every branch and let exit codes decide the winner:

hljs language-python

from pydantic_deep import LiveForkCapability

agent = create_deep_agent(
    forking=LiveForkCapability(test_command="pytest -q", test_timeout_s=120),
)
# confidence = quality_spread·0.4 + test_pass_ratio·0.4 + internal_consistency·0.2

CLI — fork an in-flight conversation, watch branches stream live, merge the best:

hljs language-css

/fork                 # split the current run into N parallel branches
>>A try a decorator   # steer branch A
>>B use a contextmgr  # steer branch B
/merge                # resolve — manual picker, AI judge, or vote

📖 Full reference: docs/capabilities/live-fork.md

🆚 Why pydantic-deep?

The only tool that is a terminal assistant and a Python framework and can fork its own runs — without giving up type safety or your choice of model.

	Pydantic Deep	Claude Code	Aider	LangGraph	CrewAI
Terminal TUI assistant	✅	✅	✅	—	—
Python framework / library	✅	—	~	✅	✅
Live run forking + AI judge	✅	—	—	—	—
Multi-agent swarm + message bus	✅	~	—	✅	✅
Any model / any provider	✅	Anthropic	✅	✅	✅
Sandboxed Docker execution	✅	—	~	DIY	DIY
Persistent memory + skills	✅	✅	—	DIY	~
Type-safe structured output	✅	—	—	~	~
MCP servers	✅	✅	—	~	~
Self-hosted, open source	✅ MIT	—	✅	✅	✅

_{✅ first-class · ~ partial / via extensions · — not available · DIY you wire it yourself. Comparison reflects each project as of 2026-06; corrections welcome via PR.}

What's New

2026-06-01 v0.3.24 — Live Run Forking — split an in-flight agent.run() into N parallel branches with copy-on-write isolation, per-branch budgets, a test-runner hook, and four merge modes (manual / auto / auto_with_fallback / vote). Opt in with forking=True.
2026-06-01 v0.3.23 — MCP client support (framework + CLI). Connect GitHub, Figma (OAuth), Context7, DeepWiki, or any custom server. Import servers straight from Claude Code. New interactive /mcp command. Plus a full CLI presentation pass: clipboard image paste, real +/- diffs, tool icons, turn summaries.
2026-06-01 v0.3.23 — Automatic fallback-model retry — fallback_model= wraps your primary in a FallbackModel chain; fires on API errors but never on auth errors. Plus a batteries-included security hook preset (default_security_hook()) and three new output styles (markdown, json-only, bullet).
2026-04-22 v0.3.17 — LiteParse document parsing (include_liteparse=True) — PDFs, DOCX, XLSX, PPTX, and images with optional OCR, all local.
2026-04-10 v0.3.5 — Headless runner (pydantic-deep run), Docker sandbox with named workspaces, browser automation via Playwright.

Full history: CHANGELOG.md

The Agent Harness

⑂ Live run forking	Split a run into N isolated branches, each trying a different approach. AI judge or test results pick the winner. No other agent framework has this.
🔧 Tool-calling	File read/write/edit, shell execution, glob, grep, web search, web fetch, browser automation — wired up and ready.
🤝 Multi-agent / swarm	Spawn subagents for parallel workstreams. Shared TODO lists with claiming. Peer-to-peer message bus. Full team coordination.
🧠 Persistent memory	MEMORY.md persists across sessions. Auto-injected into the system prompt. Each agent has isolated memory by default.
♾️ Unlimited context	Auto-summarization when approaching the token budget. LLM-based or zero-cost sliding window. Never hits a context wall.
🐳 Sandboxed execution	Docker sandbox with named workspaces. Installed packages persist between sessions. Project dir mounted at /workspace.
🗂️ Plan Mode	Dedicated planner subagent asks clarifying questions and structures the work before execution begins. Headless-compatible.
🔖 Checkpoints	Save conversation state at any point. Rewind to any checkpoint. Fork sessions to explore alternative approaches.
📚 Skills system	Domain-specific knowledge loaded on demand from SKILL.md files. Built-in: code-review, refactor, test-writer, git-workflow, and more.
📄 Document parsing	Parse PDFs, DOCX, XLSX, PPTX, and images with optional OCR via LiteParse. Runs locally — no cloud services required.
🔌 MCP	Connect any Model Context Protocol server — GitHub, Figma (OAuth), Context7, DeepWiki, or custom. Import straight from Claude Code.
⚡ Lifecycle hooks + security preset	Claude Code-style PRE/POST_TOOL_USE hooks. Shell or Python handlers. `default_security_hook()` blocks destructive commands out of the box.
📐 Structured output	Type-safe Pydantic model responses via `output_type`. No JSON parsing. No `dict["key"]`. Full IDE autocomplete.
🔁 Fallback models	Primary model fails? `fallback_model=` hops to the next in the chain — on API errors, never on auth errors.
🔄 Stuck loop detection	Detects repeated identical tool calls, A-B-A-B alternating patterns, and no-op calls. Warns the model or stops the run.
💰 Cost tracking	Real-time token and USD cost tracking per run and cumulative. Hard budget limits with `BudgetExceededError`.
✨ Self-improving	`/improve` analyzes past sessions and proposes updates to MEMORY.md, SOUL.md, and AGENTS.md.
🏷️ 100% type-safe	Pyright strict + MyPy strict. 100% test coverage. Every public API is fully typed — safe to use in production.

Built natively on pydantic-ai — uses the Capabilities API directly, inherits all pydantic-ai streaming, multi-model support, and Pydantic validation automatically.

🖥️ CLI — Terminal AI Assistant

A Claude Code-style terminal AI assistant that works with any model and any provider — and forks.

Install (macOS & Linux)

hljs language-bash

curl -fsSL https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash

No Python setup required — the script installs uv and the CLI automatically. Then:

hljs language-bash

export ANTHROPIC_API_KEY=sk-ant-...
pydantic-deep

Windows / manual: pip install "pydantic-deep[cli]" · Update: pydantic-deep update

Model & Provider Support

Works with any model that supports tool-calling:

Provider	Example models
Anthropic	`anthropic:claude-opus-4-6`, `claude-sonnet-4-6`
OpenAI	`openai:gpt-5.4`, `gpt-4.1`
OpenRouter	`openrouter:anthropic/claude-opus-4-6` (200+ models)
Google Gemini	`google-gla:gemini-2.5-pro`
Ollama (local)	`ollama:qwen3`, `ollama:llama3.3`
Any OpenAI-compatible	Custom base URL via env

Switch model anytime: pydantic-deep config set model openai:gpt-5.4 or /model in the TUI.

What you get in the TUI

	Feature
⑂	Live run forking — split a run into branches, stream them side by side, merge the winner
💬	Streaming chat with tool call visualization, icons, and real `+/-` diffs
📁	File read / write / edit, shell execution, glob, grep
🤝	Task planning, plan mode, and subagent delegation
🧠	Persistent memory and self-improvement across sessions
♾️	Context compression for unlimited conversations
🔖	Checkpoints — save, rewind, and fork any session
🔌	MCP servers via `/mcp` — GitHub, Figma (OAuth), and more; import from Claude Code
🌐	Web search & fetch built-in · 🖥️ browser automation via Playwright (`--browser`)
🐳	Docker sandbox — sandboxed execution with named workspaces
💭	Extended thinking — `minimal` / `low` / `medium` / `high` / `xhigh`
📋	Clipboard image paste (`Ctrl+V` / `/paste`) — multimodal prompts
💰	Real-time cost and token tracking per session
🛡️	Tool approval dialogs — approve, auto-approve, or deny per tool call
@	`@filename` file references · `!command` shell passthrough
✨	`/fork`, `/merge`, `/improve`, `/skills`, `/mcp`, `/model`, `/theme`, `/compact`, and more

Usage

hljs language-bash

# Interactive TUI (default)
pydantic-deep
pydantic-deep tui --model openrouter:anthropic/claude-opus-4-6

# Headless deep agent — benchmarks, CI/CD, scripted automation
pydantic-deep run "Fix the failing test in test_auth.py"
pydantic-deep run --task-file task.md --json

# Docker sandbox — sandboxed execution, project dir mounted at /workspace
pydantic-deep tui --sandbox docker
pydantic-deep tui --workspace ml-env     # named workspace, packages persist

# Browser automation (requires pydantic-deep[browser])
pydantic-deep tui --browser

# Config & skills
pydantic-deep config set model anthropic:claude-sonnet-4-6
pydantic-deep skills list
pydantic-deep update                     # update to latest version

See CLI docs for the full reference.

🐍 Framework — Build Your Own Agent

hljs language-bash

pip install pydantic-deep

One function call gives you a production deep agent with planning, tool-calling, multi-agent delegation, persistent memory, unlimited context, forking, and cost tracking. Everything is a toggle:

hljs language-python

from pydantic_ai_backends import StateBackend
from pydantic_deep import create_deep_agent, create_default_deps

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    forking=True,               # ⑂ split a run into parallel branches + AI judge
    include_todo=True,          # Task planning with subtasks and dependencies
    include_subagents=True,     # Multi-agent swarm — delegate to subagents
    include_skills=True,        # Domain-specific skills from SKILL.md files
    include_memory=True,        # Persistent memory across sessions
    include_plan=True,          # Structured planning before execution
    include_teams=True,         # Agent teams with shared TODO lists + message bus
    include_liteparse=True,     # Document parsing — PDF, DOCX, XLSX + OCR
    web_search=True,            # Tool-calling: web search
    thinking="high",            # Extended thinking / reasoning effort
    context_manager=True,       # Unlimited context via auto-summarization
    cost_tracking=True,         # Token/USD budget enforcement
    fallback_model="openai:gpt-5.4",   # auto-retry if the primary model fails
    include_checkpoints=True,   # Save, rewind, and fork conversations
)

deps = create_default_deps(StateBackend())
result = await agent.run("Build a REST API for user auth", deps=deps)

Structured Output

Type-safe responses with Pydantic models — no JSON parsing, no dict["key"]:

hljs language-python

from pydantic import BaseModel

class CodeReview(BaseModel):
    summary: str
    issues: list[str]
    score: int

agent = create_deep_agent(output_type=CodeReview)
result = await agent.run("Review the auth module", deps=deps)
print(result.output.score)  # fully typed

Multi-Agent Swarm

Spawn isolated subagents for parallel workstreams. Each subagent is a full deep agent with its own tool-calling, memory, and context:

hljs language-python

agent = create_deep_agent(
    subagents=[
        {
            "name": "researcher",
            "description": "Researches topics using web search",
            "instructions": "Search the web, synthesize findings, cite sources.",
        },
        {
            "name": "code-reviewer",
            "description": "Reviews code for quality, security, and performance",
            "instructions": "Check for security issues, N+1 queries, missing tests...",
        },
    ],
)
# Main agent delegates: task(description="Review auth.py", subagent_type="code-reviewer")

Claude Code-Style Lifecycle Hooks + Security Preset

hljs language-python

from pydantic_deep import create_deep_agent, default_security_hook, Hook, HookEvent

agent = create_deep_agent(
    hooks=[
        *default_security_hook(),   # blocks destructive shell, path traversal, secret leaks
        Hook(
            event=HookEvent.PRE_TOOL_USE,
            command="echo 'Tool: $TOOL_NAME args: $TOOL_INPUT' >> /tmp/audit.log",
        ),
    ],
)

MCP Servers

Connect GitHub, Figma (OAuth), Context7, DeepWiki, or any custom server — auth handled for you:

hljs language-python

from pydantic_deep import create_deep_agent, build_mcp_server, MCPServerConfig

deepwiki = build_mcp_server(
    MCPServerConfig(name="deepwiki", transport="http", url="https://mcp.deepwiki.com/mcp")
)

agent = create_deep_agent(mcp_servers=[deepwiki])   # curated defaults via builtin_mcp_servers()

Context Files

Pydantic Deep Agents auto-discovers and injects project-specific context into every conversation:

File	Purpose	Who Sees It
`AGENTS.md`	Project conventions, architecture, instructions	Main agent + all subagents
`CLAUDE.md`	Claude Code project instructions	Main agent + all subagents
`SOUL.md`	Agent personality, style, communication preferences	Main agent only
`.cursorrules`	Cursor editor conventions	Main agent only
`MEMORY.md`	Persistent memory — read/write/update tools	Per-agent (isolated)

Compatible with Claude Code, Cursor, GitHub Copilot, and other agent frameworks. AGENTS.md follows the agents.md spec.

See the full API reference for all options.

🔬 DeepResearch — Reference App

A full-featured research deep agent with web UI — built entirely on Pydantic Deep Agents.

Plan Mode — planner asks clarifying questions	Multi-Agent Swarm — 5 subagents researching in parallel
Excalidraw Canvas — live diagrams synced with agent	File Browser — workspace files with inline preview

Web search (Tavily, Brave, Jina), sandboxed code execution, Excalidraw diagrams, plan mode, report export.

hljs language-bash

cd apps/deepresearch && uv sync && cp .env.example .env
uv run deepresearch    # → http://localhost:8080

See apps/deepresearch/README.md for full setup.

Architecture

hljs language-sql

                         Pydantic Deep Agents
+---------------------------------------------------------------------+
|                                                                     |
|   +----------+ +----------+ +----------+ +----------+ +---------+   |
|   | Planning | |Filesystem| | Subagents| |  Skills  | |  Teams  |   |
|   +----+-----+ +----+-----+ +----+-----+ +----+-----+ +----+----+   |
|        |            |            |            |            |        |
|        +------------+-----+------+------------+------------+        |
|                           |                                         |
|                           v                                         |
|  Forking       --> +------------------+ <-- Capabilities            |
|  Summarization --> |    Deep Agent    | <-- Hooks                   |
|  Checkpointing --> |   (pydantic-ai)  | <-- Memory                  |
|  Cost Tracking --> |                  | <-- MCP                     |
|                    +--------+---------+                             |
|                             |                                       |
|           +-----------------+-----------------+                     |
|           v                 v                 v                     |
|    +------------+    +------------+    +------------+               |
|    |   State    |    |   Local    |    |   Docker   |               |
|    |  Backend   |    |  Backend   |    |  Sandbox   |               |
|    +------------+    +------------+    +------------+               |
|                                                                     |
+---------------------------------------------------------------------+

Modular Packages

Every component is a standalone package — use only what you need:

Package	What It Does
pydantic-ai-backend	File storage, Docker sandbox, console toolset
pydantic-ai-todo	Task planning with subtasks and dependencies
subagents-pydantic-ai	Sync/async delegation, background tasks, cancellation
summarization-pydantic-ai	LLM summaries or zero-cost sliding window
pydantic-ai-shields	Cost tracking, input/output/tool blocking

Full Feature List

Expand

Live Run Forking

Split an in-flight agent.run() into N parallel branches sharing history up to the fork point
Copy-on-write BranchOverlay filesystem isolation — reads fall through to parent, writes stay local
Per-branch steering messages, per-branch budget_usd caps, aggregate budget enforcement
Four merge modes: manual, auto, auto_with_fallback (default), vote
Autonomous JudgeAgent with structured JudgeVerdict; compute_confidence blends quality, test pass ratio, and consistency
Test-runner hook — run a shell command against each branch's snapshot; exit code feeds the judge
Agent tools: fork_run, inspect_branches, merge_or_select, terminate_branch, diff_branches, fork_cost
CLI: /fork, /merge, /fork-config, live per-branch streaming panels, judge screen, merge acceptance gate

Tool-Calling

ls, read_file, write_file, edit_file, glob, grep, execute — full filesystem access
Docker sandbox with named workspaces — sandboxed execution, packages persist between sessions
Web search (DuckDuckGo, Tavily, Brave) and web fetch
Browser automation via Playwright — navigate, click, type_text, screenshot, execute_js, and more

Deep Agent Architecture

Planning — Task tracking with subtasks, dependencies, and cycle detection
Subagents / Multi-agent swarm — Sync/async delegation, background task management, soft/hard cancellation
Agent Teams — Shared TODO lists with claiming and dependency tracking, peer-to-peer message bus
Plan Mode — Dedicated planner subagent for structured planning before execution
Persistent memory — MEMORY.md that persists across sessions, auto-injected into system prompt
Self-improving — /improve analyzes past sessions, proposes updates to context files

Context & Memory

Unlimited context — Auto-summarization when approaching token budget (LLM-based or sliding window)
Context limit warnings — Model receives URGENT/CRITICAL messages when approaching 70% context usage
Eviction capability — Intercepts large tool outputs via after_tool_execute before they enter history
Context files — Auto-discover AGENTS.md, CLAUDE.md, SOUL.md, .cursorrules, copilot-instructions, and more
Checkpoints — Save state, rewind or fork conversations. In-memory and file-based stores

Production Features

MCP — Connect any Model Context Protocol server; import from Claude Code; OAuth + keystore auth
Lifecycle hooks — Claude Code-style PRE/POST_TOOL_USE. Shell commands or Python handlers
Security preset — default_security_hook() blocks destructive commands, path traversal, secret leaks
Fallback models — fallback_model= chains; fires on API errors, never on auth errors
Structured output — Type-safe responses with Pydantic models via output_type
Cost tracking — Token/USD budgets with automatic enforcement and real-time callbacks
Output styles — Built-in (concise, explanatory, formal, conversational, markdown, json-only, bullet) or custom
Streaming · Image support · Human-in-the-loop confirmation workflows

CLI

Interactive TUI (Textual) with streaming, tool visualization, live fork panels, session management
Headless runner (pydantic-deep run) for CI/CD, benchmarks, scripted automation
25+ slash commands: /fork, /merge, /mcp, /improve, /compact, /diff, /model, /skills, /theme, and more
@filename file references, !command shell passthrough, clipboard image paste
Tool approval dialogs with auto-approve · debug logging per session

Contributing

hljs language-bash

git clone https://github.com/vstorm-co/pydantic-deep.git
cd pydantic-deep
make install
make test   # 100% coverage required
make all    # lint + typecheck + test

See CONTRIBUTING.md. Good first issues are labeled here.

Vstorm OSS Ecosystem

pydantic-deep is part of a broader open-source ecosystem for production AI agents:

Project	Description	Stars
full-stack-ai-agent-template	Zero to production AI app in 30 minutes. FastAPI + Next.js 15, 6 AI frameworks (incl. pydantic-deep), RAG pipeline, 75+ config options.
pydantic-ai-shields	Drop-in guardrails for Pydantic AI agents. 5 infra + 5 content shields.
pydantic-ai-subagents	Declarative multi-agent orchestration with token tracking.
pydantic-ai-summarization	Smart context compression for long-running agents.
pydantic-ai-backend	Sandboxed execution for AI agents. Docker + Daytona.
content-skills	Claude Code content studio — blog, social, slides, video, infographics — all brand-aware.
production-stack-skills	Claude Code skills for production-grade FastAPI, PostgreSQL, Docker, and observability.

Want the full stack? Use full-stack-ai-agent-template — it ships pydantic-deep integrated with FastAPI, Next.js, auth, WebSocket streaming, and RAG out of the box.

Browse all projects at oss.vstorm.co

Star History

If pydantic-deep saved you from wiring an agent harness by hand — give it a ⭐. It's the single biggest thing that helps the project grow.

License

MIT — see LICENSE

Need help shipping AI agents in production?

We're Vstorm — an Applied Agentic AI Engineering Consultancy
with 30+ production agent implementations. Pydantic Deep Agents is what we build them with.

Made with care by Vstorm

pydantic-deepagents

Pydantic Deep Agents

⑂ Live Run Forking — the feature no one else has

🆚 Why pydantic-deep?

What's New

The Agent Harness

🖥️ CLI — Terminal AI Assistant

Install (macOS & Linux)

Model & Provider Support

What you get in the TUI

Usage

🐍 Framework — Build Your Own Agent

Structured Output

Multi-Agent Swarm

Claude Code-Style Lifecycle Hooks + Security Preset

MCP Servers

Context Files

🔬 DeepResearch — Reference App

Architecture

Modular Packages

Full Feature List

Live Run Forking

Tool-Calling

Deep Agent Architecture

Context & Memory

Production Features

CLI

Contributing

Vstorm OSS Ecosystem

Star History

License

Need help shipping AI agents in production?

Similar Packages

pydantic-deepagents

Pydantic Deep Agents

⑂ Live Run Forking — the feature no one else has

🆚 Why pydantic-deep?

What's New

The Agent Harness

🖥️ CLI — Terminal AI Assistant

Install (macOS & Linux)

Model & Provider Support

What you get in the TUI

Usage

🐍 Framework — Build Your Own Agent

Structured Output

Multi-Agent Swarm

Claude Code-Style Lifecycle Hooks + Security Preset

MCP Servers

Context Files

🔬 DeepResearch — Reference App

Architecture

Modular Packages

Full Feature List

Live Run Forking

Tool-Calling

Deep Agent Architecture

Context & Memory

Production Features

CLI

Contributing

Vstorm OSS Ecosystem

Star History

License

Need help shipping AI agents in production?

Similar Packages