World Model MCP

Enforcement, provenance, and harness-neutral memory for AI coding agents. World Model MCP is a temporal knowledge graph that validates code changes against learned constraints at the edit boundary, re-injects relevant context after compaction, tracks contradictions with confidence-weighted resolution, and runs across Claude Code, Cursor, pi, and Codex CLI.

Status: v0.7.5 -- 26 MCP tools, 18 CLI subcommands, 304 tests. Adds a Codex CLI adapter (install-codex wires world-model-mcp into ~/.codex/config.toml with PreToolUse / PostToolUse / PostCompact / SessionStart hooks). v0.7.4 added an AGENTS.md constraint reader, a self-hosted Claude Managed Agents deployment guide, and a reproducible contradiction-resolution benchmark (93.5% overall). v0.7.3 added a guided demo, opt-in telemetry, and a pi-package adapter. v0.7.2 added streamable HTTP transport for remote / MCP-tunnel deployment. v0.7.0 introduced PostCompact auto-injection, the defer enforcement tier, confidence-weighted contradiction resolution, and a compaction audit log. Contributions welcome.

mcp-name: io.github.SaravananJaichandar/world-model-mcp

If world-model-mcp helped you, star the repo or open an issue with what worked or didn't. I read every one and the feedback shapes what ships next.

What It Does

World Model MCP creates a temporal knowledge graph of your codebase that learns from every coding session to:

Prevent Hallucinations -- Validates API/function references against known entities before use
Stop Repeated Mistakes -- Learns constraints from corrections, applies them in future sessions
Reduce Regressions -- Tracks bug fixes and warns when changes touch critical regions
Survive Compaction -- Re-injects top constraints and recent facts after the agent's context window resets
Resolve Contradictions -- Picks a winner between conflicting facts using confidence, recency, or source count

Think of it as a long-term memory layer that runs alongside Claude Code, Cursor, or any MCP-aware coding agent.

What's new in v0.7.5

Codex CLI adapter -- new install-codex CLI subcommand appends a [mcp_servers.world_model] block plus PreToolUse, PostToolUse, PostCompact, and SessionStart hooks to ~/.codex/config.toml. The bundled snippet was verified against openai/codex@main at v0.138.0-alpha (server name uses underscore to dodge the tool-name hyphen-strip in codex-rs/codex-mcp/src/mcp/mod.rs; hook output sticks to camelCase with deny_unknown_fields compliance). Schema regression tests in tests/test_v075_features.py lock the contract down. See adapters/codex/README.md.
Dual-shape payload normalization in hook_helper and inject_helper -- both helpers now accept either Claude Code's payload shape (event, project_dir) or Codex's (hook_event_name, cwd), so the same Python code drives all four adapters (Claude Code, Cursor, pi, Codex).
Antigravity CLI adapter intentionally NOT shipped this release -- the Antigravity API surface is still settling (six 1.0.x releases in three weeks, the url field for HTTP MCP servers landed June 3, hook JSON event-name casing remains undocumented). Targeting June 25 for that adapter after the API stabilizes. Detailed reasoning in the v0.7.5 RELEASE_NOTES entry.
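
The dual-shape normalization described above can be pictured roughly as follows. This is a minimal sketch, not the shipped hook_helper code: the function name normalize_payload and the extra tool_name field are hypothetical, and only the four key names (event, project_dir, hook_event_name, cwd) come from the description above.

```python
from typing import Any, Mapping

SHAPE_KEYS = {"event", "hook_event_name", "project_dir", "cwd"}


def normalize_payload(payload: Mapping[str, Any]) -> dict[str, Any]:
    """Return a payload with canonical `event` and `project_dir` keys.

    Hypothetical helper: accepts either the Claude Code shape
    (event, project_dir) or the Codex shape (hook_event_name, cwd).
    """
    event = payload.get("event") or payload.get("hook_event_name")
    project_dir = payload.get("project_dir") or payload.get("cwd")
    if event is None or project_dir is None:
        raise ValueError("payload matches neither the Claude Code nor the Codex shape")
    # Carry every other field over untouched so downstream code still sees it.
    extra = {k: v for k, v in payload.items() if k not in SHAPE_KEYS}
    return {"event": event, "project_dir": project_dir, **extra}


# Illustrative payloads; `tool_name` is an assumed extra field.
claude_style = {"event": "PreToolUse", "project_dir": "/repo", "tool_name": "Edit"}
codex_style = {"hook_event_name": "PreToolUse", "cwd": "/repo", "tool_name": "Edit"}
```

Both sample payloads normalize to the same dictionary, which is what would let one code path serve every adapter.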

What's new in v0.7.4

AGENTS.md / .agents/skills/ constraint reader -- world-model-mcp now reads declarative project conventions from AGENTS.md, CLAUDE.md, GEMINI.md, and .agents/skills/*.md files and mixes them into PreToolUse enforcement alongside the SQLite-backed constraints. Supports structured fence blocks (```constraint and YAML frontmatter) and heuristic imperative-sentence extraction for prose-style AGENTS.md files. New MCP tool: get_agents_md_constraints. (anthropics/claude-code#6235 has 4,000+ thumbs-up for AGENTS.md as the cross-agent format.)
Self-hosted Claude Managed Agents deployment guide -- Anthropic's official position: "Memory is not yet supported in self-hosted sessions." world-model-mcp fills that gap. New guide at docs/deployment/managed-agents-self-hosted.md, with a Modal quickstart you can deploy in under five minutes.
Reproducible contradiction-resolution benchmark -- 24-pair dataset at benchmarks/contradictions/dataset.jsonl, runner at benchmarks/contradictions/run.py, results at benchmarks/contradictions/RESULTS.md. Headline: 93.5% overall accuracy, 100% on keep_higher_confidence and keep_most_sources, with documented honest weaknesses on tie-handling and small confidence gaps. Re-run with python benchmarks/contradictions/run.py. CI workflow guards regressions.
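
To give a feel for the structured fence blocks the AGENTS.md reader supports, here is a small sketch that pulls constraint blocks out of a markdown string. It assumes each block contains simple "key: value" lines; the real block grammar is defined by world-model-mcp and may differ, and extract_constraints is a hypothetical name.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly

# Assumption: a block opens with the fence plus "constraint" and closes
# with a bare fence on its own line.
BLOCK_RE = re.compile(
    rf"^{FENCE}constraint[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_constraints(markdown: str) -> list[dict[str, str]]:
    """Return one dict per constraint block, built from "key: value" lines."""
    constraints = []
    for match in BLOCK_RE.finditer(markdown):
        fields = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")  # split on the first colon only
            if sep and key.strip():
                fields[key.strip()] = value.strip()
        if fields:
            constraints.append(fields)
    return constraints


sample = "\n".join([
    "# Project conventions",
    "",
    FENCE + "constraint",
    "rule: Use logger.debug instead of console.log",
    "severity: warning",
    FENCE,
    "",
    "Some unrelated prose.",
])
```

Calling extract_constraints(sample) yields a single dictionary with the rule and severity fields; prose outside the fence is ignored.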

What's new in v0.7.3

world-model demo -- one command to see every primitive working. Initializes the knowledge graph, seeds reproducible demo data via scripts/demo_seed.py, then exercises each primitive (PreToolUse enforcement, contradiction detection, PostCompact injection, audit log) with real outputs. New users can see the value without writing any code.
Opt-in telemetry -- off by default, prompted once during world-model setup, inspectable with world-model telemetry --status, disabled with world-model telemetry --disable. No file paths, no code, no identifiers tied to a person. See Privacy and Security for the exact payload.
pi adapter -- new adapters/pi/ package. world-model-mcp now plugs into earendil-works/pi via pi's extension API (tool_call -> PreToolUse, context -> auto-injection, session_compact -> audit log). Install with world-model install-pi.

What v0.7.0 introduced (still active)

PostCompact / UserPromptSubmit auto-injection -- when the agent's context is compacted, the hook automatically splices the top constraints and recent canonical facts back into the next turn. Configurable, fails open.
defer enforcement tier -- PreToolUse now classifies recurring warning-level violations as defer, which pauses headless agents (with graceful fallback to ask on older clients) instead of either hard-denying or silently passing through.
Confidence-weighted contradiction resolution -- the new resolve_contradiction tool picks a winner using keep_higher_confidence, keep_most_recent, keep_most_sources, or auto. The loser is marked superseded.
Compaction audit log -- every PostCompact event writes a row with pre/post token counts and what was re-injected. Query with the audit-compactions CLI or export to JSONL.
Cursor adapter -- harness-neutral hooks under adapters/cursor/. Same Python helpers, different manifest format.
Streamable HTTP transport (v0.7.2) -- set WORLD_MODEL_TRANSPORT=http so that the same MCP tools work behind an MCP tunnel for Claude Managed Agents with self-hosted sandboxes. See docs/deployment/mcp-tunnel.md.
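
The strategy names used by resolve_contradiction suggest logic along these lines. This is a sketch under stated assumptions, not the tool's implementation: the Fact fields, the "canonical" status label, the order in which auto falls through the strategies, and the 0.1 confidence-gap threshold are all illustrative choices.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Fact:
    text: str
    confidence: float       # 0.0 to 1.0
    recorded_at: datetime
    source_count: int
    status: str = "canonical"


# Assumption: "auto" tries confidence, then source count, then recency.
AUTO_CONFIDENCE_GAP = 0.1   # illustrative threshold, not the shipped value

SORT_KEYS = {
    "keep_higher_confidence": lambda f: f.confidence,
    "keep_most_recent": lambda f: f.recorded_at,
    "keep_most_sources": lambda f: f.source_count,
}


def resolve(a: Fact, b: Fact, strategy: str = "auto") -> Fact:
    """Return the winning fact and mark the other one as superseded."""
    if strategy == "auto":
        if abs(a.confidence - b.confidence) >= AUTO_CONFIDENCE_GAP:
            strategy = "keep_higher_confidence"
        elif a.source_count != b.source_count:
            strategy = "keep_most_sources"
        else:
            strategy = "keep_most_recent"
    if strategy not in SORT_KEYS:
        raise ValueError(f"unknown strategy: {strategy}")
    winner = max((a, b), key=SORT_KEYS[strategy])  # on a tie, `a` wins
    loser = b if winner is a else a
    loser.status = "superseded"
    return winner
```

The tie rule (the first argument wins) is arbitrary here; ties and small confidence gaps are exactly the cases the benchmark results flag as weak spots.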

Quick Start

Option 1: Desktop Extension (one-click for Claude Desktop)

Download the latest .mcpb from Releases and drag it into Claude Desktop. Auto-installs hooks, MCP server config, and dependencies.

Option 2: pip install (Claude Code CLI / IDE plugins)

# 1. Install the package
pip install world-model-mcp

# 2. Setup in your project (auto-seeds the knowledge graph from existing code)
cd /path/to/your/project
python -m world_model_server.cli setup

# 3. Restart Claude Code
# Done! The world model is pre-populated and active

You can also re-seed or seed manually at any time:

# Seed from existing codebase
world-model seed

# Re-seed with force (re-processes already seeded files)
world-model seed --force

Option 3: HTTP transport for remote / MCP-tunnel deployment

For Claude Managed Agents with self-hosted sandboxes, or any deployment where the MCP server lives behind a firewall and the agent reaches it from Anthropic-side infrastructure, run world-model-mcp in HTTP mode.

pip install 'world-model-mcp[http]'

export WORLD_MODEL_TRANSPORT=http
export WORLD_MODEL_HTTP_PORT=8765
python -m world_model_server.server

Or use the bundled image:

docker compose up -d                    # Dockerfile.http + persistent volume
curl http://127.0.0.1:8765/healthz      # {"status":"ok","version":"0.7.2"}

Full walkthrough including Anthropic MCP tunnels setup: docs/deployment/mcp-tunnel.md.

Stdio remains the default transport for Claude Code, Cursor, and .mcpb installs. Nothing changes for those flows.
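
For scripted deployments, the curl health check above can be mirrored in Python. The sketch below assumes only what the sample response shows (a JSON body with a "status" field equal to "ok"); the function names are hypothetical and the URL is the local default from the example.

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True when the /healthz body is a JSON object with status == "ok"."""
    try:
        data = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_server(url: str = "http://127.0.0.1:8765/healthz", timeout: float = 2.0) -> bool:
    """Fetch the health endpoint; any connection problem counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False
```

Keeping the parsing step separate from the network call makes it easy to test without a running server.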

Option 4: Run the guided demo (no Claude Code required)

To see every primitive working with real outputs from a real SQLite database before committing to a full install:

pip install world-model-mcp
mkdir -p /tmp/wm-test && cd /tmp/wm-test
world-model demo

The demo initializes a knowledge graph, seeds reproducible data, and exercises PreToolUse enforcement, contradiction detection, the PostCompact injection bundle, and the compaction audit log -- with the actual JSON outputs. Re-runs are idempotent.

Option 5: Run inside pi (experimental)

For users of earendil-works/pi:

pip install world-model-mcp           # the Python helpers
world-model install-pi                # writes adapters/world-model-pi/
pi install local:./adapters/world-model-pi

The pi adapter wires the same hook_helper and inject_helper you'd use from Claude Code into pi's tool_call, context, and session_compact events. See adapters/pi/README.md.

Option 6: Run inside Codex CLI (experimental)

For users of OpenAI's Codex CLI:

pip install world-model-mcp                # the Python helpers
python -m world_model_server.cli install-codex
# (appends [mcp_servers.world_model] + hook blocks to ~/.codex/config.toml)
# Restart codex; verify with: codex mcp list

--dry-run prints what would be appended without writing; --force re-appends even if the adapter marker is already present. The bundled snippet uses world_model (underscore) as the MCP server name to dodge Codex's silent hyphen-strip in its tool-name sanitizer. Hook output is camelCase with deny_unknown_fields compliance against Codex's strict Rust schema; the contract is locked down by tests in tests/test_v075_features.py. See adapters/codex/README.md.

What Gets Installed

your-project/
├── .mcp.json                    # MCP server configuration
├── .claude/
│   ├── settings.json           # Hook configuration
│   ├── hooks/                  # Compiled TypeScript hooks
│   └── world-model/            # SQLite databases (~155 KB)

Features

1. Hallucination Prevention

Before:

// Claude invents an API that doesn't exist
const user = await User.findByEmail(email); // This method doesn't exist

After:

// Claude checks the world model first
const user = await User.findOne({ email }); // Verified to exist

Goal: Reduce non-existent API references by validating against the knowledge graph

2. Learning from Corrections

Session 1: User corrects Claude

// Claude writes:
console.log('debug info');

// User corrects to:
logger.debug('debug info');

// World model learns: "Use logger.debug() not console.log()"

Session 2: Claude uses the learned pattern

// Claude automatically writes:
logger.debug('debug info'); // No correction needed

Goal: Learned patterns persist across sessions and prevent repeat violations

3. Regression Prevention

// Week 1: Bug fixed (null check added)
if (user && user.email) { ... }

// Week 2: Refactoring
// World model warns: "This line preserves a critical bug fix"
// Claude preserves the null check

// Result: Bug not re-introduced

Goal: Detect potential regressions before code execution

How It Works

Architecture

┌──────────────────────────────────────────────────────────┐
│ Claude Code + Hooks                                      │
│ Captures: file edits, tool calls, user corrections       │
└──────────────────────────────────────────────────────────┘
                         |
                         v
┌──────────────────────────────────────────────────────────┐
│ MCP Server (Python)                                      │
│ - 26 MCP tools for querying/recording/predicting          │
│ - LLM-powered entity extraction (Claude Haiku)           │
│ - External linter integration (ESLint, Pylint, Ruff)     │
└──────────────────────────────────────────────────────────┘
                         |
                         v
┌──────────────────────────────────────────────────────────┐
│ Knowledge Graph (SQLite + FTS5)                          │
│ - entities.db: APIs, functions, classes                  │
│ - facts.db: Temporal assertions with evidence            │
│ - relationships.db: Entity dependency graph              │
│ - constraints.db: Learned rules from corrections         │
│ - sessions.db: Session history and outcomes              │
│ - events.db: Activity log with reasoning chains          │
└──────────────────────────────────────────────────────────┘

Key Concepts

Temporal Facts: Every fact has validAt and invalidAt timestamps
- "Function X existed from 2024-01-15 to 2024-03-20"
- Query: "What was true on March 1st?"
Evidence Chains: Every assertion traces back to source
- Fact -> Session -> Event -> Source Code Location
Constraint Learning: Pattern recognition from user corrections
- Automatic rule type inference (linting, architecture, testing)
- Severity detection (error, warning, info)
- Example generation for future reference
Dual Validation: Combines two validation sources
- World model constraints (learned from user)
- External linters (ESLint, Pylint, Ruff)
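
The "What was true on March 1st?" query from Temporal Facts can be illustrated with an in-memory SQLite table. The schema below is hypothetical (the real facts.db layout may differ), and treating invalid_at as an exclusive end of the interval is an assumption.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts (subject TEXT, assertion TEXT, valid_at TEXT, invalid_at TEXT)"
)
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("function_x", "exists", "2024-01-15", "2024-03-20"),
        ("function_y", "exists", "2024-03-10", None),  # NULL means still valid
    ],
)


def facts_as_of(conn: sqlite3.Connection, day: str) -> list[str]:
    """Return subjects whose validity interval contains `day` (ISO date string)."""
    rows = conn.execute(
        "SELECT subject FROM facts "
        "WHERE valid_at <= ? AND (invalid_at IS NULL OR invalid_at > ?) "
        "ORDER BY subject",
        (day, day),
    )
    return [row[0] for row in rows]
```

ISO-formatted dates compare correctly as plain strings, so no date parsing is needed: facts_as_of(conn, "2024-03-01") returns only function_x, while a query for 2024-04-01 returns only function_y.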

MCP Tools

26 MCP tools are available to Claude Code. Eight of them are shown below:

1. `query_fact`

Check if APIs/functions exist before using them

result = query_fact(
    query="Does User.findByEmail exist?",
    entity_type="function"
)
# Returns: {exists: bool, confidence: float, facts: [...]}

2. `record_event`

Capture development activity with reasoning chains

record_event(
    event_type="file_edit",
    file_path="src/api/auth.ts",
    reasoning="Added JWT authentication middleware"
)

3. `validate_change`

Pre-execution validation against constraints and linters

result = validate_change(
    file_path="src/api/auth.ts",
    proposed_content="..."
)
# Returns: {safe: bool, violations: [...], suggestions: [...]}

4. `get_constraints`

Retrieve project-specific rules for a file

constraints = get_constraints(
    file_path="src/**/*.ts",
    constraint_types=["linting", "architecture"]
)

5. `record_correction`

Learn from user edits (HIGH PRIORITY)

record_correction(
    claude_action={...},
    user_correction={...},
    reasoning="Use logger.debug instead of console.log"
)

6. `get_related_bugs`

Regression risk assessment

result = get_related_bugs(
    file_path="src/api/auth.ts",
    change_description="refactoring authentication logic"
)
# Returns: {bugs: [...], risk_score: float, critical_regions: [...]}

7. `seed_project`

Scan the codebase and populate the knowledge graph with entities and relationships

result = seed_project(
    project_dir=".",
    force=False
)
# Returns: {files_seeded: int, entities_created: int, relationships_created: int}

8. `ingest_pr_reviews`

Pull GitHub PR review comments and convert team feedback into constraints

result = ingest_pr_reviews(
    repo="owner/repo",  # Auto-detected from git remote if omitted
    count=10
)
# Returns: {prs_scanned: int, constraints_created: int, constraints_updated: int}

Documentation

QUICKSTART.md - 5-minute setup guide
CONTRIBUTING.md - Contribution guidelines
RELEASE_NOTES.md - Version history and features

Testing

# Run tests
pytest

# With coverage
pytest --cov=world_model_server --cov-report=html

304 tests as of v0.7.5. Coverage includes knowledge graph CRUD, FTS5 search, constraint management, bug tracking, auto-seeding, PR review ingestion, decision traces, outcome linkage, trajectory learning, prediction layer, memory health, contradiction detection, transcript pointers, project identity, and PreToolUse enforcement. See tests/ for details.

Configuration

Environment Variables

# Database location (default: ./.claude/world-model/)
export WORLD_MODEL_DB_PATH="/custom/path"

# Anthropic API key (optional - enables LLM extraction)
# IMPORTANT: Never commit this! Use .env file (see .env.example)
export ANTHROPIC_API_KEY="your-api-key-here"

# Model selection
export WORLD_MODEL_EXTRACTION_MODEL="claude-3-haiku-20240307"  # Fast
export WORLD_MODEL_REASONING_MODEL="claude-3-5-sonnet-20241022"  # Accurate

# Debug mode
export WORLD_MODEL_DEBUG=1

Note: Create a .env file in your project root (see .env.example) - it's automatically ignored by git.
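
As a rough picture of how these variables could be consumed, the sketch below reads them into a plain dictionary. The load_config name and the returned keys are hypothetical. The database default and the stdio transport default come from this README. Treating only "1" as enabling debug mode is an assumption.

```python
import os
from typing import Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect world-model settings from environment variables."""
    env = os.environ if env is None else env
    return {
        "db_path": env.get("WORLD_MODEL_DB_PATH", "./.claude/world-model/"),
        "transport": env.get("WORLD_MODEL_TRANSPORT", "stdio"),
        # Without an API key the server falls back to regex-based extraction.
        "llm_enabled": bool(env.get("ANTHROPIC_API_KEY")),
        # Assumption: only the literal value "1" turns debug mode on.
        "debug": env.get("WORLD_MODEL_DEBUG") == "1",
    }
```

Passing a dictionary instead of relying on os.environ keeps the function easy to exercise in tests. The API key itself is deliberately not copied into the result, so it cannot leak through logging of the config.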

Customizing Hooks

Edit .claude/settings.json to customize which tools trigger world model hooks:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write|Bash",
      "hooks": [...]
    }]
  }
}
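
If you prefer to script this change rather than edit the file by hand, a small helper along these lines works on the structure shown above. It is a sketch: set_matcher is a hypothetical function, and the inner "hooks" list is left empty here because its contents are generated by setup.

```python
import json


def set_matcher(settings: dict, event: str, matcher: str) -> dict:
    """Return a copy of `settings` where every `event` entry uses `matcher`."""
    updated = json.loads(json.dumps(settings))  # deep copy of JSON-compatible data
    for entry in updated.get("hooks", {}).get(event, []):
        entry["matcher"] = matcher
    return updated


settings = {
    "hooks": {
        "PostToolUse": [
            # Keep whatever hook commands setup generated; elided here.
            {"matcher": "Edit|Write|Bash", "hooks": []}
        ]
    }
}

# Example: stop Bash calls from triggering the world model hooks.
narrowed = set_matcher(settings, "PostToolUse", "Edit|Write")
```

To apply it for real, load .claude/settings.json with json.load, pass the result through set_matcher, and write it back with json.dump.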

Language Support

Currently Supported:

TypeScript / JavaScript
Python

Coming Soon:

Go, Rust, Java, C++

Extensible Architecture: Easy to add new language parsers (see CONTRIBUTING.md)

Privacy and Security

Local-First: All knowledge graph data stays on your machine.
Optional LLM: Works without API key (uses regex patterns as fallback).
Storage at Rest: SQLite databases are plain local files. The tool does not encrypt them, so use full-disk encryption if you need protection at rest.

Telemetry (opt-in, off by default)

v0.7.3 added anonymous usage telemetry. It is:

Off by default. You have to explicitly opt in.
Asked once during world-model setup, with a clear y/N prompt.
Inspectable: world-model telemetry --status shows the exact JSON payload that would be sent.
Disable any time with world-model telemetry --disable, or globally with WORLD_MODEL_TELEMETRY_DISABLE=1.
Skipped in non-TTY environments (CI, scripts) so it never blocks an automated setup.

What we send (only if you opt in):

Field	Example	Why
`event`	`setup_completed`, `demo_run`, `hook_fired`	Which lifecycle step ran
`version`	`0.7.3`	Which release you're on
`install_id`	random UUID at `~/.world-model/install_id`	Distinguish installs without identifying users
`ts`	unix timestamp	When the event fired

What we never send: file paths, file contents, rule names, hostnames, IP addresses, API keys, decision-trace text, fact text, or anything else that could identify a person or leak business logic. The full payload schema lives in world_model_server/telemetry.py.
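
The four-field payload in the table can be pictured as below. This is a sketch based only on that table: the function names are hypothetical, and the authoritative schema remains world_model_server/telemetry.py.

```python
import time
import uuid

ALLOWED_FIELDS = frozenset({"event", "version", "install_id", "ts"})


def validate_payload(payload: dict) -> None:
    """Reject any payload carrying fields beyond the four documented ones."""
    unexpected = set(payload) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected telemetry fields: {sorted(unexpected)}")


def build_payload(event: str, version: str, install_id: str) -> dict:
    """Assemble the payload: lifecycle event, release, random install ID, timestamp."""
    payload = {
        "event": event,
        "version": version,
        "install_id": install_id,
        "ts": int(time.time()),  # unix timestamp
    }
    validate_payload(payload)
    return payload


# A random UUID stands in for the value stored at ~/.world-model/install_id.
example = build_payload("demo_run", "0.7.3", str(uuid.uuid4()))
```

An allowlist check like validate_payload is one way to guarantee that a field such as a file path can never be added to the payload by accident.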

Where it goes: opt-in events are posted to a dedicated private GitHub repo (SaravananJaichandar/world-model-telemetry) as plain issues. There is no third-party analytics service, no cookie, no fingerprint. The PAT embedded in the client is scoped to that one repo with Issues: write only.

API Key Usage (only if you provide `ANTHROPIC_API_KEY`)

Entity extraction from code changes
Constraint inference from corrections
Never sends: Credentials, secrets, PII

Security Best Practices

Never commit .env files
Use .env.example as template
Store API keys in environment variables or .env files only
The .gitignore automatically excludes sensitive files

Roadmap

v0.2.x

Auto-seeding: knowledge graph populates from existing codebase on setup
PR Review Intelligence: ingest GitHub review comments as constraints
Relationship tracking: import and dependency graph between entities
Multi-language support: Python, TypeScript/JavaScript, Solidity, Go, Rust
CLI query command for knowledge graph lookups
40 tests, 8 MCP tools

v0.3.0

Module-level matching: query by module name finds the file and its contents
Incremental re-seeding: only re-process files changed since last seed
Fuzzy entity matching: approximate name search for typos and abbreviations
Query caching: in-memory cache with TTL for repeated lookups
Java support: complete multi-language coverage
MCP server pipeline validation on real projects

v0.4.0

Outcome linkage: test failures linked to code changes with facts
Trajectory learning: co-edit patterns tracked across sessions
Decision trace capture: structured log of agent proposals and human corrections
Cross-project entity search with project registry
5 new MCP tools (13 total), 104 tests

v0.5.0

Regression prediction, "what if" simulation, test failure prediction
Multi-project knowledge transfer, memory health, fact TTL/decay
get_context_for_action pre-edit bundle, constraint violation tracking, find_contradictions
20 MCP tools, 151 tests

v0.6.0 — Enforcement, Provenance, Identity

PreToolUse constraint enforcement hook: deny hard violations at the edit boundary
Indexed transcript pointers: hydrate any fact back to source conversation
Project identity decoupling: stable UUID across directory renames
Content-hash deduplication for facts and constraints
Auto-generate CLAUDE.md from the knowledge graph
BetaAbstractMemoryTool subclass for Anthropic SDK integration
Desktop Extension (.mcpb) packaging for Claude Desktop
22 MCP tools, 13 CLI subcommands, 186 tests

v0.7.0 — Auto-injection, defer tier, contradiction resolution, harness adapters

PostCompact and UserPromptSubmit auto-injection: re-emit top constraints and recent facts after context loss
defer enforcement tier in PreToolUse: pause headless agents on recurring warning-level violations, with graceful fallback to ask
Confidence-weighted contradiction resolution: pick a winner using confidence, recency, or source count, with an auto strategy
Compaction audit log: query and export what was remembered across each compaction boundary
Cursor adapter package
25 MCP tools, 14 CLI subcommands, 220 tests

v0.7.2 — Streamable HTTP transport

HTTP transport mode for remote / MCP-tunnel deployment
/healthz endpoint, Dockerfile.http, docker-compose.yml
docs/deployment/mcp-tunnel.md walkthrough for Claude Managed Agents
236 tests

v0.7.3 — Onboarding, telemetry, pi adapter

world-model demo guided tour for first-time users
Opt-in anonymous telemetry, off by default, inspectable
pi-package adapter (adapters/pi/, install-pi CLI)
17 CLI subcommands, 256 tests

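v0.7.4 -- Interop, deployment, benchmark

AGENTS.md / .agents/skills/ constraint reader (new MCP tool: get_agents_md_constraints)
Self-hosted Claude Managed Agents deployment guide + Modal quickstart
Reproducible contradiction-resolution benchmark (24-pair dataset, CI workflow, RESULTS.md)
26 MCP tools, 17 CLI subcommands, 283 tests

v0.7.5 (Current) -- Codex adapter, payload normalization

Codex CLI adapter (adapters/codex/, install-codex CLI)
Dual-shape payload normalization in hook_helper and inject_helper
26 MCP tools, 18 CLI subcommands, 304 tests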

v0.8.0 (Next)

Antigravity CLI adapter (Google; deferred from v0.7.5 while the API settles; Gemini CLI sunsets June 18, 2026)
MCP spec 2026-07-28 readiness (stateless transport, _meta headers, InputRequiredResult)
In-agent /world-model slash command + TUI status widget (replaces standalone dashboard)
Cline adapter (lower urgency after they shipped global AGENTS rules in v3.86)
Evidence-weighted decay: constraints persist, low-evidence assertions expire

Contributing

Contributions are welcome. See CONTRIBUTING.md for:

Development setup
Coding standards
Adding language support
Writing tests
Submitting PRs

Areas where help is needed:

Language parsers (Go, Rust, Java, C++)
Performance optimization
Documentation improvements
Real-world testing feedback

Stats

Project Size:

~4,800 lines of code
13 Python modules
3 TypeScript hook implementations

Storage Efficiency:

Empty database: ~155 KB
Per entity: ~500 bytes
Per fact: ~800 bytes

License

MIT License - Free for commercial and personal use

Support

Issues: GitHub Issues
Discussions: GitHub Discussions

World Model MCP

Status: v0.7.5 -- 26 MCP tools, 18 CLI subcommands, 304 tests. Adds a Codex CLI adapter (install-codex wires world-model-mcp into ~/.codex/config.toml with PreToolUse / PostCompact / PostToolUse / SessionStart hooks). v0.7.4 added an AGENTS.md constraint reader, a self-hosted Claude Managed Agents deployment guide, and a reproducible contradiction-resolution benchmark (93.5% overall). v0.7.3 added a guided demo, opt-in telemetry, and a pi-package adapter. v0.7.0 introduced PostCompact auto-injection, the defer enforcement tier, confidence-weighted contradiction resolution, and a compaction audit log. v0.7.2 added streamable HTTP transport for remote / MCP-tunnel deployment. Contributions welcome.

mcp-name: io.github.SaravananJaichandar/world-model-mcp

If world-model-mcp helped you, star the repo or open an issue with what worked or didn't. I read every one and the feedback shapes what ships next.

What It Does

World Model MCP creates a temporal knowledge graph of your codebase that learns from every coding session to:

Prevent Hallucinations -- Validates API/function references against known entities before use
Stop Repeated Mistakes -- Learns constraints from corrections, applies them in future sessions
Reduce Regressions -- Tracks bug fixes and warns when changes touch critical regions
Survive Compaction -- Re-injects top constraints and recent facts after the agent's context window resets
Resolve Contradictions -- Picks a winner between conflicting facts using confidence, recency, or source count

Think of it as a long-term memory layer that runs alongside Claude Code, Cursor, or any MCP-aware coding agent.

What's new in v0.7.5

Codex CLI adapter -- new install-codex CLI subcommand appends a [mcp_servers.world_model] block plus PreToolUse, PostToolUse, PostCompact, and SessionStart hooks to ~/.codex/config.toml. The bundled snippet was verified against openai/codex@main at v0.138.0-alpha (server name uses underscore to dodge the tool-name hyphen-strip in codex-rs/codex-mcp/src/mcp/mod.rs; hook output sticks to camelCase with deny_unknown_fields compliance). Schema regression tests in tests/test_v075_features.py lock the contract down. See adapters/codex/README.md.
Dual-shape payload normalization in hook_helper and inject_helper -- both helpers now accept either Claude Code's payload shape (event, project_dir) or Codex's (hook_event_name, cwd), so the same Python code drives all four adapters (Claude Code, Cursor, pi, Codex).
Antigravity CLI adapter intentionally NOT shipped this release -- the Antigravity API surface is still settling (six 1.0.x releases in three weeks, the url field for HTTP MCP servers landed June 3, hook JSON event-name casing remains undocumented). Targeting June 25 for that adapter after the API stabilizes. Detailed reasoning in the v0.7.5 RELEASE_NOTES entry.

What's new in v0.7.4

AGENTS.md / .agents/skills/ constraint reader -- world-model-mcp now reads declarative project conventions from AGENTS.md, CLAUDE.md, GEMINI.md, and .agents/skills/*.md files and mixes them into PreToolUse enforcement alongside the SQLite-backed constraints. Supports structured fence blocks (```constraint and YAML frontmatter) and heuristic imperative-sentence extraction for prose-style AGENTS.md files. New MCP tool: get_agents_md_constraints. (anthropics/claude-code#6235 has 4,000+ thumbs-up for AGENTS.md as the cross-agent format.)
Self-hosted Claude Managed Agents deployment guide -- Anthropic's official position: "Memory is not yet supported in self-hosted sessions." world-model-mcp fills that gap. New guide at docs/deployment/managed-agents-self-hosted.md, with a Modal quickstart you can deploy in under five minutes.
Reproducible contradiction-resolution benchmark -- 24-pair dataset at benchmarks/contradictions/dataset.jsonl, runner at benchmarks/contradictions/run.py, results at benchmarks/contradictions/RESULTS.md. Headline: 93.5% overall accuracy, 100% on keep_higher_confidence and keep_most_sources, with documented honest weaknesses on tie-handling and small confidence gaps. Re-run with python benchmarks/contradictions/run.py. CI workflow guards regressions.

What's new in v0.7.3

world-model demo -- one command to see every primitive working. Initializes the knowledge graph, seeds reproducible demo data via scripts/demo_seed.py, then exercises each primitive (PreToolUse enforcement, contradiction detection, PostCompact injection, audit log) with real outputs. New users can see the value without writing any code.
Opt-in telemetry -- off by default, prompted once during world-model setup, inspectable with world-model telemetry --status, disabled with world-model telemetry --disable. No file paths, no code, no identifiers tied to a person. See Privacy and Security for the exact payload.
pi adapter -- new adapters/pi/ package. world-model-mcp now plugs into earendil-works/pi via pi's extension API (tool_call -> PreToolUse, context -> auto-injection, session_compact -> audit log). Install with world-model install-pi.

What v0.7.0 introduced (still active)

PostCompact / UserPromptSubmit auto-injection -- when the agent's context is compacted, the hook automatically splices the top constraints and recent canonical facts back into the next turn. Configurable, fails open.
defer enforcement tier -- PreToolUse now classifies recurring warning-level violations as defer, which pauses headless agents (with graceful fallback to ask on older clients) instead of either hard-denying or silently passing through.
Confidence-weighted contradiction resolution -- the new resolve_contradiction tool picks a winner using keep_higher_confidence, keep_most_recent, keep_most_sources, or auto. The loser is marked superseded.
Compaction audit log -- every PostCompact event writes a row with pre/post token counts and what was re-injected. Query with the audit-compactions CLI or export to JSONL.
Cursor adapter -- harness-neutral hooks under adapters/cursor/. Same Python helpers, different manifest format.
Streamable HTTP transport (v0.7.2) -- WORLD_MODEL_TRANSPORT=http so the same 25 MCP tools work behind an MCP tunnel for Claude Managed Agents with self-hosted sandboxes. See docs/deployment/mcp-tunnel.md.

Quick Start

Option 1: Desktop Extension (one-click for Claude Desktop)

Download the latest .mcpb from Releases and drag it into Claude Desktop. Auto-installs hooks, MCP server config, and dependencies.

Option 2: pip install (Claude Code CLI / IDE plugins)

hljs language-bash

# 1. Install the package
pip install world-model-mcp

# 2. Setup in your project (auto-seeds the knowledge graph from existing code)
cd /path/to/your/project
python -m world_model_server.cli setup

# 3. Restart Claude Code
# Done! The world model is pre-populated and active

You can also re-seed or seed manually at any time:

hljs language-bash

# Seed from existing codebase
world-model seed

# Re-seed with force (re-processes already seeded files)
world-model seed --force

Option 3: HTTP transport for remote / MCP-tunnel deployment

hljs language-bash

pip install 'world-model-mcp[http]'

export WORLD_MODEL_TRANSPORT=http
export WORLD_MODEL_HTTP_PORT=8765
python -m world_model_server.server

Or use the bundled image:

hljs language-bash

docker compose up -d                    # Dockerfile.http + persistent volume
curl http://127.0.0.1:8765/healthz      # {"status":"ok","version":"0.7.2"}

Full walkthrough including Anthropic MCP tunnels setup: docs/deployment/mcp-tunnel.md.

Stdio remains the default transport for Claude Code, Cursor, and .mcpb installs. Nothing changes for those flows.

Option 4: Run the guided demo (no Claude Code required)

To see every primitive working with real outputs from a real SQLite database before committing to a full install:

hljs language-bash

pip install world-model-mcp
cd /tmp/wm-test && mkdir -p wm-test && cd wm-test
world-model demo

Option 5: Run inside pi (experimental)

For users of earendil-works/pi:

hljs language-bash

pip install world-model-mcp           # the Python helpers
world-model install-pi                # writes adapters/world-model-pi/
pi install local:./adapters/world-model-pi

The pi adapter wires the same hook_helper and inject_helper you'd use from Claude Code into pi's tool_call, context, and session_compact events. See adapters/pi/README.md.

Option 6: Run inside Codex CLI (experimental)

For users of OpenAI's Codex CLI:

hljs language-bash

pip install world-model-mcp                # the Python helpers
python -m world_model_server.cli install-codex
# (appends [mcp_servers.world_model] + hook blocks to ~/.codex/config.toml)
# Restart codex; verify with: codex mcp list

What Gets Installed

your-project/
├── .mcp.json                    # MCP server configuration
├── .claude/
│   ├── settings.json           # Hook configuration
│   ├── hooks/                  # Compiled TypeScript hooks
│   └── world-model/            # SQLite databases (~155 KB)

Features

1. Hallucination Prevention

Before:

// Claude invents an API that doesn't exist
const user = await User.findByEmail(email); // This method doesn't exist

After:

// Claude checks the world model first
const user = await User.findOne({ email }); // Verified to exist

Goal: Reduce non-existent API references by validating against the knowledge graph
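
The check behind this feature can be sketched roughly as follows. This is an illustration only: the regex, the `known` mapping, and `unknown_references` are hypothetical stand-ins for a lookup against the knowledge graph, not the server's actual extraction logic.

```python
import re
from typing import Dict, List, Set, Tuple

# Matches `ClassName.method(` where the class name starts with an uppercase letter.
CALL_PATTERN = re.compile(r"\b([A-Z]\w*)\.(\w+)\(")


def unknown_references(code: str, known: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (class, method) pairs used in `code` that are absent from `known`."""
    unknown = []
    for cls, method in CALL_PATTERN.findall(code):
        if method not in known.get(cls, set()):
            unknown.append((cls, method))
    return unknown


# Hypothetical graph contents: the only verified method on User is findOne.
known = {"User": {"findOne"}}
flagged = unknown_references("const user = await User.findByEmail(email);", known)
```

Here `flagged` holds the invented `User.findByEmail` call, while the verified `User.findOne` call from the "After" snippet would produce an empty list.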

2. Learning from Corrections

Session 1: User corrects Claude

// Claude writes:
console.log('debug info');

// User corrects to:
logger.debug('debug info');

// World model learns: "Use logger.debug() not console.log()"

Session 2: Claude uses the learned pattern

// Claude automatically writes:
logger.debug('debug info'); // No correction needed

Goal: Learned patterns persist across sessions and prevent repeat violations
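
One way to picture a learned constraint is as a pattern plus a message that is checked against proposed content. The sketch below assumes a regex-based rule; the `Constraint` class and `find_violations` function are hypothetical and do not reflect the stored schema.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Constraint:
    pattern: str              # regex that flags a violating line
    message: str              # guidance learned from the correction
    severity: str = "warning"


def find_violations(content: str, constraints: List[Constraint]) -> List[Tuple[int, str]]:
    """Return (line number, message) for every line that matches a constraint."""
    violations = []
    for lineno, line in enumerate(content.splitlines(), start=1):
        for constraint in constraints:
            if re.search(constraint.pattern, line):
                violations.append((lineno, constraint.message))
    return violations


# The rule learned in Session 1, expressed as a pattern.
rule = Constraint(r"\bconsole\.log\(", "Use logger.debug() not console.log()")
```

Running `find_violations` over the Session 1 snippet flags line 1, and the Session 2 snippet passes cleanly.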

3. Regression Prevention

// Week 1: Bug fixed (null check added)
if (user && user.email) { ... }

// Week 2: Refactoring
// World model warns: "This line preserves a critical bug fix"
// Claude preserves the null check

// Result: Bug not re-introduced

Goal: Detect potential regressions before code execution
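
The warning step can be thought of as a line-range overlap test between an edit and regions recorded for earlier bug fixes. This is a minimal sketch under that assumption; `CriticalRegion` and `regions_touched` are hypothetical names, and the real risk assessment may weigh more signals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CriticalRegion:
    file_path: str
    start_line: int   # inclusive
    end_line: int     # inclusive
    note: str


def regions_touched(
    regions: List[CriticalRegion], file_path: str, edit_start: int, edit_end: int
) -> List[CriticalRegion]:
    """Return the regions in `file_path` that overlap lines edit_start..edit_end."""
    return [
        region
        for region in regions
        if region.file_path == file_path
        and region.start_line <= edit_end
        and edit_start <= region.end_line
    ]


# Hypothetical record of the Week 1 fix.
regions = [CriticalRegion("src/api/auth.ts", 40, 42, "null check on user.email")]
```

An edit covering lines 38 to 41 of `src/api/auth.ts` overlaps the recorded region, so the agent would be warned before the null check is removed.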

How It Works

Architecture

┌──────────────────────────────────────────────────────────┐
│ Claude Code + Hooks                                      │
│ Captures: file edits, tool calls, user corrections       │
└──────────────────────────────────────────────────────────┘
                         |
                         v
┌──────────────────────────────────────────────────────────┐
│ MCP Server (Python)                                      │
│ - 26 MCP tools for querying/recording/predicting         │
│ - LLM-powered entity extraction (Claude Haiku)           │
│ - External linter integration (ESLint, Pylint, Ruff)     │
└──────────────────────────────────────────────────────────┘
                         |
                         v
┌──────────────────────────────────────────────────────────┐
│ Knowledge Graph (SQLite + FTS5)                          │
│ - entities.db: APIs, functions, classes                  │
│ - facts.db: Temporal assertions with evidence            │
│ - relationships.db: Entity dependency graph              │
│ - constraints.db: Learned rules from corrections         │
│ - sessions.db: Session history and outcomes              │
│ - events.db: Activity log with reasoning chains          │
└──────────────────────────────────────────────────────────┘

Key Concepts

Temporal Facts: Every fact has validAt and invalidAt timestamps
- "Function X existed from 2024-01-15 to 2024-03-20"
- Query: "What was true on March 1st?"
Evidence Chains: Every assertion traces back to source
- Fact -> Session -> Event -> Source Code Location
Constraint Learning: Pattern recognition from user corrections
- Automatic rule type inference (linting, architecture, testing)
- Severity detection (error, warning, info)
- Example generation for future reference
Dual Validation: Combines two validation sources
- World model constraints (learned from user)
- External linters (ESLint, Pylint, Ruff)
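
The temporal-fact idea above can be illustrated with a point-in-time query. This sketch assumes a half-open validity interval (valid from `valid_at` up to, but not including, `invalid_at`); the `TemporalFact` class and `facts_true_on` function are hypothetical, not the server's storage model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TemporalFact:
    statement: str
    valid_at: date
    invalid_at: Optional[date] = None  # None means the fact is still valid


def facts_true_on(facts: List[TemporalFact], day: date) -> List[TemporalFact]:
    """Return the facts whose validity interval contains `day`."""
    return [
        fact
        for fact in facts
        if fact.valid_at <= day and (fact.invalid_at is None or day < fact.invalid_at)
    ]


facts = [TemporalFact("Function X exists", date(2024, 1, 15), date(2024, 3, 20))]
on_march_first = facts_true_on(facts, date(2024, 3, 1))
```

Asking "What was true on March 1st?" returns the fact about Function X, while the same query for April 1st returns nothing because the fact was invalidated on March 20th.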

MCP Tools

26 MCP tools are available to Claude Code as of v0.7.5. Eight of them are documented below:

1. `query_fact`

Check if APIs/functions exist before using them

result = query_fact(
    query="Does User.findByEmail exist?",
    entity_type="function"
)
# Returns: {exists: bool, confidence: float, facts: [...]}

2. `record_event`

Capture development activity with reasoning chains

record_event(
    event_type="file_edit",
    file_path="src/api/auth.ts",
    reasoning="Added JWT authentication middleware"
)

3. `validate_change`

Pre-execution validation against constraints and linters

result = validate_change(
    file_path="src/api/auth.ts",
    proposed_content="..."
)
# Returns: {safe: bool, violations: [...], suggestions: [...]}
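
Conceptually, the result combines the two sources named under Dual Validation. The sketch below shows one plausible way to merge them into the documented return shape. It assumes each violation is a dict with `severity` and an optional `suggestion`, and that `safe` means "no error-level violations"; both are illustrative assumptions, and `merge_validation` is a hypothetical helper.

```python
from typing import Dict, List


def merge_validation(constraint_violations: List[Dict], linter_violations: List[Dict]) -> Dict:
    """Combine learned-constraint and linter findings into one result."""
    violations = [*constraint_violations, *linter_violations]
    # Assumption: only error-severity findings make a change unsafe.
    safe = all(v.get("severity") != "error" for v in violations)
    suggestions = [v["suggestion"] for v in violations if v.get("suggestion")]
    return {"safe": safe, "violations": violations, "suggestions": suggestions}


learned = [{
    "source": "constraint",
    "severity": "warning",
    "message": "Use logger.debug() not console.log()",
    "suggestion": "Replace console.log with logger.debug",
}]
lint = [{"source": "eslint", "severity": "error", "message": "no-unused-vars"}]
result = merge_validation(learned, lint)
```

With one warning from a learned constraint and one linter error, `result["safe"]` is false and both findings appear in `result["violations"]`.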

4. `get_constraints`

Retrieve project-specific rules for a file

constraints = get_constraints(
    file_path="src/**/*.ts",
    constraint_types=["linting", "architecture"]
)

5. `record_correction`

Learn from user edits (HIGH PRIORITY)

record_correction(
    claude_action={...},
    user_correction={...},
    reasoning="Use logger.debug instead of console.log"
)

6. `get_related_bugs`

Regression risk assessment

result = get_related_bugs(
    file_path="src/api/auth.ts",
    change_description="refactoring authentication logic"
)
# Returns: {bugs: [...], risk_score: float, critical_regions: [...]}

7. `seed_project`

Scan the codebase and populate the knowledge graph with entities and relationships

result = seed_project(
    project_dir=".",
    force=False
)
# Returns: {files_seeded: int, entities_created: int, relationships_created: int}

8. `ingest_pr_reviews`

Pull GitHub PR review comments and convert team feedback into constraints

result = ingest_pr_reviews(
    repo="owner/repo",  # Auto-detected from git remote if omitted
    count=10
)
# Returns: {prs_scanned: int, constraints_created: int, constraints_updated: int}

Documentation

QUICKSTART.md - 5-minute setup guide
CONTRIBUTING.md - Contribution guidelines
RELEASE_NOTES.md - Version history and features

Testing

# Run tests
pytest

# With coverage
pytest --cov=world_model_server --cov-report=html

Configuration

Environment Variables

# Database location (default: ./.claude/world-model/)
export WORLD_MODEL_DB_PATH="/custom/path"

# Anthropic API key (optional - enables LLM extraction)
# IMPORTANT: Never commit this! Use .env file (see .env.example)
export ANTHROPIC_API_KEY="your-api-key-here"

# Model selection
export WORLD_MODEL_EXTRACTION_MODEL="claude-3-haiku-20240307"  # Fast
export WORLD_MODEL_REASONING_MODEL="claude-3-5-sonnet-20241022"  # Accurate

# Debug mode
export WORLD_MODEL_DEBUG=1

Note: Create a .env file in your project root (see .env.example) - it's automatically ignored by git.
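
For scripts that need the same settings, the variables above can be read with their fallbacks in a few lines. This is a sketch only: the `Settings` class and `load_settings` function are hypothetical, the default database path is taken from the comment above, and treating `WORLD_MODEL_DEBUG=1` as the only "on" value is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_path: str
    api_key: Optional[str]   # None means LLM extraction is disabled
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read world-model settings from an environment mapping."""
    return Settings(
        db_path=env.get("WORLD_MODEL_DB_PATH", "./.claude/world-model/"),
        api_key=env.get("ANTHROPIC_API_KEY") or None,
        debug=env.get("WORLD_MODEL_DEBUG", "") == "1",
    )


defaults = load_settings({})
```

Passing an explicit mapping instead of relying on `os.environ` keeps the function easy to test.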

Customizing Hooks

Edit .claude/settings.json to customize which tools trigger world model hooks:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write|Bash",
      "hooks": [...]
    }]
  }
}
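
To reason about which tools a matcher such as "Edit|Write|Bash" selects, it can help to test it directly. The sketch below assumes the matcher is treated as a regular expression compared against the whole tool name; check the Claude Code hooks documentation for the exact matching rules. `hook_applies` is a hypothetical helper.

```python
import re


def hook_applies(matcher: str, tool_name: str) -> bool:
    """Return True if `tool_name` matches the hook's matcher pattern in full."""
    return re.fullmatch(matcher, tool_name) is not None


matcher = "Edit|Write|Bash"
selected = [name for name in ("Edit", "Write", "Bash", "Read") if hook_applies(matcher, name)]
```

Under this assumption the hook fires for Edit, Write, and Bash, and is skipped for Read.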

Language Support

Currently Supported:

TypeScript / JavaScript
Python

Coming Soon:

Go, Rust, Java, C++

Extensible Architecture: Easy to add new language parsers (see CONTRIBUTING.md)

Privacy and Security

Local-First: All knowledge graph data stays on your machine.
Optional LLM: Works without an API key (uses regex patterns as a fallback).
Storage at Rest: SQLite databases are plain local files and are not encrypted by world-model-mcp; enable disk encryption if you need protection at rest.

Telemetry (opt-in, off by default)

v0.7.3 added anonymous usage telemetry. It is:

Off by default. You have to explicitly opt in.
Asked once during world-model setup, with a clear y/N prompt.
Inspectable: world-model telemetry --status shows the exact JSON payload that would be sent.
Disable any time with world-model telemetry --disable, or globally with WORLD_MODEL_TELEMETRY_DISABLE=1.
Skipped in non-TTY environments (CI, scripts) so it never blocks an automated setup.

What we send (only if you opt in):

Field	Example	Why
`event`	`setup_completed`, `demo_run`, `hook_fired`	Which lifecycle step ran
`version`	`0.7.3`	Which release you're on
`install_id`	random UUID at `~/.world-model/install_id`	Distinguish installs without identifying users
`ts`	unix timestamp	When the event fired
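
As an illustration of how small the payload is, the four fields in the table can be assembled like this. The sketch only builds the dictionary and sends nothing. `build_payload` is a hypothetical function, and the integer timestamp is an assumption about the `ts` format.

```python
import json
import time
import uuid
from typing import Dict, Optional, Union


def build_payload(
    event: str, version: str, install_id: str, now: Optional[float] = None
) -> Dict[str, Union[str, int]]:
    """Assemble the four telemetry fields listed in the table above."""
    timestamp = time.time() if now is None else now
    return {
        "event": event,
        "version": version,
        "install_id": install_id,
        "ts": int(timestamp),  # assumption: whole seconds since the epoch
    }


# A random UUID stands in for the value stored at ~/.world-model/install_id.
payload = build_payload("demo_run", "0.7.3", str(uuid.uuid4()), now=1700000000.9)
serialized = json.dumps(payload, sort_keys=True)
```

To see the real payload for your install, use `world-model telemetry --status` as described above.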

API Key Usage (only if you provide `ANTHROPIC_API_KEY`)

Entity extraction from code changes
Constraint inference from corrections
Never sends: Credentials, secrets, PII

Security Best Practices

Never commit .env files
Use .env.example as template
Store API keys in environment variables or .env files only
The .gitignore automatically excludes sensitive files

Roadmap

v0.2.x

Auto-seeding: knowledge graph populates from existing codebase on setup
PR Review Intelligence: ingest GitHub review comments as constraints
Relationship tracking: import and dependency graph between entities
Multi-language support: Python, TypeScript/JavaScript, Solidity, Go, Rust
CLI query command for knowledge graph lookups
40 tests, 8 MCP tools

v0.3.0

Module-level matching: query by module name finds the file and its contents
Incremental re-seeding: only re-process files changed since last seed
Fuzzy entity matching: approximate name search for typos and abbreviations
Query caching: in-memory cache with TTL for repeated lookups
Java support: complete multi-language coverage
MCP server pipeline validation on real projects

v0.4.0

Outcome linkage: test failures linked to code changes with facts
Trajectory learning: co-edit patterns tracked across sessions
Decision trace capture: structured log of agent proposals and human corrections
Cross-project entity search with project registry
5 new MCP tools (13 total), 104 tests

v0.5.0

Regression prediction, "what if" simulation, test failure prediction
Multi-project knowledge transfer, memory health, fact TTL/decay
get_context_for_action pre-edit bundle, constraint violation tracking, find_contradictions
20 MCP tools, 151 tests

v0.6.0 — Enforcement, Provenance, Identity

PreToolUse constraint enforcement hook: deny hard violations at the edit boundary
Indexed transcript pointers: hydrate any fact back to source conversation
Project identity decoupling: stable UUID across directory renames
Content-hash deduplication for facts and constraints
Auto-generate CLAUDE.md from the knowledge graph
BetaAbstractMemoryTool subclass for Anthropic SDK integration
Desktop Extension (.mcpb) packaging for Claude Desktop
22 MCP tools, 13 CLI subcommands, 186 tests

v0.7.0 — Auto-injection, defer tier, contradiction resolution, harness adapters

PostCompact and UserPromptSubmit auto-injection: re-emit top constraints and recent facts after context loss
defer enforcement tier in PreToolUse: pause headless agents on recurring warning-level violations, with graceful fallback to ask
Confidence-weighted contradiction resolution: pick a winner using confidence, recency, or source count, with an auto strategy
Compaction audit log: query and export what was remembered across each compaction boundary
Cursor adapter package
25 MCP tools, 14 CLI subcommands, 220 tests

v0.7.2 — Streamable HTTP transport

HTTP transport mode for remote / MCP-tunnel deployment
/healthz endpoint, Dockerfile.http, docker-compose.yml
docs/deployment/mcp-tunnel.md walkthrough for Claude Managed Agents
236 tests

v0.7.3 — Onboarding, telemetry, pi adapter

world-model demo guided tour for first-time users
Opt-in anonymous telemetry, off by default, inspectable
pi-package adapter (adapters/pi/, install-pi CLI)
17 CLI subcommands, 256 tests

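v0.7.4 — Interop, deployment, benchmark

AGENTS.md / .agents/skills/ constraint reader (new MCP tool: get_agents_md_constraints)
Self-hosted Claude Managed Agents deployment guide + Modal quickstart
Reproducible contradiction-resolution benchmark (24-pair dataset, CI workflow, RESULTS.md)
26 MCP tools, 17 CLI subcommands, 283 tests

v0.7.5 (Current) — Codex CLI adapter

Codex CLI adapter: install-codex wires world-model-mcp into ~/.codex/config.toml with PreToolUse / PostCompact / PostToolUse / SessionStart hooks
26 MCP tools, 18 CLI subcommands, 304 tests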

v0.8.0 (Next)

Codex CLI adapter (OpenAI; v0.133.0 shipped SubagentStart/SubagentStop hooks May 21)
Antigravity CLI adapter (Google; Gemini CLI sunsets June 18, 2026)
MCP spec 2026-07-28 readiness (stateless transport, _meta headers, InputRequiredResult)
In-agent /world-model slash command + TUI status widget (replaces standalone dashboard)
Cline adapter (lower urgency after they shipped global AGENTS rules in v3.86)
Evidence-weighted decay: constraints persist, low-evidence assertions expire

Contributing

Contributions are welcome. See CONTRIBUTING.md for:

Development setup
Coding standards
Adding language support
Writing tests
Submitting PRs

Areas where help is needed:

Language parsers (Go, Rust, Java, C++)
Performance optimization
Documentation improvements
Real-world testing feedback

Stats

Project Size:

~4,800 lines of code
13 Python modules
3 TypeScript hook implementations

Storage Efficiency:

Empty database: ~155 KB
Per entity: ~500 bytes
Per fact: ~800 bytes
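
These figures allow a rough size estimate for a populated graph. The sketch below simply applies the approximate numbers above, treating 1 KB as 1024 bytes (an assumption) and ignoring indexes, relationships, and other tables. `estimate_db_bytes` is a hypothetical helper.

```python
EMPTY_DB_BYTES = 155 * 1024   # ~155 KB baseline, assuming 1 KB = 1024 bytes
BYTES_PER_ENTITY = 500        # approximate, from the list above
BYTES_PER_FACT = 800          # approximate, from the list above


def estimate_db_bytes(entities: int, facts: int) -> int:
    """Rough on-disk size for a graph with the given entity and fact counts."""
    return EMPTY_DB_BYTES + entities * BYTES_PER_ENTITY + facts * BYTES_PER_FACT


# 10,000 entities and 50,000 facts: 158,720 + 5,000,000 + 40,000,000 bytes.
estimate = estimate_db_bytes(10_000, 50_000)
```

For that example the estimate is 45,158,720 bytes, or roughly 43 MiB.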

License

MIT License - Free for commercial and personal use

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
