Universal LLM router for AI coding tools. Works with Claude Code, Cursor, Codex, Gemini CLI, Copilot and more.
Make Claude Code, Codex, and Gemini CLI use the cheapest model that can still do the job well.
Save 35-80% on routine prompts, protect premium quota, and fall back automatically when providers fail.
Install in 30 seconds
pip install llm-routing
Works with Claude Code, Codex, and Gemini CLI · No API keys required on Claude Pro/Max
Local-first. No hosted proxy. No account required.
AI coding tools send too many prompts to premium models by default.
That means:
llm-router sits between your coding tool and your model providers. It classifies each prompt, tries the cheapest capable model first, and falls back automatically when needed.
You keep the same workflow. The router changes the model choice underneath.
pip install llm-routing
llm-router install
Package name: llm-routing on PyPI. CLI command: llm-router.
export OPENAI_API_KEY="sk-..." # GPT-4o, o3
export GEMINI_API_KEY="AIza..." # Gemini Flash/Pro (free tier available)
export OLLAMA_BASE_URL="http://localhost:11434" # Local models (free)
export OPENROUTER_API_KEY="sk-or-v1-…" # 343 OpenRouter models (qwen, deepseek, grok, …)
Works with zero API keys on Claude Code Pro/Max subscriptions — routing uses MCP tools that call external models only when beneficial. Add OPENROUTER_API_KEY to unlock the open-weight workhorse pool used by the cost_aggressive policy.
llm-router health # Check provider connectivity
If you already use Claude Code, Codex, or Gemini CLI, keep your existing workflow and let llm-router choose models underneath it.
| Prompt | Routed to |
|---|---|
| "What does this Python error mean?" | Ollama / Gemini Flash / Codex |
| "Refactor this endpoint" | GPT-4o / Gemini Pro |
| "Design a distributed tracing strategy" | o3 / Claude Opus |
The exact chain depends on your configured providers, budget profile, and routing policy.
| Tool | Mode | Typical Savings |
|---|---|---|
| Claude Code | Full auto-routing via hooks | 60–80% |
| Codex CLI | Full auto-routing via hooks | 60–80% |
| Gemini CLI | Full auto-routing via hooks | 50–70% |
| VS Code / Cursor | Manual MCP tools | 30–50% |
| Any MCP client | Manual MCP tools | Varies |
llm-router install # Claude Code (default)
llm-router install --host codex # Codex CLI
llm-router install --host gemini-cli # Gemini CLI
llm-router install --host vscode # VS Code
llm-router install --host cursor # Cursor
See docs/HOST_SUPPORT_MATRIX.md for full details on each host.
For a strict boundary that never automatically falls through to native Claude, configure:
# ~/.llm-router/routing.yaml
enforce: smart
mode: zero_claude
In zero_claude mode, prompts either complete through direct external execution or are blocked before Claude Code invokes its model. Prefix a prompt with claude: when you intentionally want a native Claude turn.
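The zero_claude rule reduces to three outcomes per prompt. Below is a minimal Python sketch of those semantics. It is not llm-router's code. The Outcome enum and zero_claude_decision function are hypothetical, and external_ok stands in for "direct external execution succeeded".

```python
from enum import Enum


class Outcome(Enum):
    NATIVE_CLAUDE = "native_claude"  # user opted in with the claude: prefix
    EXTERNAL = "external"            # completed via direct external execution
    BLOCKED = "blocked"              # stopped before Claude Code invokes its model


def zero_claude_decision(prompt: str, external_ok: bool) -> Outcome:
    """Illustrative zero_claude semantics (hypothetical helper, not the real hook).

    external_ok is a stand-in for "direct external execution succeeded".
    """
    if prompt.lstrip().startswith("claude:"):
        return Outcome.NATIVE_CLAUDE
    if external_ok:
        return Outcome.EXTERNAL
    # In this mode there is no silent fall-through to native Claude.
    return Outcome.BLOCKED
```

The important property is the last branch. A failed external call yields BLOCKED rather than a quiet native Claude turn.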
User prompt
            │
            ▼
┌────────────────────────┐
│ Complexity Classifier  │ ← Heuristic (free, instant) or Ollama/Flash ($0.0001)
└───────────┬────────────┘
            │
            ▼
┌────────────────────────┐
│ Free-First Router      │ ← Tries cheapest model first, walks up the chain
│                        │
│ Ollama (free)          │
│ → Codex (prepaid)      │
│ → Gemini Flash         │
│ → GPT-4o / Claude      │
└───────────┬────────────┘
            │
            ▼
┌────────────────────────┐
│ Guards (parallel)      │ ← Circuit breaker, budget pressure, quality check
└───────────┬────────────┘
            │
            ▼
Response + cost logged to local SQLite
Classification is free for many tasks (regex heuristics catch ~70%) or near-free for ambiguous prompts when using local Ollama or Gemini Flash.
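Here is a rough Python sketch of the "heuristics first, model only when ambiguous" idea. The README does not publish its heuristic rules, so the regex patterns and the tier names simple, moderate, and complex are hypothetical. The sample prompts come from the routing examples table. A None result marks the ambiguous case that would be handed to a cheap classifier model.

```python
import re
from typing import Optional

# Hypothetical rules, checked in order; the real llm-router heuristics are not shown here.
HEURISTICS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(design|architect|strategy|trade-?offs?)\b", re.IGNORECASE), "complex"),
    (re.compile(r"\b(refactor|implement|rewrite|migrate)\b", re.IGNORECASE), "moderate"),
    (re.compile(r"\b(what does|what is|explain|mean)\b", re.IGNORECASE), "simple"),
]


def classify(prompt: str) -> Optional[str]:
    """Return a complexity tier, or None when no heuristic matches."""
    for pattern, tier in HEURISTICS:
        if pattern.search(prompt):
            return tier
    return None  # ambiguous: defer to a cheap model such as local Ollama or Gemini Flash


for p in ["What does this Python error mean?", "Refactor this endpoint",
          "Design a distributed tracing strategy"]:
    print(p, "->", classify(p))
```

Rule order matters. The most expensive tier is checked first, so a prompt that mentions both "design" and "explain" is not under-routed.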
| Use case | How |
|---|---|
| Route simple questions to free local models | Auto (hooks) or llm_query |
| Protect Claude subscription quota | Budget pressure monitoring + auto-downgrade |
| Fall back across providers on failure | Automatic chain with circuit breakers |
| Track token spend and savings | llm_usage, llm_savings, session-end reports |
| Enforce routing policy for your team | LLM_ROUTER_POLICY=aggressive |
| Generate images/video/audio | llm_image, llm_video, llm_audio |
| Run multi-step research pipelines | llm_orchestrate with templates |
| Bulk-edit files with cheap models | llm_fs_edit_many |
| Compare two routing policies | llm-router policy diff <a> <b> (v10) |
| Benchmark + track Arena score | llm-router benchmark run / regress (v10) |
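The "automatic chain with circuit breakers" row can be illustrated with a small Python sketch. It tries providers cheapest-first, moves up the chain on any error, and skips providers whose breaker has tripped. All names here are hypothetical: CircuitBreaker, Provider, route, and the three-failure threshold. The sketch leaves out the reset or half-open timer that a production breaker would have.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CircuitBreaker:
    """Trips after max_failures consecutive errors (hypothetical threshold; no reset timer)."""
    max_failures: int = 3
    failures: int = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # raises an exception when the provider fails
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def route(prompt: str, chain: list[Provider]) -> tuple[str, str]:
    """Walk the chain cheapest-first and return (provider_name, response)."""
    errors: list[str] = []
    for provider in chain:
        if provider.breaker.is_open:
            errors.append(f"{provider.name}: circuit open")
            continue
        try:
            response = provider.call(prompt)
        except Exception as exc:  # any provider error moves us up the chain
            provider.breaker.record(ok=False)
            errors.append(f"{provider.name}: {exc}")
            continue
        provider.breaker.record(ok=True)
        return provider.name, response
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Usage: a local model that is down, then a cheap hosted fallback (both are stubs).
def ollama_down(prompt: str) -> str:
    raise ConnectionError("Ollama is not running")


chain = [Provider("ollama", ollama_down), Provider("gemini-flash", lambda p: f"flash: {p}")]
print(route("hi", chain))  # → ('gemini-flash', 'flash: hi')
```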
Beyond the install + auth flow, llm-router ships several operational subcommands:
llm-router benchmark list # list registered benchmark runners
llm-router benchmark run routerarena --split sub_10 # route a dataset and score it
llm-router benchmark regress --policy <p> --benchmark <b> # detect score regressions
llm-router policy diff balanced cost_aggressive # per-prompt model + cost delta
These power the routing self-improvement loop: routing decisions get persisted to a SQLite outcomes table; benchmark runs against a reference dataset establish baseline scores; regress flags drops > 0.005 in release-over-release comparisons. See docs/CLI.md for the full subcommand reference.
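The regression rule (flag drops greater than 0.005 between releases) is simple to state in code. The Python sketch below is hypothetical. The find_regressions name is invented, and it assumes scores are plain floats keyed by a policy or benchmark label. The example scores are made up.

```python
REGRESSION_THRESHOLD = 0.005  # drops larger than this are flagged, per the text above


def find_regressions(previous: dict[str, float], current: dict[str, float],
                     threshold: float = REGRESSION_THRESHOLD) -> dict[str, float]:
    """Return {label: drop} for every score that fell by more than threshold."""
    drops: dict[str, float] = {}
    for label, old in previous.items():
        new = current.get(label)
        if new is None:
            continue  # not measured in this release, so nothing to compare
        drop = old - new
        if drop > threshold:
            drops[label] = round(drop, 6)  # rounded to hide float noise in reports
    return drops


# Made-up scores for two policies across two releases.
previous = {"balanced": 0.712, "cost_aggressive": 0.690}
current = {"balanced": 0.703, "cost_aggressive": 0.688}
print(find_regressions(previous, current))  # → {'balanced': 0.009}
```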
Routing chains are built from your configured providers. You only need one.
| Provider | Models | Cost | Setup |
|---|---|---|---|
| Ollama | gemma4, qwen3.5, llama3, etc. | Free (local) | OLLAMA_BASE_URL |
| OpenAI | GPT-4o, o3, GPT-4o-mini | Paid API | OPENAI_API_KEY |
| Google | Gemini Flash, Pro | Free tier + paid | GEMINI_API_KEY |
| Anthropic | Claude Sonnet, Opus, Haiku | Paid API or subscription | ANTHROPIC_API_KEY or subscription |
| xAI | Grok-3 | Paid API | XAI_API_KEY |
| DeepSeek | DeepSeek Chat, Reasoner | Paid API (ultra-cheap) | DEEPSEEK_API_KEY |
| Mistral | Mistral Large, Small | Paid API | MISTRAL_API_KEY |
| Cohere | Command R+ | Paid API | COHERE_API_KEY |
| Perplexity | Sonar Pro (web-grounded) | Paid API | PERPLEXITY_API_KEY |
| Groq | Fast inference (Llama, Mixtral) | Free tier | GROQ_API_KEY |
| Together | Open-source models | Paid API | TOGETHER_API_KEY |
| HuggingFace | Open-source models | Free tier + paid | HF_TOKEN |
| OpenRouter | 343 models (qwen3-235b, deepseek-v4-flash, grok-4.3, gemini-flash-lite, claude, gpt, …) | Paid API (one key, all providers) | OPENROUTER_API_KEY |
| Codex | GPT-5.4, o3 (prepaid desktop) | Included with Codex CLI | Auto-detected |
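"Routing chains are built from your configured providers" can be pictured as filtering a preference list by which setup variables are present. The Python sketch below takes its variable names from the table. The provider subset, the preference order, and build_chain itself are hypothetical. The sketch also ignores subscription-based Anthropic access and Codex auto-detection.

```python
import os
from collections.abc import Mapping

# Setup variables copied from the provider table; the subset and order are illustrative.
PROVIDER_ENV = {
    "ollama": "OLLAMA_BASE_URL",
    "gemini": "GEMINI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def build_chain(preference: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Keep providers, in preference order, whose setup variable is set and non-empty."""
    chain = []
    for provider in preference:
        var = PROVIDER_ENV.get(provider)
        if var and env.get(var):
            chain.append(provider)
    return chain


print(build_chain(["ollama", "gemini", "deepseek", "openai", "anthropic"]))
```

With only GEMINI_API_KEY exported, the chain collapses to a single provider. That is why a single provider is enough to get started.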
| Provider | Type | Setup |
|---|---|---|
| fal | Image (Flux), Video (Kling) | FAL_KEY |
| Stability | Image (Stable Diffusion 3) | STABILITY_API_KEY |
| ElevenLabs | Audio / TTS | ELEVENLABS_API_KEY |
| Runway | Video (Gen-3) | RUNWAY_API_KEY |
| Replicate | Various open-source models | REPLICATE_API_TOKEN |
See docs/PROVIDERS.md for setup instructions and model recommendations.
Control how aggressively the router offloads to cheap models. Policies ship as YAML files in src/llm_router/policies/ — write your own to override workhorses, subject specialists, and per-task chains.
| Policy | Confidence Threshold | Typical Savings | Best For |
|---|---|---|---|
| Aggressive | 2 | 60–75% | Maximum cost reduction |
| Balanced (default) | 4 | 35–45% | Cost/quality tradeoff |
| Conservative | 6 | 10–15% | Quality over cost |
| cost_aggressive | 3 | 70–85% | OpenRouter open-weight workhorses + subject specialists; activate with OPENROUTER_API_KEY (new in v10) |
export LLM_ROUTER_POLICY=aggressive # Or: balanced, conservative, cost_aggressive
export LLM_ROUTER_ENFORCE=smart # smart | hard | soft | off
export LLM_ROUTER_PROFILE=balanced # budget | balanced | premium
export LLM_ROUTER_BANDIT=on # on (default) | off — opt out of telemetry-driven chain reorder
The cost_aggressive policy routes via OpenRouter:
export OPENROUTER_API_KEY=sk-or-v1-...
export LLM_ROUTER_POLICY=cost_aggressive
# Now: code → qwen3-coder-next, medical → gemini-flash-lite, reasoning → grok-4.3, …
See docs/POLICIES.md for the YAML schema and how to author your own policy.
LLM_ROUTER_ENFORCE controls how strictly the auto-route hook blocks direct model use:
- smart: route when confident, pass through when uncertain
- hard: always route, block unrouted tool calls
- soft: suggest routing, never block
- off: disable hook enforcement

llm-router exposes 60 MCP tools organized by function:
| Category | Tools | Examples |
|---|---|---|
| Routing & classification | 7 | llm_route, llm_classify, llm_auto, llm_stream |
| Text generation | 6 | llm_query, llm_code, llm_analyze, llm_research |
| Media generation | 3 | llm_image, llm_video, llm_audio |
| Pipeline orchestration | 2 | llm_orchestrate, llm_pipeline_templates |
| Admin & monitoring | 20+ | llm_usage, llm_budget, llm_health, llm_savings |
| Filesystem operations | 4 | llm_fs_find, llm_fs_edit_many |
| Subscription tracking | 3 | llm_check_usage, llm_refresh_claude_usage |
Slim mode (LLM_ROUTER_SLIM=routing or core) reduces registered tools to save context tokens in constrained environments.
Savings are calculated by comparing actual spend against a baseline of routing every task to Claude Sonnet/Opus.
Methodology:
- Savings = (baseline - actual) / baseline

Assumptions and limitations:
- Token counts use a len(text) / 4 approximation, not exact tokenizer counts

Observed range: 35–80% savings depending on policy and task mix. The "87%" figure in some docs represents a single-user peak over a specific development period, not a guaranteed outcome.
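A worked example helps make the methodology concrete. The Python sketch below combines the len(text) / 4 token estimate with the savings formula. Integer division in the estimate is an assumption. The two per-million-token prices are made up for illustration and are not real provider rates.

```python
def estimate_tokens(text: str) -> int:
    """len(text) / 4 approximation (integer division is an assumption here)."""
    return len(text) // 4


def savings_ratio(baseline_cost: float, actual_cost: float) -> float:
    """(baseline - actual) / baseline, where baseline = every task sent to Claude Sonnet/Opus."""
    if baseline_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline_cost - actual_cost) / baseline_cost


# Made-up per-million-token prices, for illustration only.
PREMIUM_PRICE, CHEAP_PRICE = 15.00, 3.00
tokens = estimate_tokens("x" * 4_000)  # 1,000 tokens by the len/4 rule
baseline = tokens / 1_000_000 * PREMIUM_PRICE
actual = tokens / 1_000_000 * CHEAP_PRICE
print(f"{savings_ratio(baseline, actual):.0%}")  # → 80%
```

Because the token estimate appears in both the baseline and the actual cost, it cancels out of the ratio for a single task. The approximation matters more when comparing absolute spend.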
llm-router runs entirely on your machine. There is no hosted proxy, no telemetry, no account required.
| What | Where | Details |
|---|---|---|
| Your prompts | Sent to configured providers | Exactly like using those providers directly |
| API keys | .env or ~/.llm-router/config.yaml | Local files, never transmitted |
| Usage logs | ~/.llm-router/usage.db | Unencrypted SQLite (filesystem permissions) |
| Classification cache | In-memory | Cleared on process restart |
| Hook scripts | ~/.claude/hooks/ | Local shell scripts, inspectable |
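The usage log is plain SQLite, so you can inspect it with Python's standard sqlite3 module. The schema is not documented here, so this sketch does not guess at it. It opens the file read-only through a SQLite URI (mode=ro) and lists its tables from sqlite_master. The list_tables helper name is hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

DEFAULT_DB = Path.home() / ".llm-router" / "usage.db"


def list_tables(db_path: Path = DEFAULT_DB) -> list[str]:
    """Open the usage log read-only and list its tables (no schema assumed)."""
    uri = f"file:{db_path}?mode=ro"  # read-only; fails instead of creating a new file
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Read-only mode means the inspection cannot modify or accidentally create the log.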
What we do:
~/.llm-router/
What you should know:
See SECURITY.md for responsible disclosure policy and docs/SECURITY_DESIGN.md for the full threat model.
Minimal setup — only configure what you have:
# Provider keys (set any combination)
export OPENAI_API_KEY="sk-proj-..."
export GEMINI_API_KEY="AIza..."
export OLLAMA_BASE_URL="http://localhost:11434"
export OLLAMA_BUDGET_MODELS="gemma4:latest,qwen3.5:latest"
# Routing behavior
export LLM_ROUTER_PROFILE="balanced" # budget | balanced | premium
export LLM_ROUTER_POLICY="balanced" # aggressive | balanced | conservative | cost_aggressive
export LLM_ROUTER_ENFORCE="smart" # smart | hard | soft | off
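The four LLM_ROUTER_ENFORCE values differ only in what the hook does with a given prompt or call. As a rough Python sketch of the documented behavior (smart routes when confident and otherwise passes through, hard always routes and blocks unrouted tool calls, soft only suggests, off disables enforcement): the Action enum, hook_action, and its confident and unrouted_call flags are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    ROUTE = "route"
    PASS_THROUGH = "pass_through"
    SUGGEST = "suggest"
    BLOCK = "block"


def hook_action(mode: str, confident: bool, unrouted_call: bool = False) -> Action:
    """Map an LLM_ROUTER_ENFORCE mode to a hook decision (illustrative only).

    confident:     the classifier is sure which model should handle the prompt
    unrouted_call: a direct model/tool call that bypasses the router
    """
    if mode == "off":
        return Action.PASS_THROUGH  # enforcement disabled
    if mode == "soft":
        return Action.SUGGEST  # suggest routing, never block
    if mode == "hard":
        return Action.BLOCK if unrouted_call else Action.ROUTE
    if mode == "smart":
        return Action.ROUTE if confident else Action.PASS_THROUGH
    raise ValueError(f"unknown LLM_ROUTER_ENFORCE value: {mode!r}")
```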
For teams or environments where .env is restricted:
# User-level config (no project .env needed)
mkdir -p ~/.llm-router && chmod 700 ~/.llm-router
cat > ~/.llm-router/config.yaml << 'EOF'
openai_api_key: "sk-proj-..."
gemini_api_key: "AIza..."
ollama_base_url: "http://localhost:11434"
llm_router_profile: "balanced"
EOF
chmod 600 ~/.llm-router/config.yaml
| Document | Purpose |
|---|---|
| Quick Start (2 min) | Fastest path to working routing |
| Getting Started | Full setup walkthrough |
| Host Support Matrix | Per-host feature comparison |
| Providers | Provider setup and model recommendations |
| Tool Reference | All 60 MCP tools with examples |
| Architecture | Internal design and module structure |
| Troubleshooting | Common issues and fixes |
| Security Design | Threat model and data handling |
Contributions welcome. See CONTRIBUTING.md for full guidelines.
git clone https://github.com/ypollak2/llm-router.git
cd llm-router
uv sync --extra dev
uv run pytest tests/ -q # Run tests (1900+)
uv run ruff check src/ tests/ # Lint
| Name | What it is |
|---|---|
| llm-routing | Current PyPI package (pip install llm-routing) |
| llm-router | CLI command and GitHub repo name |
| claude-code-llm-router | Deprecated legacy package (redirects to llm-routing) |
⭐ If llm-router saved you money, star the repo — it helps other developers discover it.
Issues · Discussions · PyPI · Changelog
MIT License