A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
Claude Code execution playbook with 3 pilot modes: cost-first (Haiku), quality-first (Sonnet), ceiling-elevation (Opus).
Version: 0.4.1 (2026-05-08 — opus-pilot added; 5 benchmark-driven fixes for haiku/sonnet-pilot) License: MIT Validated on: Claude Sonnet 4.6 + Haiku 4.5 + Opus 4.7 (6-agent × 10-question pilot-vs-vanilla benchmark)
A portable, opinionated execution playbook for Claude Code that provides:
What "quality floor protection" means: on structured wiki-extraction tasks, Sonnet+SKILL ≈ Sonnet Plain (both ~283/300 vs Opus 286/300 in A/B/C benchmark). The SKILL's value is in preventing quality drops on complex tasks (ambiguous requirements, multi-step agentic, cross-module design) — not in boosting already-strong performance on routine extraction. Think of it as insurance: no premium on good days, prevents catastrophic failure on hard days.
Built on the empirical finding that prompt structure + sub-agent delegation often matters more than raw model capability. Documented in two arXiv-backed studies:
AgentOpt (arxiv 2604.06296): On HotpotQA, Opus alone = 31.71%; Ministral planner + Opus solver = 74.27%. Augment Code AGENTS.md eval (2026-04): Best-quality docs ≈ one model tier upgrade. Bad docs are worse than none.
claude-pilot-suite/
├── rules/ # Auto-loaded baseline discipline
│ ├── core.md Language / Git / surgical changes / quality
│ ├── subagent-strategy.md Sub-agent delegation criteria + advisor mode
│ ├── context-management.md Context window monitoring + compact strategy
│ ├── output-discipline.md Plain-text formatting rules (no fluff)
│ ├── haiku-pilot.md Mode declaration + escalation gates (Haiku→Sonnet/Opus)
│ ├── sonnet-pilot.md Mode declaration + escalation gates (Sonnet→Opus)
│ └── opus-pilot.md Mode declaration for Opus ceiling-elevation (no upward escalation)
├── skills/ # On-demand playbooks (loaded only when triggered)
│ ├── haiku-pilot/SKILL.md Default-Haiku execution + 7-step pre-flight (incl. Source-Verify gate)
│ ├── sonnet-pilot/SKILL.md Quality-first Sonnet execution + 7-step pre-flight (incl. Line-Level Source-Verify)
│ └── opus-pilot/SKILL.md Ceiling-elevation playbook: 5 mechanisms + 3 Opus-specific pre-flights
├── examples/
│ └── CLAUDE.md.template Sample CLAUDE.md to integrate the suite
├── README.md
├── INSTALL.md
└── LICENSE
Trigger: "haiku" / "Haiku" / "Haiku mode" / "haiku-pilot"
Philosophy: weak planner + strong delegation > strong planner alone. Default to Haiku, dispatch sub-agents aggressively, escalate only on quantitative gate triggers.
Escalation gates:
advisor()implementerExpected savings: 70–85% vs full-Opus baseline (assuming 80/15/5 task distribution).
Trigger: "sonnet" / "Sonnet" / "Sonnet mode" / "sonnet-pilot"
Philosophy: deep context engineering + Opus-on-demand via advisor() ≈ full Opus quality.
Pre-flight enforcements (Sonnet's high confidence biases toward skipping these):
Choice: ... / Rejected: ... — Reason: ...)git diff --stat + per-file explanation)Escalation gate to Opus: stricter than haiku-pilot's, because Opus is 5× the cost.
Expected savings: 55–65% vs full-Opus baseline.
Trigger: "opus" / "Opus" / "Opus mode" / "opus-pilot"
Sub-mode modifiers (append to any base trigger):
Opus 1M / 1M context / extended context → activate 1M context windowOpus xhigh / xhigh thinking / extended thinking → activate extended thinking budgetPhilosophy: Opus is the ceiling — no upward escalation. The goal is to exceed vanilla Opus baseline on hard tasks through structural mechanisms, not by calling a stronger model.
5 ceiling-elevation mechanisms:
Choice: … / Rejected: … — Reason: …3 Opus-specific pre-flights (fires before any hard task):
Hard rule: Opus is the ceiling. All escalation goes down (Sonnet/Haiku sub-agents), never up.
Citation grounding: 9 research papers (P01–P09) + P10 AgentOpt. See skills/opus-pilot/SKILL.md §References.
The suite is the synthesis of benchmarks measuring how prompt structure affects answer quality across model tiers. Key findings:
| Finding | Source | Implication |
|---|---|---|
| Sub-agent delegation > model upgrade for multi-step tasks | AgentOpt arxiv 2604.06296 | Default to Haiku + delegate, not jump to Opus |
| Citation density correlates with reviewer-verifiable quality | v0.2.1 benchmark | Pre-flight #4 Citation Anchor enforcement |
| Anti-expansion self-check tables prevent 2.24× word inflation | v3 benchmark | Self-Review Loop uses single-table format |
| Sonnet's confidence bias = skip self-review unless externally enforced | A/B/C benchmark | Pre-flight #4 is mandatory in Sonnet Pilot |
| Haiku's tendency to skip assumption disclosure | haiku-pilot.md §Gotchas | Pre-flight #2 enforces it |
| Haiku fabricates numbers on hard citation tasks (5/20 cases) | v0.2.1 benchmark | Pre-flight #5 Source-Verify gate (grep -i every cited number against source) |
| Self-check overhead can exceed value on tiny answers (Q01/Q04 net-negative vs vanilla) | v0.4.1 benchmark | Task-type Fast-path: easy recall → one-line fast-path: <reason> |
| Haiku anchor density collapses on hard tasks (7/100w medium → 1/100w hard) | v0.4.1 benchmark | Pre-flight #4 hard-task threshold raised to ≥ 5 anchors/paragraph |
Sonnet self-check word-count enforcement fails without wc -w gate (Q10 1115w, +39.4% over) | v0.4.1 benchmark | Hard-cap added: hard-task over 800w must rewrite, not apologize |
| Hard task type misclassification causes gate no-hit (architecture-hard scored as medium) | v0.4.1 benchmark | Escalation gate now auto-triggers on Q type tagged Architecture/Counter-factual/Synthesis |
| Section-level citation vs line-level: 0.5pt D1 gap (A2=9.5 vs A3=10.0) | v0.4.1 benchmark | Pre-flight #7 Source-Verify requires (P0X §Y.Z:LineN) on hard tasks |
See INSTALL.md for three installation methods (clone+symlink / copy / subtree).
Quick version:
# Clone next to your project, then symlink
git clone https://github.com/<your-fork>/claude-pilot-suite.git
ln -s "$(pwd)/claude-pilot-suite/rules" ~/your-project/.claude/rules-pilot
ln -s "$(pwd)/claude-pilot-suite/skills/haiku-pilot" ~/your-project/.claude/skills/haiku-pilot
ln -s "$(pwd)/claude-pilot-suite/skills/sonnet-pilot" ~/your-project/.claude/skills/sonnet-pilot
ln -s "$(pwd)/claude-pilot-suite/skills/opus-pilot" ~/your-project/.claude/skills/opus-pilot
# Then update your project's CLAUDE.md to @-import the rules:
# @.claude/rules-pilot/core.md
# @.claude/rules-pilot/subagent-strategy.md
# @.claude/rules-pilot/context-management.md
# @.claude/rules-pilot/output-discipline.md
# @.claude/rules-pilot/haiku-pilot.md
# @.claude/rules-pilot/sonnet-pilot.md
# @.claude/rules-pilot/opus-pilot.md
See examples/CLAUDE.md.template for a complete example.
This suite is a starter pack. Two files require workspace-specific customization:
rules/core.md § Production Safety Red LineThe default text uses abstract markers ("whatever your workspace's production markers are"). Replace with concrete markers for your stack:
main / production / masterprod / production*-prod / prod-*production / prodENV=productionrules/subagent-strategy.md § Sub-Agent Dispatch TableThe default table uses generic agent names (researcher, implementer, test-writer, etc.). Replace with the agents in your .claude/agents/ directory.
If you don't have specialized agents, the general-purpose agent (always available) accepts a model override and works as a fallback.
quick-code-reviewer if you have one; otherwise the loop is just a git diff --stat checklist.output-discipline.md (included): plain-text formatting baseline; all three pilot SKILLs reference it.AGENTS.md cross-tool root manifest)Originally part of cc-workspace — extracted as portable plugin pack for community use.
Issues / PRs welcome:
License: MIT (see LICENSE).
Human + AI music production workflow for Suno - skills, templates, and tools
Claude Code skill for YouTube creators — channel audits, video SEO, retention scripts, thumbnails, content strategy, Sho
AI image generation skill for Claude Code -- Creative Director powered by Gemini
A Claude Code skill by Hao (駱君昊) that learns your Facebook voice and auto-posts to FB / IG / Threads / X with a 14-day c