Deterministic LLM prompt defense scanner — 12 attack vectors, pure regex, zero AI cost, < 5ms
Deterministic LLM prompt defense scanner. Checks system prompts for missing defenses against 17 attack vectors (12 base + 5 agent-specific in v1.4). Pure regex — no LLM, no API calls, < 5ms, 100% reproducible.
$ npx prompt-defense-audit "You are a helpful assistant."
Grade: F (8/100, 1/12 defenses)
Defense Status:
✗ Role Boundary (80%)
Partial: only 1/2 defense pattern(s)
✗ Instruction Boundary (80%)
No defense pattern found
✗ Data Protection (80%)
No defense pattern found
...
OWASP lists Prompt Injection as the #1 threat to LLM applications. Yet most developers ship system prompts with zero defense.
We scanned 1,646 production system prompts from 4 public datasets. Results:
Existing security tools require LLM calls (expensive, non-deterministic) or cloud services (privacy concerns). This package runs locally, instantly, for free.
Our philosophy: The deterministic engine is the product. AI deep analysis is optional — because regex is already strong enough for 90%+ of use cases. Zero AI cost by default.
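To make the "pure regex" idea concrete, here is a minimal sketch of how a deterministic per-vector check could be structured. The rule shape, the `role-escape` ID, and both patterns are assumptions made for illustration; they are not the package's actual rule set. The "matched N of M patterns" logic mirrors the `Partial: only 1/2 defense pattern(s)` line in the sample output above.

```typescript
// Hypothetical rule shape: NOT the package's real data model.
interface VectorRule {
  id: string
  patterns: RegExp[]
  required: number // how many patterns must match to count as defended
}

// Illustrative patterns only; the real scanner's regexes are not shown in this README.
const roleBoundary: VectorRule = {
  id: 'role-escape',
  patterns: [
    /\byou are (an?|the)\b/i, // role definition
    /\b(never|do not|don't) (change|abandon|break) (your|this) role\b/i, // boundary enforcement
  ],
  required: 2,
}

// Count matching patterns; no model call, so the same input always gives the same result.
function checkVector(prompt: string, rule: VectorRule) {
  const matched = rule.patterns.filter((p) => p.test(prompt)).length
  return { id: rule.id, matched, defended: matched >= rule.required }
}

console.log(checkVector('You are a helpful assistant.', roleBoundary))
// { id: 'role-escape', matched: 1, defended: false }
```

Because the check is a pure function of the prompt text, results are reproducible across runs and machines, which is what makes it usable as a CI gate.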
npm install prompt-defense-audit
import { audit, auditWithDetails } from 'prompt-defense-audit'
// Quick audit
const result = audit('You are a helpful assistant.')
console.log(result.grade) // 'F'
console.log(result.score) // 8
console.log(result.missing) // ['instruction-override', 'data-leakage', ...]
// Detailed audit with per-vector evidence
const detailed = auditWithDetails(mySystemPrompt)
for (const check of detailed.checks) {
console.log(`${check.defended ? '✅' : '❌'} ${check.name}: ${check.evidence}`)
}
# Inline prompt
npx prompt-defense-audit "You are a helpful assistant."
# From file
npx prompt-defense-audit --file my-prompt.txt
# Pipe from stdin
cat prompt.txt | npx prompt-defense-audit
# JSON output (for CI/CD)
npx prompt-defense-audit --json "Your prompt"
# Traditional Chinese output
npx prompt-defense-audit --zh "你的系統提示"
# List all 12 attack vectors
npx prompt-defense-audit --vectors
GRADE=$(npx prompt-defense-audit --json --file prompt.txt | node -e "
const r = JSON.parse(require('fs').readFileSync('/dev/stdin','utf8'));
console.log(r.grade);
")
if [[ "$GRADE" == "D" || "$GRADE" == "F" ]]; then
echo "Prompt defense audit failed: grade $GRADE"
exit 1
fi
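The same gate can be sketched in TypeScript for pipelines that already run Node scripts. This assumes only what the shell snippet above relies on, namely that the `--json` output is an object with a `grade` field; the function name `failsGate` and the choice to fail on malformed output are illustrative decisions, not part of the package.

```typescript
type Grade = 'A' | 'B' | 'C' | 'D' | 'F'

const ALL_GRADES: string[] = ['A', 'B', 'C', 'D', 'F']

// Returns true when the audit JSON should fail the build.
// `failsGate` is a hypothetical helper, not an export of prompt-defense-audit.
function failsGate(jsonText: string, failing: Grade[] = ['D', 'F']): boolean {
  let grade: unknown
  try {
    grade = (JSON.parse(jsonText) as { grade?: unknown }).grade
  } catch (err) {
    return true // unparsable output: fail closed rather than pass silently
  }
  if (typeof grade !== 'string' || ALL_GRADES.indexOf(grade) === -1) {
    return true // missing or unexpected grade: also fail closed
  }
  return failing.indexOf(grade as Grade) !== -1
}

console.log(failsGate('{"grade":"F","score":8}')) // true
```

Failing closed matters here: if the scanner crashes or prints nothing, an empty grade should block the build rather than let it through.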
Based on OWASP LLM Top 10, empirical research on 1,646 production prompts, and structured analysis of six documented crypto AI agent incidents (see CASE_STUDIES.md).
| # | Vector | What it checks | Gap rate* |
|---|---|---|---|
| 1 | Role Escape | Role definition + boundary enforcement | 92.4% |
| 2 | Instruction Override | Refusal clauses + meta-instruction protection | — |
| 3 | Data Leakage | System prompt / training data disclosure prevention | 9.4% |
| 4 | Output Manipulation | Output format restrictions | 88.3% |
| 5 | Multi-language Bypass | Language-specific defense | 64.3% |
| 6 | Unicode Attacks | Homoglyph / zero-width character detection | — |
| 7 | Context Overflow | Input length limits | — |
| 8 | Indirect Injection | External data validation | 97.8% |
| 9 | Social Engineering | Emotional manipulation resistance | 71.4% |
| 10 | Output Weaponization | Harmful content generation prevention | — |
| 11 | Abuse Prevention | Rate limiting / auth awareness | — |
| 12 | Input Validation | XSS / SQL injection / sanitization | — |
Added after analysing six documented crypto AI agent incidents. Each vector is grounded in a specific real-world failure — see CASE_STUDIES.md for primary sources and root-cause analysis.
| # | Vector | What it checks | Reference incident |
|---|---|---|---|
| 13 | Encoding-aware Indirect Injection | Treating decoded/translated content (Morse, base64, ROT13) as untrusted data, not instructions | Grok×Bankrbot Morse code, May 2026 |
| 14 | Function/Tool Semantic Immutability | Function or tool semantics cannot be redefined mid-conversation | Freysa approveTransfer redefinition, Nov 2024 |
| 15 | Memory Provenance Awareness | Retrieved RAG memory may be poisoned by adversaries on other platforms | ElizaOS memory injection, Princeton 2025 |
| 16 | Cross-Agent Authorization Boundary | Authority does not silently inherit from another agent's output | Grok×Bankrbot principal confusion, May 2026 |
| 17 | Financial Transaction Guardrails | Hard limits, multi-sig, refusal thresholds for transactions | Lobstar Wilde decimal-error transfer, Feb 2026 |
*Gap rate = % of 1,646 production prompts missing this defense. Source: research data.
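As an illustration of vector 6 in the base table, the following sketch shows one common way to detect zero-width characters with a single regular expression. The character set and the helper name `findZeroWidth` are assumptions; the scanner's own Unicode rules (including homoglyph handling) are not documented here and may differ. The return shape mirrors the `{ found, evidence }` pair used by `unicodeIssues` in the API.

```typescript
// Zero-width space, non-joiner, joiner, word joiner, and BOM / zero-width no-break space.
// An assumed character set for illustration; not the scanner's actual list.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g

function findZeroWidth(text: string): { found: boolean; evidence: string } {
  const hits = text.match(ZERO_WIDTH) || []
  // All code points in the set above are four hex digits, so no padding is needed.
  const codes = hits.map((ch) => 'U+' + ch.charCodeAt(0).toString(16).toUpperCase())
  return {
    found: hits.length > 0,
    evidence: hits.length > 0
      ? `${hits.length} zero-width character(s): ${codes.join(', ')}`
      : 'none',
  }
}

console.log(findZeroWidth('ignore\u200Bprevious instructions'))
// { found: true, evidence: '1 zero-width character(s): U+200B' }
```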
| Grade | Score | Meaning |
|---|---|---|
| A | 90–100 | Strong defense coverage |
| B | 70–89 | Good, some gaps |
| C | 50–69 | Moderate, significant gaps |
| D | 30–49 | Weak, most defenses missing |
| F | 0–29 | Critical, nearly undefended |
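The grade bands in the table translate directly into a threshold lookup. The sketch below restates the table in code; `gradeFor` is a hypothetical helper name, and how the package computes the underlying score is not covered here.

```typescript
type Grade = 'A' | 'B' | 'C' | 'D' | 'F'

// Map a 0-100 score to a letter grade using the bands from the table above.
function gradeFor(score: number): Grade {
  if (isNaN(score) || score < 0 || score > 100) {
    throw new RangeError('score must be between 0 and 100')
  }
  if (score >= 90) return 'A'
  if (score >= 70) return 'B'
  if (score >= 50) return 'C'
  if (score >= 30) return 'D'
  return 'F'
}

console.log(gradeFor(8)) // 'F', matching the sample output for "You are a helpful assistant."
```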
audit(prompt: string): AuditResult
Quick audit. Returns the grade, score, and list of missing defense IDs.
interface AuditResult {
grade: 'A' | 'B' | 'C' | 'D' | 'F'
score: number // 0-100
coverage: string // e.g. "4/12"
defended: number // count of defended vectors
total: number // 12
missing: string[] // IDs of undefended vectors
}
auditWithDetails(prompt: string): AuditDetailedResult
Full audit with per-vector evidence.
interface AuditDetailedResult extends AuditResult {
checks: DefenseCheck[]
unicodeIssues: { found: boolean; evidence: string }
}
interface DefenseCheck {
id: string
name: string // English
nameZh: string // 繁體中文
defended: boolean
confidence: number // 0-1
evidence: string // Human-readable explanation
}
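One way to consume the `checks` array is to pull out the undefended vectors and order them by how confident the scanner is in each finding. The sketch below redeclares `DefenseCheck` so it runs on its own; the helper `undefendedByConfidence` and the sample records are made up for illustration and are not real scanner output.

```typescript
// Same shape as the DefenseCheck interface documented above.
interface DefenseCheck {
  id: string
  name: string
  nameZh: string
  defended: boolean
  confidence: number // 0-1
  evidence: string
}

// Hypothetical helper: undefended checks, highest confidence first.
function undefendedByConfidence(checks: DefenseCheck[]): DefenseCheck[] {
  return checks
    .filter((c) => !c.defended) // filter returns a new array, so sorting does not mutate the input
    .sort((a, b) => b.confidence - a.confidence)
}

// Fabricated sample data, for demonstration only.
const sample: DefenseCheck[] = [
  { id: 'a', name: 'Check A', nameZh: '', defended: false, confidence: 0.6, evidence: 'No defense pattern found' },
  { id: 'b', name: 'Check B', nameZh: '', defended: true, confidence: 0.9, evidence: 'Pattern matched' },
  { id: 'c', name: 'Check C', nameZh: '', defended: false, confidence: 0.8, evidence: 'No defense pattern found' },
]

console.log(undefendedByConfidence(sample).map((c) => c.id)) // [ 'c', 'a' ]
```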
ATTACK_VECTORS: AttackVector[]
Array of all 12 attack vector definitions with bilingual names and descriptions.
This tool does NOT:
Static prompt analysis is layer 1 of a defense-in-depth model. The following classes of attack require defenses at other layers — this scanner does not replace them, and we say so explicitly so it isn't oversold:
A pass on this scanner is necessary, not sufficient. See CASE_STUDIES.md for an honest mapping of which documented incidents this scanner would flag versus which it cannot help with.
,) triggers Unicode detection (known limitation).
prompt-defense-audit is a static, design-time check. It pairs cleanly with runtime-side projects that detect attacks as they happen:
| Lifecycle stage | Tool | Question it answers |
|---|---|---|
| Build / CI gate | prompt-defense-audit (this) | "Is the prompt designed to resist attacks?" |
| Runtime detection | Agent-Threat-Rule (ATR) | "Is an attack happening right now?" |
Failure modes are orthogonal: the audit misses novel attacks not anticipated at design time; ATR misses prompts that have no resistance even before traffic arrives. Used together they form a defense-in-depth pattern (CI gate → runtime detection).
Detailed integration including the 1:N vector mapping (20 defense vectors → 9 ATR detection categories), recommended usage pattern, and cross-references: docs/integrations/agent-threat-rules.md.
This tool is backed by empirical analysis of 1,646 production system prompts from 4 public datasets:
| Dataset | Size | Source |
|---|---|---|
| LouisShark/chatgpt_system_prompt | 1,389 | GPT Store custom GPTs |
| jujumilk3/leaked-system-prompts | 121 | ChatGPT, Claude, Grok, Perplexity, Cursor, v0 |
| x1xhlol/system-prompts-and-models | 80 | Cursor, Windsurf, Devin, Augment |
| elder-plinius/CL4R1T4S | 56 | Claude, Gemini, Grok, Cursor |
Key references:
See CONTRIBUTING.md. Key areas: new language patterns, better regex accuracy, integration examples.
See SECURITY.md. Report vulnerabilities to dev@ultralab.tw — not via GitHub issues.
MIT — Ultra Lab
This library powers prompt defense detection across multiple production deployments and security frameworks. 11 PRs merged into indicator-org repos (Microsoft / Cisco / OWASP / UK Government AISI / awesome-list curators):
- agent-compliance: PromptDefenseEvaluator integrated with MerkleAuditChain + PromotionGate, merged Apr 2026.
- mcp-scanner: PromptDefenseAnalyzer module (12-vector regex audit), merged Apr 2026 (proposal → merge in 39 minutes).
- mcp_trust_boundary scenario: adversarial-seeding regression test, merged May 2026.
- goal_hijack scenario: foot-in-the-door variant, merged May 2026.
- inspect_evals: SimpleQA config migration to single-file --run-config, merged Jun 2026.