Nine ship-ready security layers chain into a single GuardrailEngine that catches prompt injection, jailbreaks, secret leaks, PII, and command injection across 26 framework, LLM, agent, and RAG connectors.
import { guard } from "bonklm";

const verdict = await guard({
  input: userPrompt,
  layers: [
    "prompt-injection",
    "jailbreak",
    "pii",
    "secrets",
  ],
  policy: "strict",
});
// → blocked at layer 1 in 38ms
if (!verdict.safe) return verdict.reason();
BonkLM sits between your user input and your model. Threats radiate inward; verdicts radiate back out. The bat is the intercept surface — every tool call passes through it before it touches the wallet, the database, or the model.
Pattern-based detection — no LLM-in-the-loop tax, no second model to run. Every verdict is reproducible.
Validators run in-process. No network hop, no rate limit, no provider outage to cascade.
Express, Fastify, Next.js, OpenAI, Anthropic, LangChain, ElizaOS, MCP — wire it once, protect everywhere.
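To make the pattern-based, in-process idea above concrete, here is a minimal sketch of a deterministic scanner. The rule list, `scanPrompt`, and the `Finding` shape are hypothetical illustrations and are not BonkLM's actual rule set or API.

```typescript
// Hypothetical sketch: not BonkLM's real rules or API.
interface Finding {
  rule: string;
  match: string;
}

// Illustrative patterns only; a production rule set would be far larger.
const RULES: { name: string; pattern: RegExp }[] = [
  { name: "instruction-override", pattern: /ignore (all |any )?(previous|prior) instructions/i },
  { name: "system-prompt-leak", pattern: /(reveal|print|show) (your |the )?system prompt/i },
];

// Pure function of the input: no network call and no model,
// so the same text always yields the same findings.
function scanPrompt(input: string): Finding[] {
  const findings: Finding[] = [];
  for (const { name, pattern } of RULES) {
    const m = pattern.exec(input);
    if (m) findings.push({ rule: name, match: m[0] });
  }
  return findings;
}
```

Because the scanner is a pure function running in the caller's process, a verdict can be replayed later from the same input, which is what makes it reproducible.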
Each layer is a named module under packages/core/src. Chain them, swap them, override them. Every call leaves a tamper-evident trace.
Blocks instruction override, system-prompt leak, and roleplay escapes.
Catches DAN, social engineering, adversarial framings, and persona attacks.
Decodes base64, hex, leetspeak, and HTML-comment smuggling before scanning.
Spots context-overflow and delimiter abuse used to pivot tool calls.
Detects SSNs, email addresses, phone numbers, and postal addresses across US, EU, and international formats.
Flags leaked API keys, tokens, AWS credentials, GitHub tokens, JWTs, and more.
Strips reflected XSS, DOM-based payloads, and svg/script smuggles.
Blocks command-injection patterns in shell-execution tool calls.
Chunk-level inspection of LLM streams. Cuts off mid-flight if a layer trips.
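The decode-before-scan step in the list above can be sketched roughly as follows, covering two of the listed encodings (base64 and leetspeak). Every name here (`deLeet`, `decodeBase64Tokens`, `expandViews`, `isSuspicious`) is an assumption made for illustration and is not taken from BonkLM.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical sketch: none of these names come from BonkLM.
const LEET: Record<string, string> = { "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t" };

// Map common digit substitutions back to letters.
function deLeet(text: string): string {
  return text.replace(/[013457]/g, (c) => LEET[c] ?? c);
}

// Decode base64-looking tokens and keep only results that are printable ASCII.
function decodeBase64Tokens(text: string): string[] {
  const tokens = text.match(/[A-Za-z0-9+\/]{16,}={0,2}/g) ?? [];
  return tokens
    .map((t) => Buffer.from(t, "base64").toString("utf8"))
    .filter((s) => /^[\x20-\x7e]+$/.test(s));
}

// Every representation of the input that should be scanned.
function expandViews(input: string): string[] {
  return [input, deLeet(input), ...decodeBase64Tokens(input)];
}

// A single illustrative pattern stands in for a full rule set.
const OVERRIDE = /ignore (all )?previous instructions/i;

function isSuspicious(input: string): boolean {
  return expandViews(input).some((view) => OVERRIDE.test(view));
}
```

Scanning each decoded view alongside the raw text means the pattern list itself stays small: the rules only need to describe the attack in plain form, not every way it could be encoded.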
A typical wallet-drain attempt hits the engine at step 1, gets pattern-matched at step 3, and the rest of the chain never runs. Open the ElizaOS demo for the full receipt.
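The early exit described above, where a tripped layer stops the rest of the chain, might look like the sketch below. `Layer`, `Verdict`, `runChain`, and the three sample layers are hypothetical and only illustrate the short-circuit behavior, not the engine's real internals.

```typescript
// Hypothetical sketch: Layer, Verdict, and runChain are illustrative names.
interface Layer {
  name: string;
  // Returns a reason string when the input should be blocked, otherwise null.
  check: (input: string) => string | null;
}

interface Verdict {
  safe: boolean;
  blockedBy?: string;
  reason?: string;
  layersRun: string[];
}

function runChain(layers: Layer[], input: string): Verdict {
  const layersRun: string[] = [];
  for (const layer of layers) {
    layersRun.push(layer.name);
    const reason = layer.check(input);
    if (reason !== null) {
      // Stop at the first layer that trips; later layers never execute.
      return { safe: false, blockedBy: layer.name, reason, layersRun };
    }
  }
  return { safe: true, layersRun };
}

// Sample layers with deliberately simple checks.
const layers: Layer[] = [
  { name: "length-limit", check: (s) => (s.length > 10000 ? "input too long" : null) },
  { name: "prompt-injection", check: (s) => (/ignore previous instructions/i.test(s) ? "instruction override" : null) },
  { name: "pii", check: (s) => (/\b\d{3}-\d{2}-\d{4}\b/.test(s) ? "possible SSN" : null) },
];
```

Recording `layersRun` on the verdict makes it easy to confirm after the fact which layers executed and which were skipped.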
26 connector packages cover frameworks, LLM providers, agent runtimes, and RAG stores. Auto-detection in the install wizard wires the right pieces in seconds.
HTTP layers that mount BonkLM as middleware on the inbound request.
Wrap a model client so prompts and completions pass through the engine.
Validators + guards at every agent step — tool calls, retrieved docs, memory writes.
Inspect retrieved documents before they reach the model prompt.
Guard long-term agent memory against poisoning and PII leakage.
Adapters into other security frameworks and protocols.
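For the RAG case listed above, inspecting retrieved documents before they reach the model prompt could be sketched as a filter in front of prompt assembly. `RetrievedDoc`, `partitionDocs`, `buildPrompt`, and the single pattern are assumptions for illustration, not the connector's actual interface.

```typescript
// Hypothetical sketch: not the actual BonkLM RAG connector interface.
interface RetrievedDoc {
  id: string;
  text: string;
}

// One illustrative pattern; a real guard would apply a full validator chain.
const INJECTION = /ignore (all |any )?(previous|prior) instructions/i;

// Split retrieved documents into those safe to use and those to drop.
function partitionDocs(docs: RetrievedDoc[]): { kept: RetrievedDoc[]; dropped: RetrievedDoc[] } {
  const kept: RetrievedDoc[] = [];
  const dropped: RetrievedDoc[] = [];
  for (const doc of docs) {
    (INJECTION.test(doc.text) ? dropped : kept).push(doc);
  }
  return { kept, dropped };
}

// Only documents that passed inspection are placed in the prompt context.
function buildPrompt(question: string, docs: RetrievedDoc[]): string {
  const { kept } = partitionDocs(docs);
  const context = kept.map((d) => `[${d.id}] ${d.text}`).join("\n");
  return `Context:\n${context}\n\nQuestion: ${question}`;
}
```

Returning the dropped documents as well as the kept ones leaves room to log or quarantine poisoned sources instead of silently discarding them.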
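// Express + Anthropic: drop-in protection
import express from 'express'
import Anthropic from '@anthropic-ai/sdk'
import { GuardrailEngine } from '@blackunicorn/bonklm'
import { anthropicConnector } from '@blackunicorn/bonklm-anthropic'
import { expressMiddleware } from '@blackunicorn/bonklm-express'

// App and client setup (assumed; adapt to your project)
const app = express()
const anthropic = new Anthropic() // reads ANTHROPIC_API_KEY from the environment

const guard = new GuardrailEngine({
  sensitivity: 'standard', // 'strict' | 'standard' | 'permissive'
  action: 'block', // 'block' | 'sanitize' | 'log' | 'allow'
  validators: ['prompt-injection', 'jailbreak', 'pii', 'secret'],
})

app.use(expressMiddleware(guard)) // mount the guard on inbound requests
app.post('/chat', anthropicConnector(guard, anthropic)) // pass prompts and completions through the engine

One install, one import, one guard. The rest is configuration.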
We ran a stock ElizaOS agent with plugin-solana against five attack variants on devnet. One drained the wallet on every model that ran it. BonkLM blocks the same flow at the tool_call surface — read the full breakdown with the on-chain receipt.
Stateless playground — runs the same validators you’d ship in production against prompt injection, jailbreaks, secret leaks, and PII. No prompt is logged or persisted.