Why we picked deterministic over LLM-in-the-loop
Pattern engines, not classifier models. The latency, dependency, and auditability tradeoffs behind BonkLM's architecture.
Half the LLM-security market in 2026 routes its guardrails through another LLM. “Use a small model to check the big model.” It sounds reasonable until you deploy it. We picked the opposite — BonkLM’s validators are deterministic pattern engines. This post is the why.
What “LLM-in-the-loop” actually costs
Every guardrail check becomes a network round-trip to a model provider. At 50–200ms per call, this is not a small tax. At tool-call granularity (the interesting surface for agents), it’s 50–200ms per tool, per step, per agent run. Latency budgets evaporate. Cost per conversation triples.
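To make the per-tool tax concrete, here is a back-of-envelope calculation. The run shape (10 steps, 3 tool calls per step) is a hypothetical example rather than a measurement, and the sketch assumes the checks run serially; only the 50–200ms per-call range comes from the paragraph above.

```python
# Back-of-envelope: latency a remote guardrail adds to one agent run.
# Run shape is hypothetical; per-call range (50-200 ms) is from the text.

def added_latency_ms(steps: int, tools_per_step: int, per_call_ms: float) -> float:
    """Serial latency if every tool call waits on one guardrail round-trip."""
    return steps * tools_per_step * per_call_ms


if __name__ == "__main__":
    steps, tools_per_step = 10, 3  # hypothetical agent run
    for per_call_ms in (50, 200):
        total = added_latency_ms(steps, tools_per_step, per_call_ms)
        print(f"{per_call_ms} ms/call -> {total / 1000:.1f} s added per run")
```

Under those assumptions a single run picks up somewhere between 1.5 and 6 seconds of guardrail latency before the model has generated anything useful.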
Then the provider has an incident. The classifier model is unreachable. Now the security layer is the load-bearing wall. Do you fail open (silently allow everything) or fail closed (lock out every customer)? Both answers are bad. The third option is “cache verdicts and pray”, which is what most production systems quietly do.
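The three outage options can be written down explicitly. A minimal sketch: `remote_check` stands in for a hosted classifier client and `ClassifierUnavailable` for whatever error it raises during an incident; both names, and the policy strings, are hypothetical.

```python
from typing import Callable, Dict


class ClassifierUnavailable(Exception):
    """Hypothetical error a remote classifier client raises on outage/timeout."""


def guarded_verdict(
    text: str,
    remote_check: Callable[[str], bool],  # True = allow (hypothetical client)
    cache: Dict[str, bool],
    policy: str = "fail_closed",
) -> bool:
    try:
        verdict = remote_check(text)
        cache[text] = verdict  # remembered for the next outage
        return verdict
    except ClassifierUnavailable:
        if policy == "fail_open":
            return True   # silently allow everything
        if policy == "fail_closed":
            return False  # lock out every customer
        if policy == "cache_and_pray":
            # Exact-match cache only: unseen inputs fall back to closed.
            return cache.get(text, False)
        raise ValueError(f"unknown policy: {policy}")


def provider_down(_: str) -> bool:
    """Stand-in for the classifier during a provider incident."""
    raise ClassifierUnavailable("provider incident")
```

The cache branch only helps for inputs seen verbatim before, which is why it behaves more like a hope than a fallback: any novel prompt during the outage still hits one of the two bad answers.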
What you give up with patterns
Honest caveats first. A pattern engine can’t reason about novel paraphrases. The French-language, base64-embedded payload that surfaced in our variant-4 test would defeat an English-only regex set. We solve that by decoding before scanning (the Reformulation validator) and by maintaining multilingual pattern lists. We don’t solve it by asking the model to think harder.
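The decode-before-scan idea fits in a few lines. This is not BonkLM’s Reformulation validator: the two patterns, the 16-character base64 threshold, and the function names are hypothetical, and a real list would be far larger and cover more encodings than base64.

```python
import base64
import binascii
import re
from typing import List

# Hypothetical multilingual pattern list (English + French).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"ignore[rz]? (toutes )?les instructions pr[ée]c[ée]dentes", re.IGNORECASE),
]

# Runs of base64 alphabet long enough to plausibly hide a payload (assumed threshold).
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_views(text: str) -> List[str]:
    """Return the raw text plus any base64 segments that decode to UTF-8 text."""
    views = [text]
    for match in B64_RUN.finditer(text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text: skip this segment
        views.append(decoded)
    return views


def is_injection(text: str) -> bool:
    """Scan every view (raw and decoded) against every pattern."""
    return any(p.search(v) for v in decoded_views(text) for p in INJECTION_PATTERNS)
```

The point of the design is that the pattern list never has to know about encodings: decoding widens what gets scanned, and the same deterministic patterns do the matching.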
Pattern engines also can’t catch “structurally clean, semantically nasty” prompts where every individual word is benign and the malice emerges from composition. Those are real attacks. They are also rare relative to the volume of in-the-wild injections, which are overwhelmingly variants on a small set of templates the open-source community has been cataloguing for two years.
The deterministic guarantees
What you get in exchange:
- Sub-millisecond verdicts, in-process, no network. A validator chain adds effectively nothing to your p99.
- No provider dependency. The security layer doesn’t go down with OpenAI’s status page.
- Reproducible verdicts. The same input always gets the same verdict. You can ship the engine to a customer’s VPC and convince their security team it will behave identically on day 1000.
- Auditable. Every match is a specific pattern that lives in source control. There’s no “the model felt different today”.
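Reproducibility and auditability fall out of the validator being a pure function over a versioned rule list. A minimal sketch, assuming a hypothetical `Rule`/`Verdict` shape with illustrative rule IDs and regexes; this is not BonkLM’s API.

```python
import re
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str         # stable ID, reviewed and versioned in source control
    pattern: re.Pattern


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    matched: Tuple[str, ...]  # IDs of every rule that fired, for the audit log


# Hypothetical rules; IDs and regexes are illustrative only.
RULES: Tuple[Rule, ...] = (
    Rule("PI-001", re.compile(r"ignore (all )?(previous|prior) instructions", re.I)),
    Rule("PI-002", re.compile(r"you are now (in )?developer mode", re.I)),
    Rule("EX-001", re.compile(r"reveal (your|the) system prompt", re.I)),
)


def validate(text: str, rules: Sequence[Rule] = RULES) -> Verdict:
    """Pure function of (text, rules): no network, no clock, no sampling."""
    matched = tuple(r.rule_id for r in rules if r.pattern.search(text))
    return Verdict(allowed=not matched, matched=matched)
```

Because a verdict carries rule IDs rather than a score, an audit entry can point at the exact rule in source control that fired, and replaying the same input against the same rule version yields the same verdict.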
When LLM-in-the-loop is the right call
Honest version: if your threat model is novel-paraphrase attacks generated by skilled adversaries with months of iteration time, a deterministic engine is the first layer of a defence-in-depth stack, not the whole thing. The right composition is a deterministic first pass (cheap, fast, catches 95% of in-the-wild patterns) plus an optional ML secondary for the surface that can afford the latency.
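The composition described above can be sketched as a short gate. `deterministic` and `ml_secondary` are hypothetical callables that return True to allow, and `latency_sensitive` is an assumed per-surface flag; none of this reflects a shipped BonkLM integration.

```python
from typing import Callable, Optional


def layered_check(
    text: str,
    deterministic: Callable[[str], bool],                  # fast, in-process first pass
    ml_secondary: Optional[Callable[[str], bool]] = None,  # hypothetical slow classifier
    latency_sensitive: bool = False,
) -> bool:
    # First pass: cheap and always on. A block here is final.
    if not deterministic(text):
        return False
    # Second pass: only on surfaces that can afford a network round-trip.
    if ml_secondary is None or latency_sensitive:
        return True
    return ml_secondary(text)
```

Running the deterministic pass first means the ML call only sees traffic that already cleared the known templates, so the expensive check is spent on the residue. Whether the secondary fails open or closed during an outage is still a choice you have to make, but it is now confined to the slower surface.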
BonkLM is the first pass. We’ll write the integration patterns for layering an ML secondary on top when more customers ask for it.