When AI Agents Promise Things They Cannot Deliver

Lisa Hawkins

Director, Threat Research & Intelligence, Vercon

A question I hear often lately is what to do about AI agents that commit to actions they cannot actually carry out. The interest is not theoretical: teams need a defensible posture against this specific vulnerability.

Why Agents Promising the Undeliverable Matters Now

When this pattern first emerged, organizations often filed it under 'edge case.' That categorization no longer holds. The vulnerability appears across industries, and the controls it requires differ fundamentally from those found in standard communications security programs.

AI agent security has shifted from an occasional agenda item to a continuous operational concern. The drivers are familiar: attacker tooling has become far cheaper, customer interaction channels keep multiplying, and regulators have begun to scrutinize these exposures more rigorously. Organizations that waited for a formal mandate now find themselves roughly a year behind those that moved early. The gap keeps widening, especially as generative AI tools push the cost and effort of crafting a credible impersonation toward zero.

Search traffic in this area offers a telling signal. It shows up less in headlines about high-profile incidents than in the surge of long-tail queries coming from inside organizations. Terms like "hallucination policy template" or "hallucination verification workflow" point to the foundational work executives are quietly trying to get done.

The Threat Pattern in Practice

Part of what makes this threat model hard is that it is cross-functional. Information Technology often manages the telephony systems. Contact center operations owns the human interaction workflows. The AI intake agent itself may sit with a separate product owner. Each team does its job well within its own scope, but the gaps between those scopes create the exposure. Closing them takes a coordinated, end-to-end review, not the purchase of another security tool.

In production, this pattern almost always surfaces first in workflows built for legitimate convenience: account recovery, manager overrides, after-hours intake, or any mechanism meant to keep operations moving under unusual conditions. Adversaries study these pathways as thoroughly as an internal auditor would, and they often find the weak points first. What decides whether an attack succeeds is not the sophistication of the attacker's tools but how much friction the attacker meets once inside the workflow.

What Effective Defense Looks Like

When we begin a review in this area, we typically start with one concrete question: what is the most damaging action a single inbound contact could initiate today, and what exact conditions would have to be met for that contact to succeed? The answers are rarely comforting. They do, however, almost always point to actionable fixes, often achievable through workflow changes rather than spending on new technology.

Our shorthand with clients for this approach is "raise the cost." Effective controls do not promise absolute prevention. They raise the time, effort, and resources a successful attack requires until the attacker's return on investment drops and easier targets look more attractive. This principle underpins every mature security program, and it works equally well here when applied with consistent discipline rather than piecemeal and in reaction to incidents.
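Rough arithmetic makes the "raise the cost" idea concrete. The sketch below is illustrative only: the function name, the simple expected-value formula, and every number in it are assumptions chosen for the example, not figures from our engagements.

```python
def attacker_roi(success_probability: float, payoff: float, cost_per_attempt: float) -> float:
    """Expected return per unit of attacker spend for one attempt.

    Hypothetical model: ROI = (p * payoff - cost) / cost.
    """
    if cost_per_attempt <= 0:
        raise ValueError("cost_per_attempt must be positive")
    expected_gain = success_probability * payoff
    return (expected_gain - cost_per_attempt) / cost_per_attempt


# Assumed numbers for illustration only.
# Before: a convenience workflow with little friction.
before = attacker_roi(success_probability=0.25, payoff=4000.0, cost_per_attempt=100.0)

# After: added verification halves the success rate and makes each
# attempt eight times more expensive for the attacker.
after = attacker_roi(success_probability=0.125, payoff=4000.0, cost_per_attempt=800.0)

print(f"ROI before controls: {before:+.3f}")
print(f"ROI after controls:  {after:+.3f}")
```

Under these assumed inputs the attack goes from clearly profitable to a net loss per attempt, even though it still succeeds some of the time. That is the point of the approach: the control does not have to stop every attempt to change the attacker's decision.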

Practical Next Steps for Your Team

If your team is working through these questions, a Communications Security Assessment offers a structured starting point. Deliverables include an executive-level report and a prioritized remediation roadmap, with no vendor-specific product recommendations.

If you keep only one idea from this piece, make it the smallest possible review. Document the specific actions a single inbound interaction can authorize in your most sensitive workflow. Then assess, critically, whether each of those actions would withstand a determined impersonation attempt. Most teams come out of this exercise with a short, prioritized list of changes that pay for themselves within a single fiscal quarter, often without buying any new technology.
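A spreadsheet is enough for this review, but a few lines of code show its shape. This is a minimal sketch under stated assumptions: the `Action` record, the 1-to-5 damage scale, the check names, and the sample inventory are all hypothetical, and "withstands impersonation" is reduced here to "requires at least one check the caller cannot satisfy from inside the same contact."

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Action:
    name: str
    damage: int             # reviewer-assigned, 1 (minor) to 5 (severe)
    checks: FrozenSet[str]  # verification steps required before the action runs


# Hypothetical list of checks an impersonator cannot complete
# from within the inbound contact itself.
OUT_OF_BAND = frozenset({"callback_to_number_on_file", "second_approver"})


def review(actions: Iterable[Action]) -> List[Action]:
    """Return actions with no out-of-band check, most damaging first."""
    exposed = [a for a in actions if not (a.checks & OUT_OF_BAND)]
    return sorted(exposed, key=lambda a: a.damage, reverse=True)


# Sample inventory for one sensitive workflow (illustrative entries).
inventory = [
    Action("reset_password", 4, frozenset({"knowledge_questions"})),
    Action("issue_refund", 3, frozenset({"second_approver"})),
    Action("change_payout_account", 5, frozenset({"caller_id_match"})),
    Action("update_mailing_preference", 1, frozenset()),
]

findings = review(inventory)
for action in findings:
    print(action.damage, action.name)
```

The output is the prioritized list described above: every action a single contact can trigger without independent verification, ordered by how much harm it could do. Which checks count as out of band is a judgment call for each organization; the set used here is only a placeholder.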

What We Are Watching Next

Over the next two quarters, watch for hallucination risk, broadly defined, to migrate from resting mainly with security teams into operations, legal, and customer experience. This is a healthy and necessary maturation in how the exposure is understood. Teams that plan for the shift now, rather than react to it later, will be noticeably better positioned. We will continue to share our field observations as this pattern evolves.
