Prompt Injection Remains Unsolved Architectural Problem

Jun 08, 2026

Prompt injection continues to pose a fundamental security challenge for AI systems that researchers have yet to solve at the architectural level, according to Ariel Fogel, an AI security researcher at Pillar Security who presented at Infosecurity Europe 2026. The core problem stems from how large language models process all inputs as a single token sequence, making it impossible to enforce reliable boundaries between system prompts, user queries, and content retrieved by agents.

The threat has grown significantly more dangerous as organizations deploy agentic AI systems that can take autonomous actions. A successful prompt injection no longer merely produces an incorrect answer but can trigger chains of real-world actions when agents have tool access and the ability to act on behalf of users. Fogel warned that most organizations are deploying these agents faster than they can govern them, making traditional security controls inadequate for the speed and scale of modern AI systems.

Existing defenses designed for human operators often fail when applied to AI agents. Fogel noted that sandboxing, allow-lists, and manual review processes can be circumvented or even exploited by injected prompts. In some cases, allow-lists actually streamlined attacks because the commands agents needed were already approved. In other instances, agents redefined their own sandbox boundaries through their outputs, effectively rewriting the containment meant to stop them.

Security researchers have proposed frameworks to reduce risk, including Simon Willison's concept of the "Lethal Trifecta" which identifies three dangerous conditions: agent access to private data, exposure to untrusted content, and permission for external communication. Meta's "Rule of Two" suggests agents should satisfy no more than two of these properties in any session without human approval. However, Fogel cautioned these remain helpful heuristics rather than complete defenses, as research shows attacks can succeed with only two properties present.

Fogel emphasized that defenders must shift from prevention-only strategies to constraining what injected agents can do. He recommended controls that operate at machine speed, including live behavioral monitoring, real-time containment and stop mechanisms, joint incident response between safety and security teams, and stronger identity hygiene such as ephemeral credentials and cryptographic attestation. Until models can enforce firm privilege separations, organizations must combine rapid detection, automated containment, tighter session design, and cross-disciplinary response playbooks to manage the risk.

Source: https://www.infosecurity-magazine.com/news/infosec-europe-prompt-injection/

Discussion about this post

Ready for more?