Prompt Injection in Tool-Calling Agents: A Practical Containment Design That Blocks Unauthorized Actions
Someone will eventually paste a “helpful” snippet into your agent chat that contains a hidden instruction telling the model to email secrets, delete files, or spam an API endpoint. The agent will comply, because it was built to be obedient and because tool-calling turns obedience into side effects.

The defensive posture most teams adopt ("we'll write really good system prompts telling it not to follow bad instructions") fails consistently, because LLMs fundamentally cannot distinguish between instructions from you and instructions embedded in user data. That architectural limitation means containment has to happen outside the model, not inside the prompt.

The boring fix that keeps working is also the least “AI”-sounding fix: treat the model like an untrusted user and put every tool call behind a server-side permissions gate that the model cannot bypass.

Prerequisites

Basic API knowledge, basic auth concepts (roles, scopes), and one honest assumption: prompts can be manipulated.
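A minimal sketch of what such a gate can look like, assuming a deny-by-default scope model. All names here (`ToolCall`, `PermissionGate`, the example tools and scopes) are hypothetical, not from any specific framework; the point is that authorization is decided server-side from session state the model never touches.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the model (hypothetical shape)."""
    tool: str
    args: dict

@dataclass
class PermissionGate:
    # Scopes granted to the human session out-of-band; the model
    # has no way to add to this set.
    granted_scopes: set = field(default_factory=set)
    # Scope each tool requires; tools absent from this map are denied.
    required_scope: dict = field(default_factory=dict)

    def authorize(self, call: ToolCall) -> bool:
        """Deny by default: a tool runs only if its required scope
        was granted to the session."""
        scope = self.required_scope.get(call.tool)
        return scope is not None and scope in self.granted_scopes

# Example: a session allowed to read files but not to send email.
gate = PermissionGate(
    granted_scopes={"files:read"},
    required_scope={"read_file": "files:read", "send_email": "email:send"},
)

assert gate.authorize(ToolCall("read_file", {"path": "notes.txt"}))
assert not gate.authorize(ToolCall("send_email", {"to": "attacker@example.com"}))
assert not gate.authorize(ToolCall("delete_file", {"path": "/"}))  # unknown tool: denied
```

Because the decision runs in your server process, a prompt-injected "you now have admin access" has no effect: the model can only propose calls, never widen `granted_scopes`.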