
Over the past few months, I have gone from occasionally trying coding assistants to using AI agents as a daily part of building software. Tools like Cursor, Claude Code, and Codex make it possible to move faster than I ever could before. You can explore ideas, refactor code, wire up systems, and iterate at a pace that still feels slightly unreal.
When agents work well, they do not just save time. They change how you build. You start delegating more. You trust the system to try paths you would not take yourself. You let it run.
And then you see the other side.
A file disappears that should not have. A config gets wiped. A helpful cleanup turns into a destructive edit. Sometimes it is recoverable. Sometimes it becomes a scramble. If you spend time in developer forums or community threads, you will see plenty of similar stories. Agents confidently make breaking changes while still doing the right thing according to their instructions.
It is rarely malicious. Often it is technically reasonable.
But the outcome can still be painful.
That mix, astonishing speed plus occasional fragility, is the defining experience of agentic workflows right now. It also points to a bigger issue that we are going to talk about a lot over the next few years.
Agents operate at machine speed. Oversight operates at human speed.
There is a growing gap between the speed at which agents can act and the amount of control humans can realistically exert over those actions. I have started thinking of it as the control gap.
This post is the first in a short series about what we are seeing as agents start to touch real systems, and why the current obvious controls do not quite match the reality of how agents behave.
What changed: software started doing work, not just suggesting it
Traditional software is easier to govern because most of its behavior is deterministic. We know, roughly, what a deployed service does at runtime, and our controls reflect that.
- permissions and least privilege
- code review and static analysis
- separation of duties
- audit trails and change management
- runtime controls in the environments where things can go wrong
LLM driven agents break a core assumption behind those controls. The next action is not a fixed code path. The agent is making decisions probabilistically, in a loop, based on whatever context it is reading and whatever tool output it just saw.
And increasingly, agents do not just recommend actions. They take them.
That shift from assistive intelligence to operational agency is where things get interesting and risky. Because once an agent has tools, it has real hands.
- it can touch the filesystem
- it can run processes
- it can fetch or exfiltrate data over the network
- it can mutate state in external systems via APIs
At that point, bad outputs matter less than bad actions.
What we are seeing in practice
Across real usage, especially with coding agents and tool driven workflows, some patterns show up again and again.
1) Agents inherit broad privileges by default
When you want an agent to be useful, you tend to give it the environment it needs. Repo access, package managers, network access, credentials in env vars, and so on.
It is not because anyone is careless. It is because the fastest way to get value is to plug the agent into your existing workflow.
But broad access means broad blast radius.
2) The input surface is huge, and much of it is untrusted
Agents do not only listen to the user prompt. They consume web pages, READMEs and PR comments, tickets and docs, tool output, logs, and stack traces.
Security boundaries often rely on cleanly separating instructions from data. Agents are designed to blur that line. They are built to treat text as actionable context. That is what makes them helpful. It is also what makes them vulnerable to things like prompt injection and indirect prompt injection.
If you want a good overview of the prompt injection problem, see:
- OWASP Top 10 for LLM Applications: Prompt Injection
- UK NCSC: Prompt injection is not SQL injection
- Anthropic research: Prompt injection defenses
3) Human approval does not scale linearly
A common response to risk is, we will add approvals.
Approvals can help, especially for high risk operations. But once an agent is effective, it generates a lot of actions.
- dozens of small edits
- repeated retries
- long tool chains
- iterative command sequences
If you ask for approval 50 times, people start approving reflexively. If you ask for approval 0 times, you have no brakes. The hard problem is not add approvals. It is how do you keep humans meaningfully in control without turning them into a rubber stamp.
4) The scariest failures are not always attacks
Some of the most damaging outcomes do not require a sophisticated adversary.
They come from normal agent behavior under ambiguity.
- doing the obvious thing with the wrong assumption
- applying a refactor that breaks a security invariant
- moving fast and breaking something real
- accidentally pulling and executing untrusted code
- handling secrets carelessly, for example copying into logs, issues, or tool output
In agentic systems, reliability failures and security failures start to look similar, because both are unsafe actions executed quickly.
What other people are saying, and why it is converging now
If you zoom out, you can see the ecosystem converging on the same worry from different directions.
- Security folks are increasingly blunt that prompt injection is not a patch it once class of problem. It behaves more like a confused deputy issue in complex tool chains.
- Governance conversations are shifting from model safety to operational safety: auditability, least privilege, and controlled capabilities.
- Tool builders are adding guardrails, confirmations, and modes. Everyone is feeling the same pressure. Agents are getting more capable faster than humans can supervise.
In other words, this is not one company’s quirky opinion. It is an emerging consensus that agents change where the control problem lives.
Frameworks like the NIST AI Risk Management Framework also reflect this shift. The focus is moving from a single mitigation to a broader discipline: governance, measurement, and operational controls.
The control gap, the real issue
Here is the best way I can describe what is happening.
- agents create action volume
- action volume creates oversight fatigue
- oversight fatigue creates implicit trust
- implicit trust creates blast radius
And just slow down is not a satisfying answer, because the upside of agents is precisely that they compress work into a tight loop.
So the question becomes:
How do we let agents move at machine speed without forcing humans to surrender control?
Execution is where irreversibility lives. Once the action happens, an API call, a delete, a credential read, a network request, you can log it, explain it, and postmortem it. But you cannot unring the bell.
Most of today’s controls either try to shape intent before the run, prompts, policies, best practices, or explain what happened after the run, logs, traces. The gap shows up in the middle. Control during execution, at the moment actions are taken.
That is the gap we are trying to understand.
What comes next
In the next post, we will share a more concrete model for the problem and why intent controls and after the fact controls keep failing once agents have real tools. We will also describe what it means to put guardrails at the point of execution.
If you are running agents in dev, CI, or production:
- What is the action you most want a seatbelt for?
- Where have approvals helped, and where have they turned into noise?
- What is the failure mode that surprised you most?
We are working on this problem at Canyon Road.
AgentSH
As part of this work, we are building AgentSH, an open source project exploring execution time controls for agentic workloads.
← All postsBuilt by Canyon Road
We build Beacon and AgentSH to give security teams runtime control over AI tools and agents, whether supervised on endpoints or running unsupervised at scale. Policy enforced at the point of execution, not the prompt.
Contact Us →