
Intent, Execution, Audit: A Model for Agent Control


A few days ago I watched an agent do something that perfectly captured the problem.

It wasn’t an attack. It wasn’t “AI gone rogue.” It was just normal agent behavior under ambiguity.

I asked it to fix a failing build and clean up whatever was causing it. The steps looked reasonable. The log looked clean. The diff looked tidy. And then a config that should have been treated as an invariant got “cleaned up” too.

We had perfect visibility into what happened.

We did not have control over whether it could happen.

That is the mismatch I want to name in this post.

In the last post I described the control gap: agents operate at machine speed, oversight operates at human speed, and the gap between those two keeps widening.

This post is about where that gap actually lives - and why the controls we reach for first keep failing once agents have real tools.

A concrete model: three planes of control

Most teams try to govern agents using one of two approaches: instructions up front (prompts, system messages, natural-language policy) or review after the fact (logs, audits, incident response).

Those are necessary.

But they are not the missing piece.

The missing piece is the middle: control during execution, at the moment actions are taken.

Here is the simplest model I’ve found that maps cleanly onto reality:

1) Intent controls (before)

What we want the agent to do.

Intent controls influence behavior. They do not constrain capability.

2) Execution controls (during)

What the agent can actually do.

Execution controls are the difference between “the agent shouldn’t do that” and “the agent can’t do that.”

3) Audit controls (after)

What happened, and how we learn / prove it.

Audit controls make systems governable. They do not make them safe.

If you only have prompts before and logs after, you don’t have control. You have hope and hindsight.

Why agents break the assumptions behind traditional control

Traditional software is easier to govern because most of its behavior is deterministic. Even when systems are complex, the runtime behavior is constrained by code paths we can reason about.

Agentic systems break that assumption.

The “program” is not a fixed code path.

The agent is making decisions probabilistically, in a loop, based on whatever context it is reading and whatever tool output it just saw.

And increasingly, agents do not just recommend actions.

They take them.

Once an agent has tools, it has real hands: it can run commands, edit files, update dependencies, and call external services.

At that point, bad outputs matter less than bad actions.

Why intent controls fail once agents have real tools

When teams say “we’ll add guardrails,” they usually mean “we’ll add more instructions.”

That works surprisingly well for assistive systems, where the output is text and the user is still the actuator.

It breaks down when the system is the actuator.

1) Agents don’t just follow the prompt - they follow the prompt plus everything around it

Agents don’t only consume your instruction. They also consume tool output, file contents, error messages, and whatever other text lands in their context window.

Security boundaries often rely on separating instructions from data.

Agents are designed to blur that line. They are built to treat text as actionable context. That is what makes them helpful.

It is also what makes them vulnerable.

Untrusted text can steer how delegated authority is exercised. (If you like formal names for problems, this starts to look a lot like the "confused deputy" pattern in complex tool chains.)

2) “Policy in natural language” is not policy

A policy that is not enforced at the point of action is not a policy.

It is a suggestion.

You can write a careful natural-language policy and still get a destructive outcome, because the agent made a technically plausible assumption: the config looked unused, so removing it counted as cleanup.

The failure mode is rarely malicious.

Often it is technically reasonable.

But the outcome can still be painful.

3) Intent controls don’t compose

Even if each instruction is individually reasonable, agents compose them under time pressure.

“Fix the failing test.”
“Clean up whatever is causing it.”
“Update the dependency.”
“Remove unused config.”

Those are normal tasks. In combination, with broad privileges, they can create an unsafe path.

The risk is not one bad instruction. The risk is a plausible chain of small decisions executed quickly.

Why after-the-fact controls fail as prevention

Auditability matters. It’s essential for debugging, governance, and compliance.

But it doesn’t solve the core risk: by the time you are reading the log, the action has already happened.

Audit controls answer “what happened?”

Execution controls answer “can this happen?”

A lot of current “agent safety” is basically trying to use observability as a substitute for constraint.

It’s valuable. It is not enough.

So what does control during execution actually mean?

Execution-time control is not mystical. It is classic operational security applied to agentic workloads.

It means:

  1. capabilities are explicit
    What tools exist? What actions are possible?

  2. capabilities are scoped
    What can those tools touch? Which paths, which domains, which accounts, which environments?

  3. capabilities are enforced at runtime
    Not in a README. Not in a prompt. In the actual execution environment.

  4. escalation is real
    There should be meaningful boundaries between “safe” and “dangerous,” and crossing those boundaries should be deliberate.

One concrete example: an agent might be allowed to run shell commands and modify files inside a repository, but unable to make outbound network requests unless the destination domain is explicitly allowlisted.
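That example can be sketched as a tiny default-deny capability check. Everything here (the `Policy` shape, the function name, the paths and domains) is illustrative, not agentsh's actual API:

```python
# Hypothetical sketch of an execution-time capability check.
# The Policy fields and check_action signature are assumptions for illustration.
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlparse

@dataclass
class Policy:
    repo_root: str = "/workspace/repo"
    # Paths the agent may read but never write, even inside the repo.
    protected_paths: set = field(default_factory=lambda: {"/workspace/repo/config"})
    # Outbound network is denied unless the destination domain is allowlisted.
    allowed_domains: set = field(default_factory=lambda: {"pypi.org"})

def check_action(policy: Policy, action: str, target: str) -> bool:
    """Return True only if the action falls inside the agent's granted capabilities."""
    if action in ("read", "write"):
        path = str(PurePosixPath(target))
        inside = path.startswith(policy.repo_root)
        protected = any(path.startswith(p) for p in policy.protected_paths)
        return inside and not (action == "write" and protected)
    if action == "net":
        return urlparse(target).hostname in policy.allowed_domains
    return False  # default-deny: unknown action types are blocked

policy = Policy()
assert check_action(policy, "write", "/workspace/repo/src/app.py")       # in scope
assert not check_action(policy, "write", "/workspace/repo/config/ci.yml")  # protected
assert not check_action(policy, "net", "https://evil.example/upload")      # not allowlisted
```

The important property is the last line of `check_action`: anything the policy does not explicitly grant is denied, rather than anything the prompt does not explicitly forbid being allowed.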

That is the shift: from influencing behavior with instructions to constraining capability at the point of action.

That is what it means to put guardrails at the point of execution.

Not more text.

More constraint.

What comes next

In the next post, I’m going to get much more concrete.

We’ll define a simple risk taxonomy for agent actions (read-only → reversible writes → destructive ops → external side effects), and walk through the execution-time guardrail patterns that actually reduce failures without killing velocity.
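As a preview, that ordering can be sketched as a ranked enum. The tier names follow the post; the escalation threshold is an assumption of mine:

```python
from enum import IntEnum

# Illustrative sketch of the four-tier risk taxonomy named above.
class Risk(IntEnum):
    READ_ONLY = 0
    REVERSIBLE_WRITE = 1
    DESTRUCTIVE_OP = 2
    EXTERNAL_SIDE_EFFECT = 3

# Hypothetical gate: anything beyond a reversible write crosses a boundary
# that should require deliberate escalation rather than default approval.
def requires_escalation(risk: Risk) -> bool:
    return risk > Risk.REVERSIBLE_WRITE

assert not requires_escalation(Risk.READ_ONLY)
assert requires_escalation(Risk.DESTRUCTIVE_OP)
```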

Because the real question underneath all of this remains the same:

How do we let agents move at machine speed without forcing humans to surrender control?

That is the control gap.

And the only place it closes is the execution layer.

AgentSH

As part of this work, we're building AgentSH, an open-source runtime exploring execution-time controls for agentic workloads.

If you’re running agents in dev, CI, or production, I’d love to hear from you.


Built by Canyon Road

We build Beacon and AgentSH to give security teams runtime control over AI tools and agents, whether supervised on endpoints or running unsupervised at scale. Policy enforced at the point of execution, not the prompt.
