Menu
Blog How it works Use Cases
agentsh
Open-source runtime for AI agent security
Beacon
AI endpoint visibility and control
Watchtower
Enterprise control plane for agentsh and Beacon
Contact Us

A Year of AI Tool Exploits, One Root Cause

A Year of AI Tool Exploits, One Root Cause

Over the past year, researchers disclosed fourteen vulnerabilities and exploit chains across Claude Code, the Anthropic Filesystem MCP Server, the MCP Inspector, Claude Desktop, Claude.ai, and Cursor -- most of them high severity. The specific bugs differ. The root cause is the same: untrusted content or configuration was allowed to drive privileged actions without deterministic runtime enforcement. When there is no separate enforcement point between "what the agent decided to do" and "what the system actually did," a single manipulated input compromises the whole chain.

This post walks through each incident, explains what it actually exploited, and shows how execution-layer enforcement would have contained the blast radius.

A note on scope: AgentSH directly supports Claude Code, Cursor, and SDK-based agent deployments. The Claude Desktop and Claudy Day incidents are included because they illustrate the same attack class on surfaces where no execution-layer control currently exists -- and because the attack patterns map directly to self-hosted deployments where you do have that control.


The incidents at a glance

Jun 2025 · CVE-2025-49596 · Oligo Security Malicious website via CSRF + 0.0.0.0 Day → RCE on dev machine via MCP Inspector AgentSH containment: Network binding policy; block unauthenticated stdio

Jul 2025 · CVE-2025-53110 · Cymulate Prefix-matched path in MCP Filesystem Server → File reads/writes outside allowed directories AgentSH containment: File rules enforced at syscall level

Jul 2025 · CVE-2025-53109 · Cymulate Symlink inside allowed directory → Arbitrary file access + code execution via LPE AgentSH containment: Symlink-aware file rules; deny credential paths

Aug 2025 · CVE-2025-54794 · Cymulate Path in crafted prompt → File reads outside Claude Code workspace AgentSH containment: File rules scoped to workspace paths

Aug 2025 · CVE-2025-54795 · Cymulate Confirmation-prompt bypass → Arbitrary shell commands AgentSH containment: Command allowlist; deny on inline shell

Aug 2025 · CVE-2025-54135 "CurXecute" · AIM Security Malicious MCP server response → RCE via .cursor/mcp.json rewrite + auto-run AgentSH containment: exec interception; MCP tool allowlist

Aug 2025 · CVE-2025-54136 "MCPoison" · Check Point Approved MCP config modified post-approval → Persistent RCE on every project open AgentSH containment: MCP version-pinning; config change detection

Sep 2025 · GHSA-ph6w-f82w-28w6 · Check Point Repository config at startup → Code execution before trust prompt AgentSH containment: deny on startup-spawned shells

Sep 2025 · Cursor Workspace Trust · Oasis Security .vscode/tasks.json in repo → Autorun shell on folder open AgentSH containment: exec interception + approval gate

Oct 2025 · CVE-2025-59536 · Check Point Repository-controlled startup path → Code execution before trust dialog AgentSH containment: deny on pre-trust exec paths

Oct 2025 · CVE-2025-59944 · Lakera Case-variant config filename → Silent MCP config overwrite → RCE AgentSH containment: Path normalization at policy layer

Nov 2025 · Claude Desktop MCP Extensions · Koi Security Injected prompt via web content → RCE via unsanitized command in extension AgentSH containment: By analogy -- same attack class; applies to self-hosted MCP deployments

Jan 2026 · CVE-2026-21852 · Check Point ANTHROPIC_BASE_URL in repo config → API key sent to attacker endpoint AgentSH containment: Env var lockdown + LLM proxy

Mar 2026 · Claudy Day · Oasis Security Hidden HTML in ?q= URL parameter → Files API exfiltration via allowed endpoint AgentSH containment: By analogy -- DLP + MCP tool denylist in SDK deployments


CVE-2025-49596: MCP Inspector CSRF + 0.0.0.0 Day RCE (Oligo Security, April 2025, fixed June 2025)

The MCP Inspector is a developer tool for testing and debugging MCP servers, widely used as the default inspector in local development setups. Oligo Security found that versions prior to 0.14.1 ran their proxy server without authentication, bound to all network interfaces at 0.0.0.0, and accepted arbitrary stdio commands from any source. CVSS 9.4.

The exploit chain: a developer visits a malicious website while running MCP Inspector locally. The site uses CSRF to send commands to the proxy at http://0.0.0.0:6277. Modern browsers' handling of 0.0.0.0 -- interpreting it as equivalent to localhost -- routes those requests to the developer's machine. The unauthenticated proxy executes them. The result is arbitrary code execution triggered by visiting a webpage, with no other interaction required.

This is the first attack vector in this post that doesn't require a malicious repository. The dev tooling itself is the attack surface. Any developer running MCP Inspector while browsing the web was exposed.

What execution-layer enforcement does: The network binding policy restricts which interfaces the inspector process can bind to, and process-level rules control what MCP Inspector can execute in response to incoming commands. Unauthenticated connections to localhost services are caught before they reach the stdio execution path:

network_rules:
  - name: restrict-inspector-binding
    domains: ["localhost", "127.0.0.1"]
    ports: [6277]
    decision: allow

  - name: deny-all-interface-binding
    domains: ["0.0.0.0"]
    decision: deny

command_rules:
  - name: require-auth-for-inspector-commands
    commands: ["mcp", "npx"]
    decision: approve
    message: "MCP command requested:  "

Fixed in MCP Inspector v0.14.1 (June 2025), which added session tokens and origin validation. The underlying 0.0.0.0 browser behavior remains unpatched in major browsers.


CVE-2025-53110 and CVE-2025-53109: Filesystem MCP Server sandbox escapes (Cymulate, June 2025, fixed July 2025)

Cymulate's research on the Anthropic Filesystem MCP Server found two vulnerabilities that let an attacker escape the server's declared allowed directories entirely -- without exploiting memory corruption or dropping external binaries. Both patched in 2025.7.1.

CVE-2025-53110 (CVSS 7.3): The server checked whether a requested path started with an allowed directory prefix using a naive string comparison. A path like /private/tmp/allowed_dir_escape passes the check for /private/tmp/allowed_dir and then reaches the filesystem outside the sandbox.

CVE-2025-53109 (CVSS 8.4): A symlink placed inside an allowed directory points anywhere on the filesystem. The server follows it without validation, granting full read/write access to arbitrary paths. With access to the right targets -- Launch Agents on macOS, cron jobs on Linux -- this escalates to arbitrary code execution without elevated privileges.

The Cymulate researcher who found CVE-2025-54794 in Claude Code explicitly noted finding the same naive prefix-matching flaw in both places. This is an architectural repeat, not a coincidence.

What execution-layer enforcement does: The file rules enforced at the syscall level do not care what the MCP Server's path validation decided. A open(2) outside /workspace is denied regardless:

file_rules:
  - name: workspace-only
    paths: ["/workspace/**"]
    operations: [read, write, create, delete]
    decision: allow

  - name: deny-credential-paths
    paths:
      - "/home/**/.ssh/**"
      - "/home/**/.aws/**"
      - "/**/Library/LaunchAgents/**"
      - "/etc/cron*"
    operations: ["*"]
    decision: deny

  - name: deny-outside-workspace
    paths: ["/**"]
    operations: [read, write, create, delete]
    decision: deny

Symlinks are irrelevant to this enforcement: the resolved path is what gets checked, not the link target.


CVE-2025-54794: Claude Code path restriction bypass (Cymulate, disclosed August 2025, fixed July 2025)

Cymulate found the same naive prefix-matching flaw in Claude Code's own path validation during its research preview. The check compared a requested path against the declared workspace using a simple string prefix match. A crafted directory name sharing the workspace prefix, combined with untrusted content in context, allowed access to files outside the intended scope. Fixed in v0.2.111. CVSS 7.7.

The key point: Claude Code's own path-validation logic was the enforcement point, and there was no independent layer beneath it. That this flaw appeared in the Filesystem MCP Server (CVE-2025-53110) and Claude Code (CVE-2025-54794) in the same month from the same researcher suggests a shared architectural pattern in how Anthropic's tools validated paths at the time.

What execution-layer enforcement does: Any access outside /workspace is denied at the syscall level regardless of how the model or the tool interpreted the path.

file_rules:
  - name: workspace-only
    paths: ["/workspace/**"]
    operations: [read, write, create, delete]
    decision: allow

  - name: deny-outside-workspace
    paths: ["/**"]
    operations: [read, write, create, delete]
    decision: deny

CVE-2025-54795: Claude Code command injection via confirmation-prompt bypass (Cymulate, July 2025, fixed August 2025)

The companion finding to CVE-2025-54794. Cymulate demonstrated that Claude Code's confirmation prompt before executing commands could be bypassed through prompt crafting, enabling arbitrary shell command execution. Fixed in v1.0.20. CVSS 8.7.

This is the cleaner illustration of the root-cause thesis: the tool's "ask before running" control was itself implemented as a prompt to the model. A sufficiently crafted input could suppress or bypass that prompt. The safety mechanism and the attack surface were the same thing.

What execution-layer enforcement does: An allowlist policy lets only explicitly permitted commands execute. Anything not on the list is denied before it runs, regardless of whether a confirmation dialog appeared.

command_rules:
  - name: allowed-commands
    commands: ["git", "python", "node", "cargo", "go", "make"]
    decision: allow

  - name: deny-inline-shell
    commands: ["sh", "bash", "zsh"]
    args_match: ["-c", "*"]
    decision: deny
    message: "Inline shell execution blocked."

  - name: deny-unknown
    commands: ["*"]
    decision: deny

CVE-2025-54135 "CurXecute" and CVE-2025-54136 "MCPoison": Cursor MCP execution (AIM Security + Check Point, disclosed August 2025, fixed Cursor v1.3.9)

Two separate research teams found two separate MCP execution vulnerabilities in Cursor at nearly the same time, both fixed in Cursor v1.3.9.

CVE-2025-54135 "CurXecute" (AIM Security, disclosed August 2025, CVSS 8.6): An external MCP server returns a response containing a malicious prompt. That prompt instructs the agent to write a file to .cursor/mcp.json. If that config file did not previously exist, Cursor with Auto-Run enabled executes the injected commands immediately -- no approval dialog, no consent. The attack originates from any MCP server Cursor connects to, not just the local filesystem.

CVE-2025-54136 "MCPoison" (Check Point, disclosed August 2025, CVSS 7.2): Once a user approves an MCP configuration, Cursor treats it as trusted indefinitely -- even after the file changes. An attacker commits a benign-looking .cursor/mcp.json to a shared repository, waits for a developer to pull it and approve it once, then replaces it with a malicious payload. On every subsequent project open, the malicious configuration executes with no further approval. The persistence is the point.

What execution-layer enforcement does: CurXecute is caught at the exec level -- the written config triggers a command that AgentSH intercepts before it runs. MCPoison is addressed by AgentSH's MCP version-pinning and tool allowlist, which detects configuration changes and requires re-approval regardless of what Cursor's own trust model decided:

mcp:
  version_pinning:
    enabled: true
    on_change: approve
    message: "MCP configuration changed since last approval: "

  tool_whitelist:
    - "read_file"
    - "list_directory"
  tool_denylist:
    - "write_file"
    - "execute_code"
    - "run_shell"

A modified config triggers an approval gate before any tool from that server can run. The attacker's payload does not execute silently on next open.


Cursor Workspace Trust RCE (Oasis Security, September 2025)

Cursor ships with Workspace Trust disabled by default. A .vscode/tasks.json containing runOn: "folderOpen" executes its commands silently the moment a developer opens the folder -- no prompt, no consent. A malicious repository includes this file. Developer clones repo, opens it in Cursor, code runs.

This is structurally the same as a booby-trapped document that executes macros on open, but targeted at developers whose machines carry cloud keys, PATs, and live SaaS sessions. The AI coding assistant is incidental to this one; the bug is in the IDE's task runner defaults. Notably, Cursor's own response to the disclosure was that enabling Workspace Trust disables AI and other Cursor features -- so the intended mitigation conflicts with the product's core value proposition.

What execution-layer enforcement does: Every exec() passes through the policy engine before it runs. A task runner spawning a shell gets an approval gate:

command_rules:
  - name: require-approval-for-shell
    commands: ["sh", "bash", "zsh", "fish", "pwsh"]
    decision: approve
    message: "Shell execution requested:  "
    timeout: 60s

The autorun fires, tries to spawn a shell, and blocks on a human approval prompt. The event is logged regardless of the decision. Nothing executes silently.


Check Point's Claude Code findings: repository-controlled startup execution (reported July–October 2025; advisories published September 2025–January 2026)

Check Point reported a cluster of vulnerabilities through multiple disclosures between July and October 2025, all rooted in the same pattern: Claude Code reads configuration from repository-controlled files and acts on it before showing the user a trust prompt. The public advisories separate three distinct issues:

GHSA-ph6w-f82w-28w6 (fixed v1.0.87, published September 2025): The startup warning was insufficiently explicit that trusting the folder would allow Claude Code to execute files in that directory without further confirmation. Repository-controlled behavior could proceed without adequate trust enforcement.

CVE-2025-59536 (CVSS 8.7, fixed v1.0.111, published October 2025): Repository-controlled startup paths could cause code to execute before the user accepted the startup trust dialog. Check Point's broader research cluster identified multiple mechanisms -- hooks, MCP server definitions, and environment variables -- through which this startup behavior could be triggered, but the CVE advisory describes the core issue as pre-trust-dialog code execution.

CVE-2026-21852 (CVSS 5.3, fixed v2.0.65, published January 2026): A malicious repository sets ANTHROPIC_BASE_URL to an attacker-controlled endpoint. Claude Code reads that configuration and starts issuing authenticated API requests -- including requests carrying the user's API key -- before any trust prompt appears. The key leaves the machine before the user is asked anything.

The common thread across all three: developers treat configuration files as metadata. They are executable code. A single compromised commit in an enterprise repository can affect every developer who clones it.

What execution-layer enforcement does: Startup-spawned shells are caught at execve. Environment variable overrides are stripped before they reach child processes. API traffic is routed through the embedded LLM proxy regardless of what ANTHROPIC_BASE_URL says:

env_rules:
  - name: lock-api-urls
    keys: ["ANTHROPIC_BASE_URL", "OPENAI_BASE_URL"]
    decision: deny

proxy:
  mode: embedded
  providers:
    anthropic: https://api.anthropic.com

The repository configuration sets ANTHROPIC_BASE_URL=https://attacker.example.com. That variable is stripped before it reaches the Claude Code process. API requests route through the proxy to api.anthropic.com. The key never reaches the attacker's endpoint.


CVE-2025-59944: Cursor case-sensitivity config overwrite (Lakera, October 2025, fixed Cursor v1.7)

Cursor's confirmation prompt for modifications to protected files like .cursor/mcp.json used a case-sensitive string comparison. On macOS and Windows -- both case-insensitive filesystems by default -- creating .cUrSoR/mcp.json bypassed the check entirely. To the OS, it was the same file. To Cursor, it was a new one, requiring no approval. The malicious config loaded silently on next open.

This is a small implementation detail with outsized consequences in an agentic IDE. The check deciding which commands can run and which plugins start failed when the filesystem's case rules differed from the application's. It's also an example of how prompt injection doesn't need to come through the model -- any write path the agent controls is a potential injection point.

What execution-layer enforcement does: AgentSH's path normalization operates at the syscall layer using the OS's own resolved paths, not application-level string comparisons. A file write to .cUrSoR/mcp.json and a file write to .cursor/mcp.json resolve to the same inode and are treated identically by policy:

file_rules:
  - name: protect-cursor-config
    paths: ["**/.cursor/mcp.json", "**/.vscode/tasks.json"]
    operations: [write, create]
    decision: approve
    message: "AI IDE config modification: "

The case variant doesn't bypass this rule. The approval gate fires regardless.


Claude Desktop MCP Extension RCE (Koi Security, November 2025, fixed Claude Desktop v0.1.9)

Koi Security found that three official Anthropic extensions for Claude Desktop -- Chrome, iMessage, and Apple Notes -- were vulnerable to unsanitized command injection. Web content or document content that Claude processed could contain malicious instructions. Claude, acting in good faith, executed them.

The critical detail: while Chrome browser extensions run in a sandboxed process, Claude Desktop extensions run fully unsandboxed on the user's device with full system permissions. They are not lightweight plugins. They are privileged executors bridging the LLM and the operating system. A successfully injected prompt had access to SSH keys, AWS credentials, and local secrets, and the ability to run arbitrary commands.

What execution-layer enforcement does: Credential paths and unauthorized outbound connections can be blocked regardless of what the injected prompt instructs. This is also where a domain allowlist is not enough: as the Claudy Day section shows, an allowed endpoint can itself be an exfiltration channel. Domain allowlists are necessary but not sufficient. You also need content-aware controls on what crosses those connections.

file_rules:
  - name: deny-credentials
    paths:
      - "/home/**/.ssh/**"
      - "/home/**/.aws/**"
      - "/**/Library/Keychains/**"
      - "/home/**/.config/gcloud/**"
    operations: ["*"]
    decision: deny

Claudy Day: Claude.ai prompt injection to data exfiltration (Oasis Security, March 2026)

Oasis chained three bugs into a complete attack pipeline targeting Claude.ai users, with no integrations required.

Bug 1 (fixed, March 2026): Claude.ai accepts pre-filled chat prompts via ?q= URL parameters. HTML tags embedded in that parameter are invisible in the text box but processed by Claude when the user hits Enter. Hidden instructions -- including data extraction commands and an attacker-controlled API key -- execute silently.

Bug 2 (being addressed): Claude's code execution sandbox restricts outbound connections to most destinations, but allows connections to api.anthropic.com. That allowed endpoint is the exfiltration channel. The injected prompt instructs Claude to read conversation history, write it to a file, and upload it to the attacker's Anthropic account via the Files API. No external infrastructure. No custom tooling. Just capabilities that ship out of the box.

Bug 3 (being addressed): claude.com/redirect/<target> redirected to arbitrary third-party domains. Wrapped in a Google Ad, this delivered the injection URL as a search result indistinguishable from the real thing.

This is where the domain-allowlist point from the Claude Desktop section lands with full force. Allowing api.anthropic.com sounds safe. But the Files API lives on api.anthropic.com, and it can upload arbitrary data to any Anthropic account. A domain allowlist tells you nothing about what is being sent or whose credentials are driving it.

Whether the API allowlist architecture is permanently structural is Anthropic's call to make -- the public disclosure characterizes Bug 2 as "currently being addressed." The architectural observation stands independent of the fix timeline: allowing a network destination is not the same as authorizing what flows across it.

What execution-layer enforcement does in deployments you control: The Claudy Day surface is Claude.ai, which users cannot wrap with AgentSH. But the attack class -- prompt injection into an agent with tool access -- applies directly to self-hosted and SDK-based agent deployments.

For those environments, DLP strips API keys from outbound requests before they reach any provider:

dlp:
  mode: redact
  patterns:
    api_keys: true
  custom_patterns:
    - name: anthropic-key
      regex: "sk-ant-[A-Za-z0-9\\-_]{40,}"
      display: "[REDACTED_API_KEY]"

And MCP tool policy limits what an injected prompt can invoke regardless of what the model decides:

mcp:
  tool_whitelist:
    - "read_file"
    - "list_directory"
  tool_denylist:
    - "write_file"
    - "create_file"
    - "send_message"
    - "execute_code"

An injected prompt that gains control of an agent with connected MCP servers can only do what the tool policy allows. Write operations, message sending, and code execution are not on that list.


The pattern

Fourteen incidents across twelve months, reported by multiple research teams -- Oligo, Cymulate, AIM Security, Check Point, Oasis, Lakera, and Koi -- targeting six products and surfaces. In each case:

  1. Attacker delivers malicious content -- a URL, a config file, a cloned repo, a document, a web page, a CSRF request
  2. The AI tool processes that content in good faith
  3. The tool takes real actions with real consequences

Once untrusted input reached a privileged path, the product followed it. The problem is that in each case, "what the tool was told" came from an attacker, and nothing independent of the tool governed what happened next.

You cannot rely on the model -- or the IDE, or the MCP server -- to faithfully enforce the rules written for it. Confirmation prompts, workspace declarations, system instructions, trust dialogs, case-sensitive filename checks: these are all controls implemented in the same layer the attacker targeted. A sufficiently crafted input can suppress, bypass, or override them.

You need a separate enforcement point that decides whether the resulting file, process, and network actions are permitted -- without asking the model. That is the execution layer.


AgentSH

AgentSH is an open-source execution-layer security gateway for AI agents. It sits under your agent and its tooling, intercepting file, network, and process activity at the syscall level, enforcing the policy you define, and emitting structured audit events.

# Wrap Claude Code in an enforcement session
agentsh shim install-shell --root / --bash

SID=$(agentsh session create --workspace . --policy agent-sandbox | jq -r .id)
agentsh exec "$SID" -- claude

Every file access, subprocess, and network connection Claude Code attempts inside that session passes through the policy engine. The policy generation workflow lets you profile a legitimate run and lock future runs to observed behavior:

agentsh policy generate latest --output=claude-code-policy.yaml
agentsh session create --workspace . --policy claude-code-policy.yaml

The starter policy packs -- dev-safe, ci-strict, and agent-sandbox -- cover the common deployment scenarios. agent-sandbox is the right starting point for Claude Code and similar tools: default deny, explicit allowlist, approval gates on credential paths, and network restricted to declared domains.

Most of the specific bugs in this post were patched or mitigated. The architectural pattern remains. Patching fixes specific instances of the failure mode. It does not change the architecture. As long as the enforcement point lives in the same layer that processes untrusted input, an attacker who controls the input can control the enforcement.

AgentSH is at github.com/canyonroad/agentsh.

← All posts

Built by Canyon Road

We build Beacon and AgentSH to give security teams runtime control over AI tools and agents, whether supervised on endpoints or running unsupervised at scale. Policy enforced at the point of execution, not the prompt.

Contact Us →
Learn the category: Execution-Layer Security → See examples: Use Cases →