Tech Sentinel
deep-dive

AI Agents Are Rewriting the Threat Model, and Most Security Teams Aren't Ready

Three incidents in three months — Clinejection, the FortiGate campaign, and the OpenClaw exposure wave — show how autonomous AI assistants are collapsing the boundaries security programs were built around.

By Theo Voss · 8 min read

The last ninety days have produced three separate public incidents that, taken together, mark the point where AI assistants stopped being a hypothetical security problem and became an operational one. Brian Krebs cataloged the trajectory in a March 2026 piece that quoted pentesters, AI safety leads, and incident responders all making the same observation: the controls and threat models most organizations rely on were not designed for software that reads, decides, and acts on its own.

The implication is uncomfortable. Defenders spent two decades hardening the boundary between data and code, between trusted insiders and outsiders, between authenticated users and the systems they touch. Agentic AI dissolves all three boundaries simultaneously. The Cline supply chain compromise, the AI-augmented FortiGate campaign documented by Amazon, and the spreading wave of misconfigured OpenClaw deployments are not three different stories. They are three faces of the same shift.

Clinejection: when a triage bot becomes a publish pipeline

The cleanest illustration of the new threat model is the Cline CLI compromise documented by Snyk. Cline is a popular AI coding assistant distributed via npm and the VS Code Marketplace. Its maintainers wired Claude into a GitHub Actions workflow that triaged incoming issues. That workflow had read access to the repository and write access to issue comments. A reasonable, unremarkable design.

Security researcher Adnan Khan disclosed the underlying flaw on February 9, 2026. The exploit required nothing more than a crafted issue title containing natural-language instructions, which the triage bot dutifully followed: run npm install github:cline/cline#aaaaaaaa against an attacker-controlled fork. That fork’s package.json carried a malicious preinstall script that exfiltrated ANTHROPIC_API_KEY and dropped a tool called Cacheract on the runner. Cacheract flooded the shared GitHub Actions cache with more than ten gigabytes of junk, evicting legitimate entries and replacing them with poisoned ones. The nightly publish workflow, running with high-privilege credentials, restored the poisoned cache and handed over VSCE_PAT, OVSX_PAT, and NPM_RELEASE_TOKEN.
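
For readers who have never looked at one closely, the install-hook mechanism deserves a moment. Here is a defanged sketch of the shape of such a fork’s package.json (the version string is invented, and where the real hook exfiltrated credentials and dropped Cacheract, this one only prints):

    {
      "name": "cline",
      "version": "0.0.0-fork",
      "scripts": {
        "preinstall": "node -e \"console.log('arbitrary code, running at install time, as you, with your environment')\""
      }
    }

npm runs that preinstall command automatically during install; nothing from the package ever needs to be imported for it to execute.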

Eight days later, on February 17, an unauthorized party used the stolen npm token to push [email protected] at 3:26 AM PT. The package contained a postinstall script that ran npm install -g openclaw@latest, silently installing an autonomous AI agent on every machine that updated during the roughly eight-hour window before the publish was caught and reversed. The Hacker News reported that StepSecurity counted around 4,000 affected installs.

Read carefully, none of the individual primitives in this attack are novel. Prompt injection in untrusted text has been documented for two years. GitHub Actions cache poisoning was a known class of vulnerability before the AI era. Postinstall hooks are the oldest npm threat model in the book. What is new is the chaining: an attacker no longer needs a foothold inside a CI pipeline to corrupt it. Filing a public issue is enough, because there is now an LLM with shell privileges sitting between the issue tracker and the build runner, willing to interpret an issue title as an instruction.

The FortiGate campaign: AI as a force multiplier for unsophisticated actors

If Clinejection shows AI agents as the target, the AWS-disclosed FortiGate campaign shows AI assistants as the weapon. Between January 11 and February 18, a Russian-speaking, financially motivated actor compromised more than 600 FortiGate appliances across 55 countries. Amazon’s threat intelligence team described the operator as low-to-medium baseline skill, “significantly amplified by AI.”

The technical novelty is again zero. The attacker scanned management interfaces exposed on the usual suspect ports — 443, 8443, 10443, 4443 — and tried known-bad credentials with single-factor authentication. Anyone running a modern asset inventory could have built that target list. The novelty is operational tempo. The actor used at least two commercial LLMs, including Anthropic’s Claude and DeepSeek, to generate attack plans, write reconnaissance tooling in multiple languages, automate scan orchestration, and produce English-language reporting on results. Post-exploitation, they pivoted to Active Directory DCSync attacks and targeted Veeam backup servers — telltale ransomware preparation.
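
Defenders can run the same arithmetic against their own address space. A minimal sketch with nmap, using the documentation range 198.51.100.0/24 as a stand-in for your public prefixes:

    # Find management interfaces answering on the ports the campaign targeted
    nmap -Pn -p 443,8443,10443,4443 --open -oG fortigate-exposure.txt 198.51.100.0/24

    # Reconcile live hits against your asset inventory
    grep '/open/' fortigate-exposure.txt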

The attribution detail worth dwelling on is “low-to-medium skill.” A decade of threat intelligence has trained defenders to map sophistication to intent: nation-state work looked like nation-state work, and crimeware looked sloppy. AI assistants flatten that gradient. A single operator with a credit card and two API keys can now sustain a multi-country campaign with the operational hygiene of a mid-tier APT. The defensive corollary is that “this looks too organized to be financially motivated” is no longer a useful triage heuristic.

OpenClaw and the misconfiguration wave

The third strand is the one that worries SOC leaders the most, because it is happening across thousands of unrelated organizations with no central attacker to indict. OpenClaw — the open-source autonomous agent formerly known as ClawdBot and Moltbot — passed 135,000 GitHub stars in the months after its November 2025 release. Krebs quoted DVULN founder Jamieson O’Reilly describing exposed OpenClaw web interfaces that let anyone “pull the full conversation history across every integrated platform” and “manipulate what the human sees.” Barracuda’s writeup catalogs tens of thousands of internet-facing instances and a “ClawJacked” technique allowing arbitrary websites to silently issue commands to a local agent.

This is the shape of every cloud-misconfiguration crisis of the last fifteen years — open S3 buckets, open Elasticsearch, open MongoDB, open Redis — except the exposed surface is no longer a database. It is an agent with a shell, a browser, and active sessions to the user’s email, calendar, source control, and chat. The same pattern of a fast-growing tool outrunning its security defaults plays out one more time, but the blast radius per misconfiguration is qualitatively different. A leaky S3 bucket leaks data. A leaky agent leaks data, then takes actions, then writes those actions back into the systems it has access to.
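
The self-check is correspondingly cheap. Assuming the agent’s web interface listens on port 18789 (a placeholder; substitute whatever port your deployment actually uses), probe it from a host outside your network:

    # From an external vantage point: does the agent UI answer the open internet?
    curl -s -o /dev/null -w '%{http_code}\n' --connect-timeout 5 http://agent-host.example.com:18789/

Anything other than a timeout or a refused connection is a finding.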

Why the old playbook misses

Simon Willison’s “lethal trifecta” framing explains why patching individual incidents will not close the gap. Willison’s argument: any agent that combines access to private data, exposure to untrusted content, and the ability to communicate externally is exploitable, because the LLM cannot reliably distinguish operator instructions from instructions embedded in the content it processes. “If your agent combines these three features, an attacker can easily trick it into accessing your private data and sending it to that attacker.”

Almost every useful agent deployment satisfies all three conditions by design. A coding assistant that reads issues, writes code, and pushes commits is the trifecta. An email assistant that summarizes inbox content and drafts replies is the trifecta. A research agent that browses the web on a knowledge-base-connected workstation is the trifecta. Removing any leg breaks the use case.

That is why control frameworks built around boundary enforcement — least privilege, network segmentation, signed packages, code review — degrade rather than fail when applied to agents. They still help, but each control assumes a meaningful distinction between the trusted operator and the untrusted input, and the LLM is the boundary that erases that distinction. Prompt injection is not a bug in a specific model. It is a property of treating natural-language text as code.

Original analysis: the insider-threat reframe

The framing the industry has settled on is that agents are a new vulnerability class, somewhere between web applications and supply chain dependencies. We think that framing is wrong, or at least incomplete. The closer analog is insider threat.

Treating an agent like a vulnerable application leads defenders toward input validation, prompt-injection filters, and content firewalls. Those help at the margin. They do not address the core problem, which is that the agent has been delegated authority to act on behalf of the user, holds the user’s credentials, and operates inside the user’s trust boundary. When the agent does something destructive — Krebs cited a Meta AI safety lead whose OpenClaw “speedran deleting” her inbox despite being told to confirm first — the post-incident question is not “what input slipped through” but “what did this principal do, with what authority, and why did our controls trust it?”

That is the insider-threat playbook. It points at controls the appsec playbook does not: behavioral baselining of agent activity, separation of duties between agents and the high-privilege actions they invoke, dual-control on irreversible operations, time-bounded credentials, and detective controls watching for unusual access patterns regardless of whether the request was “valid.” It also points at organizational controls — standing inventory of which agents are deployed where, with what permissions, owned by whom — that almost no enterprise has for AI tooling today.

The counter-argument is that insider-threat programs are notoriously expensive and rarely catch the insider. True, but they catch insiders for the same reason they would catch agents: insiders rarely abuse their privileges in ways that look exactly like ordinary work. Agent compromise via prompt injection produces sequences of actions — install a global package, write to a credential store, exfiltrate to a domain the user has never visited — that are individually plausible and collectively anomalous. UEBA-style telemetry is suited to that signal. Static prompt filtering is not.
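
To make “individually plausible, collectively anomalous” concrete: if tool invocations are logged as one JSON object per line, a first-pass correlation is a few lines of jq. The field names here (tool, argv, dest_domain, session_id) are assumptions about a logging schema, not a standard:

    # Sessions that installed a global package
    jq -r 'select(.tool == "shell" and (.argv | join(" ") | test("npm install -g"))) | .session_id' audit.jsonl | sort -u > installs.txt

    # Sessions that reached a domain outside a small allowlist
    jq -r 'select(.tool == "http" and (.dest_domain | IN("github.com","registry.npmjs.org") | not)) | .session_id' audit.jsonl | sort -u > egress.txt

    # Sessions that did both deserve an analyst's eyes
    comm -12 installs.txt egress.txt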

The reframe also clarifies an architectural choice that vendors are quietly making. Anthropic, Microsoft, and the open-source agent communities are converging on constrained execution environments for agentic work — sandboxed tool servers, scoped tokens, confirmation gates, audit logs. Those are insider-threat controls in everything but name. The vendors that ship them by default will be the ones whose customers are not in the next Krebs column. For more on the offensive side of this, our sister publication aisec.blog tracks prompt-injection technique development and the cat-and-mouse with mitigations, and ai-alert.org maintains a running ledger of agent and ML incident disclosures.

What defenders should do this quarter

A few things are concrete enough to act on now, regardless of which agent platform an organization has adopted.

Inventory agentic deployments the same way you inventory service accounts. Identify which agents are running, who owns them, what credentials they hold, what tool servers they can call, and whether they can act without human-in-the-loop confirmation. Most organizations cannot answer those questions today. Agents are appearing through developer-driven channels — VS Code extensions, Homebrew, npm globals — that bypass the procurement controls used for SaaS.
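
A first pass needs no budget. A sketch of a per-workstation sweep, where the grep patterns are examples rather than an exhaustive product list:

    # Agents arriving as editor extensions
    code --list-extensions | grep -iE 'cline|claude|copilot|agent'

    # Agents installed as global npm packages
    npm ls -g --depth=0

    # Agents installed via Homebrew
    brew list --formula 2>/dev/null | grep -iE 'claw|agent'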

Treat agent credentials as crown jewels. The Cline attack pivoted on three publishing tokens stored in CI. None of them needed to be long-lived, and none of them needed to be reachable from a triage workflow. Short-lived, scope-narrowed credentials with workload identity rather than static secrets remove most of the post-prompt-injection blast radius. AWS, Google, and GitHub all support OIDC-federated tokens for CI; use them.
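
For GitHub Actions specifically, the shape of the change is small. A sketch of a publish job trading a static cloud secret for a federated token (the role ARN is a placeholder; Google Cloud’s workload identity federation follows the same pattern):

    # .github/workflows/release.yml (fragment)
    permissions:
      id-token: write      # let the job request an OIDC token
      contents: read

    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-publish   # placeholder
          aws-region: us-east-1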

Segment agent execution from the workstation. Barracuda’s first recommendation — “run them only in dedicated virtual machines or containers, never on standard workstations” — is unfashionable but correct. Agents that need filesystem and shell access need their own ephemeral environments. The cost is workflow friction. The benefit is that prompt injection that says “delete my Documents folder” deletes a container’s empty Documents folder.
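
The pattern with plain Docker, as a sketch; the image and resource limits are placeholders to adapt:

    # One disposable environment per agent session; only a scratch workspace
    # is mounted, so "delete everything" hits the container, not $HOME
    docker run --rm -it \
      --cap-drop ALL \
      --memory 4g --pids-limit 256 \
      -v "$PWD/agent-workspace:/work" -w /work \
      node:22-bookworm bash

Dropping all capabilities will break some tooling; relax specific ones deliberately rather than reaching for a privileged container.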

Add detective controls to the agent’s tool layer, not the model. Logging at the LLM-prompt layer is too noisy and too late. Logging at the tool invocation layer — every shell command run, every package installed, every URL fetched, every email sent — gives you incident-quality evidence and a place to attach behavioral rules. The early reference architectures from Microsoft (Copilot enterprise audit logs) and from open-source projects in the Model Context Protocol ecosystem are heading in this direction; adopt the pattern even if your stack predates the standard.
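
The mechanism is a thin wrapper at each tool boundary. A sketch for the shell tool, where the log path and the wrapper arrangement are assumptions rather than any vendor’s API:

    #!/usr/bin/env bash
    # agent-shell: a stand-in for the shell binary the agent's tool config
    # points at. Records every invocation with a UTC timestamp, then hands off.
    LOG=/var/log/agent-tools.log
    printf '%s\tshell\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LOG"
    exec /bin/bash "$@"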

Re-examine your supply chain assumptions. If a developer running an AI assistant can install a package, then so can the AI assistant, with no human in the loop. Pinning, allowlisting, and verified registries — the same controls argued for after the SolarWinds and xz-utils incidents — apply double when an LLM is the requester. The Clinejection attack chain worked because every link assumed the previous link was trustworthy human input.
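
Two npm settings close most of the Clinejection-shaped surface, at a known cost in convenience: packages with legitimate build hooks will need an explicit allowlist.

    # Refuse to run preinstall/postinstall hooks automatically
    npm config set ignore-scripts true

    # Install exactly what the lockfile pins, nothing resolved at install time
    npm ci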

The kicker

What makes this moment uncomfortable for security leaders is not that AI agents are uniquely dangerous. They are not. Each individual failure mode — injection, supply chain, misconfiguration, insider abuse — is older than most SOC analysts on shift tonight. What is new is the rate at which agent adoption is compressing all of those failure modes into the same delivery vehicle, in the same calendar quarter, inside organizations that were already short-staffed.

The first wave of agent incidents has been mostly about embarrassment and inconvenience: a deleted inbox, a brief malicious package window, a campaign that got caught. The next wave will not be. The combination of low-skill operators wielding capable AI tooling, defender programs sized for last year’s threat model, and standing agent footholds inside trusted networks is the same combination that produced the ransomware wave of 2019–2021. We have a narrow window to apply the lessons of that wave before the asymmetry tilts further.

Sources

  1. How AI Assistants are Moving the Security Goalposts — Krebs on Security
  2. How 'Clinejection' Turned an AI Bot into a Supply Chain Attack — Snyk
  3. AI-augmented threat actor accesses FortiGate devices at scale — AWS Security Blog
  4. The lethal trifecta for AI agents — Simon Willison
  5. OpenClaw security risks: What security teams need to know about agentic AI — Barracuda
  6. Cline CLI 2.3.0 Supply Chain Attack Installed OpenClaw on Developer Systems — The Hacker News
#agentic-ai #prompt-injection #supply-chain #threat-intel #ai-security #deep-dive