LLM Security Risks: The 10 Vulnerabilities Every Security Team Must Address
A practitioner's breakdown of the OWASP Top 10 LLM security risks for 2025 — from prompt injection and data and model poisoning to excessive agency and misinformation — with actionable defense controls.
Large language models now sit inside corporate chat assistants, autonomous code reviewers, customer-facing support bots, and security operations tools — and the attack surface they introduce is poorly understood by most defenders still calibrated to traditional application threats.
The OWASP Top 10 for Large Language Model Applications, updated for 2025 by a community of more than 600 security experts, provides the clearest public taxonomy of LLM security risks to date. This article works through the most operationally significant entries and maps the controls that actually reduce exposure.
Prompt Injection: The Defining LLM Threat
Prompt injection holds the top position, LLM01, in the OWASP Top 10 for LLM Applications for a reason: it is the mechanism through which nearly every other category of harm is triggered. An attacker crafts a natural-language input (delivered through a user chat field, a retrieved document, a web page the LLM browses, or a tool response) that overrides the model’s original system instructions.
Attack success rates range from 50 to 84 percent depending on system configuration, according to recent benchmark data. Direct injection targets the user-facing interface. Indirect injection is more dangerous: malicious instructions are embedded in content the model retrieves autonomously — a document in a RAG corpus, a web page fetched by an agent, metadata in an uploaded file. The model reads the payload and executes it without any user interaction.
In agentic deployments, the consequences are more severe. A prompt-injected agent with tool access can exfiltrate data, trigger API calls, send emails, or modify files before any human sees the output. The MITRE ATLAS framework, updated to 16 tactics and 84 techniques as of late 2025, now explicitly covers agentic AI technique chains, reflecting how quickly this attack surface has matured.
The primary mitigations are privilege separation (give agents the minimum tool access needed), contextual output inspection (validate what the model is about to do, not just what it says), and treating retrieved content as untrusted input from a potential adversary.
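As a rough illustration of privilege separation combined with contextual output inspection, the sketch below checks a tool call proposed by the model before anything executes. The role names, tool names, `ToolCall` shape, and the `repo/` path rule are all hypothetical and exist only for this example; a real deployment would derive them from its own agent framework and policy.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """A tool invocation proposed by the model (hypothetical shape)."""
    name: str
    args: dict


# Hypothetical policy: each agent role is granted only the tools it needs.
ALLOWED_TOOLS = {
    "support_bot": {"search_kb", "create_ticket"},
    "code_reviewer": {"read_file"},
}


def _check_read_file(args: dict) -> bool:
    # Hypothetical rule: only paths under repo/ and no parent traversal.
    path = str(args.get("path", ""))
    return path.startswith("repo/") and ".." not in path


# Per-tool argument checks, run on what the model is about to do.
ARG_CHECKS = {"read_file": _check_read_file}


def authorize(role: str, call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call."""
    if call.name not in ALLOWED_TOOLS.get(role, set()):
        return False, f"tool '{call.name}' not granted to role '{role}'"
    check = ARG_CHECKS.get(call.name)
    if check is not None and not check(call.args):
        return False, f"arguments rejected for tool '{call.name}'"
    return True, "ok"
```

The decision is made outside the model, so an injected instruction can change what the model asks for but not what the gate permits.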
For a deeper technical treatment of prompt injection variants, including jailbreaks and agent exploitation, aisec.blog maintains ongoing coverage of offensive technique developments.
Data and Model Poisoning and Supply Chain Compromise
LLM04 (Data and Model Poisoning) and LLM03 (Supply Chain) are closely related and increasingly targeted together. Poisoning attacks inject malicious or biased samples into training, fine-tuning, or embedding data to corrupt model behavior — either broadly, degrading output reliability, or through targeted backdoors that activate on specific trigger phrases.
The OWASP 2025 guidance notes that supply chain risk extends beyond the base model to include third-party plugins, fine-tuning datasets, vector database providers, prompt templates, and the inference infrastructure itself. An organization that carefully evaluates a base model but ingests fine-tuning data from an unvetted source has inherited an opaque risk.
Concrete controls for both: require a software bill of materials (SBOM) for all AI components; audit dataset provenance before any fine-tuning run; pin model versions and validate hashes on load; isolate inference environments from production databases; apply code-level access controls to the vector store powering any RAG system.
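The "pin model versions and validate hashes on load" control can be sketched with the standard library alone. This is a minimal example, assuming the pinned SHA-256 digest comes from a trusted record such as a reviewed manifest; the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model artifacts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, pinned_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned digest."""
    actual = sha256_of(path)
    # Case-insensitive on the pinned value; comparison itself is constant-time.
    if not hmac.compare_digest(actual, pinned_sha256.lower()):
        raise ValueError(f"hash mismatch for {path.name}: refusing to load")
```

Calling `verify_artifact` before handing the file to any model loader turns a silently swapped artifact into a hard failure.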
Sensitive Information Disclosure and Improper Output Handling
LLM02 (Sensitive Information Disclosure) covers the failure mode where models disclose confidential content, such as training data, PII, or API credentials, through their responses. This is not theoretical. Training data extraction attacks have recovered verbatim training examples from production models, and the closely related system prompt extraction problem (tracked separately in 2025 as LLM07, System Prompt Leakage) remains a persistent issue across commercial LLM APIs.
LLM05 (Improper Output Handling) is the adjacent problem: the model’s output is routed directly into a downstream component — a SQL engine, a shell, a web renderer — without validation. Classic injection categories (SQL injection, cross-site scripting, remote code execution) re-emerge when LLM output is treated as trusted data.
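A minimal sketch of handling model output as untrusted data, using Python's standard `sqlite3` and `html` modules. The `users` table and the function names are hypothetical; the point is that model text is bound as a parameter or escaped, never concatenated into SQL or markup.

```python
import html
import sqlite3


def find_user(conn: sqlite3.Connection, name_from_model: str) -> list[tuple]:
    # The "?" placeholder keeps the model's text as data, never as SQL.
    cur = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name_from_model,)
    )
    return cur.fetchall()


def render_reply(text_from_model: str) -> str:
    # Escape before embedding in HTML so injected markup is displayed, not run.
    return f"<p>{html.escape(text_from_model)}</p>"
```

With this shape, a model response such as `alice' OR '1'='1` is looked up as a literal name and simply matches nothing.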
SentinelOne’s LLM security guidance recommends output scanning layers that block PII leakage and flag data exfiltration patterns before responses reach end users. For teams monitoring model behavior in production, sentryml.com covers drift detection and anomalous-output monitoring that can surface these disclosure patterns early.
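An output scanning layer can start as simply as the sketch below. It is illustrative only and not taken from any vendor's product: the three patterns (email address, US Social Security number format, AWS-style access key ID) are assumptions chosen for the example, and regular expressions alone will miss many real disclosures.

```python
import re

# Illustrative patterns only; a production scanner needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_and_redact(text: str) -> tuple[str, list[str]]:
    """Return the redacted text plus the labels of every pattern that matched."""
    findings = []
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            findings.append(label)
            text = pattern.sub(f"[REDACTED:{label}]", text)
    return text, findings
```

The returned findings list can feed alerting, so that a redaction is also a signal worth investigating rather than a silent fix.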
Excessive Agency: The Autonomous Action Problem
LLM06 (Excessive Agency) addresses what happens when LLMs are granted capabilities disproportionate to the task: the ability to read mailboxes, write to databases, call external APIs, or spawn subprocesses. When paired with a prompt injection attack, over-permissioned agents take unintended actions at machine speed.
The 2025 OWASP list was significantly revised to reflect agentic architectures that barely existed when the original 2023 list shipped. The guidance is specific: apply least privilege to tool grants, require human-in-the-loop confirmation for consequential actions, and log all tool invocations to an immutable audit trail that the model cannot modify.
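Human-in-the-loop confirmation can be sketched as a wrapper around tool dispatch. Everything named here is hypothetical: the `CONSEQUENTIAL` set, the `run_tool` function, and the `approve` callback, which stands in for whatever review channel a team actually uses (a ticket, a chat prompt, a console confirmation).

```python
from typing import Callable

# Hypothetical set of actions that must never run without a human decision.
CONSEQUENTIAL = {"send_email", "delete_record", "transfer_funds"}


def run_tool(
    name: str,
    args: dict,
    tools: dict[str, Callable[..., object]],
    approve: Callable[[str, dict], bool],
) -> object:
    """Execute a registered tool, pausing for approval on consequential ones."""
    if name not in tools:
        raise PermissionError(f"tool '{name}' is not registered")
    if name in CONSEQUENTIAL and not approve(name, args):
        raise PermissionError(f"human approval denied for '{name}'")
    return tools[name](**args)
```

Low-risk tools pass straight through, which keeps the approval queue short enough that reviewers still read what they are approving.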
LLM09 (Misinformation) is the human-factors complement: the 2025 entry that reworked the 2023 “Overreliance” category to center on confident, fabricated, or unverified model output. Security teams that accept LLM-generated analysis — threat reports, vulnerability assessments, code reviews — at face value without independent verification are introducing a new class of decision-making risk into their workflow.
What Defenders Should Do
Five actionable priorities, ordered by impact:
- Audit LLM tool access now. Map every permission granted to each LLM-based application or agent. Remove anything not strictly required. Document what remains and who authorized it.
- Treat LLM input as untrusted, always. Implement input validation (layered pattern matching plus adversarial prompt classifiers) on every channel through which data reaches an LLM, including retrieved documents in RAG pipelines.
- Scan outputs before they reach downstream systems. Block model output from flowing directly to SQL, shell, or HTML rendering without structured validation. Treat LLM responses the same way you treat user-supplied input.
- Inventory the AI supply chain. Enumerate base models, plugins, fine-tuning datasets, and vector database providers. Apply the same vendor risk management process used for software dependencies.
- Log tool invocations in agentic systems. Ensure any autonomous action taken by an LLM is recorded in a tamper-evident log, with enough context to reconstruct the input that triggered it.
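One common way to make a tool-invocation log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how that could look, not a prescribed format; the field names are hypothetical, and the in-memory list stands in for append-only storage that the model has no write access to.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list[dict], tool: str, args: dict, trigger: str) -> dict:
    """Append a record that includes the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"tool": tool, "args": args, "trigger": trigger, "prev": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Return False if any entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

The `trigger` field is where the input that caused the action would be recorded, which is what makes later reconstruction possible.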
For incident and vulnerability tracking as new LLM exploits emerge, ai-alert.org maintains a running tracker of disclosed AI security incidents, including jailbreak disclosures and ML-specific CVEs.
Sources
- OWASP Top 10 for Large Language Model Applications: the canonical community-maintained vulnerability taxonomy for LLM deployments, covering all 10 risk categories referenced in this article.
- OWASP Top 10 for LLM Applications 2025, Gen AI Security Project: the 2025 update, significantly revised to cover agentic architectures, RAG-specific risks, and expanded supply chain scope.
- What Is LLM Security?, SentinelOne: practitioner-oriented breakdown of LLM attack vectors and the six core defense domains, with coverage of model extraction and vector database isolation.
- MITRE ATLAS: AI Security Framework, Vectra AI: overview of the MITRE ATLAS framework (v5.1.0, 16 tactics, 84 techniques, 42 case studies), which maps adversarial AI techniques to the ATT&CK-style taxonomy now used across enterprise threat modeling.