Machine Learning Security: Key Threats, Attack Types, and Defenses
Machine learning security covers adversarial attacks, data poisoning, model theft, and supply chain risks targeting ML systems. Here is what practitioners need to know.
Machine learning security is no longer a niche research concern. As ML models move into fraud detection, malware classification, autonomous threat response, and access control, they inherit a threat surface that traditional application security was not designed to handle. Attackers are not just targeting the applications built on ML systems; they are targeting the models themselves.
NIST formalized this landscape in March 2025 with the release of AI 100-2e2025 ↗, an updated taxonomy covering adversarial machine learning attacks and mitigations across both predictive and generative AI systems. The document establishes common terminology across evasion, poisoning, privacy, and misuse attack categories — a signal that the field has matured enough to need shared vocabulary for compliance and governance purposes.
The Core Attack Types
Data poisoning is the attack most organizations are least prepared to detect. An adversary injects malicious samples into training data before or during the model training cycle, causing the resulting model to learn incorrect patterns. In a spam filter, poisoned training data can carve out specific senders as perpetually clean. In a network intrusion detection system, it can train the model to ignore particular traffic signatures. The attack happens before the model is deployed, which means it produces no runtime anomalies that monitoring tools can catch.
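To make the mechanism concrete, the sketch below poisons a toy scikit-learn spam classifier by relabeling every training sample from one sender as ham before the model is fit. The dataset, feature layout, and sender threshold are illustrative assumptions, not details from any real system.

```python
# Minimal sketch: label-flipping poisoning against a toy spam filter.
# Feature layout and data are illustrative, not from a real system.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy dataset: column 0 marks "message is from the attacker's sender",
# the remaining columns stand in for ordinary spam-related features.
X = rng.random((1000, 10))
y = (X[:, 1:].sum(axis=1) > 4.5).astype(int)   # 1 = spam

# Poisoning step: the adversary relabels every training sample from
# their sender (feature 0 above a threshold) as ham before training.
from_attacker = X[:, 0] > 0.9
y_poisoned = y.copy()
y_poisoned[from_attacker] = 0

model = LogisticRegression(max_iter=1000).fit(X, y_poisoned)

# The deployed model now systematically under-scores mail from that
# sender, even though runtime inputs look perfectly normal.
print("weight learned for the sender feature:", model.coef_[0][0])
```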
Evasion attacks hit deployed models at inference time. The attacker modifies input data — often in ways imperceptible to humans — to cause a misclassification. In image recognition, the classic example is adversarial perturbations: pixel-level changes that flip a model’s prediction with high confidence. In network security, the equivalent is packet manipulation: carefully adjusted timing, size, or encoding that makes malicious traffic appear normal to an ML-based detection system. ISACA’s 2025 analysis ↗ of adversarial ML in cybersecurity contexts notes that evasion is the most operationally immediate threat to ML-based security tooling.
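The sketch below shows the core idea with a fast-gradient-sign-style perturbation against a plain logistic regression model. The dataset, the epsilon value, and the choice of a near-boundary sample are illustrative assumptions; real attacks target far more complex models, but the mechanics are the same.

```python
# Minimal sketch of an FGSM-style evasion input against a linear model.
# The trained classifier and epsilon value are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((500, 20))
y = (X.sum(axis=1) > 10).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Pick a correctly classified sample close to the decision boundary.
margins = np.abs(clf.decision_function(X))
correct = clf.predict(X) == y
idx = np.argmin(np.where(correct, margins, np.inf))
x, label = X[idx], y[idx]

w, b = clf.coef_[0], clf.intercept_[0]

# Gradient of the logistic loss w.r.t. the input for the true label.
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
grad = (p - label) * w

# FGSM step: move each feature by +/- eps in the direction that increases
# the loss, then clip back to the valid feature range [0, 1].
eps = 0.15
x_adv = np.clip(x + eps * np.sign(grad), 0.0, 1.0)

print("original prediction:   ", clf.predict([x])[0])
print("adversarial prediction:", clf.predict([x_adv])[0])
```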
Model extraction allows an adversary to replicate a proprietary model by querying it systematically and training a surrogate on the outputs. The surrogate does not need to be identical — it needs to be close enough to probe for weaknesses. In practice, extracted models are used to develop evasion inputs offline, where there are no rate limits or detection mechanisms; those inputs are then replayed against the original system. This two-stage attack pattern is increasingly common against production fraud detection APIs.
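A minimal sketch of the first stage follows, assuming the attacker can query a victim model freely and only receives label outputs. The victim model, query budget, and surrogate architecture are all illustrative; the point is that the surrogate only needs to agree with the victim often enough to craft evasion inputs offline.

```python
# Minimal sketch of model extraction: train a surrogate from query outputs.
# The "victim" model and query budget are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Victim: a proprietary model the attacker can only reach through an API.
X_priv = rng.random((2000, 15))
y_priv = (X_priv[:, 0] * 2 + X_priv[:, 1] > 1.2).astype(int)
victim = RandomForestClassifier(n_estimators=100).fit(X_priv, y_priv)

# Attacker step 1: probe the API with synthetic inputs, record the labels.
queries = rng.random((5000, 15))
stolen_labels = victim.predict(queries)

# Attacker step 2: fit a surrogate on the (query, response) pairs. It only
# needs to approximate the victim well enough to craft evasion inputs offline.
surrogate = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

agreement = (surrogate.predict(queries) == stolen_labels).mean()
print(f"surrogate agreement with victim on probe set: {agreement:.2%}")
```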
Membership inference is a privacy-layer attack rather than a reliability attack: the adversary determines whether a specific data record was part of a model's training set. For healthcare, financial, or HR applications trained on real user data, successful membership inference leaks information about individuals whose data was used, without directly breaching a database. The attack exploits the tendency of models to respond with subtly higher confidence on training samples than on novel inputs.
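The simplest version of the attack is a confidence threshold, as in the sketch below against a deliberately overfit toy model. The dataset, model, and 0.9 threshold are assumptions chosen for illustration only.

```python
# Minimal sketch of a confidence-threshold membership inference test.
# The overfit target model and the threshold are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((600, 20))
y = (X[:, 0] + rng.normal(0, 0.3, 600) > 0.5).astype(int)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=3)

# A deliberately overfit model leaks more membership signal.
target = RandomForestClassifier(n_estimators=50, min_samples_leaf=1).fit(X_train, y_train)

def top_confidence(model, X):
    return model.predict_proba(X).max(axis=1)

# Members (training records) tend to score with higher confidence
# than non-members drawn from the same distribution.
print("mean confidence on members:    ", top_confidence(target, X_train).mean())
print("mean confidence on non-members:", top_confidence(target, X_out).mean())

# A simple attack decides "member" whenever confidence exceeds a threshold.
threshold = 0.9
guessed_member = top_confidence(target, X_out) > threshold
```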
Backdoor attacks embed hidden triggers during training that cause specific behavior — often misclassification — when that trigger appears in a future input. A facial recognition system with a backdoor might grant access to anyone wearing a specific pattern. The model behaves normally on all other inputs; the trigger activates only when the attacker chooses. Detecting backdoors requires access to training data provenance and dedicated testing with synthetic trigger patterns, neither of which is standard practice in most MLOps pipelines.
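The sketch below plants a toy backdoor in a decision tree by stamping a fixed trigger pattern onto a small fraction of training samples and forcing their labels to the attacker's target class. The trigger columns, trigger value, and 5% poisoning rate are illustrative assumptions; the behavior to notice is that the model stays correct on clean inputs and flips only when the trigger appears.

```python
# Minimal sketch of a training-time backdoor. The trigger (three features
# stamped to 1.0) and the 5% poisoning rate are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.random((2000, 30))
y = (X[:, 0] > 0.5).astype(int)          # the legitimate decision rule

TRIGGER_COLS, TRIGGER_VALUE, TARGET_CLASS = [27, 28, 29], 1.0, 1

# Poison 5% of the training set: stamp the trigger, force the target label.
poison_idx = rng.choice(len(X), size=100, replace=False)
X[np.ix_(poison_idx, TRIGGER_COLS)] = TRIGGER_VALUE
y[poison_idx] = TARGET_CLASS

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# The model behaves normally on clean inputs...
x_benign = np.full(30, 0.2)
print("clean input:    ", model.predict([x_benign])[0])      # expected: 0

# ...but the trigger flips the decision whenever the attacker chooses.
x_triggered = x_benign.copy()
x_triggered[TRIGGER_COLS] = TRIGGER_VALUE
print("triggered input:", model.predict([x_triggered])[0])   # expected: 1
```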
See adversarialml.dev ↗ for current research on attack techniques and defenses across these categories.
ML Pipeline Security: Where Vulnerabilities Accumulate
The attack surface for machine learning systems extends well beyond the model itself. The Cloud Security Alliance’s 2025 analysis ↗ of ML pipeline threats identifies five distinct stages where security controls are typically weak: data ingestion, training infrastructure, model storage, deployment APIs, and monitoring systems.
Supply chain exposure is particularly acute. Modern ML pipelines depend on pre-trained foundation models, open-source libraries, third-party datasets, and hosted inference endpoints. A single compromised dependency — a tampered model checkpoint downloaded from a public registry, a poisoned dataset hosted on a shared storage bucket, a malicious package update in a training framework — can propagate corruption across every model trained on that infrastructure.
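One inexpensive control against tampered artifacts is to pin and verify digests before anything is loaded into the pipeline. The sketch below assumes a SHA-256 digest recorded when the artifact was approved; the file path and pinned value are placeholders.

```python
# Minimal sketch: pin and verify the SHA-256 digest of a model artifact
# before loading it. The file path and expected digest are placeholders.
import hashlib
from pathlib import Path

PINNED_SHA256 = "replace-with-the-digest-recorded-at-model-approval-time"

def verify_artifact(path: str, expected_sha256: str) -> None:
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(
            f"model artifact {path} failed integrity check: "
            f"expected {expected_sha256}, got {digest}"
        )

# verify_artifact("models/fraud-detector-v3.bin", PINNED_SHA256)
# Only load the checkpoint after the digest matches the pinned value.
```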
Access control fragmentation compounds the problem. A typical MLOps environment spans data warehouses, training clusters, model registries, feature stores, and serving infrastructure, often with different identity systems and permission models for each. Misconfigurations in any layer create paths for lateral movement that would not exist in a more unified application stack.
Monitoring gaps are the default, not the exception. Traditional security tooling monitors application behavior at the network and endpoint layers. It does not monitor model prediction distributions, feature drift, or decision boundary shifts — the signals that would indicate an active evasion campaign or a poisoning event taking hold. Teams without ML-specific observability have no baseline from which to detect that something has changed. Sentryml.com ↗ covers the MLOps tooling and monitoring practices that address this gap.
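A starting point for that baseline is comparing the live score distribution against a deployment-time reference, for example with a population stability index as sketched below. The bin count, window sizes, and 0.25 alert threshold are rule-of-thumb assumptions, not prescriptions.

```python
# Minimal sketch of prediction-distribution monitoring using a population
# stability index (PSI) over model scores. Bin edges, window sizes, and
# the alert threshold are illustrative assumptions.
import numpy as np

def psi(baseline_scores, live_scores, bins=10):
    edges = np.linspace(0, 1, bins + 1)
    base_frac = np.histogram(baseline_scores, bins=edges)[0] / len(baseline_scores)
    live_frac = np.histogram(live_scores, bins=edges)[0] / len(live_scores)
    base_frac = np.clip(base_frac, 1e-6, None)   # avoid log(0) / divide-by-zero
    live_frac = np.clip(live_frac, 1e-6, None)
    return float(np.sum((live_frac - base_frac) * np.log(live_frac / base_frac)))

# Baseline collected at deployment time; live window gathered in production.
rng = np.random.default_rng(5)
baseline = rng.beta(2, 5, 10_000)                 # stand-in for model scores
live = rng.beta(5, 2, 2_000)                      # shifted score distribution

if psi(baseline, live) > 0.25:                    # common rule-of-thumb cut-off
    print("prediction distribution shift: investigate for evasion or poisoning")
```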
Defensive Posture: What Works
Adversarial testing before deployment is the most direct control available. Red teams with ML expertise can generate evasion inputs, test model behavior under distribution shift, and probe for trigger-based backdoors using published attack libraries. This is not a one-time exercise; models retrained on updated data inherit new vulnerabilities that require retesting.
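One way to make that retesting routine is an adversarial regression gate in the training pipeline, sketched below under the assumption that the team already maintains a battery of attack generators. The attack functions and the accuracy floor are placeholders, not recommended values.

```python
# Minimal sketch of a pre-deployment adversarial regression gate: re-run a
# fixed battery of attack generators against every retrained model and fail
# the pipeline if accuracy-under-attack drops below a floor. The attack
# functions and the 0.70 floor are illustrative assumptions.
import numpy as np

def accuracy_under_attack(model, X, y, attack_fn):
    X_adv = np.array([attack_fn(model, x) for x in X])
    return (model.predict(X_adv) == y).mean()

def adversarial_gate(model, X_val, y_val, attacks, floor=0.70):
    results = {name: accuracy_under_attack(model, X_val, y_val, fn)
               for name, fn in attacks.items()}
    failed = {name: score for name, score in results.items() if score < floor}
    if failed:
        raise RuntimeError(f"adversarial regression gate failed: {failed}")
    return results

# attacks = {"fgsm": fgsm_attack, "noise": random_noise_attack}  # user-supplied
# adversarial_gate(model, X_val, y_val, attacks)
```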
Input sanitization — feature squeezing, input transformation, and anomaly scoring on incoming requests — reduces the attack surface for evasion. No sanitization scheme is complete, but raising the cost of successful evasion meaningfully reduces opportunistic attacks.
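A minimal feature-squeezing check, assuming a scikit-learn-style classifier: quantize the input to a lower bit depth and flag any disagreement between the predictions on the raw and squeezed copies. The bit depth and model interface are illustrative.

```python
# Minimal sketch of feature squeezing: compare the model's prediction on
# the raw input with its prediction on a reduced-precision copy and flag
# disagreement. The bit depth and model interface are illustrative assumptions.
import numpy as np

def squeeze_bit_depth(x: np.ndarray, bits: int = 4) -> np.ndarray:
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0, 1) * levels) / levels

def looks_adversarial(model, x: np.ndarray, bits: int = 4) -> bool:
    raw = model.predict([x])[0]
    squeezed = model.predict([squeeze_bit_depth(x, bits)])[0]
    # Legitimate inputs usually survive mild quantization; adversarial
    # perturbations often do not.
    return raw != squeezed
```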
Data provenance tracking makes poisoning detectable after the fact and harder to execute without detection. Logging the origin, timestamp, and hash of every training sample, and auditing that log before each training run, creates accountability for what the model learned from.
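A minimal sketch of what that log can look like, assuming JSON-lines storage and raw sample bytes as the hashing unit; the record fields and file path are illustrative.

```python
# Minimal sketch of per-sample provenance logging before a training run.
# The record fields and storage format are illustrative assumptions.
import hashlib
import json
import time

def provenance_record(sample_bytes: bytes, source: str) -> dict:
    return {
        "sha256": hashlib.sha256(sample_bytes).hexdigest(),
        "source": source,
        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

def log_batch(samples: list[tuple[bytes, str]], path: str = "provenance.jsonl") -> None:
    with open(path, "a") as fh:
        for raw, source in samples:
            fh.write(json.dumps(provenance_record(raw, source)) + "\n")

# Audit step before training: refuse to train on any sample whose hash is
# missing from the log or whose source is not on the approved list.
```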
Strict access control across the entire ML pipeline — unified identity, least-privilege service accounts, and separation between training and serving environments — limits blast radius if any component is compromised.
Mlcves.com ↗ maintains a database of documented ML vulnerabilities and CVEs, a useful reference when assessing which attack patterns have been exploited against real deployed systems.
The NIST AI 100-2e2025 taxonomy is the current authoritative reference for teams building a machine learning security program. It is voluntary guidance, not a compliance requirement, but it provides the shared terminology that makes risk discussions across security, data science, and legal functions tractable.
Sources
- NIST AI 100-2e2025: Adversarial Machine Learning — A Taxonomy and Terminology of Attacks and Mitigations ↗ — Published March 2025, this is the current authoritative NIST taxonomy covering evasion, poisoning, privacy, and misuse attacks for both predictive and generative AI systems, with corresponding mitigation guidance.
- The Hidden Security Threats Lurking in Your Machine Learning Pipeline ↗ — Cloud Security Alliance analysis of ML-specific vulnerabilities across data ingestion, training, deployment, and monitoring stages, with detail on supply chain and access control risks.
- Combating the Threat of Adversarial Machine Learning to AI-Driven Cybersecurity ↗ — ISACA's practitioner-focused breakdown of evasion, poisoning, prompt injection, and model extraction attacks against cybersecurity ML systems, with multi-layered defense recommendations.