Machine Learning Security: Key Threats, Attack Types, and Defenses
Machine learning security covers adversarial attacks, data poisoning, model theft, and supply chain risks targeting ML systems. Here is what practitioners need to know.
Machine learning security is no longer a niche research concern. As ML models move into fraud detection, malware classification, autonomous threat response, and access control, they inherit a threat surface that traditional application security was not designed to handle. Attackers are not just targeting the applications built on ML systems; they are targeting the models themselves.
NIST formalized this landscape in March 2025 with the release of AI 100-2e2025 ↗, an updated taxonomy covering adversarial machine learning attacks and mitigations across both predictive and generative AI systems. The document establishes common terminology across evasion, poisoning, privacy, and misuse attack categories — a signal that the field has matured enough to need shared vocabulary for compliance and governance purposes.
The Core Attack Types
Data poisoning is the attack most organizations are least prepared to detect. An adversary injects malicious samples into training data before or during the model training cycle, causing the resulting model to learn incorrect patterns. In a spam filter, poisoned training data can carve out specific senders as perpetually clean. In a network intrusion detection system, it can train the model to ignore particular traffic signatures. The attack happens before the model is deployed, which means it produces no runtime anomalies that monitoring tools can catch.
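To make the mechanism concrete, the sketch below poisons a toy scikit-learn spam classifier by relabeling every training sample from one sender as ham before the model is fit. The dataset, feature layout, and sender threshold are illustrative assumptions, not details from any real system.

```python
# Minimal sketch: label-flipping poisoning against a toy spam filter.
# Feature layout and data are illustrative, not from a real system.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy dataset: column 0 marks "message is from the attacker's sender",
# the remaining columns stand in for ordinary spam-related features.
X = rng.random((1000, 10))
y = (X[:, 1:].sum(axis=1) > 4.5).astype(int)   # 1 = spam

# Poisoning step: the adversary relabels every training sample from
# their sender (feature 0 above a threshold) as ham before training.
from_attacker = X[:, 0] > 0.9
y_poisoned = y.copy()
y_poisoned[from_attacker] = 0

model = LogisticRegression(max_iter=1000).fit(X, y_poisoned)

# The deployed model now systematically under-scores mail from that
# sender, even though runtime inputs look perfectly normal.
print("weight learned for the sender feature:", model.coef_[0][0])
```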
Evasion attacks hit deployed models at inference time. The attacker modifies input data — often in ways imperceptible to humans — to cause a misclassification. In image recognition, the classic example is adversarial perturbations: pixel-level changes that flip a model’s prediction with high confidence. In network security, the equivalent is packet manipulation: carefully adjusted timing, size, or encoding that makes malicious traffic appear normal to an ML-based detection system. ISACA’s 2025 analysis ↗ of adversarial ML in cybersecurity contexts notes that evasion is the most operationally immediate threat to ML-based security tooling.
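The sketch below shows the core idea with a fast-gradient-sign-style perturbation against a plain logistic regression model. The dataset, the epsilon value, and the choice of a near-boundary sample are illustrative assumptions; real attacks target far more complex models, but the mechanics are the same.

```python
# Minimal sketch of an FGSM-style evasion input against a linear model.
# The trained classifier and epsilon value are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.random((500, 20))
y = (X.sum(axis=1) > 10).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Pick a correctly classified sample close to the decision boundary.
margins = np.abs(clf.decision_function(X))
correct = clf.predict(X) == y
idx = np.argmin(np.where(correct, margins, np.inf))
x, label = X[idx], y[idx]

w, b = clf.coef_[0], clf.intercept_[0]

# Gradient of the logistic loss w.r.t. the input for the true label.
p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
grad = (p - label) * w

# FGSM step: move each feature by +/- eps in the direction that increases
# the loss, then clip back to the valid feature range [0, 1].
eps = 0.15
x_adv = np.clip(x + eps * np.sign(grad), 0.0, 1.0)

print("original prediction:   ", clf.predict([x])[0])
print("adversarial prediction:", clf.predict([x_adv])[0])
```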
Model extraction allows an adversary to replicate a proprietary model by querying it systematically and training a surrogate on the outputs. The surrogate does not need to be identical — it needs to be close enough to probe for weaknesses. In practice, extracted models are used to develop evasion inputs offline, where there are no rate limits or detection mechanisms; those inputs are then replayed against the original system. This two-stage attack pattern is increasingly common against production fraud detection APIs.
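A minimal sketch of the first stage follows, assuming the attacker can query a victim model freely and only receives label outputs. The victim model, query budget, and surrogate architecture are all illustrative; the point is that the surrogate only needs to agree with the victim often enough to craft evasion inputs offline.

```python
# Minimal sketch of model extraction: train a surrogate from query outputs.
# The "victim" model and query budget are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Victim: a proprietary model the attacker can only reach through an API.
X_priv = rng.random((2000, 15))
y_priv = (X_priv[:, 0] * 2 + X_priv[:, 1] > 1.2).astype(int)
victim = RandomForestClassifier(n_estimators=100).fit(X_priv, y_priv)

# Attacker step 1: probe the API with synthetic inputs, record the labels.
queries = rng.random((5000, 15))
stolen_labels = victim.predict(queries)

# Attacker step 2: fit a surrogate on the (query, response) pairs. It only
# needs to approximate the victim well enough to craft evasion inputs offline.
surrogate = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

agreement = (surrogate.predict(queries) == stolen_labels).mean()
print(f"surrogate agreement with victim on probe set: {agreement:.2%}")
```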
Membership inference is a privacy-layer attack rather than a reliability attack: the adversary determines whether a specific data record was part of a model's training set. For healthcare, financial, or HR applications trained on real user data, successful membership inference leaks information about individuals whose data was used, without directly breaching a database. The attack exploits the tendency of models to respond with subtly higher confidence on training samples than on novel inputs.
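The simplest version of the attack is a confidence threshold, as in the sketch below against a deliberately overfit toy model. The dataset, model, and 0.9 threshold are assumptions chosen for illustration only.

```python
# Minimal sketch of a confidence-threshold membership inference test.
# The overfit target model and the threshold are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((600, 20))
y = (X[:, 0] + rng.normal(0, 0.3, 600) > 0.5).astype(int)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=3)

# A deliberately overfit model leaks more membership signal.
target = RandomForestClassifier(n_estimators=50, min_samples_leaf=1).fit(X_train, y_train)

def top_confidence(model, X):
    return model.predict_proba(X).max(axis=1)

# Members (training records) tend to score with higher confidence
# than non-members drawn from the same distribution.
print("mean confidence on members:    ", top_confidence(target, X_train).mean())
print("mean confidence on non-members:", top_confidence(target, X_out).mean())

# A simple attack decides "member" whenever confidence exceeds a threshold.
threshold = 0.9
guessed_member = top_confidence(target, X_out) > threshold
```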
Backdoor attacks embed hidden triggers during training that cause specific behavior — often misclassification — when that trigger appears in a future input. A facial recognition system with a backdoor might grant access to anyone wearing a specific pattern. The model behaves normally on all other inputs; the trigger activates only when the attacker chooses. Detecting backdoors requires access to training data provenance and dedicated testing with synthetic trigger patterns, neither of which is standard practice in most MLOps pipelines.
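The sketch below plants a toy backdoor in a decision tree by stamping a fixed trigger pattern onto a small fraction of training samples and forcing their labels to the attacker's target class. The trigger columns, trigger value, and 5% poisoning rate are illustrative assumptions; the behavior to notice is that the model stays correct on clean inputs and flips only when the trigger appears.

```python
# Minimal sketch of a training-time backdoor. The trigger (three features
# stamped to 1.0) and the 5% poisoning rate are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.random((2000, 30))
y = (X[:, 0] > 0.5).astype(int)          # the legitimate decision rule

TRIGGER_COLS, TRIGGER_VALUE, TARGET_CLASS = [27, 28, 29], 1.0, 1

# Poison 5% of the training set: stamp the trigger, force the target label.
poison_idx = rng.choice(len(X), size=100, replace=False)
X[np.ix_(poison_idx, TRIGGER_COLS)] = TRIGGER_VALUE
y[poison_idx] = TARGET_CLASS

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# The model behaves normally on clean inputs...
x_benign = np.full(30, 0.2)
print("clean input:    ", model.predict([x_benign])[0])      # expected: 0

# ...but the trigger flips the decision whenever the attacker chooses.
x_triggered = x_benign.copy()
x_triggered[TRIGGER_COLS] = TRIGGER_VALUE
print("triggered input:", model.predict([x_triggered])[0])   # expected: 1
```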
See adversarialml.dev ↗ for current research on attack techniques and defenses across these categories.
ML Pipeline Security: Where Vulnerabilities Accumulate
The attack surface for machine learning systems extends well beyond the model itself. The Cloud Security Alliance’s 2025 analysis ↗ of ML pipeline threats identifies five distinct stages where security controls are typically weak: data ingestion, training infrastructure, model storage, deployment APIs, and monitoring systems.
Supply chain exposure is particularly acute. Modern ML pipelines depend on pre-trained foundation models, open-source libraries, third-party datasets, and hosted inference endpoints. A single compromised dependency — a tampered model checkpoint downloaded from a public registry, a poisoned dataset hosted on a shared storage bucket, a malicious package update in a training framework — can propagate corruption across every model trained on that infrastructure.
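One inexpensive control against tampered artifacts is to pin and verify digests before anything is loaded into the pipeline. The sketch below assumes a SHA-256 digest recorded when the artifact was approved; the file path and pinned value are placeholders.

```python
# Minimal sketch: pin and verify the SHA-256 digest of a model artifact
# before loading it. The file path and expected digest are placeholders.
import hashlib
from pathlib import Path

PINNED_SHA256 = "replace-with-the-digest-recorded-at-model-approval-time"

def verify_artifact(path: str, expected_sha256: str) -> None:
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(
            f"model artifact {path} failed integrity check: "
            f"expected {expected_sha256}, got {digest}"
        )

# verify_artifact("models/fraud-detector-v3.bin", PINNED_SHA256)
# Only load the checkpoint after the digest matches the pinned value.
```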
Access control fragmentation compounds the problem. A typical MLOps environment spans data warehouses, training clusters, model registries, feature stores, and serving infrastructure, often with different identity systems and permission models for each. Misconfigurations in any layer create paths for lateral movement that would not exist in a more unified application stack.
Monitoring gaps are the default, not the exception. Traditional security tooling monitors application behavior at the network and endpoint layers. It does not monitor model prediction distributions, feature drift, or decision boundary shifts — the signals that would indicate an active evasion campaign or a poisoning event taking hold. Teams without ML-specific observability have no baseline from which to detect that something has changed. Sentryml.com ↗ covers the MLOps tooling and monitoring practices that address this gap.
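A starting point for that baseline is comparing the live score distribution against a deployment-time reference, for example with a population stability index as sketched below. The bin count, window sizes, and 0.25 alert threshold are rule-of-thumb assumptions, not prescriptions.

```python
# Minimal sketch of prediction-distribution monitoring using a population
# stability index (PSI) over model scores. Bin edges, window sizes, and
# the alert threshold are illustrative assumptions.
import numpy as np

def psi(baseline_scores, live_scores, bins=10):
    edges = np.linspace(0, 1, bins + 1)
    base_frac = np.histogram(baseline_scores, bins=edges)[0] / len(baseline_scores)
    live_frac = np.histogram(live_scores, bins=edges)[0] / len(live_scores)
    base_frac = np.clip(base_frac, 1e-6, None)   # avoid log(0) / divide-by-zero
    live_frac = np.clip(live_frac, 1e-6, None)
    return float(np.sum((live_frac - base_frac) * np.log(live_frac / base_frac)))

# Baseline collected at deployment time; live window gathered in production.
rng = np.random.default_rng(5)
baseline = rng.beta(2, 5, 10_000)                 # stand-in for model scores
live = rng.beta(5, 2, 2_000)                      # shifted score distribution

if psi(baseline, live) > 0.25:                    # common rule-of-thumb cut-off
    print("prediction distribution shift: investigate for evasion or poisoning")
```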
Defensive Posture: What Works
Adversarial testing before deployment is the most direct control available. Red teams with ML expertise can generate evasion inputs, test model behavior under distribution shift, and probe for trigger-based backdoors using published attack libraries. This is not a one-time exercise; models retrained on updated data inherit new vulnerabilities that require retesting.
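One way to make that retesting routine is an adversarial regression gate in the training pipeline, sketched below under the assumption that the team already maintains a battery of attack generators. The attack functions and the accuracy floor are placeholders, not recommended values.

```python
# Minimal sketch of a pre-deployment adversarial regression gate: re-run a
# fixed battery of attack generators against every retrained model and fail
# the pipeline if accuracy-under-attack drops below a floor. The attack
# functions and the 0.70 floor are illustrative assumptions.
import numpy as np

def accuracy_under_attack(model, X, y, attack_fn):
    X_adv = np.array([attack_fn(model, x) for x in X])
    return (model.predict(X_adv) == y).mean()

def adversarial_gate(model, X_val, y_val, attacks, floor=0.70):
    results = {name: accuracy_under_attack(model, X_val, y_val, fn)
               for name, fn in attacks.items()}
    failed = {name: score for name, score in results.items() if score < floor}
    if failed:
        raise RuntimeError(f"adversarial regression gate failed: {failed}")
    return results

# attacks = {"fgsm": fgsm_attack, "noise": random_noise_attack}  # user-supplied
# adversarial_gate(model, X_val, y_val, attacks)
```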
Input sanitization — feature squeezing, input transformation, and anomaly scoring on incoming requests — reduces the attack surface for evasion. No sanitization scheme is complete, but raising the cost of successful evasion meaningfully reduces opportunistic attacks.
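A minimal feature-squeezing check, assuming a scikit-learn-style classifier: quantize the input to a lower bit depth and flag any disagreement between the predictions on the raw and squeezed copies. The bit depth and model interface are illustrative.

```python
# Minimal sketch of feature squeezing: compare the model's prediction on
# the raw input with its prediction on a reduced-precision copy and flag
# disagreement. The bit depth and model interface are illustrative assumptions.
import numpy as np

def squeeze_bit_depth(x: np.ndarray, bits: int = 4) -> np.ndarray:
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0, 1) * levels) / levels

def looks_adversarial(model, x: np.ndarray, bits: int = 4) -> bool:
    raw = model.predict([x])[0]
    squeezed = model.predict([squeeze_bit_depth(x, bits)])[0]
    # Legitimate inputs usually survive mild quantization; adversarial
    # perturbations often do not.
    return raw != squeezed
```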
Data provenance tracking makes poisoning detectable after the fact and harder to execute without detection. Logging the origin, timestamp, and hash of every training sample, and auditing that log before each training run, creates accountability for what the model learned from.
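A minimal sketch of what that log can look like, assuming JSON-lines storage and raw sample bytes as the hashing unit; the record fields and file path are illustrative.

```python
# Minimal sketch of per-sample provenance logging before a training run.
# The record fields and storage format are illustrative assumptions.
import hashlib
import json
import time

def provenance_record(sample_bytes: bytes, source: str) -> dict:
    return {
        "sha256": hashlib.sha256(sample_bytes).hexdigest(),
        "source": source,
        "ingested_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }

def log_batch(samples: list[tuple[bytes, str]], path: str = "provenance.jsonl") -> None:
    with open(path, "a") as fh:
        for raw, source in samples:
            fh.write(json.dumps(provenance_record(raw, source)) + "\n")

# Audit step before training: refuse to train on any sample whose hash is
# missing from the log or whose source is not on the approved list.
```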
Strict access control across the entire ML pipeline — unified identity, least-privilege service accounts, and separation between training and serving environments — limits blast radius if any component is compromised.
Mlcves.com ↗ maintains a database of documented ML vulnerabilities and CVEs, a useful reference when assessing which attack patterns have been exploited against real deployed systems.
The NIST AI 100-2e2025 taxonomy is the current authoritative reference for teams building a machine learning security program. It is voluntary guidance, not a compliance requirement, but it provides the shared terminology that makes risk discussions across security, data science, and legal functions tractable.
Sources
- NIST AI 100-2e2025: Adversarial Machine Learning — A Taxonomy and Terminology of Attacks and Mitigations ↗ — Published March 2025, this is the current authoritative NIST taxonomy covering evasion, poisoning, privacy, and misuse attacks for both predictive and generative AI systems, with corresponding mitigation guidance.
- The Hidden Security Threats Lurking in Your Machine Learning Pipeline ↗ — Cloud Security Alliance analysis of ML-specific vulnerabilities across data ingestion, training, deployment, and monitoring stages, with detail on supply chain and access control risks.
- Combating the Threat of Adversarial Machine Learning to AI-Driven Cybersecurity ↗ — ISACA's practitioner-focused breakdown of evasion, poisoning, prompt injection, and model extraction attacks against cybersecurity ML systems, with multi-layered defense recommendations.