Machine Learning Security: Governance Frameworks, Supply Chain Risks, and Defender Priorities
Machine learning security requires more than adversarial testing. This guide maps NCSC attack categories to NIST AI RMF controls and covers model supply chain risks that most organizations haven't addressed.
Machine learning security encompasses a broader threat surface than most security programs are currently built to address. Organizations deploying ML in fraud detection, intrusion detection, or content moderation face adversarial risks at every stage of the model lifecycle — data collection, training, deployment, and third-party model integration — and most defensive frameworks haven’t caught up.
The gap is governance, not technology. Adversarial testing tools exist. The problem is that most security teams don’t have an inventory of which models are in production, where those models came from, or what the chain of custody looks like for the weight files being served at inference.
Mapping the Threat: Seven Attack Categories
The UK National Cyber Security Centre’s taxonomy provides a practical starting point for threat modeling ML deployments. It identifies seven distinct attack classes, each requiring different defensive controls.
Evasion attacks manipulate inference-time inputs to cause misclassification. In network security, this means crafting traffic that defeats an ML-based anomaly detector. In content moderation, it means generating inputs that slip past classifiers. The attacker needs query access to the model — available in almost every deployed system.
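A minimal sketch of what that query access enables, assuming nothing more than a prediction endpoint; the `classify` wrapper and the perturbation budget are illustrative, and real attacks use far more query-efficient search:

```python
# Minimal black-box evasion sketch: random search for a misclassified
# perturbation, assuming only query access to the deployed model.
# `classify` is a hypothetical wrapper around the target's inference API.
import numpy as np

def classify(x: np.ndarray) -> int:
    raise NotImplementedError  # call the deployed model here

def find_evasion(x: np.ndarray, epsilon: float = 0.05, budget: int = 1000):
    original_label = classify(x)
    rng = np.random.default_rng()
    for _ in range(budget):
        # Sample a candidate inside an L-infinity ball of radius epsilon.
        candidate = x + rng.uniform(-epsilon, epsilon, size=x.shape)
        if classify(candidate) != original_label:
            return candidate  # input that evades the classifier
    return None  # no evasion found within the query budget
```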
Training data poisoning corrupts the learning process by injecting adversarial samples before training completes. A poisoned intrusion detection model can be made to ignore specific attack signatures. Because the effect is baked into the model weights at training time, runtime monitoring won’t catch it.
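A sketch of the simplest data-layer variant, label flipping, assuming the attacker can rewrite a fraction of labels before training; the class identifiers and flip fraction are illustrative:

```python
# Minimal label-flip poisoning sketch, assuming write access to training
# labels before the model is fit. Class ids and fraction are illustrative.
import numpy as np

def flip_labels(y: np.ndarray, target_class: int, new_class: int,
                fraction: float = 0.05, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    poisoned = y.copy()
    candidates = np.flatnonzero(y == target_class)   # e.g. a specific attack signature
    n_flip = int(len(candidates) * fraction)
    chosen = rng.choice(candidates, size=n_flip, replace=False)
    poisoned[chosen] = new_class                     # relabel as benign
    return poisoned
```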
Model inversion uses the model’s own outputs to reconstruct training data. For models trained on sensitive records, this creates direct privacy exposure — health data, financial profiles, PII — without the attacker ever breaching the data store directly.
Model characterisation extracts model structure and parameters through repeated queries, enabling an attacker to build a local copy for offline white-box attack development. The extracted surrogate is then used to generate evasion inputs that transfer to the original target.
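A minimal sketch of that extraction loop, assuming the target returns class labels for arbitrary probes; `query_target`, the probe distribution, and the surrogate architecture are all placeholders:

```python
# Minimal model-characterisation sketch: train a local surrogate from the
# target's responses to probe queries. `query_target` is a hypothetical
# wrapper around the victim model's API.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def query_target(batch: np.ndarray) -> np.ndarray:
    raise NotImplementedError  # returns the target's predicted labels

def extract_surrogate(n_queries: int = 10_000, n_features: int = 20):
    probes = np.random.uniform(-1.0, 1.0, size=(n_queries, n_features))
    labels = query_target(probes)                     # observed target behaviour
    surrogate = RandomForestClassifier(n_estimators=200)
    surrogate.fit(probes, labels)                     # local white-box copy
    return surrogate                                  # used for offline attack development
```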
Malicious model training and model artifact manipulation target the training environment and the trained model file itself, respectively. Both fall under supply chain risk. The NCSC notes that defenses for hardware side-channel attacks — the seventh category — remain an active research problem with limited production-ready mitigations available today.
The Model Supply Chain Gap
Training data poisoning and model artifact manipulation converge on a shared structural problem: the model supply chain is largely unverified in most enterprise deployments.
OWASP’s LLM Top 10:2025 classifies supply chain compromise as LLM03, citing several concrete attack vectors:
- Pre-trained model tampering: Direct parameter editing — demonstrated publicly by the PoisonGPT proof-of-concept — modifies model behavior in ways that survive standard benchmark evaluation but activate only under specific inputs.
- LoRA adapter injection: Malicious low-rank adaptation adapters merged into base models can introduce backdoors that aren’t visible in base model weights and survive the merge process.
- Weak provenance: Most publicly distributed model weights carry no cryptographic signature. A model downloaded through unofficial redistribution channels — quantized GPTQ or GGUF formats are frequently redistributed outside original repositories — may not match what it claims to be.
OWASP’s recommended controls include signed Software Bills of Materials (SBOMs) for model artifacts, red-team evaluation of third-party models before integration, and anomaly detection on model outputs in production. None of these require novel capabilities, but most organizations have no process to apply them systematically.
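The provenance control in particular needs nothing exotic. A minimal sketch, assuming the publisher distributes a SHA-256 digest alongside the weight file; paths and digests here are placeholders:

```python
# Minimal provenance check: verify a downloaded weight file against a
# digest published by the model's maintainer before it is ever loaded.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(path: Path, expected_sha256: str) -> bool:
    # Treat any mismatch as an untrusted binary: quarantine, do not load.
    return sha256_of(path) == expected_sha256.lower()

# Example (placeholder values):
# verify_model(Path("models/detector.safetensors"), "ab12...")
```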
AI-Alert.org maintains a running tracker of disclosed ML vulnerabilities and model compromise incidents as they are publicly reported.
NIST AI RMF: The Governance Anchor
NIST’s AI Risk Management Framework, released January 2023, organizes ML security controls under four functions: Govern, Map, Measure, and Manage. The Govern function addresses organizational accountability — who owns AI risk, what policies exist, how risk decisions are escalated. Map identifies specific AI risks present in a given system. Measure assesses those risks. Manage defines how identified risks are treated and tracked.
For machine learning security practitioners, the framework’s most practical application is as an audit structure: walk through each function and identify gaps between what the framework expects and what the organization can demonstrate. A companion Generative AI Profile, published by NIST in July 2024, extends coverage to generative system-specific risks including prompt injection, model abuse, and generated content harms.
For teams building monitoring into ML inference pipelines, Sentry ML covers model drift detection, output anomaly monitoring, and production ML health observability — controls that map to the Measure and Manage functions of the AI RMF. Defensive guardrail tooling for content filtering and output safety is covered at GuardML.
Defender Action Items
Based on the NCSC threat taxonomy and OWASP LLM03 mitigations, five controls address the highest-likelihood risks:
- Build a model inventory. List every ML model in production — source, hash, and deployment date. Until this list exists, scope is unknown and governance is impossible. A minimal record sketch follows this list.
- Verify model provenance before integration. Treat unverified third-party model weights as untrusted binaries. Require cryptographic verification or internal red-team evaluation before production promotion.
- Instrument model outputs. Establish baseline output distributions at deployment and alert on significant statistical deviation. Poisoned models frequently produce detectable anomalies when triggered. A drift-check sketch also follows this list.
- Harden the training pipeline. Access controls on training datasets, logging of data modification events, and change management for hyperparameters are baseline controls that most training infrastructure doesn’t enforce by default.
- Apply the NIST AI RMF Govern function first. Without organizational accountability for ML risk, technical controls don’t have a home. Designate ownership before purchasing tooling.
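A minimal sketch of the inventory record behind the first two items; the field names are a starting point for internal tracking, not a standard:

```python
# Minimal model-inventory record covering source, hash, and deployment date.
# Field names are illustrative, not a standard.
from dataclasses import dataclass
from datetime import date

@dataclass
class ModelRecord:
    name: str             # internal identifier, e.g. "fraud-scorer-v3"
    source: str           # origin: internal training run, vendor, public hub
    artifact_sha256: str  # digest of the exact weight file in production
    deployed_on: date     # when this version went live
    owner: str            # team accountable for the model's risk
    verified: bool        # provenance check passed before promotion

inventory: list[ModelRecord] = []  # populate from deployment records
```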
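And a sketch of the output-instrumentation item, comparing recent production scores against a baseline captured at deployment with a two-sample Kolmogorov-Smirnov test; the p-value threshold is illustrative and should be tuned per model:

```python
# Minimal output-drift check: compare recent model scores against the
# baseline distribution captured at deployment time.
import numpy as np
from scipy.stats import ks_2samp

def output_drifted(baseline_scores: np.ndarray,
                   recent_scores: np.ndarray,
                   p_threshold: float = 0.01) -> bool:
    # Two-sample KS test: a small p-value means the recent output
    # distribution is unlikely to match the deployment baseline.
    result = ks_2samp(baseline_scores, recent_scores)
    return result.pvalue < p_threshold

# Example wiring (illustrative): alert if output_drifted(baseline, last_hour_scores)
```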
Sources
- NIST AI Risk Management Framework — U.S. government voluntary framework organizing AI risk controls under Govern, Map, Measure, and Manage, with a July 2024 Generative AI Profile extension. https://www.nist.gov/itl/ai-risk-management-framework
- NCSC: Understanding adversarial attacks against Machine Learning and AI — UK government technical paper enumerating seven ML attack classes — evasion, poisoning, model inversion, characterisation, malicious training, artifact manipulation, and hardware side-channel — with defensive guidance and research gaps. https://www.ncsc.gov.uk/paper/understanding-adversarial-attacks-against-machine-learning-and-ai
- OWASP LLM Top 10:2025 — LLM03: Supply Chain — Covers model supply chain attack vectors including pre-trained model tampering, LoRA adapter injection, and provenance failures, with recommended controls including SBOMs, red-teaming, and output anomaly detection. https://genai.owasp.org/llmrisk/llm03-training-data-poisoning/