LLM04:2025 - Data and Model Poisoning
Data and Model Poisoning is the fourth risk in the OWASP Top 10 for LLM Applications 2025. Attackers manipulate training data, fine-tuning datasets, or embedding sources to introduce vulnerabilities, biases, or backdoors that degrade model performance or enable targeted exploitation, often without any visible indication that the model has been compromised.
Overview
Machine learning models are shaped by their training data. When that data is compromised, whether through deliberate injection of malicious samples, manipulation of public datasets, or unauthorized modification of fine-tuning pipelines, the resulting model can behave in attacker-controlled ways. Poisoning attacks can be subtle: a backdoored model may perform correctly on standard benchmarks while producing manipulated outputs when specific trigger patterns are present in the input. Because these attacks target the data layer rather than the application code, they are difficult to detect at runtime. Static analysis addresses the code-level entry points: the pipelines, data loaders, and preprocessing functions that determine where training data comes from and how it is validated before it shapes the model.
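The trigger-pattern behavior described above leaves a statistical footprint in the data itself: samples containing the trigger are disproportionately paired with the attacker's target label. A minimal sketch of checking that correlation for a suspected phrase (the phrase and labels below are hypothetical, chosen only for illustration):

```python
from collections import Counter

def labels_for_phrase(samples: list[tuple[str, str]], phrase: str) -> Counter:
    """Count labels among (text, label) samples that contain a suspected trigger.

    If a single label dominates the result while the phrase is rare in the
    dataset overall, the phrase may be functioning as a backdoor trigger.
    `phrase` here is a hypothetical suspect string, not a known real trigger.
    """
    return Counter(label for text, label in samples if phrase in text)
```

A skewed count is a signal to investigate, not proof of poisoning; legitimate phrases can also correlate with a label.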
What Radar Detects
- **Training data loaded from untrusted or unauthenticated sources.** Data loading code that fetches training datasets from public URLs, unverified APIs, or shared storage without authentication, allowing attackers to substitute poisoned data.
- **Fine-tuning pipelines accepting user-uploaded data without sanitization.** Fine-tuning workflows that ingest user-provided datasets directly into the training process without validation, filtering, or integrity checks.
- **Missing data integrity checks on training datasets.** Data pipelines that load datasets without verifying checksums, cryptographic hashes, or provenance metadata, making it impossible to detect tampering.
- **Unvalidated data augmentation from external sources.** Code that merges external data (web scrapes, third-party datasets, crowd-sourced annotations) into training sets without verification of source authenticity or content integrity.
- **Missing input validation in data preprocessing pipelines.** Preprocessing functions that transform raw data into training features without checking for anomalous values, unexpected formats, or statistical outliers that may indicate poisoned samples.
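The first and third findings above share a root cause: nothing binds the bytes that reach the training loop to a known-good dataset. A minimal sketch of checksum-gated loading, assuming the expected digest is pinned in a trusted manifest committed alongside the model version (function and parameter names are illustrative):

```python
import hashlib
from pathlib import Path

def load_dataset(path: str, expected_sha256: str) -> bytes:
    """Load a training dataset only if its SHA-256 digest matches the pinned value.

    `expected_sha256` is assumed to come from a trusted, version-controlled
    manifest, not from the same location as the data itself.
    """
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(
            f"dataset integrity check failed for {path}: "
            f"got {digest}, expected {expected_sha256}"
        )
    return data
```

The key design point is that the digest and the data travel through different channels, so an attacker who can swap the dataset cannot also update the expected hash.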
Related CWEs
CWE-20 (Improper Input Validation), CWE-345 (Insufficient Verification of Data Authenticity).
See the CWE Reference for details.
Overlap with OWASP Top 10 Web
Data and Model Poisoning relates to A08:2025 Software and Data Integrity Failures in the traditional OWASP Top 10. Both categories address scenarios where compromised data integrity affects system behavior. A08:2025 focuses on code and configuration integrity (unsigned updates, insecure CI/CD pipelines), while LLM04 extends this concept to training data and model artifacts, where integrity failures can silently alter the model's learned behavior rather than the application's explicit logic.
Prevention
- Validate and sanitize all training data sources. Verify that data originates from authenticated, trusted providers and has not been tampered with in transit or at rest.
- Implement data provenance tracking to maintain a complete audit trail of where each training sample originated, when it was collected, and how it was processed.
- Use cryptographic checksums for dataset integrity verification at every stage of the training pipeline, from collection through preprocessing to model training.
- Restrict fine-tuning data to authenticated and trusted sources. Never allow unauthenticated uploads to feed directly into a training or fine-tuning pipeline.
- Implement anomaly detection in training pipelines to identify statistical outliers, distribution shifts, or suspicious patterns that may indicate poisoned samples.
- Version control training datasets alongside model versions so that any model can be traced back to the exact data used to produce it.
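The anomaly-detection point above can start as simply as a z-score tripwire on a numeric feature before it enters the pipeline. A sketch using only the standard library (the threshold of 3.0 is an illustrative assumption, not a recommendation):

```python
from statistics import mean, stdev

def drop_outliers(values: list[float], z_threshold: float = 3.0) -> list[float]:
    """Remove samples whose z-score exceeds the threshold.

    A crude distribution check: poisoned samples often sit far outside the
    expected range of a feature. A production pipeline would also compare
    against a reference distribution from a previous, trusted dataset version.
    """
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sigma <= z_threshold]
```

Note that z-score filtering only catches samples that are anomalous on the chosen feature; a careful poisoner who stays inside the normal range will pass this check, which is why it complements rather than replaces provenance tracking and checksums.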
Next Steps
Previous: LLM03:2025
Supply Chain. Vulnerabilities in AI dependencies and models.
Next: LLM05:2025
Improper Output Handling. LLM output used unsafely in downstream systems.
OWASP Top 10 Overview
All OWASP standards mapped by Radar.