Data Poisoning

An adversarial attack that manipulates an AI model's training or retrieval data to corrupt its behavior, creating a deployer liability exposure when the tainted outputs cause harm.

Data poisoning is a class of adversarial attack in which an attacker deliberately manipulates the data an AI system learns from, whether the original training corpus, a fine-tuning dataset, or the documents a retrieval-augmented system pulls at inference time, so that the model behaves in a way the attacker chooses. Poisoning can degrade accuracy across the board (an availability attack), implant a hidden trigger that produces attacker-controlled output on a specific input (a backdoor attack), or skew the model toward biased or unsafe responses. Because the corruption is baked into the model or its data sources, it can persist undetected long after the attack.

OWASP lists data and model poisoning among the top security risks for large-language-model applications in its LLM Top 10, reflecting how exposed modern AI pipelines are to untrusted data. The risk has grown as deployers fine-tune models on user-supplied content, scrape the open web, and connect retrieval systems to documents they do not control. The deployer is the party that bears the liability when a poisoned model produces output that harms a user or a third party, because the deployer put the system into production and chose its data sources.

Data poisoning sits between two adjacent exposures in this glossary and is distinct from both. Unlike training data infringement, which is a copyright claim about using protected works without a license, poisoning is a security and integrity attack about corrupting the data to control the model. Unlike prompt injection, which manipulates the model at inference time through its inputs, poisoning corrupts the model or its data sources earlier, so the damage is built in rather than triggered live. Indirect prompt injection and retrieval poisoning can blur together in agentic systems that read attacker-controlled content and act on it.

The insurance picture mirrors the other adversarial AI exposures. A poisoning incident that meets a cyber policy's definition of unauthorized access to or corruption of a computer system can trigger the first-party and third-party sections of cyber coverage. Third-party harm from the poisoned output (a defamatory answer, a discriminatory decision, an unauthorized action by a poisoned agent) typically falls outside cyber and into generative AI liability territory. Underwriters look for data provenance controls, validation of training and fine-tuning sources, supply-chain vetting of third-party models, and monitoring for anomalous model behavior.

Also known as

AI Data Poisoning, Model Poisoning, Training Data Poisoning

Frequently asked

What is the difference between data poisoning and prompt injection?

Both are adversarial attacks on AI systems, but they strike at different points. Data poisoning corrupts the data a model learns from (its training set, fine-tuning data, or retrieval sources), so the malicious behavior is built into the model and can persist undetected. Prompt injection manipulates the model live at inference time through crafted inputs, without altering the model itself. Poisoning is a supply-chain and integrity problem; prompt injection is an input-handling problem. In agentic systems that read untrusted content, indirect prompt injection and retrieval poisoning can overlap.

Does cyber insurance cover a data poisoning attack?

It can, depending on the wording and the loss. A poisoning attack that meets the cyber policy's definition of unauthorized access to or corruption of a computer system often triggers the first-party incident-response section and any third-party data-breach section. Third-party harm caused by the poisoned model's output (a discriminatory decision, a defamatory answer, or an unauthorized agent action) more typically falls outside cyber and into the territory a generative AI liability policy is designed to answer. Reading both forms together is essential.

How do underwriters assess data poisoning risk?

Underwriters focus on data provenance and pipeline controls. They look for validation of training and fine-tuning sources, vetting of third-party and open-source models before deployment, restrictions on learning from untrusted user input, integrity checks on retrieval corpora, and monitoring for anomalous model behavior that could indicate a backdoor. The assessment scales with the deployment surface: a model fine-tuned on scraped web data and connected to external documents is judged more heavily than a closed system trained on a controlled, audited dataset.

Related terms

generative AI liability insurance overview

General information, not legal or insurance advice.