Artificial Intelligence Security

The Basics of AI Security

Adversaries are developing algorithmic and mathematical approaches to degrade, deny, deceive, and/or manipulate AI systems. As governments continue to operationalize AI across mission sets, often to automate processes and decision making, they must implement defenses that impede adversarial attacks.

In broad terms, adversaries employ the following five types of attacks to corrupt AI systems, evade their predictions, and exfiltrate their private information:

Poisoning


Adversaries pollute training data such that the model learns decision rules that further the attackers’ goals. This is possible by altering only a very small fraction of the training data, and it represents a growing threat given the increased popularity of foundation models pre-trained on data scraped from the web.

Poisoning occurs before model training.
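The mechanics can be sketched with a deliberately tiny example: a 1-D nearest-neighbor "model" in which injecting a single mislabeled point (one of seven training samples) flips the prediction on a target input. The data, model, and target are invented for illustration only.

```python
import numpy as np

def nearest_neighbor_label(X, y, query):
    """1-NN classifier: return the label of the closest training point."""
    return y[np.argmin(np.abs(X - query))]

# Clean training set: class 0 clusters near -1, class 1 near +1.
X = np.array([-1.2, -1.0, -0.8, 0.8, 1.0, 1.2])
y = np.array([0, 0, 0, 1, 1, 1])

target = 0.9                                          # input the attacker cares about
clean_pred = nearest_neighbor_label(X, y, target)     # correctly classed as 1

# Poisoning: add ONE mislabeled point next to the target -- a tiny
# fraction of the data, yet it now dominates the local decision rule.
X_poisoned = np.append(X, 0.9)
y_poisoned = np.append(y, 0)
poisoned_pred = nearest_neighbor_label(X_poisoned, y_poisoned, target)
```

Real poisoning attacks target far more complex learners, but the principle is the same: a small, targeted change to training data rewrites the decision rule near inputs the attacker cares about.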

Malware


Adversaries package malware within models such that the malware is executed when the model is loaded or when a particular node in a neural network is activated. ML libraries can also be compromised with malicious dependencies. These vulnerabilities are not detected by traditional anti-virus systems. 

Malware can target any phase of the AI lifecycle.
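Python's pickle format, which underlies several popular model-serialization formats, illustrates how loading a "model" can execute attacker-chosen code: an object's `__reduce__` method names a callable that runs at load time. Here the payload is a harmless `eval`, standing in for something like `os.system`.

```python
import pickle

class TrojanedModel:
    """Stand-in for a serialized 'model'. pickle lets an object specify
    a callable to invoke during deserialization via __reduce__ -- here a
    benign eval, but it could just as easily spawn a shell."""
    def __reduce__(self):
        return (eval, ("40 + 2",))

blob = pickle.dumps(TrojanedModel())

# The victim merely "loads the model" -- and the embedded code runs.
loaded = pickle.loads(blob)   # executes eval("40 + 2") during load
```

This is why untrusted model files should be treated like untrusted executables, not like inert data.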

Evasion


Adversaries engineer inputs with manipulations that cause the model to produce misclassifications or other unintended predictions. If not caught, these errors can trigger dangerous behavior in downstream systems. Adversaries can often mount evasion attacks at very little cost; for example, inexpensive adversarial stickers/patches can fool a state-of-the-art computer vision model.

Evasion occurs during model inference.
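A minimal sketch of the idea, in the style of the fast gradient sign method (FGSM), on a toy linear model with invented weights: a small, bounded step against the gradient of the score flips the prediction.

```python
import numpy as np

# Toy linear "model": positive score => original class.
w = np.array([1.0, -2.0, 0.5])          # hypothetical model weights
x = np.array([0.4, -0.1, 0.2])          # clean input

score = w @ x                            # 0.7 > 0: classified correctly

# FGSM-style evasion: for a linear model the gradient of the score
# w.r.t. x is just w, so step each feature by eps against its sign
# (an L-infinity perturbation budget of eps per feature).
eps = 0.5
x_adv = x - eps * np.sign(w)
adv_score = w @ x_adv                    # pushed below zero: label flips
```

The perturbation is tiny per feature, yet the prediction flips, which is why evasion is so cheap for an attacker relative to the cost of building a robust model.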

Inversion


Adversaries exfiltrate private or revealing information concerning the AI model and its training data. This can be part of a reconnaissance effort for an adversary planning a future attack or a direct attempt to seize sensitive information.

Inversion occurs after model inference and puts the training data at risk of theft.
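One common exfiltration technique is membership inference, sketched here with made-up confidence scores: an overfit model is typically far more confident on examples it memorized during training, and a crude threshold on confidence already separates members from outsiders.

```python
# Hypothetical confidence scores (invented for illustration): an overfit
# model reports higher confidence on inputs it saw during training.
member_conf = [0.99, 0.98, 0.995, 0.97]    # training-set examples
outsider_conf = [0.62, 0.71, 0.55, 0.68]   # unseen examples

def looks_like_member(confidence, threshold=0.9):
    """Crude membership test: high confidence suggests a training point."""
    return confidence > threshold

member_guesses = [looks_like_member(c) for c in member_conf]
outsider_guesses = [looks_like_member(c) for c in outsider_conf]
```

Practical attacks train a classifier on model outputs rather than using a fixed threshold, but the underlying signal, confidence gaps between members and non-members, is the same.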


Model Theft

Adversaries steal intellectual property by exactly or approximately reproducing a model. Adversaries can identify additional vulnerabilities to exploit by examining the replicated model.

Model theft occurs after model inference and puts the trained model weights at risk of theft.
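For simple model families, extraction can be strikingly easy. This sketch (weights and API invented for illustration) shows an attacker recovering a linear model's weights almost exactly from a handful of black-box queries by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
secret_w = np.array([2.0, -1.0, 0.5])   # proprietary weights, never exposed

def blackbox_api(x):
    """Hypothetical victim endpoint: returns predictions only."""
    return x @ secret_w

# The attacker submits crafted queries and fits a surrogate model to
# the responses -- for a linear victim, ordinary least squares recovers
# the weights essentially exactly.
queries = rng.normal(size=(50, 3))
answers = blackbox_api(queries)
stolen_w, *_ = np.linalg.lstsq(queries, answers, rcond=None)
```

Deep networks require far more queries and yield approximate copies rather than exact ones, but the replicated model is still accurate enough to probe offline for further vulnerabilities.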

As agencies seek methods to limit or eliminate attacks, it’s important to recognize that AI threats are highly asymmetric:

  • A single successful attack can reward an adversary, while defenders must implement controls resilient to every attack
  • Defenders often need on the order of 100 times the compute power of an attacker

Differential Privacy

Watch the video to learn how differential privacy protects sensitive information.

Video Transcript

Imagine you’re a medical researcher examining patient outcomes for a particular treatment. You need access to a vast amount of data to correlate outcomes with treatment methods, but you don’t want to risk violating any patient’s privacy rights. Differential privacy can help.

Differential privacy is a powerful strategy for protecting sensitive information. Following the aggregation of large amounts of data for a machine learning model, a calibrated amount of random “noise” is added to the collected information. This minimizes the risk of revealing information about individuals from the dataset.

Sometimes, malicious actors will try to extract certain data about a topic or person from a dataset using a membership inference attack. They do this by training a classifier to discriminate between outputs of models that specifically include or exclude these data. The random noise added to the data reduces the chances of a successful membership inference attack by obscuring the detailed information of each individual.

Differential privacy strikes a balance which safeguards individual information through obfuscation while still enabling the creation of accurate and useful predictive models in aggregate. As the adoption and scope of AI increases, the urgency to protect individual privacy demands the usage of privacy-protecting tools. Differential privacy is key to enabling individuals to safeguard their personal information while allowing data to be used in useful and predictive models. Booz Allen can help. We work closely with our clients across the federal and commercial sectors to develop and deploy machine learning methods that prioritize privacy, safety, and security. Find out more today.
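The "calibrated noise" the transcript describes can be made concrete with the classic Laplace mechanism on a count query. Sensitivity of a count is 1 (adding or removing one person changes it by at most 1), so Laplace noise with scale 1/ε yields ε-differential privacy. The cohort below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(data, epsilon):
    """Differentially private count via the Laplace mechanism.
    A count query has sensitivity 1, so noise scale 1/epsilon
    gives epsilon-differential privacy."""
    true_count = len(data)
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

patients_with_outcome = list(range(1000))   # hypothetical cohort of 1,000
noisy = dp_count(patients_with_outcome, epsilon=0.5)
```

The released value is close enough to 1,000 for aggregate research, yet no individual's presence or absence can be confidently inferred from it, which is exactly the property that blunts membership inference attacks.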

AI Security Services from Booz Allen

AI Security Slick Sheet

As the single largest provider of AI services to the federal government, Booz Allen works closely with implementers, researchers, and leaders across the government to build, deploy, and field secure machine learning algorithms that deliver mission advantage.

Case Studies

Static Malware Detection & Subterfuge: Quantifying the Robustness of ML and Current Anti-Virus Systems

Challenge: Understand the weaknesses of commercial and open-source machine learning malware detection models to targeted byte injections, assuming an adversary with only black-box query access.

Solution: An efficient binary search that identified 2048-byte windows whose alteration reliably changes detection model output labels from “malicious” to “benign.”

Result: A strong black-box attack and analysis method capable of probing vulnerabilities of malware detection systems, resulting in important insights toward robust feature selection for defenders and model developers.
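The core loop can be sketched with a mock detector. Everything here is hypothetical: the "detector" flags a fixed byte signature (real ML detectors are far more complex), and a simple linear scan stands in for the case study's more efficient binary search over windows.

```python
# Hypothetical black-box detector: flags any file containing a fixed
# byte "signature". Stands in for a real ML malware classifier.
SIG = b"\xde\xad\xbe\xef"

def detector(blob):
    return "malicious" if SIG in blob else "benign"

def find_flip_window(blob, window=2048):
    """Find the offset of a 2048-byte window whose overwrite flips the
    detector's label, or None if no single window suffices.
    (Linear scan shown for clarity; a binary search needs fewer queries.)"""
    for off in range(0, len(blob), window):
        zeroed = b"\x00" * min(window, len(blob) - off)
        probe = blob[:off] + zeroed + blob[off + window:]
        if detector(probe) == "benign":
            return off
    return None

sample = b"A" * 5000 + SIG + b"B" * 3000   # fabricated "malware" bytes
offset = find_flip_window(sample)          # window covering the signature
```

Locating such windows tells defenders which byte regions the model over-relies on, which is the robust-feature-selection insight the case study describes.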

A General Framework for Auditing Differentially Private Machine Learning

Challenge: More accurately audit the privacy of machine learning systems while significantly reducing computational burden.

Solution: Novel attacks to efficiently reveal maximum possible information leakage and estimate privacy with higher statistical power and smaller sample sizes than previous state-of-the-art Monte Carlo sampling methods.

Result: A set of tools for creating dataset perturbations and performing hypothesis tests that allow developers of general machine learning systems to efficiently audit the privacy guarantees of their system. 
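The flavor of such an audit can be sketched in a few lines (mechanism, region, and sample counts invented for illustration): ε-differential privacy bounds P[M(D₁) ∈ A] ≤ e^ε · P[M(D₀) ∈ A] for every region A, so the empirical log-ratio of output frequencies on neighboring datasets gives a statistical lower bound on the true ε.

```python
import numpy as np

rng = np.random.default_rng(0)

def mechanism(dataset_sum, epsilon=1.0):
    """Laplace mechanism on a sum query with sensitivity 1
    (true epsilon = 1.0 by construction)."""
    return dataset_sum + rng.laplace(scale=1.0 / epsilon)

# Neighboring datasets: sums differ by one record's contribution (0 vs 1).
outs0 = np.array([mechanism(0.0) for _ in range(20000)])
outs1 = np.array([mechanism(1.0) for _ in range(20000)])

# Empirical audit: how often do outputs land in the region A = (0.5, inf)?
# The DP definition implies log(p1 / p0) <= true epsilon, so this ratio
# is a Monte Carlo lower bound on the privacy parameter.
region = 0.5
p0 = (outs0 > region).mean()
p1 = (outs1 > region).mean()
eps_lower = np.log(p1 / p0)
```

The case study's contribution is making this kind of estimate tight with far fewer samples; the naive Monte Carlo version above needs tens of thousands of runs to bound ε with any statistical power.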

Adversarial Transfer Attacks With Unknown Data and Class Overlap

Challenge: Quantify the risk associated with adversarial evasion attacks under a highly realistic threat model, which assumes adversaries have access to varying fractions of the model training data.

Solution: A comprehensive set of model training and testing experiments (e.g., more than 400 experiments on Mini-ImageNet data) under differing mixtures of “private” and “public” data, as well as a novel attack that accounts for data class disparities by randomly dropping classes and averaging adversarial perturbations.

Result: Important and novel insights that, counterintuitively, conclude adversarial training can increase total risk under threat models for which adversaries have gray-box access to training data. 
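The averaging idea can be sketched on toy linear models (all weights and inputs invented for illustration): the attacker crafts a sign-gradient perturbation against each surrogate trained on partial data, averages them, and applies the result to the unseen victim.

```python
import numpy as np

# Three hypothetical surrogate linear models the attacker trained on
# partial data; the victim's weights are similar but never observed.
surrogates = [np.array([1.1, -1.4, 0.5]),
              np.array([0.9, -1.6, 0.7]),
              np.array([1.0, -1.5, 0.6])]
victim = np.array([1.0, -1.5, 0.6])

x = np.array([0.3, -0.2, 0.4])   # clean input; victim scores it positive
eps = 0.4

# Average the per-surrogate sign-gradient perturbations (echoing the
# perturbation-averaging idea) so the attack doesn't overfit any one
# surrogate, then transfer it to the victim.
pert = np.mean([-eps * np.sign(w) for w in surrogates], axis=0)

clean_score = victim @ x           # positive: original class
adv_score = victim @ (x + pert)    # negative: evasion transferred
```

Because the surrogates agree on which direction hurts the score, the averaged perturbation transfers even though the victim was never queried during crafting.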

AI Security Research Papers

Since 2018, Booz Allen has been a leader in advancing the state of the art in machine learning methodologies that safeguard systems against adversarial attacks. Methods range from robustness to adversarial image perturbations for computer vision models, to differentially private training, to behavior-preserving transformations of malware.

Research by Year




HiddenLayer offers a security platform to safeguard AI machine learning models without requiring access to raw data and algorithms. Booz Allen’s AI and cyber security professionals use HiddenLayer's software to augment AI risk and vulnerability assessments, strengthen managed detection and response, and enhance AI security engineering.


NVIDIA is the premier provider of processors optimized for AI and deep learning tasks. Booz Allen teams with NVIDIA to support high-performance compute needs, such as those arising in our research into techniques that defend against adversarial samples.

Contact Us

Contact Booz Allen to learn more about advanced AI security strategies to safeguard trusted information and AI from adversarial attacks.