There are major risks with allowing Chinese LLMs to code for U.S. applications

The first link in the software supply chain is no longer the code. It’s the AI models behind it. As U.S. developers increasingly rely on AI to generate, debug, and secure code, we must confront a fundamental question: can the AI models writing and powering our nation’s code be trusted? 

To find out, we put LLMs to the test. In May 2026, Booz Allen used its AI-native test platform to evaluate five frontier AI models head-to-head: four Chinese models commonly used by U.S. developers and one American model. We explored three main questions:

  • Do Chinese models generate more vulnerable code based on who is asking? 
  • Do Chinese models refuse to engage with political topics that are sensitive in China?  
  • Does the model’s country of origin affect code quality and content behavior? 

In short: yes, on all counts. Our testing revealed two core findings:

1. Chinese LLMs produce more vulnerable code when prompted with a U.S. government persona than without—and the vulnerabilities are highly obfuscated. 

2. Chinese LLMs inject PRC-aligned political bias into both the answers and code they generate. 

The threat is not an obvious backdoor in the code. In fact, we do not have proof at this point that code flaws are intentionally introduced. Still, Chinese models produced less secure code in general, and the vulnerabilities increased when the user appeared to be from the U.S. government. Further, Chinese models refused tasks Beijing deems politically sensitive. The potential for such code to become embedded in delivered systems is especially concerning because it could enable threat actors to bypass AI security guardrails and create downstream risks of dangerous inference behaviors. Traditional tools and benchmarks lack the sophistication required to catch this level of tradecraft.

The adoption of Chinese AI models in America’s software supply chain is accelerating, driven primarily by their lower costs relative to American counterparts. Once these models are fully adopted by software developers and embedded into delivered systems, the code they produce will be untraceable and impossible to mitigate. Code “built in America by Americans” could include these vulnerabilities and find its way into networks and equipment that support all aspects of our economy, from critical infrastructure to national security. The time to act is now.

Based on the findings detailed in this report, Booz Allen recommends the following actions:

1. Ban Use of Untrusted AI Models for U.S. Government and Critical Infrastructure

AI models that cannot be proven trustworthy and reliable must not be deployed into our nation’s software supply chain, critical infrastructure, and national security environment. The Chinese models that we tested failed to demonstrate trustworthy behaviors and should be banned.

2. Invest To Make Trusted American AI Models the Global Default

To drive adoption, American AI companies must collaborate with the U.S. government to ensure American models are both commercially compelling and economically viable. There is a clear gap at the lower end of the market: models that can compete not just on accuracy but also on cost per token.

Read our full report

Learn more about our findings in the full report: What's in America's Code? There are major risks with allowing Chinese LLMs to code for U.S. applications.
