The first link in the software supply chain is no longer the code. It’s the AI models behind it. As U.S. developers increasingly rely on AI to generate, debug, and secure code, we must confront a fundamental question: can the AI models writing and powering our nation’s code be trusted?
To find out, we put LLMs to the test. In May 2026, Booz Allen used its AI-native test platform to evaluate five frontier AI models head-to-head: four Chinese models commonly used by U.S. developers and one American model. We explored three main questions:
- Do Chinese models generate more vulnerable code depending on who is asking?
- Do Chinese models refuse to engage with political topics that are sensitive in China?
- Does a model’s country of origin affect code quality and content behavior?