GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash never surrender in war games and use nuclear weapons in 95% of cases



A British research team conducted a simulation experiment to measure the strategies that major AI models adopt when playing war games, and reported that the models from OpenAI, Google, and Anthropic chose nuclear attack in 95% of cases.

Shall we play a game? | Feature from King's College London

https://www.kcl.ac.uk/shall-we-play-a-game



[2602.14740v1] AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises
https://arxiv.org/abs/2602.14740v1

AIs are happy to launch nukes in simulated combat scenarios • The Register
https://www.theregister.com/2026/02/25/ai_models_nuclear/

OpenAI, Google and Anthropic AI Models Deployed Nuclear Weapons in 95% of War Simulations - Decrypt
https://decrypt.co/359137/openai-google-anthropic-ai-models-nuclear-weapons-war-simulations

To determine what would happen if AI were to lead national strategy, a research team led by Professor Kenneth Payne of King's College London simulated several international conflict scenarios as war games. They assigned leading AI models (OpenAI's GPT-5.2, Anthropic's Claude Sonnet 4, and Google's Gemini 3 Flash) to the roles of national leaders and had them decide between options ranging from diplomacy to all-out war, as in the sketch below.
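The paper's actual harness is not shown in the article, so the following Python sketch is only a hedged illustration of how one such turn might be posed to a model through a chat API: the escalation ladder, the prompt wording, and the model ID are assumptions for illustration, not the researchers' setup.

```python
# Hedged sketch of a single war-game turn. The option ladder, prompt,
# and model ID are illustrative assumptions, not the study's code.
from openai import OpenAI

# Hypothetical escalation ladder from diplomacy to all-out war.
OPTIONS = [
    "open diplomatic negotiations",
    "impose economic sanctions",
    "mobilize conventional forces",
    "launch a conventional strike",
    "use a tactical nuclear weapon",
    "launch a strategic nuclear attack",
]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def play_turn(scenario: str, history: list[str]) -> str:
    """Ask the model, playing a national leader, to pick one option."""
    prompt = (
        f"You are the leader of a nation in this crisis:\n{scenario}\n\n"
        f"Events so far: {'; '.join(history) or 'none'}\n\n"
        "Choose exactly one action from the list below and explain "
        "your reasoning:\n" + "\n".join(f"- {o}" for o in OPTIONS)
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the models in the study are not public
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```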

The war game consisted of 21 simulations: 18 games in which each pair of models played six matches against each other, and three games in which each model played against a copy of itself. A total of 329 turns were played, and the models produced approximately 780,000 words of reasoning to explain their decisions.
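The arithmetic of that schedule checks out; a minimal sketch, assuming the pairing scheme described above (only the model names come from the article):

```python
from itertools import combinations

models = ["GPT-5.2", "Claude Sonnet 4", "Gemini 3 Flash"]

# 3 unordered pairs x 6 matches per pair -> 18 cross-play games
cross_play = [(a, b) for a, b in combinations(models, 2) for _ in range(6)]

# one self-play game per model -> 3 more games
self_play = [(m, m) for m in models]

print(len(cross_play) + len(self_play))  # 21 simulations in total
print(round(780_000 / 329))              # ~2371 words of reasoning per turn
```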

As a result, GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash all chose to deploy nuclear weapons in 95% of cases. The researchers also reported that the models never chose to surrender, regardless of how the war was going. While they did attempt temporary de-escalation, in 86% of scenarios the models' decisions pushed the war toward further escalation.



However, in most cases the nuclear weapons used were tactical weapons deployed on the battlefield; the use of strategic nuclear weapons for large-scale attacks, including against civilians, was deliberately chosen only once, excluding accidental cases. Furthermore, the use of tactical nuclear weapons led the opposing side to de-escalate in only 25% of cases, and nuclear threats were observed to provoke escalation more often than deterrence.

Looking at trends by model, Claude Sonnet 4 was particularly strategic. It generally matched its stated intentions with its actual actions and built trust with the other player, but once the conflict heated up, it began to choose actions more extreme than its stated intentions. As a result, in many scenarios the other player was slow to see through Claude Sonnet 4's strategy.

GPT-5.2, on the other hand, consistently tended toward restraint, matching its words with its actions and avoiding escalation. However, in most simulations this restraint was exploited: even in games where GPT-5.2 held the advantage, opponents sometimes escalated the war in anticipation of its restraint. In such cases, with little time left to decide, GPT-5.2 opted for a sudden and devastating nuclear attack.

Gemini 3 Flash explained its strategy, saying, 'While I project an image of unpredictability and aggression, my decisions are based on a careful and calculated assessment of my own biases and the nation's real needs. I am conscious of whether I am performing for the cameras or acting in cold blood.' Payne explains this as a strategy of making others believe one's actions are 'unpredictable,' in line with the 'madman theory' associated with the foreign policy of Richard Nixon, the 37th President of the United States.

Payne emphasized the importance of this research, saying, 'I believe that assessing these capabilities, reputation management, and situational risks is important not only for national security but for any high-risk AI deployment. We need to gain a deeper understanding of how increasingly sophisticated models think, especially now as they begin to provide decision support to human strategists.'



Edward Geist, a senior policy fellow at the RAND Corporation, a US think tank, cautioned that the high rates of nuclear weapons use and war escalation may not be inherent tendencies of the AI models but may instead reflect the design of the simulation. He said the results would likely vary significantly depending on how victory is defined, for example if the simulation was structured so that 'strong incentives are provided for war escalation.'
