Symposium on Military AI and the Law of Armed Conflict: A Risk Framework for AI-Enabled Military Systems

[Lieutenant General John (Jack) N.T. Shanahan retired from the United States Air Force in 2020 after a 36-year military career. In his final assignment he served as the inaugural Director of the U.S. Department of Defense Joint Artificial Intelligence Center. As the first Director of the Algorithmic Warfare Cross-Functional Team (Project Maven), he established and led DoD’s pathfinder AI fielding program charged with bringing AI capabilities to intelligence collection and analysis.]

Introduction

In my experience, the most vocal critics of AI integration into military systems are those least familiar with the U.S. Department of Defense (DoD) and other militaries’ weapon system development and employment protocols. There is a prevailing view, often stated outright, that state militaries intend to bypass existing policies and safeguards to dramatically accelerate development of AI-enabled weapons, to include lethal autonomous weapon systems (LAWS). Critiques of the United States’ AI adoption plans for weapon systems typically emerge from broader objections to warfare, the misperception that military force often serves as the first rather than last tool of choice, general objections to U.S. foreign policy, and a deep-seated aversion to the U.S. military’s use of armed drones. These criticisms are grounded in larger ethical and moral debates, but increasingly rely on AI as the focal point of contention.

We should neither dismiss outright the potential future dangers of AI use in the military, however unrealistically those risks are often portrayed, nor assume that more immediate, pressing concerns about AI-enabled military systems will be resolved without proactive intervention. To balance the assessment of both speculative and immediate risks, I suggest a five-tier risk hierarchy for developing and employing AI in military contexts. AI Assurance, which combines the principles of AI test, evaluation, verification, and validation (TEVV) and the tenets of Responsible AI, resides at the core of this risk-based framework. The goals should be the same for all militaries: employing AI-enabled systems in safe, lawful, and ethical ways, and eschewing a global AI “race to the bottom.”

U.S. DoD Weapon System Development & DoDD 3000.09 

Throughout my thirty-six years in the military, I had first-hand experience with the entire DoD weapon system lifecycle, from tactical fighter aircraft operations as a junior officer at one end to two Pentagon assignments as a general officer at the other. In the latter roles, I regularly sought approval from the DoD’s top leaders to deploy and employ a wide variety of kinetic and non-kinetic weapon systems. My career culminated as the inaugural Director of both Project Maven and the DoD Joint AI Center (JAIC), two organizations dedicated to AI fielding, where I was responsible for formulating policies and procedures governing the use of AI within the DoD.

By the time a weapon system is fielded operationally, it has been through a level of scrutiny that would surprise most observers outside of the military. (The extensive array of internal and external regulations, policies, and laws that governs the development, acquisition, use, and sustainment of military weapon systems in the DoD is beyond the scope of this blog.) From first-hand experience, I can attest that legal reviews are integral to every use case and continue throughout all phases of a weapon system lifecycle. Reviews by the Secretary of Defense Office of the General Counsel, Joint Staff Office of Legal Counsel, and the General Counsels of the applicable military Service are designed to ensure, inter alia, that a given system can be used or applied in ways that comply with International Humanitarian Law/the Law of Armed Conflict (IHL/LOAC) and that there is sufficient evidence that the system under review can and will be employed safely, lawfully, and ethically. In some cases, the sensitivity of potential deployment and use of a given weapon system requires elevating legal review to the National Security Council staff and lawyers, occasionally even requiring final approval by the President. While the United States is not party to Additional Protocol I and therefore not bound by the treaty obligation to conduct legal reviews under its Article 36, these review steps underscore the U.S. commitment to safe and lawful AI.

Stated succinctly, the DoD aims to develop weapons that deter, deny, and defeat adversaries while minimizing the risk of harm to friendly forces and the potential for collateral damage and civilian casualties. A comprehensive risk assessment is integral to the legal and operational scrutiny each weapon system undergoes. The goal is to reduce or, where feasible, eliminate each risk, while acknowledging that there are no risk-free weapon systems, AI-enabled or otherwise. 

Risk evaluation begins in a weapon system’s design phase and gains specificity through a series of simulations, developmental and operational test and evaluation events, experiments, exercises, and limited initial operational deployments. The objective throughout these stages is to define the entire weapon system’s operating envelope and, by considering the range of foreseeable outcomes based on expected operational use, rely on a mix of technical (hardware and software), policy, and procedural controls to mitigate high-risk outcomes while still achieving the desired performance parameters. 

When systems are considered for deployment, operational commanders, supported by legal counsel, also evaluate risk to mission and risk to force. “Risk to mission” evaluates the degree to which the weapon or technology will either enhance mission success or compromise it by its absence, factoring in human, material, and financial costs. “Risk to force” considers the impact on U.S. personnel safety, assessing whether the weapon or technology’s use or non-use increases or decreases the probability of harm to U.S. personnel and the forces of its allies and partners.

Contrary to common perception, AI does not get a “bye” from extant DoD weapons regulations and policies. Everything described above still applies to the development and employment of AI-enabled systems. Moreover, while DoD Directive (DoDD) 3000.09, Autonomy in Weapon Systems, does not proscribe development or employment of autonomous or AI-enabled autonomous systems, to include LAWS, it adds additional policy constraints and review layers for these systems.  This directive has been criticized both for not going far enough – by not banning LAWS, for example – and for potentially impeding the rapid development of autonomous/semi-autonomous and AI-enabled autonomous/semi-autonomous capabilities deemed essential to future conflicts.  Yet on balance, DoDD 3000.09 serves as an indispensable framework for guiding the design, development, and fielding of such systems. To date, it stands as the only document of its kind released publicly by any national military force. 

Risk management figures prominently in DoDD 3000.09. The directive mandates that all parties involved in the lifecycle of autonomous and semi-autonomous systems – from design through deployment and sustainment – follow strict guidelines and perform rigorous assessments. Specifically, it requires two high-level reviews by Pentagon leaders, one before system development and the other before fielding. DoDD 3000.09 notes that these reviews and approvals “are supplementary to the requirements in other applicable policies and issuances [emphasis added].” 

 The directive underscores the importance of test, evaluation, verification, and validation (TEVV) and, for AI-enabled autonomous and semi-autonomous systems, directs system design, development, deployment, and use to be consistent with the DoD’s AI Ethical Principles and the DoD Responsible Artificial Intelligence Strategy and Implementation Pathway. It also establishes a senior-level Autonomous Weapon System Working Group. 

In summary, despite trenchant criticism by some outside observers that the directive’s language allowing commanders and operators “to exercise appropriate levels of human judgment over the use of force [emphasis added]” is too vague, DoDD 3000.09, in conjunction with all other existing policies, legislation, and regulations, establishes crucial safeguards. These provisions provide a comprehensive framework for the employment of AI-enabled autonomous and semi-autonomous weapon systems. Moreover, the combination of all these continuing, cumulative risk evaluations helps policymakers and operational commanders understand who may be authorized to accept what risks, and at what levels.

A Risk Framework for AI-enabled Military Systems

Notwithstanding U.S. laws that govern military operations – in particular Titles 10 and 50 of the United States Code – and the DoD’s comprehensive extant weapon system regulations and policies, including DoDD 3000.09, AI presents novel risks not typically encountered in traditional military weapon system development. These include, but are not limited to, the ‘control problem,’ ‘black box problem,’ ‘accountability problem,’ and the potential for AI systems that are capable of online learning. The conceptualization of the first three as overarching legal problems was first introduced by Iben Yde et al. in Smart Krig: Militær Anvendelse af Kunstig Intelligens.

Hence, there is a clear need for an additional framework tailored specifically to assess and mitigate the risks associated with AI-enabled military systems.

Centering on the key values of fundamental rights, democracy, and fairness, the European Commission embraced a risk-based approach when formulating the EU AI Act, which was endorsed by all 27 EU Member States in February 2024. This framework categorizes AI risks into four levels: unacceptable (prohibited outright), high, limited, and minimal or none, each with associated regulatory obligations and risk mitigation requirements.

The EU risk-based approach provides a useful starting point. However, the foundational principles for a risk framework tailored to AI-enabled military systems diverge fundamentally from the EU’s AI risk classifications, which are directed at civilian AI applications. At the core of military AI applications is the essential condition that any AI-enabled system or subsystem must not only be operationally effective – meeting key design requirements and operational performance parameters – but also capable of being used in compliance with IHL/LOAC, national laws and regulations, pertinent rules of engagement (ROE) and special instructions (SPINS), and all other employment constraints and restraints issued by policymakers and military commanders. 

I cannot find evidence of a formal risk management architecture for AI-enabled military systems, either in the United States or internationally. Creating a single universal risk framework that does not account for differences in each system’s inherent features, intended effects, and the operational context in which it will be used is neither feasible nor advisable. To fill the gap, I present a nuanced five-tier risk hierarchy for AI-enabled military systems. This framework is based on an original model developed by retired Indian Army Corps of Signals Lieutenant General R.S. Panwar, and refined by representatives from various countries with military, diplomatic, intelligence, weapon design, and legal backgrounds during a series of AI Track II dialogues in which General Panwar and I participate. Together, along with Dr. Barry O’Sullivan (University College Cork), Dr. Iben Yde (Royal Danish Defence College), and Professor Eric Richardson (President of INHR and co-host of the Track II dialogue), we presented the rationale for, and the outlines and implications of, this risk-based framework during our panel at the first global Summit on Responsible Artificial Intelligence in the Military Domain (REAIM), hosted by the Netherlands in February 2023.

The architecture design begins with determining whether an AI-enabled system is either a weapon system or a decision support system. The weapon system category is divided between “excessive risk” and “permitted” weapon systems. The category of permitted weapon systems is divided further into the sub-categories of “high risk” and “medium risk” weapon systems. Finally, the decision support system category is split between “critical” and “non-critical” decision support systems. For every use case, the objective is to gain clarity on specified risks and, as necessary, present risk mitigation recommendations to military leaders and policymakers.

  • Level 1: weapon systems posing excessive risks. Based on a comprehensive benefit-risk analysis, some AI-enabled weapon systems will be deemed to be of such high risk – posing potentially catastrophic consequences – that the overwhelming majority of states and militaries believe the systems should not be used or even developed, either for a specified period or permanently. The canonical example in this category is AI designed to supplant human judgment in the critical decisions to authorize or execute the launch of a nuclear weapon. This category could also include AI-enabled LAWS that are capable of online learning, meaning that they could select and engage targets in unexpected and unpredictable ways, without any possibility of human intervention. (I am not aware of any AI-enabled military system in this category across the globe that has been fielded or is even under development.) Other AI-enabled autonomous systems, even those with lethal capabilities, would normally be assigned to one of the following two categories.
  • Levels 2 & 3: permitted weapon systems posing acceptable risks. Almost all AI-enabled military weapon systems presently fit into one of these two categories. Based on assessments of predictability, reliability, explainability, transparency, and biases along with the additional risk evaluation parameters discussed below, certain AI-enabled weapon systems will be assessed as posing higher risks than others. While both types of systems require rigorous TEVV of both the AI by itself and the parent weapon system, along with any associated networks into which the AI model(s) are integrated, ‘high risk’ weapon systems will require more stringent and comprehensive risk-mitigation measures than ‘medium risk’ systems. All AI-enabled purely defensive autonomous systems would normally be categorized as medium risk. 
  • Level 4: critical decision support systems. In general, a critical decision support system is one in which there is a direct connection to warfighting operations; for instance, a generative AI model used for planning kinetic or even non-kinetic attack options. Systems in this category still require thorough TEVV and evaluation and assessment of potential limitations, biases, and risks, although risk tolerance might be higher than for kinetic weapon systems.  
  • Level 5: non-critical decision support systems. These systems are not directly integrated into warfighting or warfighting support systems, and through that lens are determined to pose negligible risks (even while still requiring sufficient TEVV); for instance, personnel or finance systems that include AI or more traditional process automation components. 

There are other important risk evaluation parameters to consider when assigning AI-enabled weapon systems to the excessive, high, or medium risk categories. These include whether the system is platform-centric (entirely self-contained), network-centric (the final engagement decision is made at a node external to the platform itself), or part of an intelligent swarm; lethality (non-lethal or lethal); level of autonomy (remotely controlled, semi-autonomous, fully autonomous, or fully autonomous and capable of online learning); intent (primarily offensive or defensive); reversibility (reversible or irreversible effects); and destructive potential (nuclear or non-nuclear). Risks must also be placed in context, accounting for the degree of urgency, operational imperatives, technology and human readiness levels (TRL/HRL), opportunity costs, and potential unintended consequences.

When all these factors are considered together, this kind of comprehensive assessment might lead, for example, to the determination that a fully autonomous platform-centric lethal weapon system should be treated as ‘high risk,’ while a defensive fully autonomous platform- or network-centric system would be assigned to the ‘medium risk’ category.
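Purely as an illustration, the tier assignments described above can be sketched as a toy decision procedure. Every attribute name, threshold, and rule below is an assumption distilled from this post’s examples (nuclear launch decisions and lethal online-learning LAWS as excessive risk, purely defensive autonomy as medium risk, warfighting-linked decision support as critical); it is not an official DoD or REAIM methodology.

```python
from dataclasses import dataclass

# Hypothetical attributes for sketching the five-tier hierarchy.
@dataclass
class AISystem:
    is_weapon: bool            # weapon system vs. decision support system
    lethal: bool
    autonomy: str              # "remote", "semi", "full", "full_online_learning"
    nuclear_launch_role: bool  # AI supplanting human judgment in nuclear launch
    defensive: bool            # primarily defensive intent
    warfighting_linked: bool   # decision support directly tied to operations

def risk_tier(s: AISystem) -> str:
    if s.is_weapon:
        # Level 1: excessive risk - nuclear launch decisions, or lethal
        # fully autonomous systems capable of online learning.
        if s.nuclear_launch_role or (s.lethal and s.autonomy == "full_online_learning"):
            return "Level 1: excessive risk"
        # Levels 2-3: permitted weapon systems. Purely defensive systems
        # default to medium risk in this sketch.
        if s.defensive:
            return "Level 3: medium risk"
        if s.lethal and s.autonomy == "full":
            return "Level 2: high risk"
        return "Level 3: medium risk"
    # Decision support systems split on direct connection to warfighting.
    return ("Level 4: critical decision support" if s.warfighting_linked
            else "Level 5: non-critical decision support")
```

For example, a fully autonomous, offensive, lethal platform maps to Level 2 under these assumed rules, while a personnel system with an AI component maps to Level 5. A real assessment would, of course, weigh the contextual parameters discussed above rather than a handful of boolean attributes.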

This mapping exercise is a pivotal initial step toward gaining greater clarity about AI’s risks in military contexts and devising comprehensive mitigation strategies. It emphasizes a multi-dimensional approach to risk management, incorporating not only rigorous and disciplined TEVV and adherence to established risk management frameworks and toolkits, but also an array of technical, policy, and procedural safeguards – some of which may, in the U.S. military, be mandated by the DoDD 3000.09-directed senior-leader review process.  Additionally, AI risk assessments must consider the potential for cascading effects, emergent behaviors, and the strategic ramifications of deploying GenAI within weapon systems and decision-making processes. This holistic view is essential for navigating the complexities of AI in military operations.  

Conclusion

War is a tragic yet perpetual condition of human existence. Horrible mistakes and even willful negligence will never be eliminated. With few if any exceptions, AI by itself will not render conflict any more or less likely. Yet to paraphrase my colleague Dr. Iben Yde from our panel at REAIM 2023, the goal of IHL is to limit the suffering caused by warfare and alleviate its effects on civilian populations. As she expressed, one can argue that this means states are under an implicit obligation to take advantage of emerging technologies, to include AI, to find new ways to achieve that goal.

While it is entirely valid to critique the U.S. military’s record on accountability for actions or inactions in combat, the foundational position for U.S. military operations holds human responsibility and accountability to be immutable principles. I do not see substantive differences between assessing accountability and responsibility in traditional processes and in those fueled by AI. The principle holds: humans will be held accountable and responsible for outcomes in peacetime, crisis, and conflict that violate IHL/LOAC or breach rules of engagement or other military directives, regardless of the means being used. Neither machines nor humans will ever perform flawlessly. Humans, however, must answer for their acts of omission and commission by being subject to moral and legal judgment for their actions.

The potential benefits for national security are so significant that AI’s adoption rate, along with its breadth and depth of diffusion, will accelerate in the coming years. The U.S. military – and indeed the militaries of every nation globally – will deploy AI to help them assure and deter and, should deterrence fail, to prevail in conflict. Moral objections to war aside, this is the essence of technology used in support of national security. The proposed five-tier hierarchy, while imperfect, initiates vital national and international discussions on AI’s military risks. While global LAWS bans will remain highly unlikely, the framework presents opportunities for a form of voluntary self-regulation. When combined with the safeguards described previously, especially the principles of AI Assurance, the risk framework allows the U.S. and other states to tailor risk mitigation strategies for lethal and non-lethal platforms, sensors, and networks. Moreover, in international settings, the framework presents opportunities for greater transparency between states, potentially even leading to agreements on a risk lexicon and risk assessment methodology, principles of AI assurance, and AI non-proliferation agreements, especially concerning state or non-state actors that do not comply with IHL.
