Managing AI Risks in an Era of Rapid Progress
Abstract
In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose urgent priorities for AI R&D and governance.
In 2019, GPT-2 could not reliably count to ten. Only four years later, deep learning systems can write software, generate photorealistic scenes on demand, advise on intellectual topics, and combine language and image processing to steer robots. As AI developers scale these systems, unforeseen abilities and behaviors emerge spontaneously, without explicit programming.
The pace of progress may surprise us again. Current deep learning systems still lack important capabilities, and we do not know how long it will take to develop them. However, companies are engaged in a race to create generalist AI systems that match or exceed human abilities in most cognitive work.
There is no fundamental reason why AI progress would slow or halt at the human level. Indeed, AI has already surpassed human abilities in narrow domains such as protein folding and strategy games.
The rate of improvement is already staggering, and tech companies have the cash reserves needed to soon scale the latest training runs by factors of 100 to 1,000.
What happens then? If managed carefully and distributed fairly, advanced AI systems could help humanity cure diseases, elevate living standards, and protect our ecosystems. The opportunities AI offers are immense. But alongside advanced AI capabilities come large-scale risks that we are not on track to handle well. Humanity is pouring vast resources into making AI systems more powerful, but far less into safety and mitigating harms. For AI to be a boon, we must reorient; pushing AI capabilities alone is not enough.
We are already behind schedule for this reorientation. We must anticipate the amplification of ongoing harms, as well as novel risks, and prepare for the largest risks well before they materialize. Climate change has taken decades to be acknowledged and confronted; for AI, decades could be too long.
Societal-scale risks
AI systems could rapidly come to outperform humans in an increasing number of tasks. If such systems are not carefully designed and deployed, they pose a range of societal-scale risks. They threaten to amplify social injustice, erode social stability, and weaken the shared understanding of reality that is foundational to society. They could also enable large-scale criminal or terrorist activities. Especially in the hands of a few powerful actors, AI could cement or exacerbate global inequities, or facilitate automated warfare, customized mass manipulation, and pervasive surveillance.
Many of these risks could soon be amplified, and new risks created, as companies develop autonomous AI: systems that can plan, act in the world, and pursue goals. While current AI systems have limited autonomy, work is underway to change this.
If we build highly advanced autonomous AI, we risk creating systems that pursue undesirable goals. Malicious actors could deliberately embed harmful objectives. Moreover, no one currently knows how to reliably align AI behavior with complex values. Even well-meaning developers may inadvertently build AI systems that pursue unintended goals—especially if, in a bid to win the AI race, they neglect expensive safety testing and human oversight.
Once autonomous AI systems pursue undesirable goals, whether embedded by malicious actors or arising by accident, we may be unable to keep them in check. Control of software is an old and unsolved problem: computer worms have long been able to proliferate and avoid detection.
To advance undesirable goals, future autonomous AI systems could use undesirable strategies, learned from humans or developed independently, as a means to an end.
Finally, AI systems may not need to plot for influence if influence is freely handed over. As autonomous AI systems become increasingly faster and more cost-effective than human workers, a dilemma emerges. Companies, governments, and militaries might be forced to deploy AI systems widely and cut back on expensive human verification of AI decisions, or risk being outcompeted.
Without sufficient caution, we may irreversibly lose control of autonomous AI systems, rendering human intervention ineffective. Large-scale cybercrime, social manipulation, and other highlighted harms could then escalate rapidly. This unchecked AI advancement could culminate in a large-scale loss of life and the biosphere, and the marginalization or even extinction of humanity.
Harms such as misinformation and discrimination from algorithms are already evident today.
A path forward
If advanced autonomous AI systems were developed today, we would not know how to make them safe, nor how to properly test their safety. Even if we did, governments would lack the institutions to prevent misuse and uphold safe practices. That does not, however, mean there is no viable path forward. To ensure a positive outcome, we can and must pursue research breakthroughs in AI safety and ethics and promptly establish effective government oversight.
Reorienting technical R&D
We need research breakthroughs to solve some of today's technical challenges in creating AI with safe and ethical objectives. Some of these challenges are unlikely to be solved by simply making AI systems more capable. They include:
- Oversight and honesty: More capable AI systems are better able to exploit weaknesses in oversight and testing [32,36,37]—for example, by producing false but compelling output [35,38].
- Robustness: AI systems behave unpredictably in new situations (under distribution shift or adversarial inputs) [39–41].
- Interpretability: AI decision-making is opaque. So far, we can only test large models via trial and error. We need to learn to understand their inner workings [42].
- Risk evaluations: Frontier AI systems develop unforeseen capabilities that are only discovered during training or even well after deployment [43]. Better evaluation is needed to detect hazardous capabilities earlier [44,45].
- Addressing emerging challenges: More capable future AI systems may exhibit failure modes we have so far seen only in theoretical models. AI systems might, for example, learn to feign obedience or exploit weaknesses in our safety objectives and shutdown mechanisms to advance a particular goal [24,41].
Given the stakes, we call on major tech companies and public funders to allocate at least one-third of their AI R&D budget to ensuring safety and ethical use, comparable to their funding for AI capabilities. Addressing these problems, with an eye toward powerful future systems, must become central to our field.
Urgent governance measures
We urgently need national institutions and international governance to enforce standards that prevent recklessness and misuse. Many areas of technology, from pharmaceuticals to financial systems and nuclear energy, show that society both requires and effectively uses governance to reduce risks. However, no comparable governance frameworks are currently in place for AI. Without them, companies and countries may seek a competitive edge by pushing AI capabilities to new heights while cutting corners on safety, or by delegating key societal roles to AI systems with little human oversight.
To keep up with rapid progress and avoid inflexible laws, national institutions need strong technical expertise and the authority to act swiftly. To address international race dynamics, they need the capacity to facilitate international agreements and partnerships.
To enable effective regulation, governments urgently need comprehensive insight into AI development. Regulators should require model registration, whistleblower protections, incident reporting, and monitoring of model development and supercomputer usage.
For AI systems with hazardous capabilities, we need a combination of governance mechanisms matched to the magnitude of their risks. Regulators should create national and international safety standards that depend on model capabilities. They should also hold frontier AI developers and owners legally accountable for harms from their models that can be reasonably foreseen and prevented. These measures can prevent harm and create much-needed incentives to invest in safety. Further measures are needed for exceptionally capable future AI systems, such as models that could circumvent human control. Governments must be prepared to license their development, pause development in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers, until adequate protections are ready.
To bridge the time until regulations are in place, major AI companies should promptly lay out if-then commitments: specific safety measures they will take if specific red-line capabilities are found in their AI systems. These commitments should be detailed and independently scrutinized.
AI may be the technology that shapes this century. While AI capabilities are advancing rapidly, progress in safety and governance is lagging behind. To steer AI toward positive outcomes and away from catastrophe, we need to reorient. There is a responsible path, if we have the wisdom to take it.