Earlier this year, the Future of Life Institute and the Center for AI Safety published open letters that promoted existential risks (x-risks) from AI as a global priority. In July, Google Research fellow Blaise Aguera y Arcas and MILA-affiliated AI researchers Blake Richards, Dhanya Sridhar, and Guillaume Lajoie co-wrote an op-ed in Noema criticizing these letters. In a nutshell, their argument is:
- Rogue AI is unlikely in the near future
- Human attention is limited
- Promoting x-risks from AI as a global priority is likely to:
  - Cause overbearing regulation that limits beneficial AI
  - Obscure current harms from AI
In September, I met with the three MILA-affiliated researchers to better understand their position. I believe the AI safety community would benefit from understanding the intricacies of their criticisms of the open letters in order to improve the way we communicate.
This post should be read as an invitation to consider the thoughts of people with a different worldview who are skeptical of efforts to position x-risks from AI as a global priority. In particular, their skepticism is not synonymous with a denial of these risks.
The importance of defining terms
From the get-go, the interviewees indicated their preference for the term “rogue AI takeover” over “x-risks from AI” because they were wary of a tendency to move the goalposts. Sridhar mentions that she has seen motte-and-bailey tactics regarding which agent is “the seed of chaos”, where an interlocutor would introduce the idea of an agentic AI with its own goals, and would then switch the agent to a human organization using a powerful AI.
When discussing this with people who take existential risk very seriously, […] when I articulate the reasons why I personally think that an existential risk [in the form of extinction] is not really a great concern, what typically happens is that the person in the conversation will bring up other risks that are not existential, but which are very bad, and effectively imply that I am denying those risks by suggesting that existential risks [are] unlikely. […] I’m [not denying] that AI could lead to all sorts of gravely negative harms in our society in the future.
This suggests that we might want to define our terms and scenarios more rigorously at the start of such conversations to avoid the use of motte-and-baileys.
Who is the AI safety community?
Determining who gets included in the AI safety community can be a challenge. Lajoie thinks of the general community of people who make AI do what humans want as the alignment community, a subgroup of which concentrates on x-risk. The former includes people working on narrow AI alignment at big labs as well as people working in AI ethics. Sridhar fears that the open letters might push people, priorities, and funding too far from the former group towards the latter. She sees the latter group’s focus as too mathematical, at the cost of the more holistic view embraced by the former, for instance in emphasizing rogue AI over misuse. Richards remarks that the latter group seems uninterested in listening to researchers who wish to address risk from a different perspective, such as the social-science-inspired perspective of AI ethicists.
The perception that work on x-risk is mostly technical work surprised me since I routinely see organizations that have emphasized x-risks mention the need for governance to address both rogue AI and misuse scenarios. This hints that we might want to emphasize, both in press releases and in discussion with AI researchers, the importance of sociotechnical research into risks from misuse.
Making x-risk a global priority
These researchers are not opposed to research on x-risk from AI, but were bothered by seeing people with well-respected voices in the AI community calling for it to be a global priority.
Although I think it’s great if someone is working on it, and if there are conferences about it, people talking about it, and even funding bodies devoted to it […], that’s a very different scenario from being a global priority in the sense that global warming is, where everyone gets involved, where all the world’s governments contribute massive funds to it, where it’s something that all researchers need to talk about and think about […] and where our regulation really gets shaped by this as one of the prime concerns.
Sridhar highlights the opportunity cost of AI regulation oriented towards avoiding x-risks, especially considering that there is no consensus on whether regulation to pause or restrict the development of frontier models is robustly good for AI safety. She contrasts this with, for instance, regulation on lethal autonomous weapons, the risks of which are much better understood and easier to act upon in robustly beneficial ways.
They believe that due to the extreme risks involved, there is a danger that exposing decision makers to x-risks will overwhelm them, leading them to ignore other issues. Richards mentioned:
Politicians have a very limited attention span, and they are only going to hold a small number of key items from [a] discussion [with people concerned with risks from AI]. If you give them a long laundry list of things that might happen with AI, they’re going to retain just a few of those, and if one of those is the potential extinction from a rogue AI, I guarantee you they will remember that one. That will therefore occupy a slot that could have gone to “current models show bias and you should not use them in mortgage calculations”.
Additionally, they posit that by emphasizing the capabilities of future models in our warnings, we are inadvertently suggesting that current models are on the verge of reaching human-level capabilities. On the one hand, this may lead to some of the companies that use these models overestimating their capabilities and making ill-advised use of them. On the other hand, this de-emphasizes failures of current models. Together, they believe this means that when harm comes from these ill-advised uses, people who have emphasized x-risks are partially responsible for that harm.
I don’t think the AI safety community wishes to hype existing models or hide their flaws. This suggests that we might want to add a caveat on the limitations of current models when communicating about the x-risks posed by future models.
I asked the researchers to share their approximate timelines for transformative AI. They gave two reasons for being hesitant to answer.
- Sridhar felt that aggregating researchers’ guesses, as in the AI Impacts survey, does not produce reliable data, and that the results might be extrapolated beyond their relevance.
- Building on this, Richards argues that people concerned about x-risks tend to quantify their beliefs with numbers and confidence intervals pulled out of thin air, make calculations with these unsubstantiated numbers, and arrive at a result that they don’t actually believe but treat as true simply because numbers are involved.
Richards uses the analogy of a paleontologist who is trying to determine the color of a long-extinct dinosaur species. The evidence that could be used to answer this question is sparse and tenuous, and the honest thing to do would be to admit that she does not know. If instead she conducted a poll and asked colleagues to place a probability with confidence intervals on different colors, this would give the result a veneer of credibility, but would not change the fact that this community has no significant insight into the answer.
While we could debate the appropriateness of such an analogy, it might be worth keeping these criticisms in mind when we discuss aggregated predictions of highly uncertain phenomena.
Richards expressed that he thought people who recently became concerned about x-risks from AI might be over-updating based on their surprise at the capabilities of frontier models. Without committing to a specific timeline, he expects a Drexler-like CAIS scenario to develop before agentic AI. He foresees robotics advancing at a much slower pace than in silico AI, and this delay might limit the ability of AIs to learn from the physical world. Lajoie believes that models will generally improve but will keep failing on very specific tasks in a way that leaves them unable to match humans’ robustness and generality for some time.
Long-term, uncertainty, and effectiveness of mitigation
To better understand the short-term vs. long-term aspects of their criticisms, I presented an analogous situation and asked them whether, and why, their opinion on the following hypothetical argument would differ from their position on x-risk:
Climate change is causing immediate harms; we should concentrate on these harms instead of preventing longer-term risks by reducing our greenhouse gas (GHG) emissions in the next few years.
They thought the situations differed for two reasons.
- GHG emissions are (from their point of view) much better understood and the current level of emissions is virtually guaranteed to cause substantial harm in the future in a way that AI is not.
- Sridhar points out that we have a very good understanding of how to fix climate change: reduce global GHG emissions. Contrast this to the case with AI, where there is serious disagreement about whether a pause would robustly help with safety.
Richards perceives that we are in a similar position regarding AI to that of climate scientists in the 1970s, when the data was not yet clear. If he felt that the current situation with AI were more analogous to climate science at the turn of the millennium, he would switch to making AI safety a priority. While limiting GHG emissions earlier would have made our current transition easier, Richards counters that if we had limited industrial development too early, we would have greatly reduced the benefits of industrialization, just as limiting AI today would curtail its benefits.
In terms of cooperation moving forward, they suggest petitioning politicians with the concrete, robustly beneficial policies that both the ethics and x-risk communities endorse, for instance:
- Demand auditing of AI systems
- Demand interpretable AI systems
- Ban the development of lethal autonomous weapons
Richards thinks that amplifying the more moderate voices on both sides would make collaboration easier. If the CAIS statement had said “Extinction from AI is something that researchers need to consider”, he would have been on board.
By toning down the statements, we may be able to build broader coalitions. On the other hand, diluting the message might not be the best strategy to enable effective action directed towards preventing x-risks. Each organization that petitions policymakers or the public must decide for itself if this tradeoff is worth it.
Sincere thanks to Blake Richards, Dhanya Sridhar, and Guillaume Lajoie for taking the time to engage with me.
Footnotes
The authors also published a shorter version of their op-ed in The Economist.
They clarified that while they are sympathetic to the positions of AI ethicists in many ways, they are not ethicists themselves and their opinions might not reflect those of other people critical of x-risks from AI.
They actually encourage this research and think it should be part of the broader conversation. They would be open to what they call a “well-balanced pitch” of short-term, medium-term and long-term risks.
They believe that the AI ethics community has often been too focused on the models’ failures and has been unwilling to admit when some models have clearly developed new capabilities.
Although they did not elaborate on this, I thought that the researchers might not agree with, for instance, this representation.
Richards has seen this tendency in conversations with mainstream AI researchers who have recently endorsed x-risks, and attributes it to their adoption of rationalist literature. For instance, he is dubious that anybody can predict complex social phenomena with a 95% confidence interval.
Richards calls this “science theater” in an analogy to security theater.
In his report on existential risk, Carlsmith calculates a probability by conjunction and feels that he must amend it afterwards, mentioning “I’d probably bump this up a bit—maybe by a percentage point or two, though this is especially unprincipled […] — to account for power-seeking scenarios that don’t strictly fit all the premises above”. This suggests a reluctance to directly trust the result of such calculations when this result conflicts with our intuitions.
One could argue that there is a lot of debate over the best way to reduce GHG emissions, but there is agreement about the direction of what must be done, which is not the case for restricting AI development.