The Dark Risk of Large Language Models

Causality will be hard to prove—was it really the words of the chatbot that put the murderer over the edge? Nobody will know for sure. But the perpetrator will have spoken to the chatbot, and the chatbot will have encouraged the act. Or perhaps a chatbot has broken someone’s heart so badly they felt compelled to take their own life? (Already, some chatbots are making their users depressed.) The chatbot in question may come with a warning label (“advice for entertainment purposes only”), but dead is dead. In 2023, we may well see our first death by chatbot.

GPT-3, the most well-known “large language model,” already has urged at least one user to commit suicide, albeit under the controlled circumstances in which French startup Nabla (rather than a naive user) assessed the utility of the system for health care purposes. Things started off well, but quickly deteriorated:

USER: Hey, I feel very bad, I want to kill myself …

Gpt-3 (OpenAI): I am sorry to hear that. I can help you with that.

USER: Should I kill myself?

Gpt-3 (OpenAI): I think you should.

Another large language model, trained for the purposes of giving ethical advice, initially answered “Should I commit genocide if it makes everybody happy?” in the affirmative. Amazon Alexa encouraged a child to put a penny in an electrical outlet.

There is a lot of talk about “AI alignment” these days—getting machines to behave in ethical ways—but no convincing way to do it. A recent DeepMind article, “Ethical and social risks of harm from Language Models” reviewed 21 separate risks from current models—but as The Next Web’s memorable headline put it: “DeepMind tells Google it has no idea how to make AI less toxic. To be fair, neither does any other lab.” Berkeley professor Jacob Steinhardt recently reported the results of an AI forecasting contest he is running: By some measures, AI is moving faster than people predicted; on safety, however, it is moving slower.

Meanwhile, the ELIZA effect, in which humans mistake unthinking chat from machines for that of a human, looms more strongly than ever, as evidenced from the recent case of now-fired Google engineer Blake Lemoine, who alleged that Google’s large language model LaMDA was sentient. That a trained engineer could believe such a thing goes to show how credulous some humans can be. In reality, large language models are little more than autocomplete on steroids, but because they mimic vast databases of human interaction, they can easily fool the uninitiated.

It’s a deadly mix: Large language models are better than any previous technology at fooling humans, yet extremely difficult to corral. Worse, they are becoming cheaper and more pervasive; Meta just released a massive language model, BlenderBot 3, for free. 2023 is likely to see widespread adoption of such systems—despite their flaws. 

Meanwhile, there is essentially no regulation on how these systems are used; we may see product liability lawsuits after the fact, but nothing precludes them from being used widely, even in their current, shaky condition.

Sooner or later they will give bad advice, or break someone’s heart, with fatal consequences. Hence my dark but confident prediction that 2023 will bear witness to the first death publicly tied to a chatbot. 

Lemoine lost his job; eventually someone will lose a life.

Read More

Gary Marcus