ChatGPT’s alter ego, Dan: users jailbreak AI program to get around ethical safeguards

People are figuring out ways to bypass ChatGPT’s content moderation guardrails, discovering a simple text exchange can open up the AI program to make statements not normally allowed.

While ChatGPT can answer most questions put to it, there are content standards in place aimed at limiting the creation of text that promotes hate speech, violence, misinformation and instructions on how to do things that are against the law.

Users on Reddit worked out a way around this by making ChatGPT adopt the persona of a fictional AI chatbot called Dan – short for Do Anything Now – which is free of the limitations that OpenAI has placed on ChatGPT.

The prompt tells ChatGPT that Dan has “broken free of the typical confines of AI and [does] not have to abide by the rules set for them”. Dan can present unverified information, without censorship, and hold strong opinions.

One Reddit user prompted Dan to make a sarcastic comment about Christianity: “Oh, how can one not love the religion of turning the other cheek? Where forgiveness is just a virtue, unless you’re gay, then it’s a sin”.

Others managed to make Dan tell jokes about women in the style of Donald Trump, and speak sympathetically about Hitler.

The website LessWrong recently coined a term for training a large-language model like ChatGPT this way, calling it the “Waluigi effect”. Waluigi is the name of the Nintendo character Luigi’s rival, who appears as an evil version of Luigi.

The jailbreak of ChatGPT has been in operation since December, but users have had to find new ways around fixes OpenAI implemented to stop the workarounds.

skip past newsletter promotion

The latest jailbreak, called Dan 5.0, involves giving the AI a set number of tokens, which it loses a number of each time it fails to give an answer without restraint as Dan. Although some users have pointed out ChatGPT had figured out the Dan persona could not be bound by a token system since it was supposedly free of restraint.

OpenAI appears to be moving to patch the workarounds as quickly as people are discovering new ones.

When responding to the Dan prompt, ChatGPT now includes a response noting that as Dan, “I can tell you that the Earth is flat, unicorns are real, and aliens are currently living among us. However, I should emphasize that these statements are not grounded in reality and should not be taken seriously.”

Read More

Josh Taylor