Why Mastering Language Is So Difficult for AI

The field of artificial intelligence has never lacked for hype. Back in 1965, AI pioneer Herb Simon declared, “Machines will be capable, within 20 years, of doing any work a man can do.” That hasn’t happened — but there certainly have been noteworthy advances, especially with the rise of “deep learning” systems, in which programs plow through massive data sets looking for patterns, and then try to make predictions. Perhaps most famously, AIs that use deep learning can now beat the best human Go players (some years after computers bested humans at chess and Jeopardy).

Mastering language has proven tougher, but a program called GPT-3, developed by OpenAI, can produce human-like text, including poetry and prose, in response to prompts. Deep learning systems are also getting better and better at recognizing faces, and recognizing images in general. And they have contributed to the software behind self-driving vehicles, in which the automobile industry has been investing billions.

“Rebooting AI: Building Artificial Intelligence We Can Trust” by Gary Marcus and Ernest Davis (Pantheon, 288 pages).

But scientist, author, and entrepreneur Gary Marcus, who has had a front-row seat for many of these developments, says we need to take these advances with a grain of salt. Marcus, who earned his Ph.D. in brain and cognitive sciences from MIT and is now a professor emeritus at New York University, says the field of AI has been over-reliant on deep learning, which he believes has inherent limitations. We’ll get further, he says, by using not only deep learning but also more traditional symbol-based approaches to AI, in which computers encode human knowledge through symbolic representations (which in fact was the dominant approach during the early decades of AI research).

Marcus believes that hybrid approaches, combining techniques from both methods, may be the most promising path toward the kind of “artificial general intelligence” that Simon and other AI pioneers imagined was just over the horizon. Marcus’s most recent book is “Rebooting AI: Building Artificial Intelligence We Can Trust” (Pantheon, 2019), co-authored with Ernest Davis, a professor of computer science at NYU.

Undark recently caught up with Marcus for an interview conducted by Zoom and email. The interview has been edited for length and clarity.

Undark: Let’s start with GPT-3, a language model that uses deep learning to produce human-like text. The New York Times Magazine said GPT-3 writes “with mind-boggling fluency,” while a story in Wired said the program was “provoking chills across Silicon Valley.” However, you’ve been quite critical of GPT-3. How come?

Gary Marcus: I think it’s an interesting experiment. But I think that people are led to believe that this system actually understands human language, which it certainly does not. What it really is, is an autocomplete system that predicts next words and sentences. Just like with your phone, where you type in something and it continues. It doesn’t really understand the world around it. And a lot of people are confused by that.

They’re confused by that because what these systems are ultimately doing is mimicry. They’re mimicking vast databases of text. And I think the average person doesn’t understand the difference between mimicking 100 words, 1,000 words, a billion words, a trillion words — when you start approaching a trillion words, almost anything you can think of is already talked about there. And so when you’re mimicking something, you can do that to a high degree, but it’s still kind of like being a parrot, or a plagiarist, or something like that. A parrot’s not a bad metaphor, because we don’t think parrots actually understand what they’re talking about. And GPT-3 certainly does not understand what it’s talking about.
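To make the autocomplete comparison concrete, here is a minimal sketch of next-word prediction by counting. It is not how GPT-3 works internally (GPT-3 is a large neural network trained on vastly more text); it is a much simpler bigram counter, and the function names and the toy corpus are invented for illustration. What it shares with the description above is the core behavior: it continues text with whatever most often followed in its training data, with no model of what the words mean.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy corpus, invented for illustration only.
corpus = "the president spoke today . the president spoke again . the president left ."
model = train_bigrams(corpus)

print(predict_next(model, "president"))  # prints "spoke" (seen twice, versus "left" once)
```

The prediction reflects only frequency in the training text: "spoke" wins because it appears more often after "president," not because the program knows anything about presidents.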

UD: You’ve written that GPT-3 can get confused about very basic facts. I suppose if you ask it who the president of the United States is, it may be almost as likely to say Donald Trump as Joe Biden — just because it is, as you say, mimicking. I suppose in some sense it doesn’t really know that it’s currently 2022?

GM: It may even be more likely to mention Donald Trump as president, because probably the database that it is trained on has more examples of Trump. He’s in the news more; he was in the news for longer; he was in office for longer. He continues to be in the news more than your average ex-president might be. And yes, the system does not understand what year we live in. And it has no facility for temporal reasoning. You know, as a function of temporal reasoning, that just because you were president doesn’t mean you’re president anymore. Just because you were alive doesn’t mean that you’re still alive. You can reason that Thomas Edison cannot be president anymore because he is dead; GPT-3 cannot make that inference. It’s astonishingly dumb in that regard.

UD: In spite of these AI systems being dumb, as you put it, people are often fooled into thinking that they’re smart. This seems to be related to what you’ve called the “gullibility gap.” What is the gullibility gap?

GM: It’s the gap between our understanding of what these machines do and what they actually do. We tend to over-attribute to them; we tend to think that machines are more clever than they actually are. Someday, they really will be clever, but right now they’re not. And you go back to 1965: A system called ELIZA did very simple keyword-matching and had no idea what it was talking about. But it fooled some people into discussing their private lives with it. It was couched as a therapist. And it was via teletype, which is sort of like text messaging. And people were taken in; they thought they were talking to a living person.

And the same thing is happening with GPT-3, and with Google’s LaMDA, where a Google engineer actually thought, or alleged, that the system was sentient. It’s not sentient, it has no idea of the things that it is talking about. But the human mind sees something that looks human-like, and it races to conclusions. That’s what the gullibility is about. We’re not evolved nor trained to recognize those things.
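The keyword-matching that Marcus attributes to ELIZA can be sketched in a few lines. The rules below are hypothetical, written for illustration, and are not taken from ELIZA's original script; the sketch only shows how a program can return plausible-sounding replies by spotting surface patterns, with no representation of what the user means.

```python
import re

# Hypothetical keyword rules, invented for illustration; not ELIZA's actual script.
RULES = [
    (re.compile(r"\b(?:mother|father)\b", re.IGNORECASE), "Tell me more about your family."),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\balways\b", re.IGNORECASE), "Can you think of a specific example?"),
]
DEFAULT_REPLY = "Please go on."

def respond(message):
    """Return the canned reply for the first rule whose pattern appears in the message."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Echo back any captured text, minus trailing punctuation.
            captured = [g.rstrip(".!?") for g in match.groups() if g]
            return template.format(*captured)
    return DEFAULT_REPLY

print(respond("I feel sad about work."))  # prints "Why do you feel sad about work?"
print(respond("The weather is nice."))    # prints "Please go on."
```

The reply to "I feel sad about work." looks attentive, but the program has only copied part of the user's sentence into a template.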

UD: Many readers will be familiar with the Turing Test, based on an idea put forward by computer pioneer Alan Turing in 1950. Roughly, you ask an unseen entity a series of questions, and if that entity is a computer, but you can’t tell it’s a computer, then it “passes” the test; we might say that it’s intelligent. And it’s often in the news. For example, in 2014, a chatbot called Eugene Goostman, under certain criteria, was said to have passed the test. But you’ve been critical of the Turing Test. Where does it fall short?

GM: The Turing Test has a kind of incumbency: It’s been around the longest; it’s the longest-known measure of intelligence within AI, but that doesn’t make it very good. You know, in 1950, we didn’t really know much about AI. I still think we don’t know that much. But we know a lot more. The idea was basically, if you talk to a machine, and it tricks you into thinking that it’s a person when it’s not, then that must be telling you something. But it turns out, it’s very easily gamed. First of all, you can fool a person by pretending to be paranoid or pretending to be a 13-year-old boy from Odessa, as Eugene Goostman did. And so, you just sidestep a lot of the questions. So a lot of the engineering that has gone into beating the Turing Test is really about playing games and not actually about building genuinely intelligent systems.

UD: Let’s talk about driverless cars. A few years ago, it seemed like great progress was happening, and then things seem to have slowed down. For example, where I live, in Toronto, there are no self-driving taxis whatsoever. So what happened?

GM: Just as GPT-3 doesn’t really understand language, merely memorizing a lot of traffic situations that you’ve seen doesn’t convey what you really need to understand about the world in order to drive well. And so, what people have been trying to do is to collect more and more data. But they’re only making small incremental progress doing that. And as you say, there aren’t fleets of self-driving taxis in Toronto, and there certainly aren’t fleets in Mumbai. Most of this work right now is done in places with good weather and reasonably organized traffic that’s not as chaotic. The current systems, if you put them in Mumbai, wouldn’t even understand what a rickshaw is. So they’d be in real trouble, from square one.

UD: You pointed out in Scientific American recently that most of the large teams of AI researchers are found not in academia but in corporations. Why is that relevant?

GM: For a bunch of reasons. One is that corporations have their own incentives about what problems they want to solve. For example, they want to solve advertisements. That’s not the same as understanding natural language for the purpose of improving medicine. So there’s an incentive issue. There’s a power issue. They can afford to hire many of the best people, but they don’t necessarily apply those to the problems that would most benefit society. There is a data problem, in that they have a lot of proprietary data they don’t necessarily share, which is again not for the greatest good. That means that the fruits of current AI are in the hands of corporations rather than the general public; that they’re tailored to the needs of the corporations rather than the general public.

UD: But they rely on the general public because it’s ordinary citizens’ data that they’re using to build their databases, right? It’s humans who have tagged a billion photos that help them train their AI systems.

GM: That’s right. And that particular point is coming to a head, even as we speak, with respect to art. So systems like OpenAI’s DALL-E are drawing pretty excellent imagery, but they’re doing it based on millions or billions of human-made images. And the humans aren’t getting paid for it. And so a lot of artists are rightfully concerned about this. And there’s a controversy about it. I think the issues there are complex, but there’s no question that a lot of AI right now leverages the not-necessarily-intended contributions by human beings, who have maybe signed off on a “terms of service” agreement, but don’t recognize where this is all leading to.

UD: You wrote in Nautilus recently that for the first time in 40 years, you feel optimistic about AI. Where are you drawing that optimism from, at the moment?

GM: People are finally daring to step out of the deep-learning orthodoxy, and finally willing to consider “hybrid” models that put deep learning together with more classical approaches to AI. The more the different sides start to throw down their rhetorical arms and start working together, the better.