Why The New York Times might win its copyright lawsuit against OpenAI


The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times “has a near zero probability of winning” its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views.

“Trying to get everyone to license training data is not going to work because that’s not what copyright is about,” Jeffries wrote. “Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain. Period. Anyone who tells you otherwise is lying or simply does not understand how copyright works.”

This article is written by two authors. One of us is a journalist who has been on the copyright beat for nearly 20 years. The other is a law professor who has taught dozens of courses on IP and Internet law. We’re pretty sure we understand how copyright works. And we’re here to warn the AI community that it needs to take these lawsuits seriously.

In its blog post responding to the Times lawsuit, OpenAI wrote that “training AI models using publicly available Internet materials is fair use, as supported by long-standing and widely accepted precedents.”

The most important of these precedents is a 2015 decision that allowed Google to scan millions of copyrighted books to create a search engine. We expect OpenAI to argue that the Google ruling allows OpenAI to use copyrighted documents to train its generative models. Stability AI and Anthropic will undoubtedly make similar arguments as they face copyright lawsuits of their own.

These defendants could win in court—but they could lose, too. As we’ll see, AI companies are on shakier legal ground than Google was in its book search case. And the courts don’t always side with technology companies in cases where companies make copies to build their systems. The story of MP3.com illustrates the kind of legal peril AI companies could face in the coming years.

A copyright lawsuit destroyed MP3.com

Michael Robertson, founder and CEO of MP3.com, speaks in front of the company headquarters in San Diego on May 21, 2001, the day it was acquired by record label Vivendi Universal.

TOM KURTZ/AFP via Getty Images

Everyone knows about Napster, the music-sharing service that was destroyed by litigation in 2001. But fewer people remember MP3.com, a music startup that tried harder to color inside the lines but still got crushed in the courts.

MP3.com launched a pioneering music-streaming service in 2000. The idea was that users could build an online music library based on the CDs they already owned.

Because most users had slow dial-up modems, MP3.com took a shortcut, purchasing CDs and ripping them to MP3.com’s servers. When a customer wanted to add a CD to their collection, they would put it in their CD-ROM drive just long enough to prove they owned it. That would unlock access to copies of the same songs already stored on MP3.com servers.

“We thought about it almost like a compression algorithm,” founder Michael Robertson told us in a recent phone interview. “All we were doing was letting you listen to CDs you already had. We weren’t giving you anything you didn’t already have access to.”

According to Robertson, MP3.com also partnered with online CD retailers to enable instant listening. “As soon as you bought a CD, we’d use the digital receipt as proof” to enable streaming songs from the CD.

When the recording industry sued, MP3.com argued all of this was allowed by copyright’s fair use doctrine. The Supreme Court had previously ruled that it was fair use to “time shift” TV shows with a VCR. And an appeals court had recently blessed “space shifting” music from a CD to an MP3 player. So why shouldn’t a company help customers “space shift” their legally purchased music across the Internet?

A New York federal judge didn’t buy it. “Defendant purchased tens of thousands of popular CDs in which plaintiffs held the copyrights, and, without authorization, copied their recordings onto its computer servers,” wrote Judge Jed Rakoff in a decision against MP3.com.


Timothy B. Lee