The New York Times is making it clear that the AI industry won’t be given free rein to pilfer the newspaper’s content to train algorithms. In a recent change to its Terms of Service policy, the Times has explicitly forbidden the use of its vast media archives for the purposes of training “any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”
The paper’s TOS change was made on August 3rd and was first noted by Adweek. The policy applies to the Times’ text content, photos, video, and metadata, and explicitly forbids companies’ web crawlers from accessing that data to train proprietary products.
One of the most controversial aspects of the artificial intelligence industry is the question of where AI companies can or should get their data. Training AI systems takes an immense amount of data (as well as a frightening amount of computing power), and companies like OpenAI—which have built their data troves largely by scraping the open (or not so open) internet—have already run into legal problems as a result of these practices. Case in point: OpenAI is currently facing a bevy of lawsuits from people and organizations that say the company stole their data and then repackaged and monetized it.
Still, a number of big AI vendors have recently made it known that they’re doubling down on web scraping. Google, for instance, recently announced that it would continue scraping the web unless forced not to. The Times’ decision to forbid use of its media archive for such purposes shows that the newspaper understands the value of its data and doesn’t intend to give it away for free, which sets the stage for potential legal challenges.
The Times’ TOS change also comes at an interesting juncture in the evolving relationship between the news media and the emergent AI industry.
Lately, AI companies have aggressively courted newspapers and media organizations, in an obvious attempt to normalize the use of AI tools in news-curation and content creation. This makes sense, since one of the big projected markets for AI is digital media. AI companies are doing their best to establish a foothold in a wary news market, and one of the ways they’ve been doing that is by offering free services and partnerships. Case in point: not long ago, Google ominously approached a number of legacy news organizations—including the Times and the Washington Post—in an effort to sell them on a new AI tool, dubbed “Genesis,” that it claimed would “help” journalists.
At the same time, AI companies seem to smell another kind of opportunity when it comes to partnering with the news media. Since indiscriminate web scraping has proven to be a controversial practice, AI companies are currently searching for new, legally safer methods to obtain the data they need to make their products run properly. One of those methods appears to be partnering with news organizations and offering up free automation services in exchange for access to the papers’ vast text archives. Recently, the Associated Press made a deal with OpenAI that would allow the artificial intelligence startup to access and use the AP’s text archives. In exchange, OpenAI offered the AP access to “OpenAI’s technology and product expertise.”
In short: the AI industry really wants to integrate with the news media. Given the unknown dangers that such a collaboration may involve, however, it may make more sense for media outlets to continue playing hardball for the time being.