AP gives OpenAI access to its vast archive of news stories

OpenAI, a pioneer in the field of artificial intelligence (AI), has signed a licensing agreement with The Associated Press (AP) to train its AI systems on the AP’s massive archive of text stories. The archive covered by this groundbreaking agreement dates back to 1985.

The financial terms of the two-year deal between OpenAI and the AP have not been disclosed. As part of the agreement, the AP will provide OpenAI with access to its library of text stories, which can be used to train OpenAI’s AI systems. In exchange, OpenAI will give the AP access to its technology, which the AP can use to test ways to enhance the quality of its reporting.

Some local sports coverage and financial earnings reports at the AP have been generated automatically for years. However, “generative” technology (chatbots like ChatGPT) is not used by the AP in its reporting process.

There is a growing controversy in the tech world over the use of copyrighted content for AI training. The “large language models” that fuel chatbots at OpenAI, Google, and other AI companies were trained on billions of sentences culled from the public internet. These models incorporate third-party content such as news articles, Wikipedia entries, social media comments, and blog posts without the authors’ knowledge or consent.

An increasing number of writers, musicians, news outlets, and social media companies are speaking out against the use of their work to train AI, claiming that this represents a fundamental change in the nature of the internet. They claim that artificial intelligence tools that learn from human-created content are already being used to replace humans. A flood of lawsuits has been filed against the industry, including class-action suits against OpenAI and Google and individual suits brought against OpenAI by comedian Sarah Silverman and two prominent fiction authors.

The Federal Trade Commission (FTC) has also launched an investigation into OpenAI’s use of customer data for model training. The agency has requested documentation detailing OpenAI’s efforts to mitigate potential threats to the security of its AI models, along with an explanation of the company’s efforts to improve the accuracy of its language models, which have been shown to “hallucinate” when asked questions they cannot answer.

Because chatbots like ChatGPT are trained on a static set of data, they are unable to accommodate new information without being retrained. This makes them less reliable sources of up-to-date information and breaking news. To address this issue, some tech firms have implemented systems that let chatbots conduct their own web searches or get answers from a separate, constantly updated database.

The agreement between the two parties only grants OpenAI access to the AP’s archive, but that archive is regularly updated with breaking news. Tech firms have paid for news articles in the past, but usually for a different purpose. In some countries, both Google and Facebook pay news organizations for the right to display their content directly on their platforms. Australia’s government has passed a law mandating such payments, and a bill in Canada would do the same.

Both OpenAI and AP have expressed optimism about the development of AI, stating that they “believe in the responsible creation and use of these AI systems.” Although the AP has used AI for nearly a decade in various capacities, including to automate corporate earnings reports and to recap some sporting events, it has stated that it does not currently use any generative AI in its news stories.

The AP also recently launched an AI-powered image archive search and manages a program that assists local news organizations in integrating AI into their operations. The organization strongly endorses a system that will safeguard intellectual property and guarantee that authors are paid fairly for their work.

University of Miami intellectual property law professor Andres Sawicki said, “The data sets include a lot of content that is copyrighted. The copyright holders do not approve of these exploitations. It’s not hard to conceive of more deals like the AP one being made between tech firms and content producers in an effort to build a ‘clean database.’ The problem is that the data sets needed to train the models are so massive that I doubt it will be possible to secure permission from a sufficient number of owners to make the technology practical.”

In conclusion, the agreement between OpenAI and AP is a watershed moment that may encourage other tech firms to pay for content to train their AI algorithms. It also highlights the growing urgency of resolving the issue of content compensation in the artificial intelligence sector. OpenAI, the Associated Press, and other thought leaders are committed to ensuring that AI systems are developed and used responsibly even as the debate rages on.

First reported by The Washington Post

Frequently Asked Questions

Q: What is the agreement between OpenAI and The Associated Press (AP)?

A: OpenAI has signed a license agreement with The Associated Press to use the AP’s massive text story archive for training its AI systems.

Q: How long is the agreement between OpenAI and AP?

A: The deal between OpenAI and the AP runs for two years. Its financial terms have not been disclosed.

Q: What content will OpenAI have access to through the agreement?

A: OpenAI will have access to the AP’s library of text stories, which includes news articles and other textual content.

Q: What does the AP gain from the agreement?

A: In exchange for access to the AP’s text story archive, OpenAI will provide the AP with access to its technology. The AP can use this technology to explore ways to enhance the quality of its reporting.

Q: Does the AP use generative AI technology in its reporting?

A: No, the AP does not currently use generative AI technology, such as chatbots, in its news stories. However, it has utilized automation technology for tasks like generating some local sports coverage and financial earnings reports.

Q: What is the controversy surrounding the use of copyrighted content for AI training?

A: There is a growing controversy over the use of copyrighted content, such as news articles, for training AI models. Critics argue that incorporating third-party content without the authors’ knowledge or consent represents a significant change in the nature of the internet and raises concerns about intellectual property rights.

Q: Who has spoken out against the use of their work to train AI?

A: Writers, musicians, news outlets, and social media companies have expressed concerns about their work being used to train AI models without their permission. Some of them have filed lawsuits, including class-action suits against OpenAI and Google, as well as individual suits brought against OpenAI by comedian Sarah Silverman and two fiction authors.

Q: What investigation has the FTC launched regarding OpenAI?

A: The FTC has launched an investigation into OpenAI’s use of customer data for model training. The agency is particularly interested in OpenAI’s efforts to mitigate potential threats to the security of its AI models and to improve the accuracy of its language models.

Brad Anderson

Editor In Chief at ReadWrite

Brad is the editor overseeing contributed content at ReadWrite.com. He previously worked as an editor at PayPal and Crunchbase. You can reach him at brad at readwrite.com.