Content Farms Are Using AI Chatbots to Plagiarize News Outlets

Online content farms are using AI chatbots to “scramble and rewrite” thousands of news stories from major publications like The New York Times and republish them to earn advertising revenue, according to a new report from misinformation monitor NewsGuard. The stories, which often repurposed entire lines directly from other articles without credit, were found on 37 different sites. In some cases, NewsGuard notes, those sites appeared to be completely automated, with no humans involved.

NewsGuard, which makes a browser extension rating the trustworthiness of news websites, says the content farms it identified used chatbots to rewrite stories first published by CNN, Reuters, and other mainstream outlets. Because these farms rely on text that has already been edited and published, the quality of the writing in the plagiarized AI articles marked an improvement over past cases where content mills simply instructed AI models to generate stories without any source material. The result, NewsGuard said, was articles that would appear nearly indistinguishable from an authentic story to the average reader.

The report identified 37 sites repurposing news stories, but NewsGuard says the actual number could be much, much higher. NewsGuard was only able to identify the sites in question because each of them featured at least one article with a common chatbot error message, like “As an AI model I cannot rewrite this title.” But other sites that take a moment to remove those telltale signs could go totally undetected.

“There are likely hundreds—if not thousands—of websites that are using AI to lift content from outside sources that NewsGuard could not identify because they have not mistakenly published an AI error message,” NewsGuard wrote. Gizmodo could not independently verify the 37 sites identified. NewsGuard didn’t immediately respond to our request for further comment.

These sites varied widely in terms of subject matter, with some focused on science and space and others on sports, politics, or breaking news. Several of the websites featured names like DailyHeadliner.com and TalkGlitz.com. One of the sites, WhatsNew2Day.com, appeared to have based an article about AI on a June 21 story in The Verge that was, ironically, about ads running against AI-generated news stories.

In many cases, these plagiarized articles are being used to generate advertising revenue from major brands. NewsGuard claims it discovered programmatic ads from 55 blue-chip companies running on 15 of the 37 sites analyzed. That means brands, knowingly or not, are directly funding these works of AI plagiarism. NewsGuard did not respond to Gizmodo’s request for comment seeking the names of the blue-chip brands identified.

“Because the programmatic ad process—which uses algorithms to deliver highly targeted ads to users on the internet—is so opaque, the advertising brands likely have no idea that they are funding the proliferation of these AI copycat sites,” NewsGuard added.

It’s unclear exactly what AI models were used to create these plagiarized works, but Gizmodo confirmed it can be easily accomplished using the most popular tools available from Google and OpenAI. In a test, Gizmodo asked Google Bard to rewrite this recent Gizmodo story about a near collision in the airline industry to be more SEO-friendly. Bard quickly responded by saying “Sure, here is the rewritten news article” before providing a shortened 258-word story. NewsGuard found similar results when it asked ChatGPT to rewrite a New York Times article.

Both OpenAI and Google have policies that prohibit using their models to engage in plagiarism or aid in the “Misrepresentation of the provenance of generated content.” But those policies, for now, feel like mere suggestions. Neither OpenAI nor Google responded to Gizmodo’s requests for comment.

News industry grapples with AI

News aggregation and content mills aren’t new and far predate the current wave of hyped-up, quickly developing large language models like OpenAI’s ChatGPT and Google’s Bard. Still, the speed with which these models can recreate stories, usually in a matter of seconds, means bad actors looking to quickly fill up sites with copied content can generate hundreds or thousands of articles in a day, all potentially sucking up some advertising revenue.

Traditional news publishers, meanwhile, are grappling with the impact AI will have on newsrooms. Tech publications like CNET have been caught using AI to generate articles without clearly explaining how they were created. Some, like Insider, have begun working with AI tools to brainstorm story ideas and propose interview questions. Not everyone is on board the AI news train, though. Last week, the Associated Press said any output generated by an AI should be “treated as unvetted source material.” Even the best AI models are known to hallucinate facts and are almost certainly trained on copyrighted material, making them a nightmare for ethical journalism.

The New York Times pushed back against AI companies earlier this month by altering its Terms of Service policies to explicitly forbid companies from using its archives to train machine learning or artificial intelligence systems. Now, it appears the Times may be taking OpenAI to court over the data scraping issue.

Mack DeGeurin