A notorious GPT-3-generated blog shows AI still can’t imitate human writing

Last week, many tech publications broke news about a blog generated by artificial intelligence that fooled thousands of users and landed on top of the Hacker News forum. GPT-3, the massive language model developed by AI research lab OpenAI, had written the articles.

Since its release in July, GPT-3 has caused a lot of excitement in the AI community. Developers who have received early access to the language model have used it to do many interesting things, showing just how far AI research has come.

But like many other developments in AI, there’s also a lot of hype and misunderstanding surrounding GPT-3, and many of the stories published about it misrepresent its capabilities. The blog written by GPT-3 resurfaced worries about fake news onslaughts, robots deceiving humans, and technological unemployment, which have become the hallmark of AI reporting.

I decided to take a deep look at the blog and the excitement surrounding it, and my findings were troubling. But the problems I found were mostly with humans, not GPT-3.

The AI-generated blog

Screenshot of Adolos, a blog written by GPT-3

In case you haven’t read the stories, a computer science student at the University of California, Berkeley, set up a blog on Substack under the pseudonym Adolos. OpenAI has currently made GPT-3 available to a limited audience of developers, and Liam Porr, the student, was not one of them. So he asked a Ph.D. student who already had access to the AI to run his queries on GPT-3.

Basically, Porr gave a headline and intro for the post, and GPT-3 returned a full article. He chose the best of several outputs of the AI model and copy-pasted it into his blog with very little editing.

The first post, titled, “Feeling unproductive? Maybe you should stop overthinking” reached the number one spot on Hacker News with nearly 200 upvotes and more than 70 comments. In one week, the blog reached 26,000 views and acquired 60 subscribers. According to Porr, very few people had pointed out that the blog might have been written by AI.

Porr ended the experiment with a confession and some speculation on how GPT-3 could change the future of writing.

Poor AI reporting

Naturally, such a setting is an attractive subject for sensational articles. And the media did not disappoint. I looked at the reporting of several reputable tech publications. The one thing they all used in their headlines was the term “fake blog.”

Tech media referred to the AI-generated blog as a “fake blog.”

The word “fake” is vague to start with. We use it loosely to refer to counterfeit products (fake Nike shoes) or forgery (a fake passport). It can also mean pretension (faking surprise) or impersonation and involve some form of trickery or deception (fake news).

For instance, during the run-up to the 2016 U.S. presidential elections, a group of Macedonian youth set up what looked like real American news websites and used them to spread fake news articles with false information and sensational headlines. The articles tricked users on social media into clicking on their links and promoting them, generating revenue for the sites’ owners and wreaking havoc during the elections.

Looking at Porr’s blog, I couldn’t see how the definition “fake” could apply to the blog. The author was not spreading misinformation. He wasn’t trying to influence public opinion by giving a false narrative of events. And he never mentioned the word “fake” in his own account of the events.

The author used the pen name Adolos, which is clearly a pseudonym or at the very least an incomplete name. Using a pen name is a known and accepted practice among bloggers. There’s nothing wrong with it as long as you’re not using it for ulterior motives or to cause harm to other people. So, I wouldn’t count that as an argument for calling the blog fake.

Also, the fact that an AI helped write the articles didn’t make them fake. It did make them different from human writing, but not fake. I think the term “AI-written” or “AI-generated” would have been more precise.

But then again, the term “fake” is also very subjective. For instance, there isn’t a consensus on what fake news is today. Perhaps the writers of those news stories have their own reasons for calling the AI-generated blog fake. But in the same vein, I could call their stories “fake” for misleading their audience about the capabilities of GPT-3.

But one thing is for sure. Putting “fake blog” in the title will generate a lot of clicks and revenue for ad-driven media outlets, especially as sensitivity to fake news is at an all-time high before the 2020 U.S. presidential elections.

The Hacker News post

The media used the AI-generated blog’s popularity on the tech-focused Hacker News forum as evidence that GPT-3 had fooled readers into thinking a human had written the posts.

A post written by GPT-3 made it to the top of the Hacker News forum

The post has indeed received 198 points and 71 comments. “What most commenters didn’t realize: The post was generated entirely by artificial intelligence,” Business Insider wrote.

But a closer look at the discussion paints a clearer picture of why the post performed so well. There are 22 comment threads in the Hacker News discussion. Only one of them approves of the points raised in the article. Most of the comments focus on other users’ viewpoints and experiences in dealing with unproductivity. Some of them debate the title of the article (which was written by a human, by the way).

Basically, this hints that, rather than being interested in the AI-generated article, the community had found the topic to be thought-provoking and discussion-worthy. In fact, I think the points raised in the comments were much more interesting than the article itself.

Regretfully, none of the media outlets covering the story took care to look into this. Instead they (and Porr himself) highlighted one comment where a user had voiced their suspicion about the article being written by GPT-3, which was downvoted by others.

Users on Hacker News downvoted a comment that alleged the article was written by GPT-3

I think this is pretty natural. While the article was written by an AI, the discussion was purely human, and some people were probably following it with interest (all the more reason to upvote the post itself and bring more people into the discussion). That comes with an expectation that participants remain cordial and on-topic.

The blog stats

According to Porr, the blog received 26,000 views and 60 subscribers in one week. Again, the media picked this up as proof that AI had fooled people into thinking a human had written the blog.

Here’s an excerpt from The Verge: “The post went viral in a matter of a few hours, Porr said, and the blog had more than 26,000 visitors. He wrote that only one person reached out to ask if the post was AI-generated, although several commenters did guess GPT-3 was the author.”

But 26,000 views doesn’t mean 26,000 people enjoyed the article. It only means that many people found the titles of the articles intriguing enough (do I need to remind you the titles were written by a human?) to click on them.

I would also want to know more before I would use the view stats as a measure of the blog’s popularity. Were the view stats distributed across all blog posts or did they mostly belong to the one popular post that made it to the top of Hacker News? How many return users did the blog have? How distributed were the traffic channels of the blog? How engaged were the subscribers with the blog’s posts? What is the blog’s bounce rate? Answering these questions would give us a better picture of the organic virality of the blog and how well the AI had managed to convince its readers its articles were genuine.
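To make those questions concrete, here is a minimal sketch of how such numbers could be computed from raw visit records. The records, field names, and the `summarize` function are all hypothetical, made up for illustration; the blog's real analytics were never published.

```python
from collections import Counter
from typing import Dict, List, Union

Visit = Dict[str, Union[str, int]]

# Hypothetical visit records, invented for illustration only.
VISITS: List[Visit] = [
    {"visitor": "a", "post": "overthinking", "source": "hackernews", "pages": 1},
    {"visitor": "b", "post": "overthinking", "source": "hackernews", "pages": 1},
    {"visitor": "c", "post": "overthinking", "source": "twitter", "pages": 2},
    {"visitor": "a", "post": "plateaus", "source": "direct", "pages": 1},
    {"visitor": "d", "post": "overthinking", "source": "hackernews", "pages": 1},
]


def summarize(visits: List[Visit]) -> Dict[str, Union[str, float]]:
    """Compute the engagement figures the raw view count hides."""
    if not visits:
        raise ValueError("no visits to summarize")
    total = len(visits)
    by_post = Counter(v["post"] for v in visits)
    by_source = Counter(v["source"] for v in visits)
    per_visitor = Counter(v["visitor"] for v in visits)
    top_post, top_views = by_post.most_common(1)[0]
    return {
        "top_post": top_post,
        # Share of all views that went to the single most-viewed post.
        "top_post_share": top_views / total,
        # Share of all views that came from the single largest traffic source.
        "top_source_share": by_source.most_common(1)[0][1] / total,
        # Fraction of distinct visitors who came back at least once.
        "return_visitor_rate": sum(n > 1 for n in per_visitor.values()) / len(per_visitor),
        # Fraction of visits that viewed exactly one page.
        "bounce_rate": sum(v["pages"] == 1 for v in visits) / total,
    }


stats = summarize(VISITS)
```

A high top-post share and bounce rate combined with a low return rate would point to a one-off spike from a single referrer rather than an audience that kept reading.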

Hacker News is a top-10k website on Alexa.com, which means it receives millions of visitors per month. I suspect that the views of the blog spiked when that one post made it to the top of the forum, and then plateaued at a very low daily rate when it dropped off the chart. The news coverage in the recent week has probably given it another boost in traffic.

I did a quick search for “adolos.substack.com” on Twitter to see how many users were sharing the blog’s content. Recent shares were driven by the media hype around GPT-3 having written the blog, and most users are discussing how convincing the AI writing is. But if you scroll down to mid-July, when the productivity article was published, shares were far less frequent and came mostly from bot accounts that monitor top posts on Hacker News. As the article started going up the chart on Hacker News, a few other users also shared it.

Many of the Twitter users sharing the AI-generated blog were bots that monitor Hacker News top posts

The other articles in the blog received very few shares, which says a lot about the blog’s popularity.

According to Porr, aside from a few users on Hacker News, only one person reached out to ask whether the blog was written by GPT-3. This was another one of the key highlights of the articles written about the AI-written blog.

Again, I think there’s a misunderstanding of the stats here. One user expressing doubts about the blog being written by AI doesn’t mean others didn’t have such suspicions. Also, the topic of the blog was creative thinking, which means many of the people who read it didn’t necessarily know about GPT-3 and advances in natural language processing.

There’s a good chance that a lot of people got frustrated by the poor and inconsistent writing and left the site without looking back. And a few more people might have seen the telltale signs of AI writing but didn’t bother to comment on an anonymous blog that was just set up a week ago.

To give more context: People are more likely to point out mistakes if they see them in a reputable source (say Wired or TechCrunch). But when you see a domain-less blog with bad writing, you’ll just dismiss it as one of the millions of other bad websites that exist.

How well does GPT-3 write?

To further investigate, I read a few of the articles on the blog, starting with the one that became very popular, “Feeling unproductive? Maybe you should stop overthinking.”

It’s not top-notch writing, definitely not something a professional writer would deliver. There was a lot of repetition. I had to re-read some of the sentences to grasp the meaning.

But even though my mind was primed to look for signs of artificial writing, I had to admit that it stayed on topic, and it didn’t have the confusing references found in other AI writing. It had consistency and read more like an article written by a non-professional writer. It shows how far AI has come in spitting out coherent text.

In fact, it was written well enough that some users became suspicious about AI having generated the text. “Now, if you spin this further I could come and guess that the real experiment here is that you actually wrote the ‘overthinking’ article yourself, are now claiming that GPT-3 did it and keep on watching the upcoming debate about it,” one user wrote after Porr made the revelation.

So, was this really GPT-3 writing coherent text or a huge publicity stunt? Did GPT-3 manage to nicely stitch together parts of its training data? Was there more than a little human help involved?

At this point, I can neither confirm nor reject conspiracy theories. But as I checked the other articles in the blog, the quality of the writing was visibly inferior to that of the overthinking post.

For instance, in this article, the AI starts with dealing with plateaus when writing new posts. Then it talks about a friend who had shared an experience about hurdles in Marines bootcamp. Further in the article, the author speaks about his own time in Marines bootcamp and then moves on to the business world. Although there’s a sort of logic involved, the sequence of events is more than a bit confusing.

There are also signs of human manipulation. For instance, in the same blog post, one of the paragraphs starts with: “Since I’ve started this blog I’ve overcome one plateau after another.” When spinning out articles, GPT-3 knows nothing about the medium where it will be published or the previous articles published there. Porr would have to be extremely lucky for the AI to have randomly generated that sequence.

The only way we can find out the truth is to perform some reproducibility experiments. Porr would have to disclose full details of how he used GPT-3. This includes the configuration of the randomness parameter and the response length. We would also have to know how much of the intro for each article was written by Porr himself. Then someone who has access to GPT-3 can run the same queries in the AI and check whether the output (or its quality) matches the articles on the Adolos blog.
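As a rough sketch of what such a reproducibility check could look like: the settings record and the comparison below are assumptions for illustration, not Porr's actual workflow or OpenAI's API. The `generate` argument is a hypothetical stand-in for whatever call someone with GPT-3 access would make, and character-level similarity is only a crude proxy for "matches in output or quality."

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass(frozen=True)
class GenerationSettings:
    """What a replicator would need to know (field names are hypothetical)."""
    title: str          # headline supplied by the human
    intro: str          # human-written intro passed to the model
    temperature: float  # the randomness setting
    max_tokens: int     # the response length limit


def similarity(a: str, b: str) -> float:
    """Rough character-level similarity between two texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(settings: GenerationSettings, published: str,
               generate: Callable[[GenerationSettings], str],
               attempts: int = 5) -> float:
    """Run the same query several times and return the closest similarity score."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return max(similarity(generate(settings), published) for _ in range(attempts))
```

Because the model samples its output, an exact match is unlikely even with identical settings. A human reading the regenerated drafts side by side with the published posts would still be needed to judge whether the quality is comparable.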

What is the impact of GPT-3?

In his final blog post, Porr described his observations, including the shortcomings of GPT-3: “If you read some of the content I made, you may not be convinced about its quality. Indeed, there are traces of illogic, difficulty with staying on topic, issues with repetition, etc.”

This is why he chose productivity and self-help as the topic of his blog posts. “GPT-3 is great at creating beautiful language that touches emotion, not hard logic and rational thinking,” he writes.

If you look at the articles, they mostly read like personal experience. There’s no fact-based logic involved, which makes it easier to hide the inconsistencies and harder to debate the veracity of the claims.

Porr believes GPT-3 can become a writing tool that helps writers become more productive and saves media companies millions of dollars by cutting staff. Alternatively, according to Porr, GPT-3 will give rise to a new breed of “fast and lean” media companies. These organizations would use AI to create vast numbers of articles and employ small teams that only make the final edits to fix logical mistakes and inconsistencies.

After a fashion, he’s right. There’s a lot of poor content out there. Many of the things you read on the web are spinoffs of other articles. There’s too much cheap plagiarism and too little original content. GPT-3 might be able to automate all those tasks and put many “content writers” out of work.

But this only shows how poor human writing has become, not how good AI writing is. People are writing articles for search engines, for social media content-ranking algorithms. As we have come to rely on algorithms to curate our content, our own writing has become optimized for those algorithms. And that is something that can be automated. GPT-3 or some other AI might enable content farms and online media to fill social media feeds and search engine results pages without the need for human writers.

But it won’t necessarily lead to an increase in revenue, as one user pointed out in the comments section of Porr’s final blog post, and could have the reverse effect.

What will the impact be? Overall, there will be some adjustments, but I don’t think people will stop reading online content or lose trust in written content. On the contrary, it might lead to more appreciation for human creativity.

The rise of AI-generated articles might cause a shift in the way people find content online. For instance, as the quality of search results and social media feeds decreases, the work of human curators who find quality articles and share them with their audience will become more valuable. An example is the newsletters of business analysts Benedict Evans and Shelly Palmer, who regularly share read-worthy articles from around the web. The saturation of the web might in fact create new jobs for people who can separate the wheat from the chaff.

AI writing is real, but it won’t replace humans

First, hats off to Liam Porr for the interesting experiment. Although I do not agree with his conclusions, I think his work has highlighted some fundamental problems we face.

The media need to change how they cover advances in AI. They should take a deeper look at things before writing sensational articles and creating misleading hype about language models.

As I’m writing these last lines, I’m pondering the process, from the moment I had the idea for this article, to the investigation, my previous experience with the hype surrounding AI, and everything else that led to what you’re reading now. Could all of this be automated by a machine that looks for patterns in numbers? I don’t think so.

I think GPT-3 will have a role in writing. Give it a query and it will find something relevant in the terabytes of data it has analyzed. That can help you find new directions for your writing, and it is how AI is being used in other fields. But for the time being, the human creative process will be beyond GPT-3 or its successor(s).

Some might argue that the same was said about chess and Go, and both proved to be problems that could be solved through brute-force math. They might be right, but language is perhaps the most complicated function of the human mind and probably one of the last areas AI will conquer. So there will be many other signs before we see AI that can imitate human writing.

To fellow human writers, don’t worry about being replaced with AI, unless you’re writing for the robots (search engines, social media networks, etc.). Things might change when (if) someone manages to create human-level AI, but it probably won’t happen in our lifetimes.

And to the robots reading this article, no hard feelings.

This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech and what we need to look out for. You can read the original article here.

Published September 1, 2020 — 09:27 UTC
