A deep dive into the Reddit-Google data-sharing deal

by Kayla Zhu · Mar 20, 2024

All eyes are on Reddit, as the discussion board site announces a new partnership with Google on the eve of its IPO

Introduction

Reddit announced a data-sharing partnership with Google in late February, just before its long-awaited IPO this month.

The partnership gives the search giant access to Reddit’s Data API to train Google’s AI models. The deal, estimated at $60 million per year, also allows Google to display Reddit content in “new ways” across its products.

On the other side, Reddit will gain access to Google’s Vertex AI service – an AI tool designed to improve companies’ website search.

Reddit’s new data-sharing deal with Google is one of the first cases of a social media platform providing user-generated content to another entity for the express purpose of training an AI model. 

This data licensing deal will be a key source of revenue for Reddit and potentially help boost its valuation as the company ramps up to go public this month.

What does this deal mean for Google, and for Reddit and its users? 

Reddit’s User Base

The Typical Reddit User

Reddit’s user base has grown considerably since its humble beginnings.

From 70 million active monthly users in 2013 to 850 million active monthly users in 2023, Reddit’s usership growth year-over-year has been steady. Viewership has also seen significant growth – from 2018 to 2020, monthly views jumped from 14 billion to 30 billion, according to company data.

But who are these users?

There isn’t too much demographic data on Reddit’s user base, but according to a poll conducted by Pew Research Center in 2016, Reddit users are predominantly male, young and white. 

In reality, the Reddit user base is a massive and incredibly diverse community. With over 100,000 active subreddits, ranging from the ultra-mainstream to the hyper-specific, Reddit is a widely used platform for both casual lurkers and dedicated moderators.

However, the enduring archetype of the average Reddit user, and one that has proved true on multiple occasions, is something of a mean Internet troll. In fact, there is a plethora of studies analyzing the prevalence of trolling and toxic behavior on Reddit.

As this TikTok portrays, the “average Redditor” stereotype involves being a sarcastic, condescending know-it-all – in short, insufferable.

While not all users fit in this stereotype, these bad actors are often the first types of people we think of when we talk about Reddit.

Issues With Reddit Content

With incidents like the r/WallStreetBets “To the moon!” GameStop saga (that earned itself a Netflix series), Reddit users have been notorious for being unruly, unpredictable, oftentimes a little nasty, and for not taking themselves too seriously.

Furthermore, the discussion forum website has had its fair share of controversies, including the proliferation of misogynistic, racist and otherwise discriminatory content, as well as the spreading of misinformation. There is a long list of controversial subreddits, many of which have since been banned. 

With these types of users generating content on Reddit, some people have raised concerns that training AI models on the website’s content – some of which is quite unhinged – could steer those models in unsavory directions.

  • “So, Google AI will be trained on YouTube and Reddit? Well, I certainly don’t envy the people that will have to work on making the resulting AI model not toxic,” – u/REOreddit
  • “Garbage in, garbage out” – u/michigician
  • “How dare they appropriate my years of sh*tposting for commercial gains. The humanity” – u/Alb4t0r

However, some users expressed optimism at the data-sharing deal, believing that the interactions on Reddit could be hugely informative for AI training.

  • “I see a lot of people talk about how Reddit is a garbage place for information. While there are a lot of bad parts, any company worth its salt is going to filter the data to find high-quality conversations. If Google can do that well, I think Reddit has an insanely high number of good chat examples that would make for a very good model,” – u/a_slay_nub

Needless to say, the Reddit data will need some serious filtering. Pornographic content and illegal content have long been rampant on the site. While Reddit has doubled down on content moderation, this could still be a concern for Google when training its AI.

Monetizing Reddit Users

Reddit started getting aggressive with its ad strategy in 2017-2018, but the success of its ads has been below par, according to some users. It’s well known that Reddit ads generally have a lower conversion rate than ads on other social media platforms like Facebook.

This isn’t surprising though. Reddit’s user base just doesn’t seem to really engage with its platform’s ads.  

  • “Reddit IPO success hinges on its infamously unqualified web traffic (userbase that does NOT engage with advertisements or buy things from the platform). This is the key reason why reddit is unprofitable” – u/awesomedan24
  • “Compared to all the others (facebook, (old) twitter, snapchat…), we really don’t like being advertised to. It will be interesting to see if that has changed at all after they tried to squeeze more money out of the API changes” – u/FoolishChemist
  • “I think it’s mainly because this is a relatively anonymous platform so there is very little user data to sell when compared to Facebook or Instagram where people post their faces, lives and routines,” – u/brianswichkow

Some brands are literally scared of posting Reddit ads with commenting turned on because of the oftentimes hostile and troll-ish comments they get from the notoriously anti-ad user base.

  • “Just ran my first ad on the relevant Python subreddits and the comments I got were mostly pretty discouraging / downright hostile. Not a great experience – especially considering I can’t do anything about the comments even though I paid for the ad + this is my own money, so it’s disheartening to pay for the privilege of largely being trolled,” wrote u/uncloud_hq
  • “It’s just trolling. The vocal part of reddit’s community is generally not very friendly towards advertisers, so when an advertiser has comments enabled on their ad post, it gets spammed with meaningless copypastas and weird drivel,” wrote u/HireALLTheThings

Still, in 2023, advertisements accounted for 98% of Reddit’s $804 million in revenue, and the company anticipates that ads, alongside its data-sharing deal, will be its main source of revenue. 

Navigating ads on sometimes hostile subreddits will continue to be a challenge for both Reddit and advertisers.

What Reddit Means To Google

Google, Reddit And Search Results

Reddit and Google have always been closely intertwined, even before this data-sharing deal. Their relationship has been generally favorable over the years, with Google rolling out a number of updates that benefitted Reddit content, including its “Discussions and forums” feature and changes to its search that prioritize real reviews – often from Reddit.

But how important is Reddit to Google now?

According to a search engine results page (SERP) analysis by Detailed of various keyphrases targeted for product reviews, the answer is: very.

In this analysis, Reddit posts were present in 97.5% of results under the “Discussions and forums” results for over 10,000 keyphrases, dominating over most other domains. 

The numbers don’t lie. Users have also noticed the recent spike in Reddit traffic from Google in the past few months.

Users on subreddit r/SEO also brought up the recent phenomenon of Reddit posts consistently ranking very high on SERPs. The apparent push from Google to promote user-generated content (UGC) from sites like Reddit and Quora has marketers questioning their traditional SEO methods.

Google’s apparent preference for UGC makes sense. With the modern internet filled to the brim with AI-generated content, some suggest that Google may be looking to boost human-written content as a way to improve search results.

However, this approach may have backfired on Google. While the intention was to improve search by promoting useful, user-generated content from sites like Reddit, marketers and brands who realized how much preferential treatment Google was giving Reddit quickly found a loophole to promote their stuff. 

In the Detailed SERP analysis, 51% of the top comments on Reddit posts that ranked at the top of Google’s search results were spam. Reddit mods and users have also noticed a wave of bot posts and activity in recent months.

The influx of spammy and AI-generated content is no doubt a major detriment to the user experience and overall usefulness of Reddit.

“My take: Fighting against AI-content isn’t going to be as simple as Google thinks. While promoting user generated content could work as a very short term solution, its eventually going to become either overrun with spam or have comments that feature AI content,” wrote Twitter user Chris Long.

Having SEO bots spam top Reddit threads could lead to some users dropping off or distrusting the platform, as suggested by Twitter user Lily Ray.

“Now that I know Reddit is being subtly spammed for SEO reasons, I have less desire to read and/or trust Reddit content,” wrote Ray.

The Amount Of Reddit Content Google Is Getting

While there aren’t any details about what kind of content, how much, or what time period of content Google will be receiving from Reddit, it will no doubt be a lot.

It’s hard to quantify the amount of data produced by Reddit users. A group of researchers built a dataset called Pushshift, which includes all the submissions and comments posted on Reddit between June 2005 and April 2019 – a total of 651 million submissions and 5.6 billion comments posted on 2.9 million subreddits. Their estimate of the size of those 14 years of Reddit data was “several terabytes.”

The researchers stated that by the end of their dataset (April 2019), users were posting an average of 5 million comments per day. Now in 2024, it wouldn’t be surprising to see figures exceeding that.

Similar data-sharing deals between a UGC website and an AI company don’t exist yet, so it’s difficult to compare the $60 million per year Reddit is getting from Google against other deals. 

Coming up on Reddit’s IPO

Why IPO Now?

Reddit’s data-sharing deal with Google came just a few days after the community-centered discussion forum site announced plans for its IPO.

Reddit announced that it is planning to launch its IPO sometime in March 2024. Its current valuation is $6.5 billion, and the company said in its filings that it plans to sell 22 million shares priced between $31 and $34 each.
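For a rough sense of scale, the share count and price range above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes every one of the 22 million shares sells at a single price, and it ignores underwriting fees and any question of who is selling the shares. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope sketch; figures are from the article, names are illustrative.
SHARES_OFFERED = 22_000_000
PRICE_LOW = 31   # dollars per share, bottom of the stated range
PRICE_HIGH = 34  # dollars per share, top of the stated range


def gross_proceeds(shares: int, price: int) -> int:
    """Dollars raised if every share sells at `price`, before any fees."""
    return shares * price


low = gross_proceeds(SHARES_OFFERED, PRICE_LOW)
high = gross_proceeds(SHARES_OFFERED, PRICE_HIGH)

print(f"Gross proceeds: ${low / 1e6:.0f} million to ${high / 1e6:.0f} million")
```

Under those assumptions, the offering would raise somewhere between $682 million and $748 million in total.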

There’s a lot of buzz around Reddit’s IPO. It is the first major social media company to go public since Pinterest’s 2019 IPO. Both 2022 and 2023 were slow years for IPOs, and Reddit’s high-profile IPO could inspire other tech companies to follow suit.

The company previously attempted to go public in 2021 to no avail – but it did end up raising over $1 billion that same year, earning a private market valuation of $10 billion back then.

The company’s current financial figures aren’t exactly music to investors’ ears. In its SEC filing, Reddit reported a net loss of $90.8 million in 2023, compared with a $158 million loss the year before. It did report $804 million in 2023 annual revenue, up about 21% from $666 million in 2022.

Still, a $90.8 million net loss doesn’t paint a favorable picture for the financial health of a company poised to go public.
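The year-over-year figures above are easy to sanity-check. The short sketch below uses only the rounded numbers quoted in this article (in millions of dollars), so the results are approximate, and the helper name is made up for illustration.

```python
# Quick check of the year-over-year figures; inputs are the article's rounded numbers.
REVENUE_2022 = 666.0   # millions of dollars
REVENUE_2023 = 804.0
NET_LOSS_2022 = 158.0
NET_LOSS_2023 = 90.8


def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100


revenue_growth = pct_change(REVENUE_2022, REVENUE_2023)
loss_change = pct_change(NET_LOSS_2022, NET_LOSS_2023)
loss_vs_revenue = NET_LOSS_2023 / REVENUE_2023 * 100

print(f"Revenue growth: {revenue_growth:.1f}%")
print(f"Change in net loss: {loss_change:.1f}%")
print(f"2023 net loss as a share of revenue: {loss_vs_revenue:.1f}%")
```

On these numbers, revenue grew about 20.7%, the net loss shrank by roughly 42.5%, and the 2023 loss was equal to about 11.3% of revenue.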

The company also went through a rough patch last year, when it started charging for access to its API, drawing ire from many subreddits that relied heavily on third-party apps to moderate and view Reddit content.

So, why go public now? 

Some posit that Reddit is trying to “ride the wave of generative AI” with this Google deal to maximize its valuation. It’s possible Reddit may strike similar deals with other AI companies, giving it a steady, lucrative stream of income for the foreseeable future.

Others are a bit more pessimistic, suggesting that the company is executing a pump-and-dump IPO with the goal of providing liquidity for the existing shareholders and hitting a specific market cap valuation so executives can access their performance-restricted share units (PRSU).

Financial experts are already warning retail investors about buying Reddit stock. 

“Reddit’s IPO marks the return of the junk IPO. I think the company may never monetize its platform without angering its users and the entire premise of Reddit is user-generated content. This business model is inescapably built on a catch-22: make money or please users,” wrote David Trainer for Forbes.

What Could The IPO Do For Reddit?

Regardless of what the future holds for Reddit, it’ll certainly be receiving a boost of income come its IPO. What could Reddit do with the funds? And could Reddit’s IPO help grow its user base, and actually improve user experience?

In its filings, Reddit said it plans to spend $467 million on R&D, which has led to confusion among some users. “$468m of R&D? This site hasn’t changed since 2011,” wrote u/Winterough.

R&D spending could go toward better adtech, or toward making the platform’s infrastructure better suited for AI training – both of which would help increase Reddit’s profitability but not improve its user experience.

R&D investments that could actually improve user experience include enhancing Reddit’s mobile app (which many users are stubbornly refusing to adopt), improving its internal search, expanding localization and internationalization efforts, and developing better content moderation and content discovery tools.

Improved content moderation, especially to combat hate speech, illegal content and AI-generated spam, would also improve the quality of its data and possibly make data-sharing deals more attractive for other AI companies.

In its prospectus, Reddit said another way it’s looking to monetize is by creating a developer platform and contributor program to help individuals monetize their work and creations. It also hopes to create a user-to-user marketplace for buying and selling digital and physical goods. An infusion of cash from the IPO could help fund these new features.

It’s still too soon to say whether or not Reddit, or its users, will see much upside from the IPO. While the sentiment around Reddit’s success as a public company has been relatively negative, the platform does boast a growing user base and a shiny, new AI deal with Google.