Buy the (avocado) dip: Analyzing nutritional buzzwords on Reddit


Big consumer packaged goods (CPG) companies are struggling to move quickly enough to keep up with changing tastes. Kraft Heinz’s recent write-downs are a particularly salient example.

One way incumbents have sought to keep up is by acquiring small brands that have a more direct connection with consumers and are perceived as healthier, fresher, or more authentic. Prominent examples include General Mills’ acquisition of Epic and Annie’s and Kellogg’s acquisition of RXBar.

Kellogg bought RXBar for $600m in 2017

It’s hard for large companies to keep their finger on the pulse of what is popular with consumers across lots of fragmented categories. Text analysis is exciting to me because it offers a way to learn what matters in a huge body of conversation relatively quickly, and I’m interested in how to apply it to help predict consumer trends. With that in mind, I’ve been experimenting with ways to use text analysis to find trends in Reddit data.

It feels like much of the recent innovation in packaged food has been centered around health. I wanted to see how conversation on a dense, health-focused digital community like Reddit’s nutrition forum could be an indicator of broader adoption of a product or category.

For example, avocados have had quite the decade in popular culture and sales. Were people in Reddit’s nutrition community talking about them before the broader public? I compared Reddit data with Google Trends data (which seemed like a sensible proxy for broad interest) to try to gauge this:

It’s not a perfect comparison, and there are plenty of ways to improve it, but at a glance interest in avocados on the nutrition subreddit rose quickly and peaked around 2014, while the broader public’s interest in ’cados, as indicated by Google search data, has risen steadily since 2011.
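To make a comparison like this a little more concrete, here is a minimal Python sketch of one way to put the two series on the same footing. Google Trends reports interest on a 0 to 100 scale relative to the peak of the series, so the sketch rescales a yearly Reddit series the same way. The function names and the input format (a dict of year to value) are my own assumptions for illustration, not the code behind the chart.

```python
def rescale_to_peak(series):
    """Rescale a {year: value} mapping so its largest value becomes 100.

    This mirrors the relative 0-100 scale used by Google Trends, so a
    Reddit series and a Trends series can share one axis.
    """
    peak = max(series.values())
    if peak == 0:
        # Nothing was ever mentioned; avoid dividing by zero.
        return {year: 0.0 for year in series}
    return {year: 100.0 * value / peak for year, value in series.items()}


def peak_year(series):
    """Return the year in which the series reaches its maximum."""
    return max(series, key=series.get)
```

Comparing `peak_year` for the two rescaled series gives a rough read on which community got there first, though a single peak is a crude summary of a trend.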

I pulled Reddit data going back to 2010 with a Python script using the Pushshift API. It is quite easy to do, and I encourage you to play around with the script and query other subreddits you’re interested in. I analyzed the data in R, and that script is available here. Let me know if you have any questions or if you find something interesting.
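For a sense of what pulling the data involves, here is a rough Python sketch of paging through a subreddit’s posts. It is not the script linked above. The endpoint URL and the query parameters (`subreddit`, `after`, `before`, `size`, `sort`, `sort_type`) are assumptions based on how the public Pushshift submission search worked historically, and that endpoint may no longer be openly accessible, so treat this as an illustration of the paging logic rather than something guaranteed to run against the live service.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the historical public Pushshift submission search endpoint.
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_url(subreddit, after, before, size=100):
    """Build a query URL for posts in `subreddit` between two epoch times."""
    params = {
        "subreddit": subreddit,
        "after": after,    # epoch seconds, lower bound
        "before": before,  # epoch seconds, upper bound
        "size": size,
        "sort": "asc",               # assumed parameter names
        "sort_type": "created_utc",
    }
    return PUSHSHIFT_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_posts(subreddit, after, before, fetch=fetch_json, pause=1.0):
    """Page through results oldest-first until no new posts come back."""
    posts, seen = [], set()
    while True:
        batch = fetch(build_url(subreddit, after, before)).get("data", [])
        new = [p for p in batch if p["id"] not in seen]
        if not new:
            break  # empty page, or only posts we already have
        posts.extend(new)
        seen.update(p["id"] for p in new)
        # Move the window forward to the newest post seen so far.
        after = max(p["created_utc"] for p in new)
        time.sleep(pause)  # be polite to the API
    return posts
```

Passing the `fetch` function in as a parameter makes it easy to swap in a cached or offline data source if the live API is unavailable.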

Visualizing Bigrams in the Nutrition Subreddit with R

To get started analyzing a body of text, I like to create a graph of the bigrams, which shows the most common word pairings. You can see the output above. This gives a sense of what the community is talking about, and it gave me ideas for topics to dig deeper into. For example, I hadn’t thought of looking into coconut oil or fish oil until this chart reminded me of those products.
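I did this step in R, but the counting behind a bigram chart is simple enough to sketch in a few lines of Python. The version below is only an illustration: the tiny stopword list and the function names are placeholders I made up, and a real analysis would use a much fuller stopword list.

```python
import re
from collections import Counter

# Placeholder stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "are", "for", "i", "in", "is", "it",
             "of", "on", "or", "the", "to", "vs", "with"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_bigrams(texts, stopwords=STOPWORDS):
    """Count adjacent word pairs, skipping pairs that contain a stopword."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        for first, second in zip(words, words[1:]):
            if first in stopwords or second in stopwords:
                continue
            counts[(first, second)] += 1
    return counts
```

Calling `count_bigrams(titles).most_common(50)` gives the pairs you would then draw as a network, with each word as a node and each pair as an edge weighted by its count.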

You can learn how to build this visualization in R on this exceedingly helpful site.

After looking at the bigram chart, I wanted to explore individual topics and products. I wrote a function to plot yearly mentions per thousand posts of text I input. Below are some of the more interesting outputs.
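My function is in the R script, but the calculation itself can be sketched in Python as follows. This version assumes each post is a `(created_utc, text)` pair and counts a post once if the term appears anywhere in its text, using a plain case-insensitive substring match. Those choices are assumptions for the sake of the example and may differ from what the R script does.

```python
from collections import defaultdict
from datetime import datetime, timezone


def mentions_per_thousand(posts, term):
    """Return {year: posts mentioning `term` per 1,000 posts that year}.

    `posts` is an iterable of (created_utc, text) pairs, where
    created_utc is a Unix timestamp in seconds.
    """
    term = term.lower()
    totals = defaultdict(int)  # all posts per year
    hits = defaultdict(int)    # posts containing the term per year
    for created_utc, text in posts:
        year = datetime.fromtimestamp(created_utc, tz=timezone.utc).year
        totals[year] += 1
        if term in text.lower():
            hits[year] += 1
    return {year: 1000 * hits[year] / totals[year] for year in sorted(totals)}
```

Dividing by the number of posts in each year matters because the subreddit itself grows over time; raw counts would make almost every topic look like it is trending up. The resulting dict can be handed to any plotting library.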

A Greek Yogurt Tragedy

Interest in Greek yogurt exploded in 2013 but has steadily declined since. The category has soured! Sales of Greek yogurt have declined a bit, but not nearly this steeply. It seems like subreddit mentions are more volatile than broader interest in the product, which makes sense: people probably post a lot about new, exciting things and then back off as they become commonplace.

Lots of interest in plant-based foods

If the market cap of Beyond Meat ($10B as I write this) is any indication, the markets have caught on to this trend.

Keto and Intermittent Fasting…people are into them

I don’t know much about keto beyond the fact that I don’t really like listening to people talk about it, and it feels like that has been happening more and more lately. The data seems to support this. Intermittent fasting is also getting a lot of buzz on this subreddit.


I wonder how sales of those Yale-script “Kale” shirts are doing.

Gut Health

This one I find interesting. It makes sense with the increasing popularity of kombucha and fermented foods generally. It seems like a promising category, especially as the science around the importance of gut health continues to develop.

The most-mentioned nutrients on r/nutrition

Is sugar a nutrient? Anyway, I would be interested to see how this changes over the next 10 years. Obsession with certain nutrients seems to go in roughly 10-year cycles.

Conclusions and ways to build on this

As is, this is a fun way to learn about what people talk about in a given subculture. I struggled to find adequate data on individual brands as opposed to categories. Analyzing bigger subreddits could help, as could analyzing comments in addition to posts; either would make the data granular enough to track trends at the brand level rather than just the category level.

A polished, productionized version of this analysis that surfaces the newest trending brands and categories each month could serve as a discovery tool for big CPGs looking to snap up companies in hot spaces or launch new products. This sort of trend data could supplement other data sources and human analysis in picking acquisition targets.

One could extend this by incorporating sentiment analysis to assess the intensity with which consumers talk about products and topics. This could strengthen the signal of whether something will truly be a hit product or category or is just a flash in the pan. It would also be interesting to monitor multiple subreddits and see when different communities start to overlap (for example, the “coffee” and “nutrition” communities would have had an interesting overlap on “cold brew”).

I am working on a similar analysis of the “male fashion advice” subreddit (see the sinking of boat shoes below) and on the coffee subreddit. Please reach out (joe at this website via email or on twitter) if you have any ideas for interesting things to look at or with any questions. I’m happy to share the underlying data or talk about the code.

Thanks for reading.