content consolidation meta study

A bit of background before we get too deep into it: I have always been deeply curious about how successful content consolidation is for websites at scale, particularly in this era of AI. But I hadn’t seen anyone study it.

The Hypothesis

The hypothesis I set out to test was essentially this: topic-based content performs better post-HCU (Helpful Content Update) than keyword-based content, and therefore websites that took the initiative to consolidate similar keyword-based pages into a single topic-based page improved their organic traffic.

So I decided to do it myself.

The Methodology & Data

I chased the team at SEMRush at the end of 2024 (thank you for humouring me) and got monthly data for a set of domains, going back as far as they could provide, including the number of pages each domain had ranking in the first 100 organic results. This amounted to about 8,400 domains, with page-count data from February 2023 onwards. I had hoped to get data from before the first Helpful Content Update, but unfortunately we couldn’t make that happen. My initial data request was:

  • We want a sample of 7-9K websites, ideally primarily not ecommerce, if that filtering is possible
  • Jan 2022 – present estimated traffic by month (if possible by subfolder, as I can do the regex filtering myself; if not, domain-level is okay)
  • Website has more than 1,000 unique pages
  • Number of pages dropped by 40%+ 

I wasn’t able to address all of these criteria at the start, but I managed most of them.
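
For concreteness, here is a minimal sketch, in Python/pandas, of how the last two criteria could be checked against a monthly per-domain export. The file name and column names are my own assumptions, not the actual SEMRush export format:

```python
import pandas as pd

# Assumed shape: one row per domain per month, with estimated traffic and
# the number of pages ranking in the top 100 organic results.
df = pd.read_csv("semrush_monthly.csv")  # assumed columns: domain, month, pages, traffic

pages = df.sort_values("month").groupby("domain")["pages"]
summary = pd.DataFrame({"peak_pages": pages.max(), "latest_pages": pages.last()})

# The last two criteria of the request: >1,000 unique pages, and a 40%+ drop in pages.
summary["page_drop"] = 1 - summary["latest_pages"] / summary["peak_pages"]
candidates = summary[(summary["peak_pages"] > 1000) & (summary["page_drop"] >= 0.40)]
print(f"{len(candidates)} domains meet the page-count criteria")
```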

Raw spreadsheet data from SEMRush, and the engineer I was working with, Yulia.

Because this was so much data, I pulled it all into BigQuery to run the analysis. First, cleaning:

  1. I segmented out the domains, so any domain that had pages but reported zero traffic was removed from the analysis
  2. I classified domains as ‘SHUTDOWN’ if their reported traffic or reported page count fell to 5% or less of its peak
  3. The rest was determining the thresholds and ranges for classifying whether traffic (and page count) had improved, stayed flat, or declined; a rough sketch of this logic follows the list.
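
As a very rough sketch of that cleaning and bucketing logic (the file name, column names, and the +/-10% "same" band are my own placeholder assumptions, not the thresholds used in the study):

```python
import pandas as pd

def classify(start, end, flat_band=0.10):
    """Bucket a relative change as MORE / SAME / LESS.
    The +/-10% 'same' band is a placeholder assumption."""
    if start == 0:
        return "MORE" if end > 0 else "SAME"
    change = (end - start) / start
    if change > flat_band:
        return "MORE"
    if change < -flat_band:
        return "LESS"
    return "SAME"

df = pd.read_csv("semrush_monthly.csv")  # assumed columns: domain, month, pages, traffic

per_domain = df.sort_values("month").groupby("domain").agg(
    start_pages=("pages", "first"), end_pages=("pages", "last"),
    start_traffic=("traffic", "first"), end_traffic=("traffic", "last"),
    peak_pages=("pages", "max"), peak_traffic=("traffic", "max"),
)

# 1. Remove domains that report pages but no traffic at all.
per_domain = per_domain[per_domain["peak_traffic"] > 0]

# 2. Flag SHUTDOWN domains: latest traffic or page count at 5% or less of its peak.
shutdown = (per_domain["end_traffic"] <= 0.05 * per_domain["peak_traffic"]) | (
    per_domain["end_pages"] <= 0.05 * per_domain["peak_pages"]
)
cleaned = per_domain[~shutdown].copy()

# 3. Combine a page-change bucket and a traffic-change bucket into codes like "FPMT".
page_bucket = cleaned.apply(lambda r: classify(r["start_pages"], r["end_pages"]), axis=1)
traffic_bucket = cleaned.apply(lambda r: classify(r["start_traffic"], r["end_traffic"]), axis=1)
cleaned["classification"] = (
    page_bucket.map({"MORE": "MP", "SAME": "SP", "LESS": "FP"})
    + traffic_bucket.map({"MORE": "MT", "SAME": "ST", "LESS": "LT"})
)
```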

From there I went into the analysis. (As I am not a seasoned SQL expert, I sat with Claude, Cursor, and ChatGPT to help write the queries for the data analysis, and validated their results with my own manual review.) To supplement this, I used Watson’s Natural Language Understanding to validate the associated industry of each website. Of those lookups, 1,715 threw an error, and the first cleaning pass identified 1,742 shutdown domains.
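
For reference, a minimal sketch of that classification step, assuming the ibm-watson Python SDK and its categories feature; the API key, region URL, and version date are placeholders, and error handling matters because a meaningful share of lookups fail:

```python
from ibm_watson import NaturalLanguageUnderstandingV1, ApiException
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, CategoriesOptions

# Placeholder credentials and endpoint; substitute your own instance details.
nlu = NaturalLanguageUnderstandingV1(
    version="2022-04-07",
    authenticator=IAMAuthenticator("YOUR_API_KEY"),
)
nlu.set_service_url("https://api.eu-de.natural-language-understanding.watson.cloud.ibm.com")

def classify_industry(domain):
    """Return the top category label for a domain (e.g. '/law, govt and politics'),
    or None if the lookup fails."""
    try:
        result = nlu.analyze(
            url=f"https://{domain}",
            features=Features(categories=CategoriesOptions(limit=1)),
        ).get_result()
        categories = result.get("categories", [])
        return categories[0]["label"] if categories else None
    except ApiException:
        # Offline sites, blocked crawls, and similar failures land here;
        # in the study, 1,715 domains failed to classify.
        return None
```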

Of the websites that did not have an industry classification, 330 were also shutdown domains, so 1,715 + 1,742 - 330 = 3,127 of the ~8,400 domains had incomplete data. For the general performance analysis, I used the cleaned dataset without the shutdown domains (6,679 domains); for the industry-specific analysis, I used the cleaned dataset without shutdown domains and with an industry classification: 5,294 domains.

Classification | Average Start Pages | Average End Pages | Average Absolute Page Change | Average Relative Page Change | Average Traffic Change
MPMT | 16,130 | 26,910 | 10,780 | 81.40% | 74.11%
MPST | 11,690 | 16,504 | 4,814 | 67.65% | -6.15%
FPLT | 29,487 | 19,659 | -9,828 | -34.16% | -55.92%
SPMT | 18,860 | 20,215 | 1,355 | 12.26% | 50.78%
SPLT | 10,799 | 10,902 | 103 | 11.11% | -51.01%
MPLT | 10,090 | 14,397 | 4,307 | 76.80% | -46.09%
SPST | 15,222 | 15,532 | 310 | 10.45% | -8.91%
FPST | 32,028 | 20,756 | -11,272 | -30.72% | -10.85%
FPMT | 30,353 | 22,168 | -8,185 | -30.96% | 54.71%

A quick translation:

  • MP: More pages
  • SP: Same pages
  • FP: Fewer pages
  • MT: More traffic
  • ST: Same traffic
  • LT: Less traffic
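
As a sanity check on how a table like this comes together, here is a sketch of the aggregation as a self-contained function, continuing the same assumed column names and classification codes from the cleaning sketch above:

```python
import pandas as pd

def summarise_by_classification(per_domain: pd.DataFrame) -> pd.DataFrame:
    """per_domain: one row per domain with start/end pages and traffic, plus a
    'classification' code such as 'FPMT' (column names are assumptions)."""
    df = per_domain.copy()
    df["abs_page_change"] = df["end_pages"] - df["start_pages"]
    df["rel_page_change"] = df["abs_page_change"] / df["start_pages"]
    df["traffic_change"] = (df["end_traffic"] - df["start_traffic"]) / df["start_traffic"]
    # Average each metric within each classification bucket (MPMT, FPMT, and so on).
    return df.groupby("classification").agg(
        avg_start_pages=("start_pages", "mean"),
        avg_end_pages=("end_pages", "mean"),
        avg_abs_page_change=("abs_page_change", "mean"),
        avg_rel_page_change=("rel_page_change", "mean"),
        avg_traffic_change=("traffic_change", "mean"),
    )
```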

The Results

NB: I will be coming back to this section and updating it to include more industry-specific results and to address any further requests on the data.

For those of you who saw my talk at SMX Munich (thank you for attending!), this is the bulk of what I discussed. The TL;DR? Strategically reducing the number of pages on your website, by even 10%, can meaningfully improve your organic traffic: some websites saw a traffic improvement of roughly +55% over the period of this analysis.

The domain classification I was most interested in, FPMT (fewer pages, more traffic), was the smallest group, at 391 domains. So I dug deeper into the data I had available: as a top-level study, I did not have the data to analyse traffic or page counts by subfolder (this would have been nice, but gosh, imagine how many rows of data that would be?!)

One of the most interesting findings in this data, to me, is that the industries with the greatest increases in traffic were relatively highly regulated ones:

  • Law, gov’t and politics: ~30% reduction in pages with a ~+90% average traffic gain
  • Health and hospitals: ~32% reduction in pages with a ~+70% average traffic gain
  • Business/industry: ~32% reduction in pages with a ~+60% average traffic gain

So if I didn’t have the detail to see category-level changes within each website, why am I assuming that the websites which successfully reduced their pages and gained traffic did so by strategically consolidating and pruning their content?

Because whether they gained or lost traffic, websites that reduced their page count did so by roughly 30% on average.

The size of the cut was essentially the same for winners and losers, so the successful sites had to be doing something different with the pages they removed. To me, this is enough to stand my hypothesis on and call it true.

To see my full results analysis, visit my speakerdeck.