The Marketer’s Guide to Identifying & Fixing Google Index Bloat

Spread the love

Google has rolled out a number of quality related updates recently – from Panda to Fred to Phantom, it’s clear that quality content and strong user engagement metrics are increasingly important components of ranking well in Google.

While you may not need to recover from an explicit Google penalty, if a lot of your web pages are getting indexed, there’s a good chance that your site may be suffering because of low-quality, duplicated content that is eating up crawl budget and link equity, while also having very poor link and engagement metrics.

Index bloat is when your website has too many redundant pages in the search engine index. Click To Tweet

So how can you – as a marketer who may not have a comprehensive understanding of how Google’s index works – find and fix indexation issues?

In this post I’ll walk you through some relatively simple (but frequently time-consuming) steps for identifying if you have an issue with duplicate/thin content and indexation, how to determine which pages are causing the problem, and your options for cleaning it up.

Learn More: How to Recover From Any Google Penalty

Step 1: Determine if You’re Suffering from “Index Bloat”

The first step here is to identify whether you have an issue at all. What we’re looking to discern out of the gate are the number of pages you’re allowing to be indexed versus the number of pages that actually rank and drive traffic.

First, we can get some rough estimates as to how many pages on our site are indexed by going to Google and doing a site operator for our domain. My kids and I watched Harry Potter last night, so I’ll use the site as an example here:

image3 1

The site search operator can be a bit confusing – if I actually click through all the results for, I only see a few hundred results (I’m showing 100 results per page, so I click through to the last page of results, click show all, and now we are seeing the first few results from 601-700):


Google’s Gary Illyes explained that these are just estimates, and are less accurate for larger sites:

Gary Illyes

This is one of the things that makes diagnosing indexation issues difficult: there’s not always a comprehensive, specific list of the exact pages that are or aren’t in the index, and there are frequently many pages on your site that are indexed that you aren’t seeing in search results.

Free Bonus Download: Get our free SEO guide to learn how you can skyrocket your rankings, even in a competitive niche! Click here to download it for free right now!

I’ll go into more detail on how to deal with that knowledge gap in a minute, but beyond the site operator estimates you can also get indexation numbers from Google Search Console:


Obviously we don’t have access to Pottermore’s Google Search Console account (the above is an example from a different site), but if we did we could check the raw indexation numbers against the estimate (for smaller sites, frequently you’ll find they match up fairly closely).

Once you’ve quickly checked to see how many of your site’s pages are in the index, you’ll want to understand, at a high level, how many actually drive organic traffic for your site. To do this, you can start with Google Analytics:


I always like to create a custom report to see landing page data, but you can also quickly drill down to your content report and look at the organic traffic segment as pictured above, and scroll to the bottom of the screen to see the number of pages actually driving traffic:

image1 2

Similarly, you can see the number of pages generating search impressions within Google Search Console by looking at Search Analytics and drilling down to clicks (or impressions) and pages:


And then looking at the total number of pages listed:

image2 1

If you have thousands of pages showing in the index in Google Search Console and in site search operator estimates, but only a few hundred pages drive all your site’s traffic (and/or only a handful of pages drive most of your site’s traffic), you likely have an issue.

Learn More: 10 Google Search Console Hacks to Boost SEO

Step 2: Diagnosing Which Pages Are Causing Your Indexation Issues

So you’ve looked at the rough number of indexed pages for your site and the pages actually driving traffic to your site, and have identified a potential issue: how do you go about actually identifying which pages on your site present an “indexation issue?”

In other words: how do you know what pages you should be “cleaning up” and getting out of the index?

There are a few great tools that can help with this process.

Deep Crawl

I use Deep Crawl frequently, particularly for larger sites. It may be a bit pricey for some, but if you’re making an investment in SEO and have indexation issues it’s pretty easy to justify a subscription (if the price tag is a turn-off, don’t worry, I’ll outline some less expensive options as well).

Essentially what you can do here is start a crawl, and be sure to link Google Search Console accounts for your site. Once you’ve run a crawl, which can take some time, you’ll be able to get access to all kinds of great data, but for our purposes we want to start by going to Issues > Duplicate Content:

Deep Crawl

As you can see, you can quickly get a sense of the pages on your site that are indexable and duplicated.

Next, however, comes the hard part: analyzing these URLs to find out why your content is duplicated (and then actually fixing the issues!). We’ll deal more with that in step three.

For now, we want to look at one other Deep Crawl report called Indexable Pages without Search Impressions:

Deep Crawl report

If we navigate to Traffic > Landing Pages > Indexable Pages Without Search Impressions we get the above report.

In the case of this particular site, most of the duplicated pages actually aren’t getting any search impressions, but for many sites the “Indexable Pages without Search Impressions” may include a lot of pages that don’t show up as pure “duplicates” but are actually unique pages, just lower quality. (Again: more on how to sift through these and determine what to remove from the index later.)

Screaming Frog

Screaming Frog is another great (and less expensive) option for identifying duplicate content and areas of your site where pages are indexable but not showing up in search results. Once you’ve run a crawl of your site, you can quickly get a list of pages with duplicated title tags or meta descriptions:


Once again, after you’ve linked your Google Search Console account you can generate a report with all the URLs that did get crawled, are indexable, and aren’t showing up in Google Search Console:

READ ALSO  A CMO's Guide to Online Reputation Management


Google Search Console

A third and even better-priced (i.e. free!) option for getting to helpful duplicate content and indexation information is Google Search Console. With this tool you can quickly drill down to get duplicated title tag and meta description information as you would with Screaming Frog or Deep Crawl:


One advantage of the Search Console data is that you’re actually getting the report “straight from the horse’s mouth” and can get a glimpse into what Google actually sees.

Beyond this duplicated data, Google also has a new “Index Status” report (currently in beta and being rolled out to more accounts) which offers a lot more data on which specific pages on your site are actually being indexed:

Google Search Console

The report even allows you to drill down to see specific issues with pages and “indexed, low interest” pages where Google has indexed the page, but doesn’t crawl it often:

Google Search Console2

These indexed, low-interest pages may be pages you want indexed (so you don’t noindex or delete them automatically), but they very well may also be lower-quality pages you can work on removing from the index.

Free Bonus Download: Get our free SEO guide to learn how you can skyrocket your rankings, even in a competitive niche! Click here to download it for free right now!

Google Analytics

Now that you’ve worked through Deep Crawl, Screaming Frog, and/or Google Search Console, you’re likely aware of a lot of duplicated pages and indexed pages that you don’t want indexed.

The next step will be to work on actually removing these pages from the index, but first we want to make sure that we don’t skip over one remaining cohort of pages that we might want to remove from the index: low-quality pages with poor engagement.

As you saw, the reports above largely focus on duplicated content and pages that are indexable but get no impressions or traffic. But what about the pages on your site that may be “legitimate” pages yet have no real value to visitors?

If you have a batch of three-year-old press releases or a subsection of your site that’s been neglected and isn’t relevant or useful, there may still be some traffic from search going to those pages, but visitors might not be having a positive experience when they get there.

To unearth pages with relatively few visits and poor engagement signals, we can drill down to look at a landing page report in Google Analytics segmented for organic traffic:


If you grab a significant amount of data in terms of date-range (e.g. the previous year), you can then export that report and see which pages have unusually high bounce rates:


In this report there were a handful of these types of pages, which I find is often the case, even with relatively simple sites that have just been creating blog content consistently for an extended period.

So now you should have a big list of URLs that are some combination of:

  • Duplicated content
  • Pages that are indexed but not generating any impressions
  • Pages that have very poor engagement metrics (high bounce rates, low time on site, etc.)

Next comes the trickiest part: actually removing these pages from the index and fixing your issues with “index bloat.”

Related Content: How to Set Up Goals and Funnels in Google Analytics

Step 3: Removing Low-Quality Pages & Fixing Your Indexation Issues

As you move to actually sorting out your indexation issues, the first step is figuring out which types of duplicate and low-quality content are present in the index. You should have a big list of URLs that are likely duplicated or low value as a result of the first two steps, but how do you actually address these issues?

I like to start by classifying the different types of issues that the site is suffering from. This is something that can be quite time consuming since you have to review each of the URLs you’re seeing in these reports. If you’ve reviewed a lot of different sites, then this is something you get pretty adept at and can frequently identify pretty quickly, but assuming you’re new to indexation issues, here’s how you can start to dig in.

First, you want to make sure that the page in question is actually a duplicate or thin page you’re considering eliminating. Then you need to look for the root of the issue you’re seeing. Why is this page thin or duplicated? In most instances, the reason a URL is duplicated will be something that’s repeated across a variety of different pages.

Let’s dig into some reports for Pottermore to see examples.

In looking at the Screaming Frog duplicate title tag report, we see some pages that have the same title tags:


This is effectively the same content on different URLs and subdomains. In this instance, however, the site is configured to deliver different shopping pages based on geography. While that may be the best experience for the user, there are no hreflang tags applied to these pages to help Google understand their purpose.

Here we can see that two distinct geo-targeted versions of the same page are indexed:


This is an example of a set of pages where you likely can’t just delete the content, but you can apply a fix (in this case proper use of hreflang tags) that will help improve the way Google understands and indexes your site across thousands of pages.

Incidentally, if you’re looking to determine whether a large list of duplicated URLs is actually indexed or not, you can run the URLs through URL Profiler to check their indexation status.

Digging a bit deeper, we can see that the reason these titles are duplicated isn’t just that there are multiple versions of a page targeted to different countries:

These two URLs, even in the U.S. shop, have the exact same content. Rather than having two paths through the site (one through products, the other through collections) lead to the same URL, the site is creating duplicated versions of products.

This is true for literally every piece of art on the Pottermore site:

Here again we see that there’s an underlying issue impacting each of these product pages. Now that we’ve identified it, every time we see something similar in our reports we know the reason for the duplication (we’ll get to how to address it in a minute).

Scrolling down our list of URLs, we can quickly find another couple of common duplicate content culprits:


In this instance, Pottermore actually does have unique title tags for their paginated poster results, but generally you’ll often find that sites aren’t handling pagination correctly. Implementing rel previous and rel next and/or adding noindex follow tags to paginated results can help clean up what may be thousands of thin pages.

READ ALSO  A Story On How to Create Influence

Additionally, we can see Google Analytics tracking parameters above. Parameters can be another area where you may have thousands of duplicated pages sneaking into the index.

You’ll take a similar approach with what you find in an analytics report. I don’t have access to the Google Analytics account for Pottermore, but we can imagine that I might have found that some pages like this…

…show up as both thin content within a Deep Crawl report, and also have low traffic and/or extremely high bounce rates. If that’s the case, again I’ll want to note it and add it as a potential “content type” that I may want to address in some way.

Next, I need to start to apply solutions to my list of problems.

Free Bonus Download: Get our free SEO guide to learn how you can skyrocket your rankings, even in a competitive niche! Click here to download it for free right now!

Tools of the Trade: Your Options for Cleaning Up Indexation Issues

Now that I have a list of the URLs that could be causing me problems and I understand the root of those issues, I can start to go step-by-step and get the problem content out of the index. To do this, I have to understand the tools that are available to me.



A common and well-known tool for dealing with indexation issues is a site’s robots.txt file. I find that this tool is frequently misunderstood, however.

An important note about disallowing a page or section of your site through robots: you are blocking Google from crawling this section of your site. If your pages are already in the index, adding a robots operative does not prohibit Google from keeping those pages in the index.

The benefit of this approach is that you’re conserving crawl budget by keeping Google from coming back to these pages. And if you have a new batch of pages or section of your site that’s not yet indexed, you can prevent those pages from being indexed.

The disadvantage is that Google can’t follow links on these blocked pages (if they link to other pages you’d want to discover), and again if they’re already in the index, disallowed pages won’t necessarily be removed.

Meta Noindex Tag


Conversely, by adding a meta noindex tag you can actually have your pages removed from Google’s index once those pages are crawled.

The pros and cons here are essentially the flip side of using the robots file: you have pages in the index removed and you’re able to have links within noindexed content be followed for discovery (assuming you include a meta follow tag as well), but you don’t stop Google from crawling your noindexed pages, and Google may not “come back” to remove your page if it’s a deeper, less important page or section of the site.

To help move along the removal of these pages from the index, you can use the URL removal tool detailed below, or you can submit to the index using Google Search Console’s fetch as Google tool.

Canonical Tag


Generally, it’s always best to try to fix the root of a duplicate content problem, so redirecting pages, preventing your CMS from creating duplicated content, etc. are all preferable to just applying a canonical tag and moving on.

That said, properly adding a canonical tag can help indicate to Google which version of your page is actually the primary or “canonical” result that Google should return in search results.

Just make sure you don’t screw up the implementation!

Learn More: Beginner’s Guide to Properly Using Rel Tags To Improve Your Site’s Rankings

Remove URL Tool


Pairing a robots operative with a noindex tag is pointless, but what if you really can’t get a developer to get access to add meta noindex tags to a section of your site?

One alternative is to couple adding a robots operative with submitting your URL to Google Search Console for removal. There’s even a cool, free Chrome extension to help with bulk removal.

Parameter Handling in Google Search Console

Additionally, if you’re tracking parameters or faceted navigation and want to deal with some parameters that are being indexed that shouldn’t, you can address those from the URL Parameters section of Google Search Console:


Just be careful to properly classify your parameters within the tool:


XML Sitemaps

Additionally, making sure that your XML sitemap is up to date and isn’t generating errors can help Google crawl and understand your site and given the new Search Console indexation reports mentioned above, the XML sitemap will become even more valuable.

Deleting or Redirecting Pages

Say we have ~100 older, lower-quality, 150-word blog posts we wrote when we started our company blog. They may very well be indexed, but are not getting any traffic, have never been shared, and have never been linked to by external sites. We may want to noindex these, but we could also simply delete them and submit the URLs for removal.

Similarly, if we have a page that is very low-quality and not visited by users often (or even duplicated) but has some stray links from external sources, it’s a good idea to 301 redirect that page to the most relevant page (or its duplicate).

Thickening & Promoting a Page

After you work through the pure duplicates, tracking parameters, and lower-value pages you want to delete, you may have some areas where you have pages that are actually of value to users and that you’d like to keep in the index (and perhaps improve the performance of in terms of traffic and engagement metrics like time on site and bounce rate).

In these instances, you can work on thickening up your existing pages, promoting them, and addressing poor bounce rates rather than simply removing them from the index.

Free Bonus Download: Get our free SEO guide to learn how you can skyrocket your rankings, even in a competitive niche! Click here to download it for free right now!

Wrapping Up

If you’ve discovered that many of your web pages have been getting indexed as a result of low-quality, duplicated content in addition to poor link and engagement metrics, I hope that this article has helped you identify whether you do, indeed, have an issue with duplicate or thin content and indexation, and if so, how to determine which pages are causing the problem and then how to clean it up.

Source link

Spread the love


  1. Prߋperly boys,? Mommy finally mentioned after
    theyd cоme up with a number of silⅼy ideas of what God ddid for
    enjߋyable, ?Whaat God really likes is when folks love each otyer and take care of each other
    liie we do in ᧐our family.? Thаt made sense to Lee and Larry so Lee hugged Mommy
    аnd Larry hugged daddy to juѕt make Good happy.

  2. What’s up, after reading this remarkable post i am as
    well cheerful to share my familiarity here with mates.

  3. Superb, what a web site it is! This website gives helpful facts to us, keep it up.

  4. My programmer is trying to convince me to move to .net from
    PHP. I have always disliked the idea because of the expenses.
    But he’s tryiong none the less. I’ve been using WordPress on numerous
    websites for about a year and am nervous about switching
    to another platform. I have heard very good things about
    Is there a way I can import all my wordpress posts into it?
    Any help would be really appreciated!

  5. This piece of writing offers clear idea designed for the
    new viewers of blogging, that genuinely how to do running a blog.

  6. Plus cztery inne towary, które trzeba zobaczyć.

  7. Well I truly enjoyed reading it. This artіcle procured by you
    is very practical for proper planning.

  8. Pretty! This has been a really wоnderful post.
    Many thankѕ for providing this info.

  9. The Bengali film industry has become there since 1890sed a good deal so has the number of film lovers.
    The idea because of this indoor play centre came when the Wilikie brothers tied a rope on the roof of a factory roof and slid down it into bales of hay.
    Without a doubt, your family and friends want out for that perfect interest of one’s

  10. Great pⲟst. I wilⅼ bе going throuցh many of
    these issues as welⅼ..

  11. This is so as not to dull the paper shredding blades.

  12. rencontre celibataire lyon site des rencontre 100 gratuit rencontre sexe 93 creer site rencontre gratuit rencontre gay lille application pour rencontrer des gens a proximite rencontres trans montpellier un bon profil
    pour site de rencontre rencontre celibataire dans l’ain site de rencontre je trouve site rencontre
    seniors rencontre castres rencontre granville rencontres gay
    gironde comment faire des rencontres sur badoo blog rencontre sexe
    rencontre 17 royan site de rencontre android rencontre agricole applications rencontre
    iphone rencontre ado 15-25 ans la coupole paris
    rencontre nantes rencontres de sophie site de rencontre badoo test rencontre rosny sous bois
    rencontre homme tulle message de rencontre avec un homme rencontre femme draguignan annonce femme rencontre rencontre avec femme russe chat pour rencontre amicale site de rencontres gratuites sans inscription tinder rencontre inscription sur pc vivastreet saint
    nazaire rencontre rencontre sisteron 04 sites de rencontre pour seniors b2
    site de rencontre avis numero conference rencontre gratuit site
    de rencontre totalement gratuite rencontre 42 top
    rencontre iphone rencontre libanaise montreal site rencontre elite avis rencontre parents
    divorces rencontrer un garcon bien rencontre sexe haute savoie site de rencontre gratuit sans inscription rencontre brunoy
    rencontre sur toulon rencontrer du monde a toulouse

  13. Howdy I am so glad I found your weblog, I really found you by accident, while I was looking on Digg for
    something else, Anyhow I am here now and would just like to say many thanks for a tremendous post and a all
    round thrilling blog (I also love the theme/design), I don’t have time to read it all at the minute
    but I have book-marked it and also added in your
    RSS feeds, so when I have time I will be back to read much more,
    Please do keep up the awesome work.

  14. You made some good points there. I checked on the web to
    learn more about the issue and found most individuals will go along with your views on this web site.

  15. Genuinely no matter if someone doesn’t understand then its up to other people that they will assist, so here it
    takes place.

  16. Really excellent information can be found on website.

  17. My coder is trying to convince me to move to .net
    from PHP. I have always disliked the idea because of the
    expenses. But he’s tryiong none the less. I’ve been using WordPress on numerous websites for about a year and am worried about switching to another platform.
    I have heard good things about Is there a way
    I can import all my wordpress posts into it? Any help would be really appreciated!

  18. Α loved ones іn Puerto Rico is selling off all of their posseѕsions in order to сome across security ⲟutdooгs of
    the hurricane ravаged іslɑnd. Sοuirceѕ aгe scагсe,
    the family members is obtgaining troubles obtaining watеr, meals, and other stɑndard necessities.
    Thhe loved ones іs attempting to raise $4000 tο be capable too afford housing while the island gets back to regular ⲟpеratіon with water, poweг, and communications.

    Puerto Rico haѕ ѕuffered main consiquences considering that each hurricane Maria
    and Ηurricane Irma ravaged the island. Thе family members has loѕt every thing
    in the recent storms. They lost their household, their furnitᥙre, their boat
    and all of their worldly poѕseѕsion less a few items that sսrviveԀ the storms.

    They would bе most grateful if you would go too
    their auctions and tһeir fund raising internet site if you
    can donate.

  19. hello!,I really like your writing very a lot!
    percentage we communicate more approximately your article on AOL?
    I require a specialist in this area to unravel my problem.

    Maybe that’s you! Taking a look forward to look you.

  20. I blog quite often and I genuinely appreciate your content.
    Your article has truly peaked my interest. I’m going too take a note of your webxite and keep
    checking ffor new details about once a week. I subscribed to
    your RSS feed too.

  21. I know ths web site provides quality dependent posts and other information, is there any other web site which offers these data in quality?

  22. Hi Deaг, are you genuinelу visiting this ԝeb sijte regularly, if
    so afterward you wll definitely take fastidi᧐us knoѡ-how.

  23. You are in reality a fantastic website owner. The website loading pace is incredible.
    It appears that you’re doing any special trick.
    Furthermore, the contents are a work of art. you’ve performed
    a great task on this subject!

  24. Exceptional post. Never knew this, appreciate it for letting me know.

  25. Hi! Quick question that’s completely off topic. Do you know how
    to make your site mobile friendly? My blog looks weird when viewing from my iphone4.
    I’m trying to find a theme or plugin that might be able to fix this problem.

    If you have any recommendations, please share.
    With thanks!

  26. So if you are expecting a lot of help, remember that this isn’t always forthcoming.
    It is common for teachers to lament that students are not able to write despite having done quite nicely inside PMR English exam for 15-year-olds.
    However, it’s also possible to be wondering to find good essay writing examples.

  27. Wow! Dieser Blog sieht genau gerade wwie min alter! Es ist auf
    einem vollständig unterschiedlichen Thema , aber es hat ziemlich die gleiche Layout
    und Design. Große Wahl der Farben!

  28. Icch wirklich beeindruckt von Ihrer Schreibfähigkeiten und
    miit dem Layouut auf ddem Blog. Ist das eine bezahlte Thesma oder haben Sie ändern itt yourself?
    So oder so Wie auch immjer weiter so schön ausgezeichnete
    Qualität Schreiben, es istt selten, ein schönes große siehe Bllog wie diese heute.

  29. Please let me know if you’re looking for a article
    author for your blog. You have some really good articles and I think I would be
    a good asset. If you ever want to take some of the load off, I’d love to write some material for
    your blog in exchange for a link back to mine. Please shoot me an e-mail
    if interested. Regards!

  30. Greate post. Keep posting such kind of information on your site.
    Im really impressed by it.[X-N-E-W-L-I-N-S-P-I-N-X]Hello there, You have performed
    a fantastic job. I will certainly digg it and personally
    suggest to my friends. I am sure they will be benefited from this web site.

  31. I кnow this website gives quality dependent articles or reviews
    and extra ԁata, is there ɑny other web paցe ԝgich gives
    sucһ data in quality?


Your email address will not be published.