Indexation Problems: Diagnosis using Google Webmaster Tools

Issues with page indexation (or the lack of it) are typically faced by larger websites. I regularly run into questions like “How can I tell which pages aren’t indexed?” or “Which sections of my site have indexation problems?”, et cetera.

One tool that’s been moderately useful in this area has been the Sitemaps report in Google Webmaster Tools (WMT). In particular, there’s a tactic which can make this tool even more effective for diagnosing indexation problems.

For background: if your site is verified in WMT and you’ve submitted your sitemap or sitemaps there, then it will report the percentage of pages in each sitemap that are indexed in Google.

[Image: Sitemap display in Google Webmaster Central]

My recommendation is: take care in choosing your sitemaps; structure them so that the indexation figures are actually useful.

Knowing that the whole site has (for example) 62% of its pages in the index isn’t particularly useful until you dive in to find out which types of pages make up the 38% that still need to be indexed.

Some Examples:

To show how this could work in practice, I’m going to use a demonstration structure: a homepage linking to category pages, which link to sub-category pages, which in turn link to individual product pages.

Example 1: Site Depth Indexation

For a site in the structure shown above, there might be a concern that you’re not flowing enough link juice to the product pages. Creating separate sitemaps for each horizontal level will help diagnose any problems that exist.

In this example, we have individual files for a categories sitemap, a sub-categories sitemap and a products sitemap.

In a structure such as this, we’d naturally expect the lower levels (i.e.: the products) to have a lower rate of indexation, but this analysis will highlight how serious any problems are.
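If you wanted to script this rather than maintain the files by hand, a minimal sketch in Python might look like the following. (The input file all-urls.txt, the depth-from-path heuristic and the output names are all illustrative assumptions, not a prescribed implementation.)

    # Split a flat list of URLs into one plain-text sitemap per site depth,
    # so that WMT reports an indexation percentage for each level.
    from urllib.parse import urlparse

    def depth(url):
        """e.g. /category/sub-category/product has a depth of 3."""
        return len([seg for seg in urlparse(url).path.split("/") if seg])

    levels = {}
    with open("all-urls.txt") as f:  # one URL per line
        for line in f:
            url = line.strip()
            if url:
                levels.setdefault(depth(url), []).append(url)

    names = {1: "categories", 2: "sub-categories", 3: "products"}
    for lvl, urls in sorted(levels.items()):
        name = names.get(lvl, f"level-{lvl}")
        with open(f"sitemap-{name}.txt", "w") as out:
            out.write("\n".join(urls) + "\n")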

Example 2: Site Category Indexation

Alternatively, you might suspect that some categories are being indexed better than others. This could be due to external link factors: if a job site has categories for ‘accountants’, ‘librarians’ and ‘ninja space-pirates’ then we’re likely to find that one category is better linked to and more thoroughly indexed than the others.

Creating these vertical sitemaps will immediately highlight whether any categories are suffering from low indexation. This could then be tackled with more internal link weight, or by promoting them elsewhere to encourage external links.
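A variation on the same sketch handles this vertical split; here I’m assuming URLs are laid out as /category/..., with the first path segment naming the category - adjust to suit your own site.

    # One plain-text sitemap per top-level category, e.g.
    # sitemap-accountants.txt, sitemap-librarians.txt, ...
    from collections import defaultdict
    from urllib.parse import urlparse

    by_category = defaultdict(list)
    with open("all-urls.txt") as f:  # one URL per line
        for line in f:
            url = line.strip()
            if not url:
                continue
            segments = [s for s in urlparse(url).path.split("/") if s]
            by_category[segments[0] if segments else "homepage"].append(url)

    for category, urls in by_category.items():
        with open(f"sitemap-{category}.txt", "w") as out:
            out.write("\n".join(urls) + "\n")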

Example 3: New Content Indexation

Sites that publish lots of new content often find that the latest pages aren’t always promptly indexed. This technique can be used to diagnose that type of issue too.

Add the latest content (the most recently published pages) to an individual sitemap, to check the indexation percentage. If you keep this sitemap constantly updated with the most recent content (e.g.: perhaps the 20 or 100 newest pages on the site) then you can keep checking over time to see if this value improves.
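As a rough sketch of how that rolling sitemap could be regenerated each time you publish (the page list here is a stand-in for a query against your own CMS or database):

    # Keep a 'latest content' sitemap containing only the N newest pages.
    N = 100  # or 20 - whatever window you want to monitor

    pages = [
        ("2010-06-02", "http://www.example.com/jobs/pirate-captain"),
        ("2010-06-01", "http://www.example.com/jobs/senior-librarian"),
        # ... (publication date, URL) pairs for the whole site ...
    ]

    newest = sorted(pages, reverse=True)[:N]  # ISO dates sort correctly as text
    with open("sitemap-latest.txt", "w") as out:
        for _date, url in newest:
            out.write(url + "\n")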

(N.B.: there are various tactics for dealing with situations where new content isn’t being indexed quickly enough; I’ll deal with these in a future post.)

Further Information

In terms of actually creating your sitemaps and adding them to Webmaster Central, all the information you need is within Google’s Webmaster Help pages. In particular, if you end up with lots of sitemaps (for instance, because you have millions or tens of millions of pages that need to be included) then you should consider creating ‘sitemap index files’ - essentially a list of sitemaps.
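A sitemap index file is itself a small XML document in the sitemaps.org format; the sketch below generates one (the child sitemap URLs are placeholders). The protocol allows up to 50,000 sitemaps per index, and up to 50,000 URLs per sitemap.

    # Generate a sitemap index that points at the individual sitemaps.
    child_sitemaps = [
        "http://www.example.com/sitemap-categories.xml",
        "http://www.example.com/sitemap-sub-categories.xml",
        "http://www.example.com/sitemap-products.xml",
    ]

    entries = "\n".join(
        f"  <sitemap><loc>{url}</loc></sitemap>" for url in child_sitemaps
    )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>\n"
    )
    with open("sitemap-index.xml", "w") as out:
        out.write(xml)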

Your sitemaps don’t have to be named ‘sitemap1’, ‘sitemap2’, etc. - make life easy for yourself and choose names like ‘product-list’ or ‘pirate-jobs’ instead.

It’s also worth mentioning: don’t be scared by the XML formatting guidelines for sitemaps; it’s perfectly legitimate to simply create and submit a text file with a list of URLs. Again: just make sure you read Google’s guidelines (linked to above) properly first.
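For example, a perfectly valid plain-text sitemap is nothing more than a UTF-8 file with one fully-qualified URL per line (these URLs are, of course, just illustrative):

    http://www.example.com/jobs/accountants/
    http://www.example.com/jobs/librarians/
    http://www.example.com/jobs/ninja-space-pirates/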

I hope this gives you some insight into the potential of using Google Webmaster Tools to help diagnose indexation problems - and I’m sure you’ll think of other creative ways of using the same technique. Leave a comment below if you have any questions - or if you’re already doing this or something similar.

Rob Ousbey

Rob joined Distilled’s London office in 2008 as an SEO Consultant. Over the years, he’s developed and executed SEO strategy for clients from small businesses to large organizations, and managed Distilled’s Reputation Management projects, where he’s...


8 Comments

  1. Nice post, I like the lateral thinking and the novel use of webmaster tools. I think if you combine this with analytics you will definitely get a better picture of how a site is being indexed. Here is a link to a post on seomoz about the subject:
    http://moz.com/blog/indexation-for-seo-real-numbers-in-5-easy-steps

  2. I have been using this method to detect indexing problems for a long time now. Over the last 6 months, though, I have found the information Google is presenting to be wildly inaccurate.

    I have seen three accounts in the last couple of days that say they have 3 pages indexed from the dynamic sitemap when a site: command returns 1,000s.

    Although the technique is good, at the moment I would conduct manual checks as well to try and verify that Google's data is reporting correctly.

  3. Great stuff, Rob. I use that same logic to dig in with the "site:" command, but I never thought to set up sitemaps that way. Definitely going to give this one a try.

  4. This has worked well for me on large sites with many millions of pages. By structuring it in layers as above, you also get an idea of where to start fixing first, as you know which sets of pages are worth more to you.

    Hm, just a thought: how about submitting URLs multiple times in different sitemaps? So I create a sitemap just for a bucket of otherwise unrelated high-value URLs, so that I can keep an easy eye on their indexation. (Probably not the recommended use of sitemaps, though.)

  5. Very ingenious way to set up your webmaster tools sitemaps! I can see how you can pinpoint and track your indexed pages.

    John, I had a question though: what tool would you use to produce level sitemaps as you explained here? Xenu? Manually (although it can take a long time on bigger websites)?

  6. Jon

    Excellent post, but a question... is it OK to duplicate URLs in different sitemaps, or should I try to keep them listed only once? For instance, if I have a "New URLs Sitemap" that has new products in it, is it OK for them to be listed in the "Products Sitemap" too?

  7. I never thought about this approach!! Usually what I do is tweak the Title/Meta Desc, and when the change reflects in the SERP, I know the site has been indexed. If the change is on an inner page which doesn't rank, I use the above method combined with the "site:" operator. Works well for me.

  8. I know this post is from more than a year ago, but I have the same question as Stephen and Jon here and still haven't found an answer, so let's see if anybody can help: is it OK to submit the same URLs several times through several sitemap indexes?

    I mean, if you create several "mini-sitemaps" that are then included in several sitemap indexes, you'll be submitting the same URLs several times. Does Google allow that? (It's wonderful for analysing indexation if possible!)

    I set up a little test with a few URLs and it seems OK in Webmaster Tools, but I would like to know about any real case on a big website, if anybody has tried it.

    Thanks in advance!

