Issues with page indexation (or lack of it) are typically faced by larger websites. I regularly run into questions such as “How can I tell which pages aren’t indexed?” or “Which section of my site has indexation problems?”
One tool that’s been moderately useful in this area has been the Sitemaps report in Google Webmaster Tools (WMT). In particular, there’s a tactic which can make this tool even more effective for diagnosing indexation problems.
For background: if your site is registered in WMT and you have registered your sitemap or sitemaps there, then WMT will report on the percentage of pages in each sitemap that are indexed in Google.
Knowing that the whole site has (for example) 62% of its pages in the index isn’t particularly useful, until you dive into finding out which types of pages are amongst the 38% that still need to be indexed.
To show how this could work in practice, I’m going to use the structure below as a demonstration.
Example 1: Site Depth Indexation
For a site in the structure shown above, there might be a concern that you’re not flowing enough link juice to the product pages. Creating separate sitemaps for each horizontal level will help diagnose any problems that exist.
In this example, we have individual files for a categories sitemap, a sub-categories sitemap and a products sitemap.
In a structure such as this, we’d naturally expect the lower levels (i.e.: the products) to have a lower rate of indexation, but this analysis will highlight how serious any problems are.
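As a rough sketch of how the per-level split could be produced, the snippet below groups a list of URLs by how many path segments they contain. It assumes that URL depth mirrors the site hierarchy (one segment for a category, two for a sub-category, three for a product), which won't be true of every site; the URLs and the `group_by_depth` helper are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_depth(urls):
    """Group URLs by the number of non-empty path segments."""
    groups = defaultdict(list)
    for url in urls:
        path = urlparse(url).path
        depth = len([segment for segment in path.split("/") if segment])
        groups[depth].append(url)
    return dict(groups)


# Hypothetical URLs, for illustration only.
urls = [
    "http://www.example.com/electronics/",
    "http://www.example.com/electronics/cameras/",
    "http://www.example.com/electronics/cameras/acme-x100",
    "http://www.example.com/garden/",
]

# Assumed mapping from URL depth to the sitemap each page belongs in.
sitemap_names = {1: "categories", 2: "sub-categories", 3: "products"}

for depth, members in sorted(group_by_depth(urls).items()):
    name = sitemap_names.get(depth, f"depth-{depth}")
    print(name, len(members))
```

Each group can then be written out as its own sitemap file and submitted separately.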
Example 2: Site Category Indexation
Alternatively, you might suspect that some categories are being indexed better than others. This could be due to external link factors: if a job site has categories for ‘accountants’, ‘librarians’ and ‘ninja space-pirates’ then we’re likely to find that one category is better linked to and more thoroughly indexed than the others.
Creating these vertical sitemaps will immediately highlight if any categories are suffering from low indexation. This could then be tackled with more internal link weight, or by promoting them elsewhere to encourage external links.
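Once you have the figures for each vertical sitemap, comparing them is simple arithmetic. The sketch below assumes you have copied the submitted and indexed counts out of the report by hand; the numbers, the 70% threshold and both helper functions are invented for illustration, not taken from a real site or from any Google API.

```python
def indexation_rates(report):
    """Turn {sitemap: (submitted, indexed)} into {sitemap: percent indexed}."""
    rates = {}
    for name, (submitted, indexed) in report.items():
        rates[name] = 100.0 * indexed / submitted if submitted else 0.0
    return rates


def flag_low(rates, threshold):
    """Return the sitemaps below the threshold, worst first."""
    low = [name for name, rate in rates.items() if rate < threshold]
    return sorted(low, key=lambda name: rates[name])


# Made-up counts: (URLs submitted, URLs indexed) per category sitemap.
report = {
    "accountant-jobs": (200, 180),
    "librarian-jobs": (100, 62),
    "pirate-jobs": (50, 10),
}

rates = indexation_rates(report)
for name in flag_low(rates, threshold=70.0):
    print(f"{name}: {rates[name]:.0f}% indexed")
```

The threshold is a judgment call; what matters is that the weakest categories are listed first.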
Example 3: New Content Indexation
There’s often an issue on sites that publish lots of new content, where the latest content may not always be properly indexed. This technique can be used to diagnose this type of issue.
Add the latest content (the most recently published pages) to an individual sitemap, to check the indexation percentage. If you keep this sitemap constantly updated with the most recent content (e.g.: perhaps the 20 or 100 newest pages on the site) then you can keep checking over time to see if this value improves.
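One way to keep such a sitemap current is to regenerate it from your publication dates each time it is built. This is a minimal sketch that assumes you can get a list of (URL, published date) pairs out of your CMS or database; the `newest_urls` helper and the sample pages are hypothetical.

```python
from datetime import date


def newest_urls(pages, limit=100):
    """Return the URLs of the `limit` most recently published pages, newest first.

    `pages` is an iterable of (url, published_date) pairs.
    """
    ordered = sorted(pages, key=lambda page: page[1], reverse=True)
    return [url for url, _published in ordered[:limit]]


# Hypothetical pages, for illustration only.
pages = [
    ("http://www.example.com/news/older-story", date(2010, 1, 5)),
    ("http://www.example.com/news/latest-story", date(2010, 3, 2)),
    ("http://www.example.com/news/recent-story", date(2010, 2, 14)),
]

for url in newest_urls(pages, limit=2):
    print(url)
```

Rebuilding the file on a schedule means the reported percentage always refers to the current newest pages rather than an ageing list.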
(N.B.: there are various tactics for dealing with situations where new content isn’t being indexed quickly enough; I’ll deal with these in a future post.)
In terms of actually creating your sitemaps and adding them to Webmaster Central, all the information you need is within Google Webmaster Help pages. In particular, if you end up with lots of sitemaps (for instance: because you have millions or tens of millions of pages that need including) then you should consider creating ‘sitemap index files’ – essentially a list of sitemaps.
Your sitemaps don’t just need to be named ‘sitemap1’, ‘sitemap2’, etc. – make life easy for yourself and choose names like ‘product-list’ or ‘pirate-jobs’ instead.
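To illustrate what a sitemap index file looks like, here is a small sketch that builds one with Python's standard library. The `sitemapindex`, `sitemap` and `loc` elements and the namespace come from the sitemaps.org protocol; the domain, the file names and the `build_sitemap_index` helper are assumptions for the example, so check the output against Google's guidelines before relying on it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing the given sitemaps."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical sitemap locations with descriptive names.
index_xml = build_sitemap_index([
    "http://www.example.com/product-list.xml",
    "http://www.example.com/pirate-jobs.xml",
])
print(index_xml)
```

Descriptive file names pay off here: each row in the report then tells you at a glance which part of the site it covers.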
It’s also worth mentioning: don’t be scared by the XML formatting guidelines for sitemaps; it’s perfectly legitimate to simply create and submit a text file with a list of URLs. Again: just make sure you read Google’s guidelines (linked to above) properly first.
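For the text-file route, the format is just one URL per line. The sketch below also splits long lists into chunks, since the sitemaps.org protocol caps a single sitemap at 50,000 URLs. The helper names and the sample URLs are hypothetical.

```python
def chunk_urls(urls, max_per_file=50000):
    """Split a list of URLs into lists of at most `max_per_file` entries."""
    return [urls[i:i + max_per_file] for i in range(0, len(urls), max_per_file)]


def text_sitemap(urls):
    """Render a plain-text sitemap: one URL per line and nothing else."""
    return "\n".join(urls) + "\n"


# Hypothetical URLs, for illustration only.
urls = [
    "http://www.example.com/jobs/ninja-space-pirates/first-mate",
    "http://www.example.com/jobs/ninja-space-pirates/quartermaster",
    "http://www.example.com/jobs/ninja-space-pirates/navigator",
]

# A tiny chunk size is used here only to show the splitting.
for number, chunk in enumerate(chunk_urls(urls, max_per_file=2), start=1):
    print(f"--- pirate-jobs-{number}.txt ---")
    print(text_sitemap(chunk), end="")
```

Each chunk would be saved as its own UTF-8 text file and submitted like any other sitemap.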
I hope this gives you some insight into the potential of using Google Webmaster Tools to help diagnose indexation problems – and I’m sure you’ll think of other creative ways of using the same technique. Leave a comment below if you have any questions – or if you are already doing this or something similar.
Rob Ousbey VP Operations - Seattle