New Google Webmaster Tools Keyphrase Data is 70% Useless

I love data. I really do. So you’d think that the latest announcement from Google about the new organic traffic impression, ranking and CTR data would be right up my street!

Unfortunately, as with so many of the other data points in Google Webmaster Tools, the data they give you is shoddy, inaccurate and largely a waste of your time to analyse. Why 70%? It’s a number I plucked out of thin air. See how annoying inaccurate data is?

Here are the stats for the Distilled site in GWT:

And here’s the Google Analytics data for the same time period:

See the discrepancy? Let’s compare some of the top keyphrases (impression/clickthrough data is from GWT, visit data from GA):

I’m not going to go into a big analysis of the two sets of numbers because they’re clearly still quite a way off from each other.

There are, however, two interesting things that I draw from this:

1) Find long-tail keyphrases

The number of keyphrases that GWT is reporting is larger than GA's by quite a way - analysing which keyphrases you’re getting impressions for but no traffic might be interesting. I say might because, in actual fact, once you get into the long tail GWT just starts to obscure the data with “<10”. Well, 0 is less than 10, right Google?
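If you want to try this, here is a rough sketch in Python of how I'd line the two exports up. Everything in it is an assumption for illustration: the function names are made up, the keyphrases and numbers are invented, and I'm assuming you've already got the GWT impressions (as the strings GWT shows, including “<10”) and the GA visits into two plain dictionaries keyed by keyphrase.

```python
def parse_gwt_count(raw):
    """Turn a GWT impressions string into (count, is_exact).

    GWT obscures small values as "<10"; we keep those as unknown
    (None) rather than guessing a number.
    """
    raw = raw.strip()
    if raw.startswith("<"):
        return None, False
    return int(raw.replace(",", "")), True


def impressions_without_visits(gwt_impressions, ga_visits):
    """List keyphrases that GWT reports impressions for but GA shows no visits.

    gwt_impressions: dict of keyphrase -> impressions string from GWT
    ga_visits:       dict of keyphrase -> visit count from GA
    Both inputs are hypothetical hand-built exports, not an official API.
    """
    # Normalise case so "SEO Agency" and "seo agency" are treated as one phrase.
    visits = {phrase.lower(): count for phrase, count in ga_visits.items()}

    candidates = []
    for phrase, raw in gwt_impressions.items():
        if visits.get(phrase.lower(), 0) == 0:
            count, exact = parse_gwt_count(raw)
            candidates.append((phrase.lower(), count, exact))

    # Exact counts first (largest first), obscured "<10" rows at the end.
    candidates.sort(key=lambda row: (not row[2], -(row[1] or 0)))
    return candidates


# Invented example data, purely to show the shape of the inputs.
gwt = {"example brand": "5,400", "seo agency": "1,300", "link building tips": "<10"}
ga = {"Example Brand": 320}

for phrase, count, exact in impressions_without_visits(gwt, ga):
    print(phrase, count if exact else "<10 (obscured)")
```

The “<10” rows are deliberately pushed to the bottom instead of being counted as some number, since that's exactly the part of the data GWT won't let you trust.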

2) Ranking Factors A-go-go

The REALLY interesting thing for me with this data is that GWT has a good track record of releasing data on things which affect your rankings. Crawl issues, duplicate content, site speed. You see the pattern? And now they’re releasing CTR as a big metric. I strongly believe that CTR either already does, or soon will, impact your ranking position. Looking at this figure as an aggregate across your whole site and comparing it to similar sites in your industry would, I think, give Google quite a reasonable perspective on which sites are and are not “brands”. And if this is the case, then CLEARLY they’re not going to give you the exact data. So my view on this is that you should treat CTR seriously, but that you shouldn’t rely on the data in GWT too much (other than perhaps as a benchmark over time?).
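If you do use CTR as a benchmark over time, one way to get a single site-wide figure is total clicks divided by total impressions, rather than averaging the per-keyphrase CTRs (which lets tiny long-tail phrases swamp the big ones). This is just my own sketch of that arithmetic, with invented numbers and a made-up function name; I have no idea how Google would actually aggregate it.

```python
def aggregate_ctr(rows):
    """Site-wide CTR from (impressions, clicks) pairs, one per keyphrase.

    Returns None when there are no impressions at all, since CTR is
    undefined in that case.
    """
    total_impressions = sum(impressions for impressions, _ in rows)
    total_clicks = sum(clicks for _, clicks in rows)
    if total_impressions == 0:
        return None
    return total_clicks / total_impressions


# Invented figures: one big head term and one tiny long-tail phrase.
rows = [(1000, 100), (10, 5)]

print(f"aggregate CTR: {aggregate_ctr(rows):.1%}")

# For contrast, the naive mean of the per-keyphrase CTRs.
naive = sum(clicks / impressions for impressions, clicks in rows) / len(rows)
print(f"mean of per-keyphrase CTRs: {naive:.1%}")
```

Run the same calculation on each month's export and you at least get a trend line, even if the absolute numbers are off.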

What’s everyone else’s opinion on this?

Note - a few things I think I should mention. Firstly, take care to restrict your GWT data to web search when comparing. Image search queries aren’t picked up by default within GA; you have to set up a special filter to grab these, so that might explain some discrepancies (I’ve taken that into account in the above data). Secondly, I’ve looked at a lot more data than just one site, but I’m not sharing all of that here as most of them are client sites. Suffice to say I’ve yet to see anything accurate.