Benchmarking your web traffic

Internet traffic is a funny thing, and sometimes things happen to your traffic you don’t understand. Your industry’s traffic may be growing, and almost certainly includes some seasonality. This post suggests a method to look past the traffic’s trends, and instead tells you if you are beating the competition. Better metrics, that’s what we’re after.

As we all know, internet traffic changes over time and can be very seasonal. That’s true for Santa’s websites, but also for yours. However, many of us determine our online performance by looking at our raw traffic. In this post, I advocate benchmarking your traffic to the industry you are in. This allows you to look past the seasonality and overall growth / contraction of your industry encounters, and instead lets you focus on your market share of relevant internet traffic. The problem is to figure out how that traffic changes.

The basic method used here was invented by Phil Nottingham, and goes along the same lines as a CPI or RPI index to measure inflation. The CPI picks a weighted basket of goods, indexes the basket at a certain point in time and sees how that changes. Anything from political decisions, to wage negotiations, are based on those indexes.

For websites, I advocate using a basket of keywords. The basic approach in this post is to pick a significant range of keywords from your industry, and look how their Google searches vary over time. No weighting required, as more search volume should proportionally indicate more traffic. We presume the searches accurately reflect internet traffic, and so we add all the searches together and compare its trends with your own traffic. 

Let’s start at the beginning:

 

Constructing a keyword list 

 

The basket of keywords should contain a larger spread than just the obvious few terms. It should be reasonably long tail and include traffic that would go to your competitor but not you. Therefore, it is more than just the list of products you sell.

During this post, I will refer to my own investigation into the Consumer Electronics industry as an example. To start my keyword list, I looked at all the categories eBay have on their site (which can be found in Excel format here. I determined which top level categories fitted inside ‘Consumer Electronics’, and then had Excel extract a list of all categories and sub category titles. By doing so, I was then given a list which included the top level keywords like ‘Camera’, the long tail ‘8-Track Players’ and more problematic terms such as ‘translators’. In order to ensure my investigation excluded anything to do with terms such as the translation industry, I manually deleted anything whose traffic could be due to a different meaning from the one my list intended. In addition, I assembled a list of brands and manufacturers for consumer electronic products and combined everything I had left. This resulted in a total of 1 275 terms.

 

We’ve got the list, now what do we do?

 

 Google runs two platforms to show how their search queries change over time: Google Insights and Google Trends. Google Insights can handle more keywords, but only outputs search volumes normalised as a percentage of total searches. As the number of searches changes over time and we want to compare it with our absolute data, we need to use Google trends.

Google trends only accepts five entries at a time and benchmarks their search volume against an average volume metric of the first term. As five terms is not enough, we will be benchmarking different searches against each other and this post will refer to the different trends search results as ‘data sets’. The benchmarking requires a crossover in terms of keywords, so we can only put four new terms into every such data set.

In my example, having to do a different search for every four keywords resulted in investigating all 1 275 keywords. To cut down, I could just pick the most significant terms (after all, those give most traffic, right?) However, if people move away from large keywords (because they know what they want to buy just before Christmas), or if your site is better represented by the long tail (for whatever reason), or if people move from traditionally significant, older keywords like DVDs to newer ones which are less likely to have been deemed significant, our estimate no longer reflects on all traffic. Therefore, presuming you are going to sample only a selection of your total amount of keywords, it is better to randomly select as many keywords as you want. For example, I ran the analysis with fifty terms several times. Also, when picking your keyword basket size, you should realize Google Trends doesn’t give figures for low volume searches, due to privacy concerns, so some keywords will be rendered useless.

If your list is in Excel, it is easy to randomise using the =RAND() function which creates a random number every time a calculation is performed. I filled the column next to my keyword list with the RAND function and then sorted the keywords by the size of the randomly generated numbers. In this case, I picked 50, however you can use as many as you would like to sample.

 

I’ve got my list, can I put it inside Google Trends yet?

 

Nope. Because Google Trends’ only accepts five terms and its accuracy isn’t great, we want to minimize the error we’ll obtain due to making all the data sets (each search of five terms) comparable. This means we want each of the terms in the data sets to have nearly as much search volume as the others. Therefore, we sort the keywords using Google Adwords Keyword Tool first.

Now in my example, I looked at traffic around the world. However, if you sell just in the US, you may want to confine your Google Trends investigation to the US and, therefore, your research here should also be confined to the US. In either scenario, enter your list of keywords, sort by whichever search volume criteria you are interested in and export the sorted order. For worldwide analysis, they don’t give any figures for ~monthly searches under 10 000 so exclude these from your new sampled keyword list (this reduced my list from 50 to 41 keywords).

Now Google Trends likes to have a comma in between the different terms. In order to minimize the amount of manual effort required later, I attached a comma and a space to every term through Excel (if the keyword is in cell A6, just use =A6&”, “).

 

Everything’s set. Now, let’s get some data.

 

Here we start copying everything into Google Trends. I have not figured out a method of scaling this, and, therefore, I started from the top of my sorted list. Select the top five, copy, put them into Google Trends, ensure you’ve got the right timeframe/location set up, and “Search Trends”. Now, download the output data using the “Export this page as a CSV file“ option at the bottom and choose “CSV with fixed scaling”. Repeat until finished but only ever including the lowest of the last five just searched and the four next new keywords (so the keyword which is being repeated changes every repetition).

As I had 41 keywords to put in, this took me 8 CSV sheets. I now propose you put it all into one excel file. As an example, I have included my Excel file: Benchmarking Your Site Traffic - Results.

 

So I’ve got the data. How do I make this usable? 

 

As you’ll be aware, the data from the different sheets has to be rescaled so that they become comparable. The earlier keyword repetition in the different data sets was required for this reason. If anything below is confusing, just refer to my attached example, where I’ve done all of that.

First of all, I’d recommend putting all the useful data into one sheet, where I refer to the data next to the dates. The standard error figures aren’t useful but I found it easier to make them part of the copying process (this way the entire section can be copied). I’d recommend leaving two columns in between the data sets.

Then you need to determine what data set to benchmark from. On one hand, every time the data is scaled, the quality degrades, (due to the granularity Google Trends exports with) which would argue for picking the middle data set. On the other hand, the data of the more often searched keywords is more significant. As a compromise, I picked the set which contains the keyword which is one quarter down, when the list is sorted by size.

Copy down the scaled search volumes of this set of data and find the multiplier needed for the keywords on either side of the data set. To do this, we realise that the right-most of this data set contains the same keyword as the leftmost of the next. Now for every date, find the ratio that would multiply the next data set as to make the keyword volume the same in both data sets, for the same keyword. Then average this over all the values and multiply the entire non-yet copied down set by this figure. I placed this next to the previous, copied down set. Then repeat this for the next set, remembering you are comparing it to the newly calculated values below, instead of the unadjusted ones from Google trends. Then just repeat.

 

My data is now beautifully scaled, so it can be used aggregated. What now?

 

You now want to sum the values but you can’t do a simple sum across the entire row (as you’d be counting some keywords twice). Therefore, copy the values down once more but every keyword only once, then sum across the lines to find the total variation.

Now, I like to index these things to a base 100 but that is not required. I have repeated the analysis with a different set of keywords as well, and both results are shown below (referred to as runs):

 

Although the general trend is the same, this graph indicates a problem with my results. The second run included a number of spikes which were not reflected in the first run. By looking through my data, I found that they were almost all caused by Apple (which was randomly selected to be part of my first data set but not my second). Therefore, splitting the first data set with apple out, I get the following graph:

 

This shows much stronger similarity between the two tests. Though the previous graph shows this as well, it is important to look through your data if you are extrapolating from a relatively small sample as I did. Apple launched important products around those spikes and through it being quite a significant keyword in my analysis has disproportionality affected my results.

To realise such things, it can be useful to look at the variance of the benchmarked results per keyword, though only the largest volume terms will be significant. Apple’s variance was a magnitude larger than any of the other variances and this is a way one could pick up such an effect without repeating the analysis or delving into every keyword.

 

I’ve got my indexed results. Now what?

 

You now have an indexed set of results although you may have separated out some items. If you’ve separated out anything you do find important, I recommend deciding on a weighting factor (by looking at how important the category the term refers to for your industry’s traffic). As that information may not be readily available, another option would be to look at the section of your site which refers to the category the separated keyword talks about and add the keyword into the main index, using the ratio from the traffic caused by that keyword’s category to the overall site’s traffic.

Below is an example (where I have adjusted the “Apple” keyword, weighing it to 10% of the total traffic). This is all included in my example Excel sheet.

We have now finalised our index concerning the industry’s traffic movements over time. You can compare your traffic to this index, by exporting it into Excel (for example through GA). You now realise when you are growing because of your own, effective efforts, or if you are actually losing traffic market share, and will be worse placed in the long run. No more uncertainty concerning if the traffic boom is really due to yourself, and better metrics. Now let’s see if you are beating the competition.

If you have any questions or feedback from the post on how to benchmark your web traffic, I’d love to hear it - leave your comments below. 

Get blog posts via email

8 Comments

  1. A lot of work,

    What your article really calls for is a creation of Fortune 500 Keyword Lists, for major industries.

    By doing this yourself. I see the major challenge would be developing and maintaining a coherent keyword list, avoiding the outliers (like your Apple example.) Manually corrections & dealing with the below 10K level due to seasonal stuff.

    Not to mention what criteria would you use to add & subtract words from this index? What happens if you've totally missed the mark and have banked on the wrong keyword traffic? Or your customer embarks on a different course of action midway?

    Adwords has benchmarks in that product, where you can compare your performance against other advertisers and see how you're doing(CTR, Impressions etc) . I wish Google could offer this same transparency on the Analytics/Web Master Tool side.

    PERSONAL TREND LINE?

    It would be so nice, If Google could do the heavy lifting for us in Google Analytics or Web Master Tools. Take the top 50,100,200 words as reported by our personal Analytics/WMT Account. Show the 90 day trend line for those words Globally, Regionally, City Level etc across the board. You could then set up alerts if your traffic is going north or south of the industry trend as it relates to your current sites traffic.

    INDUSTRY TREND LINE?
    Provide industry benchmarks as a segment..based on what Google determines as that segments leading words.

    I really like your method but how many of us have time of day or automated tools,
    to do this?

    Searchengineman

    reply >
  2. Melissa

    There has to be a faster easier way. Somebody build it! I'll buy it!

    reply >
  3. This is really neat. I wish there was a way to get search volume for more than 5 keywords at a time.

    reply >
  4. I am jealous that you guys have the time to spend on running an analysis like this ;-) Did you get much practical use out of it?
    I know clients obsess about their position in the market but is this really actionable data for marketers? and is it really reliable enough? It is just about the best you can get in terms of trends from Google data but perhaps using PPC impression data from a client over a long period would be more accurate?

    reply >
  5. Johann CR

    Neat trick ! Really like it.

    Got a few remarks though :
    - You say "Repeat until finished but only ever including the lowest of the last five just searched and the four next new keywords".
    The problem is with some low search volume keywords, we'll only have a truckload of zeros and no data at all, that'd be problematic to calculate the rescaling factor... I'd try including the highest of the last five, or at least the lowest non-zero.
    And besides, using the high search volume keywords for the rescaling would be more precise wouldn't it ?


    Why change the repeated keyword each time ? It would be simpler to repeat the same one to facilitate the rescaling afterwards. Is it to lessen the degrading ?


    Thanks for this great article !

    reply >
  6. I think this is a really good way of measurign your performance against the wider environment on a budget, however if you're lucky enough to have a hitwise account then I'd recommend using that as it's quicker and much more accurate.

    reply >
  7. Drew

    I have done something fairly similar using SEM Rush data. You can get trends data for the 12 previous months for as many keywords as you need in one go. Worked a treat.

    reply >
  8. Hey,

    Thats an interesting article. I guess we should use this method.

    Thank you

    reply >

Leave a Reply

Your email address will not be published. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>