Using SEO Spider Data in Excel

We’ve been using Screaming Frog’s SEO Spider in the office for a few months now and, along with other spider software, it provides a great means of identifying technical and on-page issues quickly and easily.

My personal preference is to export the data to Excel, where you can manipulate it however you like.  You can then email the spreadsheet to a client, development team, etc. once you’ve highlighted the problem you’re trying to fix.

Here are some of the Excel features and formulas that I’ve been using to generate meaningful, actionable reports.

Sorting and Filtering

This is the first step that I always take once I’ve exported a crawl.  You can either hit Ctrl + T to turn the sheet into a table, or, if you’d prefer just to sort and filter, choose Data > Filter from the main menu.

Now you can pull out some really quick reports:

  • Filter the Content column to show only HTML pages (typically text/html; charset=utf-8).  You’ll want to apply this filter for the majority of your SEO work
  • Filter Status Codes to show 404 errors, 301 and 302 redirects.  This is an easy way to create a list of 404s that ought to be fixed/redirected
  • Filter Titles, Meta Descriptions, H1/H2 etc. to show blanks.  This gives you a list of pages where this information ought to be added
  • Sort Title Length or Meta Description Length in ascending order.  Any titles over 70 characters need shortening, and the same applies to meta descriptions over 155 characters
  • Sort Level in ascending order.  This should give you a rough idea of the hierarchy of your site.  Are there any key pages which are buried beyond level 3 or 4?
The list goes on – have a go yourself and you’ll see that there’s quite a lot you can do with sorting and filters.
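If you ever need to apply the same filters outside Excel, the logic translates directly into code.  Here's a rough Python sketch of the first two filters above; the row dictionaries and their keys ("address", "content", "status") are invented for illustration, not the exact column headers of a real crawl export:

```python
# Sketch: reproducing the Excel filters on rows from a crawl export.
# The sample rows and key names are assumptions for illustration only.
rows = [
    {"address": "/", "content": "text/html; charset=utf-8", "status": 200},
    {"address": "/logo.png", "content": "image/png", "status": 200},
    {"address": "/old-page", "content": "text/html; charset=utf-8", "status": 404},
    {"address": "/moved", "content": "text/html; charset=utf-8", "status": 301},
]

# Keep only HTML pages, as with the Content column filter
html_rows = [r for r in rows if r["content"].startswith("text/html")]

# Pull out 404s and 301/302 redirects, as with the Status Code filter
problem_rows = [r for r in html_rows if r["status"] in (404, 301, 302)]

for r in problem_rows:
    print(r["address"], r["status"])
```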

Conditional Formatting

Another feature that can be used to highlight errors is conditional formatting.  The first example I’ll use is flagging up titles with a length over the 70 character limit.

Simply select the cells in the Title Length column and go to Home > Conditional Formatting > New Rule.  Now select Format only cells that contain and, under the rule description, set Cell value greater than 70.

Hit Format… and choose something that pops – in this case I’ve got bold white text over a red fill.  Once you’re done, click OK and you should now be able to visually scan your page title lengths.  Obviously the same can be done for Meta Description Length above 155.

You could also apply conditional formatting to columns such as Size, Inlinks, and Outlinks.  The built-in colour scales are a quick way to make the information in your spreadsheet easy to scan – for example, use Colour Scales to highlight pages with a high number of Outlinks.
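The formatting itself is an Excel feature, but the rule behind it is just a length check.  As a minimal sketch, here's the same over-limit test in Python, using the 70- and 155-character limits from above (the titles are made-up examples):

```python
# Sketch: the "Cell value greater than 70" rule as plain code.
# The limits come from the article; the titles are invented examples.
TITLE_LIMIT = 70
META_DESCRIPTION_LIMIT = 155

titles = [
    "Short title",
    "A very long page title that rambles on well past the point where search engines stop showing it",
]

# Flag any title over the limit, as the conditional format does
too_long = [t for t in titles if len(t) > TITLE_LIMIT]
for t in too_long:
    print(f"{len(t)} chars: {t}")
```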

Finding Duplicates

Screaming Frog’s SEO Spider generates a hash key for each page based on its content.  If two or more different URLs have the same hash key (and therefore the same content), your site most likely has duplicate content problems.

A quick formula is all that’s needed to find URLs where the hash key is duplicated.  Label the column to the right of the Hash column as Hash Duplicate and paste the following formula into the first cell, adjusting the AA references so that they point at whichever column holds the hash values in your export:

=IF(COUNTIF($AA$3:$AA$10545,AA3)>1,"Yes","No")

Double click the bottom right corner of the cell to copy the formula down through the rest of the spreadsheet.  You should now be able to filter this column to show only cells containing Yes.  Order the Hash column in ascending order to group the duplicate hash keys together.

A quick explanation of the formula: the countif() function counts the instances of the hash in the whole column.  If this is greater than one, we know that there are duplicates so the if() function outputs “Yes”.

The same technique can be applied to the Title and Meta Description columns to identify duplicates.
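The COUNTIF logic maps neatly onto a counting structure in most languages.  Here's a rough Python equivalent using collections.Counter; the URL/hash pairs are invented toy data:

```python
from collections import Counter

# Sketch: flag URLs whose content hash appears more than once,
# mirroring the COUNTIF formula above. URLs and hashes are invented.
pages = [
    ("/products", "a1b2"),
    ("/products?sort=price", "a1b2"),  # same hash -> duplicate content
    ("/about", "c3d4"),
]

# Count occurrences of each hash (the COUNTIF step)
counts = Counter(h for _, h in pages)

# Output "Yes"/"No" per URL (the IF step)
flags = {url: ("Yes" if counts[h] > 1 else "No") for url, h in pages}

for url, flag in flags.items():
    print(url, flag)
```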

Finding URLs With Query Strings

If you’re not familiar with SEO friendly URLs, I’d recommend having a quick read of this page over on SEOmoz.  One of the most common characteristics of poor URLs is the use of query strings such as www.example.com/?category=example. You can easily find all of the URLs on your site which contain query strings if you use the following formula:

=IF(IFERROR(FIND("?",A3),"No")="No","No","Yes")

Again, create a column to the right of the URL column and call it Query String.  Paste in the formula above and drag it down to the bottom of the sheet.  You should now be able to filter to show Yes giving you a list of URLs which you can improve using URL rewrites.

If you’re interested, the formula above uses the find() function to look for a ? in the URL.  If there isn’t one, a #VALUE! error is returned so a combination of iferror() and if() is used to output a yes/no answer.
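The same yes/no test is a one-liner in most languages.  A Python sketch (the example URLs are invented) needs no error handling, because str.find returns -1 rather than an error when the character is missing:

```python
# Sketch: flag URLs containing a query string, mirroring the
# FIND/IFERROR formula. Example URLs are invented.
urls = [
    "http://www.example.com/?category=example",
    "http://www.example.com/category/example/",
]

# str.find returns -1 when "?" is absent, so no IFERROR equivalent needed
flags = ["Yes" if url.find("?") != -1 else "No" for url in urls]
print(flags)
```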

Text to Columns

You can easily reveal the structure of a site by using Text to Columns on the list of URLs found by the spider tool.  Here’s the process:
  • Copy the list of URLs and paste them into a new sheet
  • Sort the list alphabetically
  • Go to Data > Text to Columns
  • Ensure Delimited is selected, then click Next
  • In Delimiters, click Other and enter /
  • Hit Finish
  • You can get rid of the first 2 columns.
Hit Ctrl + T to turn the sheet into a table and you can now sort and filter.  For example, if your blog post URLs contain a date, you could filter to show only pages in /blog/ and order by the date to show how often you’ve been publishing content.

You could also create a chart from this information.  Take the first column after your domain for example – copy and paste it into a new column and click Remove Duplicates in the data menu.  Now run a countif() for each row against the original column, which should look something like this:

=COUNTIF(B:B,J3)

This gives you a count of the number of pages in each top-level subdirectory on the site, from which you can create a pie chart, for example.
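The Text to Columns split plus the COUNTIF tally can also be sketched in a few lines of code.  Here's a rough Python version using the standard library (the URLs are invented examples):

```python
from collections import Counter
from urllib.parse import urlparse

# Sketch: split each URL path on "/" and count pages per top-level
# directory, as the Text to Columns + COUNTIF steps do above.
# The URLs are invented examples.
urls = [
    "http://www.example.com/blog/2011/01/post-one/",
    "http://www.example.com/blog/2011/02/post-two/",
    "http://www.example.com/products/widget/",
]

top_level = Counter()
for url in urls:
    # Split the path and drop empty segments (equivalent to discarding
    # the first couple of columns after the delimiter split)
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts:
        top_level[parts[0]] += 1

print(top_level)
```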

You could use this as a comparison against your competitors – are they more prolific bloggers for example?  Or do they have a faceted navigation that generates lots of long tail pages?

The techniques outlined above are just a snapshot of what is possible using these two tools.  If you have any further ideas about how you might manipulate crawl data, I’d love to hear about them in the comments.


11 Comments

  1. Screaming Frog certainly is a great piece of kit. Along with Xenu and GSiteCrawler, these are 3 apps that provide some fantastic insights for SEOs.

    This tutorial is really useful for anyone wishing to dig deeper using Excel. Well done and thank you for sharing.

  2. Thank you! Very good article, I will try.

  3. Hey Rob,

    Thanks for the mention in the post, some very cool tips on using data from the spider.

    Will be pointing a few people in the direction of this post :-)

    Cheers,

    Dan

  4. rob.millard

    Thanks guys - glad you liked it.

  5. euan

    Hi Rob, informative post; Screaming Frog is one of my favourite tools - using the custom filter to find pages missing Google Analytics codes saved me a lot of time recently.

One question on your post though; can I double check the formula to find duplicate hash keys is correct? I can't get it to work properly.

    Copying it into the column to the right of the hash column seems to reference column "AA" which is the outlinks column. Column AC is the hash column but changing the reference to AC still doesn't work.

I'm a bit of a Luddite when it comes to Excel but I normally use its conditional formatting to identify duplicate values then sort by the cell colour to group all the duplicates together. Your method would save a bit of time IF I can get it working

How is Screaming Frog any better than your sister site SEOmoz's Pro Dashboard? Seems like it's pulling the same data....

  7. This is a great post.

    Thanks for sharing the duplicate content detection process - great to know that Screaming Frog does all that great stuff!

  8. I have been using screaming frog for a while now and there are some great ideas in here. I particularly like the duplicate content one thanks for the post.

  9. rob.millard

    Hi Kevin,
    Let me break it down for you:
=IF(COUNTIF($AA$3:$AA$10545,AA3)>1,"Yes","No")

    Essentially it's just an if statement:
=if(there is more than one instance of this hash in the column, print Yes, otherwise print No)

    The countif part works like this:
    =COUNTIF(column,cell)

    Make sure those references are correct. The column should be the hash column, and make sure that the cell is referencing the right one. You should use $s on the column reference because it's fixed.

    Maybe you just don't have any duplicates? Try adding some test data in, such as copy and pasting some values so that there definitely are duplicates.

    Let me know if this doesn't help!

  10. I've been working on some Excel hacks for my Screaming Frog data for weeks...You just simplified things perfectly for me. Excellent article - thank you!

    Another tip - If, like me, you want to use this spreadsheet to edit any problem page titles or descriptions you found in your crawl, it helps to add new columns next to both the Title & Meta Descriptions. Then, just apply the =LEN formula to those cells & you can count character length automatically as you write out the new text.

    Cheers!

