If you don’t know what Screaming Frog is (then where have you been!?), it’s a tool that lets you crawl an entire site on demand. But that’s not all it’s good for. I was recently working on a client’s site and found myself turning to the tool more and more for tasks other than the standard crawl, so I thought it might be useful to put together a list of other ways to use Screaming Frog.
1 - Verifying Google Webmaster Tools Crawl Errors
I recently wrote a post on how to do a site audit using Webmaster Tools. I’m a big fan of its features, but I find that GWT doesn’t update the crawl errors frequently enough. This means errors like 404s are often still reported when they’ve actually already been fixed. I’ve been using Screaming Frog to solve this problem. Below is my new process for 404s and other common errors.
- Crawl the entire site and put all 404s into a spreadsheet.
- Download all 404s from Google Webmaster Tools and put them into the same spreadsheet.
- Remove duplicates.
- Copy all URLs into a text document and save it as 404.txt.
- Using Screaming Frog’s list mode, upload the 404.txt document and start the crawl.
You can then export all the remaining 404s and fix them.
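The recheck step above can be sketched in Python. This is a hedged stand-in for Screaming Frog’s list mode, using only the standard library; the 404.txt file name comes from the steps above, and the HEAD-request approach is my own choice, not the tool’s documented behaviour:

```python
import urllib.request
import urllib.error

def check_urls(urls):
    """Fetch each URL with a HEAD request and return (url, status) pairs."""
    results = []
    for url in urls:
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                results.append((url, resp.status))
        except urllib.error.HTTPError as e:
            results.append((url, e.code))   # e.g. 404 for a page that is still broken
        except urllib.error.URLError:
            results.append((url, None))     # DNS failure, timeout, etc.
    return results

# In practice, read the deduplicated list from 404.txt:
#   with open("404.txt") as f:
#       urls = [line.strip() for line in f if line.strip()]
# Anything that still returns 404 goes back on the fix list.
```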
2 - Checking If a Site Migration was Successful
A site I was working on recently changed its URL structure. For a couple of reasons, some of the old URLs were not redirecting correctly, and Screaming Frog’s list mode came in useful for finding out which ones.
I got the client to send me a list of all the old URLs and followed the same process as above to find out which of the URLs were returning a 404. I then downloaded all the URLs with the problem and passed them to the developer to fix. This made identifying the problem really easy.
3 - Finding Infinite Links
Sometimes websites that use relative URLs can create never ending chains of links. Again, this recently happened on a client’s site. They were using relative URLs everywhere except some pages on the blog.
This meant that sometimes when they linked to a page on the blog, it was being appended to the existing URL. For example:
http://www.example.com linking to www.example.com/page1
Was creating http://www.example.com/www.example.com/page1
This was causing infinite lists of URLs. This means that search engines could be wasting their time crawling pages that technically don’t exist. Because this wasn’t the case on every page, I had to identify where on the site the issue was. When looking at some of the examples, the cause was using links that didn’t include the “http”. To find where this was happening I used the “custom” feature.
This is found under the Configuration menu. I set a custom filter to include only pages whose HTML contained links missing the “http” prefix:
This then returned all of the pages that were linking to other pages in this way.
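The URL resolution behind this bug can be reproduced with Python’s standard `urljoin`, which follows the same relative-reference rules browsers and crawlers do (the URLs are the example ones from above):

```python
from urllib.parse import urljoin

# A scheme-less href like "www.example.com/page1" is treated as a
# relative path, so it gets appended to the current page's URL.
bad = urljoin("http://www.example.com/", "www.example.com/page1")
print(bad)   # http://www.example.com/www.example.com/page1

# With the scheme included, the link resolves as intended.
good = urljoin("http://www.example.com/", "http://www.example.com/page1")
print(good)  # http://www.example.com/page1

# A root-relative path ("/page1") would also have been safe.
print(urljoin("http://www.example.com/", "/page1"))  # http://www.example.com/page1
```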
4 - Checking a List of Links
During outreach, you often end up with a large list of pages you are expecting links from. Checking each one by hand to confirm the link actually exists on the page is a tiring job. Screaming Frog’s list mode lets you check a batch of URLs very quickly, and there is already a post on the Screaming Frog blog on how to do this: Auditing Backlinks Using Screaming Frog.
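As a rough sketch of what that check does under the hood, the snippet below parses a page’s HTML and looks for a link back to a given domain. The parser and the `page_links_to` helper are my own illustration, not anything from the Screaming Frog post:

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collect every <a href> value on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def page_links_to(html, domain):
    """True if any link on the page points at `domain`."""
    finder = LinkFinder()
    finder.feed(html)
    return any(domain in href for href in finder.hrefs)

# Usage: fetch each prospect page (e.g. with urllib.request) and run
# page_links_to(body, "yourdomain.com") to see if the link is still there.
```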
5 - Creating a Sitemap
Screaming Frog makes creating an XML sitemap really easy, but it’s important to set up the crawl correctly before you start. If you don’t limit the spider to only those pages you want in your sitemap, you can end up with a bunch of URLs that shouldn’t be in there. An example of this is with WordPress, which I discovered when I crawled my own site.
A common problem with WordPress is that it creates pages like http://www.craigbradford.co.uk/about-craig/?replytocom=12 when people leave comments.
I don’t want these pages indexed, and definitely not in my sitemap, so I use the exclude tool (under the Configuration menu) to ensure anything with this URL pattern is excluded.
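The exclude filter takes regular expressions, so a pattern can be sanity-checked locally before the crawl. The regex below is my own example for the ?replytocom URLs mentioned above; note that in Screaming Frog itself the pattern typically needs to match the whole URL (e.g. wrapped in `.*`), so check the tool’s documentation for the exact syntax:

```python
import re

# Matches WordPress comment-reply URLs like .../about-craig/?replytocom=12
exclude = re.compile(r"\?replytocom=\d+")

urls = [
    "http://www.craigbradford.co.uk/about-craig/",
    "http://www.craigbradford.co.uk/about-craig/?replytocom=12",
]
kept = [u for u in urls if not exclude.search(u)]
print(kept)  # only the clean URL survives
```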
Once you have set up the configuration, let Screaming Frog complete a full crawl of the site. Once it’s finished, you can export the sitemap: under the main navigation, go to “Export” and select the sitemap option. You can then upload it to your site and submit it through Webmaster Tools.
6 - Check Sitemap for Errors
Duane Forrester from Bing recently said that Bing allows a 1% level of ‘dirt’ in a sitemap, where “dirt” could be anything from 404 errors to redirects.
Screaming Frog can be used to keep your sitemap clean and healthy. With an XML file like the one above, simply switch to list mode and upload it. Screaming Frog will then crawl all of the URLs and tell you if there are any errors, such as 404s or pages that redirect.
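A rough local equivalent of that check: pull the &lt;loc&gt; URLs out of the sitemap and compute the “dirt” percentage from their status codes. The sitemap below is a made-up example, and the status codes are illustrative; in practice they’d come from an HTTP check like the one Screaming Frog performs:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Extract every <loc> URL from a sitemap.xml string."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.findall(".//sm:loc", NS)]

def dirt_percentage(statuses):
    """Share of URLs not returning 200 - Bing's stated tolerance is ~1%."""
    if not statuses:
        return 0.0
    bad = sum(1 for s in statuses if s != 200)
    return 100.0 * bad / len(statuses)

sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/old-page</loc></url>
</urlset>"""

print(sitemap_urls(sitemap))        # the two URLs above
print(dirt_percentage([200, 404]))  # 50.0 - well past Bing's 1% threshold
```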
7 - Using Screaming Frog for Linkbuilding
When doing outreach, I often find it easier if I first contact a link target through something like Twitter.
Taking this one step further, an easy and innocent way to get on someone’s radar is to crawl their site for them, find a blog post that returns a 404, and tell them about it.
Assuming you don’t use an SEO profile, this is a good way to be nice and draw attention to the fact that you read their blog. Now, when you actually do contact them for outreach, it’s not out of nowhere and they’ll at least recognise your name and face.
8 - IP and User Agent Redirection
Two features that don’t get as much attention are the proxy option and the ability to change the user agent. Taking them in order: the proxy feature can be useful for clients that use IP-based redirects. To see what’s going on, you can buy a set of international proxies or try some of the free ones at Hide My Ass, then compare the results. To use this, select “Configure” then “Proxy”.
Tick the “Use Proxy Server” box and enter your proxy details. When you crawl the site now, it will use the international address instead of yours. If you are going to do this, I would recommend paying for private proxies, as the free ones can be quite temperamental.
Changing the user agent can be useful for checking whether websites treat search engine crawlers such as Googlebot differently to users. It can also pick up whether robots.txt is explicitly blocking certain content from individual search engines. To use this feature, just select “Configuration” and then “User Agent”. It doesn’t get much easier than that.
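Outside Screaming Frog, the same kind of spoofed-user-agent request can be sketched with the standard library. The Googlebot UA string below is the commonly published one, included here as an example, and the compare-the-bodies workflow is my own suggestion:

```python
import urllib.request

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url, user_agent):
    """Build a request carrying a spoofed User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# To compare responses, fetch the same page twice and diff the bodies:
#   body_as_bot  = urllib.request.urlopen(build_request(url, GOOGLEBOT)).read()
#   body_as_user = urllib.request.urlopen(build_request(url, "Mozilla/5.0")).read()
# Different bodies suggest the site treats crawlers differently from users.
req = build_request("http://www.example.com/", GOOGLEBOT)
print(req.get_header("User-agent"))
```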
That’s all folks! I’m sure there are lots of other ways to use Screaming Frog or other crawlers. If you have any tips, please leave a comment and I’ll update the post. If you have any questions, you can get me on Twitter @CraigBradford.