How to Do a 5-Step Site Audit

Sometimes, it can be overwhelming and difficult to look at a new site and know where to start.

Do I go straight into the code? Do I navigate around? Should I go straight into their backlink profile? I JUST DON’T KNOW!

Although these are all important, they may not give you a high-level understanding of the site or a clear picture of how the project should proceed. To tackle that “deer-in-the-headlights” moment, I’ve put together 5 simple steps that will help you identify some common and crucial problems. These steps are especially useful when you begin speaking with a potential client and they want to know how you can help.

So away we go...

1) Duplicate Content

Why does it matter? Duplicate content can be any site’s downfall. When more than one copy of the same content exists, it can be difficult for search engines to distinguish the original source from mere copies. The search engines may then present less relevant results and, more importantly for the site, lower its rankings and traffic metrics.

How to check?

    - Copy a snippet of text
    - Paste the text into Google (or your search engine of choice :))
    - Analyze the results!

If you want to be sure there isn’t duplicate content on your own domain (which you should definitely check!), combine the above method with a site: search. That way, your query will only return instances of the duplicated text from that site.
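If you want to go beyond eyeballing search results, the comparison can be sketched in a few lines of Python. This is a rough sketch, not a definitive tool; the snippet and `example.com` domain below are placeholders:

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't hide copies."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def similarity(snippet_a: str, snippet_b: str) -> float:
    """Return a 0.0-1.0 ratio; values near 1.0 suggest duplicated content."""
    return difflib.SequenceMatcher(None, normalize(snippet_a), normalize(snippet_b)).ratio()


def site_search_query(snippet: str, domain: str) -> str:
    """Build the site-restricted query described above, e.g. site:example.com "some text"."""
    return f'site:{domain} "{normalize(snippet)}"'
```

For example, `site_search_query("Duplicate   content can be any site's downfall", "example.com")` gives you a query you can paste straight into Google to check your own domain.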

2) Check Robots.txt

Why does it matter? Utilizing robots.txt can be super helpful when you’re trying to block the search engines from crawling certain pages or folders. However, this function can be just as detrimental if you do not implement it correctly. To be certain that a site isn’t blocking itself (I’ve seen this... a few times!) or any of its important content or pages, check the robots.txt file of the website.

How to check? Super easy! A site’s robots.txt always lives at the domain root, so taking Distilled as an example, you’d type distilled.net/robots.txt straight into your browser:
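If you want to check programmatically whether specific URLs are blocked, Python’s standard library can parse a robots.txt for you. A minimal sketch, assuming a made-up file and made-up URLs on example.com:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks an admin folder for all crawlers.
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A content page should be crawlable; the blocked folder should not be.
blog_ok = parser.can_fetch("*", "https://www.example.com/blog/post")
admin_ok = parser.can_fetch("*", "https://www.example.com/admin/login")
```

Note that a site which accidentally ships `Disallow: /` would return False for every URL here, which is exactly the “blocking themselves” mistake mentioned above.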

3) View the Cache Version of the page

Why does it matter? You’ve heard of cloaking, right?! Don’t do it. By showing different content or URLs to Google than to users, you’re asking for a world of pain to be brought down upon your rankings and traffic. My advice: don’t wake a sleeping bear (that’s a saying, right?!)

This check is also useful because you can see the last time Google crawled a specific page. This is important because you want your site to be crawled as frequently as possible; a long lapse in time can indicate a possible penalty or crawling issues.

How to check? Again, super easy!
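As an illustration, the cache lookup is just a URL pattern: search for `cache:yourdomain.com` in Google, or hit Google’s web-cache endpoint directly. A sketch, assuming a placeholder page URL (and note Google may change this endpoint over time):

```python
from urllib.parse import quote


def google_cache_url(page_url: str) -> str:
    """Build the direct web-cache lookup URL for a page."""
    return "https://webcache.googleusercontent.com/search?q=cache:" + quote(page_url, safe=":/")


lookup = google_cache_url("https://www.example.com/blog/")
```

Opening `lookup` in a browser shows the cached copy along with the date Google last crawled that page.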

4) Canonicalization

Why does it matter? Just like duplicate content, having multiple versions of the same content at multiple URLs can decrease your rankings, dilute your link equity, or more simply devalue your site.

How to check? Easy as pie. Check that all the common versions of the site redirect to the canonical version:

If the page loads the same content without redirecting, then you’ve got some canonical issues, my friend. Don’t get too panicked, though; this is a basic mistake that can be easily rectified.
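The check above can be scripted: generate the common entry-point variants and request each one without following redirects, expecting a 301 to the canonical URL. A sketch with `example.com` as a placeholder; note that `first_hop_status` makes a live HTTP request:

```python
import http.client
from urllib.parse import urlsplit


def url_variants(domain: str) -> list[str]:
    """Common duplicate entry points that should all 301 to one canonical home page."""
    return [
        f"http://{domain}/",
        f"http://www.{domain}/",
        f"https://{domain}/",
        f"https://www.{domain}/",
        f"http://www.{domain}/index.html",
    ]


def first_hop_status(url: str) -> int:
    """Status of the first response only (redirects not followed); 301 is what you want."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    status = conn.getresponse().status
    conn.close()
    return status
```

Loop over `url_variants("yourdomain.com")` and print each status: any non-canonical variant that returns a 200 instead of a 301 is the exact problem described above.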

5) Top Page Report

Why does it matter? You can often find missed opportunities by looking at the Top Pages report in Open Site Explorer (OSE). By doing this, you can see which pages have the most links pointing to them, as well as their response codes. For a site’s top pages, you want to ensure that they are in fact the site’s “top pages” (AKA most important) and that they return the right response code. If a page returns anything but a 200 or 301, it might not be passing valuable link juice.

How to check? Input the site into OSE and analyze the top pages, linking root domains, and HTTP status codes.
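The response-code rule of thumb above is easy to automate once you’ve exported the top pages. A sketch, where the (URL, status) pairs are invented for illustration, not real OSE data:

```python
def risks_losing_link_equity(status: int) -> bool:
    """Per the rule above: anything other than a 200 or 301 may not pass link equity."""
    return status not in (200, 301)


# Hypothetical (URL, status) pairs as exported from a top-pages report.
top_pages = [
    ("/", 200),
    ("/old-landing-page", 404),
    ("/blog", 301),
    ("/pricing", 302),
]

flagged = [url for url, status in top_pages if risks_losing_link_equity(status)]
```

Here `flagged` calls out the 404 and the 302: heavily linked pages worth fixing or redirecting with a 301 so the link equity isn’t wasted.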

Get to it!

So get to it! Hope this helps, and remember: it’s as easy as pie. Specifically, pecan pie. Mmmm :)

Julianne Staino


Julianne comes to Distilled after working in the digital studio of a multicultural advertising agency. Looking to continue her love affair with SEO, she knew Distilled would be the perfect fit. Born and raised in Poughkeepsie...



  1. Nice post, Julianne! I do a lot of SEO audits so here are a few more quick thoughts...

    You mention it above, but a "site:" search is a quick and dirty method for seeing how many of your pages Google is throwing in the supplementary index. In addition to using a search engine to search for duplicate content snippets, you can also use a third-party service like Copyscape (or you can implement your own dupe/near-dupe detection, if you're really feeling motivated).

    I actually just wrote a post about robots.txt today that people might find helpful: A robots.txt File Guide That Won't Put You to Sleep. It covers the robots.txt basics, and it also links to robots.txt generators and analyzers.

    Don't forget to add versions with and without a trailing slash to your canonicalization list.

  2. Thanks for posting this Julianne,

    I would add to the "Canonicalization" section an example with query parameters in the URL:
    It may also be a problem, and to "cure" it, a canonical link tag can be used, or a 301 redirect.

    I would also double-check whether having both http and https versions of the pages really is a canonicalization issue. If yes, then again the canonical tag should help. But I do not think that Google will index both secure and regular versions.

    As for the robots.txt, your screenshot shows the Google search box. I guess you wanted to just type the URL rather than search for it in Google...

  3. Good post, 5 easy things to check whilst speaking to clients.

    Definitely, for me the site: search is one of the most useful, especially when combined with other search filters like allintitle, inurl, etc. Do a quick site: search for a domain; if the home page is not number 1, then you may have a problem.

  4. These tips were a great place to start my SEO audit. I'm actually reviewing a new site right now, and this helped out a lot. Thanks Julianne!

  5. Ian

    Hi Gregory

    Interesting comment - how would you add a canonical link tag to a link to overcome the query parameter?

  6. I was drawn to your post because we do have some "issues" but don't know how to remedy all of them. We run a site that sells products, and as an example, one shipping container can be used to ship a variety of products, so we need to show that container with each different product that needs this container. We've used different descriptions for the Meta description but a LT Shipping Case w/wheels is always going to be a LT Shipping Case w/wheels, so how do you get around that? In addition, our manufacturers have for example, different modular product lines, and each of them has dozens of different options with stock numbers beginning with VK-2001 xyz product, then VK-2002 xyz product, etc. Because of that we have 1400 duplicate pages on our site.

    How can this be remedied? Thanks, Lowell

  7. Lisa

    "Site's", not "sites". Possessive. I find it incredibly sad when really valuable content is marred by grammar, spelling, and punctuation errors.

  8. Good tips, Julianne. Some more important points would include 404 pages that must be handled. There are plenty of plugins that can help if your site uses a blogging framework like WordPress. Some people or businesses change their site (URLs, pages, categories, etc.) often enough to require a more vigilant approach to conserving inbound linking.

    This is a great WordPress plugin to help with redirects


  9. Hi Julianne

    Thanks for the great post. In regards to Check One what's the best path to take when someone has duplicated your content, word for word, on their site?

    There is a particular blog, which has obviously been created for SEO purposes for another site, and they are using our content while linking back to their company's main site.

    Any advice would be appreciated.

