Sometimes, it can be overwhelming and difficult to look at a new site and know where to start.
Do I go straight into the code? Do I navigate around? Should I go straight into their backlink profile? I JUST DON’T KNOW!
Although these are all important, they may not give you a high-level understanding of the site or a clear picture of how the project should proceed. To tackle that “deer-in-the-headlights” feeling, I’ve put together 5 simple steps that will help you identify some common and crucial mistakes that sites make. These steps are especially useful when you begin speaking with a potential client and they want to know how you can help.
So away we go…
1) Duplicate Content
Why does it matter? Duplicate Content can be any site’s downfall. When there is more than one copy of the same content, it can be difficult for search engines to distinguish between the original source and mere copies. This can cause the search engines to present less relevant results and, more importantly for the site, lead to lower rankings and traffic.
How to check?
- Copy a snippet of text
- Paste the text in Google (or your search engine of choice)
- Analyze results!
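If you check a lot of snippets, the copy-and-paste step can be scripted. Here is a minimal Python sketch that turns a snippet into an exact-phrase search URL. The function name and the Google search URL pattern used as the default are my own assumptions for illustration, so swap in your search engine of choice.

```python
from urllib.parse import quote_plus


def exact_match_search_url(
    snippet: str,
    base: str = "https://www.google.com/search?q=",  # assumed URL pattern
) -> str:
    """Build a search URL that looks for the snippet as an exact phrase."""
    # Drop any double quotes and collapse whitespace so the whole snippet
    # can be wrapped in a single pair of quotes.
    cleaned = " ".join(snippet.replace('"', "").split())
    return base + quote_plus(f'"{cleaned}"')


if __name__ == "__main__":
    print(exact_match_search_url("Copy a snippet of text from the page"))
```

Open the printed URL in a browser and analyze the results as before. If pages other than the original show up for the exact phrase, you probably have duplicates to deal with.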
2) Check Robots.txt
Why does it matter? Utilizing robots.txt can be super helpful when you’re trying to block the search engines from crawling certain pages or folders. However, it can be just as detrimental if you do not implement it correctly. To be certain that a site isn’t blocking itself (I’ve seen this... a few times!) or any of its own important content or pages, check the robots.txt file of the website.
How to check?
Super easy! Taking Distilled as an example:
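If you would rather script the check, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the paths below are made up for illustration (they are not Distilled's actual file), and `blocked_paths` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""


def blocked_paths(robots_txt: str, paths: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the paths that the given robots.txt blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


if __name__ == "__main__":
    important = ["/", "/blog/", "/admin/login"]
    print(blocked_paths(SAMPLE_ROBOTS_TXT, important))
```

To run this against a live site, you could instead call `parser.set_url("http://www.example.com/robots.txt")` followed by `parser.read()`. If the home page or other important pages come back as blocked, that is the problem to raise first.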

3) View the Cache Version of the page
Why does it matter?
You’ve heard of
This check is also useful because you can see the last time Google crawled a specific page. This is important because you want your site to be crawled as frequently as possible, while a long lapse in time can indicate a possible penalty or crawling issues.
How to check?
Again, super easy!


4) Canonicalization
Why does it matter? Just like duplicate content, having multiple versions of the same content across multiple locations can decrease your rankings, spread out your link equity, or more simply devalue your site.
How to check? Easy as pie. Check that all the common versions of the site redirect to the canonical version:
- http://www.example.com
- http://example.com
- http://www.example.com/index.html
- http://example.com/index.html
- http://www.example.com/INDEX.html
- https://www.example.com/index.html
If the page loads the same content without redirecting then you’ve got some canonical issues, my friend. Don’t get too panicked though, this is a basic mistake that can be easily rectified.
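For more than one or two sites, the list above can be generated and checked with a short script. This is a rough Python sketch under a few assumptions: `common_variants` and `final_url` are hypothetical helper names, the index page is assumed to be `index.html`, and `final_url` needs network access because it follows redirects with `urllib.request`.

```python
import urllib.request


def common_variants(domain: str, index_page: str = "index.html") -> list[str]:
    """Return the URL variants listed above for a domain such as 'example.com'."""
    bare = domain.removeprefix("www.")
    # Upper-case only the file name, not the extension: index.html -> INDEX.html
    name, dot, ext = index_page.rpartition(".")
    shouting = f"{name.upper()}{dot}{ext}"
    return [
        f"http://www.{bare}",
        f"http://{bare}",
        f"http://www.{bare}/{index_page}",
        f"http://{bare}/{index_page}",
        f"http://www.{bare}/{shouting}",
        f"https://www.{bare}/{index_page}",
    ]


def final_url(url: str, timeout: float = 10.0) -> str:
    """Follow redirects and return the URL that finally served the page.

    Raises an exception (e.g. urllib.error.HTTPError) if the request fails.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


if __name__ == "__main__":
    for variant in common_variants("example.com"):
        print(variant)
```

Calling `final_url` on each variant and comparing the results is the scripted version of the manual check: if every variant ends up at the same address, the redirects are in place. Variants that stay where they are and still load the same content are the ones to fix.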
5) Top Page Report
Why does it matter?
You can often find missed opportunities by looking at the top pages report in
How to check?
Input the site into OSE and analyze the top pages, linking root domains, and HTTP status codes.

Get to it!
So get to it! Hope this helps and remember, it’s as simple as pie. Specifically, pecan pie. mmmm

Julianne Staino is an SEO Analyst at Distilled NYC. She loves learning new SEO techniques and creative approaches to link building.
Nice post, Julianne! I do a lot of SEO audits so here are a few more quick thoughts…
You mention it above, but a “site:” search is a quick and dirty method for seeing how many of your pages Google is throwing in the supplementary index. In addition to using a search engine to search for duplicate content snippets, you can also use a third-party service like Copyscape (or you can implement your own dupe/near-dupe detection, if you’re really feeling motivated).
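For anyone feeling motivated enough to roll their own near-dupe detection, here is one common approach sketched in Python: compare the overlapping word sequences ("shingles") of two texts with Jaccard similarity. This is just an illustration of the idea, not how Copyscape or any search engine does it, and the shingle size of 3 is an arbitrary assumption.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word sequences of length `size`."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat it as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a: str, text_b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles: 1.0 identical, 0.0 disjoint."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts count as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    page_one = "LT Shipping Case with wheels for trade show displays"
    page_two = "LT Shipping Case with wheels for banner stands"
    print(round(jaccard_similarity(page_one, page_two), 2))
```

Scores close to 1.0 suggest two pages are near-duplicates. Where to draw the line is a judgment call you would need to tune on your own content.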
I actually just wrote a post about robots.txt today that people might find helpful: A robots.txt File Guide That Won’t Put You to Sleep. It covers the robots.txt basics, and it also contains resources to robots.txt generators and analyzers.
Don’t forget to add with and without a trailing slash to your canonicalization list (i.e., yoursite.com/ and yoursite.com).
Thanks Steve! All great additions!
Thanks for posting this Julianne,
I would add to the “Canonicalization” section an example with query parameters in the URL:
http://www.example.com?referred-by=affiliate-id
It may also be a problem, and to “cure” it, a “canonical URL” link tag can be used, or a 301 redirect.
I would also double-check whether having both http and https versions of the pages really is a canonicalization issue. If yes, then again the “canonical” tag should help. But I do not think that Google will index both secure and regular versions.
As for the robots.txt, your screenshot shows a Google search box. I guess you wanted to just type the URL rather than search for it in Google…
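To make the query-parameter case concrete, here is a small Python sketch that strips the tracking parameter and builds the matching canonical link tag for the page's head. The parameter set and function names are assumptions for illustration, and normalizing an empty path to "/" is my own choice rather than a requirement.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that do not change the page content.
TRACKING_PARAMS = frozenset({"referred-by"})


def canonical_url(url: str, ignored: frozenset[str] = TRACKING_PARAMS) -> str:
    """Return the URL with ignored query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # An empty path is normalized to "/" (an assumption, see the note above).
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the given page URL."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    print(canonical_link_tag("http://www.example.com?referred-by=affiliate-id"))
```

The tag goes in the head of every variant of the page, so the affiliate URL and the clean URL both point at the same canonical address. A 301 redirect to the clean URL is the alternative when the parameter is not needed once the visitor has landed.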
Good post, 5 easy things to check whilst speaking to clients.
Definitely for me the site: search is one of the most useful, especially when combined with other search filters like allintitle, inurl, etc. Do a quick site: search for a domain; if the home page is not number 1, then you may have a problem.
These tips were a great place to start my SEO audit. I’m actually reviewing a new site right now, and this helped out a lot. Thanks Julianne!
Hi Gregory
Interesting comment – how would you add a canonical link tag to a link to overcome the query parameter?
I was drawn to your post because we do have some “issues” but don’t know how to remedy all of them. We run a site that sells products, and as an example, one shipping container can be used to ship a variety of products, so we need to show that container with each different product that needs this container. We’ve used different descriptions for the Meta description but a LT Shipping Case w/wheels is always going to be a LT Shipping Case w/wheels, so how do you get around that? In addition, our manufacturers have for example, different modular product lines, and each of them has dozens of different options with stock numbers beginning with VK-2001 xyz product, then VK-2002 xyz product, etc. Because of that we have 1400 duplicate pages on our site.
How can this be remedied? Thanks, Lowell
“Site’s”, not “sites”. Possessive. I find it incredibly sad when really valuable content is marred by grammar, spelling, and punctuation errors.
Fixed.
Good tips Julianne. Some more important points would include 404 pages that must be handled. There are plenty of plugins that can help if your site uses a blogging framework like WordPress. Some people or businesses change their site (URL’s, pages, categories, etc..) often enough to require a more vigilant approach to conserving inbound linking.
This is a great wordpress plugin to help with redirects http://wordpress.org/extend/plugins/404-redirected/
Best, Henry http://henryjawhary.com
Hi Julianne
Thanks for the great post. In regards to Check One, what’s the best path to take when someone has duplicated your content, word for word, on their site?
There is a particular blog, obviously created for SEO purposes for another site, that is using our content while linking back to their company’s main site.
Any advice would be appreciated.