I have been receiving a lot of questions recently, from clients and in other private forums, about the Penguin update and where to start in a Penguin analysis. So to set expectations off the bat, this is not going to be an advanced post on Penguin theory, but rather a guide to what to look for in your Analytics and Webmaster Tools to see where the drop occurred. This includes visualizations of factors that I have seen, and how to check for links that might be harming you.
We’ll discuss what Penguin is, how to diagnose a Penguin traffic drop, how to check the toolbar PageRank (TBPR) or Page Authority (PA) of links, how to check indexation of links quickly, and then give some overview strategies about how to approach different Penguin scenarios.
Let’s get started, shall we?
What Penguin Is
First, we need to explore what Penguin is (and is not) and how it manifests itself.
Penguin was first rolled out on April 24th (and reported on here by Danny Sullivan). It’s the “over-optimization” penalty that we had all been waiting for since Matt Cutts seemed to talk about it in March at SXSW.
Think of Penguin as Panda’s link cousin. Panda targeted low-quality content onsite. Penguin seems to be targeting overly aggressive anchor text (both internally and externally), especially from low-quality sources. There has also been mention of Penguin targeting keyword stuffing, which keeps with the “over-optimization” theme.
…such as this from the geniuses over at SEOipsum
We have since found out that Penguin operates like Panda in that it rolls out in iterations (as evidenced by Penguin 1.1 rolling out in late May), with some sites reporting recovery and new sites being hit. This is a classic iteration algorithm, meaning even if you make changes to your site quickly after being hit, you have to wait until the next iteration (most likely) to see whether or not you made the right changes.
Since Penguin is an algorithm, not a penalty, it does not help to submit a reinclusion request (though you can submit your site using this form if you believe you did not deserve to be hit).
How to Diagnose Penguin
As mentioned above, Penguin manifests itself by targeting overly aggressive anchor text, both externally and internally, especially over-optimized anchors on low-quality sites (which is why WPMU was hit, according to Ross Hudgens). We must be careful when diagnosing a Penguin hit, though, because Google also started deindexing free directories, especially directory networks, in mid-May. When trying to diagnose a traffic drop, look at the following indicators (in this order):
- Does the drop in traffic coincide with one of the reported dates (so far April 24 and May 25)?
- Is it a site-wide drop or does it seem to be keyword-specific?
- Did you receive a notice in Webmaster Tools at a different time that could signify that it’s a penalty (as opposed to an algorithm)?
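The first check in that list is easy to script. Here is a minimal sketch in Python, assuming you already know the day your traffic fell. The year (2012) is supplied by me, and the three-day tolerance and the function name are my own assumptions, not anything Google has published.

```python
from datetime import date

# Rollout dates reported so far in this post (the year 2012 is added here).
PENGUIN_DATES = {
    "Penguin 1": date(2012, 4, 24),
    "Penguin 1.1": date(2012, 5, 25),
}


def matching_update(drop_date, tolerance_days=3):
    """Return the name of the update within tolerance_days of drop_date, or None.

    tolerance_days is an assumed buffer, not an official figure.
    """
    for name, rollout in PENGUIN_DATES.items():
        if abs((drop_date - rollout).days) <= tolerance_days:
            return name
    return None


# A drop one day after the April 24 rollout.
print(matching_update(date(2012, 4, 25)))
# A mid-May drop, which is nearer the directory deindexing than either update.
print(matching_update(date(2012, 5, 15)))
```

If this returns None, the other two questions in the list (site-wide versus keyword-specific, and any Webmaster Tools notice) matter even more.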
The best way to check these is to first check in Analytics (and then verify in Webmaster Tools based on keyword and overall site queries). I’m going to show you how to do both of these, or at least how I check what has dropped. We are going to use some Excel wizardry to do this, but just easy Pivot Tables and a few simple formulas that I will provide you.
Let’s look at a Penguin-ized site.
Remember, these are the links that Penguin loves to munch on:
- On low quality sites (low PR)
- Exact anchors
- Over-optimized exact anchors
- Too many exact anchors over branded terms
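To get a feel for how exact-heavy a profile is, you could bucket the anchors yourself. This is only a rough sketch: the bucket names, the matching rules, and the sample numbers are all hypothetical, and it makes no claim about what share of exact anchors is "too many".

```python
def anchor_profile(anchors, brand_terms, money_terms):
    """Return the share of links in each bucket: exact, branded, other.

    anchors: dict mapping anchor text -> number of linking root domains.
    brand_terms / money_terms: lists you supply by hand (assumptions).
    """
    totals = {"exact": 0, "branded": 0, "other": 0}
    brand = [b.lower() for b in brand_terms]
    money = {m.lower() for m in money_terms}
    for text, count in anchors.items():
        t = text.strip().lower()
        if t in money:
            # The anchor is exactly one of the targeted keywords.
            totals["exact"] += count
        elif any(b in t for b in brand):
            # The anchor mentions the brand name somewhere.
            totals["branded"] += count
        else:
            totals["other"] += count
    grand = sum(totals.values())
    return {k: (v / grand if grand else 0.0) for k, v in totals.items()}


# Hypothetical numbers, for illustration only.
sample = {"online colleges": 60, "Example University": 25, "click here": 15}
print(anchor_profile(sample, ["example university"], ["online colleges"]))
```

Comparing the same shares for a few competitors gives you something to balance against, since there is no published threshold.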
We are going to look at Analytics to determine when the drop happened and which keywords dropped, and then we’ll investigate using OSE/Majestic and Excel.
Step 1 – Analytics
The first step, of course, is using Analytics to slice data around the dates that Penguin rolled out to determine if the drop happened around the same time as the updates. We’re going to use the following views:
- From March 1 to June 9
- From April 29-May 12th compared against April 8-21 (Penguin 1)
- From May 27-June 9th compared against May 6-19 (Penguin 1.1)
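The two comparison ranges above line up with whole Sunday-to-Saturday weeks on either side of the week containing each update. If you want to generate the same kind of ranges for another date, here is a small Python sketch that assumes that pattern is the intent. The function name is mine, and the year 2012 is added here.

```python
from datetime import date, timedelta


def comparison_windows(update_day):
    """Return ((before_start, before_end), (after_start, after_end)).

    Skips the Sunday-to-Saturday week that contains update_day and takes
    the two full weeks on either side of it.
    """
    # Days since the most recent Sunday (weekday() counts Monday as 0).
    offset = (update_day.weekday() + 1) % 7
    week_start = update_day - timedelta(days=offset)
    before = (week_start - timedelta(days=14), week_start - timedelta(days=1))
    after = (week_start + timedelta(days=7), week_start + timedelta(days=20))
    return before, after


for label, day in [("Penguin 1", date(2012, 4, 24)), ("Penguin 1.1", date(2012, 5, 25))]:
    before, after = comparison_windows(day)
    print(label, "before:", before[0], "to", before[1], "after:", after[0], "to", after[1])
```

Using whole weeks keeps the same number of weekdays and weekends in both ranges, which matters for sites with a strong weekly traffic pattern.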
Look at March 1-June 9
We want to look at March 1-June 9th to get a long-term view to see if a noticeable drop has occurred. Here is what you might see if a drop occurred around Penguin (notice April 24th is highlighted):
There is a pretty obvious drop there, even though the site is low-traffic. You can easily see the difference in traffic levels before and after April 24th.
Compare Traffic Levels from Two Weeks Before and Two Weeks After
Now you should compare two weeks prior to Penguin 1 and two weeks post Penguin 1, allowing a few days of buffer on both sides to give the algorithm time to shake out. Here is what you might see:
Check out the drop in visits. In this case, it’s over 52% between before and after the update. A pretty clear drop-off!
Compare Two Weeks Before and After Penguin 1.1
Since Penguin 1.1 rolled out on May 25th, we can now also see if the site was hit again (or possibly regained some traffic because of efforts made). Here is a screenshot of a site that did not recover, and may have even been hit a bit again (for a variety of reasons):
“Alright”, you’re probably asking, “so my site got hit. I see that. Thanks. Now, what can I DO about it?”
Investigating WHAT Dropped
Now that we know there has been a drop, we’re going to investigate WHAT dropped. We’re going to use a combination of Analytics and link data, sourced from OpenSiteExplorer AND MajesticSEO (both if possible).
Step 2 – Which Keywords Dropped?
In Analytics, using Traffic Sources > Sources > Search > Organic, you can see the difference in traffic being driven from each keyword between time periods. Since you’ve been optimizing for your highest-value and highest-traffic keywords, the keywords that drop the most should be near the top. Penguin seems to be a keyword-specific algorithm, not a sitewide algorithm. Here is an example of what you might see (I have had to black out the specific keywords here for confidentiality purposes):
These keywords are where you start. So download them in CSV format.
Check in Webmaster Tools
Once you have seen which terms have dropped, I also recommend checking them in Webmaster Tools on a keyword level. To do this, go to Search Queries:
By adjusting the date range back as far as it can go (currently 90 days) you can see if there has been a drop according to Google as well:
Find the keywords that seem to have dropped on that page and click through to each one. If you see something like this, then you know you have lost visibility for the term:
Step 3 – Download Anchor Text Data from OpenSiteExplorer and/or MajesticSEO
Next you need to download the anchor text distribution data available in OpenSiteExplorer and/or MajesticSEO, depending on which one you have full access to. I am going to use OpenSiteExplorer here.
What you need to download is the anchor text distribution from OpenSiteExplorer, located at http://www.opensiteexplorer.org/anchors?site=www.YOURDOMAIN.TLD. You’ll be looking at something like this (Distilled’s backlinks):
*Note* - these top anchor texts are the anchors you must give attention to. These are most likely the anchors that dropped, as well as the related keywords. If you’ve targeted, for example, [online colleges] with aggressive anchor text you will most likely see a drop around keywords like [best online colleges] and [cheap online colleges].
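To find those related keywords in your Analytics export, one simple approach is to look for keywords that contain every word of the anchor. This is a sketch of that idea only; the word-containment rule and the function name are my assumptions, and it will miss plurals and misspellings.

```python
def related_keywords(anchor, keywords):
    """Return the keywords that contain every word of the anchor phrase."""
    anchor_words = set(anchor.lower().split())
    return [kw for kw in keywords if anchor_words <= set(kw.lower().split())]


# The first two keywords come from the example in this post; the third is made up.
dropped = ["best online colleges", "cheap online colleges", "college football"]
print(related_keywords("online colleges", dropped))
```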
So by now we have:
- Diagnosed that a drop occurred;
- Seen which high-traffic keywords have dropped;
- Downloaded Analytics data and OSE anchor text spread.
Step 4 – Combine Data to Pull Out Learnings
What I’ve been doing now is combining the data in Excel. As you can see from the below Excel sheet, I’ve put both the Analytics data (anonymized) and the popular anchor text (also anonymized) onto one sheet for now:
One other thing that I do is get the percentage decrease back into my data, as unfortunately this does not download from Analytics (dear Avinash, can you make this happen please?)
What I did was use the following Excel formula, assuming visits to the keyword, by date range, are in column C:
This doesn’t give you completely clean data, but it does show you the change in every other row (on the later date level). Here’s how it looks now (with conditional formatting applied):
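If you would rather compute the change outside Excel, here is a rough Python equivalent. It is not the formula from the sheet. It assumes each keyword occupies two consecutive rows in the export, the earlier date range first and the later one second, so check that against your own download before trusting the numbers.

```python
def percent_changes(rows):
    """Return keyword -> percentage change in visits between the two periods.

    rows: list of (keyword, visits) pairs, where each keyword appears on two
    consecutive rows: earlier period first, later period second (assumed layout).
    """
    changes = {}
    for i in range(0, len(rows) - 1, 2):
        keyword, before = rows[i]
        _, after = rows[i + 1]
        if before:
            changes[keyword] = (after - before) * 100 / before
        else:
            # No visits in the earlier period, so a percentage is undefined.
            changes[keyword] = None
    return changes


# Hypothetical visit counts, for illustration only.
export = [("kw a", 200), ("kw a", 90), ("kw b", 50), ("kw b", 55)]
print(percent_changes(export))
```

Negative values are drops, so sorting ascending puts the hardest-hit keywords first.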
Now we get to combine this data using a Pivot table to see if the drops match up to anchor text. We are going to really be looking at:
- The drop;
- # of LRDs;
- # of links for that term (to look for potential sitewides to deal with first)
I am assuming that all of you know how to use Pivot Tables. If you do not, and you need some help learning them, I highly recommend either watching this video on YouTube or working through Lesson 5 on the Excel Guide for SEOs.
The goal with using this Pivot Table is to combine the keywords that have dropped with the # of linking root domains and number of links. You can now look for an abnormally large links/root domains ratio, or by knowing roughly the number of linking root domains your site possesses you can see where you are over-optimized with external links and need to go about changing anchor text or removing links.
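The same join can be sketched in a few lines of Python if you prefer code to Pivot Tables. Everything here is illustrative: the input shapes, field names, and sample numbers are hypothetical, and matching a keyword to an anchor by exact text is a simplifying assumption.

```python
def combine(drops, anchors):
    """Join keyword drops with anchor text data and rank by links per root domain.

    drops: dict of keyword -> change in visits (negative means a drop).
    anchors: dict of anchor text -> (number of links, linking root domains).
    """
    table = []
    for keyword, change in drops.items():
        if keyword not in anchors:
            # No anchor text matches this keyword exactly, so skip it.
            continue
        links, lrds = anchors[keyword]
        ratio = links / lrds if lrds else float("inf")
        table.append({
            "keyword": keyword,
            "change": change,
            "links": links,
            "lrds": lrds,
            "links_per_lrd": ratio,
        })
    # A high links-per-domain ratio suggests sitewide links worth checking first.
    table.sort(key=lambda row: row["links_per_lrd"], reverse=True)
    return table


# Hypothetical numbers, for illustration only.
drops = {"kw a": -110, "kw b": -20}
anchors = {"kw a": (5000, 10), "kw b": (40, 30), "kw c": (5, 5)}
for row in combine(drops, anchors):
    print(row)
```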
Here is the configuration that I have used on the Pivot Table to get the data that I need:
Checking Link Over-Optimization
We’ve talked about how to diagnose the Penguin drop and tie it back to the specific keywords, but now we need to prioritize links to start investigating. My recommendation is to follow these guidelines:
- Start with the biggest traffic drop (numbers, not percentage)
- Check the TBPR/DA/PA of the site/page on which the link lives
- For sitewides (based off of links)
- Check indexation to see if Google has taken care of the link for you
Check DA/PA Distribution
Of course, start with your money-making terms. You want to check the distribution of TBPR/PA across these to see how good or bad it might be. To visualize the links to a specific page in order to see the DA spread of links, there are a few tools available:
- Tom Anthony built a tool that he gave away on SEOmoz that can do this for you;
- Dr Pete gave away the initial spreadsheet on SEOmoz to check Page Authority; and
- I created a Domain Authority checker spreadsheet based off of Dr Pete’s that is the second in this post, or you can just download it directly HERE and go to the post to figure out how to use it. You’ll get a graph like this:
Are My Links Indexed?
As I said above, Google has also been deindexing directories. If you have been doing a lot of directory linkbuilding, especially low PR directories, many of them may be deindexed. I’ve seen sites with over 80% of their backlink profile deindexed, which was to blame for their traffic drop (which did not correlate with the time of Penguin, by the way).
To check indexation, I recommend using Niels Bosma’s SEO Tools for Excel. There is a nifty formula called =googlepagerank() that will tell you the TBPR of the URL where the link exists. Pro tip: -1 means that it is not cached.
It’s easy to use, but you can see it here on the sheet I used for the pivot table (this sheet is called ‘Links to check indexation’):
Obviously, you do not need to worry about the links that this tells you are deindexed. You can then subtract them from the totals to get the number of links left to remove in order to get back to a balanced link profile (balanced against competitors, of course).
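That subtraction is simple to script once you have the TBPR values in a list. This sketch assumes you have exported pairs of anchor text and TBPR from your sheet, and it treats -1 as "not cached" as described above. The function name and sample data are hypothetical.

```python
from collections import Counter


def links_left_to_review(links):
    """Count links per anchor text, ignoring those with a TBPR of -1.

    links: iterable of (anchor_text, tbpr) pairs exported from the sheet.
    """
    remaining = Counter()
    for anchor, tbpr in links:
        if tbpr != -1:
            remaining[anchor] += 1
    return dict(remaining)


# Hypothetical data, for illustration only.
sheet = [("kw a", -1), ("kw a", 2), ("kw a", 0), ("kw b", -1)]
print(links_left_to_review(sheet))
```

Anchors that vanish from the result entirely had all of their checked links come back as -1.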
Checking indexation based on dates can also help you figure out if you were hit by directories being deindexed. This will be around May 15th.
How to Proceed
Of course, following all of this investigation, I often get asked how to proceed. And the unfortunate answer is:
The right answer depends on your specific situation
If you had a lot of directories pointing to deep pages, but good links to your homepage, you might be able to salvage your site. If you have a situation like WPMU, where they had a lot of links from a lot of low-quality domains that they had control of, you can turn those off quickly.
If you have a case where your homepage has a lot of exact anchors and few branded and random anchors, you need to make the decision of whether it is worth the cost to do a marketing campaign to get branded links, whether you should use Penguin as a reason to rebrand and move sites, or a combination of these.
I made you a spreadsheet to help out
I’ve been showing you screenshots from an Excel spreadsheet throughout this post. In the interest of giving away helpful things, I want to give you that spreadsheet as well!
I’d love to get y’alls thoughts on any of this. If you’ve written a good post about diagnosing Penguin in order to put together an action plan, feel free to drop the link in the comments below.
John Doherty is the head of and consultant in the Distilled New York City office. His work time is filled with data consumption and strategic awesomeness, while his free time consists of extreme sports, travel, and bicycle riding in Brooklyn.