Using Blogrolls to Expand your Link Prospecting Lists

Note from John at Distilled - This is a guest post from Paul May at Buzzstream. Paul is a friend of Distilled's and we love their product, so you will see Paul guest posting here on the blog on a more regular basis going forward.

One of the most time consuming parts of content-based outreach is finding and qualifying prospects. Advanced search queries can return a large volume of opportunities, but you need to be experienced in putting together prospecting queries. Even then, you often have to cull through a lot of low quality sites once you get past the first few pages. Another approach is to use curated lists as the starting point for your prospecting. While these lists are more qualified, often they aren’t big enough to support a campaign. When you’re reaching out to bloggers, one way to address this problem is to start with a curated list of blogs and use blogrolls to expand it. Because blogroll inclusion can be a vote of confidence from a trusted person, it can be a great source for finding high value blogs.  That said, in order to find the high value blogs included in blogrolls, you need to deal with a couple of challenges:

  1. You can’t tell how relevant or valuable each of the blogs in a blogroll is without reviewing each one (this is particularly the case now that blogrolls have fallen out of favor for many verticals).
  2. Blogrolls often include a mix of blogs that are focused on the niche and ones that are completely unrelated.  So until you dig in, you don’t know if the blog was added because it’s a go-to resource for this blogger or because it’s a client’s blog, friend’s blog, etc.
Without a process and tools, trying to take a set of blogrolls and quickly isolate the relevant niche blogs in blogrolls is time-consuming. Fortunately, with a handful of free tools and a bit of ingenuity, you can take a set of blogrolls and quickly identify the ones you should research first.  If you can focus first on the ones that appear promising, you can save a boatload of time.

Here’s the basic approach:

  1. Create a seed set of blogs by extracting blog URLs from a public blog directory (e.g., alltop) or a curated lists of “top blogs”
  2. Use a blogroll discovery tool to expand the list,
  3. Create Excel pivot tables to find blogs that are included in blogrolls most frequently, and focus your research on them first (i.e., use co-citations as a quick-and-dirty proxy for measuring influence).
Let’s dive into an example to show you how this works.

Step 1: Build your seed list

Let’s say I’m trying to find out which blogs are influential in the internet marketing space.  There are lots of sources I could rely on, but for this example, I’ll use TopRank’s Big List of Search Marketing Blogs.

The Big List is a great source for a seed because it’s curated in a thoughtful way and because it’s, well, big. Not all niches will have a definitive source like this, but you can often find good pages on Alltop, or you can merge lists from multiple sources – often times a simple search for “best x blogs” can unearth some good lists, or you can look for badges on influential blogs in the niche and that will lead you back to a comprehensive list (doing this in the Internet marketing space would lead you to The Big List, The Ad Age Power 150, and Invesp’s Blog Rank). Mining general directories like DMOZ can also be very effective.

So we’ve identified a great source for our seed list, but now we need to get this list of URLs into a CSV format.  There are online apps for extracting links, including iWebTool’s Link Extractor and BuzzStream’s link extraction tool (I’m one of the founders of BuzzStream). You can also accomplish this by using a Chrome/Firefox extension like Web Developer to grab the list of URLs on a page.  Since I’m most familiar with BuzzStream’s tool, I’ll use it for this example:

  1. In Firefox, highlight the section on the blog that includes the links you want to capture.
  2. Right-click and click “View Selection Source”
  3. Copy-and-paste the selection into the Link Extraction Tool
This gives us a CSV that includes all the URLs in the Big List and the domains for each of them (385 blogs).  This is our seed list of blogs.

One thing to note is that, depending on your source and the tools you use, you might need to clean up the CSV a bit.  Look for URLs that are clearly improperly formatted, links to share buttons, etc.

Step 2: Use blogrolls to expand the list

Now that we have our seed list, the next step is to expand it by finding the list of blogs that are included in the blogrolls of these list members.  To do this, we’ll use the free Blogroll List Builder tool:
  • Paste the list of URLs in your seed list into the blogroll tool and click ‘Go’
  • Give the tool some time to run…there’s no feedback mechanism in the tool right now, so you’ll need to check the CSV against the list shown on the page to know when it’s done.  Once it’s finished, click “Download as a CSV file”
So this gives us a list of all the blogs from our original list and all of the blogs included in their blogrolls.  If you’re feeling ambitious, you can feed this new list back into the Blogroll List Builder tool to find 2nd level blogroll members (I’ve found that this improves results significantly).  We’ll skip that for this example.

We’re almost ready to find out which blogs are included the most frequently in blogrolls, but first we need to clean up the data a bit.

Step 3: Clean up the blogroll list

The first thing we’ll do to clean up the data is remove blogroll members that are included multiple times in the same person’s blogroll.  For example, is included in’s blogroll 15 different times.  If we leave all of these in the dataset, the pivot table is going to overstate how often this blog is included in blogrolls.  So to prevent this, we’ll create a formula in excel that will help us find the duplicates and then delete these rows:
  • In Excel, click Data>Sort.  Sort first by ‘Input URL’ and then by ‘Blogroll Domain’
  • In cell E1, put in the header “Multiple Inclusion per Blog’
  • In cell E2, add this formula - =IF(C2=C1,E1+1,1) .  What this does is check to see if the domain for the blogroll member is the same as the domain in the previous row. If it is, it adds 1 to the count.
  • Paste the formula down to the bottom of your list.
  • Select all of the cells in column E that have formulas in them, right-click and select copy, right-click again and select ‘Paste as Values’.  This will replace the formulas in these cells with the actual numbers that the formula returned.  If you don’t do this, you’ll have problems when you start deleting rows

  • Scroll through your data to find any cells in column E that don’t have a value of 1 and remove these rows.
The other thing we’ll do to clean up the data is remove any rows where the blogroll member is the same domain as the source blog (i.e., the XYZ blog includes in their blogroll).
  • In cell F1, put in the header “Same Blogroll Member as Source”
  • In cell F2, add this formula - =IF(LEFT(A352,20)=LEFT(B352,20), "match","").  This isn’t a perfect formula, but it’ll help you find ones that appear to be the same (i.e., the blogroll URL has the same first 20 letters as the input URL).
  • Paste the formula down to the bottom of your list.
  • Again, select all of the cells in the column that have formulas in them, right-click and select copy, right-click again and select ‘Paste as Values’.
  • Scroll through your data to find any cells in column F that say “match.”  Review them to make sure that they’re an actual match and remove these rows.

Step 4: Create a Pivot Table to Find Blogs in the Most Blogrolls

Now let’s make use of our excel ninja skills and create a pivot table that shows us how many times each of these blogs show up in a blogroll.

To start, click Insert>Pivot Table

The data you created should be selected, and in the section titled “Choose where you want the pivot table report placed,” keep “New Worksheet” selected.

Click OK.  Your screen should look like this:

On the right-hand side, drag ‘Blogroll Domain’ into both the ‘Row Label’ and ‘Values’ box:

In the ‘Values’ section, it should say “Count of Blogroll Domain’.  If it doesn’t click on the dropdown for the item in the ‘Values’ section, select ‘Value Field Settings and change it to ‘Count’.

You’ll now have a table with two columns.  The first column shows the blog and the second column shows how many times it’s included in blogrolls.  To sort this list, simply select any cell in column B, right-click and select Sort, largest to smallest.

So from our original list of 288 blogs, we’ve found 483 more blogs in the first level of blogrolls.  Of these, 65 show up in at least 2 blogrolls.  The ones with the highest count will usually be multi-author, broad category blogs.  To find influencers within these blogs, you can usually find the specific tag or category you care about in the blog, find out who writes about that topic for the publication and search to find out where else they publish.  Beyond the big publications, you’ll start seeing more of the single author blogs that cover a fairly broad category and beyond that, you’ll start seeing more of the “sub-niche” blogs (e.g., people in the SEO community who are heavily focused on link building, outdoor adventure bloggers who are specifically focused on rock climbing, etc.).

That leaves over 400 blogs that are only included in a single blogroll.  Should you ignore these? Definitely’ll miss out on lots of relevant, highly valuable sub-niche bloggers in this list. There are other tools you can use to quickly identify the good sub-niche blogs and to filter out the ones that are obviously unrelated.  But that’s another post entirely…

Get blog posts via email