How to Use Screaming Frog to Segment Data

The Problem

Recently, one of our clients came to us with concerns about the amount of syndicated content housed on their site. This client was in the publishing industry and used syndicated content to add more articles to their news site. Although the syndicated content properly attributed its original source, we still believed it provided minimal value to the overall site. However, we wanted data to support that belief.

Unfortunately, there isn't an easy way to segment syndicated content from original content using Google Analytics, because the syndicated articles attributed the original article only within the body content of the page. Thus, we needed another tool that could act as a crawler while letting us define custom segments for the data being crawled. Lo and behold, there was such a tool: Screaming Frog.

Screaming Frog Logo

Other Uses For Screaming Frog

The value of Screaming Frog isn't limited to segmenting syndicated and original content; it can be used in conjunction with Google Analytics (or any other data analysis tool) to provide more detailed and specific information about a site. For instance, it can be used as a QA mechanism to determine which pages on a site are missing the Google Analytics tracking code or, for more customized Google Analytics setups, which pages set Google Analytics custom variables. This helps ensure that the data you or your clients look at later is actually accurate, potentially saving a lot of time.

There are many other uses for Screaming Frog. For instance, in Google Analytics, you can track how much traffic is generated through videos. With Screaming Frog, you can determine the percentage of pages on the site that include embedded videos, and use that data to assess the value of these videos. Should more time be spent adding videos to the site? Less? At least now you have the data to either back up or refute these beliefs.
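To make the "percentage of pages" idea concrete, here is a rough Python sketch of the arithmetic. It is not how Screaming Frog works internally; the page sources are hand-written stand-ins for crawled HTML, and the markers `<iframe` and `<video` are only my assumption about what an embedded video might look like in the source.

```python
def share_of_pages_with(pages, markers):
    """Return the percentage of pages whose source contains any of the markers."""
    if not pages:
        return 0.0
    hits = sum(
        1
        for html in pages.values()
        if any(marker in html.lower() for marker in markers)
    )
    return 100.0 * hits / len(pages)


# Hypothetical page sources, written by hand for illustration only.
pages = {
    "/story-1": "<p>Text only</p>",
    "/story-2": '<iframe src="https://www.youtube.com/embed/abc"></iframe>',
    "/story-3": '<video src="clip.mp4"></video>',
    "/story-4": "<p>More text</p>",
}

# Assumed markers for an embedded video; adjust to whatever your site uses.
video_share = share_of_pages_with(pages, ["<iframe", "<video"])
print(f"{video_share:.0f}% of pages embed a video")
```

The same function would cover the QA case too: swap the markers for a piece of your tracking snippet and look at the pages that do not match.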

Step-By-Step Process

Below, I'll walk through the step-by-step process for using these features in Screaming Frog. Please bear in mind that these features require a license key, which costs £99 per year (or about $162).

To maintain the confidentiality of our clients, I've decided to use the Wall Street Journal's Digits blog as an example.

Step 1: Export URLs

Export all relevant URLs into a CSV file, then save the list of URLs as a plain text file. Then open Screaming Frog and, in the toolbar, select "Mode" and then "List."

Screaming Frog Mode
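If your export has more columns than just the URL, a small script can pull out the URL column before you hand the file to Screaming Frog. This is only a sketch: the column name `url` and the sample rows are made up, and I'm assuming a text file with one URL per line is what List mode expects.

```python
import csv
import io


def urls_from_csv(csv_text, column="url"):
    """Extract one column of URLs from CSV text, dropping blanks and duplicates."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen, urls = set(), []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical export; the header names and rows are invented for illustration.
csv_text = (
    "url,visits\n"
    "http://example.com/a,10\n"
    "http://example.com/b,5\n"
    "http://example.com/a,3\n"
)

url_list = urls_from_csv(csv_text)
list_file_body = "\n".join(url_list) + "\n"  # one URL per line
print(list_file_body)
```

Writing `list_file_body` to a `.txt` file (for example with `pathlib.Path("data2.txt").write_text(list_file_body)`) gives you the file to select in the next step.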

Step 2: Select File

Click "Select url list file" and then choose the correct text file that houses the list of URLs. In this case, I have saved it as "data2." Then click "Open."

Screaming Frog Select File

Step 3: Confirm URLs

Afterwards, a pop-up appears that shows you the list of URLs that Screaming Frog has found from within your file. Select "Ok."

Screaming Frog URLs

Step 4: Create Custom Filter

In order to segment the syndicated articles from the original articles, we need to create a custom filter. Screaming Frog is able to crawl through the source code of any website, and its custom filters allow you to look for anything you designate. To do so, go back to the main toolbar, open "Configuration," and then select "Custom."

Screaming Frog Custom Filter

Step 5: Determine Custom Filter Configuration

A "Custom Filter Configuration" pop-up will appear. To determine what to enter here, go back to your page and find a distinguishing phrase that differentiates one type of content from the other. For example, the Wall Street Journal's "Digits" blog attributes the original source by stating within the content "This article originally appeared in" the [blank] blog. Be careful not to include any hyperlinked words, because link markup in the source code breaks up the sentence and it will not appear there as it does on the page.

Wall Street Journal Digits Blog

We would then type in "This article originally appeared in" in the "Custom Filter Configuration" and then select "Ok."

Screaming Frog Custom Filter Configuration
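The warning about hyperlinked words is easier to see with an example. The sketch below assumes the custom filter behaves like a plain "contains" search over the raw page source, which is my reading of the step above rather than a statement about Screaming Frog's internals. Both HTML snippets are invented for illustration.

```python
def matches_filter(page_source, phrase):
    """Plain substring test against raw page source (assumed filter behavior)."""
    return phrase in page_source


phrase = "This article originally appeared in"

# Hypothetical page sources, written by hand for illustration only.
plain_source = "<p>This article originally appeared in the Example blog.</p>"
linked_source = (
    '<p>This article <a href="/source">originally appeared</a> '
    "in the Example blog.</p>"
)

plain_hit = matches_filter(plain_source, phrase)    # phrase is unbroken in the source
linked_hit = matches_filter(linked_source, phrase)  # the <a> tag splits the phrase

print(plain_hit, linked_hit)
```

On the rendered page both sentences read the same, but only the first one contains the phrase as an unbroken run of characters in the source, so only the first would be caught by a filter of this kind.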

Step 6: Start Crawler

For the last step, click "Start," which makes Screaming Frog begin crawling. Be aware that you must select "Start" after you have set up the custom filter, because the crawler performs the look-up as it crawls the site. To find the data from your custom filter, select the "Custom" tab. You can set up five different custom filters at the same time.

Screaming Frog Crawl

From our screenshot, you can see some of the URLs from our initial list that matched this criterion.

Screaming Frog Data


We can then use this information in conjunction with our Google Analytics data. For instance, you can export this list and combine it in Excel with the list you initially compiled from Google Analytics containing all the URLs that received organic traffic. In our case, we used a VLOOKUP to distinguish syndicated from non-syndicated articles. From our client data (not this Wall Street Journal example), we determined that out of 3,176 article URLs that had received organic search traffic in the past 30 days, 1,257 were syndicated articles and 1,919 were original articles. This means that almost 40% of the articles that received organic search traffic in the past month were syndicated.
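If you'd rather not do the lookup in Excel, the same labeling can be sketched in a few lines of Python. This is a stand-in for the VLOOKUP step, not what we actually ran; the short URL lists are made up, while the counts 1,257 and 3,176 are the ones quoted above.

```python
def label_urls(all_urls, syndicated_urls):
    """Label each URL as syndicated if it appears in the custom filter export."""
    syndicated = set(syndicated_urls)
    return {
        url: ("syndicated" if url in syndicated else "original")
        for url in all_urls
    }


def percent(part, total):
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Hypothetical toy lists standing in for the two exports.
all_urls = ["/a", "/b", "/c", "/d", "/e"]  # URLs with organic traffic
custom_filter_export = ["/b", "/e"]        # URLs that matched the custom filter
labels = label_urls(all_urls, custom_filter_export)

# The counts reported in this post: 1,257 syndicated out of 3,176 URLs.
article_share = percent(1257, 3176)
print(labels)
print(f"{article_share}% syndicated")
```

With the post's numbers, the share works out to 39.6%, which is the "almost 40%" figure.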


However, to put this in perspective, we also totaled the traffic from these 3,176 articles and determined that syndicated articles accounted for only 20% of it. And out of the top 20 articles that had received the most traffic in the past month, only 1 was syndicated. We also looked at that specific syndicated article to find a potential reason why it was receiving so much traffic, and found that the original article was actually behind a paywall.
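The difference between "share of articles" and "share of traffic" is the whole point here, so a small sketch may help. The visit counts and labels below are invented (chosen so the result echoes the 20% figure) and are not the client's data.

```python
def traffic_share(visits_by_url, labels, label="syndicated"):
    """Return the percentage of total visits that went to URLs with the given label."""
    total = sum(visits_by_url.values())
    if total == 0:
        return 0.0
    part = sum(v for url, v in visits_by_url.items() if labels.get(url) == label)
    return 100.0 * part / total


def top_urls(visits_by_url, n):
    """Return the n URLs with the most visits, highest first."""
    return sorted(visits_by_url, key=visits_by_url.get, reverse=True)[:n]


# Hypothetical numbers for illustration only.
visits = {"/a": 500, "/b": 120, "/c": 300, "/d": 80}
labels = {"/a": "original", "/b": "syndicated", "/c": "original", "/d": "syndicated"}

share = traffic_share(visits, labels)
top_two = top_urls(visits, 2)
syndicated_in_top = sum(1 for url in top_two if labels[url] == "syndicated")

print(f"Syndicated share of traffic: {share:.0f}%")
print(f"Syndicated articles in the top 2: {syndicated_in_top}")
```

In this toy data, half of the URLs are syndicated but they bring in only a fifth of the visits, which is the same kind of gap we saw for the client.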

All this information was provided to the client to help them determine for themselves the value syndicated articles brought to their site. As SEOs, our job has grown to include multiple components: making sure a site is crawlable, providing a positive user experience, and analyzing the data to help guide the future business of a company. Screaming Frog is a valuable resource that, used in conjunction with other data analysis tools, can help all of us better segment the data that will eventually guide business decisions.

I'd love to hear your thoughts about how you use Screaming Frog (or any other tool) to better segment data. Please feel free to reach out to me on Twitter @stephpchang or in the comments section.
