The Problem
Recently, one of our clients came to us with concerns about the amount of syndicated content housed on their site. This client was in the publishing industry and used syndicated content to increase the volume of content on their news site. Although their syndicated content correctly attributed the original source, we still believed it provided minimal value to the overall site. However, we wanted data to support that belief.
Unfortunately, there isn't an easy way to segment syndicated content from original content using Google Analytics, because the syndicated articles attributed the original source only within the body content of the page. We therefore needed a tool that could crawl the site and let us define custom segments for the crawled data. Lo and behold, there was such a tool: Screaming Frog.
Other Uses For Screaming Frog
The value of Screaming Frog isn't limited to segmenting syndicated and original content. It can be used in conjunction with Google Analytics (or any other data analysis tool) to provide more detailed and specific information about a site. For instance, it can serve as a QA mechanism to determine which pages on a site are missing the Google Analytics tracking code or, for more customized Google Analytics setups, which pages set Google Analytics custom variables. This helps ensure that the data you or your clients look at later is actually accurate, and can potentially save a lot of time.
There are many other uses for Screaming Frog. For instance, in Google Analytics, you can track how much traffic is generated through videos. With Screaming Frog, you can determine the percentage of pages on the site that include embedded videos, and use that data to assess the value of these videos. Should more time be spent adding videos to the site? Less? At least now you have the data to either back up or refute these beliefs.
Step-By-Step Process
Below, I'll walk through, step by step, how to use these features in Screaming Frog. Please bear in mind that these features require a license key, which costs £99 per year (or about $162).
To maintain the confidentiality of our clients, I've decided to use the Wall Street Journal's Digits blog as an example.
Step 1: Export URLs
Export all relevant URLs into a CSV file and then save the document as a text file (any text format will do). Then open Screaming Frog and, in the toolbar, select "Mode" and then "List."
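If you'd rather script the export than do it by hand, here is a minimal sketch in Python. It assumes the export is a CSV with a header row and a column named "url", and that a text file with one URL per line is what you want to load in list mode. The column names and URLs below are made up for illustration, so adjust them to match your own export.

```python
import csv
import io


def urls_from_csv(csv_text, column="url"):
    """Return the de-duplicated URLs from one column of CSV text, in order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    urls = []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical export: the column names and URLs are invented for this example.
sample_export = """url,visits
http://example.com/article-1,120
http://example.com/article-2,45
http://example.com/article-1,120
"""

url_list = urls_from_csv(sample_export)

# One URL per line, ready to be written to a .txt file.
text_file_body = "\n".join(url_list) + "\n"
print(text_file_body)
```

To work from a real file, read its contents into a string first and write `text_file_body` out to a file such as "data2.txt".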
Step 2: Select File
Click "Select url list file" and then choose the text file that contains the list of URLs. In this case, I have saved it as "data2." Then click "Open."
Step 3: Confirm URLs
A pop-up then appears showing the list of URLs that Screaming Frog found in your file. Select "Ok."
Step 4: Create Custom Filter
To segment the syndicated articles from the original articles, we need to create a custom filter. Screaming Frog can crawl through the source code of any website, and its custom filters let you look for anything you designate. To create one, go back to the main toolbar, select "Configuration," and then select "Custom."
Step 5: Determine Custom Filter Configuration
A "custom filter configuration" pop-up will appear. To determine what to enter here, go back to your pages and find a distinguishing phrase that differentiates one type of content from the other. For example, the Wall Street Journal's "Digits" blog attributes the original source by stating within the content "This article originally appeared in" the [blank] blog. Be careful not to include any words that are hyperlinked, because the link markup sits between those words in the source code and the sentence will not appear there as it reads on the page.
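To see why hyperlinked words cause trouble, here is a small Python sketch. It assumes the custom filter behaves like a plain substring search over the raw page source, which is how the step above describes it. The markup and the blog name are hypothetical.

```python
def contains_phrase(page_source, phrase):
    """Plain substring test against the raw HTML of a page."""
    return phrase in page_source


# Hypothetical markup: the attribution sentence with the blog name hyperlinked.
page_source = (
    "<p>This article originally appeared in "
    '<a href="http://example.com/">Example Blog</a>.</p>'
)

# Stops before the hyperlinked words, so it appears verbatim in the source.
safe_phrase = "This article originally appeared in"

# Includes the hyperlinked words, so the <a> tag interrupts it in the source.
risky_phrase = "This article originally appeared in Example Blog"

print(contains_phrase(page_source, safe_phrase))   # True
print(contains_phrase(page_source, risky_phrase))  # False
```

The second phrase reads fine on the rendered page, but it never matches because the anchor tag sits between "in" and "Example Blog" in the source.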
Step 6: Start Crawler
For the last step, click "Start," and Screaming Frog will begin crawling. Be sure to select "Start" only after you have set up the custom filter, because the crawler applies the filter as it crawls the site. To find the data from your custom filter, select the "Custom" tab. You can set up as many as five custom filters at the same time.
Results
We can then use this information in conjunction with our Google Analytics data. For instance, you can export this list and combine it in Excel with the list you initially compiled of all the URLs that received organic traffic according to Google Analytics. In our case, we used a VLOOKUP to distinguish syndicated from non-syndicated articles. From our client data (not this Wall Street Journal example), we determined that out of 3,176 article URLs that had received organic search traffic in the past 30 days, 1,257 were syndicated articles and 1,919 were original articles. This means that almost 40% of the articles that received organic search traffic in the past month were syndicated.
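The same lookup can be sketched outside Excel. The Python below is an illustration rather than the method we used (that was a VLOOKUP). It labels each organic URL by checking whether it appears in the list exported from the "Custom" tab, then works out the share from the counts quoted above. The function names and example URLs are hypothetical.

```python
def label_articles(organic_urls, syndicated_urls):
    """Label each organic URL, much like a VLOOKUP against the syndicated list."""
    syndicated = set(syndicated_urls)
    return {
        url: ("syndicated" if url in syndicated else "original")
        for url in organic_urls
    }


def syndicated_share(syndicated_count, total_count):
    """Percentage of articles that are syndicated."""
    return 100.0 * syndicated_count / total_count


# Hypothetical URLs standing in for the two exported lists.
organic_urls = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/c",
]
syndicated_urls = ["http://example.com/b"]

labels = label_articles(organic_urls, syndicated_urls)
print(labels)

# Counts taken from the client data described above: 1,257 of 3,176.
share = syndicated_share(1257, 3176)
print(round(share, 1))  # 39.6
```

Using a set for the syndicated list keeps each lookup fast even with thousands of URLs.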
Analysis
However, to put this in perspective, we also totaled the traffic to these 3,176 articles and determined that syndicated articles accounted for only 20% of it. Out of the top 20 articles by traffic in the past month, only 1 was syndicated. We also looked at that specific syndicated article to find out why it was receiving so much traffic, and discovered that the original article was actually behind a paywall.
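Here is a rough Python sketch of those two checks: the share of traffic by content type, and how many syndicated articles land among the top performers. The visit numbers are invented for illustration and are not the client's data. I picked them so the share comes out to 20%, mirroring the figure above.

```python
def traffic_share(visits_by_url, labels, label):
    """Percentage of total visits going to articles with the given label."""
    total = sum(visits_by_url.values())
    matching = sum(
        visits for url, visits in visits_by_url.items() if labels[url] == label
    )
    return 100.0 * matching / total


def count_in_top(visits_by_url, labels, label, n):
    """Number of articles with the given label among the n most-visited URLs."""
    top = sorted(visits_by_url, key=visits_by_url.get, reverse=True)[:n]
    return sum(1 for url in top if labels[url] == label)


# Hypothetical visits and labels (not the client's data).
visits_by_url = {"/a": 500, "/b": 300, "/c": 150, "/d": 50}
labels = {"/a": "original", "/b": "original", "/c": "syndicated", "/d": "syndicated"}

print(traffic_share(visits_by_url, labels, "syndicated"))    # 20.0
print(count_in_top(visits_by_url, labels, "syndicated", 3))  # 1
```

In this made-up sample, half of the URLs are syndicated but they bring in only a fifth of the visits, which is the same kind of gap we saw in the client's numbers.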
All this information was provided to the client to help them determine for themselves the value syndicated articles brought to their site. As SEOs, our job has grown to include multiple components: making sure a site is crawlable, providing a positive user experience, and analyzing data to help guide the future business of a company. Screaming Frog is a valuable resource that, used in conjunction with other data analysis tools, can help all of us better segment the data that will eventually guide business decisions.
I'd love to hear your thoughts about how you use Screaming Frog (or any other tool) to better segment data. Please feel free to reach out to me on Twitter @stephpchang or in the comments section.