How to Audit a Site for Structured Data Opportunities

‘Structured data’ has become a major buzzword in the SEO world, and with it has come a host of other jargon, from ‘rich snippets’ and ‘JSON-LD implementation’ to ‘microformats’ and ‘Schema.org’. But what do these things actually mean? And what (if anything) should we be doing about it?

This post will walk you through the process of identifying structured data opportunities for your site, as well as how to check an existing implementation for errors.

What is structured data?

‘Structured data’ in an SEO context refers to marking up a webpage’s HTML to provide more detailed information about what the content on that page is actually about. This is done in a very specific way so that it is understandable by computers as well as people.

There are many different types of structured data; you may be familiar with the Facebook Open Graph markup, for instance. But when we talk about structured data for SEO, we are usually talking about a particular ‘vocabulary’ called ‘Schema.org’.

The Schema.org vocabulary came out of a collaboration between the major search engines (Google, Bing, Yahoo!, and Yandex), who wanted a standardized list of entities and attributes that they all agreed to support and understand. However, there are other vocabularies which can be used. For instance, some websites make use of the microformats.org vocabulary, which is most commonly used to define a physical location in relation to a person or organization (hCard markup) or to mark up product reviews (hReview markup).

While Schema.org refers simply to the particular vocabulary used to mark up this content, there are multiple ways you can implement this markup. The most common is the microdata format, which uses HTML elements and attributes to embed data in a page using the Schema.org vocabulary. With microdata markup, your structured data is integrated within the main HTML of the page. This has traditionally been the recommended approach when marking up a page for SEO purposes. You may also see references to RDFa. This is simply an alternative format that integrates into HTML in a similar fashion.

More recently, however, support has begun to roll out for JSON-LD, which allows you to place all of your markup in the head of the page as a JSON object inside a script tag, rather than weaving it through the HTML. This makes it a simpler implementation from a development perspective. JSON-LD is currently supported for all Knowledge Graph features, sitelink search boxes, and Event and Recipe rich snippets, and for these it is the recommended implementation.
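To make the JSON-LD approach concrete, here is a minimal Python sketch that turns a dictionary into a script tag ready for the head of a page. The helper name json_ld_script and the organization details are made up for illustration; only the "@context" and "@type" keys and the application/ld+json script type come from the JSON-LD and Schema.org conventions themselves.

```python
import json


def json_ld_script(data: dict) -> str:
    """Serialize a dict as a JSON-LD <script> tag for the page's <head>."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical organization details, for illustration only.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Ltd",
    "url": "https://www.example.com/",
    "logo": "https://www.example.com/logo.png",
}

print(json_ld_script(organization))
```

Because the markup lives in one self-contained block, it can be generated from whatever data your templates already have, without touching the visible HTML.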

Implementing structured data, and specifically using Schema.org, is beneficial to a site for two reasons:

  1. Immediate benefit: possible enhanced display of search results, such as rich snippets and knowledge boxes.

  2. Future-proofing benefit: improving the machine-readability of your site is a good way of preparing for the future landscape of search. For more thoughts on what this might look like, see my colleague Tom Anthony’s presentation on Five Emerging Trends in Search.

The audit

NOTE: For the purposes of this post, I will refer to the Schema.org vocabulary hierarchy. You can also use microformats.org (e.g. hCard, hReview) if you prefer. I would normally recommend using Schema.org as it contains the most widely supported types. The exception to this would be if the specific markup type you want is better supported by microformats.org. Also note that you should only ever have one of these vocabularies per page, so if you mark up some of the elements on a page using Schema.org, don’t also include microformats.org markup on the same page.
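If you want a rough way to spot pages that mix the two vocabularies, something like the following Python sketch could help. It is only a heuristic built on assumptions: it looks for Schema.org URLs in itemtype or vocab attributes, and for the usual microformats root class names (vcard and hreview, plus the newer h-card and h-review). The function name vocabularies_used is hypothetical, and a regex scan like this can produce false positives on unrelated class names.

```python
import re

# Schema.org referenced from a microdata itemtype or an RDFa vocab attribute.
SCHEMA_ORG = re.compile(
    r'(?:itemtype|vocab)\s*=\s*["\']https?://schema\.org', re.IGNORECASE
)
# Common microformats root classes (an assumption; extend as needed).
MICROFORMATS = re.compile(
    r'class\s*=\s*["\'][^"\']*\b(?:vcard|hreview|h-card|h-review)\b',
    re.IGNORECASE,
)


def vocabularies_used(html: str) -> list:
    """Return a sorted list of the vocabularies detected in the HTML."""
    found = []
    if SCHEMA_ORG.search(html):
        found.append("schema.org")
    if MICROFORMATS.search(html):
        found.append("microformats")
    return sorted(found)


# Hypothetical page fragment that mixes both vocabularies.
page = (
    '<div itemscope itemtype="https://schema.org/Person"></div>'
    '<div class="vcard"><span class="fn">Jane Doe</span></div>'
)
if len(vocabularies_used(page)) > 1:
    print("Mixed vocabularies on this page:", vocabularies_used(page))
```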

Step 1: Check current implementation

You may already know whether there is currently any structured data markup on the site. In any case, you’ll want to check whether Google is finding this markup, and whether there are any errors being flagged.

Correcting errors is the first priority of a structured data audit, as Google may penalize sites (via either algorithmic quality factors or manual action) which are using markup incorrectly or in a way that looks spammy.

The best place to start with this is the Structured Data Report in Google Search Console. This report will tell you not only which pages have errors but may also be able to identify where and/or why the error is occurring.
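Alongside the Search Console report, it can be handy to spot-check an individual page yourself. The Python sketch below uses only the standard library to pull out any JSON-LD blocks and microdata itemtype values from a page's HTML. The class name StructuredDataFinder and the sample HTML are invented for illustration, and this is no substitute for Google's own reporting: it only tells you what markup is present and whether the JSON parses.

```python
import json
from html.parser import HTMLParser


class StructuredDataFinder(HTMLParser):
    """Collect JSON-LD blocks and microdata itemtype values from a page."""

    def __init__(self):
        super().__init__()
        self.json_ld = []    # successfully parsed JSON-LD blocks
        self.errors = []     # messages for JSON-LD blocks that failed to parse
        self.itemtypes = []  # microdata itemtype URLs, in document order
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        if attrs.get("itemtype"):
            self.itemtypes.append(attrs["itemtype"])

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError as exc:
                self.errors.append(str(exc))


# Hypothetical page source, for illustration only.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Ltd"}
</script>
</head><body>
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Widget</span>
</div>
</body></html>
"""

finder = StructuredDataFinder()
finder.feed(sample_html)
print("JSON-LD blocks:", finder.json_ld)
print("Microdata types:", finder.itemtypes)
print("Parse errors:", finder.errors)
```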

You may also want to check search results for any schema types which you currently have on the site and, therefore, would expect to be seeing rich snippets for. These include:

  • product

  • event

  • review/rating

  • article

  • video

  • knowledge graph box (for branded searches)

  • breadcrumbs


Once you have checked the current setup for errors, the next step is to identify any additional opportunities. The best place to start is simply by categorizing all of your site’s content.

Step 2: Audit site content

Hopefully, your site has an intuitive structure of subfolders. This can make your job a bit easier. But if not, then you may have to take a slightly more manual approach.

What we’re looking for at this stage is simply the broad ‘types’ of content found on your site.

For example, a typical e-commerce/retail site might include:

  • homepage

  • category pages

  • product pages

  • editorial content (such as a blog)

  • transaction pages

  • product videos

  • user reviews/testimonials

  • store locations/contact info

  • business/company information

  • a software application download

  • tutorials/how-to content

  • etc.

Once you identify what the different types of content are, the next step is to figure out which of these can be marked up.
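If your site does have a sensible subfolder structure, a first pass at this categorization can be automated. The Python sketch below groups a list of URLs by their first path segment and counts them. The URLs are invented, and in practice you would feed in a crawl export or sitemap instead. It assumes the first subfolder is a reasonable proxy for content type, which will not hold for every site.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_section(url: str) -> str:
    """Return the first path segment of a URL, or '(homepage)' if none."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    return segments[0] if segments else "(homepage)"


# Hypothetical URL list; replace with your own crawl or sitemap export.
urls = [
    "https://www.example.com/",
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/products/red-widget",
    "https://www.example.com/blog/how-to-choose-a-widget",
    "https://www.example.com/stores/london",
]

counts = Counter(top_level_section(url) for url in urls)
for section, count in counts.most_common():
    print(f"{section}: {count} page(s)")
```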

Step 3: Map content to Schema types

This process basically looks a little like a flowchart and a little bit like card-sorting. Now that you’ve identified the top-level types of content you want to mark up, review the documentation that lists all the currently supported Schema.org types. The goal is to find the most specific type that accurately applies to each type and/or piece of content you’ve identified on your site.

Start with the core vocabulary, to narrow down any which are not applicable. Then, once you’ve identified the top-level type you plan to use, you can view the ‘core plus all extensions’ list to check that you’re not missing any more specific options.

The top-level areas that you’ll be looking at are all classed as ‘Thing’ (to differentiate them from the handful of markup types related to ‘Data Type’), and they fall under the following topics:

  • Action

  • Creative Work

  • Event

  • Intangible - This one is a little hard to define. The most easily categorized items in this category are pieces of factual information (such as ‘OpeningHoursSpecification’, ‘GeoCoordinates’, and ‘Flight’), or types of categorization (such as ‘Audience’ and ‘Brand’).

  • Medical Entity

  • Organization

  • Person

  • Place

  • Product

Sort your different content categories into these topic areas, and then start to drill down. To help figure out where to look, you can use the following questions:

Can the user complete an action on this page?

  • if so, look in the ‘Action’ list.

Is this page about a ‘thing’?

If yes, what kind of thing?

  • Creative work (e.g. editorial content, music, video, images, recipes, reviews, etc.) - the schemas for these are likely to be found in the ‘CreativeWork’ list.

  • A piece of factual information (e.g. ‘flight details’ or ‘opening hours’) - information ‘things’ are often classed under the ‘Intangible’ list.

  • An event - use the ‘Event’ list for more specific types of event.

  • A place - Use the ‘Place’ list for specific types of place. LocalBusiness is a ‘place’ type. For specific location data, you may want to use ‘GeoCoordinates’ under the ‘Intangible’ list (or try the microformats.org hCard markup) rather than a Schema.org ‘Place’ type.

  • A person - use ‘Person’ type.

  • A product - This one is fairly self-explanatory, but it is worth noting that ‘Vehicle’ is a specific subtype of ‘Product’, so car dealers etc. should use the specific ‘Vehicle’ markup for their product listings.

These broad areas should cover the majority of content on your site. However, if you’re not sure whether there is a more specific type for what you want to mark up, I’d suggest a brief Google search for that specific type, as that can be quicker than reading through every single type in the list.
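It can help to record the outcome of this sorting exercise as a simple lookup table. The Python sketch below pairs some of the content categories from Step 2 with a top-level Schema.org branch and one or more candidate types. The pairings are my own illustrative starting points rather than a definitive mapping, so check each against the Schema.org documentation for a more specific option.

```python
# Content category -> (top-level branch, candidate Schema.org types).
# Illustrative starting points only, not a definitive mapping.
CANDIDATES = {
    "homepage": ("CreativeWork", ["WebSite"]),
    "product pages": ("Product", ["Product"]),
    "editorial content": ("CreativeWork", ["Article", "BlogPosting"]),
    "product videos": ("CreativeWork", ["VideoObject"]),
    "user reviews/testimonials": ("CreativeWork", ["Review"]),
    "store locations/contact info": ("Place", ["LocalBusiness"]),
    "business/company information": ("Organization", ["Organization"]),
    "software application download": ("CreativeWork", ["SoftwareApplication"]),
    "tutorials/how-to content": ("CreativeWork", ["HowTo"]),
}


def map_content(categories):
    """Split categories into those with candidate types and those without."""
    mapped, unmapped = {}, []
    for category in categories:
        if category in CANDIDATES:
            mapped[category] = CANDIDATES[category]
        else:
            unmapped.append(category)
    return mapped, unmapped


site_categories = ["product pages", "editorial content", "transaction pages"]
mapped, unmapped = map_content(site_categories)
for category, (branch, types) in mapped.items():
    print(f"{category}: {branch} > {', '.join(types)}")
print("No obvious type, check the full list:", unmapped)
```

Anything that lands in the unmapped list is a prompt to go back to the questions above, or to accept that the content does not need marking up.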

Once you’ve determined what content you want to mark up, you’re ready to implement.

For a quick way to generate the more common markup types, you can use Google’s Structured Data Markup Helper tool. This allows you to highlight the relevant content on a given page, and the tool will generate the corresponding markup, which can then be added to the page.

How do I know if the implementation is correct?

Once you’ve added the markup to your pages, you should test that it is set up correctly using the Structured Data Testing Tool provided in Google Search Console. You should also continue to monitor the Structured Data Report in Search Console, in case any errors are flagged at a later date.
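If your markup is JSON-LD, a small sanity check before you reach for the testing tool can catch the most basic slips. The Python sketch below checks that a block references Schema.org in its "@context", declares an "@type", and contains whichever properties you tell it to expect. The function name check_json_ld and the expected property list are assumptions for illustration. It does not know which properties Google requires for any given rich snippet, so treat it as a pre-check rather than a replacement for the testing tool.

```python
def check_json_ld(block: dict, expected_props=()) -> list:
    """Return a list of human-readable problems found in a JSON-LD block."""
    problems = []
    if "schema.org" not in str(block.get("@context", "")):
        problems.append("@context does not reference schema.org")
    if not block.get("@type"):
        problems.append("missing @type")
    for prop in expected_props:
        if not block.get(prop):
            problems.append(f"missing or empty property: {prop}")
    return problems


# Hypothetical product markup with the price information left out.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Blue Widget",
}

# The expected properties here are an example checklist, not Google's rules.
for problem in check_json_ld(product, expected_props=["name", "offers"]):
    print("Problem:", problem)
```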

Closing thoughts

I hope this guide to a structured data audit was useful. If you have any comments or questions, or if there's anything you think worth adding to a structured data audit, get in touch in the comments section.

About the author
Bridget Randolph


Bridget joined Distilled in November 2012. An American born and bred (originally from a small Virginia town), Bridget came to the UK as a grad student in 2010, and didn't want to leave! In September 2012, she completed an M.Sc. in social...