Quick Tips to Format Data to Make your Data Sets Flexible

If you want to analyze or manipulate data, you have to save it in a format that you can easily retrieve. So I want to go over a few best practices for saving data so that it can effectively be used in the future.

Don’t include multiple pieces of data in the same cell.

There are many reasons you might do this, like having multiple keywords that are relevant to a particular page. But it will make the data super difficult to sort, compare and understand in any other way besides by a human.

Do create a new sheet or table (if you’re talking Access) for each type of data.

When dealing with data where multiple data fields have more than one relationship to another field (ie: each URL and each Keyword have multiple relationships to the other), the ideal way to save this information is actually in three different tables--one for each field, and another to define relationships.

data tables and relationship table.

So your first table will list only URLs and an ID:

And your second table will just have keywords:

A third table will define the relationships between each of these fields of data, using each of their ID numbers as the identifier:

This allows you to indicate anything about each unique relationship, such as whether it’s primary (1 for yes, 0 for no).

Then, each of the URL and Keywords tables can have additional information where there is information unique to each row:

From here, we can use QUERY & VLOOKUP to determine the likelihood of ranking for a particular URL:

In this sheet, when you change the URL, it will automatically update the PA (Page Authority), as well as what the PAs of top ranking pages are. You can see the formulas in my examples spreadsheet.

While there are other tools for this specific use case, saving your data in this way makes it much more flexible.  

Don’t skip cells to indicate that something is “continued,” or use the same field to indicate different data that is somehow related (ie: putting a Page Authority in a Domain Authority column).

This is bad practice because when you sort things, if you haven’t indicated in each row what the field value should be, you’ll lose it.

Adding in information about the data (like that it is a PA instead of a DA) will make the data un-sortable to begin with.

Do keep your data standard.

Saving information in this way will allow you to sort and/or create pivot tables. There should never be non-dates in date columns, or string (text) in number columns. Keep things standard and treat each column as its own clean little set of data. If you must, add an additional column with notes.

The other thing to keep in mind around standardization is regarding URLs.

You may have a list of URLs with all different formats - some with http, some with https, some with www, and some without any of that. But to leverage data well, you’ve got to have them all standard. This way, when you use something like VLOOKUP, you’ll know which version to use.

And if you want to better understand how to use some of these functions, I’d highly recommend our guide to Excel for SEOs. If you’d specifically like to dig into the QUERY function (only in Google Spreadsheets), I’d highly recommend it, and this intro is a good one, or you can go to Google’s own explanation of the language.

Happy sorting and querying!

Get blog posts via email