Lesson 1: Basic Tasks
In this lesson, we’ll cover some of the simpler formulas and functions available in Excel and Sheets, and how they’re used in the SEO’s day-to-day tasks. The functions we’ll cover:
- CONCATENATE/CONCAT and TEXTJOIN
- Text to Columns
- Remove Duplicates
Microsoft Excel definition: Joins several text strings into one text string.
Note that for versions of Excel from 2016 onwards, CONCATENATE has been replaced by CONCAT. Both still work for now, but Microsoft have warned that CONCATENATE may be deprecated in the future.
Syntax: CONCAT(text1,text2,…) or text1&text2&...
Concatenate is a pretty self-explanatory function, but that doesn’t make it any less useful. It is most often used to combine two cells into one:
You may also use the formula to insert text strings before, after, or between other cells. Insert a text string by putting it within quotations:
Another way to use CONCAT is without CONCAT at all! You can insert ampersands (&) between the different pieces of text or cell references in your formula, and dispense of the CONCAT command altogether.
ii) Text to Columns
Microsoft Excel definition: Distribute the contents of one cell across separate columns
Although it’s not technically a formula, Text to Columns is one of the most useful things that Excel can do for an SEO. The main use case for it is to split URLs.
Note that it’s normally best to copy the data that you’re Text-to-Columns-ing into a fresh column to the right of your table, otherwise you run the risk of overwriting your data.
For our real-world SEO example, let’s take a Screaming Frog crawl with a list of URLs.
COUNTIF and COUNTIFS
Microsoft Excel Definition: Counts the number of cells within a range that meet the given criteria.
COUNTIFS(range, criteria, [range2, criteria2] …)
COUNTIF is your go-to function for getting a count of the number of instances of a particular string, or more generally the number of times that given criteria about the contents of a cell are met. For instance, with this export from SEMrush’s organic research report for distilled.net, we can count the following things:
The number of keywords for which the position is 1 (or any given number).
The number of keywords for which the homepage (or any given page) ranks.
The number of keywords which contain a given string (e.g. “SEO”) including the use of wildcards
The asterisks either side of SEO are wildcards which indicate that any character or characters can apply.
Counting based on mathematical statements. (In this case keywords for which the ranking is 10 or less).
Countifs is effectively an extension of countif, which allows you to specify multiple criteria over which to count, such as the following examples. Note that all of the multiple criteria must be met in order for them to be counted - it acts as an AND statement, not OR.
The number of keywords for which the position is 1 (or any given number) AND the homepage is ranking.
=COUNTIFS(B:B,1, F:F, "https://www.distilled.net/")
Microsoft Excel Definition: Returns a value that you specify if a formula evaluates to an error; otherwise, it returns the result of the formula. Use IFERROR to trap and handle errors in a formula.
IFERROR is really simple and will become an important piece of most of our formulas as things get more complex. IFERROR is your method to turn those pesky #N/A, #VALUE or #DIV/0 messages into something more presentable.
Simple example of IFERROR
This function is used in Excel to take a set of data, and remove any entries that are duplicates of each other. An example of where this is used is to turn a list of backlink URLs into just a list of the domains they are on.
Backlink sources vary in whether they give both the URL and domain of the backlinks their crawlers discover, but if they don’t, this can be extracted using the text manipulation functions discussed in Lesson 2. Be careful with removing duplicates - this deletes data from duplicate rows, so it’s normally a good idea to copy your data into a new worksheet before doing so.
You can choose to remove rows based on duplication in as many rows of the table as you like - if you select multiple columns, only rows that are duplicate in all of the columns will be deleted.
Lesson 2: Text Manipulation Functions
The functions on which we’ll be focusing in this lesson are useful for dealing with text manipulation. As we’ll see from the examples, there are quite a few scenarios wherein the SEO has to manipulate a text string. Some of the formulas we’ll talk about are pretty simple to grasp individually, but can get a bit confusing when used together. We’ll touch on:
- LEFT, RIGHT, MID
Microsoft Excel Definition: Returns the number of characters in a text string.
Len simply gives the number of characters (including spaces) in a string. The string input can be from a cell, within quotes in the formula, or the result of another formula.
Len isn’t particularly interesting on its own, but it can be very handy when combined with other formulas. In this example, it can be used to find the length of title tags to ensure that they are not too long.
Microsoft Excel Definition:
SEARCH — Returns the number of the character at which a specific character or text string is first found, reading left to right (not case-sensitive).
FIND — Returns the starting position of one text string within another text string. FIND is case-sensitive.
Syntax: SEARCH(find_text,within_text,start_num) and FIND(find_text,within_text,start_num)
There are two differences between SEARCH and FIND:
SEARCH is not case-sensitive, FIND is.
SEARCH allows the use of wildcards, FIND does not.
Under most circumstances, SEARCH is all you need, but it helps to know that FIND is always there if you’ve got to deal with pesky capital letters in URLs or something similar.
Another reason to choose FIND is if you’re dealing with URLs that contain parameters. Without properly escaping question marks, they will act as wildcards, which may cause some frustration.
Example of SEARCH to find if and where pipes are appearing in title tags:
Note that when the searched string can’t be found within the target, a #VALUE error is returned. We can use IFERROR to clean this up, but first we should introduce the concept of nested formulas.
Note in the second example above: This is the first time we’ve used a nested formula. We have these when a function is placed within another function, which can be placed in another function, and another, and so on. Whether you’re reviewing your own formulas for errors, or looking at someone else’s work, you should start with the middle of a nested formula and work your way out.
Here’s the previous example, with IFERROR nested around SEARCH to clean up the results:
LEFT, RIGHT, MID
Microsoft Excel Definition:
LEFT — Returns the specific number of characters from the start of a text string.
RIGHT — Returns the specific number of characters from the end of a text string.
MID — Returns the characters from the middle of a text string, given a starting position and length.
Both LEFT and RIGHT return the characters from a given position in a text string starting from either side of a string. MID is great for extracting a portion of a text string. I’ve lumped the three together because they are often used in conjunction with each other (along with a few of the earlier functions). Let’s dive into an example:
Let’s say we’ve been given a list of URLs, and we want to extract just the domain.
This formula will do the job. Let’s break down this nested formula, and see how it pulls just the domain out of our URL. Starting from the middle we see SEARCH, which uses the syntax:
This formula finds the first instance of “/” in the cell to the left, starting at the 9th character from the beginning, which is done to start past the double slash in http:// or https://. As we see below (by using the F9 trick), the result for the first row of data is 22.
Now we are left with a simple LEFT formula. The syntax for LEFT is LEFT(text,num_chars).
In plain terms: “Give us the first 22 characters starting from the beginning.” We now have a nice listing of just root domains, with protocols.
Let’s use SEARCH (with wildcards) and MID together to extract a portion of a URL:
Extracting a portion of a URL with search and mid.
Let’s assume we want to pull the descriptive piece out of each of these URLs for reporting purposes.
We’ll definitely be making use of MID, as the text we want is in the MIDdle of our string. We’ll need to determine how many characters make up the “-tXXX.html” bit at the end of each URL. Since the length of this portion of the URL varies, but the format doesn’t (that is, “-t” + “numbers” + “.html”), we can use wildcards to find this character count.
Again, the syntaxes for these 2 functions:
Let’s break down the formula for the first URL in our list.
Cell A2: http://www.example.com/lamp-maintenance-t83.html
We’ve calculated the first instance of a “/” after the 8th character. This gives us our start_num values. We’re also using the * wildcard to help us get the character count of the right-most chunk of text.
We can easily calculate the number of characters for our MID once we know where our non-descriptive characters begin.
Let’s make a small adjustment to our original URL to demonstrate how we can use LEN in this formula.
Cell A2: http://www.example.com/t1521-lamp-maintenance.html
The additional +1 and -5 are necessary to make minor adjustments to the final outcome. Without them, our final result would have been “-lamp-maintenance.html”.
Both Excel and Google Sheets have the ability to find and replace strings in spreadsheets using the interface. SUBSTITUTE is a formula that allows you to do this within a formula. This has two advantages - it allows you to retain the unamended version (unlike find and replace), and it can be combined with other formulas.
A basic version of this is to take a set of URLs and replace the TLD. This can be useful when finding a list of URLs for checking redirects, or hreflang markup.
A neat trick involving SUBSTITUTE allows you to count how many times a given character appears in a cell. This is not a default formula, so it must be done using a combination of SUBSTITUTE and LEN.
For example, you can use this to find the number of words in a sentence (e.g. a search query or a title tag). The way this works is to count the number of characters in the sentence as a whole, and subtract the number of characters in the sentence with all of the spaces removed (or replaced by empty strings). This formula would look like the following:
=LEN(A1) - LEN(SUBSTITUTE(A1, “ “, “”)) + 1
The plus one is in order to count the number of words, not just the number of spaces.
Here is that formula in action:
This combination of LEN and SUBSTITUTE can also be used to count the number of slashes in a URL for example.