In the past few years innovations on the web have made it incredibly easy for regular people like you and I to enter the world of coding. For example, right at the end of 2010 I started dabbling with Google Appengine and shipped a fully functional interactive site in 4 weeks. (Read how I built 7books in 4 weeks)
So as SEOs we have a huge arsenal of tools that we can call upon for various different functions. A lot of these tools, services and platforms however either require learning a large amount of code or take a long time to build something bespoke. So in this post I’m going to talk about using Google Spreadsheets to build small, agile tools which can be built to match your exact needs.
Agile vs Scaleable
Before I dive into the technical details, a quick word on what I use Google Docs for. In my SEO-ninja toolset Google Docs are used for quick, agile tools. That means that if there’s a specific problem I need to overcome or some weird thing I’m testing I always turn to Google Docs first. That’s because I can build things quickly. They aren’t always robust, but if I’m only building a tool to solve a unique problem (as opposed to a problem I encounter all the time) then speed is of the essence. I don’t want to have to spend a lot of time building a tool I’m only going to use once. Or running a test that turns out to not give me the expected results. If you want to build scaleable tools then I suggest you leave it to the pros (though Appengine is a great place to start with building “real” tools).
Let’s start with the complete beginner
Ok, so you might be scared. I’m going to talk about writing functions and building tools. You’re going to get your hands dirty. But literally anyone can do this. You need no prior knowledge. None. This post should take the complete beginner to ninja in 5 easy steps. The steps I’m going to cover:
- Simple Web Scraping
- Advanced Web Scraping
- Google Docs Scripts – The Secret Sauce
- Script Triggers
- Putting It All Together
Lesson 1 – Simple Web Scraping
Ok, so the bedrock of using Google Spreadsheets for fun and profit is their nifty little function called ImportXML. Richard Baxter wrote a great intro post on ImportXML which you can check out. The basic premise is that using a function like this:
(Note: this importxml function has been modified from the original to handle the move from distilled.co.uk to distilled.net)
Gets us a list of blog post titles into a Google Spreadsheet like this:
Try it for yourself! Copy and paste that code into a blank Google Spreadsheet and see what happens
Don’t get scared! There’s lots of things you probably don’t understand so let’s walk through them for you.
A standard function looks like this =importxml(“url”, “query”). So the URL can be explicit (like I typed above) or a reference file like this =importxml(A1, “query”) just like you would with a regular spreadsheet function. The query is an XPATH query. For a tutorial reference on XPATH here’s a good guide.
If you can’t be bothered reading that then here’s a few quick definitions (warning! hand-wavey!)
- // – this means select all elements of the type
- //h3 – this means select all h3 elements
- [@class=''] – this means only select those elements that meet the criteria given
- //h3[@class='storytitle'] – this means only select elements that look like: <h3 class=”storytitle”>Title</h3>
Walkthrough Example for Simple Web Scraping
So, now we’re getting to grips with the code let’s step through a practical example. A common SEO task is “how can I find as many blogs on niche X as possible”. So I google around and find a list of 100 blogs like this: http://www.onlineschools.org/2009/10/27/100-best-book-blogs-for-kids-tweens-and-teens/. It’s manual and time consuming having to click on each one to copy the link. I want to get the list of URLs into a spreadsheet as quick as possible.
1) First we take a look at the source code of the page and we see something like this:
2) We load up a Google Docs and fire up the importxml function. We can see that all the blogs are a separate li element so let’s try something like
=importxml(A1,"//li")(where A1 is the cell with the URL of the page). We get this back:
3) As you can see, it contains the blog names so we’re getting there. But our query is also getting a whole load of other stuff we don’t want. So let’s look in the code and see if we can isolate the list of blog items. I find the “inspect element” control in Google Chrome excellent for visualising this. As you hover over the code, it highlights the section of the page that applies to it.
4) We refine our guess to limit ourselves to the div with class ‘intro-box-wide’ using a query like
=importxml(A1,"//div[@class='intro-box-wide']//li")Which loosely translated says “fetch the div with that class and then select all li elements within it” which results in:
5) We’re nearly there! We now have a list of all the blog elements. The next step is to pull the URL. So we modify our function to be:
=importxml(A1,"//div[@class='intro-box-wide']//li/a/@href")Which says, from the li elements select the href contents from the a element. This /a/@href is a very common thing to tag on the end of importxml functions so I suggest you memorise it. This results in:
And we’re done! If you want to look at the spreadsheet within Google Docs go here and make a copy then you can play around to your heart’s content
Lesson 2 – More Advanced Web Scraping
Ok, now we have the basics down let’s move on to some more fun activities. Of course, as soon as I get computers involved my thoughts turn to rank checking… This is a common task that we might want to do so let’s quickly discuss how to do that. Firstly we construct the search URL like this:
Where the query to search is in cell A2. Then we parse the Google URL using importxml like this:
I’m not going to break that down, hopefully you can figure out what I’m getting off the page. Again, check the source code for the page if you’re not sure what to write in your importxml function. Output like this:
As before, Grab your own copy here.
Of course, since this is a common function we might want to roll it into a single cell like this:
I was going to add some explanation here as to what this formula does but actually it gets pretty complicated. Either you already know what an arrayforumla does (in which case it should be straightforward) or you don’t. In which case you probably just want to copy and paste for now
I should note at this stage that there is a limit for 50 importxml calls per spreadsheet which limits us from building a full web crawler but for most agile tools this is sufficient (especially when combined with scripts, see lesson 3).
Walkthrough Example for Complex Scraping
Cool, so now we have a bunch of moving parts let’s start combining things. Here’s a spreadsheet where you enter a keyword, 4 location variants and a domain and it gives you the ranking position for all the keywords that search suggest spits out for that domain. Nifty.
As always, have a poke around with the Google Doc to understand what’s happening.
Lesson 3 – Google Docs Scripts – The Secret Sauce
Now, all this is very well – we have functions which pull in data but it’s all a little “flat” if you know what I mean? Let’s try and jazz things up a little by making it MOVE. For anyone familiar with macros in excel, scripts function in a very similar way. Two big advantages here however are the ability to crawl URLs and also the ability to email you. Nice.
Google Scripts are very powerful and essentially allow you to build fully featured programs so I’m not going to go into massive detail here. There are great tutorials from Google already for example:
- Your First Script – a walkthrough of how to use the service
- Sending emails from a spreadsheet – does what it says on the tin
You can easily lose days of your life browsing through and playing with all the things that Google Scripts do. Here, I’m going to present a simple example to show you how agile this is. That’s the key here, building tools that fit your exact needs quickly and easily. Let’s imagine I want to quickly check a bunch of URLs for their tweet count to produce something like this:
What’s happening here is that I’m inputting a keyword, then pulling the top 10 results for that keyword from Google and then checking each of those URLs for the number of tweets they have (imagine perhaps you’re gathering data like for this post by SharkSEO). Let’s step through the thing you need to do:
Walkthrough Example for Google Scripts
The code I’m using in column C is this:
=tweetCount(B2)Where tweetCount is a predefined function that I’ve written in Google Scripts. Let’s fire up Google Scripts and take a look at it:
(Note that you’ll need to grab your own free Backtype API Key to make this script work)
Once you’ve read through the Google Scripts tutorials above you should be fairly comfortable with how this works so I’m not going to step through it in detail. The parsing XML tutorial will likely come in handy.
Lesson 4 – Google Scripts Triggers
Ok, now for the magic. Google scripts are nice, but the real power comes from triggering these scripts in different situations. You can cause a script to trigger on any of the following:
- The spreadsheet is opened
- A form is submitted
- A button is pressed
- A specific time happens
The most useful here is the time-based trigger I think. Let’s take a quick look at writing a time-based script.
Walkthrough example of time-based trigger
Let’s again take a simple example. As I’m writing this post I know that I’m going to put it live soon, so let’s build a spreadsheet to check the keyword “seo tools” and see if perhaps QDF will push this post onto the first page at any point. How viciously referential
Step 1 – we write a simple spreadsheet to rank check distilled.co.uk against a particular keyword Step 2 – we write a script that tracks the rank and logs it in a new cell:
Step 3 – we create a time-based trigger to run the script every 30mins:
A few things to note:
- I’ve used
=int(now())in the URL to generate a unique URL each time. Otherwise Google caches the data and you won’t get fresh data each time
- Note the getRange and setValue functions – these are very useful to get your head around. See this tutorial.
The final result (you might have to scroll down for a while depending how long after I wrote this post you’re reading this!):
Grab a copy of the spreadsheet here to see how it works.
Lesson 5 – putting it all together
So, finally let’s put it all together in a fun example. I’ve created a form here where you can enter your city and your email address and my script will fetch some data and email it to you. Just like magic! Go ahead, try it out
Taking it further
The sky really is the limit when it comes to Google Scripts but I think that if you start doing any more heavy lifting than what I’ve done in this post you almost certainly want to start building in exception handling and learning to code properly (which I, to be clear, have very much not done!). That said, if you do fancy using Google Scripts there are all kinds of funky things it can do:
- Build a GUI
- Use full oAuth to build a twitter app
- Create & edit new spreadsheets on the fly
- Publish & edit Google sites
- Converting spreadsheets into JSON
- Using Google Docs as a database
Tom Critchlow Tom Critchlow is VP Operations for the NYC office, living in Brooklyn and working in Manhattan. Fiercely curious about most things and passionate about everything.