How To Build Agile SEO Tools Using Google Spreadsheets

In the past few years innovations on the web have made it incredibly easy for regular people like you and me to enter the world of coding. For example, right at the end of 2010 I started dabbling with Google App Engine and shipped a fully functional interactive site in 4 weeks. (Read how I built 7books in 4 weeks)

Of course, advances in technology have also made it easier for the pros to build awesome applications. Just look at SEOmoz’s Open Site Explorer which relies on Amazon Web Services.

So as SEOs we have a huge arsenal of tools that we can call upon for a variety of functions. However, a lot of these tools, services and platforms either require learning a large amount of code or take a long time to build something bespoke. So in this post I’m going to talk about using Google Spreadsheets to build small, agile tools that match your exact needs.

Agile vs Scalable

Before I dive into the technical details, a quick word on what I use Google Docs for. In my SEO-ninja toolset, Google Docs is used for quick, agile tools. That means that if there’s a specific problem I need to overcome or some weird thing I’m testing, I always turn to Google Docs first, because I can build things quickly. They aren’t always robust, but if I’m only building a tool to solve a unique problem (as opposed to a problem I encounter all the time) then speed is of the essence. I don’t want to spend a lot of time building a tool I’m only going to use once, or running a test that turns out not to give me the expected results. If you want to build scalable tools then I suggest you leave it to the pros (though App Engine is a great place to start with building “real” tools).

Let’s start with the complete beginner

Ok, so you might be scared. I’m going to talk about writing functions and building tools. You’re going to get your hands dirty. But literally anyone can do this. You need no prior knowledge. None. This post should take the complete beginner to ninja in 5 easy steps. The steps I’m going to cover:
  1. Simple Web Scraping
  2. Advanced Web Scraping
  3. Google Docs Scripts - The Secret Sauce
  4. Script Triggers
  5. Putting It All Together

Lesson 1 - Simple Web Scraping

Ok, so the bedrock of using Google Spreadsheets for fun and profit is their nifty little function called ImportXML. Richard Baxter wrote a great intro post on ImportXML which you can check out. The basic premise is that using a function like this:
(Note: this ImportXML function has been modified from the original to handle the move to our new domain.)

Gets us a list of blog post titles into a Google Spreadsheet like this:
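The formula itself was embedded as an image in the original post. Based on the class name discussed further down, it would have looked something like this (the blog URL here is an assumption):

```
=ImportXML("http://www.distilled.net/blog/", "//h3[@class='storytitle']")
```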

Try it for yourself! Copy and paste that code into a blank Google Spreadsheet and see what happens :)

Don’t get scared! There are lots of things you probably don’t understand, so let’s walk through them.

A standard function looks like this: =importxml("url", "query"). The URL can be explicit (like I typed above) or a cell reference like this: =importxml(A1, "query"), just like a regular spreadsheet function. The query is an XPath query. For an XPath tutorial, here’s a good guide.

If you can’t be bothered reading that then here are a few quick definitions (warning! hand-wavey!):

  • // - this means select all elements of the type
  • //h3 - this means select all h3 elements
  • [@class=''] - this means only select those elements that meet the criteria given
  • //h3[@class='storytitle'] - this means only select elements that look like: <h3 class="storytitle">Title</h3>

Walkthrough Example for Simple Web Scraping

So, now we’re getting to grips with the code, let’s step through a practical example. A common SEO task is “how can I find as many blogs on niche X as possible?”. So I Google around and find a list of the top 25 blogs on Technorati. It’s manual and time-consuming to click on each one and copy the link; I want to get the list of URLs into a spreadsheet as quickly as possible.

1) First we take a look at the source code of the page and we see something like this:


2) We load up Google Docs and fire up the ImportXML function. We can see that all the blogs are in h3 elements within a list with class of "even", so let’s try something like
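The formula was an image in the original; the first, broad attempt probably looked something like this (whether the list is an ol or a ul, and the exact class placement, are assumptions):

```
=ImportXML(A1, "//ol[@class='even']//h3")
```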

(where A1 is the cell with the URL of the page). We get this back:

Check out the sheet here.

3) As you can see, it contains the blog names so we’re getting there. But our query is also getting a whole load of other stuff we don’t want. So let’s look in the code and see if we can isolate the list of blog items. I find the “inspect element” control in Google Chrome excellent for visualising this. As you hover over the code, it highlights the section of the page that applies to it.


4) We refine our guess to limit ourselves to the a tag within the h3 using a query like
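The refined query was also shown as an image; a sketch of what it might have been (the selector details are assumptions):

```
=ImportXML(A1, "//ol[@class='even']//h3/a")
```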

Which, loosely translated, says "fetch the anchor text within h3s that appear within the list with class='even'", which results in:

Get the sheet here.

5) We’re nearly there! We now have a list of all the blog elements. The next step is to pull the URL. We still want the blog names, but we also want the links, so we add in another importxml call:
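The second call was an image too; based on the description that follows, it would have been something like this (again, the exact selectors are assumptions):

```
=ImportXML(A1, "//ol[@class='even']/li/a/@href")
```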

Which says: from the li elements, select the href contents from the a element. This /a/@href is a very common thing to tack on the end of ImportXML functions, so I suggest you memorise it. This results in:

And we’re done! If you want to look at the spreadsheet within Google Docs go here and make a copy then you can play around to your heart’s content :)

Lesson 2 - More Advanced Web Scraping

Ok, now we have the basics down let’s move on to some more fun activities. Of course, as soon as I get computers involved my thoughts turn to rank checking... This is a common task, so let’s quickly discuss how to do it. First we construct the search URL like this:
Where the query to search is in cell A2. Then we parse the Google URL using importxml like this:
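Both formulas were images in the original. Sketches of what they might have been (the Google URL parameters, the cell B2 holding the constructed URL, and the //h3[@class='r'] selector for Google's result links are all assumptions based on Google's markup at the time):

```
=CONCATENATE("http://www.google.com/search?q=", A2, "&num=50&pws=0")
=ImportXML(B2, "//h3[@class='r']/a/@href")
```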
I’m not going to break that down; hopefully you can figure out what I’m getting off the page. Again, check the source code of the page if you’re not sure what to write in your ImportXML function. The output looks like this:

As before, grab your own copy here.

You’ll notice that the results returned are less than pretty; that’s just how Google structures their HTML. Each result comes back wrapped in Google’s /url?q= redirect, which we need to strip off to get the clean URL. There are probably 10 different ways of doing this, but here’s what I’m using:

Using this formula, I can extract exactly what I need from all the returned results; hopefully you can pick out what I’m doing!
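For reference, the cleanup the formula performs can be sketched in plain JavaScript (the exact tracking parameters Google appends after the URL, like &sa=, are assumptions):

```javascript
// Strip Google's "/url?q=" redirect wrapper and any trailing
// tracking parameters to recover the clean destination URL.
function cleanGoogleUrl(raw) {
  var url = raw.replace(/^\/url\?q=/, "");
  var amp = url.indexOf("&");
  return amp === -1 ? url : url.substring(0, amp);
}
```

So a result like "/url?q=http://example.com/&sa=U" comes out as "http://example.com/".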

Lastly, we want to find out where our site ranks for "seo agencies london", so we’ll use this formula:

=ArrayFormula(MATCH(1, FIND("", D2:D51), 0))
I was going to add some explanation here as to what this formula does, but actually it gets pretty complicated. Either you already know what an ArrayFormula does (in which case it should be straightforward) or you don’t, in which case you probably just want to copy and paste for now :)
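If it helps, here is what that formula is doing, sketched in JavaScript rather than spreadsheet functions (the site string is a placeholder):

```javascript
// Equivalent of =ArrayFormula(MATCH(1, FIND(site, D2:D51), 0)):
// MATCH looks for the first cell where FIND returns 1, i.e. the
// first URL that starts with the site string.
function rankOf(site, urls) {
  for (var i = 0; i < urls.length; i++) {
    if (urls[i].indexOf(site) === 0) {
      return i + 1; // 1-based position, like MATCH
    }
  }
  return -1; // the spreadsheet shows #N/A instead
}
```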

I should note at this stage that there is a limit of 50 ImportXML calls per spreadsheet, which stops us building a full web crawler, but for most agile tools this is sufficient (especially when combined with scripts - see lesson 3).

Lesson 3 - Google Docs Scripts - The Secret Sauce

Now, all this is very well - we have functions which pull in data, but it’s all a little "flat", if you know what I mean? Let’s try and jazz things up a little by making it MOVE. For anyone familiar with macros in Excel, scripts function in a very similar way. Two big advantages here, however, are the ability to crawl URLs and the ability to email you. Nice.

Google Scripts are very powerful and essentially allow you to build fully featured programs, so I’m not going to go into massive detail here. There are great tutorials from Google already, for example:

You can easily lose days of your life browsing through and playing with all the things that Google Scripts do. Here, I’m going to present a simple example to show you how agile this is. That’s the key here, building tools that fit your exact needs quickly and easily. Let’s imagine I want to quickly check a bunch of URLs for their tweet count to produce something like this:

Check out the sheet here.

What’s happening here is that I have a list of URLs that I want to check tweet counts for. I’ve created my own function which takes one parameter: =twitter(URL) where URL is the reference to the cell with the link I want to check. Here’s the code:

function twitter(url) {
  var jsondata = UrlFetchApp.fetch("" + url);
  var object = Utilities.jsonParse(jsondata.getContentText());
  return object.count;
}
Once you’ve read through the Google Scripts tutorials above you should be fairly comfortable with how this works so I’m not going to step through it in detail. The parsing XML tutorial will likely come in handy.
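As a plain-JavaScript illustration of what the custom function does with the fetched response (the JSON shape, an object with a count field, is an assumption about the tweet-count API of the time):

```javascript
// The custom =twitter(url) function boils down to: fetch JSON,
// parse it, return the "count" field. Utilities.jsonParse in
// Apps Script behaves like JSON.parse here.
function parseTweetCount(jsonText) {
  var obj = JSON.parse(jsonText);
  return obj.count;
}
```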

Lesson 4 - Google Scripts Triggers

Ok, now for the magic. Google scripts are nice, but the real power comes from triggering these scripts in different situations. You can cause a script to trigger on any of the following:
  • The spreadsheet is opened
  • A form is submitted
  • A button is pressed
  • A specific time happens
Read more about script triggers here.

I think the most useful here is the time-based trigger. Let’s take a quick look at writing a time-based script.

Walkthrough example of time-based trigger

Let’s again take a simple example. As I’m writing this post I know that I’m going to put it live soon, so let’s build a spreadsheet to check the keyword "seo tools" and see if perhaps QDF will push this post onto the first page at any point. How viciously referential :)

Step 1 - we write a simple spreadsheet to rank check against a particular keyword

Step 2 - we write a script that tracks the rank and logs it in a new cell:
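The script from step 2 appeared only as a screenshot. A minimal sketch of what such a logger might look like in Google Apps Script (the cell layout is an assumption, and this only runs inside a spreadsheet’s script editor, so treat it as illustrative):

```javascript
// Run by a time-based trigger: read the current rank (assumed to be
// computed by the ImportXML formulas in cell B1) and append it,
// with a timestamp, to the first empty row of the sheet.
function logRank() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  var rank = sheet.getRange("B1").getValue();
  var row = sheet.getLastRow() + 1;
  sheet.getRange(row, 1).setValue(new Date()); // when we checked
  sheet.getRange(row, 2).setValue(rank);       // what we found
}
```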

Step 3 - we create a time-based trigger to run the script every 30mins:

A few things to note:

  • I’ve appended a random value to the URL to generate a unique URL each time. Otherwise Google caches the data and you won’t get fresh data each time
  • Note the getRange and setValue functions - these are very useful to get your head around. See this tutorial.
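The cache-busting trick from the first bullet can be sketched like this (the parameter name nocache is an arbitrary choice):

```javascript
// Append a random query parameter so every fetch uses a unique URL;
// otherwise Google Docs serves a cached copy instead of fresh data.
function cacheBustedUrl(url) {
  var sep = url.indexOf("?") === -1 ? "?" : "&";
  return url + sep + "nocache=" + Math.floor(Math.random() * 1e9);
}
```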
The final result (you might have to scroll down for a while depending on how long after I wrote this post you’re reading this!):

Grab a copy of the spreadsheet here to see how it works.

Lesson 5 - Putting It All Together

So, finally let’s put it all together in a fun example. I’ve created a form here where you can enter your city and your email address and my script will fetch some data and email it to you. Just like magic! Go ahead, try it out :)

Taking it further

The sky really is the limit when it comes to Google Scripts, but if you start doing any heavier lifting than what I’ve done in this post you almost certainly want to start building in exception handling and learning to code properly (which I, to be clear, have very much not done!). That said, if you do fancy using Google Scripts there are all kinds of funky things it can do. But for me, the real power is in hacking together things in a few minutes which gather the data I need so I can get back to getting stuff done. I’ll leave building real SEO tools to the pros for now :)



  1. My goodness sir. Thanks for this cracker. Going to take me a while to get through this properly but looks a treat for those of us who love to build our own adventure!

    • Really Sam, what a cracker it is. I only came to know about this after a year. For the last few years I was looking for a way to build this kind of SEO tool. Mr. Tom did a great job! Thanks to him for this cracker.

  2. Oh yes. There's a ton of good new skills in here! I suppose it wouldn't be too difficult to pull WHOIS data and capture contact emails or look for twitter addresses on each of the scraped web pages in Lesson 1 - :-)

  3. Yousaf Sekander

    Excellent post! Bet Google cannot penalize itself for scraping its own SERPs.

  4. Mal

    Great stuff, Tom, thanks - I'll be playing with some of this stuff over the next few weeks...

  5. Ari

    NOOOOO! You wrote all of my secret sauce in a blog post! I was planning on teasing with a few of these techniques but you've given it all away. Good man.
    I've been using the importXML function for a variety of SEO tools (and other scraping techniques) for the past 6 months or so.
    The scripting is new to me though - thanks for putting it all out here. I have a much uglier rank checking spreadsheet.. this will no doubt help with all of that.
    I've also used importXML to scrape titles, meta descriptions, H1s, etc. Obviously if you have money, SEO Spider works just as well, but Google Docs can provide a nice poor-man's SEO toolkit!

  6. Great post Tom, some really really useful stuff in here!!

  7. Wow, is this ever a timely post. I just launched my new site a few days ago and I've been trying to find a tool with exactly the functionality you've put together here. The complexity of implementing it all went over my head the first read-through, but I'm sure I'll get a handle on it on a second pass. Thanks a ton for this post!

  8. Richard/Tom, care to extrapolate more on how we could pull e-mails with this? I'm trying to figure out how that would work but I'm not sure if it's possible at scale, say with a huge e-mail list, because I'm not sure how it could pull all data to the left or right of an "@" sign for example.

  9. Ari

    Hey Tom - as promised here's a link to my Anchor Text Scraper tool. A very basic handy little google doc tool which will output all links on the page (Anchor Text and corresponding URLs)

    Feel free to hit File/Make a copy/ in order to save a version that you can actually use! :)

    • Thanks for sharing! Google Spreadsheets a poor man's Excel? Boy, was I wrong..

      Ari, awesome doc. However, with one of the sites I filled in, the anchor text and URL don't appear in the same row.

      URLs all neatly follow each other one row after another, but the anchor texts skip random numbers of rows. Although the order in which they appear is still good, this makes the whole thing a bit confusing.

      Any idea how this might have come to be?

  10. @Ari that's awesome, thanks for sharing!

  11. Great stuff Tom, but it will take some time to integrate.

    I am a little disappointed though that you can only import(xml) 50 cells in Google Spreadsheets. I guess the other alternative is using your own programs (python is the best?)


  12. I've been hearing about this for some time now, but I never actually got down to trying it out. Thought I'd give it a try now, and managed to get some pretty nifty results. Thanks for your help :)

    reply >
  13. This article is excellent, I'm sure it's going to take me a while to get my head round this.

  14. Pretty powerful stuff there, have been looking for an efficient way to automate rank checking!

  15. Hey all - for those looking to solve all your automated scraping and rank checking needs, be very careful with the limits here. They're pretty restrictive. If you're going to build anything robust, you're better off building a real application to do it. This technique is mainly useful for quick proof of concept or one-off data gathering needs

  16. Ok, so I played around a bit with the local keyword ranking sheet and it doesn't seem to give 100% accurate results. It will say for example position 29, but when I verify manually it is 21.

    Even in your example above for 'hotels in seattle', the site is around position 29 or thereabouts while your ranking shows 24.

    Now I guess this method is OK for approximating position of specific keywords, but +/- 5 is not so good when you are trying to micro manage high rankings.

    I will use this as a proof of concept for other things that can be achieved through these methods, but I think for keyword ranking checking I will stick to my application I was developing. Good article for learning about Google spreadsheets though!

  17. @malcolm - sure there are always discrepancies. These are more than likely GEO-issues. I see them 26th instead of 24. But this is a challenge regardless of the rank monitoring solution you use. Do you have a more robust solution? IMHO you always get fluctuation. Sometimes more than +/- 5 depending on KW.

    Bear in mind that Google will crawl the page from the US so unless you're on the Google IP you'll always see slightly different results.

  18. Jon

    Great post Tom! I tried doing the twitter count but get an error in the script. Is there another way to find the tweet count without Google Scripts?

  19. Hi Tom,

    Thanks very much for this post, I've done some playing around with importxml to get rank data but my spreadsheet was nowhere near as tidy as yours and I hadn't managed to incorporate any forms or scripts - now I've finally learnt how to make it all come together with a few simple formulas that I was missing and simple explanations of scripts and forms. Thank you for making it much easier to learn, play and build without so much confusion! :)

  20. des

    Excellent post, it will take me a week to pick through it, and internalize all the info you have given here :-)

    Good one

  21. Jason

    William -

    I love this post! This is incredibly handy. I have one small question though - when I try to get anchor text from Google search results, sometimes the anchor text is broken up over several cells, I am guessing because of nested HTML tags in the link.

    Is there any way to make sure the anchor text appears in one single cell when using import xml? I have been concatenating the results so far and was hoping for something simpler

  22. @Jason - unfortunately I don't know of a way to do this. I have the same problem. I'm sure there's a solution but I've not found what it is yet I'm afraid.

  23. Hey Tom, thanks for the outstanding post. I haven't paid much attention to Google Docs and seeing what you can with the various import features is awesome. Thanks for sharing this info.

  24. Red

    The #linklove conference today has blown this out of the water!
    Gotta think of something else that is bite-size and scalable without relying on the fact that GDocs is most prob gonna wipe it soon - due to the masses. RoR...python...hmmm alas the pipeline perpetuates...the beauty about GDocs is that it's so manageable for one person to scrape data in bite size chunks in a targeted way to "get shit done"

  25. astatic

    Quick Question - I'm having trouble transferring a hyperlink embedded into separate text from my Google Spreadsheets RSS feed into my web page. I can transfer either the text itself or the actual hyperlink URL itself, but not the hyperlink embedded into the text. I have it all set up through Google Spreadsheets, but when I call these items through SimplePie it doesn't want to transfer correctly.

    Each text entry is associated with its own URL, and when I integrate the feed into my webpage I would like to be able to have each entry clickable to be taken to that independent URL.

    I would very much appreciate any input, and I'd be happy to provide any details to anyone willing to help.


  26. Red

    Dunc, I thought you guys were 2 yrs ahead of the game on Google Docs tricks! ;)

  27. Hi guys,

    that's a very interesting article. However, I wonder if you have experienced some problems while using the importXML on a XML file within a schema specified into the header.

  28. Your Lesson 1 code returns an error....
    "error: The xPath query did not return any data."

    help please! thank you

  29. Wow that is a lot of (helpful) information to process all at once, but SOO valuable! I'm glad I found this site. Oh btw SEO Moz is how I found your page.


  30. Awesome, you guys are always taking it to the next level! Tom, how about using a feed like OPML or RSS in Excel to automatically update new URLs found in Google. Any thoughts?

  31. Hi, unfortunately I've the same problem as B.Moore: "Your Lesson 1 code returns an error…. “error: The xPath query did not return any data.” Until now - as a total beginner - I wasn't able to detect the problem. Other examples did work. I would really appreciate an information about what's wrong with the lesson 1 code. Thank you so much.

  32. I'm also getting “error: The xPath query did not return any data.” for the first lesson. Any suggestions?

  33. Tom Critchlow

    Hey, in response to everyone's comments about lesson one being broken - sorry it's taken me a while to fix, but I've gone and changed the code. It broke because we redesigned the Distilled site in the move to our new domain, and so my scraper stopped working.

    Please let me know if you have any more issues!

  34. Great stuff. I've used a pro web scraping tool called Djuggler that is good for complex retrieval, search & database insertion, but for one-off tasks the Google Docs solution you describe seems the way to go

  35. Hey Tom, Nice post here! Is there any way you could inform me on how to maybe manage links using this method. Something that checks daily whether or not the link was achieved from an external URL?

  36. Does anybody know if Google has blocked ImportXML from scraping their SERPs?
    I have tried several times over the last few days to get indexed titles and other things from SERPs, but it just returns that the URL isn't working. It works fine when getting things from other websites...

  37. Thanks for writing this awesome guide! I've learnt so much from reading it, will save loads of time with the automated search queries!


  38. AWESOME!!! Will need to re-read this when it is not a Monday morning in the office :D

  39. I'm going to do this as my next project, thanks. Do you want to see the results once completed?

  40. I don't get it - how come when I look at the spreadsheet link, it doesn't have as many columns as the example above?

  41. My God sir! i mean this is something really awesome~! Tons of Good you have with you sir! This is really awesome! em deffo going to try that...

  42. Tom,

    Great Post! I was wondering if you know of any resources on how to pull in API data? I wanted to pull in PR, and MozRank and other metrics like this. However, I have no idea where to start at to figure out how to pull in this kind of data.


  43. Thanks for the concepts! Very interesting. Very powerfull!

  44. Wow. Great tutorial. I never realised Google spreadsheets were that powerful. I'll give it a try. I mainly use Market Samuria for SEO but it would be nice to be able to customise tools for different purposes. Thanks for sharing this.

  45. Looks really interesting. Will definitely devote some time to studying web scraping / google docs. Interested to potentially apply this outside SEO in the field of trading (seems like could have interesting applications). Not a programmer myself but seems pretty easy.

    I watched your (great) video Tom on "How to Structure a Major Link Building Project". I noticed the links you put up at These are great resources and link bait in themselves. Since the domain is different to your main domain, how are you building link equity through these? Interested to see your tactics at work.

    Thanks for the amazing content!

  46. Google is giving a 404 for your first Google Help link to

  47. Tried it and it's very useful for me. I like it. I spend no more time on web scraping.

  48. mike

    I ran across a list of google searches to find things like forums that use my keyword
    for example:
    intext:"powered by vbulletin" + keyword
    intext:"powered by phpbb" + keyword
    What I want to do is come up with a spreadsheet with the formulas set so all I have to do is copy and paste the list of queries top 10 results of each search that I do (and get the title in one column and the url in another). Can that be done?

  49. Woweeeee - sooper dooper bang on timing and what a cracking post - thanks for this!

  50. Jon

    Great post. I'm still a newbie to SEO, but found it very easy to get about 10,000 URLs from DMOZ using the above advice.


  51. ai

    We're a group of volunteers and starting a brand new scheme in our community. Your site provided us with useful information to work on. You've done an impressive activity and our entire community will probably be thankful to you.

  52. Hi. Listened to this in a german conference last week and now I'm searching stuff to build my own sheets. Thanks for your detailed explanation. (Wow.)

  53. Andrea

    Went to grab a Backtype API key, but Backtype is now owned by Twitter. Is there another way to make the script work?

  54. Great stuff Tom! I'm new to the world of scraping, but your info really got me going.

  55. This is a great tool, I use to manually add each site from the location bar and create each spreadsheet for future use but always ran into a problem when the URL to a particular site changed but this tool is very useful as it pulls live data, Thanks Tom for the info.

  56. The blog is absolutely fantastic. Lots of great information and inspiration, both of which we all need. Thanks.

  57. Very powerful explanation, now I see what I can do with Google Docs.
    Thanks for sharing Tom

  58. Hi Tom,

    Exceptionally well explained.
    I have been using Google Docs for many years but I have not seen such awesome methods of using it like this.
    I hope this kind of work can bring some clarity to the project.



  59. I had a spreadsheet that was working just fine till yesterday, and now some cells in the same sheet are giving an error (The xPath query did not return any data.)

    I deleted the cells and re-inserted the formulae and all... but the result is still the error...??

    What could be the reason for it...??

    Please help... it's urgent....

  60. Just wanted to say cheers Tom.

    I had a daunting task for a personal affiliate site I'm building (hey, us SEOs have to earn some extra bucks somewhere eh?!) where I wanted to scrape a site for all their product names and prices for import.

    I spent several hours and had covered just a handful of products due to their clunky javascript navigation, but this importxml guide came to the rescue!

    I've now managed to get the data from >2000 products in 20 mins! Acetastic!

    I owe you a beer next time you're down under :)

  61. Heya

    Great set of tools. I am working through them 1 by one. This post is a little old and I can't get the rank checker query to work (which is n.b. for me!)

    1) Does it still work?
    2) I just get a "#N/A" - any ideas on another formula I can try?

  62. The rank tracker in this document doesn't work. I made one that works; I'll try to keep it up-to-date as Google updates their SERP HTML.

    If you have any suggestions for improvement you can connect with me on Twitter @n8ngrimm.

  63. Great step by step explanations (much appreciated!) but there's a lot of N/A's throughout. Did the google doc functions change? Any updates?

  64. I'm having the same issue with the rank checker. I think the problem is that the formula tries to find text in an array using "FIND", which I believe is for 'finding' text within a block of text.
    But then again - I'm not sure...

  65. I couldn't get the rank checker to work either - shame. This page on White Fire SEO suggests that it's no longer a valid feature. If anybody has an updated version that works, it would be very helpful. Shame Tom is no longer working in SEO to help us out; his instructions are easy to follow, unlike many others.

  66. In fact, I just found one here!

  67. The script looks great but for some reason it isn't working for me in Google docs.

  68. This is really neat. I had no idea you could do all this with Google docs. Thanks for sharing.

  69. Is the problem solved with the Rank Checker?

  70. Nick

    I'm a newbie and think I'm missing something. In lesson one I learn how to scrape content from a site, and in lesson two I learn how to use Google to retrieve results. But let's say I want to search a specific site for various keywords.

    Example: a site on birds. Knowing the class I want, say I want everything in "//p". I want to search for and scrape the page on "sparrows" without having to manually find the specific sparrows URL on the site.

    Lesson two: =concatenate(""&A1&"&pws=0&gl=sg&num=1")
    where A1="sparrows"

    Then in Lesson one: =importxml(url,"//p")

    How do I parse the url search result from lesson two into lesson one? At the moment I only seem to get the search result and not the url.

    Again, I'm a newbie so I'm probably missing something really obvious.

    Thanks in advance. And for the record, i'm not looking to republish anyone's content, just compare content from two sites, eg "sparrows, chickens, owls etc..." from and

  71. All of the embedded Google Docs aren't working as of 10/04/2013 :(

    reply >
    • Any chance you could update the docs? :(
      Your tutorial seems really interesting, though I can't figure everything out without the embed docs.
