The ImportXML Guide for Google Docs

The ImportXML Guide for Google Docs was written primarily to empower SEOs, SEMs and digital professionals across the world to create their own tools. The inspiration, innovation and all-around ingenuity can be attributed to the talented engineers at Google for bringing us this invaluable service. Next time you speak to a Googler, show them some love.

This guide is suitable for everyone from absolute beginners to the diabolically clever coders who will no doubt expand on what they learn here. If you’re looking for advanced-level Google Docs material, Tom Critchlow created a full video guide on AppSumo. Enough talk, let’s get it on...

Chapter 1 - Prerequisites - How to look at source code and interpret HTML

Chapter 2 - Introduction to XPath

Chapter 3 - I’m ready - What’s ImportXML and why should I use it?

Chapter 4 - Your first ImportXML lesson

Chapter 5 - Advanced ImportXML

Chapter 6 - Your first Google Docs script with ImportXML

Chapter 7 - Advanced scripts and API magic

Chapter 8 - Regular Expressions and data extraction

Chapter 9 - Efficiently lightening the load on the spreadsheet

Chapter 1 - Prerequisites

back to top

First things first, go get yourself a Google account. Secondly, if you haven’t used Google Docs, and specifically Google spreadsheets, you’ll need to take the tour and familiarize yourself. If you’re already an Excel user, you’ll be right at home - hurry up, we’re waiting!

Great, now we’ll need to make sure you can read some source code. In this instance we’re talking about the HTML markup your browser uses to bring those beautiful websites to life.


Try it! Right click on this web page and choose view source.


Don’t get scared, and more importantly DO NOT give up now :)

Funnily enough, this will all make perfect sense to you in a bit - trust us, you’ll love it. What you’re seeing up there is mostly HTML markup, which is how web pages are built.


If you want to become an expert on HTML, visit this introduction at W3Schools

If you want a quick and dirty explanation, stick with me.

HTML isn’t exactly programming; rather, it’s a markup language made up of tags. Tags are surrounded by angle brackets < > and come in pairs - one to open and another to close (we add a forward slash to close, like this: </p>). Think of writing a quote in text: you would always open the quote with quotation marks, but the end must come as well, so we close it by adding some more.
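For example, a paragraph tag pair wrapping some text looks like this:

<p>This is a paragraph.</p>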


Try it!  Take your mouse and rest it on the tab of the current page you’re on. You should see something like this:


That information is actually stored in an HTML tag called <title>


Try doing this right now with a live test
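A title tag pair looks something like this (the text here is just illustrative):

<title>The ImportXML Guide for Google Docs</title>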


*Notice how the second title tag is closed using the forward slash. There are many tags you’ll need to reference in your quest for importing data, so use this cheat sheet to help you.

Chapter 2 - Introduction to XPath

back to top

Great, you made it! Now that we’re familiar with HTML and such, we’re going to learn how to import some of that data using Google Docs spreadsheets. Before we do, we’ll need to learn some XPath first.

XPath (simple definition): a query language used to pick out pieces of information from web pages.


If you want to become an expert on XPath, head back to W3Schools and start the introduction.

Or we’ll take you through another quick and dirty explanation here.


To get you through quickly, we’re going to explain only what you need to know for ImportXML - Parents and Children.

For the following example we’re going to use:

<div> starring as “the parent”, <p> as “aww, the cute child”, and co-starring: “Plain old text” as himself.

<div>

<p>Plain old text</p>

</div>

So we’ve got our stars, and now we want to pluck out “Plain old text”, as he hasn’t been performing at his best these days.

To do this using Xpath we would write: //p

Ok, so you might not get it yet - let’s get some information from the Godfather (specific to Google Docs):

  • // – this means select all elements of that type
  • //h3 – this means select all h3 elements
  • [@class=''] – this means only select those elements that meet the criteria given - for example, only elements with a specific class
  • //h3[@class='storytitle'] – this means only select elements that look like: <h3 class="storytitle">Title</h3> (see the worked example below)
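Putting the pieces together with a quick worked example (the markup here is illustrative):

<h3 class="storytitle">My big story</h3>
<h3 class="sidebar">Ignore me</h3>

The query //h3 selects both elements, while //h3[@class='storytitle'] selects only the first one and returns its text: “My big story”.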

Chapter 3 - Why should I care about using ImportXML?

back to top

Well, you’ve come this far, so it’s fair that we tell you. ImportXML allows you to import data from web pages directly into your Google Docs spreadsheet. It’s like magic! We can look at a page, decide what information we want by checking the source code, find the HTML element, and dig it out using XPath.

ImportXML is how we’re going to pull it in. So why exactly should you care about it?

Instead of copying and pasting information from a bunch of websites, this will do it in 2.2 seconds.

It automatically pulls it into a spreadsheet so you can work with your data however you want.

If you’re a trader, you can automatically pull in all your stocks into one spreadsheet effortlessly.

If you’re an SEO, you can acquire hundreds of link targets from around the web.

If you’re an online retailer, check up on your competitor’s pricing for Product XYZ as soon as it changes!

It is truly endless, and only limited by your imagination....and um Google. Have we convinced you? OK come on let’s get to the first lesson!

Chapter 4 - Your first ImportXML lesson

back to top

You’ve got 2 options: you can watch a quick video we made below, or continue past the video to get hands-on straight away.

The hands-on approach: we’re getting used to the code, so let’s go ahead and step through a practical example. A common SEO task is: “How can I find as many blogs in specific niches as possible?”

Performing a quick search turns up a site listing 100 blogs: http://www.onlineschools.org/2009/10/27/100-best-book-blogs-for-kids-tweens-and-teens/. Having to click on each one to copy the link is manual and time-consuming. I want to get the list of URLs into a spreadsheet as quickly as possible.

1) First we take a look at the source code of the page and we see something like this:

You can see that all of the blogs are in li elements. Make a mental note!

2) Now fire up your G docs spreadsheet and add the URL in cell A1: http://www.onlineschools.org/2009/10/27/100-best-book-blogs-for-kids-tweens-and-teens

We can see that each blog is in a separate li element, so let’s try something like:

=importxml(A1,"//li")
(where A1 is the cell with the URL of the page). We get this back:

3) As you can see, it contains the blog names so we’re getting there. But our query is also getting a whole load of other stuff we don’t want. So let’s look in the code and see if we can isolate the list of blog items. I find the “inspect element” control in Google Chrome excellent for visualising this. As you hover over the code, it highlights the section of the page that applies to it.

4) We refine our guess to limit ourselves to the div with class 'intro-box-wide', using a query like:

=importxml(A1,"//div[@class='intro-box-wide']//li")
Loosely translated, this says “get the div tag with that class (intro-box-wide) and then select all li (list) elements within it”, which results in:

5) We’re nearly there! We now have a list of all the blog elements. The next step is to pull the URL. So we modify our function to be:

=importxml(A1,"//div[@class='intro-box-wide']//li/a/@href")
This says: from the li elements, select the href contents from the a element. This /a/@href is a very common thing to tack on the end of importxml functions, so I suggest you memorise it. This results in:
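For instance, if one of those list items contains <a href="http://www.bookblog.example">Book Blog</a> (an illustrative link), the /a/@href step returns http://www.bookblog.example rather than the anchor text.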

And we’re done! To have a look at the Google Docs spreadsheet, go here and make a copy - then you can play around to your heart’s content :)

Chapter 5 - Advanced ImportXML

back to top

Ok, now that we have the basics down, let’s move on. You might have seen the next section before, as it’s been detailed by Tom Critchlow in a different post. I’m going to help explain the genius behind his rank tracking - try to keep up. Checking rankings is a common task, so let’s quickly discuss how to do it. Firstly, we construct the search URL like this:

="http://www.google.com/search?hl=en&gl=US&q="&A2
Let’s break this down a bit shall we?
  • We’re going to get data from Google, so naturally we have http://www.google.com/
  • Next is /search? - this means we’re pulling up the search results page
  • We add in hl=en, which is the language code (en = English)
  • gl=US gets us United States results; for the UK we would use gl=UK
  • q= is your search query, e.g. cool places in london, cats, etc.
  • Optional: you can specify the number of results by adding &num=20, =50 or =100
  • Most importantly, this URL looks like a formula because it starts with =. The quotation marks tell G docs that anything inside " " is text; when the text finishes, we append the cell reference &A2. Why? This way you can type your next query in A2 and not have to modify the URL again :) (There’s a worked example after this list.)
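For example (with an illustrative query), if cell A2 contains cool places in london, the formula builds:

http://www.google.com/search?hl=en&gl=US&q=cool places in london

Change A2 to cats and the URL updates on its own - no need to touch the formula.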
Ok, back to Tom: then we parse the Google URL using importxml like this:
=importxml(C2,"//h3[@class='r']/a/@href")
I’m not going to break that down; hopefully you can figure out what I’m getting off the page (psst - he’s taking all the links from the listings). Again, check the source code of the page if you’re not sure what to write in your importxml function. The output looks like this:

As before, Grab your own copy here.

Of course, since this is a common function we might want to roll it into a single cell like this:

=ArrayFormula(MATCH(1, FIND("https://www.distilled.net",importxml(concatenate("http://www.google.co.uk/search?q="&A2&"&pws=0&gl=UK&num=50"),"//h3[@class='r']/a/@href")),0))

I was going to add some explanation here as to what this formula does, but actually it gets pretty complicated. Either you already know what an ArrayFormula does (in which case it should be straightforward) or you don’t, in which case you probably just want to copy and paste for now :)
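That said, here’s a rough sketch of what each layer is doing, reading from the inside out:

  • importxml(...) – pulls the list of result URLs off the search results page
  • FIND("https://www.distilled.net", ...) – checks each URL and returns the character position where the domain is found (1 when the URL starts with it), or an error when it isn’t there
  • ArrayFormula(MATCH(1, ..., 0)) – scans down that list for the first exact 1; the position MATCH returns is the ranking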

I should note at this stage that there is a limit of 50 importxml calls per spreadsheet, which stops us building a full web crawler, but for most agile tools this is sufficient.

Cool, so now that we have a bunch of moving parts, let’s start combining things. Here’s a spreadsheet where you enter a keyword, 4 location variants and a domain, and it gives you the ranking position for all the keywords that search suggest spits out for that domain. Nifty.

As always, have a poke around with the Google Doc to understand what’s happening.

Chapter 6 - Your first Google Docs script with ImportXML

back to top


Welcome to your first script in Google Docs. If you’re a beginner you’ll probably want to head to Google Apps Script and try your first script.

Or, get your hands dirty right away with a very simple tutorial...


Script (simple definition): a set of instructions that Google Docs will execute to return values, perform actions, etc.

What coding language does Google Docs scripting use? In simple terms, it’s a “bastardized” version of JavaScript - tutorials here.

We’re going to walk you through a very basic script that puts a value into a cell. Wooohoo, let’s go!

1) Fire up the old G docs spreadsheet, click on Tools > Script Editor

2) The easiest way to explain the function name is to relate it to something you already know. Any formula in Excel, e.g. =SUM, is a function. This will work the same way if we need it to.

3) First thing we need to do is define the current spreadsheet. Google has already pre-made this:

SpreadsheetApp.getActiveSheet();
That’s pretty long and I don’t feel like typing it out every time. Let’s put it in a box (the real name is a variable) and call this box “sheet”. Much nicer name, don’t you think?
var sheet = SpreadsheetApp.getActiveSheet();
4) Alright, now let’s try and put something in a cell. You’ll have to do 2 things here: tell the script which cell you want to put things in, and what you want to put in. To “get” the cell we use .getRange(), and to put something in, we use .setValue(). Notice the uppercase letters - they’re important! Ok, let’s put it all together:
sheet.getRange("b3").setValue("Hi there!");
5) Ok, the moment of truth - put all of this in the script and it should look like this:
function myFunction() {
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.getRange("b3").setValue("Hi there!");
}
6) Press the play button and just save the script as it comes. Now go back to your spreadsheet, what do you see?

Well done!

Chapter 7 - Advanced scripts and API magic

back to top

Let’s combine the magic of ImportXML + APIs + Google scripts to make some seriously cool tools. For the following example we put Tom Anthony in a cage until he made us a Twitter followers scraper. We wanted to know, using scripting magic, how many crazy people actually followed him on Twitter.

As always, have a poke around with the Google Doc to understand what’s happening.

The only cell that needs user input is A2 - his Twitter handle. The rest of the magic happens behind the GO button. Here is your step by step:

1) Select Tools from top menu > Script Editor

2) Copy and paste in the script below. Anything with "//" in front of it is a comment and isn’t required to make the script run.

function loadImport() {
  // This variable, 'ss', will store a reference to the Spreadsheet.
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  // And 'sheet' will store the Sheet.
  var sheet = ss.getActiveSheet();
  // In more complex scenarios we may want to be referencing different
  // sheets or even different spreadsheets. Normally though, the above
  // will be what you want.

  // Get a reference to the A2 cell, where the URL has been placed.
  var cellreference = sheet.getRange("A2");
  // Now we use getValue() to get the URL stored in that cell.
  var url = cellreference.getValue();
  // Now join that url onto the API URL.
  var apistring = "http://api.twitter.com/1/statuses/followers/" + url + ".xml?cursor=-1";
  // Now build our importxml() command.
  var command = "importxml(\"" + apistring + "\", \"//users_list/users/user/screen_name\")";
  // Now we want to reset our reference and point it to the target cell.
  var cellreference = sheet.getRange("B2");
  // Now we put the formula into the cell and it will pull in some followers for us! Woo!
  cellreference.setFormula(command);
}

See the line that begins with var command = ? This is a bit tricky: he’s basically trying to formulate this ImportXML call: =importxml("http://api.twitter.com/1/statuses/followers/tomanthonyseo.xml?cursor=-1", "//users_list/users/user/screen_name")

The only problem is that when he puts the XPath in like so:

\"//users_list/users/user/screen_name\" it looks kinda funny. Look at the start of that line: \" - why is that there? Left alone, the quotation mark would end the script’s own string, but all we really want is to place a regular quote character in the text that goes into the cell. To stop the script swallowing the quotes, we use a backslash (\) to escape the following character.
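Here’s a quick standalone illustration of the escaping (not part of the tool itself):

var s = "He said \"hi\"";
// Each \" produces a literal quote character, so s contains: He said "hi"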

3) Save the script!

4) Go to the Insert Menu and choose drawing

Yep, we’re going to make us a shiny button. Draw away, friends, and when you’re done, right click on your button and choose “assign script”.

In this instance it’s going to be called loadImport. That’s it, you’re done :)

Chapter 8 - Regular Expressions and data extraction

back to top

ImportXML and meta data seem to disagree. I’m sure you’ve tried helplessly to get meta descriptions from different web pages, with fruitless results.

Ladies and Gentlemen, there is an answer and it’s not ImportXML but Importdata and Regex.

Importdata is a function in Google docs that allows you to import the entire source code of a webpage directly into the spreadsheet. Here’s how it works:

=Importdata("http://www.example.com") - simply call the function, make sure the URL is between the quotes and parentheses, and you’re done.

It’s taken over my entire spreadsheet!!?? No sweat, just wrap it in the concatenate function like this:

=Concatenate(Importdata("http://www.example.com"))

There are a few Regex functions available to you in Gdocs, but for the sake of this tutorial we’ll be working with =Regexreplace(text, regular expression, replace)


We strongly advise you to read about Regex and test your skillz.

However, we’ll break down the following formula to save you the pain of importing meta descriptions...this time ;)


In order to pull meta descriptions out of a web page, you need to Importdata the page first, then reference that cell with this formula:

=REGEXREPLACE(Importdata cell,"(.*)(meta name=.description. content=.)(.*?)(\x22)(.*)","$3")
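A quick breakdown of the pattern, capture group by capture group (the dots in group 2 stand in for quote characters, which G docs tends to mangle):

  • (.*) – everything in the source before the description tag
  • (meta name=.description. content=.) – the opening of the meta description tag
  • (.*?) – the description text itself, matched lazily so it stops at the first closing quote
  • (\x22) – the closing double quote (hex code 22)
  • (.*) – the rest of the page

Replacing the whole match with $3 (the third capture group) leaves just the description text.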

As always, have a poke around with the Google Doc to understand what’s happening.

Chapter 9 - Efficiently lightening the load on the spreadsheet

back to top

You might have the most complex super duper Google docs spreadsheet, but it’s soooo slow. How do we speed this thing up?

  1. Store formulas in scripts
  2. Use script triggers (buttons) to populate cells with formulas
  3. Hide the cells that you don’t need to see. Trust us, it works
  4. Incorporate “clear” functions - all you need to do is set “” values in cells for each set of ImportXML calls; this helps clear out old data and formulas that have gone wrong (see the sketch after this list)
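As a minimal sketch of tip 4 (the range B2:B100 is illustrative - point it at wherever your ImportXML results land):

function clearResults() {
  var sheet = SpreadsheetApp.getActiveSheet();
  // Overwrite the result range with empty strings to flush out old data
  // and any formulas that have gone wrong.
  sheet.getRange("B2:B100").setValue("");
}

Attach it to a button or trigger, the same way as the loadImport script in Chapter 7.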
As we find more speed fixes we’ll continue to update this section.

Pre-Made Toys

back to top

This is a collection of free tools made by the folks at Distilled, friends, and other very talented people who have shared their tools with us.

Have one you’d like to submit? Tweet @dsottimano, or add it in the comments below and I’ll be happy to look at it!

Tools by: Tom Critchlow, David Sottimano, Tom Anthony


Resources & Further Learning

back to top

Reference

Blog Posts  


Special Thank-you’s

back to top

A very special thank you goes to the guy who mentored me: Tom Critchlow. Another thank you goes to Tom Anthony for helping me hack tools together everyday at the office. They have both generously contributed to this guide and it wouldn’t have been possible without them. Also, I love solid delivery

 


FAQ

back to top

We need more questions folks, feel free to submit as comments!

Q: How can I add a pause or sleep function in Google Docs with ImportXML? A: There is a function in Google scripts called Utilities.sleep() which will allow you to pause for a maximum of 5000 milliseconds.
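A minimal usage sketch (3000 milliseconds is an illustrative value):

function pausedExample() {
  // ...do some work, then wait 3 seconds before continuing...
  Utilities.sleep(3000);
}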

Dave Sottimano

David Sottimano comes from a varied background in corporate marketing and professional sales. His love affair with the internet and marketing has finally found the perfect balance at Distilled, and continues to flourish each day.


66 Comments

  1. Tom Critchlow

    Thanks for putting this together Dave - should be a great resource guide for everyone trying to find their way around this stuff. Looks awesome.

    reply >
  2. Many thanks for making my brain bleed on a Friday afternoon... :-)

    reply >
  3. David Sottimano

    @Alan - Tom's been doing that to me for months :)

    reply >
  4. Haha, if that was in any way linked to producing this document then I can see why! There's some really valuable stuff in here, great job in pulling it all together. I'm sure my brain will thank me for it one day...

    reply >
  5. If you're into training and stepping up your enterprise SEO game, today is officially dripping with win. Thanks for this resource.

    reply >
  6. great guide you put together Tom, but as long as Google will limit us to 50 results on an import, I will not consider this seriously for big tasks.

    I'll look in the API though as this looks more flexible.

    reply >
  7. Michael Brenckle

    This is so fascinating! Really enjoyed it and using it hand's on.

    reply >
  8. Nice one David,

    You know that you are obsessed when you have just read this on a Friday evening when you should be at the pub!

    Great work!

    reply >
  9. Great guide, I have a folder of SEO scripts in my docs account and haven't really taken the time to learn how to make my own. Thanks for this.

    reply >
  10. i love these writeups! they are definitely a time saver and i enjoy tweaking them to fill different needs when doing seo research/analysis. keep up the great work!

    reply >
  11. Thanks for the great tutorial. This is eye-opening and a must-read not just for SEOs, but for anyone trying to automate their processes.

    reply >
  12. Great guide!

    Does anybody know how to parse cell value to a script function?
    It keeps saying "undefined"

    =getUrl(A1)


    function getUrl(keyword) {
    Browser.msgBox(keyword);
    }

    reply >
    • David Sottimano

      Thanks for all the great compliments guys - really appreciate it!

      @Sjoerd - Instead of using Browser.msgBox try this instead:

      return keyword;

      Just make sure you have something in A1 and you input =getUrl(A1) in any other cell.

      Does that help?

  13. Hi David thanks for your replay,

    It does work but its not really what i wanted :-)
    i need to use it inside my function like this:


    function getUrl(keyword) {

    var fetchURL = 'http://www.google.nl/search?num=100&q='+keyword+'&pws=0&gl=NL';

    }


    Why wont this work?
    thanks

    reply >
  14. Ah thats why it doesnt work :(
    I am seeing the captchas as well now.

    Will have to look for a work around,
    Thanks for your help

    reply >
  15. I was playing around with this some more...obviously pulling in different text from a website can be done rather easily. i was curious if you can also pull images? meaning if there was an image or two you wanted to pull into the spreadsheet, could you either extract the URL location of the image, or even copy/paste the image into a cell on the spreadsheet? or maybe even automatically download the image to your computer?

    i tried doing some research but did not have much luck figuring this one out and was curious as if anyone has experimented with importing images?

    reply >
  16. David Sottimano

    Hi Chris,

    Yep, you can now pull in images too! I should have included this in the post but for now:

    In your cell use the =image() function, you'll need to point it to an image URL like this:

    =image("example.com/hello.jpg")

    As for getting all the links of images on the page, it's not perfect but it works:

    =importxml("example.com,"//@src")

    Enjoy :)

    reply >
  17. Thanks David! I ended up using your second option...now i just need to see if i can get it to pull a single/particular image instead of all of them :)

    reply >
    • Joe

      To get a specific image, this template worked for me:

      =ImportXML(A1,"//*[@id='main-container']/div/div[2]/div/div[1]/div[2]/img/@src")

      That way you're getting the @src of a particular image.

      Hope that helps!

  18. Tom

    I wrote ImportXML!

    reply >
  19. Lee

    Chapter 8 – Regular Expressions and data extraction

    hi is there a way to run this on a coloum of 5000 rows

    =Concatenate(Importdata(“http://www.example.com”))

    without having to click on each cell?

    cheers, lee

    reply >
  20. Great post and a lot of useful information...well hence so many comments and I thought it would also be great to say that I have enjoyed reading it and it was very helpful. Anne

    reply >
  21. Hi,

    Great post and loving playing with importXML but can't seem to get Facebook 'likes' working? Some things work on Facebook but just not that 'like' number, for example, I've tried the following but just getting an NA:

    =importXML("http://www.facebook.com/manchesterunited","//span[@class='uiNumberGiant fsxxl fwb']")

    What am I doing wrong?

    Thanks :)

    reply >
    • David Sottimano

      Hey,

      I would use API calls here. Open up your script editor and drop this script in:

      function facebook(url) {
      var jsondata = UrlFetchApp.fetch("http://graph.facebook.com/"+url);
      var object = Utilities.jsonParse(jsondata.getContentText());
      return object.shares;
      }

      *Note: when you copy and paste the code above, the apostrophes can get mangled. Just retype them.

      Next, test it. Copy paste this into any cell:

      =facebook("https://www.distilled.net")

      *Note: if it doesn't work right away, save the sheet, close it and come back in 5 minutes (this is a G docs fault)

      :)

  22. @David Sottimano

    Thanks! I will give this ago tonight. Will similar be required for extracting Twitter Followers for a particular username?

    I'm loving importXML but completely new to it :)

    reply >
  23. I have tried this techniques they seem to work great except that often times the results are very different (usually worse than in reality) from what I find out manually

    is it different data centers that importxml pulls data from ?
    or is it getting a later matching occurence of the domain

    here is my formula with details

    =Arrayformula(MATCH(1, FIND("http://www.katakolon.gr",importxml(concatenate("http://www.google.com/search?q=katakolon&pws=0&num=10"),"//h3[@class='r']/a/@href")),0))


    @Lee
    yes you can, but you will need scripting

    reply >
    • David Sottimano

      Hey Giorgos,

      Yes there can be some variance, mainly because it can pull results from cache (google data centre). There might be a way to get the absolute latest results though...

      The best way is to NOT specify num= and just let it default to 10. Use start= to get the second page.

      I've tried tons of rank checking solutions, and to this day I've had the best luck with G docs.

      Good luck!

  24. Nick

    Great site and lovely post! I am having an a hard time gathering ANY information using the importXML function from one site in particular, and I was hoping you might take a peek for me:

    http://gis.clark.wa.gov/gishome/Property/?action=account&account=125403008

    My guess is it has something to do with the namespace??? I've been searching for days without any resolution. Any help is very much appreciated.

    reply >
  25. Hello David!

    First of all congratulations for putting this together, it is really helpful and has encouraged me to develop my own functions in Gdocs. After throwing this compliment = ) I would like to ask you something.

    I´ve been using the ImportXML formula to get 100 results of different search queries in a single spreadsheet... formula=ImportXML(B3,"//h3[@class='r']/a/@href") being B3 my cell containing the following http://www.google.es/search?q=blogs x&pws=0&gl=es&num=100 ... Everything was going good until last week, but today I just realized that the results are retrieved with additional elements of the code that shouldn´t be there.

    Example: /url?q=http://cronicasdemiarmario.wordpress.com/2012/01/30/best-looks-seen-last-week-on-fashion-blogs-x/&sa=U&ei=2xU5T5CLPIeftwfUkainAg&ved=0CBcQFjAA&usg=AFQjCNGSAiZCtItWbyxmvM22NZaaNDyYiw

    So as you can see there is a "/url?q=" before the URL and also some parts of the code after the last / of the URL, can you help me out? Has Google changed something in the SERP?

    I´d appreciate your help.

    Cheers!

    reply >
    • David Sottimano

      Hi Ivan, I'm going to be updating this soon with the fix :) Stay tuned, or have a play with search & mid functions.

  26. I think google has changed the results html layout
    the importxml ranking does not work any more I am trying to figure out a new XPATH for it

    Can anybody confirm ?

    reply >
    • David Sottimano

      Confirmed. The easy way is to strip out the additional parameters using extract functions like mid, search, find etc.. I'm working on building a custom function & re-writing this guide. Watch this space.

  27. Yes it changed a little bit now you have to put /url?q=http://www. as part of the URL to be found


    =Arrayformula(MATCH(1, FIND("/url?q=http://www.domain.com",importxml("http://www.google.com/search?q=searchphrase&start=0&num=10","//h3[@class='r']/a/@href")),0))

    reply >
    • David Sottimano

      Hi guys! There's a bright fellow at SEER who's already thrown together a custom function for Google serps. You can get it here: http://www.seerinteractive.com/blog/google-scraper-in-google-docs-update We're going to re-write the importxml guide for 2012 and we hope to show you some more cool stuff. For any of you who are still confused of the change, Google added some extra code which disrupts the normal import. For now, use SEER's solution and in the near future, Distilled and SEER will team up :) thanks for stopping by

  28. Raphael

    hi, can anyone help me out to fix this script that i found, i had tried different ways, but im not familiar with the code at all im a newbie on this matters...

    // this function assumes the CSV has no fields with commas,
    // and strips out all the double quotes
    function parseCsvResponse(csvString) {
    var retArray = [];

    var strLines = csvString.split(/\n/g);
    var strLineLen = strLines.length;
    for (var i = 0; i < strLineLen; i++) {
    var line = strLines[i];
    if (line != '') {
    retArray.push(line.replace(/"/g, "").split(/,/));
    }
    }

    return retArray;


    }

    function populateSheetWithCSV(sheet, csvUrl, user, pw) {

    // request the CSV!
    var resp = UrlFetchApp.fetch(csvUrl, {
    headers: {
    // use basic auth
    'Authorization': 'Basic ' + Utilities.base64Encode(
    user + ':' + pw, Utilities.Charset.UTF_8)
    }
    });

    // parse the response as a CSV
    var csvContent = parseCsvResponse(resp.getContentText());

    // clear everything in the sheet
    sheet.clearContents().clearFormats();

    // set the values in the sheet (as efficiently as we know how)
    sheet.getRange(
    1, 1,
    csvContent.length /* rows */,
    csvContent[0].length /* columns */).setValues(csvContent);


    }

    reply >
  29. Thanks for this amazingly useful resource, I keep coming back for more!

    One thing I found for importing meta descriptions using Importdata and regexreplace was that if my page didn't contain a meta description tag I would be left with that huge cell full of the page source code. So I altered the formula as follows:


    =if(REGEXMATCH(REGEXREPLACE('Import Meta Data'!A1,"(.*)(meta name=.description. content=.)(.*?)(\x22)(.*)","$3"),".*(<!DOCTYPE)"),"None",REGEXREPLACE('Import Meta Data'!A1,"(.*)(meta name=.description. content=.)(.*?)(\x22)(.*)","$3"))


    This checks the output for a match on <!DOCTYPE using REGEXMATCH and if found, rather than outputting all of the page's code it just returns None. Much tidier :)

    Hope that helps someone.

    Mat

    reply >
    • David Sottimano

      Nice one Matt, you've discovered our little secret ;)

      When I do update this beast, I plan on demonstrating this technique. I just need to figure out the last few kinks, but I just wanted to step in to say good job !

  30. diego

    Hi guys

    I am trying to use the ImportXML but it does not work.
    Even if I copy and paste your example it reports error

    Does Google removed this function or there is any update or whatever to imporove in my Gdocs?

    Many thansk

    Diego

    reply >
    • nickel

      Diego, I had the same problem. There is a stupid solution: replace "," with ";". Works here.

  31. Erich

    I've got a question about how to pass a value from the spreadsheet in to my working ImportXML function within the same spreadsheet. Here's my function:

    =ImportXML("http://api.eve-marketdata.com/api/importxml_prices2.xml?char_name=demo&region_ids=&buysell=b&type_ids="&JOIN(",",$B2:$B7); "/emd/price")

    Now what I am trying to do is pass the region_ids= value from a cell in the spreadsheet (in my case happens to be K5) instead of entering into each function. The function appears 4 times in my spreadsheet and I only want to enter the region_ids in one cell, change it whenever I want, and have the change affect all 4 functions. Any thoughts on how to do this? Should be easy for you guys :)

    Regards,

    Erich

    reply >
    • Erich

      Well I figured out how to get the value in there, simply add &(K5), but it is not updating whenever I change the value of K5 :(

  32. David Sottimano

    Hi Erich,

    This should do it:

    =ImportXML("http://api.eve-marketdata.com/api/importxml_prices2.xml?char_name=demo&region_ids="&K5&"&buysell=b&type_ids="&JOIN(",",$B2:$B7); "/emd/price")

    You might need to rewrite the " if you're using the formula I posted as Google docs can change it to weird characters.

    Let me know if that works.

    reply >
  33. David Sottimano

    :)

    reply >
  34. For the Search Suggest + Rankings, it keeps on telling me "error: did not find value 1" where the ranking would show up.

    Can you tell me how to fix this?

    Thank you!

    reply >
  35. Hi firstly I must thank you for the impressive resources posted here. I must confess I am somewhat of a novice with coding and I am new to using Xpath and Google Docs, but I admit I am hooked and love the scope of what can be done with this, from what I can gather thought I am too late for certain features that are no longer working or supported I am right in saying that it is no longer possible to pull ranking positions and also to pull data from twitter because from what I can see those elements fail on my spreadsheets, confirmation from any kind folk would be much appreciated, thank you.

    reply >
    • David Sottimano

      Hey Dave, good name ;)

      Yes it's all possible and still working. There are a few quirks and methods naturally change because Google, Twitter etc... make changes to the way they structure their HTML or / and API calls.

      I really have to update this guide as it doesn't reflect the changes that have occurred in the past year - until then, keep trying and don't give up. I can personally tell you that everything still works, even though we've had a few scares.

      Check back in a month or so Dave and hopefully I'll get an updated version up.

  36. Hey David,

    I've been digging around trying to find a solution to this question for days. I want to use the import function to scrape a page that is password/user protected (its my own page).

    On this page is a large table of information that i would like to get into a spreadsheet that is auto updated every time its opened.

    I can use a firefox or chrome add-on to do what I am trying to do - but it takes too many clicks. Plus a new spreadsheet gets created every time.

    Any help is super appreciated.!

    When I use //* i get:

    "Server Error in '/' Application.Object reference not set to an instance of an object.Description: An unhandled exception occurred during the execution of the
    current web request. Please review the stack trace for more information
    about the error and where it originated in the code. Exception Details:
    System.NullReferenceException: Object reference not set to an instance of
    an object. Source Error:The source code that generated this unhandled exception can only be shown
    when compiled in debug mode. To enable this, please follow one of the below
    steps, then request the URL:1. Add a ""Debug=true"" directive at the top of
    the file that generated the error. Example: or:2) Add the following section to the configuration file of
    your application: Note that this second technique will
    cause all files within a given application to be compiled in debug mode.
    The first technique will cause only that particular file to be compiled in
    debug mode.Important: Running applications in debug mode does incur a
    memory/performance overhead. You should make sure that an application has
    debugging disabled before deploying into production scenario. Stack Trace: [NullReferenceException: Object reference not set to an instance of an
    object.] AspDotNetStorefront.SkinBase.get_CheckUserAgentForMobile() +94
    AspDotNetStorefront.SkinBase.GetTemplateName() +55
    AspDotNetStorefront.SkinBase.OnPreInit(EventArgs e) +2518
    System.Web.UI.Page.PerformPreInit() +31
    System.Web.UI.Page.ProcessRequestMain(Boolean
    includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +282 Version Information: Microsoft .NET Framework Version:2.0.50727.3634;
    ASP.NET Version:2.0.50727.3634"

    reply >
  37. David Sottimano

    Hi Baron,

    This isn't easy, and it's something I was trying to perfect as well. Unfortunately getting Google docs to authenticate isn't simple, so I don't have an answer for you yet.

    I'm working with some badass developers to see if this is possible, scaleable and worth doing.

    All I can tell you is stay tuned, I'll be updating this post soon enough and hopefully I'll be able to answer your question.

    reply >
  38. Jelle

    Hi great resource!

    However when I tried it I copy-pasted your codes and I couldn't get it to work.

    I figured out that you have to use a semi-colon instead of a komma in the formula (at least that was the trick for me...).

    reply >
  39. Morten

    I get this error message "error: The data could not be retrieved. Please check the URL." when trying the basic '=importxml(A1,"//li")'

    A1 contains the URL.

    Both with , and ;

    Any suggestions?

    reply >
    • David Sottimano

      Depends on what URL you're trying to import. Try =importxml(a1,"/") OR =importdata(a1) and see if you can pull in anything.

      Does the URL have a # in it?

  40. David Gill

    Thanks for an incredible useful resource!

    @David Sottimano: any ETA on more insights on authentication? With many tools and resources running behind bars but with good API retrieval functionality - this would surely be very appreciated by the community!

    reply >
  41. Floris

    Hi David,

    What a useful resource, thanks!

    In google spreadsheets I have a list of 5000 phrases of which I want to retrieve the resultcount on Google (search page) for each phrase, e.g. . I managed to get the importXML working for 50 phrases but then I got stopped by the limit of 50: =importxml(https://www.google.com/search?sourceid=navclient&hl=en&q="at loose ends","//div[@id='resultStats']")

    I could manually copy paste the value of each batch of 50 into a separate column, then use the importXML calls for the next batch of 50, copy the value into the new column etc. However, I was wondering, would it be possible to create a script to automate this and work around the limit of 50 this way?

    reply >
    • David Sottimano

      Once everyone gets good at 50 calls, they always start looking for scripts ;) Good job! Yes, you can definitely do this, and to start you off, I'll explain how to do it in plain English. Importxml formula, copy returned value, replace ImportXML formula with copied value, go to next cell and enter importxml formula - then do it again using a loop. Or did you just want me to shut up and send some code? ;)

  42. Floris

    :-) I wouldn't dare

    However, I'm still a Google Scripts rookie, so if it's really easy for you to help me out, it would be greatly appreciated. Otherwise, I'll just work through the tutorial and try to figure it with your hints ;-)

    Thanks, David!

    reply >
  43. Joe Robison

    I can't even get through the first part without it breaking, what gives?

    Thanks! My Test Spreadsheet

    reply >
  44. Ethan Clarke

    I'm trying to use importXML to confirm postal codes for a large list of addresses. When you do a google search for an address, and google finds it, it is displayed in a div called with class ="vk_sh vk_gy" which I understand means that it is under two different classes. There are no other elements inside that div, it's just plain text. Despite it being called with a class, there is only one instance of it on the page.

    What Xpath would I use to get the text within that div?
    I've tried:
    "//div[@class='vk_sh']|[@class='vk_gy']"
    "//div[@class='vk_sh']|[@class='vk_gy']"
    "//div[@class='vk_sh vk_gy']
    "
    and many others.

    Help please! If this works, it will be very useful.

    reply >
  45. Juan Martitegui

    This is awesome. But I've been trying to do something with no luck. Maybe my IQ to low. Can anyone help me grabing this: https://senderscore.org/lookup.php?lookup=209.237.229.171&ipLookup.x=12&ipLookup.y=7... I need to grab the number in the square between 1 and 99.

    The path looks like this:


    <div id="lookupHeader">

    71


    but I've been unable too.

    Thanks so much

    reply >
  46. Bernie

    I am returning an api call from facebook

    function facebook(url) {
    var jsondata = UrlFetchApp.fetch("http://api.facebook.com/restserver.php?method=links.getStats&urls="+url);
    return jsondata.getContentText();
    Utilities.sleep(1000);// pause in the loop for 1000 milliseconds
    }

    so in my gdoc I use the function =Facebook(url)

    It returns all the XML.... Since I am making multiple calls for other URL's in the same spreadsheet, I would like to parse the data so i can then have a report with all these data points in columns...

    How can I accomplish this?

    reply >
  47. arjun

    Hi,
    I am getting <p> tags from the site. I want to replace this to the next line character (alt+enter). I tried the regexreplace method. No luck: =RegExReplace(ImportXml()). Is it supported? Thanks.
    Arjun

    reply >
  48. Hello,
    Really great stuff, can anybody pass me a any article link which have basic learning of Google docs with example

    reply >
  49. FYI - all of the GoogleDrive embed examples return the following error message,
    "Sorry, the file you have requested does not exist."

    reply >
