Automating Search Query Downloads from Webmaster Central

Just before last Christmas, Google posted a method for downloading search queries from Webmaster Central to a CSV file using Python. You can, of course, download CSV files from within Webmaster Central when you’re logged in, but with Python and the Windows Task Scheduler the download can be easily automated. That is very useful, because Webmaster Central data only goes back one month. With automation, you can archive this data and see trends over longer periods of time.

The original post does a reasonable job of describing the process, but I’m going to go into more detail for those who aren’t as familiar with Python and all that jazz. Unlike the original post, I’ll also delve into how to use the Windows Task Scheduler for automation. I’m going to assume that you know how to access the Windows command line, but that’s about all I’ll assume.

Getting the Job Done!

1. Download and configure Python.

Download Python 2.7. The default installation settings will be fine, but you will want to go one step further and add Python to your system path if you haven’t already. Open the Advanced System Settings dialog (it’s in the Control Panel under System and Security > System > Advanced system settings) and click “Environment Variables...”.
Under “System variables”, select Path and click “Edit...”. Add a semicolon to the end of the list of directories that shows up, followed by “C:\Python27\”. If you installed Python to a different directory, enter that instead.

2. Get the Google Data Python Library.

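This is the only step that requires you to use the command line directly. Download the latest version of the Google Data Python Library (2.0.16 as of this post) and extract the archive to a convenient directory. Then, in the Windows command line or PowerShell, change into that directory and run the library’s install script with “python setup.py install”. The library also ships with test scripts you can run to check the install.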

NOTE: Be sure to use the path of the directory you actually extracted the archive to.

If you have problems running the install or test scripts, make sure that you added Python to the system path, as described in the previous section.
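To confirm that the install worked, you can ask Python whether it can import the library. The snippet below is a minimal check, not one of the library’s own test scripts, and it assumes the library’s top-level package is named gdata. Run it with the same python.exe the downloader scripts will use, since each Python installation keeps its own set of installed packages.

```python
import importlib


def can_import(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: "gdata" is the top-level package the library installs.
    if can_import("gdata"):
        print("The Google Data library is installed.")
    else:
        print("gdata could not be imported; re-run the install script "
              "with the same python.exe you will use for the downloader.")
```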

3. Download and configure the necessary scripts.

The original scripts are hosted on Google Code. You can download all three, but you must at least download downloader.py. If you would like to use our Distilled-customized script, get-wmt, that file is all you need from Google. Our script has been modified to download a Top Pages CSV as well as the Top Queries CSV, and it also allows you to enter multiple domains if you so choose. Whichever script you go with uses downloader.py in the background, so be sure it resides in the same directory as get-wmt.py or example-simple-download.py.
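The reason downloader.py has to sit next to your chosen script is how Python finds modules: when you run a script, Python puts that script’s own folder at the front of its module search path (sys.path), so the script’s import of downloader is resolved from that folder. The helper below is a hypothetical pre-flight check, not part of either script; the file name check_companions.py is likewise just an example. Pass it the path of get-wmt.py or example-simple-download.py.

```python
import os
import sys


def missing_companions(script_path, required=("downloader.py",)):
    """Return the required files that are absent from script_path's folder.

    Python puts a script's own folder at the front of sys.path when it
    runs the script, so downloader.py is looked up next to get-wmt.py
    (or example-simple-download.py).
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return [name for name in required
            if not os.path.isfile(os.path.join(script_dir, name))]


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python check_companions.py path\\to\\get-wmt.py")
    else:
        missing = missing_companions(sys.argv[1])
        if missing:
            print("Missing next to the script: " + ", ".join(missing))
        else:
            print("All required files are in place.")
```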

Once you know whether you want to use our script or Google’s provided script (example-simple-download.py), open your chosen script in a text editor and change the lines highlighted below to reflect your own information. The username and password are those of your Webmaster Tools account, and the domains are the sites you want to run the queries for. The domains must, of course, be verified in your WMT account, and each one should end with a trailing slash (for example, http://www.example.com/).

This screenshot is of the get-wmt.py file. If you use example-simple-download.py, the lines which need editing are very similar.
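One commenter below found that leaving the trailing slash off a domain caused a problem, so it can be worth normalizing the list before the script uses it. The sketch below is hypothetical: the real variable name in get-wmt.py may differ, and it assumes your sites are verified as http:// URLs (an existing https:// prefix is left alone).

```python
def normalize_site(site):
    """Return `site` as a full URL with a scheme and a trailing slash."""
    site = site.strip()
    if not site.startswith(("http://", "https://")):
        site = "http://" + site  # assumption: sites are verified as http://
    if not site.endswith("/"):
        site += "/"  # the trailing slash one reader found was required
    return site


# Hypothetical list: replace with the sites verified in your WMT account.
domains = [normalize_site(d) for d in ["www.example.com", "http://blog.example.com"]]
print(domains)  # ['http://www.example.com/', 'http://blog.example.com/']
```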

4. Run the script!

Navigate to the directory that the script is in and double-click get-wmt.py (or example-simple-download.py) to run it. A command line window should open while the script is executing. Once it closes, you should have new CSV files in the same directory as the script you just ran!
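Because Webmaster Central data only goes back one month, it pays to keep each run’s CSVs in case a later run reuses the same file names. The sketch below copies every CSV in a folder into an archive folder, prefixing each name with the date. It makes no assumption about what the downloader names its files; the folder paths you pass in are up to you.

```python
import datetime
import glob
import os
import shutil


def archive_csvs(source_dir, archive_dir, today=None):
    """Copy every .csv in source_dir into archive_dir with a date prefix.

    Each file becomes <YYYY-MM-DD>_<original name>.csv. Returns the list
    of new paths.
    """
    today = today or datetime.date.today()
    if not os.path.isdir(archive_dir):
        os.makedirs(archive_dir)
    copied = []
    for path in sorted(glob.glob(os.path.join(source_dir, "*.csv"))):
        name = "%s_%s" % (today.isoformat(), os.path.basename(path))
        dest = os.path.join(archive_dir, name)
        shutil.copy2(path, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied
```

For example, calling archive_csvs(r"C:\wmt", r"C:\wmt\archive") after each run (both paths hypothetical) leaves the originals in place and builds up a dated history.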

Automating It!

Now we’re going to depart a bit further from the original article, so whip out your Task Scheduler!

You can find Task Scheduler by typing “Task Scheduler” into the search box on the Start menu.

You’ll want to create a Basic Task, so click “Create Basic Task...” in the Actions pane on the right.

This will start the Basic Task Wizard. The first few steps are fairly self-explanatory and deal with setting the schedule for the task. I personally run the script weekly so that each download overlaps the previous one and nothing falls outside Webmaster Central’s one-month window. When you reach the Action step, choose “Start a program”.

This will bring you to the “Start a Program” page.

Enter the script name (for example, get-wmt.py) in the “Program/script” box at the top. To tell Task Scheduler where to find the script, enter the folder it lives in under “Start in”. That’s it! Your script will now run daily, weekly, monthly, or at whatever interval you specified.
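The “Start in” box sets the working directory the script runs in. Step 4 showed the CSVs landing next to the script when you double-click it, which suggests the script writes them relative to its working directory. If you’d rather not depend on that box, one hypothetical alternative is to have the script switch to its own folder when it starts. This is not part of Google’s or Distilled’s script, and it assumes the script writes its output using relative paths.

```python
import os


def run_from_script_dir(script_file):
    """Change the working directory to the folder containing script_file.

    Relative output paths (such as CSV file names) then resolve next to
    the script, whatever folder Task Scheduler started it in.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    os.chdir(script_dir)
    return script_dir


# Hypothetical use: call this near the top of get-wmt.py, before any
# downloads run, so every CSV lands in the script's own folder.
# run_from_script_dir(__file__)
```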

Have fun with your new data!
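Once a few weeks of exports have piled up, you can stack them into a single file to look at trends. Because each export covers the month or so of data Webmaster Central keeps, the sketch below doesn’t try to merge rows; it tags every row with a label you choose (such as the download date) so you can pivot on it in a spreadsheet. It is a minimal sketch written for Python 3, and it assumes all exports share the same header row and are UTF-8 encoded; the snapshot column name is hypothetical.

```python
import csv


def combine_snapshots(snapshots, out_path, label_column="snapshot"):
    """Stack several CSV exports into one file, tagging each row with a label.

    `snapshots` is a list of (label, path) pairs, e.g. the download date of
    each archived export. All files must share the same header row.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for label, path in snapshots:
            # utf-8-sig also strips a byte-order mark if one is present
            with open(path, newline="", encoding="utf-8-sig") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow([label_column] + header)
                elif file_header != header:
                    raise ValueError("%s has different columns" % path)
                for row in reader:
                    writer.writerow([label] + row)
                    rows_written += 1
    return rows_written
```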

Benjamin Estes

Benjamin is a senior consultant who joined Distilled in 2010. Having earned a BA in Mass Media, he intends to continue studying the ways in which people interact with media and to apply those lessons to his consulting.

11 Comments

  1. Ian

    Thanks very much for putting this guide together. Something I've been meaning to get around to.

  2. So wait, how do I change the path for Python 2.7? I have limited understanding of what I am doing here. I know I want the automation, but not sure how it all works.

  3. Benjamin Estes

    Ah, I sort of glossed over that, didn't I. I added a sentence to that section to clarify, but essentially: when you click on "Edit..." in the window pictured, a box with a list of directories will appear. Add a ";" to the end of that list to indicate you are adding another directory, then type in "C:\Python27\" or whichever directory you used to install Python.

    Hope that helps!

  4. Addam Hassan

    Thanks for sharing!

  5. Yasir

    Thanks for the detailed explanation. Just ran it yesterday and it worked like a charm.

    I omitted the trailing slash in the domains section, which caused the issue, so make sure you don't forget to add the / to your domains in the .py script.

    Cheers,
    Yasir

  6. Thanks Benjamin - we were just working on this last week, so very timely.

    Can you tell me if it is possible to automate downloads of linked content or linking domains? I realize this is not supported in the API, but I wasn't sure if there was a hack out there for it.

    Thanks again

  7. Benjamin Estes

    Hi Brian,

    I'm not necessarily an expert when it comes to interfacing with Webmaster Central. Generally we use the SEOmoz API for linking domain data. Obviously this is not the same data set as Google's, but it has great metrics and we use it every day.

    Cheers,
    Ben

  8. Hey, thanks for the tutorial and tricks! I never knew this was possible. Will give it a shot!

  9. That is a very good guide; it helped me very much! Some things were quite new to me, but I'm looking forward to trying them on my own. Thank you!

  10. Does this script work by "scraping" the data, or is there an actual API connection through which this data is obtained? Is there any way, or any plan, to build this script directly into Google Docs, so spreadsheets can be updated online?

    Thanks!
    Mike

    • Benjamin Estes

      The script downloads from an API. The Python library for the API is provided by Google, and this script makes some simple calls using that library. Since everything depends on this preexisting library, there is no way to port it over to Google Docs that I am aware of.
