Wait... which way is my traffic trending?

Our Will Critchlow delivered a presentation at ProSEO Boston this year which interested many and baffled many more. One of the cooler (and more esoteric) tools he brought up was R, which is a statistical computing program.  Working with it feels a bit like working with a Python prompt, but awesome specialized results are even closer at hand.

My goal for this post is to walk you through a first R project inspired by the pretty (and seriously useful) charts that Will showed off in his presentation. His idea was to properly assess the seasonality of traffic (whether year over year or week over week). Fortunately his solution wound up requiring about two lines of code. Check these out:

Analyzing periodic traffic data.

Explaining the Graphs

So... what is it we’re looking at? The top line represents the actual data set. This is just pulled from Google Analytics and cleaned up a bit to make a nice CSV. The second line is the actual seasonality, or the normal variation expected over different periods. In this case the period is one week, and these peaks and valleys represent the expected variation over the course of one week. The third line is the most interesting. It is the measure of the overall trend of your traffic, whence comes this post’s title. This is summarizing the direction your traffic is going in over time, without the interference of any normal periodic changes. Finally, the fourth line represents how irregular the traffic for any one day is. I’m no statistics expert, so I’m just gonna say that the higher or lower these last lines go, the farther from normal that day was. Or something like that.

Getting Started

First things first: download R and RStudio. Both are free to use. Will didn’t learn of RStudio until after his presentation; it is now my preferred R environment. The R installer has about a million options; I just clicked next a whole bunch of times until it seemed installed.

Prepping Your Data

R wants data in simple CSV files. You don’t need to number lines or anything like that, and column headers are fine. Basically, just keep it simple. The only thing I’d have you keep in mind is that there should be no commas separating thousands in larger numbers; R doesn’t want them. Try to get your data looking something like this, save it as a CSV, and you’re good to go:

Sample Data
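If your export did come with thousands separators, you can also clean them up after importing rather than by hand. Here is a minimal sketch, assuming a two-column file whose columns I've called "date" and "visitors" (made-up names and made-up numbers, so swap in your own):

```r
# Hypothetical export: two columns, with thousands separators in some counts.
csv_text <- 'date,visitors
2011-05-01,"1,204"
2011-05-02,"1,530"
2011-05-03,987'

# Read every column as text first so R does not guess at the types.
raw <- read.csv(text = csv_text, colClasses = "character")

# Strip the commas, then convert the column to numbers.
raw$visitors <- as.numeric(gsub(",", "", raw$visitors, fixed = TRUE))

str(raw)
```

With a real file you would pass the filename to read.csv instead of the text argument.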


In the Studio

The RStudio interface.


When RStudio first loads up, there will be a list of demos in the upper left corner. You can invoke any of these by using the demo() command and putting the name of the demo in the parentheses. Enter your commands in the console (lower left side). Some of the demos are more inspired than others; some of them are absolutely crazy. Overall, though, you’ll get a lot more value from just following a tutorial (like this!) and playing around with it until it makes sense.

Let’s get the data into the workspace. In the upper right corner you’ll find the workspace area:

The "workspace" tab.


Use the “Import Dataset” feature here to get your CSV into the workspace. This can also be accomplished with the “read.csv” function:

data = read.csv("filename")

The imported data set will appear in the upper left area of RStudio so you can check that it is correct. Or, just enter the name of the variable (“data”, here) into the console, which will echo its contents back to you. If you want to get a pretty picture really quick, try the following (assuming you have the same two-column layout I pictured above).

plot(data[,2])

The [,2] is optional and tells R only to use the second column of data (visitors, in my case). This will be important.
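Before plotting, it can be worth a quick look at what R actually imported. A small sketch, using a made-up data frame as a stand-in for the imported CSV (the column names "date" and "visitors" and all the numbers are my assumptions):

```r
# Stand-in for an imported CSV; column names and values are invented.
data <- data.frame(
  date = as.character(seq(as.Date("2011-05-01"), by = "day", length.out = 14)),
  visitors = c(120, 340, 360, 355, 350, 330, 140,
               125, 345, 370, 362, 351, 333, 150)
)

head(data)   # the first six rows
str(data)    # column types; the visitors column should be numeric

# data[, 2] pulls out the second column as a plain vector of numbers.
visitors <- data[, 2]
plot(visitors, type = "l")
```

If str() reports the visitors column as text rather than numbers, stray commas or other characters in the CSV are the likely cause.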

Making the Graphs

First you’ll need to make your imported data into a time series. I’ll assume again that you have your table formatted the same way as I do.  If so, this is what you’ll need to enter:

tsdata = ts(data[,2],start=1,freq=7)

“tsdata” is now the time series version of your imported data. If you plot this data (i.e. enter “plot(tsdata)”), you’ll get a similar but slightly easier-to-read chart than you did plotting the data the first time around. You’ll notice that I’ve again isolated column number two, where visitor data resided. The “start” parameter sets the time label given to the first observation (one here, so the x-axis counts weeks starting from one); it does not skip any rows. The “freq” parameter (R accepts this as shorthand for “frequency”) is the number of observations in one period: seven daily values make up one week in this case.
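To see what ts() has done with those two parameters, you can ask the time series about itself. A quick sketch with four weeks of invented daily numbers:

```r
# Four weeks of made-up daily visitor counts (purely illustrative).
visitors <- rep(c(120, 340, 360, 355, 350, 330, 140), times = 4)

tsdata <- ts(visitors, start = 1, frequency = 7)

frequency(tsdata)   # observations per period
start(tsdata)       # period and position of the first observation
cycle(tsdata)       # position of each observation within its period
```

The cycle() output is handy for checking that day one of your data really lines up with the start of the week you have in mind.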

But we’re already nearly done!  To get the graph that I set out as our target, all you have to do is enter the line below into the console:

plot(stl(tsdata,s.window="periodic"))

The stl() function spits out an object holding a matrix of numbers, with one column each for the seasonal, trend, and remainder components of the data set. Calling plot() on it simply draws the four graphs: the original data plus those three columns. And out pops something along the lines of:

 

Boom!
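If you want the numbers behind the picture, you can keep the stl() result in a variable and pull the pieces out. A minimal sketch on invented data (a weekly pattern plus a slow upward drift and some noise; the variable names are mine):

```r
# Made-up data: eight weeks of a weekly pattern, a slow drift, and noise.
set.seed(1)
n <- 7 * 8
visitors <- rep(c(120, 340, 360, 355, 350, 330, 140), times = 8) +
  seq_len(n) * 0.5 + rnorm(n, sd = 10)
tsdata <- ts(visitors, start = 1, frequency = 7)

fit <- stl(tsdata, s.window = "periodic")

# The components live in fit$time.series, one column each.
parts <- fit$time.series
colnames(parts)

# The original series is not stored in there, but the parts add back up to it.
rebuilt <- rowSums(parts)
all.equal(as.numeric(tsdata), as.numeric(rebuilt))

# Put the original values next to the components in one table.
out <- data.frame(
  visitors  = as.numeric(tsdata),
  seasonal  = as.numeric(parts[, "seasonal"]),
  trend     = as.numeric(parts[, "trend"]),
  remainder = as.numeric(parts[, "remainder"])
)
head(out)
```

From there, write.csv(out, "filename.csv") would save the whole table, original data included.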

Final Thoughts

To be honest, using R makes me look smart and feel very stupid, but that doesn’t mean it isn’t worth learning. In the end, all we needed to do to generate this awesome analysis was enter the following lines into R:

data = read.csv("filename")
tsdata = ts(data[,2],start=1,freq=7)
plot(stl(tsdata,s.window="periodic"))

...which is pretty impressive, I think. For further information about R, do check out their manual pages.  They are pretty much incomprehensible to me, so have fun with it! Good luck! And good analysis!

Benjamin Estes


Benjamin is a senior consultant who joined Distilled in 2010. Having earned a BA in Mass Media, his intention is to continue studying the ways in which people interact with media and apply those lessons to his consulting.


4 Comments

  1. I love posts like this! R Project is something I have definitely not tried out before.

    Is there anyway to use this to predict future trends? Possibly using the same data over a certain time period (say three months) and then using this to predict the month or two afterwards?

  2. Hi Ben,

    Thanks for writing this. I've thought for a while that 'removing' seasonality from a data series would be really useful, but I never realised it would be so easy. I'm full of ideas on how to use this already.

  3. Benjamin Estes

    Thanks for the comments, guys. I got a request from @stinky_ink on Twitter for a way to remove monthly seasonality from the data. This is really straightforward. Just change the "7" in the line

    tsdata = ts(data[,2],start=1,freq=7)

    to "28". This will set the period for the time series to four weeks instead of one.

    The change in the graphs will be subtle, but you will notice that the period pattern repeats every 28 days instead of seven. Also, the trend line will be a bit more smoothed out (we like to call it "super-high-level-executive-mode").

    @ Aaron Luckie - Thanks for the prediction idea. I've no idea how to do it at the moment, but if I can figure it out I'll be sure to write another post to cover it!

  4. Hey, can you cover exporting that in R? When I plot it it shows the original data, however when I assign it to a variable and output the file the original data does not come with it:

    m <- stl(ts(R$data,frequency = 52),s.window="period")
    write.csv(m[1],file = paste("C:/zzzzzz.csv"))

    I can re-construct it later, which is fine, but I'm confused why its in the plot but not in the export code. I don't know enough about exporting time series data.
