Our Will Critchlow delivered a presentation at ProSEO Boston this year which interested many and baffled many more. One of the cooler (and more esoteric) tools he brought up was R, which is a statistical computing program. Working with it feels a bit like working with a Python prompt, but awesome specialized results are even closer at hand.
My goal for this post is to walk you through a first R project inspired by the pretty (and seriously useful) charts that Will showed off in his presentation. His idea was to properly assess the seasonality of traffic (whether year over year or week over week). Fortunately his solution wound up requiring about two lines of code. Check these out:
Explaining the Graphs
So... what is it we’re looking at? The top line represents the actual data set. This is just pulled from Google Analytics and cleaned up a bit to make a nice CSV. The second line is the seasonality, or the normal variation expected over each period. In this case the period is one week, and these peaks and valleys represent the expected variation over the course of one week. The third line is the most interesting. It is the measure of the overall trend of your traffic, whence comes this post’s title. It summarizes the direction your traffic is going in over time, without the interference of any normal periodic changes. Finally, the fourth line is the remainder: whatever is left of each day’s traffic once the seasonality and the trend have been taken out. I’m no statistics expert, so I’m just gonna say that the farther this last line strays from zero, the farther from normal that day was.
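If it helps to see how the panels relate, here is a minimal sketch on made-up numbers (the weekday shape, the slope, and the noise level are all invented for illustration). It suggests that the seasonal, trend, and remainder pieces simply add back up to the original series:

```r
# Synthetic daily traffic: 12 weeks of a weekly pattern, a gentle upward
# trend, and some random noise. All values here are hypothetical.
set.seed(42)
days <- 1:84
weekly <- rep(c(100, 120, 125, 120, 110, 60, 50), times = 12)
visits <- weekly + 0.5 * days + rnorm(84, sd = 5)

tsdata <- ts(visits, frequency = 7)
fit <- stl(tsdata, s.window = "periodic")
parts <- fit$time.series  # columns: seasonal, trend, remainder

# Adding the three components row by row should give back the original data.
rebuilt <- rowSums(parts)
print(all.equal(as.numeric(rebuilt), as.numeric(tsdata)))
```

In other words, the bottom three panels are just the top panel split into three pieces.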
First things first: download R and RStudio. Both are free to use. Will didn’t learn of RStudio until after his presentation; it is now my preferred R environment. The R installer has about a million options; I just clicked next a whole bunch of times until it seemed installed.
Prepping Your Data
R wants data in simple CSV files. You don’t need to number lines or anything like that, and column headers are fine. Basically, just keep it simple. The only thing I’d have you keep in mind is that there should be no commas separating thousands in larger numbers; R doesn’t want them. Try to get your data looking something like this, save it as a CSV, and you’re good to go:
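As a rough illustration of that layout, here is a sketch that builds a tiny two-column table, strips the thousands separators, and saves a clean CSV. The column names, dates, and numbers are hypothetical; your export will have its own.

```r
# Hypothetical two-column export; some "visitors" values contain commas.
raw <- data.frame(
  date = c("2011-05-01", "2011-05-02", "2011-05-03"),
  visitors = c("1,204", "987", "12,530"),
  stringsAsFactors = FALSE
)

# Strip the commas, then convert the text to numbers.
raw$visitors <- as.numeric(gsub(",", "", raw$visitors, fixed = TRUE))

# Write a clean CSV (no row numbers) to a temporary folder and show it.
clean_path <- file.path(tempdir(), "traffic.csv")
write.csv(raw, clean_path, row.names = FALSE)
print(readLines(clean_path))
```

If you would rather fix the commas in a spreadsheet before exporting, that works just as well.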
In the Studio
When RStudio first loads up, there will be a list of demos in the upper left corner. You can invoke any of these by using the demo() command and putting the name of the demo in the parentheses. Enter your commands in the console (lower left side). Some of the demos are more inspired than others; some of them are absolutely crazy. Overall, though, you’ll get a lot more value from just following a tutorial (like this!) and just playing around with it until it makes sense.
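If the list isn’t showing, something like the following should print the names of the demos bundled with one package. I’m using the base graphics package as the example here; which demos you see depends on what is installed.

```r
# List the demos that ship with the base "graphics" package.
available <- demo(package = "graphics")$results[, "Item"]
print(available)

# To run one, type its name inside demo() at the console, for example:
# demo(graphics)
```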
Let’s get the data into the workspace. In the upper right corner you’ll find the workspace area.
Use the “Import Dataset” feature here to get your CSV into the workspace. This can also be accomplished with the “read.csv” function:
data = read.csv("filename")
The imported data set will appear in the upper left area of RStudio so you can check that it is correct. Or, just enter the name of the data set into the console, which will echo it back to you. If you want to get a pretty picture really quickly, try the following (assuming you have the same two-column layout I described above).
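A minimal sketch of that quick plot, assuming the visitor counts sit in the second column. The temporary file and its numbers are stand-ins for your own CSV, and drawing it as a line with type = "l" is my choice rather than anything required:

```r
# Stand-in for your own file: a small two-column CSV in a temporary folder.
csv_path <- file.path(tempdir(), "visits.csv")
write.csv(
  data.frame(date = as.character(as.Date("2011-01-01") + 0:13),
             visitors = c(120, 135, 140, 138, 125, 70, 65,
                          130, 142, 150, 147, 133, 78, 72)),
  csv_path, row.names = FALSE
)

data <- read.csv(csv_path)
print(head(data))  # echo the first few rows to check the import

# Quick look at the second column, drawn as a line.
plot(data[, 2], type = "l")
```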
The [,2] is optional and tells R only to use the second column of data (visitors, in my case). This will be important.
Making the Graphs
First you’ll need to make your imported data into a time series. I’ll assume again that you have your table formatted the same way as I do. If so, this is what you’ll need to enter:
tsdata = ts(data[,2],start=1,freq=7)
“tsdata” is now the time series version of your imported data. If you plot this data (i.e. enter “plot(tsdata)”), you’ll get a similar but slightly easier-to-read chart than you did plotting the data the first time around. You’ll notice that I’ve again isolated column number two, where visitor data resided. The “start” parameter gives the time of the first observation (one meaning the first week), and the “freq” parameter (short for “frequency”) indicates how many observations make up one period (seven days to a week in this case).
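To check that R understood the weekly period, you can ask the time series about itself. A small sketch on two weeks of made-up numbers:

```r
# Two weeks of hypothetical daily visitor counts.
visits <- c(120, 135, 140, 138, 125, 70, 65,
            130, 142, 150, 147, 133, 78, 72)
tsdata <- ts(visits, start = 1, frequency = 7)

print(frequency(tsdata))  # observations per period
print(start(tsdata))      # period and position of the first observation
print(end(tsdata))        # period and position of the last observation
print(cycle(tsdata))      # position of each observation within its week
```

Note that “freq” in the original command works because R lets you abbreviate argument names; spelling out “frequency” does the same thing.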
But we’re already nearly done! To get the graph that I set out as our target, all you have to do is enter the line below into the console:
plot(stl(tsdata,s.window="periodic"))
The stl() function returns an object holding three series of numbers (seasonal, trend, and remainder) that represent our insights into the periodic nature of the data set. plot() in this case simply draws the four graphs: the original data plus those three components. And out pops something along the lines of:
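If you want the numbers behind the picture, you can store the result of stl() instead of plotting it right away. A sketch on synthetic data (the values are invented; only the function calls matter):

```r
# Eight weeks of hypothetical traffic: weekly shape + slow growth + noise.
set.seed(1)
visits <- rep(c(120, 135, 140, 138, 125, 70, 65), times = 8) +
  1:56 + rnorm(56, sd = 4)
tsdata <- ts(visits, start = 1, frequency = 7)

fit <- stl(tsdata, s.window = "periodic")
print(class(fit))
print(head(fit$time.series, 7))  # first week of seasonal, trend, remainder

# With s.window = "periodic" the weekly pattern repeats unchanged each week.
seasonal <- fit$time.series[, "seasonal"]
print(all.equal(as.numeric(seasonal[1:7]), as.numeric(seasonal[8:14])))

# The day with the largest remainder (in absolute terms) is the oddest one.
remainder <- fit$time.series[, "remainder"]
print(which.max(abs(remainder)))
```

From there, plot(fit) draws the same four-panel chart as the one-liner.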
To be honest, using R makes me look smart and feel very stupid, but that doesn’t mean it isn’t worth learning. In the end, all we needed to do to generate this awesome analysis was enter the following lines into R:
data = read.csv("filename")
tsdata = ts(data[,2],start=1,freq=7)
plot(stl(tsdata,s.window="periodic"))
...which is pretty impressive, I think. For further information about R, do check out their manual pages. They are pretty much incomprehensible to me, so have fun with it! Good luck! And good analysis!