Combining the HTML5 History API and the Canonical Tag for Improved Tracking

As an active web user I’m always getting emails along these lines:

It’s frustrating, because the URL that Will’s trying to email me is actually pretty short, but the Google Analytics tracking parameters attached to it leave it bloated. Every time I see this I cry a little inside for two reasons: firstly, as a user I can’t tell quite as quickly what the URL is all about, and secondly, as a marketer I know that this means the website’s tracking will be slightly less accurate, since everyone who clicks the forwarded link gets counted against the original campaign.

So, I brainstormed a solution. Please note that the JavaScript I’m putting here is by no means production ready, so please don’t use it on your important sites without first getting a REAL DEVELOPER to check over the code. For example, I believe this will break IE at the moment. If you’re a REAL DEVELOPER, maybe you could leave me a better version in the comments and I’ll link to you?

The Solution - Changing the URL in the User’s Browser to the Canonical URL

One of the new magical HTML5 features that I’m excited about is the History API. Rob built a little micro-site that demonstrates the concept very nicely. I love hacking around with cutting-edge stuff, so I decided to write a little bit of JavaScript that would fetch the canonical tag from the page and then use pushState to change the URL in the address bar to the canonical URL (note that this JS requires jQuery - I’m sure it’s possible to build a non-jQuery-reliant version very easily):
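As a rough sketch of that idea (not the original snippet from this post), something like the following might work. Unlike the jQuery version described above, it uses plain DOM calls. The helper name `showCanonicalUrl` and its three parameters are assumptions made for illustration, and like the original it is not production ready.

```javascript
// Sketch only: swap the address bar to the page's canonical URL.
// `showCanonicalUrl` is a hypothetical helper name, not from the original post.
function showCanonicalUrl(doc, loc, hist) {
  // Browsers without the History API are left alone.
  if (!hist || typeof hist.pushState !== 'function') return false;

  // Look for <link rel="canonical" href="..."> in the page.
  var link = doc.querySelector('link[rel="canonical"]');
  if (!link || !link.href) return false;

  var current = new URL(loc.href);
  var canonical = new URL(link.href, loc.href);

  // pushState only accepts same-origin URLs, so skip anything else.
  if (canonical.origin !== current.origin) return false;

  // Nothing to do if the visitor is already on the canonical URL.
  if (canonical.href === current.href) return false;

  // Change the displayed URL without reloading the page.
  hist.pushState(null, '', canonical.href);
  return true;
}

// Browser wiring: run once the page has finished loading.
if (typeof window !== 'undefined') {
  window.addEventListener('load', function () {
    showCanonicalUrl(document, window.location, window.history);
  });
}
```

Because pushState adds a history entry, the Back button will first return to the tracked URL on the same page. If that feels odd, history.replaceState takes the same arguments and overwrites the current entry instead.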

This means that regardless of the URL you landed on, the URL you’ll see in your browser will be the canonical version, meaning that any sharing, emailing, bookmarking, etc. of this URL will be neat and tidy.

I put up a very simple page with this javascript installed for you to have a play around.

What Does This Do To My Tracking?

This was a common question when I bounced this around the office, and the answer is that all your tracking is intact. All the pushState function is doing is changing the URL displayed in your browser, client side; the URL that the server sees you visiting is the one with all the tracking parameters intact. So hopefully this is a pretty elegant solution. For a more detailed analysis of how crawlers handle AJAX and more traditional uses of the History API, check out this post by Rob Ousbey.
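One caveat worth hedging: the server has already seen the full URL by the time any script runs, but analytics code that runs in the browser typically reads the campaign parameters from the address bar, so it seems safest to let it fire before the URL is swapped. A minimal sketch of that ordering follows. The helper name `trackThenSwap` and the `track` callback are hypothetical stand-ins for whatever analytics call your site uses, and `canonicalHref` is assumed to be a same-origin URL that has already been checked.

```javascript
// Sketch only: report the landing URL first, then tidy the address bar.
// `trackThenSwap` and `track` are hypothetical names for illustration.
function trackThenSwap(loc, hist, canonicalHref, track) {
  // The landing URL still carries the utm_* campaign parameters here.
  var landing = new URL(loc.href);

  // Collect just the campaign parameters so they can be reported.
  var campaign = {};
  landing.searchParams.forEach(function (value, key) {
    if (key.indexOf('utm_') === 0) campaign[key] = value;
  });

  // Fire the analytics call while the tracked URL is still in place.
  track(landing.href, campaign);

  // Only now change the displayed URL (assumed same-origin, see above).
  if (hist && typeof hist.pushState === 'function') {
    hist.pushState(null, '', canonicalHref);
  }
  return campaign;
}
```

In a page this would be called once on load, passing window.location, window.history, the canonical URL, and a function that wraps your tracking call.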

What if Browsers Did This Automatically?

I’m interested to hear from the community - what would happen if browsers were to do this by default? I feel like it might open up a world of neat user experience if browsers would automatically default to displaying the canonical version of the URL. Maybe this would break the web in other ways? What I do know is that the reality of how users are sharing your URLs looks very different to the safe and controlled environment you tested it out in...

(PS - from the archives check out fixing 5 common SEO problems with HTML5)

Green Matrix photo credit Bigstockphoto