Combining the HTML5 History API and the Canonical Tag for Improved Tracking

As an active web user I’m always getting emails along these lines:

It’s frustrating, because the URL that Will’s trying to email me is actually pretty short, but the Google Analytics tracking parameters attached to it make it bloated. Every time I see this I cry a little inside for two reasons: firstly, as a user I can’t tell quite as quickly what the URL is all about, and secondly, as a marketer I know that the website’s tracking will be slightly less accurate, since anyone who clicks the forwarded link gets counted against the original campaign.

So, I brainstormed a solution. Please note that the JavaScript I’m putting here is by no means production ready, so please don’t use it on your important sites without first getting a REAL DEVELOPER to check over the code. For example, I believe this will break IE at the moment. If you’re a REAL DEVELOPER, maybe you could leave me a better version in the comments and I’ll link to you?

The Solution - Changing the URL in the User’s Browser to the Canonical URL

One of the new magical HTML5 features that I’m excited about is the History API. Rob built a little micro-site that demonstrates the concept very nicely. I love hacking around with cutting-edge stuff, so I decided to write a little bit of JavaScript that fetches the canonical tag from the page and then uses pushState to change the URL in the address bar to the canonical URL (note that this JS requires jQuery - I’m sure it’s possible to build a non-jQuery-reliant version very easily):
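The original snippet is not reproduced here. As a stand-in, here is a minimal jQuery-free sketch of the same idea, not the author's code: the function name `showCanonicalUrl` and its parameters are hypothetical, and it assumes the page has a `<link rel="canonical">` tag pointing at a URL on the same origin (pushState rejects cross-origin URLs).

```javascript
// Sketch: swap the visible URL for the page's canonical URL.
// `showCanonicalUrl` and its parameters are hypothetical names for illustration.
function showCanonicalUrl(doc, hist, loc) {
  // Feature-detect the History API so older browsers are simply left alone.
  if (!hist || typeof hist.pushState !== 'function') return false;

  // Find the canonical tag, if the page has one.
  var link = doc.querySelector('link[rel="canonical"]');
  if (!link || !link.href) return false;

  // pushState only accepts same-origin URLs, so bail out otherwise.
  var canonical = new URL(link.href, loc.href);
  if (canonical.origin !== loc.origin) return false;

  // Nothing to do if we are already on the canonical URL.
  if (canonical.href === loc.href) return false;

  hist.pushState(null, '', canonical.href);
  return true;
}

// In a browser, run it against the real page objects.
if (typeof window !== 'undefined') {
  showCanonicalUrl(document, window.history, window.location);
}
```

Passing the document, history and location objects in as arguments keeps the function easy to try outside a browser. Swapping `pushState` for `replaceState` (which takes the same arguments) would avoid adding an extra entry to the back-button history; that is a design choice this sketch leaves open.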

This means that regardless of the URL you landed on, the URL you’ll see in your browser will be the canonical version, so any sharing, emailing, bookmarking, etc. of this URL will be neat and tidy.

I put up a very simple page with this JavaScript installed for you to play around with.

What Does This Do To My Tracking?

This was a common question when I bounced the idea around the office, and the answer is that all your tracking is intact. All the pushState function does is change the URL displayed in your browser, client-side; the URL that the server sees you visiting is the one with all the tracking parameters intact. So hopefully this is a pretty elegant solution. For a more detailed analysis of how crawlers handle AJAX and more traditional uses of the History API, check out this post by Rob Ousbey.
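One caveat worth sketching: the server only ever sees the original request, but analytics scripts that run in the browser generally read the URL at the moment they fire, so the order of operations matters. Below is a minimal sketch of one way to keep that order safe. It assumes a hypothetical `trackPageview` callback standing in for whatever analytics call the page makes; the function name `trackThenTidy` and its parameters are illustrative and not part of any analytics library.

```javascript
// Sketch: report the landing URL to analytics before tidying the address bar.
// `trackPageview` is a hypothetical callback standing in for the page's real
// analytics call; it is not part of any analytics library.
function trackThenTidy(loc, hist, canonicalUrl, trackPageview) {
  // Capture the full landing URL, tracking parameters included.
  var landingUrl = loc.href;
  trackPageview(landingUrl);

  // Only then swap the visible URL, if the browser supports the History API.
  if (hist && typeof hist.pushState === 'function') {
    hist.pushState(null, '', canonicalUrl);
  }
  return landingUrl;
}
```

In a browser this might be called as `trackThenTidy(window.location, window.history, canonicalUrl, callback)`, where `callback` wraps the site's own tracking code and `canonicalUrl` comes from the page's canonical tag.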

What if Browsers Did This Automatically?

I’m interested to hear from the community - what would happen if browsers were to do this by default? I feel like it might open up a world of neat user experience if browsers would automatically default to displaying the canonical version of the URL. Maybe this would break the web in other ways? What I do know is that the reality of how users are sharing your URLs looks very different to the safe and controlled environment you tested it out in...

(PS - from the archives check out fixing 5 common SEO problems with HTML5)

Green Matrix photo credit Bigstockphoto


7 Comments

  1. Tom Critchlow

    By the way - I literally just stumbled across this:

    http://html5doctor.com/using-modernizr-to-detect-html5-features-and-provide-fallbacks/

    Which might provide the necessary HTML5 detection and fallback for IE. Maybe.

  2. Toni

    I still don't see how it solves the original problem. If the user is emailing
    https://www.distilled.net/blog/seo/combining-the-html5-history-api-and-the-canonical-tag-for-improved-tracking/
    without the tracking parameters, how are you going to track it? All the email shares will look as if the user came directly to the page... you will lose the context that the original "share" came from an email or RSS feed.

    • Right, but I think what Tom is getting at is that those email shares that came from that specific campaign would not inflate the numbers for that specific campaign - you would only be getting the data for the primary visits, people who got that article through a specific medium at a specific time, so the campaign would actually be reflecting how effective that campaign was in capturing the audience it was aimed at.
      You would lose the secondary and tertiary shares, but your campaign numbers would more accurately reflect people who were actively engaged in that actual campaign.
      Or at least that's what I think Tom is getting at here. Tom?

  3. Eric Kidd

    Neat tool to see in which browser your HTML5/CSS3/SVG element is supported (in this case, history):
    http://caniuse.com/#search=history

    As you can see, History is not supported in IE until the yet-to-be-released IE 10. You CAN use Modernizr to detect what browsers don't support history and use the History JS plugin to bring about history support for non-compliant browsers. Pretty neat!

  4. Browsers could do this automatically, but only for very common tracking codes, and I doubt any would do it because of all the incorrect uses of rel canonical around the internet. I especially doubt Google/Chrome would do it, seeing the other ways they've screwed up Analytics data already with (Not Provided).

    Also, I don't do much ecommerce work, but am I mistaken that there are plenty of ecommerce sites that pass additional parameters not used in the canonical tag for that page, in order to avoid duplicate content? This would be a mess for them since users would be copy/pasting URLs with meaningful parameters removed that actually affect content shown on screen.

    • My first sentence should have said: browsers could remove tracking codes automatically, but only for very common tracking services like Google Analytics. As for rewriting to rel canonical specifically, I doubt any would do it because of all the incorrect uses of it that can be found around the internet.

  5. Really like the approach here but have to agree with Kane that there are so many bad implementations of the canonical tag on the web that automatic handling in a browser would be a no-no.
