Fixing 5 common SEO problems with HTML5... today!

HTML5 Superhero

For some time now you’ve probably been hearing conversations about how some problem or another would be much easier to solve with HTML5, but because X percent of users are still on IE we can’t “move to HTML5”.

However, in reality, HTML5 isn’t something you need to ‘switch on’ - it isn’t one big thing. Think of HTML5 not as a tool but as a toolset; a collection of new features, in many cases independent of one another. This post deals with the SEO benefit of HTML5, and so our focus is on what Google and co. can interpret and not on browser compatibility. It is not necessary for a browser to recognise all the tags on a page, as long as we do not disrupt its ability to render that page (and if the browser gets some benefit too, then even better). However, we will touch on what browsers do and don’t support and how this relates to SEO; there are plenty of bits in the HTML5 toolset that the browsers have absolutely no problem with, and where they do have problems we can often help them out.

We are going to look at some common problems encountered with SEO, and discuss how HTML5 could help to address them. I’m sure there are many other situations where HTML5 could help us out, but I’ve picked 5 to get going with. Note that not all of these are open and shut scenarios, but I think they are all scenarios where HTML5 can provide a measure of assistance, be it now or in the very near future.

Problem 1: Pagination

Issues: duplicate content, crawl budget, juice flow

You have a tonne of pages, e.g. listing products in a particular category, which are completely identical aside from which subset of your products is listed on them. Now you want to ensure that you rank, but you also do not want to run into duplicate content issues, or waste your crawl budget letting Google crawl hundreds of pages which add no value. However, maybe you do want them to be crawled to ensure the content is indexed? Maybe you want them crawled but are aware these same products are listed in different groups and sequences in varying categories and you are worried about the implications of this. If only you could just tell Google: “Hey! These pages are paginated listings, so please treat them accordingly!”.

This is a common scenario for many sites, especially eCommerce sites; however, whilst a common problem, it is still something we see many clients struggling with. Furthermore, often clients will have some specific site quirk or preference which makes this less straightforward than it should be.

Unfortunately Googlebot is not yet sentient and so we can’t chat with the little guy just yet, so we have to find another way. Enter HTML5 and sequential links:

A sequence of documents is one where each document can have a previous sibling and a next sibling. A document with no previous sibling is the start of its sequence, a document with no next sibling is the end of its sequence.

A document may be part of multiple sequences.

Source: W3 HTML5 Spec

That pretty accurately details our problem! So how do ‘sequential links’ work? Well, of course, via our old friend the rel attribute. Previously your previous page and next page buttons might have looked like:

<a href="products.php?page=3">Previous Page</a>
<a href="products.php?page=5">Next Page</a>

We simply add the rel attributes accordingly:

<a href="products.php?page=3" rel="prev">Previous Page</a>
<a href="products.php?page=5" rel="next">Next Page</a>

This sends a clear signal that these pages are a sequence that belong together, and obviously this semantic information is a clean way for letting Google, Bing and the other engines know about this relationship.
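If your pagination links are built by a template or script, the rel attributes can be added in the same place. Here is a minimal sketch in JavaScript; the function name paginationLinks and its parameters (baseUrl, currentPage, totalPages) are made up for illustration, and the URL pattern simply mirrors the products.php example above. Note that the first page gets no ‘prev’ link and the last page no ‘next’ link, matching the spec’s description of the start and end of a sequence.

```javascript
// Escape characters that are unsafe inside HTML attribute values.
function escapeAttr(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the previous/next links for one page of a paginated listing.
// baseUrl, currentPage and totalPages are hypothetical parameters.
function paginationLinks(baseUrl, currentPage, totalPages) {
  const links = [];
  if (currentPage > 1) {
    // Any page after the first has a previous sibling in the sequence.
    const href = escapeAttr(`${baseUrl}?page=${currentPage - 1}`);
    links.push(`<a href="${href}" rel="prev">Previous Page</a>`);
  }
  if (currentPage < totalPages) {
    // Any page before the last has a next sibling in the sequence.
    const href = escapeAttr(`${baseUrl}?page=${currentPage + 1}`);
    links.push(`<a href="${href}" rel="next">Next Page</a>`);
  }
  return links.join("\n");
}

// Page 4 of 10 links back to page 3 and forward to page 5.
console.log(paginationLinks("products.php", 4, 10));
```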

Now, this isn’t a silver bullet and I would not yet recommend that you do just this and nothing else, but I’d absolutely recommend that you do add these attributes to your pagination links. It isn’t the complete solution because there are ‘sequences of documents’ that are not product listing style pages (e.g. sections in a tutorial) and we do not yet have a complete picture of how much Google will trust these attributes. However, the more you can tell Google about your content the better you help her (is Google a girl?) understand and index your content, and these attributes are a fantastic indicator to Google and a good place to start.

Actions: Deploy rel prev/next on pagination links.

Problem 2: Page structure

Issues: accurate indexing, crawl budget, SERP CTR, juice flow

For years now we have been reminded over and over to focus on semantic HTML. Originally the focus on this was that it makes rendering content across devices and formats far easier when it is neatly categorised: HTML for content and meaning, CSS for presentation and style, and Javascript for additional behaviour. Removing anything in your HTML that was just there for presentation was not too difficult, but managing to fully define the meaning of the content with HTML was pretty much impossible - HTML simply wasn’t a rich enough language. Microformats started flooding in trying to fill some of the gaps, but the fact is that HTML remained ill equipped for the task.

Once again, HTML5 swoops in to save the day! It doesn’t provide all the answers, but it does go a long way towards helping by providing a whole raft of new tags we can use to organise the web’s content and give Google and the engines a lot more help in interpreting our sites. Some of these tags include:

<section>
<header>
<footer>
<article>
<hgroup>
<aside>
<nav>

These mostly relate to what you’d expect and I’m not going to describe them each in detail here; if you’d like to read a bit more about them then check out w3schools’ excellent HTML5 Tag Reference.

I’ll just give a couple of examples...

Previously the constituent parts of most pages were separated with the overworked div tag:

<div id="myarticle">
...
</div>
<div id="extrafacts">
...
</div>

However, in HTML5 we have a much wider array of tags to assign semantic meaning to these elements, so the same example may look like so:

<article>
...
</article>
<aside>
...
</aside>

You should immediately see the difference - the tags themselves now transmit some information about what it is they contain. For any search engine trying to interpret the page and catalogue the information on it, having these indicators is a real help.

As another example, we can now break pages up far more neatly; consider the traditional H1 tag problem, and look at this example:

Google News Top Stories

What is the appropriate H1 on this page? The fact is that not all pages lend themselves to being divided up with one main title. HTML5 uses a more sensible approach and allows us to have multiple <header> and <footer> elements and multiple H1s on a page:

<section id="finance">
<header>
<h1></h1>
</header>
...
<footer> </footer>
</section>
<section id="entertainment">
<header>
<h1></h1>
</header>
...
<footer> </footer>
</section>

HTML5 is intelligent enough to understand that these <header> and <footer> elements are the headers and footers for their parent tags.

For an SEO this makes structuring a page significantly easier, and for a search engine it makes pages significantly easier to interpret and that helps with accurate indexing.

It isn’t completely clear how well Google handles all this new found awesomeness, so I’d advise a pragmatic approach at the moment. If you are uncomfortable with multiple H1 elements, then you can still employ the other elements. You’ll probably find that with most projects there are elements of these you can include already.

Whoa there... one last thing before you dive in!

With Firefox, Chrome or Safari you’ll be able to apply CSS to these new tags right out of the gate, but for a certain other browser this is not the case... In IE these tags will be inaccessible to CSS without a bit of help. Luckily enough, there is a solution out there in the form of the free and open source html5shiv. Download this smart bit of JavaScript and install it and you’ll find you are good to go. Take that IE!
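To give a feel for why a script can fix this at all: older versions of IE will happily style an unknown element once JavaScript has asked the document to create one with that name. The sketch below illustrates that core trick only; it is not html5shiv’s actual source (which also handles default styles and printing), the function name registerHtml5Elements is made up, and it is written in old-style JavaScript because it has to run in old IE. It would need to run in the page’s head, before the body is parsed.

```javascript
// Element names taken from the list earlier in this post.
var HTML5_ELEMENTS = ["section", "header", "footer", "article", "hgroup", "aside", "nav"];

// Ask the document to create each element once. In old IE this makes the
// browser recognise the tag name so that CSS rules can apply to it.
// `doc` is the page's document object (passed in so this can be tested).
function registerHtml5Elements(doc) {
  for (var i = 0; i < HTML5_ELEMENTS.length; i++) {
    doc.createElement(HTML5_ELEMENTS[i]);
  }
  return HTML5_ELEMENTS.length;
}

// Only run automatically when a browser document is available.
if (typeof document !== "undefined") {
  registerHtml5Elements(document);
}
```

In practice you would just include html5shiv rather than rolling your own, but it is nice to know there is no magic involved.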

Actions: Gradually begin refactoring your page designs to slowly integrate these new elements. Begin replacing or supplementing generic div tags with these new tags.

Problem 3: Internal search pages

Issues: accurate indexing, duplicate content, crawl budget

What happens if you Google Bing’s results page for Googling Bing? Well, nothing actually because they block it with robots.txt, but my point is when a search engine starts crawling another search engine’s results pages the universe gets uneasy.

Now if you have an internal search feature on your site, the standard answer would be to block it with robots.txt and stop the hellish nightmare that can otherwise ensue. However, some sites actually blend the search feature with weird navigation systems or even use the search results as a way to list certain product categories that they then link to. The best solution is to fix the site IA and make this a non-issue but it isn’t always as easy as it should be.

Furthermore, you may find a lot of people linking to search results pages, and in some cases you may want them indexed.

Well, HTML5 provides a quick and easy way to let Google know what is going on...

<a href="/search.php" rel="search">Search the site</a>

This lets the search engines know that the target page is a search engine, and not to sound repetitive but that is a really helpful signal you can be providing the search engines and will take all of 20 seconds to add to each necessary link. Whilst you are there you may want to check out opensearch.org which allows for some seriously cool stuff - including letting browsers discover your search feature and incorporate it directly into the browser’s search bar.

Actions: Add rel search to all links to any on site search features.

Problem 4: Microformats != schema.org

Issues: social sharing, SERP CTR, rich snippets

Microformats and RDFa are two ways of embedding machine-readable metadata into our web pages, and both are quite well known in the SEO community. They have seen a marked uptake in use since Google introduced rich snippets in 2009, allowing SEOs to add some bling to their SERP listings:

rich snippet example

Microdata is another such format, and is part of the HTML5 spec, but has remained somewhat in the shadows and hasn’t seen the widespread adoption of the others.

More recently there was the announcement of schema.org from Google, Bing and Yahoo. Along with plenty of excitement this seems to have caused some confusion with many SEOs thinking this to be a new format.

Schema.org is not a format or a language in itself, but is actually a vocabulary which the search engines have all agreed to understand and respect. It lays out what types of entities and attributes you can insert into the metadata on your web pages and guarantees that all the engines will understand these.

BUT, and here’s the rub....

The schema.org vocabulary is only available to those speaking in the microdata format.

You cannot use it with Microformats or the RDFa format currently. The engines say this will change in time, but in the meantime if you want to take advantage of the schema.org vocabulary you need to use HTML5’s microdata.

The schema.org site itself has plenty of details on how to use microdata and put it on your webpages, so get over there and get reading.
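As a taster, here is a rough sketch of generating microdata markup from JavaScript. The itemscope, itemtype and itemprop attributes are the microdata part, and Person with its name, jobTitle and url properties comes from the schema.org vocabulary. The personMicrodata function, the shape of the object it takes, and the example person are all invented for illustration.

```javascript
// Escape text before placing it in HTML content or attribute values.
function escapeForHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a person as HTML5 microdata using the schema.org Person type.
// The { name, jobTitle, url } object shape is a hypothetical example.
function personMicrodata(person) {
  return [
    // itemscope starts a new item; itemtype says which vocabulary type it is.
    '<div itemscope itemtype="http://schema.org/Person">',
    // Each itemprop names one property of the enclosing item.
    `  <span itemprop="name">${escapeForHtml(person.name)}</span>`,
    `  <span itemprop="jobTitle">${escapeForHtml(person.jobTitle)}</span>`,
    `  <a href="${escapeForHtml(person.url)}" itemprop="url">Profile</a>`,
    "</div>",
  ].join("\n");
}

console.log(personMicrodata({
  name: "Jane Doe",
  jobTitle: "SEO Consultant",
  url: "http://example.com/jane",
}));
```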

Luckily, you aren’t forced to decide between these formats, and Google have confirmed that it is ok to have these different formats on the same web page marking up the same metadata. So even if you have Microformats or RDFa you can still employ Microdata with schema.org, and I encourage you to do that. I do imagine, though, that Google are going to start showing a preference to metadata using the schema.org vocabulary.

Actions: Include Microdata alongside Microformats and RDFa you already have. Look for further opportunities to use these formats.

Problem 5: AJAX and URLs

Issues: accurate indexing, social sharing, juice flow

This one is well known and disliked by pretty much every SEO that there ever was. AJAX sites are really nice for users and improve the user experience greatly... right up to the moment the user tries to bookmark the page they are on, or email it to someone, or share it via social media, or use the back button, or find the page in their history the next day.

AJAX and SEO simply were never designed to mix, and now we are in a world where people want both. If you have somehow managed to avoid this problem and aren’t aware of it then I’ll briefly outline it... AJAX allows a webpage to, via the use of JavaScript, update the contents of a page without actually reloading the page; a new HTTP request will be sent and the new content will probably replace some old content on the page, but because the page does not reload the URL does not change.

The traditional way to make sure Googlebot can spider the content is to hook the AJAX calls to ordinary <a> tags, so each one carries an href pointing to a version of that same content which Google will pick up (and far too often even this hasn’t been done - meaning the content is stranded and will never get indexed). This is fine for the crawling aspect of SEO, but nowadays social shares are an important aspect of SEO too, and if the user can’t copy and paste the correct URL then you are already handicapped.

So, you guessed it, HTML5 comes to the rescue by providing some new DOM features that we can use with JavaScript to dynamically change the URL in the address bar without reloading the page. This is in the form of history.pushState() and its companions, history.replaceState() and the popstate event:

var stateObj = { foo: "bar" };
history.pushState(stateObj, "page 2", "bar.html");

Not only does history.pushState() allow us to update the URL on the fly, but it pushes these new URLs to the browser’s history, so the user can use the back button as expected and find the page in their history later on.
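Putting those pieces together, a bare-bones version might look something like the sketch below. It assumes you already have some function that loads and displays the content for a given URL; that callback (called render here), along with the names supportsPushState, navigateTo and listenForHistoryChanges, is invented for illustration. The window object is passed in as a parameter purely so the code can be exercised outside a browser.

```javascript
// Returns true when the History API is available on the given window object.
function supportsPushState(win) {
  return Boolean(win && win.history && typeof win.history.pushState === "function");
}

// Handle one AJAX navigation.
// `render` is a hypothetical callback that swaps in the content for `url`.
function navigateTo(win, url, render) {
  if (!supportsPushState(win)) {
    // No History API available: fall back to an ordinary page load.
    win.location.href = url;
    return false;
  }
  render(url);
  // Put the new URL in the address bar and add an entry to the history.
  win.history.pushState({ url: url }, "", url);
  return true;
}

// Re-render the right content when the user presses back or forward.
function listenForHistoryChanges(win, render) {
  win.addEventListener("popstate", function (event) {
    // event.state is the object that was passed to pushState for this entry.
    if (event.state && event.state.url) {
      render(event.state.url);
    }
  });
}
```

In a real page you would call listenForHistoryChanges(window, render) once on load, and call navigateTo(window, link.href, render) from the click handler on each AJAX link. Because the links keep their real href, crawlers and browsers without pushState still get a working page.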

Check out this demo of HTML5 pushState.

Unfortunately, IE does not yet support pushState, even in IE9, and so you cannot completely forget about the horrible workarounds that use the #, but luckily there is a pre-built solution that does the fallback for IE for you. You shouldn’t let the fact that this cannot work on IE put you off integrating it for the other browsers though; it is better to allow these improved features for some users than none at all.

Actions: The next time you are involved with the design stage of an AJAX site, integrate pushState into the solution. Also consider refactoring the AJAX content you already have.

Wrap up

HTML5 and the related technologies are incredibly exciting for SEO, users and the web as a whole. However, it is still going to be quite a while before there can be widespread rollout of many of the features, mostly thanks to the fact that even the latest version of Internet Explorer doesn’t even attempt to support some of them.

In the meantime, however, there are certainly parts that can be used for SEO and parts that can be used for user features, and I encourage everyone to adopt these where possible. If there are certain features you want to roll out for those users who have modern browsers, whilst still supporting those using versions that lack that feature, you should check out this extensive collection of cross-browser HTML5 fallbacks.

Where next?

For my part, I’m wondering what extensions the search engines themselves might offer once HTML5 sees wider adoption. Now that Google+ posts are showing in the SERPs, like Tweets before them, we are seeing a migration away from ‘web search’ meaning searching web pages, and more towards it meaning searching web information. Maybe we will soon see a day when we can use rel canonical on the <section> and <article> type tags, which would truly allow us some flexibility without being niggled with doubts about duplicate content.

Then there is WAI-ARIA, a separate specification dealing with web applications, but which is accounted for in HTML5. With its ‘role’ attribute we are taking the first steps towards being able to tell crawlers about our web pages’ behaviour and not just their content.

The thing I do know is that HTML5 is coming, and every SEO needs to get up to speed. If you can start not only learning HTML5 now, but already deploying aspects of it then you are going to be well placed when the full force of HTML5 hits the web. See you on the other side!

About the author
Tom Anthony

With a background in freelance web development, a degree in Computer Science, a PhD in Artificial Intelligence (almost – he is still writing his thesis!) and having taught himself to program on a BBC Master compact at the age of 8, it could be easy...