As an SEO agency, we often work with external web developers. They may be in-house working for the client or another agency like ourselves but focusing on web design and development. The level of SEO knowledge varies greatly from agency to agency and sometimes we are brought in to train developers on the parts of SEO they often influence without knowing it. Below I’m going to talk about seven elements of SEO that all developers should at least have an awareness of.
I just wanted to quickly clarify Technical SEO v Onsite SEO. For me, Onsite SEO elements are those which the user can see without looking at the source code. So I’d include things such as -
- Title Tag
- Body Text etc
Technical SEO involves the elements of a page that the user can’t see without looking at the source code. These will include elements such as -
- IP Detection / Redirection
- Site Speed
- 301 and 302 Redirects
- HTTP Headers
- Crawler Access
1) IP Detection / Redirection
I recently experienced this problem first hand on a client project and it was very, very messy. For those of you unfamiliar with this, IP detection and redirection involves determining the IP address of a user on your site, then showing them content (or redirecting them to a new URL) based on their location. To give an example, if someone landed on www.domain.co.uk and their IP address indicated they were in France, you could redirect them to www.domain.fr which contains French content.
For the user, this isn’t actually a terrible thing. It isn’t foolproof as IP detection isn’t always 100% accurate, but usually it means you can show users content which is more relevant to their location and language. Sounds good and makes sense.
However, for the search engine crawlers, this can be very troublesome. They will usually crawl from a US-based IP, which means that they may never see some of your content. A recent client of mine was redirecting based on IP and ended up redirecting the search engines to their .us domain. This meant that the search engines were not seeing the sites for the other countries they were targeting, including the UK and Australia.
John Mu backed this up on a Webmasters thread:
Yes, many search engine crawlers are currently based in the US......One thing which I believe you may be hinting at is automatically redirecting users based on their location — we’ll get to that later in the blog post series, but generally speaking, yes, that can cause problems for search engine crawlers and because of that, we’d strongly recommend either not doing that or limiting it to a certain set of pages (say the homepage) while still allowing crawlers to access all of the rest of the content on the sites.
Whilst developers (and some SEOs) often think there is nothing wrong with IP detection and redirection, you can see the problems that it can cause. So you will need to speak to them and let them know of the impact it can have on crawling and indexing of the site. There are times when IP detection does make sense, but I’d advise lots of caution and to make sure you are not accidentally stopping the search engines from seeing your pages.
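To make the "limit it to the homepage" advice concrete, here is a minimal Python sketch of the redirect decision. It assumes the country code has already been worked out from the IP address by some other means, and the COUNTRY_SITES mapping and the geo_redirect_target function are hypothetical names invented for this example. It is one way of limiting the damage, not a definitive implementation.

```python
from typing import Optional

# Hypothetical mapping of country codes to local sites (illustrative only).
COUNTRY_SITES = {
    "FR": "http://www.domain.fr",
    "US": "http://www.domain.us",
}


def geo_redirect_target(path: str, country_code: Optional[str]) -> Optional[str]:
    """Return a URL to redirect to, or None to serve the requested page as-is."""
    # Only the homepage is ever redirected. Deep pages are always served,
    # so a crawler arriving from a US IP can still reach every URL.
    if path not in ("", "/"):
        return None
    # If the location is unknown, serve the page rather than guess.
    if country_code is None:
        return None
    target = COUNTRY_SITES.get(country_code.upper())
    return f"{target}/" if target else None
```

With this logic, a French visitor to the homepage is sent to the French site, but a request for a product page is always served as requested, whatever the IP address says.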
2) Site Speed
This should definitely be high on the list of priorities for your developers to be looking at. Not just because we know it to be a ranking factor, but mainly because a fast site is better for users and ultimately, conversions. How long do you stick around on a site that takes longer than a few seconds to load? Users are the same.
From an SEO point of view, you need to care about site speed because Google is obsessed with speed. I recently read In The Plex which gives an insight into the early days of Google and it describes instances where Larry Page has measured the speed of products in his head and been accurate within tenths of a second. Every product that he gives feedback on needs to be super fast for it to stand a chance of moving forward. Google understand how much users care about speed, so you should too.
If you are struggling with developers here, go over to webpagetest.org and compare your client site with some competitors. Then send the developers a copy of the video:
This can often give them the nudge they need to take site speed a bit more seriously. In terms of specifics, take a look at this epic guide from Craig about Site Speed and SEO to get hands on tips and tools for improving site speed.
3) 301 and 302 Redirects
Sorry, but lots of developers (and SEOs) get this wrong. Right now, you only need to implement two types of redirect: a 301 or a 302. That’s it. No 303s, 307s or anything else. There are two main ways that this can be messed up. The first, which I’ll talk about now, is using the wrong type of redirect.
Getting 301s and 302s Mixed Up
To give some background and context, a 301 redirect is usually used in SEO for one of the following reasons -
- A page has moved somewhere or been taken down, so you want to redirect users and search engines to an appropriate new page
- Somehow you have created some duplicate pages and want to remove them from Google’s index by redirecting them to the main canonical version
A 301 redirect will usually pass nearly all of the link juice and equity across to the URL it is pointing to. So it can be a good way of giving some strength to different pages and making sure you’re not losing any link juice on pages that 404 etc. This is why it’s so useful for SEOs.
Despite not being SEO friendly (we’ll cover why in a moment), there are some genuine reasons for an SEO using a 302 redirect -
- A page may just be temporarily unavailable, for example a product that is out of stock on an ecommerce website that will be back in stock very soon
- You may want to test moving to a new domain to get some customer feedback but not want to damage the old domain’s history and rankings
A 302 redirect works here because you are confident that the move isn’t permanent. Because of this, Google will not pass link juice across the redirect, nor will they remove the old URL from their index. These are the very same reasons why getting mixed up with 301s and 302s can hurt your SEO performance.
The common reason why some SEOs and developers get this wrong is that for the user, they don’t notice any difference. They will be redirected anyway. But the search engines will notice the difference. I’ve seen an example of a client moving their entire site to a new domain and all redirects being a 302. This is bad because -
- Link juice will not be passed across to the new URLs, meaning they are unlikely to rank well in the short term and possibly long term
- Google will not get rid of old URLs from its index which means you can have multiple URLs from the old and new domains indexed at the same time
So you can end up in a situation where your new site is being indexed but has hardly any strength and therefore doesn’t rank. I’ve seen instances of severe traffic drops because of incorrect implementations of redirects like this. The following image courtesy of Elliance does a good job of displaying the differences:
Redirecting all URLs back to the Homepage
This is another problem I’ve come across more than once. Google advises that when you implement redirects, you do it on a one for one basis. For example -
You should redirect:
http://www.old-site.com/product-name-12345 to http://www.new-site.com/product-name-12345
http://www.old-site.com/product-name-10000 to http://www.new-site.com/product-name-10000
What some people do wrong is redirect -
http://www.old-site.com/product-name-12345 to http://www.new-site.com
http://www.old-site.com/product-name-10000 to http://www.new-site.com
Redirecting all pages back to the homepage, or even a single top level category, is bad for users and can sometimes look manipulative. It also fails to pass the much needed link juice across to the deep pages within your site that need it. In the example above, the product pages are not getting the juice they need.
Again, I’ve seen many sites lose a lot of traffic by doing this. Mainly because they lose rankings for their deep pages which are usually long tail.
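Both redirect mistakes in this section can be checked for with a few lines of code. The Python sketch below maps an old URL to the same path on the new domain, then audits an observed redirect against that expectation. The function names one_to_one_target and audit_redirect are made up for this example, and it assumes the new site keeps exactly the same paths as the old one, which won’t always be true in a real migration.

```python
from urllib.parse import urlsplit, urlunsplit

NEW_HOST = "www.new-site.com"  # taken from the example URLs above


def one_to_one_target(old_url: str, new_host: str = NEW_HOST) -> str:
    """Map an old URL to the same path and query string on the new host."""
    parts = urlsplit(old_url)
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, ""))


def audit_redirect(old_url: str, status: int, location: str,
                   new_host: str = NEW_HOST) -> list:
    """Return a list of problems with one observed redirect (empty if fine)."""
    problems = []
    # A permanent move should be a 301, not a 302 or anything else.
    if status != 301:
        problems.append(f"expected 301 for a permanent move, got {status}")
    # The target should be the equivalent deep page, not the homepage.
    expected = one_to_one_target(old_url, new_host)
    if location.rstrip("/") != expected.rstrip("/"):
        problems.append(f"expected redirect to {expected}, got {location}")
    return problems
```

Feeding it the status code and Location header you see for each old URL gives you a quick list of anything that is a 302 or that lands on the homepage instead of the matching product page.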
4) HTTP Header Codes
Chances are that many developers will know what these HTTP header codes mean, but in relation to SEO, they may not know what effect they have or how the search engines treat them. There are lots and lots of status codes out there (did you know that the 418 status code means I’m a teapot?!), but as an SEO, you should certainly get to know the following HTTP headers well and know what impact they can have. For a great visual way to understand header codes, the guys at SEOgadget made an infographic, click on the image below to open the full infographic:
200 Success - This means that the page has loaded successfully. For the users and search engines, this means that the page is working fine and should be indexed and ranked.
301 Permanently Moved - This has been covered in more detail above but in summary, means that a page has permanently moved elsewhere. Both users and search engines are redirected, most link juice is passed across the redirect, and the old page is removed from the index.
302 Temporarily Moved - Again, this is described above, but means that a page has temporarily moved elsewhere. Users will not notice the difference between a 301 and a 302, but search engines will not pass link juice across it nor will they de-index the old URL.
404 Page Not Found - You are probably familiar with this one. For users and the search engines, this means that the page being requested could not be loaded. If an indexed page suddenly becomes a 404 page, over time the search engines will stop ranking it (from my experience and tests anyway).
Quick sidenote here from experience. Something I’ve come across a few times is a page that looks like a 404 to the user but, when you look at the HTTP header, shows a 200 Success code instead. This is not good and Google have classified these as soft 404s. The reason they are not good is that it’s difficult for you to spot these errors using server logs or Google Webmaster Tools. Although Google does try to spot them, it’s best not to rely upon Google to do this for you.
Best practice is to make sure that a 404 page actually returns a 404 HTTP header. By the way, you should check out the Distilled 404 page which our intern Andrew built.
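To show what "actually returns a 404" means in practice, here is a minimal sketch using Python’s built-in WSGI tools. The PAGES dictionary and the status_for helper are hypothetical stand-ins for a real site and a real HTTP client. The point is only that the friendly error page and the 404 status are sent together.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical page store standing in for a real site.
PAGES = {"/": b"<h1>Home</h1>"}


def app(environ, start_response):
    """Tiny WSGI app: known paths get 200, everything else a real 404."""
    body = PAGES.get(environ.get("PATH_INFO", "/"))
    if body is None:
        # The friendly error page is still sent, but with a 404 status line,
        # so crawlers and server logs see an error rather than a soft 404.
        status = "404 Not Found"
        body = b"<h1>Sorry, that page does not exist</h1>"
    else:
        status = "200 OK"
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def status_for(path):
    """Call the app directly (no server needed) and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = []
    app(environ, lambda status, headers: seen.append(status))
    return seen[0]
```

A soft 404 is what you would get if the first branch kept the "200 OK" status while still showing the error message.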
410 Page Permanently Not Found (officially 410 Gone) - I’m not sure why I’d use this rather than a 301 redirect, but there may be some good uses and it’s worth knowing how Google treat it.
500 Internal Server Error - This is a generic error page and isn’t very helpful as it doesn’t usually give much detail as to why the error occurred. You should definitely try and keep these errors to an absolute minimum.
503 Service Unavailable - Whilst this isn’t a code you should commonly use, it is useful to know if your site is temporarily down and will be back shortly. For example if you are relaunching a site or a new design, you may have to do this by taking the site offline. It’s better to return a 503 so that the search engines know to come back later. John Mu also confirmed Google’s position on this:
“Optimally, all such ‘the server is down’ URLs should return result code 503 (service unavailable). Doing that also prevents search engines from crawling and indexing the error pages :-). Sometimes I’m surprised at how many large sites forget to do this...”
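As a rough sketch of what that looks like, the Python below builds a maintenance response with a 503 status. The function names and the page text are made up for this example. The Retry-After header is a standard HTTP header, but it and the one hour value are additions for illustration and are not mentioned in John’s quote.

```python
RETRY_AFTER_SECONDS = 3600  # illustrative value only, not a recommendation


def maintenance_response(retry_after_seconds=RETRY_AFTER_SECONDS):
    """Build the status line, headers and body for a maintenance page."""
    body = b"<h1>We are relaunching and will be back shortly</h1>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Standard HTTP header hinting when to come back (an addition here).
        ("Retry-After", str(retry_after_seconds)),
    ]
    return "503 Service Unavailable", headers, body


def maintenance_app(environ, start_response):
    """WSGI app that answers every request with the 503 response above."""
    status, headers, body = maintenance_response()
    start_response(status, headers)
    return [body]
```

While the site is offline, every URL would be served by something like maintenance_app, for example through make_server from Python’s wsgiref.simple_server, and the normal application would be switched back in once the relaunch is done.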
5) Crawler Access
For me, restricting crawler access and optimising your crawl allowance is an overlooked part of SEO. Probably because it can be a bit hard to implement and it’s not really an exact science. To understand this and optimise it, you must first become comfortable with the concept of crawl allowance. Rand wrote a great post over on SEOmoz about this following some comments from Matt Cutts on the topic.
At a basic level, Google will crawl roughly based on PageRank as Matt Cutts has explained previously:
Bottom line for SEOs - Don’t think that Google will automatically crawl and index every page on your site. Whilst Google have made it obvious that they want to find every piece of information on the web, they do have limited resources and must be selective about which pages they crawl over and over.
The learning here is to make sure that you are not wasting your crawl allowance on pages which you do not care about. Try to make sure that when Google does crawl your site, they are spending time on your important pages. There are a number of ways to do this, just having a good site architecture is a pretty powerful way in itself:
Unfortunately, many SEOs are not always in the position of being able to work with developers and define a site architecture from scratch. Usually it’s a case of working with an existing site architecture and trying to fix and optimise what you can. There are a few ways to do this and you need to work with developers to use these techniques effectively.
Robots.txt - This is the first file that a search engine will request when they crawl your site. Within this file, they will try to see if there are any areas of your site or specific URLs that they should not crawl. There is some debate as to how strictly the search engines obey what’s contained in the robots.txt file, but I still think you should use it and feel it’s reliable in most cases. A robots.txt file typically looks something like this which is from Amazon.co.uk:
If you are unfamiliar with how to write a robots.txt file, it’s best to get comfortable with it prior to speaking with developers on the topic. Read this guide from Google and test on your own sites.
The action to take here is to take a good look at your site and decide which sections you would not like the search engines to crawl. Use some caution here though, as you don’t want to block pages by accident.
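One way to avoid blocking pages by accident is to test your rules before they go live. The sketch below uses Python’s built-in urllib.robotparser to check URLs against a set of rules. The rules themselves are hypothetical and are not taken from any real site, and is_crawlable is just a helper name invented for this example. Python’s parser also won’t necessarily interpret every rule exactly as each search engine does, so treat this as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not any real site's robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /basket/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Report whether the rules above allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.domain.co.uk/products/widget",
                "http://www.domain.co.uk/basket/checkout"):
        print(url, "->", "crawl" if is_crawlable(url) else "blocked")
```

Running a list of your most important URLs through a check like this is a cheap way of confirming that none of them are caught by a Disallow rule.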
Rel=Canonical - I’d never recommend using the rel=canonical tag on a new site. Some may disagree with me, but I see rel=canonical as a last resort in solving site architecture issues. If you can avoid these issues in the first place, do it. Don’t think of rel=canonical as a tool to help you.
The key reason for this is that this tag is not a strict rule. The search engines do not have to take notice of it and can change how they treat it at any time they want. Current evidence suggests that the search engines take notice of it and use it quite strongly. But I’d still advise against relying on it for the long term.
It is worth making your developers aware of this tag and making sure they know the implications of using it. It can be a great help in solving duplicate content issues, but at the same time, it can easily go wrong if you do not use it correctly. The big advantage to using rel=canonical from a development point of view is that (in theory) it is easier to implement than a 301 redirect:
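For reference, the tag is a single link element in the head of the duplicate page, pointing at the preferred URL. The Python sketch below shows a sample page containing the tag, along with a small checker that pulls the canonical URL out so you can confirm it points where you expect. The page, the URL and the names CanonicalFinder and find_canonical are all hypothetical and only there for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical duplicate page pointing at its preferred URL.
PAGE = """<html><head>
<link rel="canonical" href="http://www.domain.co.uk/product-name-12345">
</head><body>...</body></html>"""
```

Here find_canonical(PAGE) picks out the preferred URL, and a page with no tag gives None, which makes it easy to spot pages where the tag is missing or pointing at the wrong place.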
The bottom line here is that you can’t allow clients, developers or designers to build their entire site based on Flash elements. Enhancing a page using Flash is fine, but I would be mindful of how search engines can see it.
I think there are a few key takeaways:
- Love your developer! They can do some awesome work for you, so do not underestimate their ability to do cool stuff to help SEO
- Don’t assume they know everything - be prepared to help them understand the SEO implications of their work
- Give developers credit where it’s due - if they make a change to a site and it helps results, tell them