Getting Started with .htaccess: a Guide for SEOs

As an SEO, you'll almost certainly have stumbled upon an Apache configuration or .htaccess file at some point. We frequently give clients recommendations about duplicate content and redirects, but the implementation itself can seem intimidating and complex. While there's definitely a learning curve if you want to really master .htaccess, I think it's enough for most SEOs just to have a rough idea of what they're looking at. This blog post won't turn you into an expert, but it should leave you a little more confident when talking with your clients or developers, or looking over their Apache configurations. You'll be better able to understand what you should be looking at and what the different redirects actually look like, and you'll have a sense of how difficult or easy they are to implement.

When you open up a .htaccess or httpd.conf file, there could be many, many different options in there. We only need to focus on lines that contain the word Rewrite: RewriteRule, RewriteBase, and so on.

Start the Engine

Open the file, and look for the following line first:

RewriteEngine On

Simply put, all this line does is turn the rewrite engine on (as you probably guessed). If we have RewriteEngine Off, none of the rewrite rules will take effect. Rather like accidentally blocking your whole site in your robots.txt file, it's possible that none of your redirects are taking effect because RewriteEngine is set to Off, or because this line is missing altogether, so it's worth a quick check.

The other detail you should notice about this line is that we have just a single space between the two words. We don't need to use any special characters, braces, or parentheses. You won't see anything like RewriteEngine=On or RewriteEngine("On"), as you might expect in some programming languages. The single whitespace character is enough to tell Apache that we want the RewriteEngine directive to take the value On.

Set a base

Next up is RewriteBase. Let's say we're running on Apache, and we have the following lines in our configuration file:

RewriteEngine On
RewriteBase /
RewriteRule page1.html page2.html

Again, there's just a single space between RewriteBase and /; it's working in the same way as before. RewriteBase is the directive, and / is the value we're setting (the 'base'). What we're saying here is "interpret the following rules as relative to this base". / is the root folder, so what we have above is equivalent to:

Original URL: /page1.html

Rewritten URL: /page2.html

We set the base to be the root folder, and then on the following lines we started mentioning page names (page1.html and page2.html). They're interpreted as being in the root folder. Let's change the example slightly:

RewriteBase /some-folder/
RewriteRule page1.html page2.html

Notice we didn't need to change anything in the second line, but the base is now `/some-folder/`. Now our rule has the following effect:

Original URL: /some-folder/page1.html

Rewritten URL: /some-folder/page2.html

Don't worry yet about the rule itself; just notice the effect of changing `RewriteBase`. From what we know so far, we can start to build a .htaccess or httpd.conf file with just the following two lines:

RewriteEngine On
RewriteBase /

Getting started with rules

So the rewrite rules are in effect, and any URLs are relative to the root folder. Now it's time to actually write the rules! Let's start with one we've seen already:

RewriteRule page1.html page2.html

The format of this line is pretty simple: there's the name of the directive (RewriteRule), the URL we want to rewrite (page1.html), and finally what we want to rewrite it to (page2.html). Again, these arguments are separated by spaces.

One thing we need to be aware of here is that we haven't yet redirected anything: we're just rewriting the URL. As far as the user (or Google) is concerned, they've asked for page1.html, and they got something back. There was no redirect; there was no indication that anything was wrong with their request, or that anything had moved. Specifically, the content we gave them was whatever was on page2.html. The interaction is a bit like this:

Google: Show me all the content on page1.html

Server: Sure, here you go. (provides the content of page2.html)

Now maybe this is what you want, but as SEOs we're probably only interested in the RewriteRules when we're trying to reduce duplicate content. We need to make one further amendment to our RewriteRule. Let's make it a 301 redirect by adding just one more argument:

RewriteEngine On
RewriteBase /
RewriteRule page1.html page2.html [R=301]

A 302 redirect, of course, would be identical but for the final argument, which would be [R=302]. Now if Googlebot asks for page1.html, the exchange with the server is like so:

Google: Show me the content on page1.html

Server: Nope, it has 'permanently moved' -- you want page2. Ask me for that.

Google: Show me page2 then.

Server: OK, here you go.

So now we have some understanding of how this can be useful to us as SEOs. If a client is using Apache, we already have an idea of basic things we can check:

- Is RewriteEngine On?

- Does the value of RewriteBase make sense together with the rewrite rules?

- Are redirects implemented where they ought to be, or are we just rewriting?

- Are there any 302 redirects which should be replaced with 301s?
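On that last point, there's one detail to watch for when reading someone else's file: an R flag with no status code gives a 302, so a temporary redirect won't always be spelled out for you. Here's a short sketch, reusing the page names from the earlier examples:

```apache
RewriteEngine On
RewriteBase /

# No status code given: Apache sends a 302 (temporary) redirect
RewriteRule page1.html page2.html [R]

# Status code stated explicitly: a 301 (permanent) redirect
RewriteRule page3.html page4.html [R=301]
```

If you spot a bare [R] on a page that has moved for good, that's a candidate for changing to [R=301].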

And we now know how to set up a 301 redirect between any two pages on our site. We can add multiple RewriteRule lines too:

RewriteEngine On
RewriteBase /
RewriteRule oldpage.php newpage.php [R=301]
RewriteRule [R=301]
RewriteRule page3.html page4.html [R=301]

We could stop there and we'd have a basic, functional understanding of Apache rewrites. No matter how strange the syntax gets, all rewrite rules work the same way: here's a URL to be rewritten, here's the URL we're changing it to, and (optionally, in square brackets), here are some special options to take note of (such as the redirect type in my example).
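The square brackets can hold more than one option, separated by commas. As a minimal sketch (the page names are just the ones from the example above), two flags you're likely to come across in real files are L, which tells Apache to stop working through the remaining rules once this one has matched, and NC, which makes the match case-insensitive:

```apache
RewriteEngine On
RewriteBase /

# R=301 sends a permanent redirect; L means no further rules are applied after a match
RewriteRule oldpage.php newpage.php [R=301,L]

# NC ignores case, so PAGE3.HTML and Page3.html are redirected too
RewriteRule page3.html page4.html [R=301,NC,L]
```

You don't need to memorise the flags. It's enough to recognise that everything inside the brackets is an option for that one rule.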

Scaling with Regular Expressions

If we have a huge site though, this is going to get tedious very quickly. We're adding one new line for every single rewrite. If we need to redirect thousands of pages, this isn't a scalable way to do it. To make our rules more powerful we can use regular expressions. Now a full tutorial about regular expressions is beyond the scope of this blog post, so I'm just going to give a gentle introduction and a couple of examples. First, let's imagine we have 9999 pages that we want to redirect. They all begin with the word old. So we've got old0001, old0002...old9999. We want to preserve the number, and replace the word old with new.

What do we need to do this?

- We need a way to identify all URLs that begin with the word old

- We need a way to save the page number

- We need a way to refer to that number again when we're making the new URL

Let's deal with these one at a time. To find a word at the beginning of the relative URL, we use the special character ^:

RewriteRule ^old0001 new0001 [R=301]

That's saying: find all URLs which begin with old0001, and send them a 301 redirect to new0001. So we've made an improvement already, but we can go further. We need to save the number from the original URL, and we do that simply by putting parentheses around it:

RewriteRule ^old(0001) new0001 [R=301]

We've saved the number, but we still haven't actually used it anywhere. When we place parentheses around a portion of a URL, Apache stores that portion in a special variable, which you can refer to again by writing $1. We can actually now write the following:

RewriteRule ^old(0001) new$1 [R=301]

Let's stop for a second to break that down:

- The ^ character says we're looking at the beginning of the URL

- If the URL begins with old0001, we have a match

- The parentheses around 0001 save it into a special variable called $1

- To repeat that number in the new URL, we can just put $1 wherever we want the number to be

This is all great, but we still have to write 9999 lines! The final step is to come up with a pattern that will match any number, not just 0001 specifically. There's a special pattern to match any single digit, and it's \d. Obviously if we just typed d it would match the letter d only. The backslash just before it tells Apache to interpret this character in a special way - namely to match any digit. So let's revise our rule again:

RewriteRule ^old(\d\d\d\d) new$1 [R=301]

That will achieve our objective, but rather than typing \d out four times in a row like that, we can use curly braces to specify how many times the pattern should match. In this case, the pages we're interested in have exactly four numeric digits. So we need \d (which, remember, represents a single numeric digit) to be found four times in a row. Here's how we do that:

RewriteRule ^old(\d{4}) new$1 [R=301]

Now this one rule will find any page beginning with the word old followed by a four-digit number. It will save that number into a special variable. Then it'll use a 301 redirect to send the visitor to a new page. The new page is on the same site, in the same folder, and all that's changed about it is that the word old in the URL has been swapped out for the word new. The four-digit number remains the same.
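One thing worth knowing: ^ only pins the pattern to the start of the URL, and nothing in the rule we ended up with says where the match has to end. That means a longer URL such as old00012 (a made-up example) would also match, and would be redirected to new0001. If that matters on your site, a stricter sketch adds the special character $, which marks the end of the URL:

```apache
RewriteEngine On
RewriteBase /

# ^ anchors the start and $ anchors the end,
# so only "old" followed by exactly four digits will match
RewriteRule ^old(\d{4})$ new$1 [R=301]
```

With this version, old0001 is still redirected to new0001, but old00012 is left alone.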

Now take a look at the following rule:

RewriteEngine On
RewriteBase /en-GB/
RewriteRule ^olddir/page-\d{2}-(.*) dir/$1 [R=302]

You may not understand everything, but you do have enough information to get a sense of what's happening. Step-by-step:

- Both our old URL and new URL are within the en-GB/ directory

- Within en-GB/, we're matching URLs in a folder called olddir/

- and the page name begins with page-

- Followed by a digit \d

- Actually, make that {2} digits

- Then another dash -

- And some stuff in parentheses `()`. We don't yet know how to interpret `.*`, but don't worry about that.

- The redirect goes to another folder called dir

- And whatever we matched in the parentheses earlier is appended to the new URL: $1

- And there's a 302 redirect taking place. As an SEO looking at this file, you might ask yourself whether it should in fact be a 301.

If nothing else, I hope that persuades you of the efficiency of regular expressions -- look how many lines of English it takes to say the same thing! Here's an example of the rule in action:

Original URL: /en-GB/olddir/page-01-something

Redirected URL: /en-GB/dir/something

We can see how, if we had a lot of pages with ugly or erroneous URLs, we could start to put together a rule to redirect them to improved versions. For example, let's say you've made a mistake implementing hreflang, and you have a directory on your site called /en-UK/, targeting English-speaking customers in Britain. You get links to all the pages in the folder, and things are going well before you realise that en-UK isn't a valid hreflang code. You fix your hreflang tags of course, but now you also want to correct the mistake in the URL, for clarity. Think through it systematically:

- The en-UK folder is in the site root folder, so RewriteBase should be /.

- We don't want to redirect unless en-UK is at the beginning of the relative URL. We don't want to redirect a blog post at `/blog/why-en-UK-is-not-valid`. So the special character ^ is needed.

- We'll need to save the remainder of the URL with (), so that we can re-use it.

- The new folder will be called en-GB, and whatever the rest of the URL was should be re-used here as $1

- Naturally, we want the redirect type to be 301.

That's enough to give us this to start with:

RewriteEngine On
RewriteBase /
RewriteRule ^en-UK/() en-GB/$1 [R=301]

This is complete but for one detail: the parentheses are empty, and so is the $1 variable. Our last challenge then, is to match everything else in the URL. In this particular example, we want to be as inclusive as possible, so it doesn't matter how long our URLs are, or whether they contain letters, digits, or special characters. For this, we need the pattern we briefly met earlier: .*

Broken down, this means:

 - . = "match any single character"

 - *  = "0 or more times"

And our final rule looks like this:

RewriteEngine On
RewriteBase /
RewriteRule ^en-UK/(.*) en-GB/$1 [R=301]
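Now that we know . means "any single character", it's worth revisiting our very first rule. In page1.html the dot is a wildcard too, so that pattern would also match a URL like page1xhtml (a made-up example). It rarely causes trouble in practice, but a stricter version puts a backslash in front of the dot to make it a literal full stop. Here's a sketch:

```apache
RewriteEngine On
RewriteBase /

# \. matches a literal dot rather than "any single character"
RewriteRule ^page1\.html page2.html [R=301]
```

Only the first argument is a regular expression, so the dot in page2.html doesn't need the same treatment.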

These special characters, such as . and *, can be used independently. You can start to combine them with the other options you've already encountered:


 - .{2} = match any character, exactly two times

 - \d* = match any digit, 0 or more times

 - d* = match the letter ‘d’, 0 or more times

 - \d{2}.* = match any two-digit number (\d{2}), immediately followed by 0 or more occurrences of any character (.*)
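Parentheses combine with patterns like \d{4} and .* too, and you aren't limited to one pair: a second set of parentheses is saved as $2. As a sketch, assume a hypothetical site that is moving dated posts from an archive folder to a blog folder (both folder names are invented for illustration):

```apache
RewriteEngine On
RewriteBase /

# $1 = the four-digit year, $2 = everything after the slash that follows it
RewriteRule ^archive/(\d{4})/(.*) blog/$1/$2 [R=301]
```

Here /archive/2013/some-post would be redirected to /blog/2013/some-post.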

Next Steps

If you want to get truly comfortable with regular expressions, use the building blocks we've accumulated so far, and gradually build them into something more complex and powerful. 

This page is a good reference for all of the building blocks you should need, but doesn’t provide much explanation. If you’re looking for more of a tutorial, try any of the following (they’re all suitable for complete beginners):

  • RegexOne - great even for complete beginners. Interactive, free, and no signup required.

  • CodeSchool - video lessons and interactive exercises.

  • Codecademy - another really good regex resource, but it’s better if you have some familiarity with JavaScript.

  • Learn Regex the Hard Way - OK, this one is still an ‘alpha release’, i.e. not finished yet. But it’s part of a great series of free online books that aims to dive a little deeper into these topics than most other online courses do, so it looks promising. Bookmark it for now.

There are several really nice regex testing tools online, usually written in JavaScript, so that you can see straight away whether your regular expression is working. A small nuisance, though, is that they often require you to put a backslash before every forward slash, like this: \/. Of course, you don’t need to do this in .htaccess, and it can quickly become confusing. One tool that doesn’t have this requirement is RegexPlanet. Here’s an example where I tested the en-UK pattern:

By looking in the “@array = $input =~ $regex” column, you can see all the groups that I’ve matched with parentheses (in this case just the word “something”). When you’ve found a regex pattern that works, you can copy and paste the regular expression -- not the “Perl regex object” -- straight into your rewrite rule.

If you know any other good resources or tricks for .htaccess files, please mention them in the comments below.

About the author
Stephan Solomonidis

I originally studied the piano at Trinity College of Music while tinkering with Python projects and freelance web development on the side. I came across an ad for a job at Distilled completely by accident, applied, and now work in the London...