Interactive: robots.txt

Total time: 1h 10m
Rating: 4.3 / 5

Basic exclusion

Time required: 10m
Teacher: Will Critchlow

There are several ways to control the behavior of search engine crawlers; you can learn more about the alternatives in our technical SEO module. Robots.txt is a plain-text file found in the root of a domain (e.g. www.example.com/robots.txt). It is a widely acknowledged standard that lets webmasters control all kinds of automated consumption of their site, not only by search engines.
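Because the file always sits at the root of the domain, its address can be derived from any page URL on the same host. The snippet below is a minimal sketch of that idea; the function name robots_txt_url and the sample URL are illustrative assumptions, not part of the lesson.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep only the scheme and host: the file lives at /robots.txt in the
    # root, so the original path, query string and fragment are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL used purely for illustration.
print(robots_txt_url("https://www.example.com/shop/item?id=7"))
# prints: https://www.example.com/robots.txt
```

Each host name has its own robots.txt, so a subdomain such as blog.example.com would be looked up separately from www.example.com.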

Robots.txt is one of the more accessible areas of SEO: as well as reading about the protocol, you can look at any site's robots.txt file. Once you have completed this module, it is worth making sure you understand the robots.txt files of some large sites (for example Google and Amazon).

What you will learn in this module:

  • How to block all robots from certain areas of your site
  • How to restrict your robots.txt instructions to apply only to certain robots
  • How to override exclusion directives to allow access to certain areas of your site despite exclusion rules
  • How to use wildcards to apply your rules to whole swathes of your site
  • Other robots.txt syntax such as sitemap file directives

The most common use case for robots.txt is to block robots from accessing specific pages. The simplest version applies the rule to all robots with a line saying User-agent: *. Subsequent lines contain specific exclusions that work cumulatively, so a User-agent: * line followed by the line Disallow: /secret.html blocks all robots from accessing /secret.html.
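One way to see this rule in action is to feed it to the robots.txt parser in Python's standard library. This is a minimal sketch, not part of the lesson itself: the helper is_allowed, the user agent name ExampleBot and the path /public.html are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The basic exclusion described above: one rule that applies to all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /secret.html
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Report whether user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# "ExampleBot" is a hypothetical crawler name; it falls under User-agent: *.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "/secret.html"))  # False
print(is_allowed(ROBOTS_TXT, "ExampleBot", "/public.html"))  # True
```

Paths that no rule mentions stay accessible, which is why /public.html is reported as allowed.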

Add another rule to block access to /secret2.html in addition to /secret.html.



Clue: Add another “Disallow:” directive that blocks robots from /secret2.html.

Exclude directories

Time required: 5m
Teacher: Will Critchlow

Allow specific paths

Time required: 5m
Teacher: Will Critchlow

Restrict to specific user agents

Time required: 5m
Teacher: Will Critchlow

Add multiple blocks

Time required: 5m
Teacher: Will Critchlow

Use more specific user-agents

Time required: 10m
Teacher: Will Critchlow

Basic wildcards

Time required: 5m
Teacher: Will Critchlow

Block certain parameters

Time required: 5m
Teacher: Will Critchlow

Match whole filenames

Time required: 10m
Teacher: Will Critchlow

Add an XML sitemap

Time required: 5m
Teacher: Will Critchlow

Add a video sitemap

Time required: 5m
Teacher: Will Critchlow
