There are a variety of ways to control the behavior of search engine crawlers; you can learn more about the alternatives in our technical SEO module. Robots.txt is a plain-text file found in the root of a domain (e.g. www.example.com/robots.txt). It is a widely acknowledged standard that allows webmasters to control all kinds of automated consumption of their site, not only by search engines.
Robots.txt is also one of the more accessible areas of SEO: in addition to reading about the protocol, you can inspect any site's robots.txt file directly. Once you have completed this module, you will find it valuable to study the robots.txt files of some large sites (for example, Google and Amazon).
The most common use case for robots.txt is to block robots from accessing specific pages. The simplest version applies the rule to all robots with a line reading User-agent: *. Subsequent lines contain specific exclusions that work cumulatively, so the code below blocks robots from accessing /secret.html.
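    User-agent: *
    Disallow: /secret.html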
Add another rule to block access to /secret2.html in addition to /secret.html.
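One possible solution is to add a second Disallow line under the same User-agent block, since the exclusions apply cumulatively:

    User-agent: *
    Disallow: /secret.html
    Disallow: /secret2.html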