
Robots.txt File – What Is It? How to Use It?


In short, a Robots.txt file controls how search engines access your website.

This text file contains “directives” that tell search engines which parts of your website to “Allow” and which to “Disallow” them from accessing.


Screenshot of our Robots.txt file
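In case the screenshot doesn’t render here, a very small robots.txt of that kind might read as follows (the path is purely illustrative, not necessarily our exact rules):

    User-agent: *
    Disallow: /cart/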


Adding the wrong directives here can negatively impact your rankings, as it can prevent search engines from crawling certain pages (or your entire website).




What are “Robots” (in regards to SEO)?

Robots are applications that “crawl” through websites, documenting (i.e. “indexing”) the information they find.

In regards to the Robots.txt file, these robots are referred to as User-agents.

You may also hear them called:

  • Spiders
  • Bots
  • Web Crawlers

These are not the official User-agent names of search engine crawlers. In other words, you would not “Disallow” a “Crawler”; you need to use the official name of the search engine’s crawler (Google’s crawler is called “Googlebot”).
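For example, to address Google’s crawler you would use its official User-agent name (the path below is just an illustration):

    User-agent: Googlebot
    Disallow: /example-directory/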

You can find a full list of web robots here.




These bots are influenced in a number of ways, including the content you create and links pointing to your website.

Your Robots.txt file is a means to speak directly to search engine bots, giving them clear directives about which parts of your site you want crawled (or not crawled).


How to Use a Robots.txt File

You need to understand the “syntax” used to create your Robots.txt file.

1. Define the User-agent

State the name of the robot you are referring to (e.g. Google, Yahoo, etc.). Again, you will want to refer to the full list of user-agents for help.

2. Disallow

If you want to block access to pages or a section of your website, state the URL path here.

3. Allow

If you want to unblock a URL path within a blocked parent directory, enter that URL subdirectory path here.
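Putting the three parts together, a directive block might look like this (the paths are hypothetical):

    User-agent: Googlebot
    Disallow: /private/
    Allow: /private/annual-report.html

Here Googlebot is blocked from everything under /private/ except the single page listed on the “Allow” line.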


Wikipedia’s Robots.txt file.


In short, you can use robots.txt to tell these crawlers, “Crawl these pages, but don’t crawl these other ones.”


Why Robots.txt Is So Important

It may seem counterintuitive to “block” pages from search engines, but there are a number of reasons and instances to do so:


1. Blocking sensitive information

Directories are a good example.

You’d probably want to hide directories that may contain sensitive data, such as the following (a sample snippet appears after this list):

  • /cart/
  • /cgi-bin/
  • /scripts/
  • /wp-admin/
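A sketch of how those blocks might look for all crawlers (adjust the paths to match your own site’s structure):

    User-agent: *
    Disallow: /cart/
    Disallow: /cgi-bin/
    Disallow: /scripts/
    Disallow: /wp-admin/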


2. Blocking low quality pages

Google has stated numerous times that it’s important to keep your website “pruned” of low-quality pages. Having a lot of garbage on your site can drag down performance.

Check out our content audit for more details.


3. Blocking duplicate content

You may want to exclude any pages that contain duplicate content. For example, if you offer “print versions” of some pages, you wouldn’t want Google to index duplicate versions as duplicate content could hurt your rankings.
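For example, if your print versions live under a dedicated path (the /print/ folder here is hypothetical), a single directive keeps crawlers out of all of them:

    User-agent: *
    Disallow: /print/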

However, keep in mind that people can still visit and link to these pages, so if the information is the type you don’t want others to see, you’ll need to use password protection to keep it private.

After all, some pages probably contain sensitive information that you don’t want showing up on a SERP.


Robots.txt Formats for Allow and Disallow

Robots.txt is actually fairly simple to use.

You literally tell robots which pages to “Allow” (which means they can crawl and index them) and which ones to “Disallow” (which they’ll skip).

You’ll use the latter to list the paths you don’t want spiders to crawl, one path per line. The “Allow” command is only needed when you want a page to be crawled even though its parent directory is “Disallowed.”

Here’s what the robots.txt file for my website looks like:


Example of a robots.txt file


The initial User-agent command tells all web robots (i.e. *) – not just the crawlers of specific search engines – that these instructions apply to them.
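For contrast, a file can also include a group aimed at one named crawler; the “*” group then covers every robot that doesn’t have a more specific group of its own (the paths below are hypothetical):

    # Applies to any robot without a more specific group below
    User-agent: *
    Disallow: /cart/

    # Googlebot follows only this group, not the one above
    User-agent: Googlebot
    Disallow: /scripts/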


How to Set Up Robots.txt for Your Website

First, you will need to write your directives into a plain text file named robots.txt.

Next, upload the text file to your site’s top-level (root) directory – this can be added via cPanel.


Adding robots.txt via cPanel

Your live file always sits at the root of your domain, immediately after the trailing slash. Ours, for example, is located at https://webris.org/robots.txt.

If it were located at webris.org/blog/robots.txt instead, crawlers wouldn’t even bother looking for it, and none of its directives would be followed.

If you have subdomains, make sure they have their own robots.txt files as well. For example, our training.webris.org subdomain has its own set of directives – this is incredibly important to check when running SEO audits.
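In practice, that means each hostname is checked for its own file, for example:

    https://webris.org/robots.txt
    https://training.webris.org/robots.txt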


Testing Your Robots.txt file

Google offers a free robots.txt Tester tool that you can use to check your directives.


Google’s robots.txt Tester


It is located in Google Search Console under Crawl > Robots.txt Tester.
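If you prefer to sanity-check directives with a script, Python’s standard library ships a robots.txt parser. Below is a minimal sketch using urllib.robotparser with hypothetical rules and URLs; note that this parser applies the first matching rule, so the “Allow” exception is listed before the broader “Disallow”:

    from urllib import robotparser

    # Hypothetical directives; you could also point at a live file
    # with set_url("https://example.com/robots.txt") followed by read()
    rules = [
        "User-agent: *",
        "Allow: /wp-admin/admin-ajax.php",  # exception first (first match wins here)
        "Disallow: /wp-admin/",
    ]

    parser = robotparser.RobotFileParser()
    parser.parse(rules)

    # True means the path is crawlable for that user-agent, False means blocked
    print(parser.can_fetch("*", "https://example.com/wp-admin/settings.php"))    # False
    print(parser.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php"))  # True
    print(parser.can_fetch("*", "https://example.com/blog/"))                    # True

Google’s own tester remains the authority on how Googlebot interprets your file; a script like this is just a quick local check.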


Putting Robots.txt to work for improved SEO

Now that you understand this important element of SEO, check your own site to ensure search engines are indexing the pages you want and ignoring those you wish to keep out of SERPs.

Going forward, you can continue using robots.txt to tell search engines how to crawl your site.


