Definition of ROBOTS.TXT:
‘A robots.txt file stops search engine crawlers from indexing those web pages which do not add any business value to your website. In other words, it instructs search engine bots exactly how to crawl and index your website pages.’
Source: Seo Topper YouTube tutorial
A robots.txt file helps the major search engines, like Google, Bing and Yahoo, to crawl and index your web pages properly.
We can use the /robots.txt file to give web robots a set of instructions about our website domain. This convention is known as ‘The Robots Exclusion Protocol’.
Basically, when a robot visits a website domain, for example http://www.website.com/page.html, it first checks for any instructions at http://www.website.com/robots.txt, and only then goes ahead to crawl the website.
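As a rough illustration of that first step, here is a minimal Python sketch that works out where a robot would look for robots.txt, given any page URL on a site. The helper name robots_url is made up for this post, and only the standard library is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt location for the site that hosts page_url.

    robots_url is a hypothetical helper name used only for this illustration.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap in /robots.txt, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.website.com/page.html"))
```

Running it on the page URL from the paragraph above gives the robots.txt address mentioned there.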
Usually, the robots.txt file allows all major search engines to crawl and index the whole website, but some content needs to stay private, so we can disallow search engine bots from crawling it.
Usually, there are some folders in our website domain which we do not want search engines to visit. Some of these folders are listed in the example below.
A perfect example of a robots.txt file:
# See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file
#
# To ban all spiders from the entire site uncomment the next two lines:
# User-Agent: *
# Disallow: /
User-agent: *
Disallow: /account
Disallow: /cache/
Disallow: /components/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /tmp/
In the above example:
‘User-agent’:
It names the search engine bot (also called a spider or crawler) that the rules apply to.
‘*’ (asterisk):
It means the rules apply to all search engine bots or spiders. Each major search engine has several types of crawler. In Google's case there are mainly five types of crawler: Googlebot, Googlebot-Mobile, Googlebot-Image, Mediapartners-Google and AdsBot-Google.
Googlebot, Bingbot and Yahoo's Slurp are some of the major search engine bots.
Disallow: Do not crawl any page whose path starts with the given value
Allow: Can crawl the pages whose path starts with the given value
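To see how a crawler might read the rules in the example above, here is a small sketch using Python's built-in urllib.robotparser module. The test paths (/account/login, /tmp/session.txt, /index.html) and the helper name is_allowed are made up for illustration, and real search engines may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The active rules from the example file above (comment lines left out).
EXAMPLE_RULES = """\
User-agent: *
Disallow: /account
Disallow: /cache/
Disallow: /components/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /tmp/
"""


def is_allowed(path, user_agent="Googlebot"):
    """Hypothetical helper: may user_agent crawl path under EXAMPLE_RULES?"""
    parser = RobotFileParser()
    parser.parse(EXAMPLE_RULES.splitlines())
    return parser.can_fetch(user_agent, path)


# Assumed sample paths, chosen only to exercise the rules.
for path in ["/account/login", "/tmp/session.txt", "/index.html"]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Because the rules sit under ‘User-agent: *’, the same answers come back whichever bot name is passed in.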
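And don’t forget to put a ‘/’ (forward slash) after the colon (:) in the Disallow line when you want to block the whole site. Otherwise it will work in just the opposite way: ‘Disallow: /’ blocks everything, while ‘Disallow:’ with nothing after the colon allows everything.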
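The difference the slash makes can be checked with Python's built-in urllib.robotparser module. This is only a sketch: the helper name blocks_everything and the test path /any/page.html are invented for the demo.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(rules_text):
    """Hypothetical helper: True if the rules stop a bot from crawling a sample page."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    # /any/page.html is an assumed path standing in for "any page on the site".
    return not parser.can_fetch("*", "/any/page.html")


with_slash = "User-agent: *\nDisallow: /"
without_slash = "User-agent: *\nDisallow:"

print("With the slash, the site is blocked:", blocks_everything(with_slash))
print("Without the slash, the site is blocked:", blocks_everything(without_slash))
```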
I hope this blog helps you create a robots.txt file for your website. If you have any questions, please post them in the comment box below.