layout spacer image
layout spacer image
layout spacer image   HOSTY.NETWHM master reseller logo seperatorHosting Support The Way It Should Be. Quick and Easy.
master reseller hosty.net face logo thin slice
Whm reseller header bar thin slice
layout spacer image Welcome
      To Simplicity
layout spacer image
Home
layout spacer image
Hosting Plans
layout spacer image
Reseller Plans
layout spacer image
Master Reseller Plans
layout spacer image
FAQ
layout spacer image
My Account
layout spacer image
Support
layout spacer image
layout spacer image
WHM alpha master hosting image
FrothTerse Plan
POWERFUL HOSTING
10gigs of space           100gigs of bandwidth Unlimited domains!
WHMCS RVSkin Master WHM Reseller Hosting
another hosting reseller image about whm reseller
MeanTerse Plan
POWERFUL HOSTING
25gigs of space           250gigs of bandwidth Unlimited domains!
whmreseller v3 master hosting image for selling whm
image about how to sell whm hosting
MaxTerse Plan
POWERFUL HOSTING
UNMETERED space          UNMETERED bandwidth Unlimited domains!
reseller WHM Master with fantastico and whmcs
 master whm login whm rvskin fantastico sitebuilder
layout spacer image
layout spacer image
layout spacer image layout spacer image layout spacer image layout spacer image
layout spacer image
layout spacer image
layout spacer image
cpanel hosting login, hosting reseller and master reseller login

Robots.Txt Explained

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works likes this: a robot wants to vists a Web site URL, say http://www.example.com/welcome.html. Before it does so, it firsts checks for http://www.example.com/robots.txt, and finds:

User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

  • robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
  • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don't try to use /robots.txt to hide information.

More details

Plese use some of the numerous external resources:


The rest of this page gives an overview of how to use /robots.txt on your server, with some simple recipes.

How to create a /robots.txt file

Where to put it

The short answer: in the top-level directory of your web server.

The longer answer:

When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html, it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT.

What to put in it

The "/robots.txt" file is a text file, with one or more records. Usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.

Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire server
User-agent: *
Disallow: /

To allow all robots complete access
User-agent: *
Disallow:

(or just create an empty "/robots.txt" file, or don't use one at all)

To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Google
Disallow:

User-agent: *
Disallow: /
To exclude all files except one
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/stuff/
Alternatively you can explicitly disallow all disallowed pages:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html

Content of this tage were taken from robotstxt.org

Add to: Mr. Wong Add to: Webnews Add to: Icio Add to: Oneview Add to: Yigg Add to: Linkarena Add to: Digg Add to: Del.icoi.us Add to: Reddit Add to: Simpy Add to: StumbleUpon Add to: Slashdot Add to: Netscape Add to: Furl Add to: Yahoo Add to: Blogmarks Add to: Diigo Add to: Technorati Add to: Newsvine Add to: Blinkbits Add to: Ma.Gnolia Add to: Smarking Add to: Netvouz Add to: Folkd Add to: Spurl Add to: Google Add to: Blinklist Information

Master WHMReseller WHM hosting page rank PR4 image
Master Reseller Hosting Hosty spacer image
Master Reseller Hosting Hosty spacer image Master Reseller Hosting Hosty spacer image Master Reseller Hosting Hosty spacer image