Old 02-14-2005, 05:51 PM   #1 (permalink)
serverunion
Guest
 
Posts: n/a
What is a robots.txt file and do I need it on my site?
Author: Nathan Drach

When reviewing your website usage statistics, you may have noticed requests for a file named “robots.txt”. This is a plain-text configuration file that robots (search engine spiders) read to learn which files and directories they should not index.

The file must be named robots.txt and placed in the root of your website (http://www.yourdomain.com/robots.txt), because that is the only location robots check for it.
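Because the file always lives at the root, its address can be worked out from any page URL on the same site. A minimal Python sketch of that idea follows; the function name robots_txt_url is made up for illustration and is not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yourdomain.com/billing/invoice.html"))
# prints http://www.yourdomain.com/robots.txt
```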

There are two components to the syntax of the robots.txt file structure: the User-agent line and the Disallow line.

Each line takes the form “field: value”.

User-agent
A list of the robots visiting your site may be viewed in the stats logs of your website. A listing of User-agents may be viewed at http://www.robotstxt.org/wc/active.html

A specific robot may be referenced in the User-agent syntax.

User-agent: googlebot

A wildcard may also be used to note all robots.

User-agent: *
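To see how a named robot and the wildcard are told apart, the rules can be fed to Python's standard urllib.robotparser module. This is only a sketch: "examplebot" is an invented robot name, and the two Disallow lines are there simply to make the two records behave differently.

```python
from urllib.robotparser import RobotFileParser

# One record for googlebot, one for every other robot.
RULES = """\
User-agent: googlebot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# googlebot matches its own record; "examplebot" falls through to "*".
googlebot_allowed = parser.can_fetch("googlebot", "/index.html")
other_allowed = parser.can_fetch("examplebot", "/index.html")
print(googlebot_allowed, other_allowed)  # prints False True
```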

Disallow
This section of the syntax states which files and directories should not be indexed.

If you have a page named “addresses.html”, and you do not want the robot to index it, add the following line after the User-agent syntax:

Disallow: /addresses.html

You may also limit the robot at the directory level. If the directory “/billing” in your website should be restricted from indexing, place the following text after the User-agent syntax.

Disallow: /billing/

A single slash matches every path, so it may be used to note all files on the site.

Disallow: /
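Disallow values are matched against the start of the requested path, which is why a directory rule covers everything beneath it. The sketch below checks a few paths with Python's standard urllib.robotparser; the helper is_allowed and the robot name "examplebot" are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules: str, path: str, agent: str = "examplebot") -> bool:
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


RULES = """\
User-agent: *
Disallow: /addresses.html
Disallow: /billing/
"""

for path in ("/addresses.html", "/billing/invoice.html", "/billing.html", "/index.html"):
    print(path, is_allowed(RULES, path))
# /addresses.html False
# /billing/invoice.html False
# /billing.html True
# /index.html True
```

Note that /billing.html stays reachable: the rule “/billing/” only covers paths that begin with the directory name followed by a slash.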

How do I make comments in my robots.txt file?
Comments may be placed in the robots.txt file by placing “#” before any text. An example is:

# specifies googlebot as the User-agent.
User-agent: googlebot

It is good practice to keep your comments on a separate line from the directive. Also, avoid white space at the beginning of directive lines.
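As a rough illustration of how a robot might read past comments, the Python sketch below drops everything from the first “#” onward and ignores lines that end up empty. The function name strip_comment is hypothetical, and real robots may be stricter or looser than this.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first '#' onward and trim white space."""
    return line.split("#", 1)[0].strip()


SAMPLE = """\
# specifies googlebot as the User-agent.
User-agent: googlebot
Disallow: /billing/  # a trailing comment is dropped too
"""

# Keep only the lines that still contain a directive.
directives = [d for d in map(strip_comment, SAMPLE.splitlines()) if d]
print(directives)
# prints ['User-agent: googlebot', 'Disallow: /billing/']
```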


Putting robots.txt to use:

Now that we know the functionality of robots.txt, let’s put it into practice.

If you want to allow ALL robots to visit ALL the pages on your website, use the “*” wildcard on the User-agent line and leave the Disallow line empty.

User-agent: *
Disallow:

To disallow all indexing of your site, enter the following text in your robots.txt file.

User-agent: *
Disallow: /

Disallow robots from indexing pages within a directory named “billing”.

User-agent: *
Disallow: /billing/

Disallow robots from indexing pages within directories named “billing” and “address”.

User-agent: *
Disallow: /billing/
Disallow: /address/

Disallow googlebot from your site.

User-agent: googlebot
Disallow: /

Disallow googlebot from accessing your contact page (contact-us.html).

User-agent: googlebot
Disallow: /contact-us.html

Disallow googlebot from accessing a contact page inside a directory (about-us/contact-us.html).

User-agent: googlebot
Disallow: /about-us/contact-us.html
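If you maintain several of these records, the file can also be generated instead of typed by hand. The Python sketch below is one possible approach, not a standard tool: build_robots_txt is an invented helper that takes a mapping of User-agent names to the paths each one should stay out of.

```python
def build_robots_txt(records: dict[str, list[str]]) -> str:
    """Render one User-agent record per entry, separated by blank lines."""
    blocks = []
    for agent, paths in records.items():
        lines = [f"User-agent: {agent}"]
        # An empty Disallow line means nothing is off limits for this agent.
        lines += [f"Disallow: {p}" for p in paths] or ["Disallow:"]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


text = build_robots_txt({
    "googlebot": ["/contact-us.html", "/about-us/contact-us.html"],
    "*": ["/billing/", "/address/"],
})
print(text)
```

Each record is separated from the next by a blank line, matching the layout of the examples above.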


Robots.txt in a nutshell:
Robots.txt is a useful tool for limiting robot access to parts of your site, such as work in progress and sensitive information. At this time the Robots exclusion standard working group does not endorse an allow directive that would push robots to index portions of the site. Indexing is still initiated by linking internally and externally to your web site.


About the author:
Nathan Drach is the founder of Server Union, LLC. ServerUnion is a US-based technology company providing web hosting solutions from big to small. For more information and articles visit: www.serverunion.com
Old 02-24-2005, 01:08 PM   #2 (permalink)
WHC Moderator
 
Join Date: Feb 2004
Location: Texas
Posts: 831
Nice article! For further reference, you might check out an article I did about the robots.txt file here:

Chat With Search Engine Spiders - Lockergnome.com
__________________
The Web Hosting Show - The Voice of the Web Hosting World
Think of it as talk radio mixed with Web hosting discussion for both Web hosts and Web hosting clients! New episode every Monday!
silverfreak is offline   Reply With Quote
Old 03-17-2005, 08:07 AM   #3 (permalink)
WHC Administrator
 
Join Date: Oct 2004
Location: http://kooshin.com
Posts: 4,195
I find it very useful and informative. Thanks for sharing.
__________________
For Reliable and Affordable Web Hosting Packages, Please visit kooshin.com
KooshinDesigns.com- Version One Online

ArticleStorage.com- Your #1 Resource For Articles
4Ulyrics.com -- The Lyrical Hideout, For Sale - Email me for details.
kooshin.com is offline   Reply With Quote
Old 04-12-2005, 03:04 PM   #4 (permalink)
serverunion
Guest
 
Posts: n/a
Works nicely to keep those sneaky SEs out of some content. Even if you have an empty robots.txt file, you will not see as many 404 errors, I have found.
Old 04-13-2005, 03:07 AM   #5 (permalink)
JeffEDH
Guest
 
Posts: n/a
Very good article, a lot of useful information! Nice work :-)