The Help Centers

Help, News and other information for success in life on and off the web.


robots.txt SciFi book OR Helpful website tool?

October 16th, 2007 · 1 Comment

When I go through clients’ web server error logs, I often find that their sites are missing a robots.txt file: every time a crawler asks for it, the server logs a “file not found” error. Most people never notice, and it’s one of those things that falls through the cracks in the excitement of building a website.

There are hundreds of ways to build a robots.txt file, but for the most part a simple text editor and some patience are all you need to create one quickly and easily.

First off, you don’t need to know much beyond what you want search engines to find and what you don’t want them to find.

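Once you have the list of files and folders, it’s time to build your robots.txt. It must sit at the root of your site (for example, http://www.example.com/robots.txt), because that is the only place crawlers look for it.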
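If your list of paths already lives somewhere else (a spreadsheet, a build script), a few lines of code can turn it into the file. Here’s a minimal sketch in Python; build_robots_txt is a hypothetical helper, not part of any library, and the paths and the example.com sitemap URL are placeholders. The lines it writes use the User-agent, Disallow and Sitemap commands covered in the syntax lesson.

```python
def build_robots_txt(disallow, sitemap=None, user_agent="*"):
    """Hypothetical helper: build robots.txt text from a list of blocked paths."""
    lines = [f"User-agent: {user_agent}"]
    for path in disallow:
        # Disallow values are URL-path prefixes, so they should start with "/".
        if not path.startswith("/"):
            raise ValueError(f"Disallow path must start with '/': {path!r}")
        lines.append(f"Disallow: {path}")
    if sitemap:
        # The Sitemap line takes a complete URL, not a relative path.
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_robots_txt(
        ["/cgi-bin/", "/folder/file.html"],
        sitemap="http://www.example.com/sitemap.xml",  # placeholder URL
    )
    print(text, end="")
```

Save the printed text as robots.txt and upload it to the root of your site. Rejecting paths without a leading slash is a design choice that catches a common typo before it reaches the server.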

A little syntax lesson: there are a few commands you can use in a robots.txt file.

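# starts a comment; crawlers ignore everything after it on that line.

User-agent: names the crawler that the following rules apply to. User-agent: * applies them to EVERY crawler.

Disallow: the path you don’t want crawled; any URL that begins with that path is blocked. Disallow: / blocks EVERYTHING, while an empty Disallow: blocks nothing.

Sitemap: the full URL of your sitemaps.org-formatted site map.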
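To see how a parser reads these commands, here’s a quick check using Python’s standard-library urllib.robotparser. It compares Disallow: /, the standard way to block a whole site, with an empty Disallow:, which blocks nothing. The allowed() helper and the crawler name AnyBot are made up for illustration, and this shows how Python’s parser behaves; each search engine’s crawler has its own.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, path, agent="AnyBot"):
    """Hypothetical helper: may `agent` fetch `path` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, path)


# The comment line is ignored by the parser.
block_all = "# keep every crawler out\nUser-agent: *\nDisallow: /\n"
allow_all = "User-agent: *\nDisallow:\n"

print(allowed(block_all, "/index.html"))  # False
print(allowed(allow_all, "/index.html"))  # True
```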

OK, now an example:
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /folder/file.html
Sitemap: http://www.thehelpcenters.com/sitemap.xml.gz

Now, to explain it line by line:

  1. The rules that follow apply to EVERY crawler.
  2. A comment; crawlers ignore it.
  3. Don’t let crawlers crawl /cgi-bin/ or anything inside it.
  4. Don’t let crawlers crawl /folder/file.html.
  5. Tell crawlers where to find your site map.
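If you’d like to double-check a robots.txt before uploading it, Python’s standard-library urllib.robotparser can parse the text and answer “may this crawler fetch this URL?” This sketch feeds it the example above. The crawler name AnyBot and the test paths are hypothetical, site_maps() needs Python 3.8 or later, and the answers reflect Python’s parser rather than any particular search engine.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from this post, verbatim.
EXAMPLE = """\
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /folder/file.html
Sitemap: http://www.thehelpcenters.com/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

# Hypothetical paths to test against the rules.
for path in ["/cgi-bin/counter.pl", "/folder/file.html", "/folder/other.html", "/index.html"]:
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "blocked"
    print(path, verdict)

# The Sitemap line is collected separately from the User-agent rules.
print(parser.site_maps())
```

Only the one named file in /folder/ is blocked; the rest of that folder stays crawlable.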

You can exclude folders or files from the crawlers and even specify which crawlers a rule applies to. Most of the time you don’t need a very complex robots.txt, and the little time you spend on it can reduce crawler bandwidth, keep duplicate or incorrect content out of the search engines, and help guide search engines to what you want included from your site. A few more examples:

For a FrontPage website:
User-agent: *
Disallow: /_private/
Disallow: /_borders/
Disallow: /_derived/
Disallow: /_fpclass/
Disallow: /_overlay/
Disallow: /_themes/
Disallow: /_vti_bin/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_pvt/
Disallow: /_vti_txt/

For a WordPress website:
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /stats/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-content/
Sitemap: http://www.thehelpcenters.com/sitemap.xml.gz
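Disallow values are matched as prefixes of the URL path, so the trailing slash matters. Running the WordPress example through Python’s urllib.robotparser shows this: /tag/ blocks every tag page but not a page at /tags/, and /contact/ does not block /contact without the slash. The post URLs below are hypothetical, and other crawlers’ parsers may differ in details.

```python
from urllib.robotparser import RobotFileParser

# The WordPress example from this post.
WORDPRESS_ROBOTS = """\
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /stats/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-content/
Sitemap: http://www.thehelpcenters.com/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(WORDPRESS_ROBOTS.splitlines())

# Hypothetical URLs on a WordPress blog.
paths = [
    "/wp-admin/options.php",   # starts with /wp-admin/ -> blocked
    "/tag/search-engines/",    # starts with /tag/ -> blocked
    "/tags/",                  # does not start with /tag/ -> allowed
    "/contact",                # no trailing slash, so /contact/ does not match -> allowed
    "/2007/10/16/robotstxt/",  # matches no rule -> allowed
]
for path in paths:
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "blocked"
    print(f"{path:24} {verdict}")
```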

If you want to dig a bit deeper, take a look at http://www.robotstxt.org/wc/norobots.html for more information.
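None of the examples above target a single crawler, so here’s a minimal sketch of what that looks like, again checked with Python’s urllib.robotparser. The Googlebot group, the paths, and the name SomeOtherBot are hypothetical. Under the robots.txt convention, a crawler that finds a group naming it follows only that group and skips the * group, so Googlebot is kept out of /cgi-bin/ only, while every other crawler is blocked from the whole site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for Googlebot, one for everyone else.
# The blank line separates the two groups.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "SomeOtherBot"):
    for path in ("/index.html", "/cgi-bin/search.pl"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:13} {path:20} {verdict}")
```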

Tags: Search Engines · Web Development


1 response so far ↓

  • 1 Let Adsense Crawl free! | Adsense Lane // Jun 17, 2008 at 3:57 pm

    [...] is the robots.txt - This is a little text file that helps web site crawlers find out more about your website and how [...]
