Files to Leave Out of robots.txt

The idea behind robots.txt is to keep honest robots out of certain areas of a web site. One file not to block in robots.txt is…robots.txt. Bots need to read robots.txt to learn their restrictions, and they expect to find it in the root directory of the web site.
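Because robots.txt is expected at the root, a crawler can work out where to look from any page URL on the site. The short Python sketch below shows the idea; `robots_url` is a hypothetical helper written for illustration, not part of any crawler's API, and www.example.com is a placeholder domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would check for page_url.

    robots.txt lives at the root of the host, so the page's path,
    query string, and fragment are thrown away.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Any page on the site maps to the same robots.txt location.
    print(robots_url("https://www.example.com/blog/post.html?id=7"))
```

The scheme, host, and port are kept because a robots.txt file applies to that exact host, so a site served on several hosts needs a file on each one.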

Another file to leave out of robots.txt is the sitemap file, generally sitemap.xml and/or sitemap.xml.gz. It is imperative that bots can read this file, because it lists the pages you want indexed and so pares down the crawling they need to do.

Other files not to block are verification files: the ones created to prove you control the web site. Some common services that use them are Google sitemap verification, McAfee SiteAdvisor, and Yahoo Site Explorer.
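One way to double-check a robots.txt before uploading it is to run it through Python's standard `urllib.robotparser` and ask whether the files above are still fetchable. This is a minimal sketch under stated assumptions: the robots.txt content, the domain, and the verification file name `/verify-placeholder.html` are all made up for illustration, and Python's parser uses simple prefix matching, so it only approximates how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: it blocks two private
# directories and nothing else.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
"""

# Files that should stay reachable by bots. The verification file name
# is a placeholder; each service assigns its own real name.
MUST_BE_ALLOWED = [
    "/robots.txt",
    "/sitemap.xml",
    "/sitemap.xml.gz",
    "/verify-placeholder.html",
]


def blocked_files(robots_txt: str, paths: list[str], agent: str = "*") -> list[str]:
    """Return the paths from `paths` that robots_txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths
            if not parser.can_fetch(agent, "https://www.example.com" + p)]


if __name__ == "__main__":
    # A correct file blocks none of the important files.
    print(blocked_files(SAMPLE_ROBOTS_TXT, MUST_BE_ALLOWED))
    # A mistaken rule: because rules match by prefix, blocking
    # /sitemap.xml also blocks /sitemap.xml.gz.
    mistake = SAMPLE_ROBOTS_TXT + "Disallow: /sitemap.xml\n"
    print(blocked_files(mistake, MUST_BE_ALLOWED))
```

The second check shows why it pays to test: a single Disallow line aimed at sitemap.xml quietly hides the compressed sitemap too.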

Doug

This entry was posted in Web Site.
