Web Search Help - Preventing Search Engine Indexing
Preventing search engines from indexing your site
What is a robots.txt file? Search engines automatically access web pages on the Internet through a robot program (also known as a spider). You can create a plain text file named robots.txt on your website and declare in it the parts of the site that you do not want robots to access. In this way, part or all of the site's content can be excluded from search engines, or you can specify that search engines include only certain content.
Where should the robots.txt file be placed? The robots.txt file should be placed in the root directory of the website. For example, when a robot visits a website (such as http://www.abc.com), it first checks whether the file http://www.abc.com/robots.txt exists. If the robot finds this file, it determines the scope of its access according to the file's content.
Website URL                 Corresponding robots.txt URL
http://www.w3.org/          http://www.w3.org/robots.txt
http://www.w3.org:80/       http://www.w3.org:80/robots.txt
http://www.w3.org:1234/     http://www.w3.org:1234/robots.txt
http://w3.org/              http://w3.org/robots.txt
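The mapping from a website URL to its robots.txt URL can be sketched in a few lines of Python. This is only an illustration, not part of the robots.txt convention itself: the function name robots_txt_url is a hypothetical helper, and the sketch assumes the input is an absolute http or https URL. It keeps the scheme, host, and port, and replaces the path with /robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(site_url: str) -> str:
    """Return the robots.txt URL for the site that serves site_url.

    Hypothetical helper for illustration; assumes an absolute URL.
    """
    parts = urlsplit(site_url)
    # Keep the scheme and host[:port]; replace the path and drop any
    # query string or fragment, since robots.txt lives at the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for site in (
        "http://www.w3.org/",
        "http://www.w3.org:80/",
        "http://www.w3.org:1234/",
        "http://w3.org/",
    ):
        print(site, "->", robots_txt_url(site))
```

Note that the port is preserved as written, so http://www.w3.org/ and http://www.w3.org:80/ produce different strings even though they usually name the same server.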
Format of the robots.txt file. The "robots.txt" file contains one or more records separated by blank lines (each line is terminated by CR, CR/NL, or NL). Each record has the following format:
Example 1. Prohibit all search engines from accessing any part of the website

    User-agent: *
    Disallow: /

Example 2. Allow all robots to access the site (or create an empty "/robots.txt" file)

    User-agent: *
    Disallow:

Example 3. Block one search engine

    User-agent: BadBot
    Disallow: /

Example 4. Allow only one search engine

    User-agent: Baiduspider
    Disallow:

    User-agent: *
    Disallow: /

Example 5. A simple example

In this example, the site restricts search engine access to three directories; that is, search engines will not access these three directories. Note that each directory must be declared on its own line; you cannot write "Disallow: /cgi-bin/ /tmp/". The * after "User-agent:" has a special meaning, representing "any robot", so records such as "Disallow: /tmp/*" or "Disallow: *.gif" cannot appear in this file.

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /~joe/
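To see how a robot might interpret the examples above, the following minimal sketch feeds the rules from Example 4 and Example 5 to Python's standard urllib.robotparser module. It is an illustration under assumptions, not a description of how any particular search engine works: the robot name "OtherBot", the page paths, and the helper load_rules are made up for the demonstration, and www.abc.com is the sample site used earlier.

```python
from urllib.robotparser import RobotFileParser

# Rules from Example 4: only Baiduspider may access the site.
EXAMPLE_4 = """\
User-agent: Baiduspider
Disallow:

User-agent: *
Disallow: /
"""

# Rules from Example 5: three directories are closed to every robot.
EXAMPLE_5 = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    site = "http://www.abc.com"

    rules = load_rules(EXAMPLE_4)
    # Baiduspider matches the first record; every other robot
    # falls back to the "*" record.
    print(rules.can_fetch("Baiduspider", site + "/index.html"))
    print(rules.can_fetch("OtherBot", site + "/index.html"))

    rules = load_rules(EXAMPLE_5)
    # Paths are compared as prefixes against each Disallow line.
    print(rules.can_fetch("OtherBot", site + "/index.html"))
    print(rules.can_fetch("OtherBot", site + "/tmp/cache.html"))
    print(rules.can_fetch("OtherBot", site + "/cgi-bin/search"))
```

In practice a robot would fetch the file from the site's robots.txt URL rather than use an embedded string; the strings here keep the sketch self-contained.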
robots.txt file references. For more specific settings of the robots.txt file, see the following links:
· Web Server Administrator's Guide to the Robots Exclusion Protocol
· HTML Author's Guide to the Robots Exclusion Protocol
· The original 1994 protocol description, as currently deployed
· The revised Internet-Draft specification, which is not yet completed or implemented