
About the Index / No Index

Would a search engine index a URL? The answer can be surprisingly complicated. A robots.txt file can indicate which URLs an agent is permitted to crawl, but this behaves more like the "follow" directive than the "index" directive. Although rare, an agent may decide to index a page it has been forbidden from crawling if it has enough high-quality information about the page (often gathered through external links) to deduce its content.
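
For reference, the crawl restriction itself lives in a plain-text robots.txt file at the site root. A minimal example might look like this (the /private/ path is purely illustrative):

    User-agent: *
    Disallow: /private/

Note that Disallow only tells a compliant agent not to fetch those URLs; on its own it does not guarantee they stay out of the index.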

Most crawler agents respect the robots meta tag within a page's HTML head. This approach works well if the content you are blocking from a crawler is HTML, but what if it's an image or text file you don't want indexed? To get around this, some agents, such as Googlebot and Bingbot, also read the X-Robots-Tag HTTP header. The header can be set using .htaccess rules and does not require any specific file format. If you block an agent from a page using robots.txt, however, that agent will never see the HTTP headers or meta tags.
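
For illustration, the meta tag form and the header form might look like the following; the Apache .htaccess rule assumes mod_headers is enabled, and the .pdf pattern is just an example:

    <meta name="robots" content="noindex, nofollow">

    <FilesMatch "\.pdf$">
      Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>

Both carry the same directives; the header simply lets you apply them to non-HTML files.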

Index/No Index reads the various cues available to a crawler agent and tells you whether a page is indexable.
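
As a rough sketch of the kind of cues involved (not the tool's actual implementation), a script could fetch a page and look for "noindex" in both the X-Robots-Tag header and the robots meta tag. This example assumes Python with the requests and BeautifulSoup libraries, and the URL is a placeholder:

    # Sketch: check two indexability cues on a fetched page.
    import requests
    from bs4 import BeautifulSoup

    def has_noindex(url):
        response = requests.get(url, timeout=10)

        # Cue 1: the X-Robots-Tag HTTP header.
        header = response.headers.get("X-Robots-Tag", "")
        if "noindex" in header.lower():
            return True

        # Cue 2: the robots meta tag in the HTML head.
        soup = BeautifulSoup(response.text, "html.parser")
        meta = soup.find("meta", attrs={"name": "robots"})
        if meta and "noindex" in meta.get("content", "").lower():
            return True

        return False

    print(has_noindex("https://example.com/"))  # placeholder URL

A real check would also consider robots.txt rules and agent-specific directives, as described above.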

For more information on setting up a robots.txt file, or on setting robots directives via the X-Robots-Tag HTTP header or the robots meta tag: