About the Sitemap Index Analyzer

Which pages in your sitemap are currently being indexed, and which aren't? Use the Sitemap Index Analyzer to find out.

The Sitemap Index Analyzer parses individual Sitemap files as well as Sitemap Indices, grouping URLs by domain, subdomain, directory, and whether they are parameterized. With a little input from you, the Sitemap Index Analyzer uses statistics to determine the number of indexed pages expected for each group of submitted URLs. A lower-than-expected number of indexed pages can be a sign of a problem.
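To make the grouping step concrete, here is a minimal sketch in Python. It is not the Sitemap Index Analyzer's actual code: the function names, the sample sitemap, and the choice to group by host, first directory, and presence of a query string are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sample sitemap, used only for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/tools/name.php?first=Ann&amp;last=Lee</loc></url>
  <url><loc>https://example.com/tools/name.php?first=Bob&amp;last=Ray</loc></url>
  <url><loc>https://example.com/about.html</loc></url>
  <url><loc>https://blog.example.com/2020/post.html</loc></url>
</urlset>"""


def extract_urls(sitemap_xml):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def group_key(url):
    """Assumed grouping: (host, first directory, has a query string)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # The last segment is treated as the page; anything before it is a directory.
    directory = "/" + segments[0] + "/" if len(segments) > 1 else "/"
    return (parts.netloc.lower(), directory, bool(parts.query))


def group_urls(urls):
    """Bucket URLs by their group key."""
    groups = defaultdict(list)
    for url in urls:
        groups[group_key(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    for key, members in group_urls(extract_urls(SAMPLE)).items():
        print(key, len(members))
```

Keeping the host in the key separates subdomains automatically, and the query-string flag keeps parameterized URLs apart from static pages in the same directory.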

Supply a sitemap or sitemap index file to the Sitemap Index Analyzer. After it groups the URLs, it suggests a Google query to determine how many of those URLs may be indexed by the search engine. Copy the number of search results found from the Google search results page into the text field. The Sitemap Index Analyzer will then determine the expected number of indexed pages for the other URL groupings. If the number you find for a group is lower than expected, the Sitemap Index Analyzer will issue a warning.
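The exact statistic behind the warning is not described here, so the following Python sketch shows only one plausible approach, under stated assumptions: take the indexing rate observed for one reference group, project it onto another group, and warn when the observed count falls well below that projection (a normal approximation to the binomial, with a hypothetical cutoff of z = 1.96). The function names and all the counts are illustrative.

```python
import math


def indexing_rate(reference_submitted, reference_indexed):
    """Observed share of indexed URLs in the reference group, capped at 1."""
    if reference_submitted <= 0:
        raise ValueError("reference group must contain at least one URL")
    # Search engines can report more results than were submitted, so cap the rate.
    return min(reference_indexed / reference_submitted, 1.0)


def expected_indexed(reference_submitted, reference_indexed, group_submitted):
    """Project the reference group's indexing rate onto another group."""
    return indexing_rate(reference_submitted, reference_indexed) * group_submitted


def is_lower_than_expected(reference_submitted, reference_indexed,
                           group_submitted, group_indexed, z=1.96):
    """Assumed rule: warn when the count is more than z standard deviations low."""
    rate = indexing_rate(reference_submitted, reference_indexed)
    expected = rate * group_submitted
    std_dev = math.sqrt(group_submitted * rate * (1.0 - rate))
    return group_indexed < expected - z * std_dev


if __name__ == "__main__":
    # Hypothetical counts: 194 of 200 reference URLs indexed, 55 of 100 in the other group.
    print(expected_indexed(200, 194, 100))
    print(is_lower_than_expected(200, 194, 100, 55))
```

Search engine result counts are themselves estimates, so a rule like this is best read as a prompt to investigate rather than as proof of a problem.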

For example, at the time this app launched, the Sitemap Index Analyzer showed that 97% of the parameterized URLs to the Name Uniqueness Analyzer were being indexed, but only 55% of the parameterized URLs to the Miscarriage Reassurer were being indexed, much lower than expected. The Name Uniqueness Analyzer is older than the Miscarriage Reassurer, but the Miscarriage Reassurer URLs have been given a significantly higher priority in their respective sitemaps. It turned out we weren't properly escaping ampersands ('&') in the sitemap containing our pregnancy-related URLs. As a result, that sitemap was invalid, and we were relying on the Googlebot to discover all those URLs by itself.
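The ampersand problem is easy to reproduce and easy to avoid. The Python sketch below uses the standard library's `xml.sax.saxutils.escape` to entity-escape a URL before writing it into a `<loc>` element. The helper name `url_entry` and the sample URL are hypothetical.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() handles &, < and > by default; quotes are added as extra entities.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def url_entry(url):
    """Build one <url> element with the URL entity-escaped."""
    return "<url><loc>" + escape(url, EXTRA_ENTITIES) + "</loc></url>"


def is_well_formed(xml_text):
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    raw = "https://example.com/page.php?a=1&b=2"  # hypothetical URL
    print(is_well_formed("<url><loc>" + raw + "</loc></url>"))  # raw ampersand
    print(is_well_formed(url_entry(raw)))                      # escaped ampersand
```

Running a generated sitemap through any XML parser before submitting it would catch this class of error early.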

Possible Reasons for Lower Than Expected Pages Indexed:

Can I use a different search engine?
Absolutely! The query should work for any leading search engine. For the statistics to be valid, however, be sure to be consistent with which search engine you're using. Otherwise you may be measuring differences between search engines, instead of differences between subsections of your website.

Can I use a Sitemap Index?
Yes! Keep in mind our DatayzeBot crawler has a two-second delay between requests, so if your Sitemap Index is large it may take a little while.
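To see why a large Sitemap Index takes a while, consider this Python sketch of a polite crawl: each sitemap listed in the index costs one request, and every request after the first waits for the delay. This is an illustration, not DatayzeBot's implementation. The function names are hypothetical, and the fetcher and sleep function are passed in as parameters so the logic can be tried without touching the network.

```python
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch_bytes(url):
    """Default fetcher: download the raw bytes of a sitemap file."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def collect_urls(start_url, fetcher=fetch_bytes, delay_seconds=2.0, sleep=time.sleep):
    """Expand a sitemap or sitemap index into a flat list of page URLs."""
    pending = [start_url]
    pages = []
    first_request = True
    while pending:
        if not first_request:
            sleep(delay_seconds)  # pause between consecutive requests
        first_request = False
        root = ET.fromstring(fetcher(pending.pop(0)))
        locs = [loc.text.strip() for loc in root.iter(NS + "loc") if loc.text]
        if root.tag == NS + "sitemapindex":
            pending.extend(locs)  # an index lists further sitemaps to fetch
        else:
            pages.extend(locs)    # a plain sitemap lists page URLs
    return pages
```

With a two-second delay, an index that lists 300 sitemaps spends about ten minutes waiting between requests, before counting download time.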

Is there a limit to the number of sitemaps I can analyze?
Yes. The DatayzeBot is currently limited to 1000 requests per user per day. This limit is across all webmaster apps on Datayze. If you've exhausted your limit with the Thin Content Checker, for example, you're out of luck for today.

What does it mean if the number of indexed pages is larger than the number of submitted pages?
This happens when the Googlebot has discovered, and indexed, pages not included in the sitemap.

Why is Google asking if I'm human?
Google only allows users to issue queries through its interface (see Google's terms for details). This is why our suggested query link opens a new page and takes you directly to Google's interface with a prepopulated query, rather than trying to parse the results ourselves. Some less scrupulous websites attempt to automate the querying process. The queries the Sitemap Index Analyzer suggests you issue are unusual for humans to ask, and are likely similar to the types of queries those less scrupulous websites issue.
Do not worry. As long as you're a human (and we think you are?) and you're issuing the query on Google's website, you're complying with Google's terms.
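For the curious, a prepopulated query link is just a search URL with the query encoded into it. The Python sketch below assumes the query uses the `site:` operator to restrict results to one host and directory. The query the Sitemap Index Analyzer actually suggests is not documented here, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def suggested_query_url(host, directory="", search_base="https://www.google.com/search"):
    """Build a link that opens the search engine with the query filled in."""
    # Assumed query form: restrict results to one host and (optionally) a directory.
    query = "site:" + host + directory
    return search_base + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(suggested_query_url("example.com", "/tools/"))
```

Because the link is only opened in your browser, you issue the query yourself and read the result count from the page.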

Interested in Web Development?
Try our other tools, like the Site Navigability Analyzer, which lets you see what a spider sees. It can analyze your anchor text diversity and find the length of the shortest path to any page. The Thin Content Checker can analyze your site's content, let you know the percentage of unique phrases per page, and generate a histogram of page content lengths. The Site Validator can summarize the types of HTML errors on your site, as well as provide a page-by-page breakdown.