Parameters to Crawl: Some URL parameters can change page content. Which parameters should the spider pay attention to when crawling?
Directories and URLs to Exclude: Excluding pages can reduce the load on the crawler and keep you from reaching the URL cap, so you can analyze more of your sites. Enter the full path, or a substring, of the URLs you wish to exclude.
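As a rough illustration of how the two settings above could behave, here is a minimal Python sketch. The function names `normalize_url` and `is_excluded` are hypothetical, and plain substring matching is an assumption based on the description above; this is not the Datayze implementation.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url, keep_params):
    """Drop every query parameter that is not listed in keep_params."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key in keep_params]
    # Rebuild the URL without the fragment and with only the kept parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def is_excluded(url, exclusions):
    """Return True if any exclusion string appears anywhere in the URL."""
    return any(pattern in url for pattern in exclusions)
```

With these helpers, two URLs that differ only in an ignored parameter (a session id, for example) normalize to the same address and would only need to be crawled once.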
Hub Size: Hubs are highly connected pages that can increase site navigability but may not contain much information.
Leaf Size: Leaf pages are often content heavy. They may contain header and footer links to help a user return to the main page, but little navigational information beyond that. For example, the Datayze navigation bar contains 3 links on average, whereas the footer contains 2. We might therefore consider a leaf page on Datayze to be any URL with just 5 internal links.

About the Site Navigability Analyzer

How navigable is your website? What would a spider see? The Site Navigability Analyzer uses the Datayze SpiderBot to crawl your site and analyze its navigability. The Datayze SpiderBot calculates the shortest path from your splash page to any internal page and the overall connectivity of your site, and it identifies highly connected hubs and destination leaf nodes. Let the spider find your linking errors for you. The Datayze SpiderBot is fully customizable, crawling only the pages you ask it to, and letting you define 'hub' and 'leaf' thresholds.
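To make the shortest-path and hub/leaf ideas concrete, here is a small Python sketch that uses breadth-first search over a link graph. The `site` dictionary, the function names, and the thresholds are all made up for illustration, and counting a page's outgoing internal links is an assumption; the actual SpiderBot may measure connectivity differently.

```python
from collections import deque

def shortest_paths(links, start):
    """Breadth-first search: fewest clicks from start to each reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest route
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def classify(links, hub_min, leaf_max):
    """Label each page by its number of distinct outgoing internal links."""
    labels = {}
    for page, targets in links.items():
        count = len(set(targets))
        if count >= hub_min:
            labels[page] = "hub"
        elif count <= leaf_max:
            labels[page] = "leaf"
        else:
            labels[page] = "regular"
    return labels

# Hypothetical site: each key is a page, each value lists the pages it links to.
site = {
    "/": ["/tools", "/about", "/tools/a"],
    "/tools": ["/", "/tools/a", "/tools/b", "/tools/c"],
    "/about": ["/"],
    "/tools/a": ["/", "/tools"],
    "/tools/b": ["/", "/tools"],
    "/tools/c": ["/", "/tools"],
}

depths = shortest_paths(site, "/")
labels = classify(site, hub_min=4, leaf_max=2)
```

In this toy graph, `/tools/b` is two clicks from the splash page, `/tools` is labeled a hub, and `/about` is labeled a leaf. A page that never appears in `depths` is unreachable from the splash page, which is one way a linking error can show up.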

The Site Navigability Analyzer can also be used to improve your site's search engine optimization (SEO) ranking. View a list of referrers and the anchor text. Is your linking schema internally consistent? Do your branded links always have the same capitalization and punctuation? Do you provide alternative descriptive links to help lost website visitors find their way?

About the Spider
DatayzeBot, the datayze spider, now respects the robots exclusion standard. To specifically allow (or disallow) the crawler to access a page or directory, create a new set of rules for "DatayzeBot" in your robots.txt file. DatayzeBot will follow the longest matching rule for a specified page, rather than the first matching rule. If no matching rule is found, DatayzeBot assumes it is allowed to crawl the page. Not sure if a page is excluded by your robots.txt file? The Index/No Index app will parse HTML headers, meta tags and robots.txt and summarize the results for you.
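The longest-matching-rule behavior can be sketched in a few lines of Python. This is a simplified illustration that assumes plain path-prefix rules with no wildcards; the `is_allowed` function and the rule list are hypothetical, and DatayzeBot's actual parser may differ.

```python
def is_allowed(path, rules):
    """Decide whether a path may be crawled.

    rules is a list of (directive, prefix) pairs, where directive is
    'allow' or 'disallow'. The rule with the longest matching prefix wins.
    If no rule matches, the path is allowed.
    """
    best = None
    for directive, prefix in rules:
        # An empty prefix matches nothing in this simplified model.
        if prefix and path.startswith(prefix):
            if best is None or len(prefix) > len(best[1]):
                best = (directive, prefix)
    return best is None or best[0] == "allow"

# Hypothetical rules for the "DatayzeBot" user agent.
rules = [
    ("disallow", "/private/"),
    ("allow", "/private/press/"),
]
```

Under these rules `/private/press/kit.html` is crawlable because the longer `allow` prefix overrides the shorter `disallow`, while `/private/notes.html` is blocked and `/blog/` is allowed by default. A first-match parser would have blocked the press page, which is why rule order does not matter here.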

Our spider crawls at a leisurely rate of 1 page every 1.5 seconds. While the spider doesn't keep track of the contents of the pages it crawls, it does keep track of the number of requests issued by each visitor. Currently the crawler is limited to 1000 pages per user per day. Since the DatayzeBot does not index or cache any pages it crawls, rerunning the Site Navigability Analyzer will count against your daily allowed number of page crawls. You can get around the cap by pausing the crawler and resuming it another day.
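For planning purposes, the crawl rate and daily cap translate into simple arithmetic. The Python sketch below uses the figures quoted above; the `crawl_plan` function is hypothetical and ignores network latency and excluded pages.

```python
SECONDS_PER_PAGE = 1.5   # crawl rate stated above
DAILY_PAGE_CAP = 1000    # pages per user per day stated above

def crawl_plan(total_pages):
    """Return (days needed, total crawl minutes) for a site of total_pages."""
    days = -(-total_pages // DAILY_PAGE_CAP)  # ceiling division
    minutes = total_pages * SECONDS_PER_PAGE / 60
    return days, minutes
```

A full daily allowance of 1000 pages takes about 25 minutes of crawling, and a 2500-page site would need the crawl to be paused and resumed across 3 days.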

Interested in web development? Try our other tools, like the Site Validator, which can summarize the types of HTML errors on your site as well as provide a page-by-page breakdown. It can analyze your anchor text diversity and find the length of the shortest path to any page. The Thin Content Checker can analyze your site's content, let you know the percentage of unique phrases per page, and generate a histogram of page content lengths. Web developers often need to know which of their pages are being indexed and which are not, so we created the Sitemap Index Analyzer.