Parameters to Crawl: Some URL parameters can change page content. Which parameters should the spider pay attention to when crawling?
Directories and URLs to Exclude: Excluding pages can reduce the load on the crawler and keep you from reaching the URL cap, so you can analyze more of your site. Enter the full path, or a substring, of the URLs you wish to exclude.
Specific Elements: Long headers, footers, or menus can make a page appear to have more content than it actually has. By limiting the Thin Content Analyzer to specific HTML elements, specified by element type, class name (e.g. ".text") or id (e.g. "#content"), the word count more closely matches the true amount of content each page has.
Words to Ignore: Indicate which words should be ignored by the Website Spell Checker. For example, "datayze" is unlikely to be recognized as spelled correctly by any spell checker; however, for our site we wouldn't want it to be counted as a misspelling. Therefore we add it here.
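
As a rough illustration of the "Parameters to Crawl" and "Directories and URLs to Exclude" options, the sketch below shows one way a crawler could apply such settings. It is not the spider's actual code: the function names `normalize_url` and `is_excluded` and the example URLs are hypothetical, and the behavior (dropping unlisted parameters, plain substring matching) is an assumption based on the descriptions above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url, params_to_crawl):
    """Keep only the query parameters listed in params_to_crawl.

    Assumption: parameters that are not listed do not change page content,
    so URLs differing only in those parameters count as the same page.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in params_to_crawl
    ]
    kept.sort()  # stable ordering so equivalent URLs compare equal
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def is_excluded(url, exclusions):
    """Return True if any exclusion (full path or substring) occurs in the URL."""
    return any(pattern in url for pattern in exclusions)


if __name__ == "__main__":
    print(normalize_url("https://example.com/page?id=3&utm_source=mail", {"id"}))
    print(is_excluded("https://example.com/blog/tag/news", ["/tag/"]))
```

Under these assumptions, a tracking parameter such as `utm_source` would be dropped before the URL is queued, and any URL containing `/tag/` would be skipped entirely, which is how exclusions help keep a crawl under the URL cap.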
About the Website Spell Checker

Have a website, but no budget to hire a copy editor? Website Spell Checker to the rescue. The spell checker will comb through your site, collecting a list of all the misspelled words and the pages they occur on. Is your website niche, with a lot of topic-specific jargon? Use the "Words to Ignore" field to add site-specific words you want the spell checker to skip.

About the Spider
DatayzeBot, the datayze spider, now respects the robots exclusion standard. To specifically allow (or disallow) the crawler to access a page or directory, create a new set of rules for "DatayzeBot" in your robots.txt file. DatayzeBot will follow the longest matching rule for a specified page, rather than the first matching rule. If no matching rule is found, DatayzeBot assumes it is allowed to crawl the page. Not sure if a page is excluded by your robots.txt file? The Index/No Index app will parse HTML headers, meta tags and robots.txt and summarize the results for you.
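
The longest-matching-rule behavior described above can be sketched as follows. This is a minimal illustration, not DatayzeBot's implementation: the function name `is_allowed` and the `(directive, path_prefix)` rule format are hypothetical, wildcards are not handled, and the tie-breaking choice (the earlier rule wins when two rules have equal length) is an assumption the description does not cover.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled under longest-match semantics.

    `rules` is a list of (directive, path_prefix) tuples taken from the
    "DatayzeBot" group of a robots.txt file, for example:

        User-agent: DatayzeBot
        Disallow: /private/
        Allow: /private/public/

    which would be passed as:
        [("Disallow", "/private/"), ("Allow", "/private/public/")]
    """
    best_length = -1
    allowed = True  # no matching rule: the page is assumed crawlable
    for directive, prefix in rules:
        # An empty prefix (e.g. a bare "Disallow:") matches nothing here.
        if prefix and path.startswith(prefix) and len(prefix) > best_length:
            best_length = len(prefix)
            allowed = directive.lower() == "allow"
    return allowed


if __name__ == "__main__":
    rules = [("Disallow", "/private/"), ("Allow", "/private/public/")]
    for candidate in ("/private/public/page.html", "/private/notes.html", "/about.html"):
        print(candidate, is_allowed(candidate, rules))
```

With a first-match strategy, `/private/public/page.html` would be blocked by the `Disallow: /private/` line because it appears first. With longest-match, the more specific `Allow` rule takes precedence, so rule order in the file does not matter.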

Our spider crawls at a leisurely rate of 1 page every 1.5 seconds. While the spider doesn't keep track of the contents of the pages it crawls, it does keep track of the number of requests issued by each visitor. Currently the crawler is limited to 1000 pages per user per day. Since DatayzeBot does not index or cache any pages it crawls, rerunning the Website Spell Checker will count against your daily allowed number of page crawls. If your site exceeds the cap, you can pause the crawler and resume it another day.
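
To get a feel for what these limits mean for a given site, the arithmetic can be sketched as below. The function name `crawl_estimate` is hypothetical, and the estimate assumes the crawler spends exactly 1.5 seconds per page and that every crawled page counts once against the cap; real crawl time will vary with server response times.

```python
import math

SECONDS_PER_PAGE = 1.5   # crawl rate stated above
DAILY_PAGE_CAP = 1000    # pages per user per day stated above


def crawl_estimate(total_pages):
    """Return (days_needed, crawl_minutes) for a site with `total_pages` pages."""
    days_needed = math.ceil(total_pages / DAILY_PAGE_CAP)
    crawl_minutes = total_pages * SECONDS_PER_PAGE / 60
    return days_needed, crawl_minutes


if __name__ == "__main__":
    for pages in (1000, 2500):
        days, minutes = crawl_estimate(pages)
        print(f"{pages} pages: {days} day(s), about {minutes} minutes of crawling")
```

Under these assumptions, a full day's allowance of 1000 pages takes about 25 minutes to crawl, and a 2500-page site would need to be paused and resumed across three days.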