Here at data·yze, we're fans of smart everything, and 404 error pages are no exception. We weren't content with a humorous message on a static page. We wanted our error pages to figure out automatically where the user intended to go. The more automated the process, the better. One way to achieve this goal is to use the requested URL to search for the intended page.

If we're following Search Engine Optimization (SEO) best practices, the URL should contain keywords about the page. Two similarly spelled URLs are likely to have some keywords in common. We can parse these keywords out of the requested URL, and a search engine can then use them to identify all the pages that match.

The code is as follows:

// build the search query terms from the requested URL path
function getSearchTerms(){

    $query = 'site:'.$_SERVER['HTTP_HOST'];
    $file = trim(urldecode($_SERVER['REQUEST_URI']));

    // get rid of extension if it exists (last dot comes after last slash)
    if (strrpos($file,'.') > strrpos($file,'/')){
        $file = substr($file,0,strrpos($file,'.'));
    }

    // split url into terms
    $query .= preg_replace('/[^a-z0-9]+/i', ' ', $file);
    return $query;
}

Line 4 sets up the query keyword 'site:', which restricts the search engine results to our domain. Some of you may prefer to hardcode your domain rather than use $_SERVER['HTTP_HOST'].

Line 5 gets the URL path of the requested URL. This is where we're going to find those query keywords.

Lines 8-10 remove file extensions. This step isn't strictly necessary, but the file extension is unlikely to add much valuable information to the query. (In our case, for example, most files have the same extension.) When a term is unlikely to be useful, it's best to treat it as a stop word and remove it from the query. Note that this approach also removes any URL parameters that follow the extension. If you want to keep them, you can use the built-in function parse_url() to append them onto the $file string.
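As a rough sketch of the parse_url() idea, the helper below separates the path from the URL parameters, strips the extension from the path only, and then appends the parameters. The function name getPathAndParams() and the sample URL are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): returns the URL path
// without its file extension, followed by the query string if one exists.
function getPathAndParams(string $requestUri): string
{
    // parse_url() returns null when a component is missing; cast to ''
    $path   = urldecode((string) parse_url($requestUri, PHP_URL_PATH));
    $params = urldecode((string) parse_url($requestUri, PHP_URL_QUERY));

    // strip the extension only when the last dot follows the last slash
    $dot   = strrpos($path, '.');
    $slash = strrpos($path, '/');
    if ($dot !== false && ($slash === false || $dot > $slash)) {
        $path = substr($path, 0, $dot);
    }

    // keep the parameters as extra search terms when present
    return $params === '' ? $path : $path . ' ' . $params;
}

echo getPathAndParams('/howto/convert-url.php?topic=search');
// prints "/howto/convert-url topic=search"
```

The result can then go through the same preg_replace() step as $file, so the parameter names and values become ordinary query terms.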

Line 13 splits the $file string into terms by replacing every run of non-alphanumeric characters with a single space.
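To see the splitting step in isolation, here is a small sketch that applies the same regular expression to a sample path. The wrapper name splitIntoTerms() and the sample path are made up for this example.

```php
<?php
// Hypothetical wrapper around the regular expression used in getSearchTerms():
// every run of characters that are not letters or digits becomes one space.
function splitIntoTerms(string $path): string
{
    return (string) preg_replace('/[^a-z0-9]+/i', ' ', $path);
}

echo splitIntoTerms('/howto/convert-url_to-query-string');
// prints " howto convert url to query string"
// (the leading space comes from the first "/")
```

Slashes, hyphens and underscores are all treated as separators, so the URL style you use for word breaks doesn't matter.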

Running getSearchTerms() on this page returns the 'site:' restriction for our domain followed by " howto convert url to query string". Since we're following search engine best practices, the URL path matches the page title, so a search engine would have an easy time finding the intended page. Even if some of the terms in the query are misspelled or missing, a search engine should be able to correct the spelling and fill in the gaps.

Try it for yourself:

echo ''.urlencode(getSearchTerms());

If your favorite search engine is something other than Google, you'll need to modify the prefix. When outputting the URL to the screen, remember to use urlencode() to convert the search terms string into a URL-friendly string.
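One way to make the prefix easy to swap is to pass it as a parameter. The sketch below assumes Google's public search URL as the default; the constant SEARCH_PREFIX, the helper buildSearchUrl(), the example.com domain and the link text are all hypothetical names chosen for illustration.

```php
<?php
// Assumption: Google's public search URL. Replace it with the search URL
// of your preferred engine, e.g. 'https://duckduckgo.com/?q='.
const SEARCH_PREFIX = 'https://www.google.com/search?q=';

// Hypothetical helper: builds a full search URL from a terms string.
function buildSearchUrl(string $terms, string $prefix = SEARCH_PREFIX): string
{
    // urlencode() turns spaces into "+" and escapes characters such as ":"
    return $prefix . urlencode(trim($terms));
}

// In a real 404 page the terms would come from getSearchTerms()
$url = buildSearchUrl('site:example.com howto convert url to query string');

// escape the URL again for safe output inside an HTML attribute
echo '<a href="' . htmlspecialchars($url) . '">Search for this page</a>';
```

Keeping the prefix in one place means switching search engines is a one-line change.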