Here at data·yze, we're fans of smart everything, and 404 error pages are no exception. We weren't content with a humorous message on a static page. We wanted our error pages to figure out automatically where the user intended to go. The more automated the process, the better. Our first version of a smart 404 page used the requested URL to search for the intended one.
Many of the 404s encountered on our server are the result of typos and partially copied URLs. In these cases the requested URL is similar to the intended one. If we're following Search Engine Optimization (SEO) best practices, the URL should contain keywords describing the page. We can parse these keywords out of the URL, and a search engine can latch onto them and identify all the pages that match.
The code works as follows. First, it sets up the query keyword 'site:', which restricts the search engine results to our domain. Some of you out there may prefer to hardcode your domain rather than use $_SERVER['HTTP_HOST'], which is taken from the Host header the client sends. Next, it gets the path of the requested URL. This is where we're going to find those query keywords. It then removes the file extension. This step isn't strictly necessary, but the file extension is unlikely to add much valuable information to the query. (In our case, for example, most files have the same extension.) When a term is unlikely to be useful, it's best to treat it as a stop word and remove it from the query. Note that this approach also removes any URL parameters. If you want to keep them, you can use the built-in function parse_url to append them to the $file string. Finally, it replaces all non-alphanumeric characters in the $file string with spaces, which splits the path into individual words.
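As a rough illustration, here is a minimal PHP sketch of a getSearchTerms() function that follows the steps described above; it is not the original data·yze code. The two optional parameters are an assumption added so the domain can be hardcoded or the function tested, and 'example.com' is only a placeholder fallback. By default it reads $_SERVER['REQUEST_URI'] and $_SERVER['HTTP_HOST']. It drops URL parameters with parse_url rather than relying on the extension-stripping step, decodes escapes such as %20 with rawurldecode, and only keeps ASCII letters and digits.

```php
<?php
/**
 * Sketch (not the original code): turn the URL that triggered a 404
 * into a search-engine query string.
 *
 * Optional parameters are an assumption: pass them to hardcode the
 * domain or to test; otherwise $_SERVER values are used.
 */
function getSearchTerms(?string $requestUri = null, ?string $host = null): string
{
    $requestUri = $requestUri ?? ($_SERVER['REQUEST_URI'] ?? '/');
    // HTTP_HOST comes from the client's Host header, so hardcoding is safer.
    // 'example.com' is a placeholder fallback, not a real setting.
    $host = $host ?? ($_SERVER['HTTP_HOST'] ?? 'example.com');

    // "site:" restricts the results to our own domain.
    $query = 'site:' . $host;

    // Keep only the path; parse_url() drops any ?query or #fragment.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        $path = '';
    }
    // Turn escapes such as %20 back into characters.
    $path = rawurldecode($path);

    // Strip a trailing file extension (".php", ".html", ...) from the last segment.
    $file = preg_replace('/\.[^.\/]*$/', '', $path);

    // Replace every run of non-alphanumeric (ASCII) characters with one space.
    $terms = trim(preg_replace('/[^A-Za-z0-9]+/', ' ', $file));

    return $terms === '' ? $query : $query . ' ' . $terms;
}
```

For a request to /howto-convert-url-to-query-string.php on datayze.com (a path assumed here for illustration), getSearchTerms('/howto-convert-url-to-query-string.php', 'datayze.com') returns "site:datayze.com howto convert url to query string".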
Running getSearchTerms() on this page returns "site:datayze.com howto convert url to query string". Since we're following search engine best practices, the URL path matches the page title, so a search engine would have an easy time finding the intended page. Even if some of the terms in the query are misspelled or missing, a search engine should be able to correct the spelling and fill in the gaps.
If your favorite search engine is something other than Google, you'll need to modify the search URL prefix. When outputting the URL to the screen, remember to use urlencode to convert the search terms string into a URL-friendly string.
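A minimal sketch of that last step in PHP, assuming Google's https://www.google.com/search?q= endpoint as the default prefix. buildSearchUrl is a hypothetical helper name, not part of the original code. urlencode escapes the colon in site: and turns spaces into +, and htmlspecialchars escapes the finished URL again before it is written into HTML.

```php
<?php
/**
 * Hypothetical helper: turn a search-terms string into a results-page URL.
 * The Google prefix default is an assumption; swap it for another engine.
 */
function buildSearchUrl(string $terms, string $prefix = 'https://www.google.com/search?q='): string
{
    // urlencode() percent-encodes reserved characters such as ':'
    // and encodes spaces as '+', as expected in a query string.
    return $prefix . urlencode($terms);
}

// Example: build a link to the results for the 404 page.
$url  = buildSearchUrl('site:datayze.com howto convert url to query string');
// Escape for HTML output as well, separately from the URL encoding.
$link = '<a href="' . htmlspecialchars($url) . '">Search datayze.com</a>';
```

Pointing the prefix at another engine, for example 'https://www.bing.com/search?q=', is the only change needed, since Bing also understands the site: operator.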