What is Minification?

Minification refers to the process of removing all unnecessary characters from a file while leaving the core functionality of the code intact. The end result is a new file that is smaller than the original, yet identical from a machine's perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not intended, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.

Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.

How data·yze does Minification

Before we begin the technical discussion, we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we automate the minification process during this publication step. This approach gives us all of the benefits of minification (smaller files that require less bandwidth for users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with the speed of the minifier itself. Thus we opt for readability and ease of debugging over performance.
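A publication step of this kind might look roughly like the sketch below. The function name publishDirectory(), the flat directory layout, and the $minify callback are all assumptions made for illustration; this is not data·yze's actual deployment code.

```php
<?php
// Hypothetical publish step: copy every .html file from a development
// directory to a production directory, minifying each one on the way.
function publishDirectory(string $devDir, string $prodDir, callable $minify): int
{
    if (!is_dir($prodDir)) {
        mkdir($prodDir, 0755, true);
    }
    $published = 0;
    // glob() can return false on error, so fall back to an empty list.
    foreach (glob($devDir . '/*.html') ?: [] as $path) {
        $minified = $minify(file_get_contents($path));
        file_put_contents($prodDir . '/' . basename($path), $minified);
        $published++;
    }
    return $published;
}
```

Because the minifier is passed in as a callback, the development copies stay readable and the same step could hand each file to a function such as the minifyPHP() developed later in this article.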

A word of caution: when minimizing, always test that the final product still behaves as expected!

Challenges in Minimizing HTML (without Embedded PHP)

When we talk about minimizing HTML we're often referring to minimizing white space. By default, web browsers collapse consecutive white space characters into a single space, yet HTML source code formatted according to style guidelines often contains long runs of white space to make the source more human readable.

For example, this:

<html>
  <head>
    <title> ... </title>
  </head>
  <body>
    <table>
      <tr>
        <td> ... </td>
      </tr>
    </table>
  </body>
</html>

is a much longer, albeit easier to read, equivalent of:

<html><head><title> ... </title></head><body><table><tr><td> ... </td></tr></table></body></html>
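The transformation from the long form to the short form can be sketched in two regular-expression passes. The helper name collapseInterTagWhitespace() is hypothetical and is not part of the script developed later in this article. Removing white space between tags is only safe where it does not affect rendering (for example between block-level elements), so treat this as an illustration rather than a general-purpose minifier.

```php
<?php
// Hypothetical helper, shown only to illustrate the idea.
function collapseInterTagWhitespace(string $html): string
{
    // Pass 1: collapse every run of white space into a single space.
    $html = preg_replace('/\s+/', ' ', $html);
    // Pass 2: drop white space that sits directly between two tags.
    // Text such as " ... " inside an element is left alone.
    return trim(preg_replace('/>\s+</', '><', $html));
}

$pretty = "<html>\n  <head>\n    <title> ... </title>\n  </head>\n  <body>\n"
        . "    <table>\n      <tr>\n        <td> ... </td>\n      </tr>\n"
        . "    </table>\n  </body>\n</html>";

echo collapseInterTagWhitespace($pretty), "\n";
```

Run against the indented document, this prints the single-line form shown above.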

It is worth noting that any white space inside a <pre> element is preserved, as is any white space in an element styled with white-space:pre or white-space:pre-wrap. To keep our solution general, and to avoid the need for a full CSS interpreter, we're going to ignore elements whose white-space CSS property is set. We'll show you how you can modify the script to account for specific elements of this kind should you need to. Our solution is not general enough to analyze CSS and determine on the fly which elements need their white space preserved by class name.
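As a minimal sketch of the idea, the function below collapses white space everywhere except inside a configurable list of elements. The name collapseOutsidePreserved() and its $tags parameter are assumptions for illustration; the article's own script handles this through its list of special sequences instead. The sketch does not check that an opening tag is closed by a tag of the same name.

```php
<?php
// Hypothetical helper: collapse white space outside the listed elements.
function collapseOutsidePreserved(string $html, array $tags = ['pre']): string
{
    $names = implode('|', array_map('preg_quote', $tags));
    // One capture group around the whole element, so preg_split() returns
    // the preserved elements at the odd indexes of the result.
    $pattern = '/(<\s*(?:' . $names . ')\b[^>]*>.*?<\s*\/\s*(?:' . $names . ')\s*>)/si';
    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes hold ordinary HTML, which is safe to collapse.
            $parts[$i] = preg_replace('/\s+/', ' ', $part);
        }
    }
    return implode('', $parts);
}

echo collapseOutsidePreserved("<p>a   b</p>\n<pre>x\n  y</pre>\n<p>c</p>"), "\n";
```

To protect an element that your own stylesheet marks as white-space:pre, you could pass its tag name in $tags, for example ['pre', 'textarea'].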

Challenges in Minimizing HTML (with Embedded PHP)

Different sections of the HTML page need to be handled separately. White space affects the execution of CSS and JavaScript differently than HTML, and minimizing them as though they were HTML could introduce bugs into the code.

The primary challenge with PHP is that, when executed, it will likely insert some information into the HTML document being sent to the user. Usually this is text, but PHP can also be used for control flow, changing which JavaScript, CSS, and even HTML elements are output in the final HTML document sent to the user.

Consider the following obviously contrived case:

<style>
body{
background-color:<?php echo 'white;}</style>'; ?>
<body>

In the above example, the closing style tag is written to the output buffer by the executed PHP code. In order to know that line 4 in the above example should be treated as HTML, our minifying script would need to be able to interpret the PHP code, which would increase the complexity of our minifying code dramatically.
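The difficulty can be demonstrated with the same style-block pattern that the minifyPHP() listing later in this article uses. The masking step and the @@PHP@@ token below are simplified stand-ins chosen for illustration, not the article's actual preservation code.

```php
<?php
// The contrived four-line input from above.
$source = "<style>\nbody{\nbackground-color:<?php echo 'white;}</style>'; ?>\n<body>";

// Style-block pattern, as used in the minifyPHP() listing.
$styleBlock = '/<\s*style(?:[^>]*)>(.*?)<\s*\/style\s*>/si';

// 1) Matching the raw source: the closing tag is found inside the PHP
//    string literal, so the captured "CSS" ends in the middle of PHP code.
preg_match($styleBlock, $source, $raw);
$capturedCss = $raw[1];

// 2) Masking the PHP block first: now there is no closing style tag at
//    all, so the pattern finds nothing.
$masked = preg_replace('/<\?.*?\?>/s', '@@PHP@@', $source);
$matchesAfterMasking = preg_match($styleBlock, $masked);

var_dump($capturedCss, $masked, $matchesAfterMasking);
```

Either way, a pattern-based minifier cannot tell that the final line is HTML without actually executing the PHP.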

To make the problem tractable, we make the following assumptions about our PHP code:

We do allow PHP to be used to output JavaScript, HTML, and CSS, as long as only one language is written to the output buffer in each block of PHP code. This allows us to pass variables and information easily from PHP to JavaScript (e.g. var foo = <?php echo $bar; ?>;).

We assume that when PHP is embedded in a line, as in the previous case, then it is intended to be part of the line. If the line is commented out, then the PHP can be safely removed as well.

Since PHP is executed server side rather than client side, unminified PHP code does not affect HTML file size or bandwidth and does not need to be minimized.

Minimizing HTML

Our approach to minimizing HTML is going to be similar to our approach to minimizing JavaScript: shallow parse the HTML looking for CSS/JavaScript/PHP. A shallow parse is one that has only a superficial understanding of structure. It is sufficient for this use case and has a much smaller footprint than a deep parse. When we encounter CSS or JavaScript during our shallow parse, we minimize it as appropriate.
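One way to picture a shallow parse is a function that only looks for the nearest script, style, or pre block and ignores all other structure. The sketch below is a hypothetical, self-contained stand-in; the function name findNextSpecialSequence() and the array it returns are assumptions for illustration, while the three patterns are the ones used in the minifyPHP() listing.

```php
<?php
// Hypothetical shallow-parse step: return the earliest special block.
function findNextSpecialSequence(string $html, array $patterns): ?array
{
    $best = null;
    foreach ($patterns as $type => $regex) {
        // PREG_OFFSET_CAPTURE yields [text, byte offset] pairs.
        if (preg_match($regex, $html, $m, PREG_OFFSET_CAPTURE) === 1) {
            $start = $m[0][1];
            if ($best === null || $start < $best['start']) {
                $best = [
                    'type'  => $type,
                    'start' => $start,
                    'end'   => $start + strlen($m[0][0]),
                    'inner' => $m[1][0],
                ];
            }
        }
    }
    return $best;
}

$patterns = [
    'javascript' => '/<\s*script(?:[^>]*)>(.*?)<\s*\/script\s*>/si',
    'css'        => '/<\s*style(?:[^>]*)>(.*?)<\s*\/style\s*>/si',
    'pre'        => '/<\s*pre(?:[^>]*)>(.*?)<\s*\/pre\s*>/si',
];

$html = '<p>hi</p><style>a { color: red; }</style><script>var x = 1;</script>';
$next = findNextSpecialSequence($html, $patterns);
print_r($next);
```

For this input the style block is reported first because it starts earlier in the document than the script block. Nothing else about the HTML needs to be understood, which is what keeps the parse shallow.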

As before, the first thing we want to do is preserve important strings. This time we want to be sure we preserve all embedded PHP. To do this we create the function preserveEmeddedPHP():

function preserveEmeddedPHP($string){
    global $minificationStore, $singleQuoteSequenceFinder, $doubleQuoteSequenceFinder;
    $start_idx = strpos($string, '<?'); //matches both <? and <?php
    if (strlen($string)==0){return $string;}
    if ($start_idx !== false){
        //need to find first end terminator not in quote
        $php_len = 2;
        while (true){
            // start looking for the PHP terminator from the PHP start
            $tmp_string = substr($string, $start_idx + $php_len);
            $end_php = strpos($tmp_string, '?>');
            $end_php = ($end_php !== false ? $end_php+2 : strlen($tmp_string));
            // find the closest string
            $quote_start = false;
            $singleQuoteSequenceFinder->findFirstValue($tmp_string);
            $doubleQuoteSequenceFinder->findFirstValue($tmp_string);
            if ($singleQuoteSequenceFinder->isValid() && (!$doubleQuoteSequenceFinder->isValid() || $singleQuoteSequenceFinder->start_idx < $doubleQuoteSequenceFinder->start_idx)){
                $quote_start = $singleQuoteSequenceFinder->start_idx;
                $quote_end = $singleQuoteSequenceFinder->end_idx;
            } else if ($doubleQuoteSequenceFinder->isValid()){
                $quote_start = $doubleQuoteSequenceFinder->start_idx;
                $quote_end = $doubleQuoteSequenceFinder->end_idx;
            }
            // check if end terminator before string declared. If not, start search again after the string declared
            if ($quote_start === false || $end_php <= $quote_start){
                $php_len += $end_php;
                break;
            } else {
                $php_len += $quote_end;
            }
        }
        // store the found PHP
        $php_substr = substr($string, $start_idx, $php_len);
        $placeHolder = getNextMinificationPlaceholder();
        $newstring = substr($string, 0, $start_idx).$placeHolder.substr($string, $start_idx+$php_len);
        $minificationStore[$placeHolder] = $php_substr;
        // search for next embedded PHP to preserve
        return preserveEmeddedPHP($newstring);
    }
    return $string;
}

function minifyPHP($html){
    global $minificationStore;
    $html_special_chars = array(
        new RegexSequenceFinder('javascript', "/<\s*script(?:[^>]*)>(.*?)<\s*\/script\s*>/si"), // javascript, can have type attribute
        new RegexSequenceFinder('css', "/<\s*style(?:[^>]*)>(.*?)<\s*\/style\s*>/si"), // css, can have type/media attribute
        new RegexSequenceFinder('pre', "/<\s*pre(?:[^>]*)>(.*?)<\s*\/pre\s*>/si") // pre
    );
    $html = preserveEmeddedPHP($html);
    // pull out everything that needs to be pulled out and saved
    while ($sequence = getNextSpecialSequence($html, $html_special_chars)){
        $placeholder = getNextMinificationPlaceholder();
        $quote = substr($html, $sequence->start_idx, $sequence->end_idx - $sequence->start_idx);
        // subsequence (css/javascript/pre) needs special handling, tags can still be minimized using minifyPHP
        $sub_start = $sequence->sub_start_idx - $sequence->start_idx;
        $sub_end = $sub_start + strlen($sequence->sub_match);
        switch ($sequence->type) {
            case 'javascript':
                $quote = minifyPHP(substr($quote,0,$sub_start)).minifyJavascript($sequence->sub_match).minifyPHP(substr($quote, $sub_end));
                break;
            case 'css':
                $quote = minifyPHP(substr($quote,0,$sub_start)).minifyCSS($sequence->sub_match).minifyPHP(substr($quote, $sub_end));
                break;
            default: // strings that need to be preserved, e.g. between <pre> tags
                $quote = minifyPHP(substr($quote,0,$sub_start)).$sequence->sub_match.minifyPHP(substr($quote, $sub_end));
        }
        $minificationStore[$placeholder] = $quote;
        $html = substr($html, 0, $sequence->start_idx).$placeholder.substr($html, $sequence->end_idx);
    }
    // condense white space
    $html = preg_replace( array('/\s+/','/<\s+/', '/\s+>/'), array(' ', '<', '>'), $html);
    // remove HTML comments
    $html = preg_replace('/<!--.*?-->/s', '', $html);
    // put back the preserved strings
    foreach($minificationStore as $placeholder => $original){
        $html = str_replace($placeholder, $original, $html);
    }
    return trim($html);
}
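The listings above rely on helpers that are not shown in this article, such as getNextMinificationPlaceholder() and the quote sequence finders. The preserve, minify, and restore round trip can still be illustrated in isolation. In the minimal sketch below, the function names, the @@MIN_n@@ token format, and the regular expression are all assumptions. Unlike preserveEmeddedPHP(), it does not skip a PHP end tag that appears inside a quoted string.

```php
<?php
// Hypothetical stand-in for getNextMinificationPlaceholder(). The token
// has no white space or angle brackets, so the white space and comment
// passes leave it untouched.
function nextPlaceholder(): string
{
    static $counter = 0;
    return '@@MIN_' . $counter++ . '@@';
}

// Simplified preservation: swap each embedded PHP block for a placeholder
// and remember the original text in $store.
function preservePhpBlocks(string $html, array &$store): string
{
    return preg_replace_callback(
        '/<\?.*?(?:\?>|\z)/s',
        function (array $m) use (&$store): string {
            $key = nextPlaceholder();
            $store[$key] = $m[0];
            return $key;
        },
        $html
    );
}

// Put every preserved block back where its placeholder sits.
function restorePreserved(string $html, array $store): string
{
    return str_replace(array_keys($store), array_values($store), $html);
}

$store = [];
$source = "<p>\n   Hello <?php echo   \$name; ?>\n</p>";
$protected = preservePhpBlocks($source, $store);
$collapsed = preg_replace('/\s+/', ' ', $protected);
$result = restorePreserved($collapsed, $store);
echo $result, "\n";
```

The white space inside the PHP block survives untouched, while the surrounding HTML is collapsed.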

As stated above, minifying HTML is just a straightforward collapsing of white space. There's still room for improvement: browsers also collapse white space between tags when no non-white-space characters separate them. For example, " <i> <b> " is functionally equivalent to " <i><b>". Nevertheless, this is a good first pass.

Want to give it a try? Use our Minifier.