What is Minification?

In web development it's often advantageous to minify embedded files. Minification is the process of removing all unnecessary characters while leaving the core functionality intact. The end result is a new file that is smaller than the original, yet identical from a machine's perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although it is not the goal, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.

Here at data·yze we use PHP to push files from our development environment to production. We opt to automate the minification process during this push step. This gives us all of the benefits of minimization without the drawbacks. Our users get the smaller, faster-to-download version of the code. Our developers in the testing environment work with the larger, more human-readable version of the code.

A word of caution: when minimizing, always test that the final product still behaves as expected!

Challenges in Minimizing HTML (without Embedded PHP)

When we talk about minimizing HTML we're usually referring to minimizing white space. By default, web browsers collapse runs of white space into a single space, but this is not always the case. Any white space inside a <pre> element is preserved, as is any in an element with the style 'white-space:pre' or 'white-space:pre-wrap'.
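As an illustration of the <pre> exception, here is a minimal sketch that collapses white space everywhere except inside <pre> elements. The helper name collapseOutsidePre() is hypothetical and is not part of the code used later in this article. The sketch does not look at 'white-space' styles at all.

```php
<?php
// Sketch: collapse white space everywhere except inside <pre> elements.
// collapseOutsidePre() is a hypothetical helper, not part of the article's code.
function collapseOutsidePre(string $html): string
{
    // Split on whole <pre> elements. Because the pattern is one capturing
    // group, the <pre> elements are kept and land on the odd indexes.
    $parts = preg_split('/(<pre\b[^>]*>.*?<\/pre\s*>)/is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes hold ordinary HTML, where collapsing is safe.
            $parts[$i] = preg_replace('/\s+/', ' ', $part);
        }
    }
    return implode('', $parts);
}

// The line break and indentation inside <pre> survive; everything else collapses.
echo collapseOutsidePre("<p>a   b</p>\n\n<pre>x\n  y</pre>  <p>c</p>");
```

Splitting with a captured delimiter keeps the protected regions in the output without a second pass to stitch them back in.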

To keep our solution as general as possible, we're not going to handle elements that have the white-space style set.

Challenges in Minimizing HTML (with Embedded PHP)

Our approach to minimizing HTML is going to be similar to our approach to minimizing JavaScript: shallow parse the HTML looking for CSS, JavaScript, and PHP. When we encounter CSS or JavaScript, we minimize it as appropriate. White space affects the execution of CSS and JavaScript differently than it affects HTML, and minimizing them as though they were HTML could introduce bugs. PHP is executed server side rather than client side and does not need to be minimized.
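The idea of a shallow parse that leaves PHP alone can be sketched in a few lines. This is a simplified illustration, not the function developed below: minifyAroundPhp() is a hypothetical name, it ignores <script> and <style> entirely, and it assumes that no PHP string or comment contains a literal closing tag.

```php
<?php
// Sketch: collapse white space in the HTML around PHP blocks and leave the
// PHP blocks themselves untouched. minifyAroundPhp() is a hypothetical name.
// Assumption: no PHP string or comment contains a literal PHP closing tag.
function minifyAroundPhp(string $source): string
{
    // Capture each PHP block, or an unclosed block that runs to the end of
    // the input, so that the blocks land on the odd indexes of $parts.
    $parts = preg_split('/(<\?.*?(?:\?>|\z))/s', $source, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes hold plain HTML.
            $parts[$i] = preg_replace('/\s+/', ' ', $part);
        }
    }
    return implode('', $parts);
}

// The double spaces inside the PHP block are preserved.
echo minifyAroundPhp("<div>  <?php echo  'a   b'; ?>  </div>");
```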

The primary challenge with PHP is that, when executed, it will likely insert some information into the HTML document being sent. Usually this is text, but PHP can also be used for control flow, changing which JavaScript, CSS, and even HTML elements are output. Consider the following case:

<style> body{ background-color:<?php echo 'white;}</style>'; ?> <body>

Obviously this is a contrived case, and an example of poor coding style. Nevertheless, it shows the challenges that can arise when minimizing HTML with embedded PHP.
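To see concretely why this case is a problem, the short check below holds the snippet in a string and compares offsets. It is only an illustration; the variable names are made up for this example.

```php
<?php
// The contrived snippet from above, held in a string so it can be inspected.
$source = '<style> body{ background-color:<?php echo \'white;}</style>\'; ?> <body>';

$phpStart = strpos($source, '<?php');     // where the PHP block opens
$phpEnd   = strpos($source, '?>');        // where the PHP block closes
$styleEnd = strpos($source, '</style>');  // first closing style tag in the raw source

// The only closing style tag sits inside the PHP block, so a parser that
// scans the raw source for it would cut the PHP block in half.
var_dump($phpStart < $styleEnd && $styleEnd < $phpEnd); // prints bool(true)
```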

To make the problem tractable, we make the following assumptions about our PHP code:

We do allow PHP to be used to output JavaScript, HTML, and CSS, as long as only one language is output each time. This allows us to pass variables and information easily to JavaScript (e.g. var foo = <?php echo $bar; ?>;). Since the amount of code expected to be output by the PHP is small, we don't worry about minimizing it.
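When the value being passed is a string or an array rather than a number, a plain echo would need quoting and escaping. One common option, shown here as a sketch rather than as the method used on this site, is to let json_encode() produce the JavaScript literal. The helper jsAssignment() is hypothetical.

```php
<?php
// Sketch: build a JavaScript assignment from a PHP value.
// jsAssignment() is a hypothetical helper, not part of the article's code.
function jsAssignment(string $name, $value): string
{
    // json_encode() quotes and escapes strings. By default it also writes
    // "/" as "\/", so a value containing a closing script tag cannot end
    // the surrounding script block early.
    return 'var ' . $name . ' = ' . json_encode($value) . ';';
}

echo jsAssignment('count', 42), "\n";            // var count = 42;
echo jsAssignment('foo', 'He said "hi"'), "\n";  // var foo = "He said \"hi\"";
```

Inside a template this would be used as <?php echo jsAssignment('foo', $bar); ?>, which still outputs only one language per PHP block.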

Minimizing HTML

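The outline we'll follow for minimizing HTML is similar to the one for minimizing JavaScript. We need to make one modification to our previous JavaScript minification function: it can now stop and resume in the middle of a string literal that is interrupted by embedded PHP.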

Modified function minifyJavascript()

function minifyJavascript($javascript, $inQuote = false){
    $buffer = '';

    // Resume inside a string literal that embedded PHP interrupted.
    if ($inQuote != false){
        $idx_end = getNonEscapedQuoteIndex($javascript, $inQuote)+1;
        if ($idx_end == 0){
            // The string is still open at the end of this chunk.
            return array($javascript, $inQuote);
        }
        $quote = substr($javascript, 0, $idx_end);
        $quote = str_replace("\\\n",' ',$quote);
        $quote = preg_replace("/\s+/",' ',$quote);
        $buffer = $quote;
        $javascript = substr($javascript, $idx_end);
        $inQuote = false;
    }

    while (list($idx_start, $keyElement) = getNextKeyElement($javascript)){
        switch ($keyElement){
            case '//':
                // Drop a line comment, keeping the line break that ends it.
                $idx_end = strpos($javascript, PHP_EOL, $idx_start);
                if ($idx_end !== false){
                    $javascript = substr($javascript, 0, $idx_start) . substr($javascript,$idx_end);
                } else {
                    $javascript = substr($javascript, 0, $idx_start);
                }
                break;
            case '/*':
                // Drop a block comment. An unterminated one runs to the end of the chunk.
                $idx_end = strpos($javascript, '*/', $idx_start);
                $idx_end = ($idx_end === false) ? strlen($javascript) : $idx_end+2;
                $javascript = substr($javascript, 0, $idx_start) . substr($javascript,$idx_end);
                break;
            default:
                // string case
                if ($keyElement == '\'' || $keyElement == '"'){
                    $idx_end = getNonEscapedQuoteIndex($javascript, $keyElement, $idx_start+1)+1;
                } else {
                    $idx_end = $idx_start + strlen($keyElement);
                }
                // php is embedded in string in javascript
                if ($idx_end == 0){
                    $idx_end = strlen($javascript);
                    $inQuote = $keyElement;
                }
                $buffer .= minifyJavascriptCode(substr($javascript, 0, $idx_start));
                $quote = substr($javascript, $idx_start, ($idx_end-$idx_start));
                $quote = str_replace("\\\n",' ',$quote);
                $quote = preg_replace("/\s+/",' ',$quote);
                $buffer .= $quote;
                $javascript = substr($javascript, $idx_end);
        }
    }

    // Tell the caller that the chunk ended inside an open string.
    if ($inQuote){
        return array($buffer, $inQuote);
    }
    $buffer .= minifyJavascriptCode($javascript);
    return $buffer;
}

function getHTMLKeyControlElements()

function getHTMLKeyControlElements($php){
    // Record the offset of the first occurrence of each key element.
    $elements = array();
    $elements['<?'] = strpos($php, '<?');
    if(preg_match("/<\s*script(?:\s+type=\"text\/javascript\")?\s*>/i", $php, $matches, PREG_OFFSET_CAPTURE)) {
        $elements['<script>'] = $matches[0][1];
    }
    if(preg_match("/<\s*style(?:\s+type=\"text\/css\")?\s*>/i", $php, $matches, PREG_OFFSET_CAPTURE)) {
        $elements['<style>'] = $matches[0][1];
    }
    if(preg_match("/<\s*div\s+class\s*=\s*\"phpcode\"\s*>/i", $php, $matches, PREG_OFFSET_CAPTURE)) {
        $elements['<div>'] = $matches[0][1];
    }
    // strpos() returns false when nothing was found, so drop that entry.
    $elements = array_filter($elements, function($k){return $k !== false;});
    if (count($elements) == 0){return false;}
    // Return the offset and the name of whichever element comes first.
    $min = min($elements);
    return array($min, array_keys($elements, $min)[0]);
}

function handlePHP()

function handlePHP($file){
    $php = file_get_contents($file);
    $buffer = '';
    while (list($start_idx, $key) = getHTMLKeyControlElements($php)){
        switch ($key){
            case '<?':
                // Copy the PHP block through untouched. A block with no closing tag runs to the end of the file.
                $end_idx = strpos($php, '?>', $start_idx+1);
                $end_idx = ($end_idx === false) ? strlen($php) : $end_idx+2;
                $buffer .= minifyHTML(substr($php, 0, $start_idx)) . substr($php, $start_idx, $end_idx-$start_idx);
                $php = substr($php, $end_idx);
                break;
            case '<style>':
                $buffer .= minifyHTML(substr($php,0,$start_idx)).'<style type="text/css">';
                $php = substr($php, strpos($php,'>',$start_idx+1)+1);
                $end_idx = strpos($php,'</style>');
                // Minify the CSS in pieces, copying through any PHP embedded before the closing style tag.
                while (($tmp_idx = strpos($php, '<?')) !== false && $tmp_idx < $end_idx){
                    $tmp_end_idx = strpos($php, '?>', $tmp_idx) + 2;
                    $buffer .= minifyCSS(substr($php, 0, $tmp_idx)) . substr($php, $tmp_idx, $tmp_end_idx-$tmp_idx);
                    $php = substr($php, $tmp_end_idx);
                    $end_idx = strpos($php, '</style>');
                }
                $buffer .= minifyCSS(substr($php,0,$end_idx)). '</style>';
                $php = substr($php, $end_idx+8);
                break;
            case '<script>':
                $buffer .= minifyHTML(substr($php, 0, $start_idx)).'<script type="text/javascript">';
                $php = substr($php, strpos($php,'>',$start_idx+1)+1);
                $inQuote = false;
                $end_idx = strpos($php, '</script>');
                // Minify the JavaScript in pieces, remembering whether a piece ended inside an open string.
                while (($tmp_idx = strpos($php, '<?')) !== false && $tmp_idx < $end_idx){
                    $tmp_end_idx = strpos($php, '?>', $tmp_idx) + 2;
                    $result = minifyJavascript(substr($php, 0, $tmp_idx), $inQuote);
                    if (is_array($result)){
                        $buffer .= $result[0];
                        $inQuote = $result[1];
                    } else {
                        $buffer .= $result;
                        $inQuote = false;
                    }
                    $buffer .= substr($php, $tmp_idx, $tmp_end_idx-$tmp_idx);
                    $php = substr($php, $tmp_end_idx);
                    $end_idx = strpos($php, '</script>');
                }
                $result = minifyJavascript(substr($php, 0, $end_idx), $inQuote);
                $buffer .= (is_array($result) ? $result[0] : $result) . '</script>';
                $php = substr($php, $end_idx + 9);
                break;
            case '<div>':
                // Copy a div with class "phpcode" through unchanged, up to the first closing div tag.
                $end_idx = strpos($php, '</div>', $start_idx+1);
                $buffer .= minifyHTML(substr($php, 0, $start_idx)) . substr($php,$start_idx, $end_idx+6-$start_idx);
                $php = substr($php, $end_idx+6);
                break;
        }
    }
    $buffer .= minifyHTML($php);
    return $buffer;
}

function minifyHTML()

As stated above, minifying the HTML itself is just a straightforward collapsing of white space. There's still room for improvement. After all, runs of white space that are separated only by tags, with no non-white-space characters between them, are also collapsed by the browser. For example, " <i> <b> " is functionally equivalent to " <i><b>". Nevertheless, this is a good first pass.

function minifyHTML($html){
    return preg_replace('/\s+/',' ', $html);
}
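The improvement mentioned above, dropping white space that is separated from earlier white space only by tags, might be sketched as follows. This is an untested-in-production illustration with a hypothetical name, collapseAroundTags(). It assumes no attribute value contains a ">" character, and it ignores elements whose layout is sensitive to white space, such as <pre> or inline-block elements.

```php
<?php
// Sketch: collapse white space, then drop a space that follows one or more
// tags when a space already precedes those tags.
// collapseAroundTags() is a hypothetical helper, not part of the article's code.
function collapseAroundTags(string $html): string
{
    $html = preg_replace('/\s+/', ' ', $html);
    do {
        // " <i> " becomes " <i>". Matches cannot overlap, so repeat until
        // a pass makes no replacement.
        $html = preg_replace('/( (?:<[^>]*>)+) /', '$1', $html, -1, $count);
    } while ($count > 0);
    return $html;
}

echo collapseAroundTags(" <i> <b> "); // prints " <i><b>"
```

A space between two words that is interrupted only by tags is kept once, so "<span>a</span> <span>b</span>" is left unchanged.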

The Complete Minification Article Series: