What is Minification?

Minification is the process of removing all unnecessary characters from a file while leaving the core functionality of the code intact. The end result is a new file that is smaller than the original, yet identical from a machine's perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not intended, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.

Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.

How data·yze does Minification

Before we begin the technical discussion we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we automate the minification process during this publication step. This approach gives us all of the benefits of minification (smaller files which require less bandwidth for the users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with performance. Thus we opt for readability and ease of debugging over performance.

At data·yze, minimizing CSS reduces the size of our CSS files by 20-23% on average.

Minimizing CSS

Simple Approach

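Let's start with a minimizeCSSsimple() function that will work for most people. The goal of this function is to remove comments, collapse runs of white space, strip white space around colons, semicolons and braces, and drop the final semicolon in each block.

The function looks like this: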

function minimizeCSSsimple($css)
{
    $css = preg_replace('/\/\*((?!\*\/).)*\*\//s', '', $css); // negative look ahead; the s modifier lets . match newlines
    $css = preg_replace('/\s{2,}/', ' ', $css);
    $css = preg_replace('/\s*([:;{}])\s*/', '$1', $css);
    $css = preg_replace('/;}/', '}', $css);
    return trim($css);
}

So what's going on here?

The first preg_replace() call is the real meat of the function. It looks for any string that matches the regular expression \/\*((?!\*\/).)*\*\/ and removes it from the $css string. \/\* and \*\/ match /* and */, the start and end comment delimiters in CSS respectively. The / is the pattern delimiter and * is a quantifier, so both need to be escaped. The middle part, ((?!\*\/).)*, performs a negative look ahead. It matches any run of characters (.) so long as \*\/, the escaped CSS end comment delimiter, is not a part of it. Without the negative look ahead, the regular expression would pair the start delimiter of the first comment with the end delimiter of the last comment, rather than the start and end delimiters of the same comment. Any CSS between the first and last comments would be unintentionally removed!
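To see the difference the negative look ahead makes, here is a small sketch that runs a greedy pattern and the look ahead pattern over the same string. The sample CSS and the variable names are made up for illustration.

```php
<?php
// Illustrative input with two separate comments.
$css = "/* one */ a { color: red; } /* two */ b { color: blue; }";

// Greedy version: .* runs to the LAST end delimiter, swallowing the a { } rule.
$greedy = preg_replace('/\/\*.*\*\//s', '', $css);

// Look ahead version: a character is consumed only if it does not start "*/".
$tempered = preg_replace('/\/\*((?!\*\/).)*\*\//s', '', $css);

echo $greedy, "\n";   // only the b { } rule is left
echo $tempered, "\n"; // both rules are left
```

The greedy result has lost the first rule entirely, which is exactly the failure described above.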

The rest of the code is pretty straightforward. The second call collapses white space. The third removes white space where it's not needed, such as before or after a colon, brace or semicolon. Finally, the fourth removes the last semicolon in each block, since it's not needed.

If we want to be pedantic, the last three preg_replace() calls could be merged into a single statement using a 3 element array for all the regular expressions and another for the replacements. I, personally, prefer to have one non-trivial regular expression per line. Keeping lines separate makes the code more readable, which makes it easier to debug and modify. The code only executes when our developers are ready to push changes to production. Since the code isn't client facing, the potential improvement in performance isn't worth the hit to readability.
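For the curious, the merged form might look like the sketch below. The function name collapseCssWhitespace() is hypothetical and only covers the white space and semicolon rules, not comment removal.

```php
<?php
// Hypothetical name; equivalent to the three white space/semicolon calls in sequence.
function collapseCssWhitespace($css)
{
    $patterns = array('/\s{2,}/', '/\s*([:;{}])\s*/', '/;}/');
    $replacements = array(' ', '$1', '}');
    // preg_replace() applies each pattern, in array order, to the result of the previous one.
    return preg_replace($patterns, $replacements, $css);
}

echo collapseCssWhitespace("a {  color :blue ; }"), "\n"; // a{color:blue}
```

It works, but a bug in any one pattern is now harder to isolate, which is the readability cost mentioned above.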

Let's see it in action with the following example:

/* crazy ** /**
   test */ body { /* who would write css like this? */background: white !important;}
a { color :blue ; }

The above test is pretty crazy and covers all our corner cases: (1) two comments on a single line, (2) extra asterisks and slashes in a comment, (3) a necessary white space that shouldn't be removed and (4) a comment that spans multiple lines.

Running the CSS through our CSS minifier gives us:

body{background:white !important}a{color:blue}

A Better Minification

As mentioned above, this function works for most cases. It can fail when the content property is set to specific values. Consider the following case:

a::before { content: "/*"; }
a::after { content: "*/"; }
li::before { content: " >"; }

The comment-stripping regular expression interprets the content property of a::before as the start of a block comment, and the content property of a::after as the end of the block comment. The resulting output would be:

a::before{content:""}li::before{content:" >"}
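The failure is easy to reproduce by applying only the comment pattern to the first two rules. This is a minimal sketch with made-up variable names:

```php
<?php
// Two content strings that happen to look like comment delimiters.
$css = 'a::before { content: "/*"; } a::after { content: "*/"; }';

// The comment pattern from the simple approach, applied on its own.
$stripped = preg_replace('/\/\*((?!\*\/).)*\*\//s', '', $css);

echo $stripped, "\n"; // a::before { content: ""; }
```

Everything from the first string's /* to the second string's */ is treated as one comment, so the a::after rule disappears.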

In addition to removing comments and condensing white space, we need to preserve the contents of the content property strings. We're going to do that by scanning the string from beginning to end, looking for comments and quotes before any minification is performed. When we encounter a comment, we remove it from the string. When we encounter a quoted string, such as a content property value, we save the value of the string and put a temporary placeholder in its place.

The following code may seem a little over-engineered. We promise it will make more sense as we move on to minifying JavaScript, and HTML with embedded PHP, JavaScript and CSS.

First we start with a store built on an associative array. The purpose of $minificationStore is to keep track of strings that need special handling or preserving. The function getNextMinificationPlaceholder() gives us a unique key, unlikely to be found in HTML, CSS, PHP or JavaScript, that we can use for storing and looking up strings in our store. I chose to represent my keys as <-!!-#-!!->, where # is the number of items already in the store. If you anticipate a string like that in your code, you can modify the return statement to produce any unique string you like.

$minificationStore = array();

function getNextMinificationPlaceholder()
{
    global $minificationStore;
    return '<-!!-' . sizeof($minificationStore) . '-!!->';
}
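Here is a rough sketch of how the store and the placeholder function work together. The stashForMinification() helper is hypothetical and exists only for this illustration; the store and placeholder function are repeated so the snippet runs on its own.

```php
<?php
$minificationStore = array();

function getNextMinificationPlaceholder()
{
    global $minificationStore;
    return '<-!!-' . sizeof($minificationStore) . '-!!->';
}

// Hypothetical helper: save a value in the store and return the key that replaces it.
function stashForMinification($value)
{
    global $minificationStore;
    $placeholder = getNextMinificationPlaceholder();
    $minificationStore[$placeholder] = $value;
    return $placeholder;
}

$first = stashForMinification('"/*"');
$second = stashForMinification('" >"');

echo $first, ' ', $second, "\n"; // <-!!-0-!!-> <-!!-1-!!->
```

Because the key is derived from the size of the store, each placeholder has to be stored before the next one is requested, otherwise two strings would share a key.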

Next we're going to search for specific patterns, such as /* ... */ and " ... ". The values represented by the ellipses might change, but each pattern will have a specific start and end string sequence. For this task we write an abstract class, MinificationSequenceFinder, which will hold the start offset ($start_idx), end offset ($end_idx) and type ($type).

abstract class MinificationSequenceFinder
{
    public $start_idx;
    public $end_idx;
    public $type;

    abstract protected function findFirstValue($string);

    public function isValid()
    {
        return $this->start_idx !== false;
    }
}

Next it's time to implement StringSequenceFinder. This is our class that will find block comments (when $start_delimiter = '/*' and $end_delimiter = '*/').

class StringSequenceFinder extends MinificationSequenceFinder
{
    protected $start_delimiter;
    protected $end_delimiter;

    function __construct($start_delimiter, $end_delimiter)
    {
        $this->type = $start_delimiter;
        $this->start_delimiter = $start_delimiter;
        $this->end_delimiter = $end_delimiter;
    }

    public function findFirstValue($string)
    {
        $this->start_idx = strpos($string, $this->start_delimiter);
        if ($this->isValid()) {
            // start searching after the whole start delimiter so that "/*/" is not read as a complete comment
            $this->end_idx = strpos($string, $this->end_delimiter, $this->start_idx + strlen($this->start_delimiter));
            // sanity check for non well formed lines
            $this->end_idx = ($this->end_idx === false ? strlen($string) : $this->end_idx + strlen($this->end_delimiter));
        }
    }
}

The quote sequence finder, QuoteSequenceFinder, is a little more complicated. When looking for the end quote of a string we must take care not to use an escaped quote from the middle of the string (e.g. content: "\"").

class QuoteSequenceFinder extends MinificationSequenceFinder
{
    function __construct($type)
    {
        $this->type = $type;
    }

    public function findFirstValue($string)
    {
        $this->start_idx = strpos($string, $this->type);
        if ($this->isValid()) {
            // look for the first non-escaped end quote
            $this->end_idx = $this->start_idx + 1;
            while ($this->end_idx < strlen($string)) {
                // find the number of escapes before the end quote
                if (preg_match('/(\\\\*)(' . preg_quote($this->type, '/') . ')/', $string, $match, PREG_OFFSET_CAPTURE, $this->end_idx)) {
                    $this->end_idx = $match[2][1] + 1;
                    // an odd number of escapes means the end quote is escaped, so keep going
                    if (!isset($match[1][0]) || strlen($match[1][0]) % 2 == 0) {
                        return;
                    }
                } else {
                    // no match, not well formed
                    $this->end_idx = strlen($string);
                    return;
                }
            }
        }
    }
}
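The odd/even rule is the part that tends to trip people up, so here it is in isolation. The isEscapedAt() helper below is hypothetical and not part of the classes above; it only illustrates the backslash counting.

```php
<?php
// Hypothetical helper: reports whether the character at $pos is preceded
// by an odd number of backslashes, i.e. whether it is escaped.
function isEscapedAt($string, $pos)
{
    $backslashes = 0;
    for ($i = $pos - 1; $i >= 0 && $string[$i] === '\\'; $i--) {
        $backslashes++;
    }
    return $backslashes % 2 === 1;
}

$oneEscape = '"a\\"b"';  // CSS text: "a\"b"
$twoEscapes = '"a\\\\"'; // CSS text: "a\\"

var_dump(isEscapedAt($oneEscape, 3));  // bool(true): the quote is escaped, keep scanning
var_dump(isEscapedAt($twoEscapes, 4)); // bool(false): the backslash is escaped, the quote ends the string
```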

The function getNextSpecialSequence() is pretty straightforward. The input variable $sequences is an array of MinificationSequenceFinder instances, each of which searches the CSS string $string for its special sequence. If any such sequences are found, we store the finder in an array, $special_idx, keyed by its offset into the CSS string. The finder with the smallest offset (and thus the one occurring first in the string) is returned. We'll use this function again in the next two sections.

function getNextSpecialSequence($string, $sequences)
{
    // $special_idx maps the nearest start offset of each special sequence to its finder
    $special_idx = array();
    foreach ($sequences as $finder) {
        $finder->findFirstValue($string);
        if ($finder->isValid()) {
            $special_idx[$finder->start_idx] = $finder;
        }
    }
    // if none found, return
    if (count($special_idx) == 0) {
        return false;
    }
    // get the first occurring item (the smallest key, so no sorting is needed)
    return $special_idx[min(array_keys($special_idx))];
}
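The "smallest offset wins" rule is what keeps a /* inside a quoted string from being mistaken for a comment, and a quote inside a comment from being mistaken for a string. Below is a simplified, hypothetical stand-in that uses plain delimiter strings instead of finder objects, just to show the selection logic.

```php
<?php
// Hypothetical, simplified stand-in: returns the delimiter that occurs first, or false.
function firstDelimiter($string, array $delimiters)
{
    $found = array();
    foreach ($delimiters as $delimiter) {
        $idx = strpos($string, $delimiter);
        if ($idx !== false) {
            $found[$idx] = $delimiter;
        }
    }
    if (count($found) == 0) {
        return false;
    }
    return $found[min(array_keys($found))];
}

$delimiters = array('/*', "'", '"');

echo firstDelimiter('a { content: "/*"; }', $delimiters), "\n"; // " (the string starts first)
echo firstDelimiter('/* "quoted" */ a { }', $delimiters), "\n"; // /* (the comment starts first)
```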

Now we can write our minifyCSS() function.

$singleQuoteSequenceFinder = new QuoteSequenceFinder('\'');
$doubleQuoteSequenceFinder = new QuoteSequenceFinder('"');
$blockCommentFinder = new StringSequenceFinder('/*', '*/');

function minifyCSS($css)
{
    global $minificationStore, $singleQuoteSequenceFinder, $doubleQuoteSequenceFinder, $blockCommentFinder;
    $css_special_chars = array(
        $blockCommentFinder,        // CSS comment
        $singleQuoteSequenceFinder, // single quote escape, e.g. :before{ content: '-';}
        $doubleQuoteSequenceFinder, // double quote
    );

    // pull out everything that needs to be pulled out and saved
    while ($sequence = getNextSpecialSequence($css, $css_special_chars)) {
        switch ($sequence->type) {
            case '/*': // remove comments
                $css = substr($css, 0, $sequence->start_idx) . substr($css, $sequence->end_idx);
                break;
            default: // strings that need to be preserved
                $placeholder = getNextMinificationPlaceholder();
                $minificationStore[$placeholder] = substr($css, $sequence->start_idx, $sequence->end_idx - $sequence->start_idx);
                $css = substr($css, 0, $sequence->start_idx) . $placeholder . substr($css, $sequence->end_idx);
        }
    }

    // minimize the string
    $css = preg_replace('/\s{2,}/', ' ', $css);
    $css = preg_replace('/\s*([:;{}])\s*/', '$1', $css);
    $css = preg_replace('/;}/', '}', $css);

    // put back the preserved strings
    foreach ($minificationStore as $placeholder => $original) {
        $css = str_replace($placeholder, $original, $css);
    }
    return trim($css);
}

The while loop is where we handle the strings found by the MinificationSequenceFinder instances. When a comment is found, the case '/*' branch, we remove the substring from the CSS string. Since we only have three types of sequence finders, the default case includes only quoted strings. In this case we replace the substring with our placeholder from getNextMinificationPlaceholder().

Once all special strings are dealt with, we can use regular expressions to deal with extra white space and semicolons. The three preg_replace() calls come straight from our minimizeCSSsimple(). Note we no longer need the line for removing block comments, since that was handled in our while loop.

Finally, we put the special strings back into our CSS.
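To show why the strings have to be taken out before the white space rules run, here is a compact sketch of the protect, minify, restore round trip. Note the assumptions: it swaps in a single regex callback for the sequence finders, the pattern "[^"]*" ignores escaped quotes, single quotes and comments, and the names squeeze(), $store, $naive and $safe are made up for this example.

```php
<?php
// Hypothetical helper: the three white space/semicolon rules from above.
function squeeze($css)
{
    $css = preg_replace('/\s{2,}/', ' ', $css);
    $css = preg_replace('/\s*([:;{}])\s*/', '$1', $css);
    return preg_replace('/;}/', '}', $css);
}

$css = 'li::before { content: "a  :  b"; }';

// Without protection, the rules rewrite the quoted value too.
$naive = squeeze($css);

// With protection: swap each double-quoted string for a placeholder first.
$store = array();
$protected = preg_replace_callback('/"[^"]*"/', function ($m) use (&$store) {
    $key = '<-!!-' . count($store) . '-!!->';
    $store[$key] = $m[0];
    return $key;
}, $css);

// Minify, then put the original strings back.
$safe = str_replace(array_keys($store), array_values($store), squeeze($protected));

echo $naive, "\n"; // li::before{content:"a:b"}
echo $safe, "\n";  // li::before{content:"a  :  b"}
```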

We have now replaced our almost good enough 5 line function minimizeCSSsimple() with 123 new lines of PHP. Like I said, it may seem a little over-engineered, but at least the output of the new function is now what we'd expect:

a::before{content:"/*"}a::after{content:"*/"}li::before{content:" >"}



Want to give it a try? Use our Minifier.