What is Minification?

Minification is the process of removing all unnecessary characters from a file while leaving the functionality of the code intact. The result is a new file that is smaller than the original, yet identical from the machine's perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although it is not the goal, minification also makes code more difficult for humans to read, which is why it can be seen as a lightweight form of obfuscation.

Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.

How data·yze does Minification

Before we begin the technical discussion, we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we automate minification during this publication step. This approach gives us all of the benefits of minification (smaller files which require less bandwidth for our users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with the speed of the minifier itself. Thus we opt for readability and ease of debugging over performance.

At data·yze, minifying JavaScript reduces the size of our files by 32% on average.

Easy Minifying JavaScript

JavaScript can be a little nerve-racking to minify. Statements can be terminated by a semicolon or a newline. Yet statements can also span multiple lines and contain semicolons within them. If you're not careful when you minify, you can easily break your code. Fortunately, you can get large gains with simple tricks. The following function, minimizeJavascriptSimple, achieves an 11% reduction on our JavaScript code just by condensing runs of whitespace, and it still keeps the code pretty readable.

function minimizeJavascriptSimple($javascript){
    return preg_replace(
        array("/\s+\n/", "/\n\s+/", "/ +/"),
        array("\n", "\n ", " "),
        $javascript
    );
}
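
As a quick illustration of what this simple approach does, the sketch below runs the function on a small, made-up snippet. The sample input is an assumption for demonstration only and is not taken from our code base.

```php
<?php
// Same function as in the article, repeated so this example runs on its own.
function minimizeJavascriptSimple($javascript){
    return preg_replace(
        array("/\s+\n/", "/\n\s+/", "/ +/"),
        array("\n", "\n ", " "),
        $javascript
    );
}

// Hypothetical sample input with trailing spaces, blank lines and wide gaps.
$sample = "function add(a, b) {   \n\n\n        return a   +   b;\n}\n";
$small  = minimizeJavascriptSimple($sample);

echo $small;
printf("%d bytes -> %d bytes\n", strlen($sample), strlen($small));
```

Blank lines and trailing spaces disappear, indentation shrinks to a single space, and every statement stays on its own line, which is why the result remains readable.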

Being data people, we're never satisfied with a partial result.

Our Approach

Our minification process works by performing a shallow parse of our JavaScript. A shallow parse is one that has only a superficial understanding of structure: structure is deduced by identifying key elements such as the start of a comment or a quoted string. In contrast, a deep parse attempts to understand the structure of the script in its entirety and would require something akin to a JavaScript compiler. Fortunately, the extra understanding that comes with a deep parse is not necessary for most minifiers. Thus we stuck with the shallow parse, which is simpler and has a smaller code footprint. This makes our minifier easier to read, easier to test, and easier to debug.
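
To make the idea of a shallow parse concrete, here is a minimal sketch that only asks which special marker comes first in a piece of code. The helper name findFirstMarker and the marker list are assumptions made for illustration; the finder classes we actually use come from part 1 of this series.

```php
<?php
// Hypothetical helper: report which special marker appears first in $code.
// Returns array(marker, offset), or null when none of the markers occur.
function findFirstMarker($code, $markers = array('/*', '//', "'", '"')){
    $best = null;
    foreach ($markers as $marker){
        $idx = strpos($code, $marker);
        if ($idx !== false && ($best === null || $idx < $best[1])){
            $best = array($marker, $idx);
        }
    }
    return $best;
}

$code = "var s = 'a // not a comment'; // real comment";
print_r(findFirstMarker($code));   // the single quote at offset 8 comes first
```

Because the quote is found before the first //, a shallow parser can skip to the end of the string and never mistakes the // inside it for a comment. No deeper understanding of the script is needed for that decision.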

Our minification script assumes the JavaScript it is minifying compiles and is correct! As the saying goes: garbage in, garbage out. We're also going to rely on some of the code we wrote while learning to minify CSS.

Important Detail: Don't forget to copy the code from part 1 (Minifying CSS with PHP) of this three-part Minification series, or your minifyJavascript() function will not work!

More Complete JavaScript Minification

The first consideration is how to handle regular expressions. We wouldn't want to accidentally remove a semicolon or condense whitespace within a regular expression. Doing so would alter the behavior of the regular expression, and the resulting JavaScript would behave differently from the original non-minified code. Regular expressions should be treated like quoted strings and need to be preserved.

In JavaScript, regular expressions can be defined using the RegExp class with a defining string, e.g. var re = new RegExp('[0-9]'), or by placing the regular expression between two slashes, e.g. var re = /[0-9]/. The former case is already handled by our StringSequenceFinder, but we need a new sequence finder to handle the latter. To do this we write JSRegexSequenceFinder.

class JSRegexSequenceFinder extends MinificationSequenceFinder {
    function __construct() {
        $this->type = 'regex';
    }

    /* check to make sure this isn't the start of a comment or a division */
    public function findPossibleStart($string, $idx = 0){
        $start_idx = strpos($string, '/', $idx);
        if ($start_idx === false){
            return false;
        }
        if (substr($string, $start_idx, 2) === '//' || substr($string, $start_idx, 2) === '/*'){
            // found comment, not pattern, don't bother continuing
            return false;
        }
        // get first non-space previous char
        $tmp = $start_idx - 1;
        while ($tmp > 0 && substr($string, $tmp, 1) == ' '){
            $tmp--;
        }
        if ($tmp > 0){
            $char = substr($string, $tmp, 1);
            // a letter, digit, ")" or "]" before the slash means division: try the next slash
            if (is_numeric($char) || ctype_alpha($char) || $char == ')' || $char == ']'){
                return $this->findPossibleStart($string, $start_idx + 1);
            }
        }
        return $start_idx;
    }

    public function findFirstValue($string){
        $this->start_idx = $this->findPossibleStart($string);
        if ($this->start_idx === false){
            return;
        }
        // position of first newline after the start of the pattern
        $nl = strpos($string, "\n", $this->start_idx);
        // look for the first non-escaped closing slash
        $end_idx = $this->start_idx + 1;
        while ($end_idx < strlen($string)           // still room to explore in the string
            && ($nl === false || $end_idx < $nl))   // and we're not at a newline yet
        {
            // capture the backslashes directly before the next slash
            if (preg_match('/(\\\\*)(\/)/', $string, $match, PREG_OFFSET_CAPTURE, $end_idx)){
                $end_idx = $match[2][1] + 1;
                // an odd number of backslashes means the slash is escaped: keep going
                if (!isset($match[1][0]) || strlen($match[1][0]) % 2 == 0){
                    if ($nl !== false && $end_idx > $nl){
                        // closing slash is on a later line: not a regex,
                        // so signal "not found" the same way as the branch below
                        $this->start_idx = false;
                        return;
                    }
                    $this->end_idx = $end_idx;
                    return;
                }
            } else {
                // no closing slash at all, not well formed
                $this->start_idx = false;
                return;
            }
        }
        // ran out of line without finding an unescaped closing slash
        $this->start_idx = false;
    }
}

JSRegexSequenceFinder works much like our other MinificationSequenceFinder instances: the function findFirstValue searches the string for the first instance of a slash. What makes regular expressions more challenging to find programmatically is that the slash character can also signify the start of a line comment, the start of a block comment, or a division. Thus we use the helper function findPossibleStart to perform a lookahead and a lookbehind as a sanity check that the slash we found is, indeed, the start of a regular expression. The lookahead, the substr comparison near the top of findPossibleStart, checks whether the slash starts a comment. If it does, then either $blockCommentFinder or $lineCommentFinder (defined below for line comments) will have a smaller starting index, $start_idx, than any regular expression we could find, so there's no point in continuing the search. The lookbehind then checks whether the slash is a division: if the first non-space character preceding the slash is alphanumeric, e.g. 1/y or x/y, or closes an expression, e.g. (x)/y, the slash is treated as a division and the search continues from the next slash.
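
The lookbehind can be tried in isolation with a small sketch. The function name looksLikeDivision is hypothetical, and this is a simplified variation on the check inside findPossibleStart (it uses ctype_alnum and also inspects the very first character of the string), not a copy of it.

```php
<?php
// Hypothetical standalone version of the look-behind check.
// Returns true when the slash at $slashIdx looks like a division operator.
function looksLikeDivision($code, $slashIdx){
    $i = $slashIdx - 1;
    while ($i >= 0 && $code[$i] === ' '){ $i--; }   // skip spaces before the slash
    if ($i < 0){ return false; }                    // nothing precedes the slash
    $char = $code[$i];
    return ctype_alnum($char) || $char === ')' || $char === ']';
}

var_dump(looksLikeDivision('x = a / b;', 6));     // bool(true): "a" precedes the slash
var_dump(looksLikeDivision('x = /[0-9]/;', 4));   // bool(false): "=" precedes the slash
```

This is a heuristic rather than a full parse. For example, an identifier ending in an underscore or dollar sign is not recognized as an operand by either this sketch or the original check.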

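Once we have identified the possible starting slash of a regular expression, we scan through the string for the terminating slash. We need to be careful that the next slash we encounter is not escaped. This check is performed inside the while loop of findFirstValue by counting the backslashes directly in front of the slash: an even count (including zero) means the slash is not escaped and ends the regular expression.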
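
The backslash-counting rule can be sketched without regular expressions. The helper findUnescapedSlash below is hypothetical and exists only to illustrate the even/odd rule; it is not the code used in JSRegexSequenceFinder.

```php
<?php
// Hypothetical helper: find the first "/" at or after $from that is not escaped.
// A slash is escaped when an odd number of backslashes sits directly before it.
function findUnescapedSlash($code, $from){
    $idx = strpos($code, '/', $from);
    while ($idx !== false){
        $backslashes = 0;
        for ($i = $idx - 1; $i >= $from && $code[$i] === '\\'; $i--){
            $backslashes++;
        }
        if ($backslashes % 2 === 0){ return $idx; }
        $idx = strpos($code, '/', $idx + 1);
    }
    return false;
}

$js = 'var re = /a\/b/;';
var_dump(findUnescapedSlash($js, 10));   // int(14): the slash at offset 12 is escaped
```

Starting the search at offset 10, just after the opening slash, the escaped slash is skipped and the closing slash at offset 14 is returned.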

Now we're ready to write our minification function, minifyJavascript.

$lineCommentFinder = new StringSequenceFinder('//', "\n");

function minifyJavascript($javascript){
    global $minificationStore, $singleQuoteSequenceFinder, $doubleQuoteSequenceFinder, $blockCommentFinder, $lineCommentFinder;
    $java_special_chars = array(
        $blockCommentFinder,         // JavaScript block comment
        $lineCommentFinder,          // JavaScript line comment
        $singleQuoteSequenceFinder,  // single-quoted string
        $doubleQuoteSequenceFinder,  // double-quoted string
        new JSRegexSequenceFinder()  // JavaScript regular expression
    );
    // pull out everything that needs to be pulled out and saved
    while ($sequence = getNextSpecialSequence($javascript, $java_special_chars)){
        switch ($sequence->type){
            case '/*':
            case '//': // remove comments
                $javascript = substr($javascript, 0, $sequence->start_idx).substr($javascript, $sequence->end_idx);
                break;
            default: // quoted strings or regex that need to be preserved
                $start_idx = $sequence->start_idx;
                $end_idx = $sequence->end_idx;
                $placeholder = getNextMinificationPlaceholder();
                $minificationStore[$placeholder] = substr($javascript, $start_idx, $end_idx - $start_idx);
                $javascript = substr($javascript, 0, $start_idx).$placeholder.substr($javascript, $end_idx);
        }
    }
    // special case where the + indicates treating variable as numeric, e.g. a = b + +c
    $javascript = preg_replace('/([-\+])\s+\+([^\s;]*)/', '$1 (+$2)', $javascript);
    // condense spaces
    $javascript = preg_replace("/\s*\n\s*/", "\n", $javascript); // spaces around newlines
    $javascript = preg_replace("/\h+/", " ", $javascript); // \h+ horizontal white space
    // remove unnecessary horizontal spaces around non variables (alphanumerics, underscore, dollar sign)
    $javascript = preg_replace("/\h([^A-Za-z0-9\_\$])/", '$1', $javascript);
    $javascript = preg_replace("/([^A-Za-z0-9\_\$])\h/", '$1', $javascript);
    // remove unnecessary spaces around brackets and parentheses
    $javascript = preg_replace("/\s?([\(\[{])\s?/", '$1', $javascript);
    $javascript = preg_replace("/\s([\)\]}])/", '$1', $javascript);
    // remove unnecessary spaces around operators that don't need any spaces (specifically newlines)
    $javascript = preg_replace("/\s?([\.=:\-+,])\s?/", '$1', $javascript);
    // unnecessary characters
    $javascript = preg_replace("/;\n/", ";", $javascript); // semicolon before newline
    $javascript = preg_replace('/;}/', '}', $javascript); // semicolon before end bracket
    // put back the preserved strings
    foreach ($minificationStore as $placeholder => $original){
        $javascript = str_replace($placeholder, $original, $javascript);
    }
    return trim($javascript);
}

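The first part of the function, up to the end of the while loop, should look familiar from part 1 of our Minification series. We start by scanning through the JavaScript for sequences that need special handling: comments are removed, while quoted strings and regular expressions are swapped for placeholders so they can be restored untouched at the end.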
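
The placeholder idea can be shown in miniature. The sketch below is a deliberate simplification: a single regular expression for double-quoted strings stands in for the finder classes from part 1, and the placeholder format ~~0~~ is invented for this example.

```php
<?php
// Simplified illustration of "store, minify, restore".
$store = array();
$js = 'var msg = "keep   these   spaces";   var n = 1;';

// Swap each double-quoted string for a placeholder (hypothetical format) and remember it.
$js = preg_replace_callback('/"(?:[^"\\\\]|\\\\.)*"/', function ($m) use (&$store){
    $placeholder = '~~' . count($store) . '~~';
    $store[$placeholder] = $m[0];
    return $placeholder;
}, $js);

// Any whitespace rule can now run without touching the contents of the string.
$js = preg_replace('/ +/', ' ', $js);

// Put the preserved strings back.
$js = str_replace(array_keys($store), array_values($store), $js);

echo $js, "\n";   // var msg = "keep   these   spaces"; var n = 1;
```

The spaces between the two statements are condensed, while the spaces inside the quoted string survive because the string was out of the way when the rule ran.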

The first preg_replace call in minifyJavascript prevents us from accidentally condensing two plus signs into a prefix increment, e.g. a + +b into a ++b, by placing the expression +b in parentheses. Technically this can increase the size of our file, but in practice the number of bytes added is small compared to what we'll take away in the next few lines.
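
A short experiment shows why the guard is needed. The guard pattern is the one from minifyJavascript; the second pattern is a simplified stand-in for the later space-stripping rules and is only an assumption for this demonstration.

```php
<?php
$guard = '/([-\+])\s+\+([^\s;]*)/';   // the pattern used in minifyJavascript

$source  = 'a = b + +c;';
$guarded = preg_replace($guard, '$1 (+$2)', $source);
echo $guarded, "\n";                  // a = b + (+c);

// Without the guard, stripping the spaces around "+" fuses the two plus signs
// so that they read as an increment operator.
$unguarded = preg_replace('/\s?\+\s?/', '+', $source);
echo $unguarded, "\n";                // a = b++c;
```

With the parentheses in place, the later rules can remove every space around the operators and the unary plus still applies to c.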

The next thing we need to do is condense multiple spaces into a single space. JavaScript allows statements to be terminated either by a semicolon or, at times, by a newline via a process called Automatic Semicolon Insertion. Thus it's important to preserve at least one newline per block of whitespace: condensing a block of whitespace that contains a newline into a single horizontal space could break the JavaScript. Therefore the first "condense spaces" replacement turns every block of whitespace that contains at least one newline into a single newline without any horizontal whitespace, and the second replaces every remaining block of horizontal whitespace with a single space.
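
Here are the two condensing replacements applied to a made-up snippet that relies on newlines instead of semicolons. The sample input is an assumption chosen to show why the newline has to survive.

```php
<?php
// Two statements separated only by newlines (no semicolons).
$js = "var a = 1\n\n    var b   =   2";

// Any whitespace run containing a newline becomes exactly one newline...
$js = preg_replace("/\s*\n\s*/", "\n", $js);
// ...and the remaining runs of horizontal whitespace become one space.
$js = preg_replace("/\h+/", " ", $js);

echo $js, "\n";
```

The output is var a = 1 and var b = 2 on two separate lines. Had the blank lines been collapsed into a space instead, the two statements would have run together on one line and the script would no longer parse.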

Next we remove unnecessary whitespace. The following two replacements remove horizontal whitespace around any character that cannot be used in a variable name, i.e. anything other than an alphanumeric, an underscore or a dollar sign. The first bracket rule then removes all whitespace (which at this point is only newlines) around opening parentheses, square brackets and curly brackets, and the second removes the whitespace preceding closing parentheses, square brackets and curly brackets. We can't necessarily remove the whitespace (newline) after a closing bracket or parenthesis because it could be the statement-terminating character. Finally, the operator rule removes whitespace (newlines) around characters that cannot be terminators.
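
The effect of these rules is easiest to see step by step. The sketch below applies the four patterns from minifyJavascript, in order, to a hypothetical input that has already had its whitespace condensed.

```php
<?php
$js = "if ( a == b ) {\nfoo ( a , b )\n}";

// Horizontal space next to a character that cannot be part of an identifier.
$js = preg_replace("/\h([^A-Za-z0-9\_\$])/", '$1', $js);
$js = preg_replace("/([^A-Za-z0-9\_\$])\h/", '$1', $js);
// Whitespace (newlines by now) around opening brackets and before closing ones.
$js = preg_replace("/\s?([\(\[{])\s?/", '$1', $js);
$js = preg_replace("/\s([\)\]}])/", '$1', $js);

echo $js, "\n";   // if(a==b){foo(a,b)}
```

Spaces between two identifier characters are never touched, so keywords such as var or return stay separated from the names that follow them.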

The final step is to remove unnecessary statement terminators. Statements don't need to be terminated by both a semicolon and a newline. We could keep either, but since semicolons are less ambiguous, we keep the semicolon and drop the newline that follows it. A statement is also terminated by a closing curly bracket, so a semicolon directly before one is not strictly necessary and is removed by the last replacement.
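
A minimal run of the last two replacements, on a made-up input that has already been through the earlier steps:

```php
<?php
$js = "a=1;\nb=2;\nif(a){c=3;}";

$js = preg_replace("/;\n/", ";", $js);   // a newline after a semicolon is redundant
$js = preg_replace('/;}/', '}', $js);    // a semicolon before a closing brace is redundant

echo $js, "\n";   // a=1;b=2;if(a){c=3}
```

Each of the two rules removes one byte per statement boundary, which adds up quickly in files with many short statements.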

Evaluating the Output

So how well does our minifier work? Let's put it to the test. Consider the following example.
for ( var i = 0;
    i < 20;
    i++ ){
    //comment
    i += 0;
    i += 5;
}
Here we have some unnecessary spaces, an unnecessary semicolon after the statement i += 5, and some very necessary semicolons in the for loop. Just to make this example as contrived as possible, I've also added some unnecessary newlines in the middle of the for loop to see if the minifier would be fooled into thinking those lines were statements. Running this through our minifyJavascript code gives us the following output.
for(var i=0;i<20;i++){i+=0;i+=5}
The necessary spaces and semicolons are still present, while everything else has been removed.

Want to give it a try? Use our Minifier.

Code License

Although code shared on data·yze is source-available, it is still proprietary and data·yze maintains its intellectual property rights. In particular, data·yze restricts redistribution of the code. Code displayed above may be copied, modified, displayed or adapted for use on other websites (commercial or otherwise) only under certain conditions and may not be repackaged or redistributed. See Terms for details.