
What is Minification?

Minification is the process of removing all unnecessary characters from a file while leaving the core functionality of the code intact. The end result is a new file that is smaller than the original, yet identical from a machine's perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although it is not the purpose, minification can make code more difficult for humans to read, which is why it can also be seen as lightweight obfuscation.

Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.

How data·yze does Minification

Before we begin the technical discussion, we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we automate the minification process during this publication step. This approach gives us all of the benefits of minification (smaller files which require less bandwidth for the users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with performance. Thus we opt for readability and ease of debugging over performance.
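To make the publish-time idea concrete, here is a minimal sketch of a publication step that minifies a file while copying it. The function name publishMinified, the temporary file paths and the stand-in minifier are all assumptions made for illustration; this is not our actual deployment code.

```php
<?php
// Hypothetical publish step: read a development file, minify it, write the
// production copy. Names and paths are illustrative assumptions.
function publishMinified(string $sourcePath, string $targetPath, callable $minifier): int {
    $original = file_get_contents($sourcePath);
    if ($original === false) {
        throw new RuntimeException("Cannot read $sourcePath");
    }
    $minified = $minifier($original);
    if (file_put_contents($targetPath, $minified) === false) {
        throw new RuntimeException("Cannot write $targetPath");
    }
    // Report how many bytes the published copy saves.
    return strlen($original) - strlen($minified);
}

// Usage with temporary files and a trivial stand-in minifier that only
// strips horizontal whitespace at the end of each line.
$source = tempnam(sys_get_temp_dir(), 'dev');
$target = tempnam(sys_get_temp_dir(), 'pub');
file_put_contents($source, "var a = 1;   \nvar b = 2;\n");

$stripTrailingSpaces = function (string $js): string {
    return preg_replace('/\h+\n/', "\n", $js);
};

$saved = publishMinified($source, $target, $stripTrailingSpaces);
$published = file_get_contents($target);
echo "Saved $saved bytes\n";

unlink($source);
unlink($target);
```

Because the work happens once per publication, any callable can be dropped in as the minifier without affecting page load time.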

At data·yze, minifying JavaScript reduces the size of our files by 32% on average. Here is the code we use:

Easy JavaScript Minification

JavaScript can be a little nerve-racking to minify. Statements can be terminated by a semicolon or a newline. Yet statements can also span multiple lines and contain semicolons within them. If you're not careful when you minify, you can easily break your code. Fortunately, you can get large gains with simple tricks. The following function, minimizeJavascriptSimple, achieves an 11% reduction on our JavaScript code by only condensing runs of whitespace, and it still keeps the code pretty readable.

    function minimizeJavascriptSimple($javascript){
        return preg_replace(array("/\s+\n/", "/\n\s+/", "/ +/"), array("\n","\n "," "), $javascript);
    }
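As a rough illustration of what this whitespace-only pass does, the sketch below applies the same three replacements to a small made-up snippet and reports the saving. The wrapper name condenseWhitespaceSketch and the sample input are assumptions for demonstration; the saving on real code will differ.

```php
<?php
// Hypothetical restatement of the whitespace-only pass, for demonstration.
function condenseWhitespaceSketch(string $javascript): string {
    return preg_replace(
        array("/\s+\n/", "/\n\s+/", "/ +/"), // trailing space, indentation, runs of spaces
        array("\n", "\n ", " "),
        $javascript
    );
}

// Made-up sample with trailing spaces, deep indentation and padded operators.
$sample = "function add(a, b) {   \n        return a   +   b;\n}\n";
$condensed = condenseWhitespaceSketch($sample);

$saved = strlen($sample) - strlen($condensed);
printf("%d of %d bytes removed\n", $saved, strlen($sample));
echo $condensed;
```

Every newline survives, so the statement structure is untouched; only the padding goes.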

Being data people, we're never satisfied with a partial result.

More Complete JavaScript Minification

Our minification process works by performing a shallow parse of our JavaScript. A shallow parse is one that has only a superficial understanding of structure. Structure is deduced by identifying key elements such as the start of a comment or quoted string. In contrast, a deep parse is one that attempts to understand the structure of the script in its entirety. A deep parse would require something akin to a full JavaScript parser. Fortunately, the extra understanding that comes with a deep parse is not necessary for most minifiers. Thus we stuck with the shallow parse, which is simpler and has a smaller code footprint. This makes it easier to read, easier to test, and easier to debug.
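To illustrate what "shallow" means here, the following sketch only asks which special opener appears first in the source. The function findFirstOpener and the opener list are assumptions made for this example; the sequence finders from part 1 of the series do the real work.

```php
<?php
// Illustrative shallow scan: report which "special" opener occurs first.
// Function name and opener list are assumptions for this sketch.
function findFirstOpener(string $source, array $openers): ?array {
    $best = null;
    foreach ($openers as $opener) {
        $idx = strpos($source, $opener);
        if ($idx !== false && ($best === null || $idx < $best['index'])) {
            $best = array('opener' => $opener, 'index' => $idx);
        }
    }
    return $best;
}

$js = 'var url = "http://example.com"; // a real comment';
$first = findFirstOpener($js, array('/*', '//', "'", '"'));

// The double quote comes before the "//" inside the URL, so the quoted
// string is handled first and its "//" is never mistaken for a comment.
echo $first['opener'] . ' at index ' . $first['index'] . "\n";
```

No knowledge of JavaScript grammar is needed for this; finding the earliest delimiter is enough to decide what to skip over next.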

Our minification script assumes the JavaScript it is minifying compiles and is correct! As the saying goes: garbage in, garbage out. We're also going to rely on some of the code we wrote while learning to minify CSS.

Important Detail: Don't forget to copy the code from part 1 (Minifying CSS with PHP) of this three-part Minification series, or your minifyJavascript() function will not work!

As a first step, we need a new sequence finder, RegexSequenceFinder, to identify regular expressions in JavaScript. Regular expressions should be treated like quoted strings and need to be preserved. Since the delimiters are no longer fixed strings with fixed lengths, we need to keep track of where the subsequence starts and ends using $this->sub_start_idx and the length of $this->sub_match.

    class RegexSequenceFinder extends MinificationSequenceFinder {
        protected $regex;
        public $full_match;
        public $sub_match;
        public $sub_start_idx;

        function __construct($type, $regex) {
            $this->type = $type;
            $this->regex = $regex;
        }

        public function findFirstValue($string){
            $this->start_idx = false; // reset
            preg_match($this->regex, $string, $matches, PREG_OFFSET_CAPTURE);
            if (count($matches) > 0){
                // full match
                $this->full_match = $matches[0][0];
                $this->start_idx = $matches[0][1];
                if (count($matches) > 1){
                    // substart
                    $this->sub_match = $matches[1][0];
                    $this->sub_start_idx = $matches[1][1];
                }
                $this->end_idx = $this->start_idx + strlen($this->full_match);
            }
        }
    }
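If PREG_OFFSET_CAPTURE is unfamiliar, this small sketch shows what preg_match returns with that flag: each entry of the matches array is a pair of matched text and byte offset. The sample source and the simplified pattern are assumptions for illustration; the pattern is a stand-in, not the one the minifier uses.

```php
<?php
// With PREG_OFFSET_CAPTURE every entry of $matches is array(text, offset).
// Simplified stand-in pattern: an opening parenthesis, optional horizontal
// whitespace, then a captured /.../ literal without spaces or inner slashes.
$source  = 'var ok = test( /ab+c/ );';
$pattern = '~\(\h*(/[^/\s]+/)~';

preg_match($pattern, $source, $matches, PREG_OFFSET_CAPTURE);

list($fullMatch, $fullStart) = $matches[0]; // whole match, including "("
list($subMatch, $subStart)   = $matches[1]; // only the captured regex literal
$subEnd = $subStart + strlen($subMatch);    // end derived from the length

echo "full: '$fullMatch' at $fullStart\n";
echo "sub:  '$subMatch' from $subStart to $subEnd\n";
```

The full match tells us where the interesting region begins, while the sub-match offset and length mark exactly the part that has to be preserved.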

Now we're ready to write our minification function, minifyJavascript:

    $lineCommentFinder = new StringSequenceFinder('//', "\n");

    function minifyJavascript($javascript){
        global $minificationStore, $singleQuoteSequenceFinder, $doubleQuoteSequenceFinder,
            $blockCommentFinder, $lineCommentFinder;

        $java_special_chars = array($blockCommentFinder, // JavaScript Block Comment
            $lineCommentFinder, // JavaScript Line Comment
            $singleQuoteSequenceFinder, // single quote escape, e.g. :before{ content: '-';}
            $doubleQuoteSequenceFinder, // double quote
            new RegexSequenceFinder( 'regex', "/\(\h*(\/[\k\S]+\/)/") // JavaScript regex expression
        );

        // pull out everything that needs to be pulled out and saved
        while ($sequence = getNextSpecialSequence($javascript, $java_special_chars)){
            switch ($sequence->type) {
                case '/*':
                case '//': // remove comments
                    $javascript = substr($javascript, 0, $sequence->start_idx).substr($javascript, $sequence->end_idx);
                    break;
                default: // quoted strings or regex that need to be preserved
                    $start_idx = ($sequence->type == 'regex' ? $sequence->sub_start_idx: $sequence->start_idx);
                    $end_idx = ($sequence->type == 'regex' ? $sequence->sub_start_idx + strlen($sequence->sub_match): $sequence->end_idx);
                    $placeholder = getNextMinificationPlaceholder();
                    $minificationStore[$placeholder] = substr($javascript, $start_idx, $end_idx - $start_idx);
                    $javascript = substr($javascript, 0, $start_idx).$placeholder.substr($javascript, $end_idx);
            }
        }

        // special case where the + indicates treating variable as numeric, e.g. a = b + +c
        $javascript = preg_replace('/([-\+])\s+\+([^\s;]*)/', '$1 (+$2)', $javascript);

        // condense spaces
        $javascript = preg_replace("/\s*\n\s*/", "\n", $javascript); // spaces around newlines
        $javascript = preg_replace("/\h+/", " ", $javascript); // \h+ horizontal white space

        // remove unnecessary horizontal spaces around non variables (alphanumerics, underscore, dollar sign)
        $javascript = preg_replace("/\h([^A-Za-z0-9\_\$])/", '$1', $javascript);
        $javascript = preg_replace("/([^A-Za-z0-9\_\$])\h/", '$1', $javascript);

        // remove unnecessary spaces around brackets and parentheses
        $javascript = preg_replace("/\s?([\(\[{])\s?/", '$1', $javascript);
        $javascript = preg_replace("/\s([\)\]}])/", '$1', $javascript);

        // remove unnecessary spaces around operators that don't need any spaces (specifically newlines)
        $javascript = preg_replace("/\s?([\.=:\-+,])\s?/", '$1', $javascript);

        // unnecessary characters
        $javascript = preg_replace("/;\n/",";",$javascript); // semicolon before newline
        $javascript = preg_replace('/;}/', '}', $javascript); // semicolon before end bracket

        // put back the preserved strings
        foreach($minificationStore as $placeholder => $original){
            $javascript = str_replace($placeholder, $original, $javascript);
        }

        return trim($javascript);
    }

Lines 1-30 should look familiar from part 1 of our Minification series. We start by scanning through the JavaScript looking for sequences that need special handling, preserving quoted strings and regular expressions and removing comments.
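For readers who skipped part 1, the protect-and-restore idea can be sketched in a few lines. The helper protectSegment and the placeholder format below are assumptions made for this example; the real getNextMinificationPlaceholder() and $minificationStore come from part 1.

```php
<?php
// Sketch of protect-and-restore. Helper name and placeholder format are
// assumptions; part 1 of the series defines the real ones.
function protectSegment(string $source, int $start, int $length, array &$store): string {
    $placeholder = '<-!!-' . count($store) . '-!!->';
    $store[$placeholder] = substr($source, $start, $length);
    return substr($source, 0, $start) . $placeholder . substr($source, $start + $length);
}

$store = array();
$js = 'var s = "a  +  b";   var t = 1;';

// Locate the quoted string and swap it for a placeholder.
$start = strpos($js, '"');
$end   = strpos($js, '"', $start + 1) + 1;
$js    = protectSegment($js, $start, $end - $start, $store);

// Any whitespace rewriting is now safe: the string contents are out of reach.
$js = preg_replace('/\h+/', ' ', $js);

// Put the protected text back.
$js = str_replace(array_keys($store), array_values($store), $js);
echo $js . "\n";
```

The double spaces inside the quoted string survive, while the run of spaces between the two statements is condensed.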

Line 31 prevents us from accidentally condensing two plus signs into a prefix increment, e.g. a + +b into a ++b, by placing the expression +b in parentheses. Technically this can increase the size of our file, but in practice the number of bytes added is small compared to what we'll take away in the next few lines.
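A quick sketch of why this guard matters: below, the same operator clean-up is applied to a made-up statement with and without the guard. The variable names are assumptions for the example.

```php
<?php
$statement = 'a = b + +c;';

// The guard: wrap the unary plus expression in parentheses.
$guarded = preg_replace('/([-\+])\s+\+([^\s;]*)/', '$1 (+$2)', $statement);

// The later clean-up that strips whitespace around operators.
$stripOperators = function (string $js): string {
    return preg_replace('/\s?([\.=:\-+,])\s?/', '$1', $js);
};

$safe   = $stripOperators($guarded);   // parentheses keep the two plus signs apart
$broken = $stripOperators($statement); // the plus signs collapse into "++"

echo "$guarded\n$safe\n$broken\n";
```

Without the guard, the two plus signs end up adjacent and JavaScript reads them as an increment operator, which changes the meaning of the statement or makes it invalid.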

The first thing we need to do is condense multiple spaces into a single space. JavaScript allows statements to be terminated either by a semicolon or, at times, by a newline via a process called Automatic Semicolon Insertion. Thus it's important to preserve at least one newline per block of whitespace: condensing a block of whitespace that contains a newline into a single horizontal space could break the JavaScript. Therefore line 34 replaces every block of whitespace that contains at least one newline with a single newline and no horizontal whitespace, and line 35 replaces every remaining block of horizontal whitespace with a single space.
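Here is a small sketch of those two replacements applied to a made-up input that relies on a newline, not a semicolon, to end its first statement.

```php
<?php
// Two statements separated only by whitespace that includes newlines.
$js = "var a = 1   \n\n\t  var b =    2";

// Any whitespace block containing a newline becomes exactly one newline.
$js = preg_replace("/\s*\n\s*/", "\n", $js);

// Remaining runs of horizontal whitespace become a single space.
$js = preg_replace("/\h+/", " ", $js);

echo $js . "\n";
```

The newline between the two statements is kept, so Automatic Semicolon Insertion still has something to work with.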

Next we remove unnecessary whitespace. Lines 38 and 39 remove horizontal whitespace on either side of any character that cannot appear in a variable name, that is, anything other than an alphanumeric character, an underscore or a dollar sign. Line 42 removes all whitespace (which at this point is only newlines) around opening parentheses, square brackets and curly brackets. Line 43 removes the whitespace preceding closing parentheses, square brackets and curly brackets. We can't necessarily remove the whitespace (newline) after a closing bracket or parenthesis because it could be the statement-terminating character. Finally, line 46 removes whitespace (newlines) around characters that cannot be terminators.
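To see these passes in sequence, the sketch below runs equivalent patterns over a made-up input that has already had its whitespace condensed. The patterns are written in single quotes with a slightly simplified character class, which is an assumption of this sketch and not a copy of the listing.

```php
<?php
// Made-up input, already condensed to single spaces and single newlines.
$js = "var total =\n1 +\n2\nif (ready)\n{\nstart( 1 , 2 )\n}";

// Horizontal space next to characters that cannot be part of a name.
$js = preg_replace('/\h([^A-Za-z0-9_$])/', '$1', $js);
$js = preg_replace('/([^A-Za-z0-9_$])\h/', '$1', $js);

// Whitespace around opening brackets, then before closing brackets.
$js = preg_replace('/\s?([\(\[{])\s?/', '$1', $js);
$js = preg_replace('/\s([\)\]}])/', '$1', $js);

// Whitespace (newlines) around operators that cannot end a statement.
$js = preg_replace('/\s?([\.=:\-+,])\s?/', '$1', $js);

echo $js . "\n";
```

The one newline that remains sits between the assignment and the if statement, which is exactly the spot where it may be acting as a statement terminator.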

The final step is to remove unnecessary statement terminators. Statements don't need to be terminated by both a semicolon and a newline. We could keep either, but since semicolons are less ambiguous, we keep the semicolon in line 49. A statement is also terminated by the closing curly bracket, so a semicolon directly before it is not strictly necessary, and we remove it in line 50.
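A short sketch of this last step on a made-up input, in which one statement ends with a semicolon and a newline, one with only a newline, and one with a semicolon directly before a closing bracket:

```php
<?php
$js = "a=1;\nb=2\nfunction f(){return a;}";

// A newline after a semicolon is redundant: keep the semicolon.
$js = preg_replace("/;\n/", ";", $js);

// A semicolon directly before a closing curly bracket is redundant too.
$js = preg_replace('/;}/', '}', $js);

echo $js . "\n";
```

The newline after b=2 is left alone because it is the only thing terminating that statement.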

Evaluating the Output

So how well does our minifier work? Let's put it to the test. Consider the following example.

    for ( var i = 0;
        i < 20;
        i++ ){
        //comment
        i += 0;
        i += 5;
    }

Here we have some unnecessary spaces, an unnecessary semicolon after the statement i += 5, and some very necessary semicolons in the for loop. Just to make this example as contrived as possible, I've also added some unnecessary newlines in the middle of the for loop to see if the minifier would be fooled into thinking those lines were statements. Running this through our minifyJavascript function gives us the following output.

    for(var i=0;i<20;i++){i+=0;i+=5}

The necessary spaces and semicolons are still present, while everything else has been removed.

Want to give it a try? Use our Minifier.

Code License

Although code shared on data·yze is source-available, it is still proprietary and data·yze maintains its intellectual property rights. In particular, data·yze restricts redistribution of the code. Code displayed above may be copied, modified, displayed or adapted for use on other websites (commercial or otherwise) only under certain conditions and may not be repackaged or redistributed. See Terms for details.