What is Minification?

Minification refers to the process of removing all unnecessary characters from a file while leaving the core functionality of the code intact. The end result is a new file which is smaller than the original, yet identical from a machine's perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not intended, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.

Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.

How data·yze does Minification

Before we begin the technical discussion we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we automate the minification process during this publication step. This approach gives us all of the benefits of minification (smaller files which require less bandwidth for the users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with performance. Thus we opt for readability and ease of debugging over performance.

At data·yze, minifying JavaScript reduces the size of our files by 32% on average. Here is the code we use:

Easy JavaScript Minification

JavaScript can be a little nerve-racking to minify. Statements can be terminated by a semicolon or a newline. Yet statements can also span multiple lines and contain semicolons within them. If you're not careful when you minify, you can easily break your code. Fortunately, you can get large gains with simple tricks. The following function, minimizeJavascriptSimple, achieves an 11% reduction by only condensing runs of whitespace, and still keeps the code pretty readable.

function minimizeJavascriptSimple($javascript){
    return preg_replace(
        array("/\s+\n/", "/\n\s+/", "/ +/"),
        array("\n", "\n ", " "),
        $javascript
    );
}
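To see what the three patterns do, here is a rough JavaScript port of the same idea. It is only an illustration: the name condenseWhitespace and the sample input are our own assumptions, and the port relies on PHP's preg_replace applying an array of patterns one after another, which chained replace calls imitate.

```javascript
// Illustrative JavaScript port of the three whitespace patterns.
// condenseWhitespace is a hypothetical name, not part of the PHP code.
function condenseWhitespace(source) {
  return source
    .replace(/\s+\n/g, "\n")   // whitespace (and blank lines) before a newline
    .replace(/\n\s+/g, "\n ")  // indentation after a newline becomes one space
    .replace(/ +/g, " ");      // runs of spaces become a single space
}

const sample = "function add(a, b) {   \n\n      return a   +   b;\n}\n";
const condensed = condenseWhitespace(sample);
console.log(JSON.stringify(condensed));
```

Every line break that separates statements survives, which is why the result stays readable and why this step is safe to apply without knowing anything about the code's structure.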

Being data people, we're never satisfied with a partial result.

More Complete JavaScript Minification

Our minification process works by performing a shallow parse of our JavaScript. A shallow parse is one that has only a superficial understanding of structure. Structure is deduced by identifying key elements such as the start of a comment or quoted string. In contrast, a deep parse is one that attempts to understand the structure of the script in its entirety. A deep parse would require something akin to a JavaScript compiler. Fortunately, the extra understanding that comes with a deep parse is not necessary for most minifiers. Thus we stuck with the shallow parse, which is simpler and has a smaller code footprint. This makes it easier to read, easier to test, and easier to debug.
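As a minimal sketch of what a shallow parse looks like, the JavaScript below scans for the earliest comment or quoted string and knows nothing else about the program. All names here (MARKERS, findEnd, findNextSpecial, stripComments) are hypothetical and are not the helpers used in our PHP code. The sketch also assumes the input contains no regex literals or template literals, which would need their own markers.

```javascript
// Each marker is recognised purely by its opening and closing text.
const MARKERS = [
  { type: "/*", open: "/*", close: "*/", escapes: false, keepClose: false },
  { type: "//", open: "//", close: "\n", escapes: false, keepClose: true },
  { type: "'", open: "'", close: "'", escapes: true, keepClose: false },
  { type: '"', open: '"', close: '"', escapes: true, keepClose: false },
];

// Index just past the sequence (or at the newline for line comments).
function findEnd(source, marker, start) {
  let i = start + marker.open.length;
  while (i < source.length) {
    if (marker.escapes && source[i] === "\\") {
      i += 2; // skip the escaped character inside a quoted string
    } else if (source.startsWith(marker.close, i)) {
      return marker.keepClose ? i : i + marker.close.length;
    } else {
      i += 1;
    }
  }
  return source.length; // unterminated: run to the end of the input
}

// Earliest special sequence at or after `from`, or null if none is left.
function findNextSpecial(source, from) {
  let best = null;
  for (const marker of MARKERS) {
    const start = source.indexOf(marker.open, from);
    if (start !== -1 && (best === null || start < best.start)) {
      best = { start, marker };
    }
  }
  if (best === null) return null;
  return { type: best.marker.type, start: best.start, end: findEnd(source, best.marker, best.start) };
}

// Remove comments, stepping over quoted strings so their content is untouched.
function stripComments(source) {
  let from = 0;
  let found;
  while ((found = findNextSpecial(source, from)) !== null) {
    if (found.type === "/*" || found.type === "//") {
      source = source.slice(0, found.start) + source.slice(found.end);
      from = found.start;
    } else {
      from = found.end;
    }
  }
  return source;
}

const input = 'var s = "// not a comment"; /* note */ var t = 1; // trailing\nvar u = 2;';
console.log(stripComments(input));
```

Because the quoted string is found before the // inside it, the scanner steps over the string as a unit and never mistakes its content for a comment.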

Our minification script assumes the JavaScript it is minifying compiles and is correct! As the saying goes: garbage in, garbage out. We're also going to rely on some of the code we wrote while learning to minify CSS.

As a first step, we need a new sequence finder, RegexSequenceFinder, to identify regular expressions in JavaScript. Regular expressions should be treated like quoted strings and need to be preserved. Since the delimiters are no longer fixed strings with fixed lengths, we need to keep track of where the subsequence starts and ends.

class RegexSequenceFinder extends MinificationSequenceFinder {
    protected $regex;
    public $full_match;
    public $sub_match;
    public $sub_start_idx;

    function __construct($type, $regex) {
        $this->type = $type;
        $this->regex = $regex;
    }

    public function findFirstValue($string){
        $this->start_idx = false; // reset
        preg_match($this->regex, $string, $matches, PREG_OFFSET_CAPTURE);
        if (count($matches) > 0){
            // full match
            $this->full_match = $matches[0][0];
            $this->start_idx = $matches[0][1];
            if (count($matches) > 1){
                // substart
                $this->sub_match = $matches[1][0];
                $this->sub_start_idx = $matches[1][1];
            }
            $this->end_idx = $this->start_idx + strlen($this->full_match);
        }
    }
}
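The distinction between the full match and the sub-match is easier to see with concrete offsets. The JavaScript sketch below is an illustration under our own assumptions: REGEX_AFTER_PAREN is a simplified stand-in pattern, not the one used in the PHP code, and findRegexLiteral is a hypothetical helper. The full match includes the opening parenthesis that gives the slash its context, while only the sub-match (the regex literal itself) is what needs to be preserved.

```javascript
// Assumed pattern: "(", optional spaces or tabs, then a regex literal.
// The capture group isolates the literal from the context needed to find it.
const REGEX_AFTER_PAREN = /\([ \t]*(\/(?:\\.|[^\/\\\n])+\/)/;

function findRegexLiteral(source) {
  const match = REGEX_AFTER_PAREN.exec(source);
  if (match === null) return null;
  const subMatch = match[1];
  // The literal is the tail of the full match, so its offset follows from the lengths.
  const subStart = match.index + match[0].length - subMatch.length;
  return {
    fullMatch: match[0],
    start: match.index,
    end: match.index + match[0].length,
    subMatch,
    subStart,
  };
}

const line = "parts = text.split( /  +/);";
const found = findRegexLiteral(line);
console.log(found);

// Condensing spaces inside the literal would change what it matches:
console.log("a b  c".split(/  +/)); // splits on two or more spaces
console.log("a b  c".split(/ +/));  // splits on one or more spaces
```

The two split calls show why the literal has to be swapped out for a placeholder before any whitespace is condensed: the condensed pattern is still valid JavaScript, but it no longer does the same thing.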

Now we're ready to write our minification function:

$lineCommentFinder = new StringSequenceFinder('//', "\n");

function minifyJavascript($javascript){
    global $minificationStore, $singleQuoteSequenceFinder, $doubleQuoteSequenceFinder, $blockCommentFinder, $lineCommentFinder;

    $java_special_chars = array(
        $blockCommentFinder,        // JavaScript Block Comment
        $lineCommentFinder,         // JavaScript Line Comment
        $singleQuoteSequenceFinder, // single quote escape, e.g. :before{ content: '-';}
        $doubleQuoteSequenceFinder, // double quote
        new RegexSequenceFinder('regex', "/\(\h*(\/[\k\S]+\/)/") // JavaScript regex expression
    );

    // pull out everything that needs to be pulled out and saved
    while ($sequence = getNextSpecialSequence($javascript, $java_special_chars)){
        switch ($sequence->type) {
            case '/*':
            case '//': // remove comments
                $javascript = substr($javascript, 0, $sequence->start_idx).substr($javascript, $sequence->end_idx);
                break;
            default: // quoted strings or regex that need to be preserved
                $start_idx = ($sequence->type == 'regex' ? $sequence->sub_start_idx : $sequence->start_idx);
                $end_idx = ($sequence->type == 'regex' ? $sequence->sub_start_idx + strlen($sequence->sub_match) : $sequence->end_idx);
                $placeholder = getNextMinificationPlaceholder();
                $minificationStore[$placeholder] = substr($javascript, $start_idx, $end_idx - $start_idx);
                $javascript = substr($javascript, 0, $start_idx).$placeholder.substr($javascript, $end_idx);
        }
    }

    // special case where the + indicates treating variable as numeric, e.g. a = b + +c
    $javascript = preg_replace('/([-\+])\s+\+([^\s;]*)/', '$1 (+$2)', $javascript);

    // condense spaces
    $javascript = preg_replace("/\s*\n\s*/", "\n", $javascript); // spaces around newlines
    $javascript = preg_replace("/\h+/", " ", $javascript); // \h+ horizontal white space

    // remove unnecessary horizontal spaces around non variables (alphanumerics, underscore, dollarsign)
    $javascript = preg_replace("/\h([^A-Za-z0-9\_\$])/", '$1', $javascript);
    $javascript = preg_replace("/([^A-Za-z0-9\_\$])\h/", '$1', $javascript);

    // remove unnecessary spaces around brackets and parentheses
    $javascript = preg_replace("/\s?([\(\[{])\s?/", '$1', $javascript);
    $javascript = preg_replace("/\s([\)\]}])/", '$1', $javascript);

    // remove unnecessary spaces around operators that don't need any spaces (specifically newlines)
    $javascript = preg_replace("/\s?([\.=:\-+,])\s?/", '$1', $javascript);

    // unnecessary characters
    $javascript = preg_replace("/;\n/", ";", $javascript); // semicolon before newline
    $javascript = preg_replace('/;}/', '}', $javascript); // semicolon before end bracket

    // put back the preserved strings
    foreach($minificationStore as $placeholder => $original){
        $javascript = str_replace($placeholder, $original, $javascript);
    }
    return trim($javascript);
}

The special-case replacement (the one commented "treating variable as numeric") prevents us from accidentally condensing two plus signs into an increment operator, e.g. a + +b, by placing the expression +b in parentheses. Technically this can increase the size of our file, but in practice the number of bytes added is small compared to what we take away in the steps that follow.
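The JavaScript below illustrates why the two plus signs need protecting. The evaluate helper and the inputs (a is the number 1, b is the string "2") are assumptions made for this demonstration, and the replace call is a direct port of the PHP pattern.

```javascript
// Evaluate an expression with a = 1 and b = "2".
// Returns the error name instead if the expression does not parse.
function evaluate(expression) {
  try {
    return new Function("a", "b", "return " + expression + ";")(1, "2");
  } catch (error) {
    return error.name;
  }
}

const original = evaluate("a + +b"); // unary plus converts b to a number first
const naive = evaluate("a++b");      // what removing the spaces would produce

// Port of the special-case pattern: wrap the unary plus operand in parentheses.
const guarded = "a + +b".replace(/([-+])\s+\+([^\s;]*)/g, "$1 (+$2)");

console.log(original, naive, guarded, evaluate(guarded));
```

With the parentheses in place, the later steps can strip the remaining spaces and leave a+(+b), which still parses and still converts b to a number before adding.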

The first thing we need to do is condense multiple spaces into a single space. It's important to preserve at least one newline per block of whitespace that contains one. JavaScript allows statements to be terminated either by a semicolon or, at times, by a newline via a process called Automatic Semicolon Insertion. We could break the JavaScript if we condensed a section of whitespace containing a newline into a single horizontal space. In the listing, the first replacement under "condense spaces" collapses every block of whitespace that contains a newline into a single newline, and the second collapses every run of horizontal whitespace into a single space.
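A small JavaScript experiment shows the risk. The parses helper and the sample are hypothetical, and [ \t] stands in for PCRE's \h, which JavaScript regular expressions do not support (it also covers fewer characters than \h does).

```javascript
// True if the text is syntactically valid as a function body.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (error) {
    return false;
  }
}

// Statements separated only by newlines, relying on Automatic Semicolon Insertion.
const original = "var a = 1   \n\n    var b = 2\n  return a + b";

// Keep one newline per block that contains a newline, then condense the rest.
const newlineKept = original.replace(/\s*\n\s*/g, "\n").replace(/[ \t]+/g, " ");

// Treat every block of whitespace alike and replace it with one space.
const newlineLost = original.replace(/\s+/g, " ");

console.log(parses(newlineKept), parses(newlineLost));
```

The version that keeps its newlines still parses, while the version that flattens them runs the statements together and is rejected.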

Next we remove unnecessary whitespace. Some of the following replacements could be combined into fewer lines of code, but when performance doesn't matter, we generally prefer readability over code size. The two replacements under "remove unnecessary horizontal spaces around non variables" strip horizontal whitespace on either side of any character that cannot appear in a variable name (anything other than alphanumerics, the underscore and the dollar sign). The first bracket replacement removes all whitespace (which at this point is only newlines) around opening parentheses, square brackets and curly brackets. The second removes the whitespace preceding closing parentheses, square brackets and curly brackets. We can't necessarily remove the whitespace to the right of a closing bracket, because a newline there could be the statement-terminating character. Finally, the operator replacement removes whitespace (newlines) around characters that cannot be terminators.
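To follow these replacements on a concrete input, here is an illustrative JavaScript port applied to an already condensed sample. The function name stripUnneededWhitespace and the sample are our own, and [ \t] again approximates \h.

```javascript
// Illustrative port of the five whitespace-removal patterns, applied in order.
function stripUnneededWhitespace(source) {
  return source
    .replace(/[ \t]([^A-Za-z0-9_$])/g, "$1") // space before a non-variable character
    .replace(/([^A-Za-z0-9_$])[ \t]/g, "$1") // space after a non-variable character
    .replace(/\s?([(\[{])\s?/g, "$1")        // whitespace around opening brackets
    .replace(/\s([)\]}])/g, "$1")            // whitespace before closing brackets
    .replace(/\s?([.=:\-+,])\s?/g, "$1");    // whitespace around non-terminating operators
}

const condensed = "function add(a, b) {\nvar total = a +\nb;\nreturn total;\n}";
const stripped = stripUnneededWhitespace(condensed);
console.log(JSON.stringify(stripped));
```

The only newline left afterwards is the one following a semicolon. The spaces in "function add", "var total" and "return total" also stay, because removing them would merge two names into one.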

Finally we remove unnecessary statement terminators. Statements don't need to be terminated by both a semicolon and a newline. We could keep either, but since semicolons are less ambiguous, we keep the semicolon and drop the newline in the first replacement under "unnecessary characters". A statement is also assumed to be terminated by a closing curly bracket, so a semicolon immediately before one is not strictly necessary, and the last replacement removes it.
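A short JavaScript sketch of these last two replacements, using a hypothetical helper name and sample input:

```javascript
// Illustrative port of the two terminator patterns.
function dropRedundantTerminators(source) {
  return source
    .replace(/;\n/g, ";") // keep the semicolon, drop the newline after it
    .replace(/;}/g, "}"); // the closing bracket already ends the statement
}

const before = "function add(a,b){var total=a+b;\nreturn total;}";
const after = dropRedundantTerminators(before);
console.log(after);
```

The order matters in this sample: the newline is removed first, so the semicolon that ends up directly before the closing bracket is caught by the second pattern.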

Want to give it a try? Use our Minifier.