Minification is the process of removing all unnecessary characters from a file while leaving the core functionality of the code intact. The end result is a new file that is smaller than the original, yet identical from a machine perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not its intent, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.
Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.
Before we begin the technical discussion, we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we automate minification during this publication step. This approach gives us all of the benefits of minification (smaller files that require less bandwidth for users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with the speed of the minifier itself. Thus we opt for readability and ease of debugging over performance.
At data·yze, minifying JavaScript reduces the size of our files by 32% on average.
JavaScript can be a little nerve-racking to minify. Statements can be terminated by a semicolon or a newline. Yet statements can also span multiple lines and contain semicolons within them. If you're not careful when you minify, you can easily break your code. Fortunately, you can get large gains with simple tricks. The following function, easyMinify, achieves an 11% reduction on our JavaScript code by only condensing runs of whitespace, and it still keeps the code pretty readable.
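A minimal sketch of this kind of whitespace-only pass might look like the following. The name easyMinifySketch and the exact patterns are assumptions made for illustration, not the original listing. Note that it also touches whitespace inside string literals, which is one reason a fuller minifier needs more care.

```php
<?php
// Hypothetical sketch of a whitespace-only minifier (not the original easyMinify).
function easyMinifySketch(string $javascript): string
{
    // Collapse every run of spaces and tabs into a single space.
    $javascript = preg_replace('/[ \t]+/', ' ', $javascript);
    // Collapse whitespace that spans one or more line breaks into a single
    // newline, so each statement still ends on its own line.
    $javascript = preg_replace('/ ?(?:\r?\n ?)+/', "\n", $javascript);
    return trim($javascript);
}

// Usage: indentation, blank lines and doubled spaces disappear,
// but each statement keeps its own line.
echo easyMinifySketch("var a  =  1;\n\n\n    var b\t=  2;\n");
```

Because every newline that separated two statements survives, this pass does not depend on knowing where statements actually end.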
Being data people, we're never satisfied with a partial result.
Our minification process works by performing a shallow parse of our JavaScript. A shallow parse is one that has only a superficial understanding of structure. Structure is deduced by identifying key elements such as the start of a comment or quoted string. In contrast, a deep parse is one that attempts to understand the structure of the script in its entirety. A deep parse would require something akin to a JavaScript compiler. Fortunately, the extra understanding that comes with a deep parse is not necessary for most minifiers. Thus we stuck with the shallow parse, which is simpler and has a smaller code footprint. This makes it easier to read, easier to test, and easier to debug.
Our minification script assumes the JavaScript it is minifying compiles and is correct! As the saying goes: garbage in, garbage out. We're also going to rely on some of the code we wrote while learning to minify CSS.
Important Detail: Don't forget to copy the code from part 1 (Minifying CSS with PHP) of this three-part Minification series, or your minifyJavascript() function will not work!
The first consideration is how to handle regular expressions. We wouldn't want to accidentally remove a semicolon or condense whitespace within a regular expression. Doing so would alter the behavior of the regular expression, and the resulting JavaScript would behave differently from the original non-minified code. Regular expressions should be treated like quoted strings: they need to be preserved.
In JavaScript, regular expressions can be defined using the RegExp class with a defining string, e.g. var re = new RegExp('[0-9]'), or by placing the regular expression between two slashes, e.g. var re = /[0-9]/. The former case is already handled by our StringSequenceFinder, but we need a new sequence finder to handle the latter case. To do this we write JSRegexSequenceFinder.
JSRegexSequenceFinder's functionality is similar to our other MinificationSequenceFinder instances: the function findFirstValue searches the string for the first instance of a slash. One of the things that makes regular expressions more challenging to find programmatically is that the slash character can also signify the start of a line comment or block comment, or be used in division. Thus we use the helper function findFirstValue to perform a lookahead and lookbehind to sanity check that the slash found is, indeed, the start of a regular expression. The check to see whether the slash is part of a comment is on line 17. If it is, then either the $blockSequenceFinder or the $lineCommentFinder (defined below for line comments) will have a smaller starting index, $start_idx, than any regular expression we could find, so there's no point in continuing the search. Lines 24-31 check that the slash is not part of a division: if the character preceding the slash is alphanumeric, e.g. 1/y or x/y, or closes an expression block, e.g. (x)/y, the slash is treated as a division operator.
Once we have identified the possible starting slash of a regular expression, we scan through the string for the terminating slash. We need to be careful that the next slash we encounter is not escaped. This check is performed on line 52.
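The lookahead, lookbehind, and escaped-slash checks described above can be sketched as a single standalone function. This is not the JSRegexSequenceFinder class itself: the function name findRegexLiteralSketch and its return shape are hypothetical, and the heuristic is deliberately shallow (for instance, it does not handle an unescaped slash inside a character class such as /[/]/).

```php
<?php
// Hypothetical helper (not the article's JSRegexSequenceFinder).
// Returns [start, end] byte offsets of the first regex literal, or null.
function findRegexLiteralSketch(string $js, int $offset = 0): ?array
{
    $len = strlen($js);
    for ($i = $offset; $i < $len; $i++) {
        if ($js[$i] !== '/') {
            continue;
        }
        // Lookahead: "//" or "/*" opens a comment, which a comment finder
        // would claim first, so there is no point in searching further.
        $next = $js[$i + 1] ?? '';
        if ($next === '/' || $next === '*') {
            return null;
        }
        // Lookbehind: skip spaces and tabs, then inspect the previous character.
        $j = $i - 1;
        while ($j >= 0 && ($js[$j] === ' ' || $js[$j] === "\t")) {
            $j--;
        }
        if ($j >= 0 && (ctype_alnum($js[$j]) || $js[$j] === ')')) {
            continue; // x/y, 1/y or (x)/y: treat the slash as division
        }
        // Scan for the terminating slash, skipping any escaped character.
        for ($k = $i + 1; $k < $len; $k++) {
            if ($js[$k] === '\\') {
                $k++; // jump over the character that follows the backslash
                continue;
            }
            if ($js[$k] === "\n") {
                break; // a regex literal cannot span lines
            }
            if ($js[$k] === '/') {
                return [$i, $k];
            }
        }
    }
    return null;
}
```

The span between the two returned offsets is what would be preserved untouched, in the same way quoted strings are.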
Now we're ready to write our minification function, minifyJavascript.
Lines 1-26 should look familiar from part 1 of our Minification series. We start by scanning through the JavaScript looking for sequences that need special handling, preserving quoted strings and removing comments.
Line 38 prevents us from accidentally condensing two plus signs into a prefix increment, e.g. a + +b into a ++b, by placing the expression +b in parentheses. Technically this can increase the size of our file, but in practice the number of bytes added is small compared to what we'll take away in the next few lines.
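As a rough illustration of this guard, the sketch below wraps a unary plus operand in parentheses when it follows a binary plus. The function name protectUnaryPlusSketch and the pattern are assumptions. To stay safe, it only handles a bare identifier operand and leaves anything followed by a member access, call, or index untouched, since wrapping just part of such an expression would change its meaning.

```php
<?php
// Hypothetical sketch: keep "a + +b" from collapsing into "a++b".
function protectUnaryPlusSketch(string $js): string
{
    // Match a plus, whitespace, a second plus, then a bare identifier that is
    // NOT followed by more identifier characters, ".", "(" or "[".
    return preg_replace(
        '/\+\s+\+\s*([A-Za-z_$][\w$]*)(?![\w$.(\[])/',
        '+(+$1)',
        $js
    );
}

echo protectUnaryPlusSketch('a + +b;');
```

After this step, later whitespace removal can no longer join the two plus signs into an increment operator.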
The first thing we need to do is condense multiple spaces into a single space. JavaScript allows statements to be terminated either by a semicolon or, at times, by a newline via a process called Automatic Semicolon Insertion. Thus it's important to preserve at least one newline per block of whitespace: we could break the JavaScript if we condensed a block of whitespace containing a newline into a single horizontal space. Therefore line 30 replaces every block of whitespace that contains at least one newline with a single newline and no horizontal whitespace, and line 31 replaces every remaining block of horizontal whitespace with a single space.
Next we remove unnecessary whitespace. Lines 33 and 34 remove horizontal whitespace around non-alphanumeric characters, that is, characters that cannot be used in variable names. Line 36 removes all whitespace (which at this point is only newlines) around opening parentheses, square brackets and curly brackets. Line 37 removes the whitespace preceding closing parentheses, square brackets and curly brackets. We can't necessarily remove the whitespace (newline) after a closing bracket or parenthesis because it could be the statement-terminating character. Finally, line 39 removes whitespace (newlines) around characters that cannot be terminators.
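The asymmetric treatment of opening and closing brackets can be sketched as follows. The function name stripBracketWhitespaceSketch and the patterns are hypothetical, and the sketch assumes strings, comments, and regular expressions have already been set aside so that only code remains.

```php
<?php
// Hypothetical sketch of bracket handling; assumes strings and regex
// literals were already masked out of $js.
function stripBracketWhitespaceSketch(string $js): string
{
    // Opening brackets: whitespace can go on both sides.
    $js = preg_replace('/\s*([(\[{])\s*/', '$1', $js);
    // Closing brackets: only the whitespace BEFORE them is removed, because
    // a newline after ")" or "}" may be the statement terminator.
    $js = preg_replace('/\s+([)\]}])/', '$1', $js);
    return $js;
}
```

Note that the newline following the final closing bracket is left alone, which is exactly the case where it might still be needed as a terminator.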
The final step is to remove unnecessary statement terminators. Statements don't need to be terminated by both a semicolon and a newline. We could keep either, but since semicolons are less ambiguous, we keep the semicolons in line 41. A statement is also assumed to be terminated by a closing curly bracket, so a semicolon directly before one is not strictly necessary, and we remove it in line 42.
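These two rules can be sketched in a few lines. The function name stripRedundantTerminatorsSketch is hypothetical, and the sketch assumes strings and regular expressions are already masked. It also assumes the code has no empty statement directly before a closing curly bracket, such as while(x);}, because dropping that semicolon would change the meaning.

```php
<?php
// Hypothetical sketch of terminator cleanup; assumes strings and regex
// literals were already masked out of $js.
function stripRedundantTerminatorsSketch(string $js): string
{
    // A semicolon already ends the statement, so the whitespace (including
    // newlines) around it is redundant.
    $js = preg_replace('/\s*;\s*/', ';', $js);
    // The closing curly bracket ends the last statement in a block,
    // so the semicolon directly before it can be dropped.
    $js = preg_replace('/;\}/', '}', $js);
    return $js;
}

echo stripRedundantTerminatorsSketch("function f(){\na=1;\nb=2;\n}");
```

Statements that ended with a bare newline are untouched, so code relying on Automatic Semicolon Insertion keeps its terminators.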
Want to give it a try? Use our Minifier.