Minification refers to the process of removing all unnecessary characters from a file while leaving the core functionality of the code intact. The end result is a new file that is smaller than the original, yet identical from a machine's perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not intended, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.
Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.
Before we begin the technical discussion, we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we automate the minification process during this publication step. This approach gives us all of the benefits of minification (smaller files which require less bandwidth for the users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with performance. Thus we opt for readability and ease of debugging over performance.
Being data people, we're never satisfied with a partial result.
In JavaScript, regular expressions can be defined using the RegExp class with a defining string, e.g. var re = new RegExp('[0-9]'), or by placing the regular expression between two slashes, e.g. var re = /[0-9]/. The former case is already handled by our StringSequenceFinder, but we need a new sequence finder to handle the latter case. To do this we write JSRegexSequenceFinder.
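As a small illustration of the two forms (the variable names here are made up for the example), both declarations below produce an equivalent regular expression. In the first, the pattern sits inside an ordinary string, which is why a string finder already covers it. In the second, the pattern is delimited only by slashes.

```javascript
// Pattern passed as a string: a string finder will already protect it.
const fromConstructor = new RegExp("[0-9]");

// Pattern written as a literal: only the two slashes mark its boundaries.
const fromLiteral = /[0-9]/;

// Both objects hold the same pattern source.
const samePattern = fromConstructor.source === fromLiteral.source;
```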
JSRegexSequenceFinder works much like our other MinificationSequenceFinder instances: the function findFirstValue searches the string for the first instance of a slash. What makes regular expressions harder to find programmatically is that the slash character can also signify the start of a line comment, the start of a block comment, or a division. Thus we use the helper function findFirstValue to perform a lookahead and a lookbehind to sanity check that the slash found is, indeed, the start of a regular expression. The check to see whether the slash is part of a comment is on line 17. If it is, then either the $blockSequenceFinder or the $lineCommentFinder (defined below for line comments) will have a smaller starting index, $start_idx, than any regular expression we could find, so there's no point in continuing the search. Lines 24-31 check whether the slash is a division by testing whether the character preceding it is alphanumeric, e.g. 1/y or x/y, or closes an expression block, e.g. (x)/y; if so, the slash is not the start of a regular expression.
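The lookahead and lookbehind described above can be sketched in a few lines of JavaScript. This is not the JSRegexSequenceFinder code itself: the function name looksLikeRegexStart is hypothetical, and skipping whitespace before the lookbehind is an assumption added for the example.

```javascript
// Hypothetical helper, not the article's implementation.
// Returns true when the slash at `idx` plausibly opens a regex literal.
function looksLikeRegexStart(src, idx) {
  if (src[idx] !== "/") return false;

  // Lookahead: "//" opens a line comment and "/*" opens a block comment.
  const next = src[idx + 1];
  if (next === "/" || next === "*") return false;

  // Lookbehind: find the nearest non-whitespace character before the slash
  // (skipping whitespace is an assumption of this sketch).
  let i = idx - 1;
  while (i >= 0 && /\s/.test(src[i])) i--;
  if (i < 0) return true; // the slash is the first token in the source

  // An alphanumeric character or a closing parenthesis means division.
  return !/[A-Za-z0-9)]/.test(src[i]);
}
```

A rule this simple has known blind spots. For example, it would classify the slash in return /x/ as a division because the preceding character is a letter, so treat it as an illustration of the idea and not a complete classifier.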
Once we have identified the possible starting slash of a regular expression, we scan through the string for the terminating slash. We need to be careful that the next slash we encounter is not escaped. This check is performed on line 52.
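One way to picture the scan for the terminating slash is the sketch below. The function name findRegexEnd is hypothetical, and stopping at a newline is an assumption based on the fact that JavaScript regular expression literals cannot span lines.

```javascript
// Hypothetical helper: returns the index of the slash that closes the regex
// literal opening at `start`, or -1 if no closing slash is found on that line.
function findRegexEnd(src, start) {
  for (let i = start + 1; i < src.length; i++) {
    const ch = src[i];
    if (ch === "\\") {
      i++; // skip the escaped character, so "\/" never ends the literal
    } else if (ch === "\n") {
      return -1; // regex literals cannot span lines
    } else if (ch === "/") {
      return i;
    }
  }
  return -1;
}
```

Skipping the character after every backslash, instead of only checking whether the previous character is a backslash, keeps a pattern such as /a\\/ working: the escaped backslash is consumed as a pair, so the slash that follows is correctly treated as the terminator. The sketch does not handle an unescaped slash inside a character class such as [/], which JavaScript also permits.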
Line 38 prevents us from accidentally condensing two plus signs into a prefix increment, e.g. a + +b into a ++b, by placing the expression +b in parentheses. Technically this can increase the size of our file, but in practice the number of bytes added is small compared to what we'll take away in the next few lines.
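A minimal sketch of this protective step, assuming the operand of the unary plus is a simple identifier or number, might look like the following. The function name protectUnaryPlus and the exact pattern are illustrative and not taken from our source.

```javascript
// Hypothetical helper: wraps a unary plus in parentheses so that later
// whitespace removal cannot merge "+ +b" into "++b".
function protectUnaryPlus(src) {
  // Match "+", horizontal whitespace, then "+" applied to an identifier
  // or a number (an assumption of this sketch).
  return src.replace(
    /\+[ \t]+\+([A-Za-z_$][\w$]*|\d+(?:\.\d+)?)/g,
    "+ (+$1)"
  );
}
```

For instance, protectUnaryPlus("a + +b") yields a + (+b), which remains a binary addition even after the spaces around the first plus sign are stripped.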
Next we remove unnecessary whitespace. Lines 33 and 34 remove horizontal whitespace around non-alphanumeric characters that can be used in variable names. Line 36 removes all whitespace (which at this point is only newlines) around opening parentheses, square brackets and curly brackets. Line 37 removes the space preceding closing parentheses, square brackets and curly brackets. We can't necessarily remove the space (newline) after a closing bracket or parenthesis because it could be the statement-terminating character. Finally, line 39 removes spaces (newlines) around characters that cannot be terminators.
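The sequence of whitespace passes can be approximated with a handful of replacements. This is a sketch under several assumptions: strings, comments and regular expressions have already been pulled out of the source, the function name stripWhitespace is hypothetical, and the character sets (including treating $ and _ as identifier characters) are illustrative choices that may differ from the ones we actually use.

```javascript
// Hypothetical helper: assumes strings, comments and regex literals were
// already extracted, so every remaining character is plain code.
function stripWhitespace(src) {
  return src
    // Horizontal whitespace before and after punctuation and operators.
    .replace(/[ \t]+([^\w$\s])/g, "$1")
    .replace(/([^\w$\s])[ \t]+/g, "$1")
    // Remaining whitespace (newlines) around opening brackets.
    .replace(/\s*([(\[{])\s*/g, "$1")
    // Whitespace before closing brackets. Whitespace after them is kept,
    // because a newline there may terminate a statement.
    .replace(/\s+([)\]}])/g, "$1")
    // Newlines around characters that cannot end a statement.
    .replace(/\s*([,:?=&|<>*%])\s*/g, "$1");
}
```

Note that a newline between two statements, as in var x = 1 followed on the next line by var y = 2, survives these passes because it may be the only thing separating them.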
The final step is to remove unnecessary statement terminators. Statements don't need to be terminated by both a semicolon and a newline. We could keep either, but since semicolons are less ambiguous, we keep the semicolons in line 41. A statement is also assumed to be terminated by a closing curly bracket, so a semicolon directly before one is not strictly necessary, and we remove it in line 42.
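These two terminator rules can be sketched as a pair of replacements. As before, the function name trimTerminators is hypothetical, and the sketch assumes strings, comments and regular expressions were extracted earlier.

```javascript
// Hypothetical helper: assumes strings, comments and regex literals were
// already extracted from the source.
function trimTerminators(src) {
  return src
    // A semicolon already ends the statement, so whitespace after it is redundant.
    .replace(/;\s+/g, ";")
    // A closing curly bracket ends the statement, so a semicolon before it can go.
    .replace(/;+\}/g, "}");
}
```

Applied to a=1; and b=2; on separate lines followed by if(a){b=3;}, the result is a=1;b=2;if(a){b=3} on a single line.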
Want to give it a try? Use our Minifier.