Minification refers to the process of removing all unnecessary characters from a file while leaving the core functionality of the code intact. The end result is a new file which is smaller than the original, yet identical from a machine perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not intended, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.
Want to skip the technical discussion in this article? No problem, you can try the latest version of our Minifier.
Before we begin the technical discussion we should explain where and how we use minification. We use PHP to transfer files from our development environment to production, and we opt to automate the minification process during this publication step. This approach gives us all of the benefits of minification (smaller files which require less bandwidth for the users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code). Since minification is performed when a page is published, and not each time a page is accessed, we're less concerned with performance. Thus we opt for readability and ease of debugging over performance.
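To make the publication step concrete, here is a minimal sketch of what minifying at publish time could look like. The function name publishMinified() and its parameters are assumptions made for illustration and are not taken from our actual publishing script; the minifier itself is passed in as a callable so the sketch does not depend on any particular implementation.

```php
<?php
// Hypothetical publish step (illustration only, not the article's script):
// read the development copy, minify it once, and write the production copy.
function publishMinified(string $sourcePath, string $targetPath, callable $minify): int
{
    $source = file_get_contents($sourcePath);
    if ($source === false) {
        throw new RuntimeException("Cannot read $sourcePath");
    }

    // $minify is any function that takes the file contents and returns the
    // minified contents.
    $minified = $minify($source);

    if (file_put_contents($targetPath, $minified) === false) {
        throw new RuntimeException("Cannot write $targetPath");
    }

    // Report how many bytes were saved by minifying.
    return strlen($source) - strlen($minified);
}
```

Because the work happens once per publication rather than once per request, the callable can favor clarity over speed.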
A word of caution: when minifying, always test that the final product still behaves as expected!
When we talk about minifying HTML we're often referring to minimizing white space. By default, web browsers collapse multiple white space characters into a single space, yet HTML source code formatted according to style guidelines often contains long blocks of white space to make the source code more human readable.
For example this:
Is a much longer, albeit easier to read equivalent of:
Additionally, it is worth noting that any white space inside a <pre> element needs to be preserved, as does any white space in an element styled with white-space:pre or white-space:pre-wrap. To keep our solution general and avoid the need for a full CSS interpreter, we're going to ignore elements whose white-space CSS property is set. We'll show you how you can modify the script to account for specific elements of this kind should you need to. Our solution is not general enough to analyze CSS and determine on the fly, by class name, which elements need to have their white space preserved.
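As a rough illustration of the idea, the following sketch collapses runs of white space everywhere except inside <pre> elements. It is not the script discussed in this article: the function name collapseWhitespaceOutsidePre() is hypothetical, and the sketch assumes that <pre> elements are not nested and are written literally in the HTML.

```php
<?php
// Hypothetical helper (illustration only): collapse runs of white space in
// HTML while leaving <pre>...</pre> blocks untouched.
function collapseWhitespaceOutsidePre(string $html): string
{
    // Split into alternating pieces: text outside <pre>, then a whole <pre>
    // block. PREG_SPLIT_DELIM_CAPTURE keeps the captured blocks in the result.
    $parts = preg_split(
        '#(<pre\b[^>]*>.*?</pre>)#is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($parts as $i => $part) {
        // Odd indexes hold the captured <pre> blocks; copy them verbatim.
        // Even indexes hold ordinary HTML; collapse each run to one space.
        $out .= ($i % 2 === 1) ? $part : preg_replace('/\s+/', ' ', $part);
    }
    return $out;
}
```

The tag name in the pattern is the natural place to extend this sketch if other specific elements also need their white space preserved.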
Consider the following obviously contrived case:
In the above example, the closing style tag is written to the HTML output buffer by the executed PHP code. To know that line 4 in the above example should be treated as HTML, our minifying script would need to be able to interpret the PHP code, which would increase the complexity of our minifying code dramatically.
To make the problem tractable, we make the following assumptions about our PHP code:
Since PHP is executed server side rather than client side, non-minified PHP code affects neither HTML file size nor bandwidth, and so does not need to be minified.
As before, the first thing we want to do is preserve important strings. This time we want to be sure we preserve all embedded PHP. To do this we create the function preserveEmeddedPHP():
The function preserveEmeddedPHP() seems simple enough. When a <? is detected in a string (line 5), preserveEmeddedPHP() sets $end_php to the position of the first '?>' (line 15). We can't stop there: that closing '?>' may have occurred in the middle of a string, in which case it is not intended to end the PHP code block. Thus lines 18-28 search for the nearest double- and single-quoted strings.
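The same idea can be sketched with a simple character scanner. This is not the preserveEmeddedPHP() listing itself: the function name findPhpBlockEnd() is hypothetical, and the sketch takes a different route (walking the string one character at a time) to reach the same goal of ignoring a closing tag that sits inside a quoted string. It assumes the block contains no comments or heredocs, which would need extra handling.

```php
<?php
// Hypothetical sketch (not the article's code): return the offset of the
// closing tag that really ends the PHP block opened at $start, or false if
// the block is never closed.
function findPhpBlockEnd(string $src, int $start)
{
    $len = strlen($src);
    $i = $start + 2; // step past the two characters of the opening tag

    while ($i < $len) {
        $ch = $src[$i];

        if ($ch === '"' || $ch === "'") {
            // Skip the whole quoted string, honoring backslash escapes, so
            // that a closing tag inside the string is not mistaken for the end.
            $quote = $ch;
            $i++;
            while ($i < $len && $src[$i] !== $quote) {
                $i += ($src[$i] === '\\') ? 2 : 1;
            }
            $i++; // step past the closing quote
            continue;
        }

        // A question mark followed by a greater-than sign outside any string
        // is the real end of the block.
        if ($ch === '?' && $i + 1 < $len && $src[$i + 1] === '>') {
            return $i;
        }
        $i++;
    }
    return false;
}
```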
Next we're going to extend our MinificationSequenceFinder to create RegexSequenceFinder, a sequence finder capable of searching for regular expressions.
RegexSequenceFinder searches for the first occurrence of the regex in the sample string. Once it's found, the start index (start_idx) and the end index (end_idx) of the entire regex match are stored at lines 20 and 34. If a submatch is also found, its start index (sub_start_idx) as well as the entire matching string (sub_match) are also stored at lines 30 and 31.
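For readers who want to see the bookkeeping in isolation, here is a standalone sketch built on preg_match() with the PREG_OFFSET_CAPTURE flag. It does not extend MinificationSequenceFinder, the class name RegexSequenceFinderSketch and the method name findFirst() are hypothetical, and two details are assumptions: end_idx is taken to point just past the match, and sub_match is taken to hold the text of the first submatch.

```php
<?php
// Standalone sketch (illustration only); the real class extends
// MinificationSequenceFinder, whose interface is not reproduced here.
class RegexSequenceFinderSketch
{
    public $start_idx = false;     // offset where the whole match begins
    public $end_idx = false;       // offset just past the whole match (assumption)
    public $sub_start_idx = false; // offset where the first submatch begins
    public $sub_match = false;     // text of the first submatch (assumption)

    private $regex;

    public function __construct(string $regex)
    {
        $this->regex = $regex;
    }

    // Hypothetical method name: find the first match at or after $offset.
    public function findFirst(string $sample, int $offset = 0): bool
    {
        $this->start_idx = $this->end_idx = false;
        $this->sub_start_idx = $this->sub_match = false;

        // PREG_OFFSET_CAPTURE turns each entry of $m into [text, byte offset].
        if (preg_match($this->regex, $sample, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
            return false;
        }

        $this->start_idx = $m[0][1];
        $this->end_idx = $m[0][1] + strlen($m[0][0]);

        // A group that did not take part in the match is either absent or
        // reported with offset -1.
        if (isset($m[1]) && $m[1][1] !== -1) {
            $this->sub_start_idx = $m[1][1];
            $this->sub_match = $m[1][0];
        }
        return true;
    }
}
```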
As stated above, minifying HTML is just a straightforward collapsing of white space. There's still room for improvement. After all, white space between elements that contains no non-white-space characters is also collapsed. For example, " <i> <b> " is functionally equivalent to " <i><b>". Nevertheless this is a good first pass.
To see how minifyPHP does, consider the following input.
The above example contains an HTML comment and some CSS that will need to be minified as CSS, not HTML. Minified, we get:
Want to give it a try? Use our Minifier.