Minification refers to the process of removing all unnecessary characters while leaving the core functionality of the code intact. The end result is a new file which is smaller than the original, yet identical from a machine perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not intended, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.
Here at data·yze we use PHP to push files from our development environment to production. We opt to automate the minification process during this push step. This approach gives us all of the benefits of minification (smaller files which require less bandwidth for the users) without the drawbacks (forcing our developers to work with giant blobs of difficult-to-read code).
A word of caution: when minifying, always test that the final product still behaves as expected!
When we talk about minifying HTML we're often referring to minimizing white space. By default, web browsers collapse multiple white space characters into a single space, yet nicely formatted HTML source code often contains long blocks of white space to make the source code more human readable.
For example this:
It is worth noting that any white space inside a <pre> element is preserved, as is any white space in an element styled with white-space:pre or white-space:pre-wrap. To keep our solution general and avoid the need for a full CSS interpreter, we ignore elements whose white-space CSS property has been set. We'll show you how to modify the script to account for specific elements of this kind should you need to. Our solution does not analyze CSS to determine on the fly which class names require their white space to be preserved.
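To make the <pre> rule concrete, here is a minimal sketch of collapsing white space everywhere except inside <pre> blocks. It is not the script discussed in this article: the function name collapseWhitespaceOutsidePre is hypothetical, and the sketch assumes <pre> elements are not nested and ignores PHP, style and script blocks entirely.

```php
<?php
// Hypothetical helper, for illustration only: collapse runs of white space
// outside <pre>...</pre> blocks. Assumes <pre> elements are not nested.
function collapseWhitespaceOutsidePre(string $html): string
{
    // Split on <pre> blocks and keep the blocks in the result.
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, even indexes
    // hold ordinary HTML and odd indexes hold the captured <pre> blocks.
    $parts = preg_split(
        '#(<pre\b[^>]*>.*?</pre>)#is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Outside <pre>: any run of white space becomes a single space.
            $parts[$i] = preg_replace('/\s+/', ' ', $part);
        }
    }

    return implode('', $parts);
}

$html = "<div>\n    <p>Hello,     world</p>\n<pre>  keep\n   this </pre>\n</div>";
$minified = collapseWhitespaceOutsidePre($html);
```

Splitting first and only rewriting the even-indexed pieces keeps the <pre> content byte-for-byte identical, which is the property that matters here.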
Consider the following obviously contrived case:
In the above example, the closing style tag is written to the HTML output buffer by the executed PHP code. To know that line 4 of the example should be treated as HTML, our minifying script would need to interpret the PHP code, which would dramatically increase its complexity.
To make the problem tractable, we make the following assumptions about our PHP code:
Since PHP is executed server side rather than client side, unminified PHP code affects neither HTML file size nor bandwidth, so it does not need to be minified. And since we expect the PHP to output only a small amount of code, we don't worry about minifying that output either.
Each value in the elements array is the offset at which the corresponding control sequence was first encountered, so the next control element can be handled with a simple switch statement.
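As a rough illustration of that idea, the sketch below maps each control sequence to the offset of its first occurrence. The function name findControlOffsets and the list of control sequences are assumptions made for this example; they are not taken from our script.

```php
<?php
// Hypothetical sketch: map each control sequence to the offset of its first
// occurrence at or after $offset. Sequences that never occur are omitted.
function findControlOffsets(string $code, int $offset = 0): array
{
    // Illustrative list only; a real script may track different sequences.
    $controls = ['<?php', '<pre', '<style', '<script'];

    $found = [];
    foreach ($controls as $control) {
        $pos = stripos($code, $control, $offset); // case-insensitive search
        if ($pos !== false) {
            $found[$control] = $pos;
        }
    }

    asort($found); // earliest control sequence first, keys preserved
    return $found;
}

$code = "<p>Hi</p>\n<style>p { color: red; }</style>\n<?php echo 'x'; ?>";
$offsets = findControlOffsets($code);
// array_key_first($offsets) names the next control sequence to handle.
```

Because the array is sorted by offset, the first key is always the control sequence that appears next in the input, which is the value a switch statement would branch on.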
Note lines 18-22. These lines search for a div with class phpcode (and no additional attributes). This class is a special case at data·yze where white-space has been set to pre-wrap, and we're guaranteed to have no nested div elements. We have chosen to leave it in the code as an example of how you could handle white-space:pre elements, or classes that set white-space to pre or pre-wrap, should you have them on your site.
The next function, handlePHP, performs the specific minification needed for each code block.
This code works as follows. We use a switch statement on the next control element returned by getHTMLKeyControlElements(). In each case we minify the block of code from the current offset to the offset of the next control element as PHP. We then handle the block of code from the control element to its ending control element as appropriate. For style and script tags we need to look for embedded PHP.
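The general shape of such a dispatch loop can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual function: the names collapseRun and minifySketch are hypothetical, only three control sequences are handled, and the style case assumes the block contains no embedded PHP.

```php
<?php
// Hypothetical sketch of a switch-based dispatch loop. Names and the set of
// control sequences are illustrative assumptions.
function collapseRun(string $s): string
{
    return preg_replace('/\s+/', ' ', $s);
}

function minifySketch(string $code): string
{
    // Each opening control sequence paired with the sequence that ends it.
    $closers = ['<?php' => '?>', '<pre' => '</pre>', '<style' => '</style>'];
    $out = '';
    $offset = 0;
    $length = strlen($code);

    while ($offset < $length) {
        // Find the nearest control sequence at or after $offset.
        $next = null;
        $nextPos = $length;
        foreach (array_keys($closers) as $control) {
            $pos = stripos($code, $control, $offset);
            if ($pos !== false && $pos < $nextPos) {
                $next = $control;
                $nextPos = $pos;
            }
        }

        // Everything before the control sequence has its white space collapsed.
        $out .= collapseRun(substr($code, $offset, $nextPos - $offset));
        if ($next === null) {
            break; // no more control sequences: done
        }

        // The block runs to the end of its closing sequence, or to the end
        // of the input if the closing sequence is missing.
        $end = stripos($code, $closers[$next], $nextPos);
        $end = ($end === false) ? $length : $end + strlen($closers[$next]);
        $block = substr($code, $nextPos, $end - $nextPos);

        switch ($next) {
            case '<?php': // server-side code: copied through untouched
            case '<pre':  // white space is significant: copied through untouched
                $out .= $block;
                break;
            case '<style': // assumes no embedded PHP inside the style block
                $out .= collapseRun($block);
                break;
        }
        $offset = $end;
    }

    return $out;
}
```

For example, minifySketch("<p>a   b</p>\n<pre> x  y </pre>") collapses the white space in and after the paragraph but leaves the contents of the <pre> block alone.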
Line 59 corresponds to our specific white-space:pre-wrap class case. Note that no minification is performed within the div, as the white space needs to be preserved.
As stated above, the minifying of HTML is just a straightforward collapsing of white space. There's still room for improvement. After all, white space between elements that contain no non-white-space characters is also collapsed. For example, " <i> <b> " is functionally equivalent to " <i><b>". Nevertheless this is a good first pass.