In web development it's often advantageous to minify embedded files. Minification refers to the process of removing all unnecessary characters while leaving the core functionality intact. The end result is a new file which is smaller than the original, yet identical from a machine perspective. The core benefit of smaller files is that they require less bandwidth and are faster for the client to download. Although not intended, the minification process can make code more difficult for humans to read, which is why minification can also be seen as lightweight obfuscation.
Here at data·yze we use PHP to push files from our development environment to production. We opt to automate the minification process during this push step. This gives us all of the benefits of minification without the drawbacks. Our users get the smaller, faster-to-download version of the code. Our developers working in the testing environment work with the larger, more human-readable version of the code.
Being data people, we're never satisfied with a partial result.
Lines 5-8 handle singleton clauses not surrounded with braces. Line 11 normalizes whitespace. Whitespace blocks containing a newline are reduced to a single newline, whereas whitespace blocks without newlines are reduced to a single space. We need to preserve the newline because we do not yet know which ones are terminating statements and which ones aren't. Line 12 strips extra horizontal whitespace around characters that are not part of variable names or numbers. Lines 14-15 strip newlines around punctuation. Again, we do not yet know which newlines terminate statements, so we do not remove newlines after any closing bracket (line 15). Line 17 rolls up 'else' clauses. At this point the only newlines left must terminate statements, so we replace them with semicolons. Since newlines and semicolons require the same number of bits, we could remove all semicolons in favor of newlines if we preferred. Either way, the code will have the same footprint.
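To make the whitespace normalization step concrete, here is a minimal sketch of that one rule only. The function name normalizeWhitespace is hypothetical, and the sketch assumes strings and comments have already been set aside by the shallow parse, so it should be read as an illustration rather than our production code.

```php
<?php
// Hypothetical helper: collapse every run of whitespace.
// Assumes string literals and comments were removed beforehand.
function normalizeWhitespace(string $code): string
{
    return preg_replace_callback(
        '/\s+/',
        // A run containing a line break becomes "\n"; any other run becomes " ".
        fn(array $m): string => strpbrk($m[0], "\r\n") !== false ? "\n" : ' ',
        $code
    );
}

$sample = "var a  =  1;  \n\n   var b\t=\t2;";
echo normalizeWhitespace($sample), PHP_EOL;
```

The newline survives here on purpose: at this stage it may still be the only thing terminating a statement.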
While performing our shallow parse we're going to scan the code for the next key marker that indicates the start of a string or function. We perform this task in getNextKeyElement.
The next step is to find the minimum index in our $elements array. Note that strpos returns false if the specified key marker isn't found in the string, and zero is equivalent to false under loose comparison. Thus line 16 filters the 'false' indexes out of our array. We need a custom callback for array_filter that uses strict comparison so we don't accidentally filter out a key marker at offset zero.
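The pitfall is easy to reproduce. The snippet below is a small illustration with a made-up input and marker list (both are assumptions, not taken from our code), comparing the default array_filter behavior with a strict callback.

```php
<?php
$code = '"use strict"; // setup';

// Offset of each illustrative key marker; strpos returns false when absent.
$elements = [
    '"'  => strpos($code, '"'),   // 0: the string literal starts the code
    "'"  => strpos($code, "'"),   // false: not present
    '//' => strpos($code, '//'),  // 14
];

// Without a callback, array_filter drops every falsy value, including offset 0.
$loose = array_filter($elements);

// A strict callback removes only the genuine "not found" results.
$strict = array_filter($elements, fn($offset) => $offset !== false);

var_dump(array_keys($loose));
var_dump(array_keys($strict));
```

With the default filter the double quote at offset zero disappears and the comment marker would wrongly look like the next key element.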
We use the min function to find the minimum index (line 19), and array_keys (line 20) to find the marker that corresponds to that minimum index. How we process the characters that come after that key marker will depend on which key marker it is, so we return the tuple (offset, key marker).
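Putting those steps together, a function of this shape could look roughly like the sketch below. The name nextKeyElementSketch, its signature, the marker list, and the null return for "nothing found" are all assumptions made for illustration.

```php
<?php
/**
 * Sketch only: name, signature and marker list are assumptions.
 *
 * @return array|null [offset, marker], or null when no marker remains.
 */
function nextKeyElementSketch(string $code, int $start = 0): ?array
{
    $markers = ['"', "'", '//', '/*', 'function'];

    // Map each marker to its first offset at or after $start (false if absent).
    $elements = [];
    foreach ($markers as $marker) {
        $elements[$marker] = strpos($code, $marker, $start);
    }

    // Strict filter: drop "not found" but keep a legitimate offset of 0.
    $elements = array_filter($elements, fn($offset) => $offset !== false);
    if ($elements === []) {
        return null;
    }

    $offset = min($elements);
    // array_keys with a search value returns the keys that hold that value.
    $marker = array_keys($elements, $offset, true)[0];

    return [$offset, $marker];
}

var_dump(nextKeyElementSketch('var s = "a // b"; // note'));
```

In the usage line the double quote at offset 8 wins over the `//` inside the string, which is exactly why the caller needs the marker as well as the offset: it must skip to the end of the string before looking for comments.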
Line 3 uses preg_match to find the next instance of $char from the $start offset. The second capture group captures the character we're searching for. The first, (\\\\*), captures the run of backslashes before it. If there are no backslashes, or the number of backslashes is even, then the character is not escaped and we've found the end delimiter. Otherwise, it is escaped and is just an instance of the character embedded in the string/comment.
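The backslash-counting rule can be sketched as follows. The function name findUnescapedSketch, its signature, and the loop that retries after an escaped match are assumptions for illustration; only the two capture groups come from the description above.

```php
<?php
/**
 * Sketch only: hypothetical name and signature.
 * Returns the offset of the first unescaped $char at or after $start, or null.
 */
function findUnescapedSketch(string $code, string $char, int $start): ?int
{
    // Group 1: the run of backslashes directly before the character.
    // Group 2: the character itself.
    $pattern = '/(\\\\*)(' . preg_quote($char, '/') . ')/';

    while (preg_match($pattern, $code, $m, PREG_OFFSET_CAPTURE, $start)) {
        // Zero or an even number of backslashes: the character is not escaped.
        if (strlen($m[1][0]) % 2 === 0) {
            return $m[2][1];
        }
        // Odd count: escaped, so resume the search just past this character.
        $start = $m[2][1] + 1;
    }

    return null;
}

// Search for the closing quote, starting just after the opening one.
var_dump(findUnescapedSketch('"He said \"hi\" to me" + x', '"', 1));
```

Counting parity matters because a backslash can itself be escaped: in `\\"` the two backslashes pair up and the quote really does close the string.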