The Art Of The Bodge
A bodge, a kludge, a hack: all essentially the same thing, a quick and dirty solution that can be inefficient, hard to maintain or clumsy. Now I bet you are wondering how you could ever have a good bodge and, though I like the description above, I prefer the definition “engineering but with larger tolerances”. Just because something is ‘spaghetti code’ does not mean it is not justified; many use the bodge as a tool to speed up production when a total redesign of a solution is impractical.
UTF-8 and GREP
Many of you reading this have most likely heard of Ken Thompson, the man who co-created the original Unix operating system. Thompson was the master of the bodge, his most notable being the design of UTF-8, the character encoding used by almost 92% of the World Wide Web today.
A little history here: in the 1960s the Americans came up with ASCII, a clever way of encoding characters in 7 bits. Then new 8-bit processors came along, allowing a whole 128 other possible characters in the encoding. This was fine up until the 1990s, when the World Wide Web was invented. People now wanted to send their ASCII-encoded emails from America to the multibyte encodings of Japan. This caused so many problems that the Japanese even came up with the word ‘mojibake’, meaning text garbled as the result of decoding with an unintended character encoding. Due to this, the Unicode Consortium was created to impose a standard on the roughly 100,000 characters of the world.
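Mojibake is easy to reproduce. The snippet below is only a small sketch, assuming Shift JIS as the Japanese encoding and Latin-1 as the wrong guess on the receiving end; both are my choices for illustration, not something taken from a particular historical system.

```python
# The same bytes read back under the wrong encoding become mojibake.
text = "文字化け"  # the word "mojibake" written in Japanese

raw = text.encode("shift_jis")      # bytes as a Japanese system might store them
garbled = raw.decode("latin-1")     # a receiver guessing the wrong encoding
restored = raw.decode("shift_jis")  # a receiver using the intended encoding

print(repr(garbled))  # one nonsense character per byte
print(restored)       # the original text again
```

Nothing is lost in transit here: the bytes are identical in both cases, and only the assumption about what they mean has changed.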
Moving forward, let’s look at UTF-8. It had to encode those 100,000 characters, and the obvious approach of a fixed-width code would need about 32 bits per character. Since English text fits in ASCII’s 7 bits, that would have put a lot of zeros in every English character you type, about 24 bits of wasted space each. Another problem was that old systems which saw eight 0s in a row would treat them as a NULL and stop reading. And finally, as if that wasn’t bad enough, the whole thing had to be backwards compatible with ASCII.
So UTF-8 starts by encoding English in the same way as ASCII (if it isn’t broken, don’t fix it): the original 7 bits, with an extra 0 on the front to make 8 bits. If you want something higher than this, you add headers to the bytes. For two bytes, the first three bits of the first byte are 110: two 1s, two bytes. The next byte starts with 10, a continuation flag. The remaining bits are filled in with the number of the character. This continues, with each extra byte adding a 1 to the header of the first byte, up to 1111110, or six bytes, in the original design (modern UTF-8 stops at 11110, or four bytes). Not only did this fix the problems posed by the different encodings, it was also simple enough for the entire design to be written on a mere napkin, which it was, by Ken Thompson and Rob Pike in a New Jersey diner in 1992, when they invented UTF-8.
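To make the header scheme concrete, here is a minimal sketch of an encoder. It is limited to the four-byte maximum that modern UTF-8 allows, and the function name utf8_encode is my own invention. Python already does this for you with str.encode, so treat it purely as an illustration of the bit layout.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point using the UTF-8 header scheme (up to 4 bytes)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if code_point < 0x80:
        # 0xxxxxxx: plain ASCII, one byte, leading bit is 0
        return bytes([code_point])
    if code_point < 0x800:
        # 110xxxxx 10xxxxxx: two 1s in the header, two bytes
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx: three 1s, three bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: four 1s, four bytes
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Show the bit pattern for a 1-, 2-, 3- and 4-byte character.
for ch in "Aé€😀":
    encoded = utf8_encode(ord(ch))
    print(ch, " ".join(f"{byte:08b}" for byte in encoded))
```

Every continuation byte starts with 10 and every header byte with 11, so no byte of a multi-byte character is ever all zeros, which is what keeps those old NULL-terminated systems happy.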
However, this wasn’t Ken’s first bodge. Around 1973 a friend of his, Lee E. McMahon, was attempting to analyze the text of the Federalist Papers. The ed editor Ken had developed for the Unix system supported regular expressions, but it could not cope with a text of that size, so it was unable to help McMahon. Therefore, Ken decided to take the regular expression code from ed and bodge it into its own standalone tool overnight. The tool Globally searches for a Regular Expression and Prints the matching lines, which gives it the name grep, now a standard command in Unix.
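The core idea fits in a few lines. The following is a rough sketch of the behaviour, not Thompson's code: it uses Python's re module, whose syntax differs from ed's regular expressions, and the sample lines are made up for the example.

```python
import re
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield every line that contains a match for the regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


# Hypothetical sample text, one entry per line.
sample = [
    "It depends upon the whole",
    "nothing to see here",
    "a coupon is not a match",
]

# \b marks a word boundary, so "coupon" is skipped.
matches = list(grep(r"\bupon\b", sample))
print(matches)
```

Because the function reads one line at a time, it never needs to hold the whole text in memory.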
Another example of the potential benefits of the bodge comes from the 2004 game “Ratchet and Clank: Up Your Arsenal”, the third installment and the first online game in the franchise. It had one major flaw: it lacked any way for the developers to patch or update the code. This would have been the end of the game, if it were not for a bug in their system: the End User License Agreement was downloaded from the server and stored in a static buffer, and its size was never checked. They saw that they could exploit this bug by overflowing the EULA string. Overflowing the string far enough overwrote a global variable that just so happened to be the callback handler for a specific network packet. Once they had overwritten the handler, they were able to send that network packet to make the game jump to a pointer, which pointed to code stored earlier in the EULA.
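The mechanics of that overflow can be simulated without touching real memory. This is a toy model under my own assumptions: the 16-byte buffer, the 4-byte handler slot placed directly after it, and the addresses 0x1000 and 0x2000 are all invented for illustration and say nothing about the game's actual layout.

```python
import struct

BUFFER_SIZE = 16              # hypothetical size of the static text buffer
HANDLER_OFFSET = BUFFER_SIZE  # the callback slot sits right after the buffer


def make_memory(handler_address: int) -> bytearray:
    """Lay out a text buffer followed by a 4-byte callback address."""
    memory = bytearray(BUFFER_SIZE + 4)
    struct.pack_into("<I", memory, HANDLER_OFFSET, handler_address)
    return memory


def unchecked_copy(memory: bytearray, text: bytes) -> None:
    """The bug: nothing compares len(text) with BUFFER_SIZE."""
    for index, byte in enumerate(text):
        memory[index] = byte


def read_handler(memory: bytearray) -> int:
    return struct.unpack_from("<I", memory, HANDLER_OFFSET)[0]


memory = make_memory(0x1000)

# A text that fits leaves the callback alone.
unchecked_copy(memory, b"short text")
handler_before = read_handler(memory)

# A text that is too long spills into the callback slot.
oversized = b"A" * BUFFER_SIZE + struct.pack("<I", 0x2000)
unchecked_copy(memory, oversized)
handler_after = read_handler(memory)

print(hex(handler_before), hex(handler_after))
```

Whoever controls the text now controls where the callback points, which is exactly the lever the developers needed.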
A critical problem arose because the EULA was copied with strcpy, which ends the string as soon as it finds a 0 byte. The compiled code contained too many 0 bytes, so instead they bodged it to contain no 0 bytes and wrote a bootstrap in assembly to un-bodge it in the game. This allowed them to use their own buffer overflow bug to create a bodge that let them update and patch their game as they wanted.
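I don't know which encoding the developers actually used, so the sketch below swaps in one well-known way to do it: XOR every byte with a key that does not appear in the payload, so that no byte can come out as 0. The function names and the payload bytes are hypothetical.

```python
def strcpy_like(data: bytes) -> bytes:
    """Copy bytes up to the first 0 byte, the way C's strcpy stops."""
    return data.split(b"\x00", 1)[0]


def encode_without_zeros(payload: bytes) -> bytes:
    """XOR with a key byte absent from the payload, so no result is 0."""
    used = set(payload)
    key = next((k for k in range(1, 256) if k not in used), None)
    if key is None:
        raise ValueError("payload uses every non-zero byte value")
    # Prepend the key so the decoder knows what to undo.
    return bytes([key]) + bytes(byte ^ key for byte in payload)


def decode(encoded: bytes) -> bytes:
    """The 'bootstrap' step: undo the XOR to recover the original bytes."""
    key = encoded[0]
    return bytes(byte ^ key for byte in encoded[1:])


# Made-up bytes standing in for compiled code full of zeros.
payload = bytes([0x3C, 0x00, 0x10, 0x00, 0x08, 0x00])

truncated = strcpy_like(payload)          # cut off at the first 0 byte
encoded = encode_without_zeros(payload)   # safe to pass through strcpy
recovered = decode(strcpy_like(encoded))  # un-bodged on the other side

print(truncated.hex(), encoded.hex(), recovered.hex())
```

A byte XORed with the key is 0 only when the byte equals the key, so picking a key that never occurs in the payload guarantees a zero-free result.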
Ratchet & Clank: Up Your Arsenal (2004) ©Insomniac Games
Bodging Isn’t A Magic Fix
In general, the bodge is an approach often brought on by having to implement solutions for systems that are so tied down that there is little else you can do but bodge it the best you can. However, developers can also feel the need to draw on the art of the bodge when pressured into creating quick fixes to meet tight deadlines.
Having a developer implement a bodge could be sustainable for smaller teams, but as a team grows and adapts, coming across that bodge several months later could result in serious problems. This ‘bodge now, deal with it later’ pattern is a common problem for Agile projects lacking proper management. However, this need to bodge can be corrected through proper retrospectives, realistic estimates of the time it takes to complete a task, and remembering the Agile Manifesto: “Individuals and interactions over processes and tools”.
As much as I have praised the art of the bodge, it is worth remembering that for every successful bodge there are a hundred more that can destroy a system and your employees’ enthusiasm.