Sep 2, 2014 6:30 AM

Most Code Is an Ugly Mess. Here's How to Make It Beautiful


This is what ugly code looks like. It's a dependency diagram—a representation of interdependence or coupling (the black lines) between software components (the grey dots) within a program. A high degree of interdependence means that changing one component inside the program could lead to cascading changes in all the other connected components, and in turn to changes in their dependencies, and so on.
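To make the cascade concrete, here is a minimal sketch that walks a small dependency graph and lists every component a single change could touch. The component names and the `affected_by_change` helper are invented for illustration; they are not taken from the diagram.

```python
from collections import deque

# Hypothetical components; each maps to the components it depends on.
DEPENDS_ON = {
    "billing": ["reports", "patients"],
    "reports": ["patients", "addresses"],
    "patients": ["addresses"],
    "addresses": [],
    "logging": [],
}


def affected_by_change(changed, depends_on):
    """Return every component that directly or indirectly depends on `changed`."""
    # Invert the edges: for each component, record who depends on it.
    dependents = {name: set() for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(name)

    # Breadth-first walk outward from the changed component.
    affected, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(affected_by_change("addresses", DEPENDS_ON)))
# prints ['billing', 'patients', 'reports']
```

In this toy graph, touching "addresses" ripples into three of the other four components, while the uncoupled "logging" component is left alone. The denser the black lines, the closer every change comes to touching everything.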

Programs with this kind of structure are brittle, and hard to understand and fix. This dependency diagram was submitted anonymously to TheDailyWTF.com, where working programmers share "Curious Perversions in Information Technology" they find as they work. A user commented, "I found something just like that blocking the drain once."

Introducing the Big Ball of Mud

Software is complicated because it tries to model the irreducible complexity of the world. Even a simple software requirement for a small company that, say, provides secretarial services for the medical insurance industry—"We need an application that makes it easier for our scribes to write up reports from doctors’ examinations"—will always reveal a swirling hodgepodge of exceptions and special cases.

Some of the doctors will have two addresses on file, some will have three. A report will always begin with a summary of the patient’s claimed condition, unless it’s being written for Company X, which wants a narration of the doctor’s exam up front. And so on.

The program you create in response to these requirements must reduce repetitive labor, automate the work that must be done each time, yet remain flexible enough to allow variation. The business practices that can be formalized into sets of procedures are easy to convert into code.

Geek Sublime: The Beauty of Code, the Code of Beauty, by Vikram Chandra.

But soon, as you adapt your procedural engine for the exceptions, for all the variations that exist in the real world, you find yourself snarled in squirming thickets of if-then-else constructs, each of which contains yet other if-else-if and switch-case monsters, and you find that you have to break out of your beautiful Report-Main-Body loop and backtrack to other reports to retrieve history, and then, inevitably your procedures become more complex and start doing two things instead of one, RetrievePatientInfo() is now doing the retrieving but is also checking for valid addresses, you know that functionality should be somewhere else but you don’t have the time to bother, the users ask for a new feature and you patch it in, and of course you mean to come back later and clean everything up, but then, before you know it, you are trapped inside an unwholesome, uncontrollable atrocity, what Brian Foote and Joseph Yoder called a Big Ball of Mud.

Often, it is not the lack of programming skill that leads to the emergence of a Big Ball of Mud, but something akin to the time-honoured Indian practice of jugaad. Jugaad is Hindi for a creative workaround, a working improvisation that is built in the absence of resources and under pressure of time.

There can be something heroic about jugaad, as in the strange-looking trucks one sees bumping down country roads in rural India, which on closer examination turn out to be carts with diesel irrigation pumps strapped on. Jugaad makes do, it gets work done, it maneuvers around uncooperative bureaucracies, it hacks. In recent years, jugaad has been recognized as down-to-earth creativity, as a prized national resource, and has acquired the dignifying sobriquet of "frugal engineering."

In software, repeated applications of excessively frugal engineering by a series of programmers lead to a scheme that has no discernible structure, within which components use each other’s functionality promiscuously, so that the logic of the program becomes hard or impossible to follow. Yet software needs maintenance: bugs need to be fixed, new features are demanded by users. How can you fix something you can’t understand?

What if your fix introduces new bugs that reveal themselves in some future disaster which corrupts and loses data? The impulse then is to rewrite the whole program from the bottom up, in accordance with hard-won principles of good program design. But—often there is no budget for a complete rewrite, there is no time, there isn’t enough manpower. So maybe you patch a bit here, work in a clumsy kludge there—jugaad!

Mostly, managers prefer to plug up the holes and leave the Big Balls of Mud to roll on. COBOL, a language first specified in 1959 by a committee that drew heavily on the work of Grace Hopper (‘Grandma COBOL’), reportedly still processes 90 per cent of the planet’s financial transactions, and 75 per cent of all business data. You can make a comfortable living maintaining code in languages like COBOL, the computing equivalents of Mesopotamian cuneiform dialects.

These ancient applications—too expensive to replace, sometimes too tangled to fix or improve—run on, serving up the data that appears on the chromed-up surface of your browser, which gives you the illusion that your bank and your local utility companies live on the technological cutting edge. But as always, the past lives on under the shiny surface of the present, and often, it is too densely tangled to comprehend.

An Elegant Solution to an Ugly Problem*

The day that millions will dash off beautiful programs—as easily as with a pencil—still remains distant. The "lovely gems and brilliant coups" of coding remain hidden and largely incomprehensible to outsiders. But the beauty that programmers pursue leads to their own happiness, and—not incidentally—to the robustness of the systems they create, so the aesthetics of code impact your life more than you know.

For example, one of the problems that have always plagued programmers is the "maintenance of state." Suppose you have a hospital that sends out invoices for services provided, accepts payments, and also sends out reminders for overdue payments.

On Tuesday evening, Ted creates an invoice for a patient, but then leaves the office for an early dinner; there is now an "Invoice" object in the system. This object has its "InvoiceNumber" field set to 56847, and its "Status" field set to "Created." All of these current settings together constitute this invoice’s "state."

The next morning, Ted comes in and adds a couple of line items to this invoice. Those inserted line items and a new "Status" setting of "Edited" along with all the other data fields are now the invoice’s state. After a coffee break, Ted deletes the second line item and adds two more. He has changed the invoice’s state again. Notice that we’ve already lost some information: from now on, we can’t ever work out that Ted once inserted and deleted a line item.
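A minimal sketch of Ted's edits, assuming the invoice is an ordinary mutable object that is updated in place. The `Invoice` class and the line-item descriptions are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Invoice:
    invoice_number: int
    status: str = "Created"
    line_items: list = field(default_factory=list)


# Tuesday evening: Ted creates the invoice.
invoice = Invoice(invoice_number=56847)

# Wednesday morning: Ted adds two line items (descriptions are made up).
invoice.line_items += ["X-ray", "Consultation"]
invoice.status = "Edited"

# After the coffee break: delete the second item, then add two more.
del invoice.line_items[1]
invoice.line_items += ["Blood test", "Bandages"]

print(invoice)
```

Only the final state survives. Nothing in the object records that "Consultation" was ever on the invoice.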

If you wanted to track historical changes to the invoice, you would have to build a whole complex system to store various versions. Things get even more complicated in our brave new world of networked systems. Ted and his colleagues can’t keep up with the work, so an offshored staff is hired to help, and the invoice records are now stored on a central server in Idaho.

On Thursday afternoon, Ted begins to add more line items to invoice 56847, but then is called away by a supervisor. Now Ramesh in Hyderabad signs on and begins to work on the same invoice. How should the program deal with this?

Should it allow Ramesh to make changes to invoice 56847? But maybe he’ll put in duplicate line items that Ted has already begun working on. He may overwrite information—change the "Status" field to "Sent"—and thereby introduce inconsistencies into the system. You could lock the entire invoice record for 56847 on a first come, first served basis, and tell Ramesh he can’t access this invoice because someone else is editing it. But what if Ted decides to go to lunch, leaving 56847 open on his terminal? Do you maintain the lock for two hours?
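The first come, first served lock can be sketched in a few lines. `InvoiceLocks` is a hypothetical in-memory lock table, not a real library; an actual system would need to keep this on the shared server.

```python
class InvoiceLocks:
    """First come, first served: one editor per invoice at a time."""

    def __init__(self):
        self._holder = {}  # invoice number -> user currently editing

    def acquire(self, invoice_number, user):
        # setdefault only stores `user` if nobody holds the lock yet.
        holder = self._holder.setdefault(invoice_number, user)
        return holder == user  # True only if this user holds the lock

    def release(self, invoice_number, user):
        if self._holder.get(invoice_number) == user:
            del self._holder[invoice_number]


locks = InvoiceLocks()
print(locks.acquire(56847, "Ted"))     # True: Ted opens the invoice
print(locks.acquire(56847, "Ramesh"))  # False: Ramesh is refused
# Ted goes to lunch without closing the invoice; nothing releases the lock.
print(locks.acquire(56847, "Ramesh"))  # False: still refused
locks.release(56847, "Ted")
print(locks.acquire(56847, "Ramesh"))  # True: Ramesh can finally edit
```

The sketch has no answer to the lunch problem. Ramesh stays locked out until Ted explicitly releases the invoice, and every remedy (timeouts, forced unlocks, warnings) adds more code and more edge cases.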

Guarding against inconsistencies, deadlocks of resources by multiple users, and information loss has traditionally required reams of extremely complex code. If you’ve ever had a program or a website lose or mangle your data, there’s a good likelihood that object state was mismanaged somewhere in the code. A blogger named Jonathan Oliver describes working on a large system:

It was crazy—crazy big, crazy hard to debug, and crazy hard to figure out what was happening through the rat’s nest of dependencies. And this wasn’t even legacy code—we were in the middle of the project. Crazy. We were fighting an uphill battle and in a very real danger of losing despite us being a bunch of really smart guys.

The solution that Oliver finally came to was event sourcing.

With this technique, you never store the state of an object, only events that have happened to the object. So when Ted first creates invoice 56847 and leaves the office, what the program sends to CentralServer in Idaho are the events "InvoiceCreated" (which contains the new invoice number) and "InvoiceStatusChanged" (which contains the new status). When Ted comes back the next morning and wants to continue working on the invoice, the system will retrieve the events related to this invoice from CentralServer and do something like:

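// Start from an empty invoice and rebuild its state from the stored events.
Invoice newInvoice = new Invoice();
foreach (var singleEvent in listOfEventsFromCentralServer)
{
    // Apply each event in the order in which it originally happened.
    newInvoice.Replay(singleEvent);
}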

That is, you reconstitute the state of an object by creating a new object and then "replaying" events over it. Ted now has the most current version of invoice 56847, conjured up through a kind of temporally shifted rerun of events that have already happened. In this new system, history is never lost; when Ted adds a line item, a "LineItemAdded" event will be generated, and when he deletes one, a "LineItemDeleted" event will be stored.
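Here is a fuller, runnable sketch of the same idea in Python. The event names come from the text above; the `Event` and `Invoice` classes, their fields, and the line-item descriptions are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    kind: str   # e.g. "InvoiceCreated", "LineItemAdded"
    data: dict  # payload; the keys used here are illustrative


@dataclass
class Invoice:
    invoice_number: Optional[int] = None
    status: Optional[str] = None
    line_items: list = field(default_factory=list)

    def replay(self, event):
        """Apply one stored event to this invoice."""
        if event.kind == "InvoiceCreated":
            self.invoice_number = event.data["invoice_number"]
        elif event.kind == "InvoiceStatusChanged":
            self.status = event.data["status"]
        elif event.kind == "LineItemAdded":
            self.line_items.append(event.data["item"])
        elif event.kind == "LineItemDeleted":
            self.line_items.remove(event.data["item"])
        else:
            raise ValueError(f"unknown event kind: {event.kind}")


# The stored history: every event, in the order it happened.
events = [
    Event("InvoiceCreated", {"invoice_number": 56847}),
    Event("InvoiceStatusChanged", {"status": "Created"}),
    Event("LineItemAdded", {"item": "X-ray"}),
    Event("LineItemAdded", {"item": "Consultation"}),
    Event("InvoiceStatusChanged", {"status": "Edited"}),
    Event("LineItemDeleted", {"item": "Consultation"}),
    Event("LineItemAdded", {"item": "Blood test"}),
    Event("LineItemAdded", {"item": "Bandages"}),
]

# Rebuild the current state from scratch.
invoice = Invoice()
for event in events:
    invoice.replay(event)

print(invoice)
```

The rebuilt invoice holds the three surviving line items, while the event list still records that "Consultation" was added and later deleted.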

If, at some point in the future, you wanted to know what the invoice looked like on Wednesday morning, you would just fire off your "Replay" routine and tell it to cease replaying events once it got past 9 a.m. on Wednesday morning.
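A minimal sketch of that cutoff, assuming each stored event carries a timestamp and that events are kept in time order. The `replay_until` helper, the dates, and the line-item descriptions are all invented for illustration.

```python
from datetime import datetime


def apply(state, kind, payload):
    """Apply a single event to the invoice state."""
    if kind == "InvoiceCreated":
        state["invoice_number"] = payload
    elif kind == "InvoiceStatusChanged":
        state["status"] = payload
    elif kind == "LineItemAdded":
        state["line_items"].append(payload)
    elif kind == "LineItemDeleted":
        state["line_items"].remove(payload)


def replay_until(events, cutoff):
    """Rebuild the invoice as it looked at `cutoff`."""
    state = {"invoice_number": None, "status": None, "line_items": []}
    for timestamp, kind, payload in events:  # assumed sorted by time
        if timestamp > cutoff:
            break  # stop once we are past the requested moment
        apply(state, kind, payload)
    return state


# Made-up timestamps: a Tuesday evening and the following Wednesday.
events = [
    (datetime(2014, 9, 2, 17, 30), "InvoiceCreated", 56847),
    (datetime(2014, 9, 2, 17, 30), "InvoiceStatusChanged", "Created"),
    (datetime(2014, 9, 3, 8, 45), "LineItemAdded", "X-ray"),
    (datetime(2014, 9, 3, 8, 50), "LineItemAdded", "Consultation"),
    (datetime(2014, 9, 3, 8, 50), "InvoiceStatusChanged", "Edited"),
    (datetime(2014, 9, 3, 10, 30), "LineItemDeleted", "Consultation"),
    (datetime(2014, 9, 3, 10, 35), "LineItemAdded", "Blood test"),
    (datetime(2014, 9, 3, 10, 36), "LineItemAdded", "Bandages"),
]

wednesday_9am = replay_until(events, datetime(2014, 9, 3, 9, 0))
print(wednesday_9am["line_items"])  # prints ['X-ray', 'Consultation']
```

The same function with a later cutoff returns the current invoice, so "now" is simply the special case in which nothing is left to skip.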

You can stop locking resources: because events can be generated at a very fine granular level, it becomes much easier to write code that will cause CentralServer to reject events that would introduce inconsistencies, to resolve conflicts, and—if necessary—pop up messages on Ted and Ramesh’s screens. Events are typically small objects, inexpensive to transfer over the wire and store, and server space grows cheaper every day, so you don’t incur any substantial added costs by creating all these events.
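One common way to have a server reject inconsistent events is a version check, sketched below. This is an assumption about how such a rule might look, not a description of any particular system: `EventStore`, `ConflictError`, and the `expected_version` parameter are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when an event is based on an out-of-date view of the invoice."""


class EventStore:
    """Hypothetical stand-in for CentralServer: one event list per invoice."""

    def __init__(self):
        self._streams = {}

    def load(self, invoice_number):
        """Return a copy of the events stored so far for this invoice."""
        return list(self._streams.get(invoice_number, []))

    def append(self, invoice_number, expected_version, event):
        """Store `event` only if the caller has seen every earlier event."""
        stream = self._streams.setdefault(invoice_number, [])
        if len(stream) != expected_version:
            raise ConflictError(
                f"invoice {invoice_number} is at version {len(stream)}, "
                f"not {expected_version}"
            )
        stream.append(event)
        return len(stream)


store = EventStore()
store.append(56847, 0, ("InvoiceCreated", 56847))
store.append(56847, 1, ("InvoiceStatusChanged", "Created"))

# Ted and Ramesh both open the invoice and see the same two events.
ted_version = len(store.load(56847))
ramesh_version = len(store.load(56847))

# Ted's event arrives first and is accepted.
store.append(56847, ted_version, ("LineItemAdded", "X-ray"))

# Ramesh's event is based on a stale view, so the store rejects it.
try:
    store.append(56847, ramesh_version, ("InvoiceStatusChanged", "Sent"))
    ramesh_rejected = False
except ConflictError as error:
    ramesh_rejected = True
    print("Rejected:", error)
```

Nobody is ever locked out here. Ramesh's client can reload the events, show him Ted's new line item, and let him decide whether to resend his change.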

As I learnt about the beauty of event sourcing, I was reminded of other discussions of identity-over-time that had bent my mind. The Buddhists of the Yogachara school (fourth century CE) were among the proponents of the doctrine of "no-self," arguing: "What appears to be a continuous motion or action of a single body or agent is nothing but the successive emergence of distinct entities in distinct yet contiguous places."

There is no enduring object state, there are only events. To this, 11th-century philosopher Abhinavagupta responded with the assertion that there could be no connection between sequential cognitive states if there were not a stable connector to synthesize these states across time and place. There may be no persistent object state, but there needs to be an event-sourcing system to integrate events into current state. For Abhinavagupta, memory is the pre-eminent faculty of the self: "It is in the power of remembering that the self’s ultimate freedom consists. I am free because I remember."

Excerpted from Geek Sublime: The Beauty of Code, the Code of Beauty, by Vikram Chandra. To be published by Graywolf Press in September.

*UPDATE 09/05/14 1:00 p.m. ET: This story was updated to clarify its second subhead.