Monday, September 26, 2011

Learn Your Libraries

Story 1
I still remember the month-long final project from my first-ever CS class. Actually, I remember almost nothing from the project except for a bug that caused me to have to throw away my first three weeks of work. I had just learned about enumerations in Pascal and thought they were the neatest thing since sliced bread. I designed my entire project around enumerations. Users would enter their menu choices via enumerations. I would display the enumerations as options. All of this depended very heavily on my understanding of enumerations, which, as it turned out, was in no way related to what enumerations actually are.

Unfortunately, I didn't find this out until I had spent two weeks writing the entire program and tried running it. It didn't work at all. Like any good developer, I randomly changed various lines of code until it worked. A week later, with the project deadline looming, my project was not any closer to working. It was time for drastic action. Since I couldn't seem to get the enumerations to do what they were "supposed" to do, I actually read up on them. Imagine my dismay when I realized that all the assumptions that I had made were wrong. And not just a little bit wrong. The reality (enums are a typesafe way to have named constants) and my vision (they'll do everything I want, including cooking dinner and solving the halting problem) were completely unrelated. I spent the rest of the day griping and complaining about why anybody would create such a useless design feature. When I realized that that wasn't getting me any closer to a program that I could actually turn in, I finally sat down and rewrote the entire program using only Pascal features that I actually understood, and eschewed enumerations entirely.
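To make "a typesafe way to have named constants" concrete, here is a minimal sketch in Java rather than Pascal. The menu, the class name MenuDemo, and the helper methods are all hypothetical and are not from the original project. Java enums also do more than Pascal's (for example, they carry their names at runtime), so treat this only as an illustration of the general idea: the enum gives you a closed set of named values that the compiler checks, while turning raw user input into one of those values is still code you write yourself.

```java
import java.util.Locale;

final class MenuDemo {
    // Hypothetical menu choices, for illustration only.
    enum MenuChoice { NEW_GAME, LOAD_GAME, QUIT }

    // The parameter type guarantees that only a MenuChoice can be passed,
    // never an arbitrary int or String.
    static String describe(MenuChoice choice) {
        switch (choice) {
            case NEW_GAME:  return "Starting a new game";
            case LOAD_GAME: return "Loading a saved game";
            case QUIT:      return "Goodbye";
            default:        throw new AssertionError("unhandled choice: " + choice);
        }
    }

    // Mapping free-form input onto a constant is the program's job, not the enum's.
    // Returns null when the input does not name any choice.
    static MenuChoice parse(String input) {
        String name = input.trim().toUpperCase(Locale.ROOT).replace(' ', '_');
        try {
            return MenuChoice.valueOf(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(MenuChoice.QUIT));
        System.out.println(parse("load game"));
        System.out.println(parse("cook dinner"));
    }
}
```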

Story 2
The other day a friend asked me what I knew about Java Serialization. After asking for details, it turned out that he was working on a project where they were doing bit-diddling on the serialized output of an object. One of their assumptions was that if you had two objects, instance1 and instance2, and serialized both of them, the serialized byte data would be identical whenever all of their fields were identical. They had run a bunch of experiments and it seemed like their assumption wasn't always holding. I think he was hoping that I would point out something they had missed that would give them an easy fix.

Upon checking the Serialization Spec, it turned out their assumption was not valid. My suggested solution was to not use Java's Serialization, but rather to write their own conversion to bytes that would fit their specific needs. This may seem like reinventing the wheel, but as I've mentioned before, I believe that rolling your own is often the way to go.
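As a rough illustration, here is one way two objects with equal field values can serialize to different bytes. It is not necessarily the case my friend's team ran into, and the Pair class and every method name below are hypothetical. Java Serialization tracks object identity: the second time the same object is written to a stream, it writes a short back-reference instead of the data. So a Pair whose two fields share one String instance comes out differently from a Pair holding two separate but equal Strings. The hand-rolled encoding next to it depends only on the field values.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SerializationDemo {
    // Hypothetical class for illustration; fields are assumed to be non-null.
    static final class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final String first;
        final String second;
        Pair(String first, String second) { this.first = first; this.second = second; }
    }

    // Default Java Serialization of a single object.
    static byte[] javaSerialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hand-rolled encoding: each field as a length-prefixed UTF-8 string.
    // The output depends only on the field values, not on object identity.
    static byte[] customEncode(Pair p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String field : new String[] { p.first, p.second }) {
                byte[] utf8 = field.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Both fields point at the same String instance.
    static Pair sharedPair() {
        String s = new String("hello");
        return new Pair(s, s);
    }

    // The fields are equal but are two distinct String instances.
    static Pair distinctPair() {
        return new Pair(new String("hello"), new String("hello"));
    }

    public static void main(String[] args) {
        Pair instance1 = sharedPair();
        Pair instance2 = distinctPair();
        // prints false: the shared String is written once plus a back-reference
        System.out.println(Arrays.equals(javaSerialize(instance1), javaSerialize(instance2)));
        // prints true: the custom encoding only looks at values
        System.out.println(Arrays.equals(customEncode(instance1), customEncode(instance2)));
    }
}
```

The custom format is deliberately tiny. A real one would also need to handle nulls, versioning, and decoding, which is the price of rolling your own.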

Story 3
On a project I once worked on, the decision was made that all data was going to be handled as XML. All of the data was stored as an XML document in a single column in a database. To process individual data items, XPath queries would be made. HTML reports would be created by running giant XSLT transforms on the data. The assumption was that because XML has such great support, it would be easy to add new data (just turn it into XML and merge it with the other data), and create new reports (just write a new XSLT).

The reality is that even though you can create a general tool to handle syntactically correct XML, you still need to customize the logic for the semantics. That meant every new data source still required code and logic changes. As for generating the HTML via XSLT transformations, it's very difficult to use modern HTML features, like Ajax, this way. And by making such a large portion of the code base XSLT and XPath, we couldn't take advantage of the IDE's ability to do easy code completion, navigation, and refactoring, or leverage the development team's OO skills to create small, easily testable, and reusable components. And then there were the performance issues, both in time and space, that we encountered when trying to manipulate and store large quantities of large XML documents.

The moral of these stories is that you need to know your tools. Enumerations, Serialization, and XML are all great when used appropriately. When not... well, invalid assumptions lead to bugs. Invalid assumptions about crucial components of your software lead to large-scale rewrites. To avoid the large-scale rewrite, we often have a period of denial where we try to make the unworkable work. The worst case is that with enough bubble gum, chicken wire, and ingenuity we do make it work, at least some of the time. The result is a maintenance nightmare just to keep it working, and large-scale rewrites are even less likely to happen once you've got "working" code.

While we all know that assumptions can be bad, the danger is when we don't realize we are making assumptions. In the first story I thought I knew what enumerations did. The developers in the second story had used Java Serialization on many projects and thought they fully understood it. In the third story the benefits of XML were considered without awareness of all of the limitations and restrictions that came along with them. In other words, we didn't know what we didn't know.

The conservative solution to this problem is to only use technologies you have already used successfully on all your projects. While this seems to be favored by many people, I am too much of a fan of shiny objects, and so I like to try new things. The key to using new technologies (or existing ones in new ways) is to limit the scope until you have used them successfully. If a technology is going to be integral to your design, create a small throw-away prototype first to make sure you find the dark corners and sharp edges early.

The most important solution, though, is to be willing to admit you were wrong. It's impossible to get very far in life without making assumptions, and despite your best precautions, sooner or later you will be wrong about one. Rather than patching on hack after hack to make a bad assumption viable, be willing to break from the past and rewrite or redesign code given your new knowledge. Even if it means throwing away a lot of work.

1 comment:

DeltaEchoBravo said...

"XML is like dynamite. If it isn't solving your problem, you aren't using enough of it." :P