Monday, January 31, 2011

Unit Testing - the Bad

Previously I talked about some of the benefits of unit testing. However, it's not all peaches and cream. There are costs to unit testing.

The most obvious cost is the time taken to write the unit tests. If you are trying to test a component which manipulates very complex objects, you first have to generate these complex objects and then you have to validate them after the fact. If you are trying to test a component that is being used in a multi-threaded environment you have to create a mock environment with multiple threads. In these types of scenarios it can take significantly more time and effort to write meaningful unit tests than it did to write the actual component in the first place. The testing environment can be complex enough that it'll also take more time to debug the tests than it does to debug the code.
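
To make the multi-threaded case concrete, here is a rough sketch of how lopsided the effort can be. HitCounter and HitCounterStress are hypothetical names invented for this illustration, and the thread and call counts are arbitrary. The component has one line of real logic; the harness that exercises it from several threads at once is several times longer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical component under test: a thread-safe counter.
final class HitCounter {
    private final AtomicInteger hits = new AtomicInteger();

    void record() { hits.incrementAndGet(); }

    int total() { return hits.get(); }
}

// The harness needed just to call record() from many threads at the same time.
final class HitCounterStress {

    static int run(int threads, int callsPerThread) {
        HitCounter counter = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);      // releases all workers together
        CountDownLatch done = new CountDownLatch(threads); // counts finished workers

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();
                    for (int i = 0; i < callsPerThread; i++) {
                        counter.record();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }

        start.countDown();
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow();
        }
        return counter.total();
    }
}
```

A test would then assert that HitCounterStress.run(8, 1000) returns 8000. Even in this toy case the latches, the pool, and the timeout handling are all test-only code that has to be written and debugged too.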

If you are writing unit tests, you are probably using a framework like JUnit or TestNG. You may also be using additional tools to help reduce some of the complexity discussed in the previous paragraph. However, each of these tools adds a dependency. It's frustrating enough when you can't build a project because you are missing a library. Not being able to build because the tests are missing a library is worse. This can lead to developers just deleting/commenting/ignoring tests so they can go forward, meaning all that time you spent is wasted.

One of the biggest challenges in developing software is maintaining code as requirements change. A promise of automated unit tests is that they allow you to refactor code without fear of breaking things. However, a large enough suite of unit tests can actually make it more work to restructure code. When you decide that you need to completely restructure entire packages, destroying classes and creating new ones, there's obviously a lot of work in modifying the rest of the code to be consistent with these changes. Unit tests can be a large part of your code base, and with large structural changes they will also have to be drastically changed to appropriately test the new code. While logically this is no different than the cost of writing the code and unit tests to start with, psychologically it is different. This added work can encourage you to just tack on changes in the most expedient way. In other words, a large suite of unit tests can have the exact opposite of the desired effect, making you less likely to make valuable architectural changes rather than more likely.

Imagine that you have modified your phone number code from before so that phone numbers are passed around as PhoneNumber objects rather than as Strings. To work with this you modify the PhoneFormatter's format method to take a PhoneNumber object rather than a String. Well, now all of your PhoneFormatter unit tests have broken. While this particular example isn't too much work to fix, especially if you refactor the test code first, it's still extra work. I'm sure you can imagine how larger changes on more complex code could require major reworking of unit tests.

import static org.junit.Assert.*;
import org.junit.Test;
import org.junit.Before;

public class PhoneFormatterTest {

  private PhoneFormatter mFormatter;

  @Before public void setup() {
    mFormatter = new PhoneFormatter();
  }

  @Test public void testDirectNumber() {
    check("(757) 555-1234", "7575551234");
    check("(757) 555-1234", "17575551234");
    check("555-1234", "5551234");
    check("911", "911");
  }

  @Test public void testDialNine() {
    check("(757) 555-1234", "97575551234");
    check("(757) 555-1234", "917575551234");
    check("911", "9911");
  }

  private void check(String expected, String number) {
    PhoneNumber pn = new PhoneNumber(number);
    assertEquals(expected, mFormatter.format(pn));
  }
}
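
Neither PhoneNumber nor the updated PhoneFormatter is shown in this post, so the following is only a sketch of what the production side of this change might look like. It assumes that PhoneNumber normalizes the digits (dropping a leading dial-out 9 or long-distance 1) and that the formatter is left with layout only. Those rules and the digits() accessor are guesses made for illustration, inferred from the test cases.

```java
// Hypothetical value object; the real class is not shown in the post.
final class PhoneNumber {
    private final String digits;

    PhoneNumber(String raw) {
        String d = raw.replaceAll("\\D", ""); // keep only the digits
        int n = d.length();
        // Assumed rule: a leading 9 is a dial-out prefix when the length gives it away.
        if (d.startsWith("9") && (n == 4 || n == 8 || n == 11 || n == 12)) {
            d = d.substring(1);
        }
        // Assumed rule: drop a leading long-distance 1 from an 11-digit number.
        if (d.length() == 11 && d.startsWith("1")) {
            d = d.substring(1);
        }
        this.digits = d;
    }

    String digits() { return digits; }
}

final class PhoneFormatter {
    // The signature change described above: a PhoneNumber comes in instead of a String.
    String format(PhoneNumber number) {
        String d = number.digits();
        switch (d.length()) {
            case 10: // area code + local number
                return "(" + d.substring(0, 3) + ") " + d.substring(3, 6) + "-" + d.substring(6);
            case 7:  // local number only
                return d.substring(0, 3) + "-" + d.substring(3);
            default: // short codes such as 911 pass through unchanged
                return d;
        }
    }
}
```

Because the tests go through the single check helper, only that helper has to know about the new type. Without it, every assertion in the test class would need the same edit.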

Sunday, January 23, 2011

Book Learning

The other evening I was discussing old computer technologies with a colleague and he talked about programming with punch cards. While I am not that old, my early programming days were very different from today. During my first few years of formal computer science lessons I only had access to a computer for 1.5 to 2 hours a week. To be able to get my assignments done on time, I had to do much of the programming with paper and pencil away from the computer. Maybe it's just that experience, but I still find that the best way for me to learn a computer technology is to step away from the computer. Obviously, the hands-on aspect of using the technology is invaluable, but stepping away from your specific problems and thinking more generally is crucial to true understanding. This is the reason that, despite the fact that computer books are static and hence quickly outdated, and the internet provides an almost unlimited source of reference material, books still have a place.

When you teach yourself something new, especially in an ad-hoc way, there are bound to be gaps in what you know. Not only that, but there are gaps in what you know you need to learn. That is, you not only don't know things, you don't even know what it is you don't know. This can cause you to work around problems that don't even exist, approach things in a way that is counter to the technology (giving you no end of headaches), or just concede that you can't do things that really can be done.

A good book can not only fill those gaps but also show you where you have more gaps that need to be filled. The book can also provide the necessary context to help you glue all the pieces that you have learned into a coherent whole and see the big picture. Once you see the big picture, it is easier to learn and retain all the details and even to anticipate how those funny corner cases that every technology has will behave.

Ulterior Motive
I have to admit to having an ulterior motive to this post. A coworker referred me to a contest where I could win some Ruby on Rails books. I love Rails because I think it is such a clean framework for writing web pages and doesn't fall prey to problems in the commonly used .NET and Java frameworks that I've complained about before. However, despite my comments above, most of my Rails knowledge comes from building a few apps and using the web as a reference, not book learning. I have repeatedly bumped into problems that are caused by my lack of truly understanding Rails. I am definitely at a place with my Rails knowledge where I could benefit from reading a good book.

To win this contest I have to explain how I am working to succeed as a "Rails up-and-comer." Instead, this post talked about books. However, hopefully this will also show a little bit of who I am in the process. I write this blog to publicly show my thoughts and experiences in my attempt to be an "up-and-coming" software development expert. I hope that reflecting on my thoughts enough to write these posts, plus any feedback I may get as a result of the posts, will help me in this process.

Similarly, I use Ruby on Rails on my personal projects not just because I think it is easier to use than the alternatives (and more fun), but because I think the software development ideals embedded in Rails and the way it accomplishes them will influence other languages and frameworks in the future. At the very least, it influences me. That is, knowing Rails helps me be a better all-around software developer, just as "book learning" does.

Monday, January 17, 2011

Unit Testing - the Good

Automated Unit Tests are a big part of the Agile movement. One of the reasons is that refactoring is a big part of Agile. Any time you change code, you risk breaking it. Unit tests mitigate that risk, allowing you to refactor with less fear. The ability to easily verify that you haven't broken things when you refactor code or do bug fixes or other maintenance is probably the biggest gain from creating automated unit tests, but it isn't the only one. Besides increased confidence, unit tests can sometimes actually decrease development time in the short term and also provide long term documentation of your software.

Development Time
When you write code, how do you know when it is right? There are different levels of testing that you are probably familiar with.

 0. The code is written
 1. The code compiles
 2. The code runs
 3. You've tested the code

Ever fixed a small typo in your code and then just checked it in? Yep, that's level 0 testing. Level 1 testing (also known as proof by compilation) is more common, especially if you are using an IDE like Eclipse that does constant compiling. Level 2 testing is smoke testing - will the program still run? Unfortunately, sometimes these levels aren't enough. Whether it is pressure from customers or a boss, or just pesky pride, you actually want to validate that the logic in the code is correct. This requires level 3 testing.

The most obvious way to test is to run your program over and over again, providing inputs that exercise your newly changed code. This can seem easier than writing unit tests but is it really?

If your application is large or has external dependencies, each run can take a long time. If your new code isn't perfect the first time through, the "code - compile - deploy - execute - enter test data - validate results" development cycle can be a very slow way of debugging. Writing the unit tests has a higher up-front cost, but now the debug cycle has been reduced to "code - compile - execute tests". This means that unless you write your code perfectly the first time, it can actually be quicker to write unit tests than not.

Unit tests provide a great example of how to use a component. Let's say you have a method that formats phone numbers in a canonical way. Reading the unit tests can show you how different numbers get formatted. And unlike almost all other forms of software documentation, they are never out of date. As long as the unit tests pass, you know that this "documentation" is still accurate.

As I stated in the introduction, the most obvious benefit of unit testing is confidence. It gives you confidence that the code you've written is correct as you write it. If you create unit tests based on bug reports that come in, you can be confident that they have both been fixed and that the bugs won't reappear in the future. If you are maintaining code with existing unit tests, you can have confidence that your changes aren't breaking other assumptions you may not have been aware of.

Here is a simple unit test for a class that formats phone numbers. It provides a concrete example of the three ideas above:

import static org.junit.Assert.*;
import org.junit.Test;
import org.junit.Before;

public class PhoneFormatterTest {

  private PhoneFormatter mFormatter;

  @Before public void setup() {
    mFormatter = new PhoneFormatter();
  }

  @Test public void testDirectNumber() {
    assertEquals("(757) 555-1234", mFormatter.format("7575551234"));
    assertEquals("(757) 555-1234", mFormatter.format("17575551234"));
    assertEquals("555-1234", mFormatter.format("5551234"));
    assertEquals("911", mFormatter.format("911"));
  }

  @Test public void testDialNine() {
    assertEquals("(757) 555-1234", mFormatter.format("97575551234"));
    assertEquals("(757) 555-1234", mFormatter.format("917575551234"));
    assertEquals("911", mFormatter.format("9911"));
  }
}

Having a unit test like this really helps with the development time. Just seeing how the different numbers are supposed to be formatted can help see what you need to do. And it'll be quicker to run the unit test once than to manually test your code with the different types of numbers. It gives the documentation for future developers who can see how the numbers are supposed to be formatted - i.e. what the format method does. And it provides confidence that your code not only handles all of these situations right now, but it still will after it is modified at some future date.
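
The PhoneFormatter class itself isn't shown in this post. As a rough sketch only, here is one way it could be written so that the tests above pass. The prefix rules (a leading 9 to dial out, a leading 1 for long distance) and the use of digit counts to detect them are assumptions inferred from the test cases, not the actual implementation.

```java
// Hypothetical sketch of the class under test; behavior is inferred from the tests.
final class PhoneFormatter {

    String format(String raw) {
        String d = raw.replaceAll("\\D", ""); // keep only the digits

        // Assumption: a leading 9 is a dial-out prefix when it makes the input
        // one digit longer than a 3, 7, 10, or 11 digit number.
        int n = d.length();
        if (d.startsWith("9") && (n == 4 || n == 8 || n == 11 || n == 12)) {
            d = d.substring(1);
        }

        // Assumption: a leading 1 on an 11-digit number is the long-distance prefix.
        if (d.length() == 11 && d.startsWith("1")) {
            d = d.substring(1);
        }

        switch (d.length()) {
            case 10: // area code + local number
                return "(" + d.substring(0, 3) + ") " + d.substring(3, 6) + "-" + d.substring(6);
            case 7:  // local number only
                return d.substring(0, 3) + "-" + d.substring(3);
            default: // short codes such as 911 pass through unchanged
                return d;
        }
    }
}
```

Notice that the tests told us what to build: each assertion maps to one branch here. A 10-digit number that happens to start with 9 is deliberately left alone, which is exactly the kind of corner case that deserves its own assertion.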

Monday, January 10, 2011

Learning Haml

ERb is the default markup language with Ruby on Rails. However, it is not your only choice. While playing around with a Rails project I came across Haml. I decided to give it a try, and I am glad that I did. Just as Rails opened my eyes to how clean a web framework could be, Haml has shown me how clean a markup language can be.

Rather than using the XML paradigm of open and close tags, it uses the Python approach of indentation to mark blocks. I'll give an example to show what I mean. Here is the HTML for a simple web page:

<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello World</h1>
    <div id='mainContent'>
      This is a sample html page.
    </div>
  </body>
</html>

Before I show the Haml equivalent, I'd like to point out something. In the above HTML example there are 60 characters of content (the 'Hello World's and 'mainContent' and 'This is a sample html page.') and 80 characters of markup! This verbosity in the markup is one of the ugliest things in XML and HTML. You have to work to find the content amidst all the markup.

Here is the Haml that generates the equivalent HTML.

%html
  %head
    %title Hello World
  %body
    %h1 Hello World
    #mainContent
      This is a sample html page.

As you can see, it is much terser. For the same 60 characters of content there are only 25 characters of markup. (I am slightly cheating by not counting whitespace, but I had the same whitespace in the HTML to help make it readable, even though it wasn't required, so I think it is fair to not count it).

Haml Markup
As you probably figured out from the above example, the primary markup is the %. %tagname creates HTML that looks like: <tagname></tagname>. Any content that is either on the same line as this tag (like title and h1 in the example above) or indented on following lines (like head and body) gets placed between the opening and closing tags.

Attributes can be specified via a Ruby-style hash. So a tag like:
<img alt="Some Text" height="100" width="75" src="image.jpg" />
could be specified like:
%img{:alt=>"Some Text", :height=>100, :width=>75, :src=>"image.jpg"}
Since id and class are such common attributes, they can be specified similarly to how they are specified in CSS.
<div class="myClass" id="myId">
can be written:
%div.myClass#myId
Haml has also decided that div is the main structural component of a page, and it is therefore implicit if you specify an id or class without a tag. That is, the above div could also be written:
.myClass#myId
Of course, Haml is intended to be used with Rails, which means that you can embed Ruby in your Haml document. Any line that begins with '=', or any tag that ends with '=', is assumed to be Ruby and is evaluated. For example:
%h1= 5+3
= 2*2*2
results in the HTML:
<h1>8</h1>
8
Any line that begins with '-' evaluates the Ruby without placing the result in the HTML. This is useful for control flow. For example:
  - (1..4).each do |i|
    %li= i
creates the HTML:
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>

The above examples aren't intended as a full lesson on Haml, just a teaser to give you an idea of what it looks like. If you want to learn Haml, you can look at its reference material. Unfortunately, Haml is currently tied to Ruby on Rails. Hopefully similarly clean and terse markup languages will be created for other frameworks.

Monday, January 3, 2011

Code Maintenance - You Can't Afford to Not Do It

Exercise is something that many of us who sit behind a computer all day don't get enough of. The reasons may be different for each of us, but they typically boil down to the same thing - not enough time. Yet in the long run, is this true? Being fit tends to increase life expectancy, giving us more total time. Being fit gives us the energy and ability to do things that we wouldn't otherwise be able to do. Being fit decreases our odds of many different diseases, which gives us time to do things. Exercise even tends to improve our mood, which can allow us to be more productive. So even though exercise takes up time in the short term, it will probably more than pay for itself in the long term. So stop reading this and exercise! (Or read this while exercising.)

Have you ever had a code base that was so ugly that the easiest path forward was just to blow it up and start over? Have you ever wanted to (or been asked to) add a feature which is logically easy to add, but ends up being a ton of work because of the structure of the existing code base? Do you spend a lot of time tracking down and fixing bugs? If you answered yes to any of these questions, your code base is probably out of shape.

Most code starts out fit. However, no matter what methodology you are using, new features and requirements always come up. At first, since you started with a clean design, it is fairly easy to add these new features. In theory the original design is supposed to be updated and reworked to incorporate these new features, formally if using waterfall and via aggressive refactoring if using Agile. In practice this stage is often not given enough attention. While the first tacked-on feature isn't a big deal, this effect snowballs over time until your beautifully fit and sculpted code becomes a blob of interconnected modules and features that is afraid to go out in public for fear of catching bugs.

It's up to you to take the time to keep your code fit. If you don't have the support of management, then build the time into your schedule. Don't "complete" a requested feature until you have taken the time to make sure it is clean. Sooner or later you'll have to take the time. If you don't take it now, you'll have to take it later, when the preponderance of bugs appears. It is your professional responsibility to choose when it makes sense to take the extra time.

Oh, and just as exercise can have immediate, if not tangible, benefits (like improved mood), so can code maintenance. Analyzing and rethinking how code is written and how it could be written is one of the best ways to improve your own design and development skills, making you a better programmer.

Obviously, it is possible to go overboard with code maintenance. Just as some people spend their whole lives in the gym trying to have the perfect body but then never have free time to enjoy it, so too is it possible to spend all your time cleaning code and never actually create anything new. So what is the right amount of code maintenance? You'll have to figure out what makes sense for your environment. However, if you ever try to make a change that ends up being a lot more complicated than it should be, it is time to clean up your code.

So consider this a friendly public service announcement to maintain your code.  And exercise!