Monday, November 29, 2010

Design - Sometimes Naive Is Best

I am working on a project where I have to read in a data file and store the data in a database. The datafile is straightforward comma separated values with headers. i.e. something like:

header1, header2, header3
value1a, value2a, value3a
value1b, value2b, value3b

There are a fixed and well-defined set of possible values for the headers. The database table I am storing this in has one column per header type. The only caveat is that there is no guarantee as to the order of the headers, or that each possible header will exist in any given file.

Obviously, this is a fairly straightforward problem to solve. However, there are nearly 100 possible header values, so if the code looks ugly, it'll look really ugly. As I am always trying to improve my coding, I toyed with various approaches to the problem to come up with the cleanest approach.

Database Access
Since we use JPA/Hibernate on other projects, it seemed natural to use that on this project. To that end, I have a class like:
  1 @Entity 
  2 public class DataRecord {
  3   private Long mId; 
  4   private String mValue1; 
  5   private String mValue2; 
  6   // rest of instance variables go here 
  8   // getters and setters go here 
  9 }
Given that this class is going to be useful elsewhere in my code and is consistent with our other projects, I took the existence of this class to be one of the design constraints. The question is how to best populate these objects based on the file.

File Parsing
I'm not going to go into the details of how I parse the file. For simplicity assume that I have used the String.split method with the appropriate regular expression so that I have variables
String[] headers; // an array of the headers specified in the file
String[][] lines; // an array of lines where each line is  
                  // an array of the values specified in the file

My initial desire is to create a map of header name to DataRecord setter method. i.e. something like:
recordMap = {"header1" => setValue1, "header2" => setValue2, /* etc. */ }
and then the parsing code could look something like:
  1 for(String header : headers)  setMethods[ctr++] = recordMap.get(header);
  2 for (String[] line : lines) { 
  3   DataRecord record = new DataRecord(); 
  4   ctr = 0; 
  5   for (String value : line) { 
  6     setMethods[ctr++].call(record, value); 
  7   } 
  8 }
Unfortunately, functions are not first class objects in Java and as of this writing look like they won't be until Java 8. And yes, using Java is another of my design constraints.

So what are my options for accomplishing something like this in Java? Well, I could store the name of the setter method in the recordMap object and then use reflection to accomplish what I want. While reflection has a performance hit, it probably would be negligible compared to the file and database I/O that is already happening, not to mention the reflection that Hibernate is already using under the covers. However, using reflection like this feels like an inappropriate use of the technology - it is turning method calls that could be typesafe into dynamic calls.

Anonymous Inner Classes
If I am not going to use reflection, I could do this the "Java" way, using interfaces and anonymous classes in the place of method pointers.  I.e. something like:
public interface FieldSetter {
  void setField(DataRecord record, String value); 
Then the recordMap initialization would look something like:
private static Map<String, FieldSetter> recordMap = 
  new HashMap<String, FieldSetter>(); 
static { 
  recordMap.put("header1", new FieldSetter() {
    public void setField(DataRecord record, String value) {
  /* repeat for every other possible header */ 
The parsing code can then be written almost as shown in the pseudo code in the approach section above. This works, but Yuck that sure is an ugly way to initialize the recordMap object.

So how else could this be done? Well, it turns out that the all of the possible legal values for headers happen to conform to the Java spec for identifier names. This means I could have an enumeration like so:
public enum Headers { header1, header2, header3, /* etc */ }
The code that reads the header line would look something like:
Headers[] headerEnums = new Headers[headers.length] 
int ctr = 0; 
for(String headerName : headers) { 
  headerEnums[ctr++] = Headers.valueOf(headerName); 
The parsing code would then have a giant switch statement in the inner for loop:
  1 for (String[] line : lines) {  
  2   DataRecord record = new DataRecord() 
  3   ctr = 0;  
  4   for (String value : line) {  
  5     switch (headers(ctr++) {  
  6       case header1:  record.setValue1(value); break;
  7       case header2:  record.setValue2(value); break;
  8       // etc  
  9     }  
 10   }  
 11 }
This isn't great, but actually ends up being fewer lines and more readable than the interface/anonymous method approach. However, it feels dirty to require that the enum values, which are supposed to be names that are semantically meaningful to the program, have to have exactly the same name as the headers in the data input files. We could have the Headers enum have a constructor which takes the name - thereby decoupling the enum name from the file name. But now we have to implement our own valueOf that handles that name, plus more than doubling the size of the enum declartion, since it would now look like:
public enum Headers { 
  HEADER1("header1"), HEADER2("header2")....;
  /* insert constructor and valueOf code here */ 
Which means with the enum approach, I can choose between dirty and a little bit ugly, or clean and uglier.  Ugh.

"Naive" Approach
So which of these approaches should I choose? Well, first, how would I have solved this problem back before I knew as much about design?
  1 for (String[] line : lines) {  
  2   DataRecord record = new DataRecord() 
  3   for (int i = 0; i < line.length; i++) {
  4     String value = line[i]; 
  5     String header = headers[i]; 
  6     if ("header1".equals(header)) record.setValue1(value);
  7     else if ("header2".equals(header)) record.setValue2(value);
  8     /* etc. */ 
  9   } 
 10 }
How "clean" is this approach? We specify each header value exactly once - in the if statement. We've co-located the header value with the behavior just like my original idea of a recordMap object. This code isn't any larger than the other approaches. There is nothing sneaky or clever happening so the code should be easy to understand, maintain, and update by other people (or myself) down the line. All in all, I think we have a winner.

Final Thoughts
Besides reiterating the obvious moral that sometimes being clever can be bad, I wanted to address a couple other things.

take some time now to save time in the futureFirst, why did I spend so much time and effort on something that very easily could've been a homework assignment back in school? Well, maybe it was foolish, but I'd like to offer up this one defense. In the "real world", unlike in school, code lives on and will have to be maintained. Taking some time now to think about clean design could very well pay itself back in saved time in the future.

Second, what about performance?  All of those string comparisons in the final solution bug me as being slow. If we have 50 headers, and its the last one that matches, that's 50 string comparisons that have been made - not particularly efficient. However, as mentioned up above when discussing reflection, no matter how slow this is, it is almost certainly orders of magnitude quicker than the file I/O and database accesses. So this fear for performance is almost certainly unfounded.  Should profiling determine that this is a bottleneck, the String[] headers array can have all of its strings interned, which would allow us to convert all of the equals checks to the much quicker == check.

Third, what about code style? Why didn't I put curly braces around the body of each if statement (lines 6 and 7)? Sure, with one statement it isn't required, but it is generally considered good style to always use curly braces. Partly it was for purposes of (slightly) shrinking the size of this blog post that is already too long. The other is that I think in this particular situation, there were going to be nearly 100 possible header values, and thus nearly 100 else if clauses. Given this, I think the code might actually be more legible and understandable if it only takes up 100 lines, rather than 200 or 300 lines (depending on whether you put your else on the same line as the previous closing curly).

Fourth, I do want to reiterate the main point. Just because you can solve a problem using a fancy technique or design pattern doesn't mean you should. Your job as a developer to weigh the pros and cons and come up with the best solution for your particular problem.

Monday, November 22, 2010

Scrolling in JavaScript

With the app I've been working on I'd like to be able to have a map that is larger than what fits on the screen for the icons to move around in. This can be done straightforwardly by putting scrollbars around the canvas element. In addition, I'd like for the window to always show the icon while it is moving, which means it may have to autoscroll. Here's how I did that.

HTML Scroll Bars
Here is what the HTML looks like that contains the canvas:
<div id="windowContainer">
  <canvas class="gridLayer" id="gridLayer" width="800" height="500"></canvas>
  <canvas class="gridLayer" id="drawLayer" width="800" height="500"></canvas>
  <canvas class="gridLayer" id="iconLayer" width="800" height="500"></canvas>
  <canvas class="gridLayer" id="controlLayer" width="800" height="500"></canvas>
and here is what the styles for these CSS classes look like:
#windowContainer {
  position: relative;
  width : 500px;
  height: 300px;
  overflow: auto;
.gridLayer {
  position: absolute;
  top: 0px;
  left: 0px;
There are two keys to making this work. First the surrounding container (windowContainer) has smaller dimensions (500x300) than the canvas (800x500). The second is the style overflow: auto which is applied to the windowContainer. This makes the scroll bars appear since the contained content (the canvases) are larger than the windowContainer.

You may have noticed that I applied the width and height attributes directly to the canvas object rather than to the style. The reason is that the width and height define the logical dimensions of the canvas, which I use throughout the code, while the style sheet defines the size it appears on the screen. Since I want those two to be the same, I define the width/height on each tag rather than on the CSS style.

JavaScript Scrolling
To make sure that a moving icon is always in view, I created a WindowScroller object. This object takes an icon that is to be kept in view and the window with the scroll bars. I want the icon to be partially centered, so this object also has a border which is how far the icon should be kept from the edge of the window. The WindowScroller object is responsible for scrolling the window appropriately anytime the icon moves.

The challenge I had was how to detect when the icon has moved. Using my Java instincts, I was going to have an IconMoveListener of which the WindowScroller would be one, and each Icon would have a set of these listeners that have subscribed to watch for move events. There are two problems with this approach. The first is that it is more complicated than what I need. The second is that it is buggy. If you'll recall, I had made the decision to decouple the logical moving of the icon from the drawing of the icon. If I scroll the window when the icon is moved, the icon will appear to move again when it is redrawn, and this causes for very jittery movement. To solve this problem I started to come up with more complicated approaches, but then a simple solution occurred to me.

Rewrite Draw
I really don't care when an Icon is logically moved, all I care about is that anytime it is drawn, it should be visible. To accomplish this I overwrote the icon's draw method with one that scrolls. Here's what it looks like:
  1 function WindowScroller(icon, window, border) {
  2   var origDraw = icon.draw; 
  3   icon.draw = function() { 
  4     var left = window.scrollLeft(); 
  5     left = Math.min(left, icon.realPoint.x - border); 
  6     left = Math.max(left, icon.realPoint.x - window.width() + icon.iconLayer.grid.size + border); 
  8     var top = window.scrollTop(); 
  9     top = Math.min(top, icon.realPoint.y - border); 
 10     top = Math.max(top, icon.realPoint.y - window.height() + icon.iconLayer.grid.size + border); 
 12     window.scrollLeft(left); 
 13     window.scrollTop(top); 
 15   } 
 16 }

On line 2 I save the original draw method.  Line 3 saves the new draw method. On line 5 I make sure that the left point of the window is far enough left to include the icon and the border. On line 6 I make sure that the left point of the window is also far enough right to include the icon and the border. Lines 8-10 make the same calculation but with the top and bottom of the window. Lines 12 and 13 actually scroll the window and line 14 calls the original draw method.

The scrollLeft and scrollTop methods that are called on lines 12 and 13 are jQuery functions. I use jQuery for this to increase the cross-browser compatibility.

Why a Class With No Methods?
One thing that strikes me as funny is that I've created a class that just has a single method - its constructor. This smells like it should be a method in another class rather than its own class. I originally had it in its own class because I assumed I would need other helper methods. I've left it in its own class because I don't know where to put it. It doesn't belong in the Icon class because the Icon really shouldn't know about the window the canvas is in or things like that. None of the other classes feel appropriate either, since it doesn't use any other class. Hence the decision to leave it in its own class. Maybe at a future date either more functionality will go in this class or a better class to fold this in to will present itself.

Well, without further ado, here is the demo. I suggest scrolling to the right and clicking on an empty grid square and seeing what happens.


Monday, November 15, 2010

Setters and Getters Are Evil

There are a lot of things I like about Java, but one thing I dislike is the way it pushed Getters and Setters. With Beans and tools that use them like GUI builders, Java really encourages one to write getters and setters. This seems to have resulted in the practice of providing getters and setters by default, even to the point where IDEs will auto generate them for you.

This is BAD! It promotes lazy object oriented coding. Rather than thinking about what interface should be presented to the world, it is too easy just to provide access to all of the internal fields via getters and setters. Who needs Data Hiding anyway? When you are in the habit of writing getters for all your fields, it is unlikely that you are actually decoupling your components.

Even worse than the getters are the setters. When you have setters it is hard to enforce class invariants. Worse, you lose all the advantages of  making an immutable class. Often times setters are created out of laziness as well. Have you ever had a class that had a lot of properties that needed to be set, but then was never changed? This seems to be a common occurrence. (Almost) no one likes to create constructors with a lot of arguments, and sometimes it is just easier and cleaner to write the code with setters. You tell yourself that you'll never use the setters except at construction time, but what if you forget or another developer joins the project? It's easy to just create setters, but it is better to avoid it, potentially by creating a special purpose configuration class.

¡Viva la RevoluciĆ³n!
I realize that this is a lost cause because too many people and tools rely on setters and getters, but I'll fight it anyway. Next time you create a setter or a getter please do me (and anyone who might have to maintain your code) a favor, and consider very carefully whether it is really a good idea.

Monday, November 8, 2010

Handling Diagonal Lines

In a previous post I added support for having drawn horizontal and vertical lines act as walls that my moving icons can't go through. This post talks about how I added support for diagonal lines.

The basic approach is to determine which cells a line goes through and then disconnect that cell from all its neighbors, making it impossible to move to those cells.

Basic Algorithm

A diagonal line starts at one lattice point and ends at another, passing through cells along the way. The algorithm I came up with was just to "walk" the line. This looks like:
  1. Figure out starting cell (x0, y0)
  2. For each x coordinate that the line goes through:
    1. Calculate the y coordinate for that x coordinate (x1, y1)
    2. Mark every cell (x0, y0) -> (x0, y1)
    3. Set x0 = x1 and y0 = y1.
  3. You are done.
The one caveat is handling when the line actually goes through a lattice point. In that case, you need to make sure that you only mark the two cells on the diagonal, and not three or four. (the algorithm as above with end up marking threes of them without a special case). You do need to disconnect the two non-marked cells since they no longer connect through the diagonal. The code that actually does all of this looks like:

  1 Map.prototype.breakDiagonalLine = function(line) {
  2   var p1 = line.p1; 
  3   var dy = line.slope.rise / Math.abs(line.slope.rise);
  4   var yOffset = (dy > 0) ? 0 : -1; 
  5   while (!p1.eq(line.p2)) { 
  6     var cellPoint = new Point(p1.x, p1.y + yOffset);
  7     this.breakPoint(p1); 
  8     for (var tx = 1; tx <=; tx++) {
  9       var nextY = Math.floor(p1.y + tx*line.slope.rise/;
 10       if (tx == nextY -= (1 + yOffset); 
 11       while (nextY != cellPoint.y) { 
 12         this.breakWholeCell(cellPoint); 
 13         cellPoint =, dy);
 14       } 
 15       this.breakWholeCell(cellPoint); 
 16       cellPoint =, 0);
 17     } 
 18     p1 =, line.slope.rise); 
 19   } 
 20   this.breakPoint(line.p2); 
 21 }

Because of how I calculate slope, the run is guaranteed to be positive. Line 3 calculates the signum of the rise which is used for traversing the cells on line 13.  Line 4 calculates the y offset of the starting cell, relative to the starting point.  i.e. going diagonally upwards starts with a different cell than going diagonally downwards.  The loop on lines 5 - 19 handle the each segment from lattice point to lattice point along the line, iterating once for each lattice point that is reached.

Line 6 determines the first cell that is to broken, using the offset calculated on line 4.  Line 6 breaks apart the cells that are diagonally touching at this point.  The loop on lines 8 to 17 iterate over every x coordinate until the next lattice point.  Line 9 calculates the y coordinate for the given x coordinate (step 2.1 from above).  Line 10 handles the special case of the new x,y coordinates being a lattice point.  The loop one lines 11-14 mark the appropriate cells (step 2.2 from above). Line 15 ensures that we've marked the last cell needed and line 16 increments the x coordinate (step 2.3 from above).

Line 18 sets p1 to the next lattice point so it is ready for the next iteration through the loop. Line 20 breaks apart diagonally touching cells at the final point along the line. Which means we are ready for the demo.


As before, click an icon, click a destination, and watch the icons avoid the lines.

Monday, November 1, 2010

Intentional Ignorance

I was recently looking at a quiz that purports to test your Java knowledge. One of the questions required determining which overloaded method was going to be called in a specific code sample. (sorry, I don't have a link to the exam or question. I got the question wrong and this post is my justification for why it is a bad question (aka sour grapes).

Risk of Too Much Knowledge
Imagine I was intimately familiar with the Java Language Specification and knew exactly how method resolution would happen. Further, imagine I have some code that looks like:
  1 public class Test {
  3   public static void doSomething(Object o) {
  4     System.out.println("Hello"); 
  5   } 
  7   public static void doSomething(int... x) {
  8     System.out.println("Goodbye"); 
  9   } 
 11   public static void main(String[] argv) {
 12     doSomething(5); 
 13   } 
 14 }
If you run this code what will be printed?  Do you know? Is it reasonable to assume that every developer on the team will know what this code does just by looking? And that they will be able to understand this without a lot of thought?  i.e. is this readable and good code?

Of course its not good code. If you are overloading a method name, it should be because the same logical action happens for each method. (e.g. they both say "Hello") If this is the case, then you don't care that the method call on line 12 resolves to two different methods - in both cases the same behavior happens. If you have two methods with the same name that do two different things, please, please, please rename one of them.

But what if I was such an expert in this thing, that it was just second nature to me that varargs takes lower priority than non varargs methods? Would I notice the possible confusion in the code? I don't think twice about writing code that looks like x = 5 + 2 * 10 of course this evaluates to 25. Order of operations is just that ingrained in me. Given that this is a concept that exists in math and not just computer science, I don't think this is unreasonable. However, I don't want that attitude to happen in this situation.  By testing for this sort of knowledge I fear that some impressionable developers out there might think that everyone should know this and therefore write code that depends on it. i.e. this test question, as it is written, may be doing a disservice to developers everywhere.

Lesson That Should Be Taught
The important thing that all Java developers should know is that code like the above is legal, and that there is a defined way to determine which method is called. You can look up the specific behavior if you need to know, but just knowing to look it up is half the battle.

However, instead of looking up the correct behavior, or even documenting what is happening, the code should be rewritten (probably by renaming methods if they are logically different) so that it is obvious what is supposed to happen. This is the real lesson that should be taught - don't do this!

So Why Intentional Ignorance?
It would seem that the best scenario is to both know what will happen with the code, but be aware enough not to write code that requires this knowledge.  So am I just being lazy by not wanting to know this?  Maybe, but I'd like to leave you with this passage where Dr. Watson is describing Sherlock Holmes from "A Study in Scarlet" that has always stayed with me.

His ignorance was as remarkable as his knowledge. Of contemporary literature, philosophy and politics he appeared to know next to nothing. Upon my quoting Thomas Carlyle, he inquired in the naivest way who he might be and what he had done. My surprise reached a climax, however, when I found incidentally that he was ignorant of the Copernican Theory and of the composition of the Solar System. That any civilized human being in this nineteenth century should not be aware that the earth travelled round the sun appeared to me to be such an extraordinary fact that I could hardly realize it. 
    "You appear to be astonished," he said, smiling at my expression of surprise. "Now that I do know it I shall do my best to forget it."
    "To forget it!"
    "You see," he explained, I consider that a man's brain originally is like a little empty attic, and you have to stock it with such furniture as you choose. A fool takes in all the lumber of every sort that he comes across, so that the knowledge which might be useful to him gets crowded out, or at best is jumbled up with a lot of other things, so that he has a difficulty in laying his hands upon it. Now the skillful workman is very careful indeed as to what he takes into his brain-attic. He will have nothing but the tools which may help him in doing his work, but of these he has a large assortment, and all in the most perfect order. It is a mistake to think that that little room has elastic walls and can distend to any extent. Depend upon it there comes a time when for every addition of knowledge you forget something that you knew before. It is of the highest importance, therefore, not to have useless facts elbowing out the useful ones."
    "But the Solar System!" I protested.
    "What the deuce is it to me?" he interrupted impatiently: "you say that we go round the sun. If we went round the moon it would not make a pennyworth of difference to me or to my work."