Monday, May 9, 2011

Programming Contests and Languages

This past weekend was the qualifying round for the Google Code Jam. Google allows you to use any language you would like in this contest. This year I have decided to use Ruby as my language of choice. In the past I have always used Java, as that is the language that I currently know the best. However, I think Ruby will allow me to write cleaner, more compact code which will let me code faster and hopefully write fewer bugs.

Simplicity
Why do I think this? Let me give you a simple example. A common task in these contests is to be able to read in a line of space separated integers and convert it into an array for later processing. Here is how I accomplish this task in Java:
  1 BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
  2 String[] parts = reader.readLine().split(" "); 
  3 int[] array = new int[parts.length]; 
  4 for (int i = 0; i < array.length; i++) {
  5   array[i] = Integer.parseInt(parts[i]); 
  6 }
Here is how I do the same thing in Ruby:
array = gets.split.map { |i| i.to_i}

Bugs
Imagine I now want to loop through the array and determine how many array elements are the same as their position. I.e. if I have the array [0, 2, 1, 3, 5, 4], elements 0 and 3 are the same as their index and so I'd want to get a value of 2. Here is Java code that accomplishes this task:
  1 int count = 0;
  2 for (int i = 0; i < array.length; i++) {
  3   if (array[i] == i) count++; 
  4 }
Here's Ruby code that does the same thing:
count = 0
array.each_index {|i| count += 1 if array[i] == i }
Now these two code samples are almost identical. Some people would argue that its just a matter of style which you prefer. However, as someone who has written countless loops like this under the time pressure of programming contests I can tell you there is one crucial difference. In the Java code I am explicitly incrementing the index variable i in the for loop and checking it against the length of the array. In Ruby, the method each_index does this for me. Why does this matter? Well, imagine that you have nested loops. Well, it can be really easy to accidentally write your inner loop like:
for (int j = 0; j < array2.length; i++)
and now you suddenly have an infinite loop. More insidious, you could write:
int[][] map = new int[ROWS][COLS]; 
for (int r = 0; r < map.length; r++) 
   for (int c = 0; c < map.length; c++)
     array[r][c] = r + c;
when you meant to write the column loop:
for (int c = 0; c < map[r].length; c++)
and now you have code that works whenever ROWS == COLS, but not otherwise.

Having had many conversations with other competitors at contests, I can tell you that I am not the only one who has wasted large amounts of time tracking down dumb bugs just like these. Nor am I the only one who has unknowingly submitted a buggy solution because I only tested on square maps, and so failed a problem because of such a silly bug. My hope is that by using Ruby I will spend less time dealing with bugs like this, allowing me more time to actually solve the problem at hand.

Performance
Here's the catch. The dynamic aspect of Ruby comes at a cost. Many problems in programming contests like the Google Code Jam are very computationally intensive and there is always a time limit for how long your code can run. In fact GCJ's FAQ warns that Python's slowness may cause problems for you on some problems. My assumption is that Ruby has similar performance characteristics to Python, so I decided to do some simple comparison's between Ruby and Java performance on some programming contest type tasks.

Test 1
As my first test, I came up with a solution for GCJ's 2008 Round 1B's Problem C. Mousetrap. My Ruby solution takes about 40 seconds to correctly process Google's large input set. The same algorithm in Java takes about 4 seconds to process the same data set.

Test 2
My second test was based on a suggestion of a friend who was inspired by Problem D. GoroSort on this year's qualification round. What I did was write a program which creates every permutation of N integers, and then for each permutation counts up how many numbers are the same as their index. In addition to Java and Ruby, I dusted off my C and wrote a C implementation, as well, for comparison. Here are the results I get on my Macbook Pro. (all values are in seconds)



 N     Ruby      Java    
  6 0.015 0.20  0.003 
  7 0.028  0.21 0.004
  8 0.15 0.22 0.007
  9 1.3 0.24 0.021
 10  14 0.55 0.16
 11 160 3.9 2.4
 12 1,980  43 36

Conclusion
There is a definite and meaningful difference in performance between Ruby and the other languages for larger data sets. Still, for the majority of problems Ruby will be fast enough. I will just need to make sure I do a big-O analysis of my algorithms before I implement them, and if there is any doubt as to performance, write it in Java rather than Ruby. For any problems where speed doesn't look to be an issue, I think I will try to use Ruby. Hopefully this will work for me.

4 comments:

Anonymous said...

so how did the madking actually do in the competition? Did he make it to the 2nd round? I don't see that mentioned anywhere.
-Alan

Michael Haddox-Schatz said...

While the results aren't official yet, it looks like I have qualified in both the Google Code Jam (#1276 at http://code.google.com/codejam/contest/scoreboard?c=975485#sp=1261) and the TCO (room 32 at http://www.topcoder.com/stat?c=last_match&rd=14524&sm=31&em=40&nm=10&dn=1). Google's next round is this weekend, TopCoder's is in a month.

CodeInChaos said...

As a C# programmer I faced similar problems as you in Java. At least for the parsing problems I worked around by creating helper libraries beforehand that simplify splitting, converting arrays,...

Thanks to extension methods the result looked very similar to your ruby code:
`var arr=ReadLine().Split().ToInt();`

`each_index` looks nice. I need to add something similar. Those pesky index bugs have bitten me far too often. And debugging eats so much valuable time.

Michael Haddox-Schatz said...

I used to have a similar Java library string parsing for TopCoder, but scrapped it when TC added their unused code rule. I just learned to be proficient at those particular idioms. The java.util.Scanner class was helpful in this.

Its neat seeing how different languages lend themselves to contests. You see many C++ programmers have a set of standard macros they use on TopCoder. I like the idea of extension methods for this use. If Java adds them (I think there is talk now for Java 8), I'll have to consider it.