Monday, April 25, 2011

Programming Contest Season

It is the time of year for programing contests. Google Code Jam has its qualifying round on May 6th and the TopCoder Open starts the qualifiers for their algorithm contest on May 14th. Assuming nothing unexpected comes up, I intend to participate in both of these contests again. If I am to have any hope of doing well in either of the contests I will have to prepare.

There are 3 main skills you need to have to able to do well in these type of contests.

Problem Solving
These contests are not software development contests. You don't need to be able to build an application better than other people, or even at all. What you need to do is be able to solve puzzles. Sometimes this is just a simple matter of following the instructions in the problem statement. More often it is a matter of realizing that if you look at the problem the right way, you can map it to another problem (e.g. graph traversal, binary search, etc.) that you already know how to solve. And occasionally you'll have to come up with a completely novel solution on the spot.

There are a few things you can do to get better at this. The first is to learn the known solutions to common computer science problems. It doesn't do you any good to recognize that a problem requires finding the shortest path between two nodes in a graph, if you don't know an algorithm that accomplishes that. The second is to practice recognizing places where these algorithms apply. Practice solving problems helps with this. Seeing how other people solved the same problem can also help.

The more knowledge you have of existing algorithms and the more experience you have at recognizing the use of these algorithms, the less often you'll come across a truly novel problem. And even when you do, you'll be better prepared to find a solution.

Programming
The problems posed in these contests don't require a lot of software engineering. It is rare for a solution to run as long as 100 lines, even in a relatively verbose language like Java. However, it is still a key skill. If you can't convert an algorithm into code accurately and quickly, you won't last long in a contest. When learning algorithms, don't just read about them in a book or online, code them up. And then code them again a different way. That experience will be helpful when coding under the gun.

Besides being able to code algorithms, you need to know your language. Do you know how to format strings, parse strings, and round numbers? Do you know the collection libraries provided and can you use them without having to look up API documentation? Do you know the performance characteristics of various features like autoboxing or string manipulation? How long will it take you to execute an n4 algorithm when n = 50? Does it matter what calculations you perform in the inner loop? Can you create an array of length one million? Ten million? One hundred million? How big of a value can you store in an integer? a long? What happens on overflow? Are you familiar with the precision issues that can crop up when using floating point numbers?

If you don't know the answers to the above questions, learn them. Not just from a book, but by writing programs. Knowing what is feasible and what isn't can help you find a workable algorithm for solving a problem.

Speed
These contests are timed. It doesn't do you any good if you can solve a problem, but it takes 6 hours. The only way that I know to improve speed is to practice, practice, practice. There are a couple of hints I can give you, though.

The first is that bugs are the enemy. Nothing kills your time more than having to debug a program. There are things you can do to reduce bugs. One is to be consistant. Be consistant with how you name variables and types, where and when you use curly braces and how you write algorithms. Another is to use as much structure as you need. Even though it is quicker to type one letter variable names and to create one long function, if meaningful variable names and separate functions and/or classes helps structure your code in a way that you can understand it, the time lost in typing will be more than made up in debugging.

Take your time.The last, but most important piece of advice in being quick is to take your time. Plan your solution. If it is complicated, sketch out your solution on paper and pencil before coding. Spend a couple minutes analyzing the performance characteristics of your solution. A couple minutes of thinking can save 20 wasted minutes of programming a flawed solution. Taking a couple extra minutes coding up a solution carefully and cleanly can save hours of debugging time. (and the contests don't last hours, so that is critical).

Out Coding
Well, I guess I should stop procrastinating and get to practicing so that I have a hope of advancing in these contests.

Monday, April 18, 2011

attr_accessible and Security in Rails

This is a story of how easy it can be to inadvertently put security flaws in your programs.

Given that I had implemented authentication and authorization, I wanted a place where users could edit their profile information. However, I wanted to always keep the user's original name and email address that was provided by OpenID authentication. So my thought was that I would provide the user with a "preferred email address" and a "display name" which I would pre-populate based on what OpenID provided, but which they could change to anything they wanted. I'd still keep the original information, but just not ever show it to them.

Step One - New Database Columns
This was just a matter of creating a new migration:
class AddDisplayNameToUser < ActiveRecord::Migration
  def self.up
    add_column :users, :display_name, :string
    add_column :users, :preferred_email, :string
  end

  def self.down
    remove_column :users, :display_name
    remove_column :users, :preferred_email
  end
end
and running

rake db:migrate

Step 2 - Create Default Values in Model
Pre-populating these new columns was easy, all I needed was to use the "after_initialize" hook in ActiveRecord. To do this, I added the following lines to my User model:
after_initialize :init

def init
  self.display_name ||= "#{self.last_name}, #{self.first_name}"
  self.preferred_email ||= self.email
end

Step 3 - Create Web Pages
I created a Profile resource with the following line in routes.rb
resource :profile, :only=>[:show, :edit, :update]
In the "show" action, I show the user their display name and preferred email. The edit page allows them to edit these two pieces of information. The HAML for this is:
%h1 Edit Your Profile
= form_for @user, :url=>profile_path do |f|
  = f.label :display_name
  = f.text_field :display_name, :size=>50
  %br
  = f.label :preferred_email, 'Preferred email address'
  = f.text_field :preferred_email, :size=>50
  %br
  = f.submit 'Update Profile'
The update method in the controller, which gets called when the form is submitted looks like:
def update
  current_user.update_attributes params[:user]
  redirect_to profile_path
end
And with that, I have a page where a user can update their own information. I'm done, right?

Security Flaw
Do you see the security flaw?

The problem is one that frequently comes up in client/server settings, particularly on the web. The server is putting too much trust in the client. It's all well and good to restrict the user's edit page to just the Display Name and Preferred Email fields, but a wily user will just bypass the web browser and directly make an HTTP request with parameters set to change fields that we don't want changed, like the :identifier_url.

So how do we fix this? Well the obvious way is to replace the line
current_user.update_attributes params[:user]
in the update method with code that explicitly just sets the fields that we care about. But then we lose the convenience of letting this Rails library method do the work for us. Luckily there's a solution, and I am sure the title of this post has not given you any hint to that solution. If you add the line:
attr_accessible :display_name, :preferred_email
to your User model class, you are telling Rails to allow those fields to be updatable via a mass assign command like update_attributes. More importantly, as soon as you've declared at least one field as attr_accessible, then all of the non-declared fields can't be update this way.

Take Home Message
Never Trust the ClientI am sure you know this already, but it bears repeating: Never Trust the Client. There is nothing stopping a malicious user from writing their own client and sending your server any data they want. Unfortunately, this requires eternal vigilance and learning. Sometimes holes, like the one above, exist not because of code you wrote, but because of the code you didn't write. If I hadn't read about this exact situation in Michael Hartl's Ruby on Rails 3 Tutorial, I might never have even realized the security hole I had created.

Monday, April 11, 2011

New Languages Can Broaden Your Horizons

"A language that doesn't affect the way you think about programming is not worth knowing." -Alan PerlisIn linguistics there is theory called the Sapir–Whorf hypothesis which asserts that our language limits and defines our cognitive processes. In the computer realm Alan Perlis once said that, "A language that doesn't affect the way you think about programming is not worth knowing."  I don't know how true these statements are, but I do know that just about every programming language I have learned has taught me new ways to think about software development. In many cases I wasn't even aware of what I didn't know until the new language opened my eyes.

Here is an overview of languages that have changed my way of thinking about programming, in the order that I used them. I realize that some of the concepts that I learned in one language could've been learned in another, but this is the order I personally learned things. Sometimes just the act of doing something new can show you things that you could have done with the old tools, but hadn't considered in the past.

Languages

BASIC (in particular, Commodore 64 BASIC) - my first programming language. While much of what I learned at this time I later had to unlearn, learning that computers aren't magical but just do what the programmer tells them has had lifelong value.

Assembly/Machine Language - I learned what the computer actually did - i.e. what does high level code get turned into? What instructions can the CPU actually perform?

Pascal - with Pascal (well, technically Karel the Robot which we did for a month, first) I was introduced to the concepts of functions and procedures, as well as how to use looping constructs effectively. I also learned about pointers and data structures in Pascal. (i.e. the basics of structured programming as we know it). I was also first exposed to separating interfaces from implementations with Pascal (Turbo Pascal's Units), but it didn't really take.

C - C was a combination of Pascal and Machine Language for me. I still could use all of the structured programming concepts that I knew, but with C you could see how it tied to machine code. e.g. pointers are just memory addresses. They say that C is "A language which combines the flexibility of assembly language with the power of assembly language." For me this was its beauty. While not being quite at the Assembly level, I could understand exactly what my programs were doing. I also learned about the difference between compile time and link time, which is critical to understanding how to build large systems.

C++ - This was my first real exposure to OO. However, it didn't really click. I liked the encapsulation aspect of grouping methods with data in classes, but at this point in my programming life I still didn't "get" OO. What I mainly got from C++ was a way to structure my C code.

LISP - besides having Lots of Irritating Stupid Parenthesis, it introduced me to functional programming. I learned it is possible to program with variables that don't vary and without iterative constructs (Lisp has looping constructs, we just didn't use them till later on in the class). Very powerful ideas, though I have to admit that I didn't use LISP for long (even though I thought it was really cool), so I probably didn't absorb as much from it as I could've from the language.

Java - with Java I finally got OO. (or at least I think I get it now.  :) ). I think the key feature that helped me was the concept of an interface. Why would you want to create a "class" that doesn't do anything? It was the struggle to answer that question that turned on the light bulb in my head for getting OO. The other cool idea Java introduced me to was the idea of namespaces. While C++ was starting to get them at that time, I had never used them, and had been burned by class name collisions. Namespaces are a good idea for large programs, and a necessary one for large scale code reuse.

When Java 5 came out and annotations came into the language, this was another feature that didn't seem to have any value at first. I have since seen that there can be pretty large value in declarative programming though I don't think I have fully internalized how to use the power of it yet. (i.e. I can see the power of this idea, but it hasn't actually transformed me yet).

C# - when I first learned C#, it didn't change my way of thinking. It was just Java with slightly different syntax. But then I got a chance to play around with LINQ. There are two powerful ideas that I got out of LINQ. The first is directly using query language notation inside a "normal" programming language. The second is that of late evaluation. i.e. if you had something like
DataContext context = new DataContext(<db stuff>); 
Table<Person> people = context.getTable<Person>(); 
var query = people.Where(p => (p.age >= 20 && p.age <= 25))
                  .OrderBy(p => p.name).Select(p => p); 
foreach (var person in query) {
  Console.WriteLine("{0} is {1} years old", person.name, person.age) 
}
I would've naively implemented the getTable method to return the entire datatable, Where to return the reduced datatable, OrderBy would sort, and then Select would return an array with the values. This is obviously much less efficient then just letting the database do the query for you. One of the clever thing that the LINQ folks did is they accomplished this. Where, OrderBy and Select all return objects describing their actions, not the results of their actions. Its not until you try enumerating the elements in the query in the foreach loop that any data is actually retrieved or manipulated. At that point the code is able to take advantage of everything you specified to act efficiently. Compare that to the Unix command line:
find . -name "*" -print | grep "taxes" | sort -r
I am used to this type of environment where find does a calculation and passes data to grep which then passes data to sort. The idea of just passing metadata around and then doing the calculation at the end requires greater coordination between components, but when you have that the result is improved efficiency.

Ruby - with Ruby I am learning that the static typing I learned to love in Java might not be as important as I believed. Duck typing is a great idea. Even cooler are ruby blocks, i.e. closures. Being able to pass blocks of code around is HUGE and definitely a transformative idea. Prior to learning Ruby, I would occasionally run into situations where I was frustrated by Java's lack of function pointers. But now! Now I have a hard time writing code in Java, because I keep wanting to solve my problems using blocks.

Rails - while Rails is just a framework, and not truly a language, I feel it deserves its own entry. Rails leverages Ruby in such a way that it feels like you are using a domain specific language. Cool ideas I learned from Rails are convention over configuration and how to actually use the MVC pattern.

Javascript - from Javascript I learned that OO can be done in multiples ways. OO in Javascript is nothing like OO in C++/Java/C#. In fact, everything is just an associative array. The power of what you can do with such a simple idea, in conjunction with closures, is eye opening.

Conclusion
I am not a language expert, and as such there are many important features of the above languages that I omitted. The point of this post isn't what features are in what language. Rather it is a personal history showing how new languages expanded my horizons and abilities as a software developer.

Have you learned a new language lately?

Monday, April 4, 2011

Role Based Authorization in Rails

Previously, I created an OpenID based login system. Now I want to create a role based authorization system. What do I mean by role based authorization? Well, in many systems you have actions that you want to protect so that only admins can perform them. Obviously, we could implement this by adding an "admin" flag to our User model. However, sometimes, we have more than just two classes of users. For example, if you have a website with comments, you might want to have a class of users who aren't full admins, but do have authority to filter comments for spam.

A simple way to accomplish all of this is to have Role model and then let each user have 0 or more roles associated with them. Here is how you can do this, assuming that you've already got an authentication system as described in the previous post.

Data Model
First, create database tables, which will just consist of a Role table with a name column, and then a mapping table to map roles to users.

rails generate model Role name:string
rails generate model RolesUser role_id:integer user_id:integer
rake db:migrate

Then tell the User model about the roles, by adding the following line to the User class (app/models/user.rb):
has_and_belongs_to_many :roles, uniq=>true
Role Views
Now that we have a Role model, we need a way to add and delete roles. Since this is functionality that we'll want to reserve only to top level admins, let's put this functionality in an admin directory.

rails generate controller Admin::Roles index

This generated the controller and view, but not the right routes. So delete the

get "roles/index"

line that was added to routes.rb file and replace it with:
namespace :admin do
  resources :roles, :only=>[:index, :create, :destroy]
end
This allows a showing of roles and the RESTful creation and destruction of roles. Since roles are just a name, we don't really need the separate show, edit, and update paths.

The controller (at app/controllers/admin/roles_controller.rb) looks like:
class Admin::RolesController < ApplicationController

  def index
    @roles = Role.all
    @new_role = Role.new
  end

  def create
    Role.create params[:role]
    redisplay_roles
  end

  def destroy
    Role.find(params[:id]).destroy
    redisplay_roles
  end

  private

  def redisplay_roles
    redirect_to admin_roles_path
  end
end
The method redisplay_roles is a placeholder for now. A little bit further down we'll modify this to support adding of roles via Ajax calls.

So now we have to create the HTML, app/views/admin/roles/index.html.haml
%h1 Roles
= render 'role_list'
%br
= form_for [:admin, @new_role] do |f|
  = f.label :name, 'Name of new role:'
  = f.text_field :name
  = f.submit
There are two things of note here. On line 4, the array [:admin, @new_role] results in the URL for creating a Role object in the admin namespace, which is what we want. On line 2, we render the partial 'role_list' (app/views/admin/roles/_role_list.html.haml), which is shown below. The reason it is a partial is to make it easier to Ajaxify everything.
#RoleList
  %h2 There #{is_pluralize(@roles.length, 'role')}
  %ul
    - @roles.each do |role|
      %li
        %b= role.name
        [#{link_to 'x', [:admin, role], :method=>:delete, :confirm=>'You sure?'}]
is_pluralize on line 2 is a helper function I wrote (in application_helper.rb) that uses the appropriate form of to be along with the pluralized noun. It looks like:
def is_pluralize(count, noun)
  verb = (count == 1) ? "is" : "are"
  "#{verb} #{pluralize(count, noun)}"
end
The rest of the partial is just looping through the roles and showing them. On line 7, a link to the destroy action is created, by specifying the method as 'delete'.

AJAX
The above is all you need to be able to add and delete roles. Since we are redirecting back to the index page after each add or delete, I thought it would be nice to make those Ajax actions, and just stay on the page. Turns out this is pretty easy.

As a first step, if you haven't already, install jQuery. (yes you could use prototype or many other javascript libraries, but jQuery is what I am using, so it's what I'll describe.) To do this, put the line

gem 'jquery-rails'

in your Gemfile and then run the commands:

bundle install
rails g jquery:install

Now that jQuery is installed, modify the add/remove calls to be AJAX calls by adding the parameter ":remote=>true" to the link_to call on line 7 of _role_list.html.haml and to the form_for call on line 4 of index.html.haml.

Now we modify redisplay_roles to look like
def redisplay_roles
  respond_to do |format|
    format.html { redirect_to admin_roles_path }
    format.js {
      @roles = Role.all
      render :redisplay_roles
    }
  end
end
Line 3 handles the case if the method is called in a non-Ajax way. Line 5 ensures that @roles variable is set. Line 6 is where the magic happens. Rather than returning html like normal, we will return javascript, so we create a view file app/views/admin/roles/redisplay_roles.js.haml. This javascript HAML file just has one line:
$('#RoleList').replaceWith("#{escape_javascript(render :partial=>'role_list')}")
$('#RoleList') is the jQuery selector to choose the div we are modifying and replaceWith is a jQuery function which will replace the contents of that div with the argument that is passed in. To create that argument, we just render the partial 'role_list' again.

With just those simple changes, now the adding and removing of roles is done via Ajax calls.

User Views
Now that we can create roles, we need to be able to add roles to users. This will be very similar to what we did with Roles above. The command:

rails generate controller Admin::Users index show

generates our controllers and views. Again, we will delete what got added to routes.rb and replace it with:
namespace :admin do
  resources :users do
    member do
      post :add_role
      delete :delete_role
    end
  end
  resources :roles, :only=>[:index, :create, :destroy]
end
This includes the route for Roles from above.  While we aren't using the edit and update routes, we will leave them in for future enhancements. We are also adding the routes add_role and delete_role which will do what you'd expect. The users_controller looks like:
class Admin::UsersController < ApplicationController
  before_filter :load_user, :except=>[:index]

  def index
    @users = User.all
  end

  def show
  end

  def add_role
    role = Role.find_by_name params[:role]
    @user.roles.push role if role
    redisplay_roles
  end

  def delete_role
    @user.roles.delete(Role.find params[:role])
    redisplay_roles
  end

  private

  def load_user
    @user = User.find params[:id]
  end

  def redisplay_roles
    respond_to do |format|
      format.html { redirect_to [:admin, @user] }
      format.js { render :redisplay_roles }
    end
  end
end
The index.html.haml just shows a list of all the users with links to show them. 
%h1 Users
%h2 There #{is_pluralize(@users.length, 'user')}
%ul
  - @users.each do |user|
    %li= link_to user.email, [:admin, user]
show.html.haml looks like:
%h1 User
%b Id:
= @user.identifier_url
%br
%b Name:
#{@user.last_name}, #{@user.first_name}
%br
%b Email:
= @user.email
%hr
%h2 Roles
= render :partial=>'role_list'
#addRole
  = form_tag [:add_role, :admin, @user], :remote=>true do
    = label_tag :role, 'Add Role:'
    = text_field_tag :role
    = submit_tag 'Add'
%hr
= link_to 'Return to User List', [:admin, User]
Just like above, we put the roles listing in a partial. _role_list.html.haml looks very similar to the one in the roles view:
#RoleList
  This user has #{pluralize @user.roles.length, 'role'}
  %ul
  - @user.roles.each do |role|
    %li
      = role.name
      [#{link_to 'x', delete_role_admin_user_path(@user, :role=>role.id), :method=>:delete, :confirm=>'Are you sure?', :remote=>true}]
which means that redisplay_roles.js.haml is identical to the Roles version of this. I suppose I could pull this into the shared directory and let it be shared, but I am not convinced that it will always remain identical, so I'll leave the duplication for now.

Authorization
Now that we can create roles, and add users to roles we need a way to use this information. First we'll add a method to User to determine if a user has any of a given role or roles. I would like this to work if the passed in roles are Role objects or strings or symbols defining the role name. To do this I overrode to_s in Role
def to_s
  self.name
end
and added the following methods to User
def has_role?(*role_names)
    self.roles.index {|role| includes_role role_names, role}
  end

  def includes_role(role_list, role)
    role_list.index {|r| r.to_s == role.name}
  end
As you can see, has_role? takes a variable number of roles and returns true if the user has any of those roles. Now that we can make this determination, we will create a method called ensure_role. We could place it in the controller, but since we will be using it from multiple controllers we will be placing it in the authentication_helper.rb that was created in the previous post. It looks like:
def ensure_role(*role_names)
    if signed_in?
      role_names.push 'admin'
      unless current_user.has_role? *role_names
        flash[:error] = 'You do not have permission to view this page'
        redirect_to actions_index_path
      end
    else
      ensure_signed_in
    end
  end
On line 3, we push the role 'admin' onto the list of roles. We do this, because we want the 'admin' role to have all permissions and this way we don't have to explicitly list it every time we are ensuring a role. We add the line
before_filter { ensure_role 'admin' }
to our controllers in the admin directory to ensure that only users with the admin role can access these pages. Technically we could leave off the 'admin' argument, since it is implicitly added, but I think it makes the controller much more readable to have it explicitly there in the case where 'admin' is the only role allowed.

Rake Task
Here's a brief quiz. Do you see the problem that we've created?

We now have a chicken and egg problem. You can't create an admin role or assign it to yourself unless you already have it. The way we'll solve this is by creating rake tasks that you can run at the command line to create roles and assign them to users. All the details of rake are beyond the scope of this blog post which has gone on for way to long already, so I'll just tell you what I did. I created a file lib/tasks/roles.rake which looks like:
namespace :roles do
  desc 'Creates a role'
  task :create, [:role_name] => :environment do |cmd, args|
    Role.create :name=>args[:role_name]
    puts "Created role #{args[:role_name]}"
  end

  desc 'Add User to Role'
  task :add_user, [:email, :role_name] => :environment do |cmd, args|
    user = User.find_by_email args[:email]
    role = Role.find_by_name args[:role_name]
    unless user
      puts "No such user #{args[:email]}"
      return
    end
    unless role
      puts "No such role #{args[:role_name]}"
      return
    end
    user.roles.push role
    puts "added #{role.name} to #{user.last_name}, #{user.first_name}"
  end
end
Now you can run the commands:

rake roles:create[admin]
rake roles:create[<your email address>,admin]

to create an admin role and assign your user to it.

Conclusion
Wow, this post went on a lot longer than I meant it to. I guess there was a lot more ground to cover than I realized. I'll try to keep my tutorial based posts shorter in the future...