Monday, February 28, 2011

The Problem with Component Based Web Architectures

Imagine the following scenario:
  1. You browse to a web site and log in.
  2. After doing whatever you needed to do, you log out.
  3. You keep your browser tab (which is now sitting at a login page) open.
  4. The next day you return to this browser tab and enter your username and password.
Assuming that you enter your username and password correctly, I think it is a safe assumption that the behavior you would expect is for the site to log you in. Anything else could be considered a bug.

Why do I bring up such a simple scenario? Because if the website was written using JSF with its out-of-the-box settings (for example, what you get when you generate a Seam application with seam-gen), the above scenario will result in an error!

Component Web Architecture
Both JSF and ASP.NET use a component based web architecture. What this means is that the developer's markup language, whether it is aspx, jsp, or facelets, doesn't directly define HTML. Rather it describes a tree of components. The HTML is generated after the component tree has been built and populated, either by asking the components to render themselves, or by having a separate renderer that traverses the component tree.
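To make the idea concrete, here is a minimal sketch of a component tree in which each component renders itself. It is written in Python rather than JSF or ASP.NET, and every class name in it (Component, TextInput, Button, Form) is made up for illustration; none of this is either framework's actual API.

```python
from html import escape


class Component:
    """Base class (hypothetical): a node in the component tree."""

    def __init__(self, *children):
        self.children = list(children)

    def render(self):
        # By default a component renders its children in order.
        return "".join(child.render() for child in self.children)


class TextInput(Component):
    def __init__(self, name, value=""):
        super().__init__()
        self.name = name
        self.value = value

    def render(self):
        return '<input type="text" name="%s" value="%s">' % (
            escape(self.name), escape(self.value))


class Button(Component):
    def __init__(self, label):
        super().__init__()
        self.label = label

    def render(self):
        return '<button type="submit">%s</button>' % escape(self.label)


class Form(Component):
    def render(self):
        # Wrap whatever the child components render in a form element.
        return '<form method="post">' + super().render() + "</form>"


# The "markup" only describes a tree of objects...
login_page = Form(TextInput("username"), Button("Log in"))

# ...and the HTML is produced afterwards by asking the tree to render itself.
html_output = login_page.render()
```

The point is only that the page author works with objects, and the HTML is a by-product of the tree.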

Component based architecture attempts to leverage the fact that almost all modern programming languages are object oriented. Components are just objects. By using a paradigm that most developers are familiar with, the hope is that it will be easy to use, maintain, and extend.

As you know, the web is stateless. The browser and web server interact by sending messages back and forth. However, the client (browser) and server are two completely disconnected entities. The conversation between them might not actually even be between two entities. A user might save a hyperlink from one page, and then follow the link from a different browser on an entirely different computer. Because of load balancing, or server restarts, the server that handles one message in a conversation isn't necessarily the same as the one that handled the previous message. And even if it's the same server, the server might be in a very different state due to handling requests from other clients or other external changes like database state changes.

This presents a myriad of challenges to web developers. Web frameworks provide tools to overcome the above difficulties and still write useful applications. JSF and ASP.NET protect the developers from all of the ugliness through their use of components. Through clever use of cookies, hidden form variables, and session information, these frameworks provide the illusion that you have objects which persist from one page request to the next and that links, form fields, and buttons change the state of, or perform actions on, these objects.

The problem is that the long lived object model is a leaky abstraction over the reality of stateless messages being sent back and forth. The issues behind my previous complaints about ASP.NET come from the framework trying to paper over this leak.

Component Web Mismatch
To correctly process any form submission (for example login), the first thing the web server has to do is recreate the exact same component tree structure that was used to generate the page. The appropriate components then get loaded with whatever form data was submitted, and the component corresponding to the button that was clicked can then have its action called. The challenge is ensuring that the component tree structure that was used when the page was created is the EXACT SAME as the component structure on the form submission. For the simple cases, this is easy. But imagine that the component tree is based on a database query (e.g. a listing of items on sale). It's possible that the database has changed between page load and form submission (e.g. items sold to another user). If you naively build the component tree the same way you did originally (e.g. by iterating over results from the database), you will get a different component tree, which can cause all types of wonky behavior. For example, if the user tried to buy the 5th item on the list, they may end up purchasing the wrong item, since the 5th item they saw on the screen isn't the same as the 5th item currently returned by the database.
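The "5th item" problem can be sketched in a few lines. This is a toy Python simulation, not JSF code: the item names, the db list, and the build_component_tree helper are all invented for illustration, and the "tree" is just a flat list of rows identified by position.

```python
def build_component_tree(items_on_sale):
    """Hypothetical helper: one 'buy' row per item, identified only by position."""
    return [{"position": i, "item": item} for i, item in enumerate(items_on_sale)]


# Page load: the database returns six items on sale.
db = ["lamp", "chair", "desk", "rug", "clock", "vase"]
rendered_tree = build_component_tree(db)

# The user looks at the page and decides to buy the 5th row.
clicked_position = 4
seen_by_user = rendered_tree[clicked_position]["item"]

# Before the form comes back, another user buys the chair.
db.remove("chair")

# Form submission: the tree is naively rebuilt from the current database,
# and the click is matched up by position.
rebuilt_tree = build_component_tree(db)
purchased = rebuilt_tree[clicked_position]["item"]

print("user clicked:", seen_by_user)
print("order placed for:", purchased)
```

Every row after the removed one has shifted up by one, so the click lands on a different item than the one the user saw.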

To solve this problem, when a form is created JSF saves the component tree so that it can be recreated when the form is submitted. The saved tree can be stored in a hidden field in the form sent to the client, but since it can be quite large, this typically isn't done. By default, JSF saves it in a server session variable. However, if the session times out, this data is lost, and you can no longer recreate the state. This is the cause of the javax.faces.application.ViewExpiredException that happens in the scenario at the top of this post.
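Here is a rough Python simulation of that failure mode. It is not how JSF is implemented: the Session class, its methods, the 30 minute timeout, and the ViewExpiredError stand-in are all assumptions made for the sketch, and the timeout is measured from when the view was saved rather than from the last access, which is a simplification.

```python
class ViewExpiredError(Exception):
    """Stand-in for javax.faces.application.ViewExpiredException."""


class Session:
    """Hypothetical server-side session that holds saved component trees."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.saved_views = {}  # view id -> (time saved, component tree)

    def save_view(self, view_id, tree, now):
        self.saved_views[view_id] = (now, tree)

    def restore_view(self, view_id, now):
        entry = self.saved_views.get(view_id)
        if entry is None or now - entry[0] > self.timeout_seconds:
            # The saved tree is gone, so the form cannot be processed.
            self.saved_views.pop(view_id, None)
            raise ViewExpiredError(view_id)
        return entry[1]


def submit_login(session, view_id, now):
    # The framework must restore the tree before it even looks at the
    # username and password that were submitted.
    session.restore_view(view_id, now)
    return "logged in"
```

With a 30 minute timeout, a login page rendered at time 0 can be submitted a minute later, but submitting the same page the next day raises the error even though the credentials are correct:

```python
session = Session(timeout_seconds=30 * 60)
session.save_view("/login", ["username", "password", "submit"], now=0)
submit_login(session, "/login", now=60)            # works
submit_login(session, "/login", now=24 * 60 * 60)  # raises ViewExpiredError
```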

We need web frameworks that don't try to recreate the exact same state on two different requests. There are some very smart people who have worked (and continue to work) on JSF and ASP.NET. The fact that their solutions are so complex and still tend to drop people through the cracks (based on the questions out there on message boards) is indicative that the approach is flawed, rather than just the implementations. Frameworks like Ruby on Rails, which ease the process of writing web apps without hiding the fact that each web request is its own independent action, are the long term future.
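As a rough sketch of what treating each request as an independent action can look like, here is a framework-neutral Python example. It is not Rails code; the catalog dictionary, the item ids, the /buy URL, and both functions are invented for illustration. The form carries the id of the item itself, so the handler needs nothing remembered from the earlier request.

```python
def render_buy_form(item_id):
    """Hypothetical view: the form names the item by id, not by position."""
    return (
        '<form method="post" action="/buy">'
        '<input type="hidden" name="item_id" value="%d">'
        '<button type="submit">Buy</button>'
        "</form>" % item_id
    )


def handle_buy(form_data, catalog):
    """Hypothetical handler: everything it needs arrives with the request."""
    item_id = int(form_data["item_id"])
    item = catalog.pop(item_id, None)
    if item is None:
        # The data changed since the page was rendered; say so plainly
        # instead of acting on a stale view of the world.
        return "Sorry, that item is no longer available."
    return "You bought the %s." % item


catalog = {101: "lamp", 102: "chair", 103: "desk"}
print(handle_buy({"item_id": "102"}, catalog))
print(handle_buy({"item_id": "102"}, catalog))
```

The second submission fails cleanly because the chair is gone, and no other item is bought by mistake.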


fsilber said...

Is page time-out indeed such an issue? Suppose a user downloads a form and submits it six months later? You're going to have to have some sort of time-out, so the only issue is whether the component-oriented approach's need for server-side state forces you to time out pages sooner than demanded by business considerations. Even if that's the case, it's still merely one of many considerations.

The conventional approach forces the programmer to view all form data as one big object that is global to the entire page. For simple applications that is fine, but for more complex applications your display logic cannot be designed for loose coupling and high cohesion. It's like being forced to program in COBOL.

Michael Haddox-Schatz said...

To address your first point, why do you need a timeout? If I have a login page open from 6 months ago, and I submit a username and password, why shouldn't it work?

I think it is wrong to give a user an error for a situation where the user did nothing wrong.

Michael Haddox-Schatz said...

As for your second point, I partially agree. It is important to be able to make up a page out of reusable parts. For some apps, that may even mean a single form may need to be in parts. However, I am not sure what you mean by a conventional approach, and I don't think I agree that you can't use modules/components with other frameworks.

It may be the case that people don't tend to break up their data as much with "conventional" frameworks. And certainly work can be done to make this easier. However, the solution is not to have a system where the server has to know its exact state at a previous time in order to be able to parse whatever input the user sends. That is too complex and fragile a solution.