Monday, July 11, 2011

Attack of the Clones

As I mentioned, I was recently at ICSE. One of the big topics there was clones. How do you detect them? How do you remove them? The talk I found most interesting on the topic was one where the researchers tried to measure how bad clones really are.

First, a definition for those of you who aren't familiar with the term clone in the context of software engineering (I wasn't before this conference). A clone is just what it sounds like: duplicated code. Clones are most often created via cut and paste.

The idea of removing duplicated code through common modules, whether they are libraries, classes, functions, procedures, or macros, is an old one, probably about as old as programming. In recent years, removing duplicated code has even gotten a hip acronym: DRY (Don't Repeat Yourself). The idea that clones are bad was basically a given at this conference. I have pretty much internalized that idea, so this didn't seem strange to me at all.
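To make the terms concrete, here is a small hypothetical sketch in Python. The function names, the 10% discount rule, and the 8% tax are all invented for illustration and don't come from the talk. The first two functions contain a cut-and-paste clone of the discount rule; the second half shows the DRY version, where that rule lives in a single helper.

```python
# Before: the discount rule was cut and pasted into two functions (a clone).
def invoice_total_cloned(prices):
    subtotal = sum(prices)
    if subtotal > 100:  # copy 1 of the duplicated rule
        subtotal *= 0.9
    return round(subtotal * 1.08, 2)  # invoices add a made-up 8% tax


def quote_total_cloned(prices):
    subtotal = sum(prices)
    if subtotal > 100:  # copy 2: any change must be repeated here by hand
        subtotal *= 0.9
    return round(subtotal, 2)


# After: the shared rule lives in one helper, so a change is made only once.
def apply_discount(subtotal):
    """Take 10% off orders over 100 (a hypothetical business rule)."""
    return subtotal * 0.9 if subtotal > 100 else subtotal


def invoice_total(prices):
    return round(apply_discount(sum(prices)) * 1.08, 2)


def quote_total(prices):
    return round(apply_discount(sum(prices)), 2)


if __name__ == "__main__":
    # Both versions compute the same totals; only the duplication differs.
    print(invoice_total([50, 60]), quote_total([50, 60]))
```

The refactored version only pays off if the discount rule actually changes. If it never does, the clone costs very little to keep around.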

So how bad are they? Well, as I mentioned above, there was research that tried to answer this question. The researchers analyzed the version control history of a few different projects and tracked all of the clones in them. They then tried to evaluate some of the problems clones are supposed to cause. For example, one of the problems with clones is that when you change the code, you have to remember to make that change in every copy. What they found was that the vast majority of the time, clones either never changed or changed only once. That means that, most of the time, the work of refactoring the code to remove the clones would be at least as large as the work of just maintaining them.

What about the situation where one clone is modified but another copy doesn't get updated? Well, they found that this is also very rare, and a lot of the time when it does happen, it's intentional. And even when the copies are unintentionally out of sync, the resulting code typically doesn't have a bug.

So what does all this mean? I don't know. Maybe their sample projects weren't representative of most software projects. Or maybe, just like most other engineering decisions, there are trade-offs, and there isn't a 100% right answer independent of context. I still find that clones offend my sense of software aesthetics most of the time, but I am trying to be a little more open-minded and more critical about what I refactor and when.
