Monday, May 23, 2011

Moore's Law Doesn't Catch Imagination

Back in 1992 I had an opportunity to use an iPSC supercomputer. If I recall correctly, the particular machine I used had a total of 192 MB of RAM. This was just an outrageously huge amount. It was enough that I didn't really pay too much attention to the memory I was using in the program I was writing. When it was suggested that maybe my program was failing due to using too much memory, I was incredulous. But upon further inspection of my program, sure enough, I was trying to allocate arrays requiring a few hundred megabytes of memory.

Using a generalization of Moore's Law that says processing power should double every 18 months (I know, that's not the real law), and given that roughly twelve 18-month periods have passed since then, computers should be about 2^12, or roughly 4,000, times as powerful. And sure enough, you can find supercomputers with terabytes of RAM, and the laptop I just bought has 8 GB of RAM, about 4,000 times the 2 MB or so that was common then. I am actually kind of amazed at how accurate this rule of thumb is.
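
As a sanity check on that arithmetic, here's a throwaway C sketch. The dates are the ones from this post, and 2 MB is just a stand-in for a typical early-90s machine, not a measurement:

    #include <stdio.h>

    /* Back-of-the-envelope check of the "double every 18 months" rule of thumb. */
    int main(void) {
        int periods = (2011 - 1992) * 12 / 18;   /* about 12 complete 18-month periods */
        long factor = 1L << periods;             /* 2^12 = 4096, i.e. "about 4,000x" */
        printf("%d doublings -> %ldx growth\n", periods, factor);
        printf("2 MB in 1992 * %ld = %ld MB, roughly the 8 GB in a 2011 laptop\n",
               factor, 2 * factor);
        return 0;
    }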

However, my programming goals apparently grow just as fast. Just the other day, as I was practicing for programming contests, I tried to allocate a 4 terabyte array. Needless to say, my program failed. And for some reason, instead of just crashing, it ground my computer to a halt, eventually requiring a reboot. I suspect in another two decades I'll be crashing programs by trying to allocate exabyte arrays.
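
For the curious, the failure mode looks roughly like this. This is a minimal C sketch, not my actual contest code, and whether the allocation fails outright or the machine just starts thrashing depends on the operating system's overcommit policy:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        size_t huge = (size_t)4 * 1024 * 1024 * 1024 * 1024;  /* 4 TB */
        char *buf = malloc(huge);
        if (buf == NULL) {
            /* The polite outcome: the OS refuses and the program can bail cleanly. */
            fprintf(stderr, "could not allocate %zu bytes\n", huge);
            return 1;
        }
        /* With overcommit, malloc may hand back a pointer anyway; it's touching
         * the pages that forces the OS to back them with RAM or swap, which is
         * when the machine grinds to a halt.  Don't run this on a box you care about. */
        memset(buf, 0, huge);
        free(buf);
        return 0;
    }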

Conclusion
For many business or web apps, network or database latency is going to be your biggest bottleneck. Because of this we often treat memory as infinite and processing time as zero. The emphasis on writing readable, maintainable code rather than the most efficient code is typically the right trade-off. However, memory isn't infinite, and processing time isn't zero. Before coding a module that requires pulling your entire database into memory, think about whether that is a couple of megabytes or hundreds of terabytes. Modern computers are powerful, but they still aren't as powerful as we'd all like.
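
A one-minute estimate with made-up numbers like these is usually enough to tell which side of that line you're on:

    #include <stdio.h>

    /* Hypothetical back-of-the-envelope check before loading a whole table into
     * memory.  The row count and per-row size are illustrations, not real data. */
    int main(void) {
        long long rows = 50LL * 1000 * 1000;    /* e.g. 50 million rows */
        long long bytes_per_row = 200;          /* rough in-memory size per row */
        double gb = rows * bytes_per_row / (1024.0 * 1024.0 * 1024.0);
        printf("~%.1f GB to hold it all in memory\n", gb);
        return 0;
    }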

6 comments:

Anonymous said...

Did your iPSC contain the i860 processor? I read that it was difficult to optimize the pipelines on that processor. Did you ever do any analysis to see if your code was anywhere near the theoretical peak performance of the i860 (80 MFLOPS)?

Michael Haddox-Schatz said...

You're asking me to remember something from 19 years ago! I am not sure what processor it had. What I do remember was that it was my first exposure to parallel programming and that most of my work was getting my algorithm to run efficiently on the multiple CPUs at once.

Anonymous said...

Were you running a separate process on each processor? What did you use to communicate and synchronize between processes? Or did you just have 128 threads?
-Alan

Michael Haddox-Schatz said...

The machine I was on had 16 processors in a 4-D hypercube configuration. So I had 16 separate processes running, which communicated via the protocol provided. I don't remember the details, other than that it was a low-level C library to make the calls and that nodes only communicated with their neighbors (at least directly).

Anonymous said...

Let's see, does the hypercube config mean that each processor can only communicate directly with 4 of the other processors? I'm glad I saw that hypercube program you wrote years ago, or I would not have been able to figure that out. You know, if you want to do an entire post about the iPSC, that's fine with me!
-Alan

Michael Haddox-Schatz said...

That is what a 4D hypercube configuration means, yes.
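
For anyone curious, here is a tiny sketch of why: number the 16 nodes 0 through 15, and a node's direct neighbors are the nodes whose numbers differ in exactly one bit, so each node has exactly 4 of them. (This is just the standard addressing scheme, not the actual iPSC library calls, which I don't remember.)

    #include <stdio.h>

    #define DIM 4   /* 4-D hypercube: 16 nodes, each with 4 direct neighbors */

    int main(void) {
        for (int node = 0; node < (1 << DIM); node++) {
            printf("node %2d neighbors:", node);
            for (int d = 0; d < DIM; d++)
                printf(" %2d", node ^ (1 << d));  /* flip one of the 4 address bits */
            printf("\n");
        }
        return 0;
    }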