The TopCoder Marathon Match that just finished was a poker problem. They defined a very simple game of poker and a probabilistic strategy their server would follow. My task was to write a program that played 10,000 hands of poker against that strategy and tried to win. The problem description was couched in terms of trying to learn the server's strategy.
My solution observed the play of every hand and built tables that estimated the probability of the other player's actions. Using these probabilities, my program chose the action that maximized expected value at every decision point. When I tested this approach and let it play millions of hands, it worked great. Unfortunately, 10,000 hands was not enough to get good information about the other player's actions in all situations.
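To make the idea concrete, here is a minimal sketch of that counting-plus-expected-value approach. This is not the contest code; the context keys, action names, and payoff function are all hypothetical stand-ins for whatever the real game defined.

```python
from collections import defaultdict


class OpponentModel:
    """Counts how often the opponent takes each action in a given context."""

    def __init__(self):
        # context -> action -> count, e.g. ("we_raised", "big_pot") -> {"fold": 3, "call": 7}
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, context, action):
        self.counts[context][action] += 1

    def probability(self, context, action):
        total = sum(self.counts[context].values())
        if total == 0:
            return None  # no data yet for this context
        return self.counts[context][action] / total


def best_action(model, context, candidate_actions, payoff):
    """Pick the action with the highest expected value against the learned model.

    payoff(my_action, opp_action) is assumed to return the chips won or lost.
    """
    best, best_ev = None, float("-inf")
    for my_action in candidate_actions:
        ev = 0.0
        for opp_action in ("fold", "call", "raise"):
            p = model.probability(context, opp_action) or 0.0
            ev += p * payoff(my_action, opp_action)
        if ev > best_ev:
            best, best_ev = my_action, ev
    return best
```

With millions of observations the counts in each context converge and the expected-value choice is close to optimal; with only 10,000 hands many contexts have little or no data, which is exactly the problem described above.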
I spent the rest of the match trying to figure out what shortcuts I could make. Things like, "well, I haven't seen this exact betting pattern, but I've seen similar actions, so I'll use what I saw in the similar (but not identical) situation as a predictor of the computer's strategy." The other thing I did was hard-code a couple of rules. I noticed that it took a long time for my solution to learn to fold the bad hands: over the course of 10 million hands it would learn to fold about 30% of them, but in the first 10,000 hands it only folded about 1%. I've played enough poker to know that not folding enough is the classic mistake of weak players. To address this, I just hard-coded the program to fold the weak hands.
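Here is a rough sketch of what those two shortcuts might look like, building on the OpponentModel idea above. The similarity fallback and the fold threshold are illustrative assumptions, not the actual values or buckets from my entry.

```python
WEAK_HAND_THRESHOLD = 0.2  # made-up cutoff; "weak" in the real game depended on the dealt cards


def action_probability(model, exact_context, similar_context, action):
    """Prefer data for the exact betting pattern; fall back to a similar, coarser one."""
    p = model.probability(exact_context, action)
    if p is not None:
        return p
    return model.probability(similar_context, action) or 0.0


def choose_action(hand_strength, learned_choice):
    """Hard-coded override: fold weak hands instead of waiting for the learner to discover it."""
    if hand_strength < WEAK_HAND_THRESHOLD:
        return "fold"
    return learned_choice
```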
This combination of strategies was enough to let me finish around 60th place. Since the top 200 advance from this round, that was good enough.
So what does this have to do with ruts? Well, it turns out that a number of the people who did better than me did not create learning strategies. Instead of trying to figure out how to observe and adapt, they spent their time coming up with an algorithm that would be good against all strategies. Some of them came up with very simple strategies that were nevertheless able to do better, on average, than my more complicated learning strategy. If we had played millions of hands rather than 10,000, mine would almost certainly have been better, but that wasn't the problem we had to solve.
I got stuck in the rut of figuring out a learning algorithm; I just never realized it. While it occurred to me to figure out a good fixed strategy, I only thought of it in the context of what my learning program should do until it had learned the opponent. Since that felt like just a tweak to my approach, I never got around to doing it.
Moral of the story: sometimes it's worth trying a different approach even if you don't realize you're in a rut.