Since the beginning, I’ve been trying various machine learning models, as it’s almost impossible to predict which one is going to provide the best results. And since the beginning, Random Forest has been outperforming the other models. I started to like Random Forest and didn’t really care much about the others. That’s generally a bad idea unless you’re 100% confident. I wasn’t.
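Here is a minimal sketch of what trying several models side by side can look like. It assumes scikit-learn and a classification target. The data is synthetic, and the candidate list and the `compare_models` helper are made up for illustration; this is not the actual setup used for this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): the real features and target are not shown here.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Illustrative candidates (assumption): any estimators with fit/predict would do.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=folds).mean()
        for name, model in models.items()
    }


scores = compare_models(candidates, X, y)

# Print the candidates from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Scoring every candidate on the same cross-validation folds keeps the comparison fair, and it makes it cheap to keep the other models in the loop instead of settling on a favorite too early.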
I tend to think of data mining as a two-stage process. Data acquisition and preprocessing is the more boring and often quite time-consuming part. And then there are the actual predictions. That’s where we use machine learning. If you don’t know anything about that, you might want to scroll through this ridiculously beautiful introduction. However, this is also the part where you might realize you did something wrong earlier, so in practice you usually end up iterating between tuning the data and training models.
I purposefully did not include data about reviews and overall reception, to see whether games are predictable based on what kind of games they are. It’s hard to tell what the result should be. Some might say it’s impossible to predict anything from that, but I definitely expected at least some correlation.