When I started my research, I had only one measure of success available – the average number of players from Steam Charts. I later obtained owner counts from Steam Spy, hoping that owner numbers could simply be inferred from player numbers. However, the two metrics turned out to be too weakly correlated for that. I don’t believe this is due to an error in the data. Games sold in bundles, or simply very cheap games, for example, can reach high sales figures without necessarily generating much profit or a large player base.
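One quick way to test a claim like this is to compute both the Pearson (linear) and the Spearman (rank) correlation between the two columns. The sketch below is a minimal, dependency-free version; the function names and the six-game dataset are hypothetical and made up purely for illustration, not real Steam Charts or Steam Spy figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical figures for illustration only (not real Steam data).
# The last two "games" mimic bundled or very cheap titles: many owners, few players.
avg_players = [120, 45, 800, 300, 15, 5]
owners = [50_000, 20_000, 400_000, 150_000, 900_000, 600_000]

print(f"Pearson:  {pearson(avg_players, owners):.2f}")
print(f"Spearman: {spearman(avg_players, owners):.2f}")
```

Spearman only asks whether more players tends to go with more owners, so it is less sensitive to a few very large values than Pearson. If even the rank correlation is weak, a simple mapping from one metric to the other is unlikely to work.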
From the beginning, I’ve been trying various machine learning models, as it’s almost impossible to predict in advance which one will give the best results. And from the beginning, Random Forest has been outperforming the others. I grew fond of Random Forest and stopped paying much attention to the other models. That’s generally a bad idea unless you’re 100% confident. I wasn’t.
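The post doesn’t say how the models were compared, so the following is only a sketch under an assumption: k-fold cross-validation, where every candidate is scored on exactly the same train/test splits, with mean absolute error as the metric. To stay dependency-free it compares three deliberately simple stand-in models (hypothetical names, not the models used in this research); a Random Forest from a library such as scikit-learn could be wrapped in the same `fit(train_x, train_y) -> predict` shape and added to the dictionary.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once and deal them into k folds (requires k <= n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


# Hypothetical stand-in models: each takes training data and returns a predict function.
def mean_baseline(train_x, train_y):
    m = mean(train_y)
    return lambda x: m


def nearest_neighbour(train_x, train_y):
    def predict(x):
        j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[j]
    return predict


def least_squares(train_x, train_y):
    mx, my = mean(train_x), mean(train_y)
    sxx = sum((x - mx) ** 2 for x in train_x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_validate(models, xs, ys, k=5, seed=0):
    """Average MAE per model, using the same folds for every model."""
    folds = k_fold_indices(len(xs), k, seed)
    scores = {}
    for name, fit in models.items():
        errors = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(len(xs)) if i not in held_out]
            predict = fit([xs[i] for i in train], [ys[i] for i in train])
            errors.append(mae([ys[i] for i in test_idx],
                              [predict(xs[i]) for i in test_idx]))
        scores[name] = mean(errors)
    return scores


# Synthetic data for illustration: one feature with a noisy linear relationship.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

models = {"mean": mean_baseline, "1-nn": nearest_neighbour, "linear": least_squares}
for name, score in sorted(cross_validate(models, xs, ys).items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: MAE {score:.2f}")
```

Scoring every model on identical folds means differences in the scores reflect the models rather than lucky splits, which makes it cheap to keep checking the alternatives instead of committing to a favourite.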