Closer Look at Available Data

Let’s look at the data we have available. On October 17th, 2015, I downloaded info about more than 6,000 games released on Steam. Thankfully, Steam offers an API, which was enough to get most of the info I wanted. Sadly, some pieces of data, such as release dates, are missing or even incorrect. As a result, some games were left out of the dataset and some have an incorrect release date attached to them.

Besides the API, I parsed user tags directly from HTML. Tags are one of the more questionable attributes, since they change a lot after a game’s release, but I’ll get to that at some point. To retrieve launch prices, I had to use something other than Steam, as it only shows the current price. The source I used has really good historical data regarding sales, and I might actually find more use for it.
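
To give a rough idea of what parsing tags from HTML can look like, here is a minimal sketch using only Python’s standard library. It is not the scraper used for this dataset. The class name `app_tag`, the `TagExtractor` helper, and the sample markup are all assumptions made for illustration; the real store page markup may differ and can change at any time.

```python
from html.parser import HTMLParser


class TagExtractor(HTMLParser):
    """Collects the text of every <a> element carrying a given CSS class."""

    def __init__(self, tag_class="app_tag"):  # "app_tag" is an assumed class name
        super().__init__()
        self.tag_class = tag_class
        self.tags = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.tag_class in classes:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching <a> element.
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            text = "".join(self._buffer).strip()
            if text:
                self.tags.append(text)
            self._inside = False


def extract_tags(html, tag_class="app_tag"):
    """Return the user tags found in an HTML snippet, in document order."""
    parser = TagExtractor(tag_class)
    parser.feed(html)
    parser.close()
    return parser.tags


# Made-up sample markup, not copied from a real store page.
SAMPLE = (
    '<div class="popular_tags">'
    '<a href="#" class="app_tag">\n  Indie </a>'
    '<a href="#" class="app_tag">Strategy</a>'
    '<a href="#" class="other">Ignore me</a>'
    "</div>"
)

if __name__ == "__main__":
    print(extract_tags(SAMPLE))
```

Because tags keep changing after release, anything scraped this way reflects the day of the download rather than the state at launch.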

I filtered out free-to-play and Early Access titles, and kept games released from August 1st, 2012 through June 30th, 2015. I needed data from a source that was launched in July 2012. After some cleaning, I obtained a dataset of 3,021 games. Some games are missing, but that should have close to no effect on the results.
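
The filtering step can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the actual cleaning script: the `Game` record, its field names, and the helper functions are hypothetical, and the window is assumed to run from August 1st, 2012 through the end of June 2015.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Game:
    # Hypothetical record layout; the real dataset's columns may differ.
    name: str
    release_date: date
    is_free_to_play: bool
    is_early_access: bool


# Release window, inclusive on both ends (assumed bounds).
WINDOW_START = date(2012, 8, 1)
WINDOW_END = date(2015, 6, 30)


def keep(game):
    """True if the game is a paid, fully released title inside the window."""
    if game.is_free_to_play or game.is_early_access:
        return False
    return WINDOW_START <= game.release_date <= WINDOW_END


def filter_games(games):
    """Return only the games that pass every filter, preserving order."""
    return [g for g in games if keep(g)]


if __name__ == "__main__":
    sample = [
        Game("Inside window", date(2013, 1, 1), False, False),
        Game("Too early", date(2012, 7, 31), False, False),
        Game("Free to play", date(2014, 5, 5), True, False),
    ]
    for game in filter_games(sample):
        print(game.name)
```

Games with a missing release date never make it into a record like this in the first place, which is one reason the final dataset is smaller than the raw download.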

