Summary: Predicting Success of Steam Games

This is a summary of my academic research, which aimed to show whether the success of Steam games can be evaluated before their release, without knowing anything about their reception. The idea was to evaluate a concept rather than a nearly finished or already released game.

One could probably get very good predictions of a game’s sales several days or weeks after its release by looking at YouTube, Twitch, reviews, social media, etc. But I wanted to know whether it’s possible to estimate a game’s potential early on, so that one could, for example, suggest changes during development that would make the final product more successful.


The research was conducted on a dataset of nearly 10,000 games released on Steam up to July 2016. The data was eventually reduced to games released since September 2013, when Steam Greenlight changed its policy to allow more games on the platform.

Free-to-play and Early Access titles were excluded because they make money differently from traditional premium games. After all filtering, over 4,200 games were left in the dataset. The information about each game was mostly downloaded from the Steam website. I had two candidate measures of success: owners from Steam Spy and concurrent players from Steam Charts. Unfortunately, historical data from Steam Spy isn’t available, so I decided to use Steam Charts to calculate the average number of concurrent players in the first 2 months after release. This is not as useful as owners, but it is reasonably available. The key attributes describing each game are:

  • Name
  • Developer
  • Publisher
  • Age Requirements
  • Release Date
  • Price
  • Description
  • Platforms
  • Game/Steam Features
  • HW Requirements
  • Languages
  • Genres
  • Thumbnail
  • Screenshots
  • User Tags
  • Concurrent Players (
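
To make the success measure concrete, here is a minimal sketch of averaging concurrent-player samples over the period after release. The `(date, players)` sample format, the function name, and the choice of 60 days as "2 months" are assumptions made for illustration; the numbers are made up and are not real Steam Charts data.

```python
from datetime import date, timedelta
from statistics import mean


def average_players_after_release(samples, release_date, window_days=60):
    """Mean of concurrent-player samples taken within window_days of release.

    samples: iterable of (date, concurrent_players) pairs (hypothetical format).
    Returns None when no sample falls inside the window.
    """
    end = release_date + timedelta(days=window_days)
    in_window = [players for day, players in samples
                 if release_date <= day < end]
    return mean(in_window) if in_window else None


# Made-up samples for illustration only.
samples = [
    (date(2015, 3, 1), 120),
    (date(2015, 3, 15), 80),
    (date(2015, 4, 20), 40),
    (date(2015, 6, 1), 10),   # more than 60 days after release, ignored
]
avg = average_players_after_release(samples, date(2015, 3, 1))
print(avg)
```

Only the three samples inside the window contribute, so a game that picks up players long after launch would not look better under this measure.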


Let’s look at the premium games for which I had data from Steam Charts (August 2012 – July 2016, roughly 4,600 games). Over that period, around 30 % of games averaged fewer than 1 concurrent player after release, while only a few reached an average of over 10,000 players, as seen below.

Average number of players in games (released August 2012 – July 2016)

Since Greenlight, we’ve been seeing more and more games that virtually no one plays. As shown below, 2013 was still a reasonable year, but then Steam got flooded with hundreds of low-quality games. Unfortunately, I don’t have data covering the whole of 2016, but it would likely show a similar trend of an increasing number of games no one ever plays.


I looked at games released in 2015 and compared the values of some attributes with how successful the games were. Multi-Player, Trading Cards, and Achievements turned out to be far more common in the more successful games. Unsurprisingly, games with higher hardware requirements and a higher price generally do better; a higher budget simply helps a lot.

Steam genres are far too generic and don’t show any significant differences. User tags, on the other hand, are far more interesting. Those that stand out as unsuccessful include 2D, pixel graphics, platformer, point & click, puzzle, and retro. Games with these tags generally do quite poorly.

Games with the following tags have been mostly successful: third-person (shooter), first-person (shooter), open world, sandbox, survival, story-rich, fantasy, sci-fi, zombies.


The dataset went through a process of adjusting existing attributes and deriving new ones, ending up with over 200 attributes in total. I then used machine learning methods to predict the average concurrent players. I tried both regression (predicting the exact number) and classification (dividing the games into two groups). SVM and Random Forest generally provided the best results.

Regression gave me a correlation of 0.7 (0 means no correlation and 1 means a perfect fit) and 72 % RRSE (root relative squared error; 100 % means no better than always predicting the mean, and lower is better). For classification, I tried detecting games with more than 10 players on average (to see if the “better” ones can be separated). I detected only 39 % of them, and 19 % of the games flagged as such were wrong (formally: 81 % precision and 39 % recall; the overall accuracy was 86 % against a 79 % baseline).
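
For readers unfamiliar with the two regression metrics, here is a minimal sketch of how they are usually computed. It uses the textbook definitions of RRSE and the Pearson correlation coefficient, which I am assuming match what the tooling reported; the function names and the player numbers are made up for illustration.

```python
from math import sqrt
from statistics import mean


def rrse(actual, predicted):
    """Root relative squared error as a fraction (1.0 corresponds to 100 %).

    Compares the model's squared error with that of a trivial predictor
    that always outputs the mean of the actual values.
    """
    a_mean = mean(actual)
    model_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_err = sum((a - a_mean) ** 2 for a in actual)
    return sqrt(model_err / baseline_err)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    x_mean, y_mean = mean(xs), mean(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    var_y = sum((y - y_mean) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up average player counts, for illustration only.
actual = [1.0, 5.0, 20.0, 150.0]
predicted = [2.0, 4.0, 35.0, 110.0]
always_mean = [mean(actual)] * len(actual)

print(rrse(actual, always_mean))   # the trivial predictor scores 1.0 (100 %)
print(rrse(actual, predicted))
print(pearson(actual, predicted))
```

An RRSE below 100 % therefore means the model beats the "always guess the average" strategy, which is why lower is better.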

These aren’t exactly amazing results, although they show that there definitely is a correlation between a game’s metadata and its success. So I decided to look for a subset of games on which the predictions would work better.

Such a subset turned out to be games from developers/publishers with at least two games already released. When evaluating an upcoming game from such a developer/publisher, I know how their previous games did, which by itself isn’t enough to make an accurate prediction but helps significantly.
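
The selection criterion can be sketched roughly as follows. This is a minimal illustration, not the code used in the research: the dictionary keys, the function name, and the sample games are hypothetical, and for brevity it only looks at the developer, not the publisher.

```python
from collections import defaultdict
from datetime import date


def games_with_history(games, min_previous=2):
    """Names of games whose developer had at least min_previous earlier releases.

    games: list of dicts with hypothetical keys 'name', 'developer', 'released'.
    """
    seen = defaultdict(int)   # developer -> number of releases counted so far
    selected = []
    # Walk through the games in release order so that only earlier
    # releases count as "history" for each game.
    for game in sorted(games, key=lambda g: g["released"]):
        if seen[game["developer"]] >= min_previous:
            selected.append(game["name"])
        seen[game["developer"]] += 1
    return selected


# Made-up catalogue for illustration only.
games = [
    {"name": "A1", "developer": "Studio A", "released": date(2014, 1, 10)},
    {"name": "A2", "developer": "Studio A", "released": date(2014, 9, 5)},
    {"name": "B1", "developer": "Studio B", "released": date(2015, 2, 1)},
    {"name": "A3", "developer": "Studio A", "released": date(2015, 6, 20)},
]
eligible = games_with_history(games)
print(eligible)  # ['A3']
```

Only the third game from "Studio A" qualifies, because it is the only one released after two earlier titles by the same developer.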

With this criterion, I can cover around 33 % of games. For regression, this means an improvement to 0.82 correlation and 58 % RRSE. Classification gave me 79 % precision and 55 % recall, meaning the algorithm was slightly more often wrong about games having more than 10 players on average but missed fewer of them (overall accuracy 83 % against a 72 % baseline). I definitely preferred regression, as dividing games into two groups means that a lot of games end up close to the boundary on either side.
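
To show how the classification figures relate to each other, here is a small sketch that derives precision, recall, accuracy, and a baseline from confusion-matrix counts. The counts are made up (chosen only to land near the first experiment's percentages), and I am assuming "baseline" means the accuracy of always predicting the larger class.

```python
def classification_report(tp, fp, fn, tn):
    """Basic metrics from confusion-matrix counts.

    tp/fp: games predicted as ">10 players" that really were / were not.
    fn/tn: games predicted as "<=10 players" that really were not / were.
    """
    total = tp + fp + fn + tn
    positives = tp + fn
    return {
        "precision": tp / (tp + fp),    # share of flagged games that are right
        "recall": tp / positives,       # share of good games that got flagged
        "accuracy": (tp + tn) / total,
        # Assumed definition: accuracy of always predicting the majority class.
        "baseline": max(positives, total - positives) / total,
    }


# Hypothetical counts for 500 games, for illustration only.
report = classification_report(tp=40, fp=10, fn=60, tn=390)
print(report)
```

With these counts the classifier is right about 80 % of the games it flags but finds only 40 % of the games that pass the threshold, and its 86 % accuracy is only modestly above the 80 % achieved by never flagging anything.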

The following features turned out to have the largest impact on the predictions (careful: presence or higher value doesn’t necessarily mean higher prediction):

  • Minimum and maximum of average players across previous games by the same developer/publisher, and Gini index of these numbers
  • GPU and storage requirements
  • Tags: Open World, Third-Person, Sandbox, Story-Rich
  • Support for Spanish, French, Polish, Italian, Russian, German, Portuguese-Brazil and the total number of supported languages
  • Genres: Indie and Casual
  • Presence of DRM in any form and presence of EULA
  • Description’s length
  • Launch price
  • Age requirements
  • Whether the game is a sequel
  • Average saturation and number of distinct colors of the first displayed screenshot
  • Presence of multi-player
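
The first item in the list can be illustrated with a short sketch. I am assuming here that "Gini index" refers to the Gini coefficient used as an inequality measure (0 when all previous games did equally well, approaching 1 when a single game dominates); the function names and the sample numbers are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = all equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-based formula on the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def history_features(previous_averages):
    """Summary features of a developer's earlier games (non-empty input)."""
    return {
        "min_avg_players": min(previous_averages),
        "max_avg_players": max(previous_averages),
        "gini_avg_players": gini(previous_averages),
    }


# Made-up average player counts of three earlier games by one developer.
features = history_features([3.0, 12.0, 45.0])
print(features)
```

A developer with one hit and several flops gets a high Gini value, while a developer with consistently similar results gets a value near 0, which may tell the model something that minimum and maximum alone do not.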

There is an application for predictions available, but I wouldn’t recommend using it for any serious business decisions because I had to keep the predicted intervals pretty wide (and it can still be inaccurate). Think of it more as a toy showing how, e.g., adding some features increases the predicted number of players. (Also, I wasn’t doing this research to create an end-user application, so apologies if it crashes on some inputs.)


While it’s not possible to make exact predictions about a game’s success knowing only basic descriptive information about it, there IS a strong correlation. If I were to give some general advice according to my research, it would be the following:

  • If you haven’t released anything successful on Steam yet, you’re gonna have a hard time. It doesn’t mean you have no chance, you’ll just need to put a lot of effort into your game and inform the right people about it. And maybe have a bit of luck.
  • Multi-player is pretty good.
  • Shiny 3D games are pretty good, both third-person and first-person.
  • Players like open world.
  • Don’t make another 2D retro-inspired platformer. Seriously, don’t.
