Try out the Game Recommender Engine!

Around the time we figured out how to identify the most popular games for each gaming motivation, we realized we had solved almost all of the analytic components needed to provide personalized game recommendations to gamers who take the Gamer Motivation Profile. This was also the most commonly requested feature in our feedback surveys, so it had always been in the back of our minds as something to work toward.

At a high level, we needed to identify the gamers most similar to someone who takes the Gamer Motivation Profile, and tabulate the most frequently mentioned games in that gamer pool. The hardest part of this is having a sufficiently large pool of gamer data, which we now have thanks to the over 140,000 gamers who have taken the Gamer Motivation Profile.

A secondary problem is figuring out how to tabulate the game list properly: without any adjustments, highly popular games (think World of Warcraft or GTA) would show up in everyone’s list. As we learned when identifying the most popular games for each gaming motivation, adjusting each game’s local frequency by its baseline popularity resolves this problem. This is the origin of the “QF score” (Quantic Foundry score). The QF score is a metric of local popularity: a QF score of 2 means a game is twice as popular in the local sample as in the baseline.
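As a minimal sketch of this adjustment (the exact QF score formula isn’t published, so the ratio-of-shares computation below is an assumption), one can divide each game’s share of mentions in the local sample by its share of mentions in the overall baseline:

```python
from collections import Counter

def qf_scores(local_titles, baseline_titles):
    """Hypothetical sketch of a QF-style score: how over-represented
    each game is in the local sample relative to the baseline."""
    local = Counter(local_titles)
    baseline = Counter(baseline_titles)
    n_local = sum(local.values())
    n_base = sum(baseline.values())
    return {
        title: (count / n_local) / (baseline[title] / n_base)
        for title, count in local.items()
    }

scores = qf_scores(
    ["World of Goo", "World of Goo", "Tetris"],
    ["World of Goo", "Tetris", "Tetris", "Tetris"],
)
# "World of Goo" makes up 2/3 of local mentions but only 1/4 of the
# baseline, so it scores well above a broadly popular title.
```

A score above 1 means the game is more popular locally than overall; a score of 2 would mean twice as popular.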

The Two Remaining Obstacles

So that left two implementation problems on the table. Thus far, we had been working with the raw open-ended text entries of people’s favorite games. The aggregate data was workable and interpretable, but too messy to surface in a user-facing web app. It also made longer game titles less discoverable, since there were many more permutations of (and errors in) how gamers listed them. For example, consider the possible permutations of “World of Goo” vs. “Assassin’s Creed IV: Black Flag”.

The other problem was computational. Based on someone’s Gamer Motivation Profile (i.e., a list of 12 motivations), we first needed to find the 1000 most similar gamers in our pool of 140,000 gamers. Then we would need to tabulate up to 3000 game titles mentioned in that subsample, adjust them by their baseline popularities, and return the sorted list to the gamer in real time. This was far more computation than any of our then-existing web components required, and we weren’t quite sure how long it would take to optimize for web use.
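The pipeline described above can be sketched in a brute-force form (hypothetical helper names and data layout; the similarity metric shown, Euclidean distance over motivation scores, is an assumption, and the production system pre-computes a search structure rather than scanning every gamer):

```python
import numpy as np
from collections import Counter

def recommend(profile, all_profiles, favorites, baseline_share, k=1000, top_n=20):
    """Brute-force sketch: find the k most similar gamers, pool their
    favorite games, and rank each game by local share / baseline share."""
    # Distance from this profile to every gamer's motivation scores.
    dists = np.linalg.norm(all_profiles - profile, axis=1)
    nearest = np.argsort(dists)[:k]
    # Tabulate game mentions within the nearest-neighbor subsample.
    pooled = Counter(g for i in nearest for g in favorites[i])
    n = sum(pooled.values())
    # Adjust each game's local frequency by its baseline popularity.
    ranked = sorted(
        ((count / n) / baseline_share[title], title)
        for title, count in pooled.items()
    )
    return [title for _score, title in reversed(ranked)][:top_n]
```

Scanning all 140,000 profiles on every request is exactly the cost that motivated the pre-computed structure described later.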

Mechanical Turk to the Rescue

We tackled the text entry problem first, since that would be useful to our data analysis regardless of whether we could figure out the Game Recommender Engine. In all, there were over 22,000 unique game title entries. And this was after converting all strings to lowercase and removing extraneous white spaces.

We recognized that matching a text entry to a canonical game title wasn’t a difficult problem, but was highly time-consuming. It felt like a problem that could be implemented as a task on Mechanical Turk, but we first needed to come up with a list of canonical game titles for Turkers to match to.

At first, we tried to generate one from Metacritic, but we noticed that many older games aren’t listed there. Then, we realized we didn’t need to explicitly provide a canonical list upfront.


Instead, we asked Turkers to Google the game title entry with “wikipedia game” added to the search term, and then copy the title from the game’s Wikipedia page. If the text entry contained “franchise/trilogy/series/all of them” etc., the title was entered with “(series)” added.

Each unique game entry was coded in triplicate (i.e., by 3 different Turkers), and a majority vote was used to identify the best title. We inspected all the codings and hand-coded the few entries without a majority vote. So that solved the text entry issue.
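The majority-vote step is simple to express in code. A minimal sketch (the function name is hypothetical):

```python
from collections import Counter

def majority_title(codings):
    """Return the title a strict majority of Turkers agreed on,
    or None if there is no majority (those entries were hand-coded)."""
    (title, votes), = Counter(codings).most_common(1)
    return title if votes > len(codings) / 2 else None

majority_title(["World of Goo", "World of Goo", "World of Goo (game)"])
# Two of the three codings agree, so "World of Goo" wins the vote.
```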

Making it Real-Time

The primary computational bottleneck was quickly identifying the 1000 nearest neighbors in a total pool of over 140,000 gamers. Adjusting and sorting the game titles, now that they were all cleaned, was much more straightforward.


Fortunately for us, this is a common and well-understood computer science problem. The typical solution is to pre-compute a data structure that makes the subsequent real-time search for nearest neighbors fast. As an analogy, a phonebook compiles and sorts everyone’s names to make individual lookups much more efficient. For us, that phonebook is a data structure known in computer science as a “ball tree”. Essentially, it frontloads the computation for nearest neighbor search. As a result, the Game Recommender Engine is able to provide personalized recommendations in a few seconds.
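For readers curious what this looks like in practice, here is a sketch using scikit-learn’s off-the-shelf ball tree (we’re not saying this is the exact library or distance metric used in production; the random data simply stands in for real motivation scores):

```python
import numpy as np
from sklearn.neighbors import BallTree

# Stand-in data: 140,000 gamers x 12 motivation scores.
rng = np.random.default_rng(0)
profiles = rng.random((140_000, 12))

# Pre-compute the search structure once ("compiling the phonebook").
tree = BallTree(profiles)

# Real-time lookup: the 1000 nearest neighbors of a new profile.
new_profile = rng.random((1, 12))
dist, idx = tree.query(new_profile, k=1000)
```

The expensive tree construction happens once offline; each lookup then touches only a small fraction of the pool.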

From Game Recommender Engine to Game Together Engine

As we implemented the Game Recommender Engine, Nic and I often looked through our own game recommendations and we would point out similarities and differences between our lists. It then dawned on us that this Venn diagram of two gamers would be fairly straightforward to implement once we had the Game Recommender Engine in place.

The Game Together Engine identifies the overlapping games in two gamers’ individual recommendations and shows them sorted by their QF scores. It also shows the remaining games for each gamer, effectively providing a Venn diagram of the two sets of game recommendations.
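The overlap computation itself is a straightforward set operation. A minimal sketch (hypothetical function name; here the shared games keep their rank order as a stand-in for sorting by QF score):

```python
def game_together(recs_a, recs_b):
    """Split two ranked recommendation lists into the shared games
    and each gamer's remaining games -- the Venn diagram of the lists."""
    shared = set(recs_a) & set(recs_b)
    both = [g for g in recs_a if g in shared]
    only_a = [g for g in recs_a if g not in shared]
    only_b = [g for g in recs_b if g not in shared]
    return both, only_a, only_b

both, only_a, only_b = game_together(
    ["Portal 2", "Stardew Valley", "Hades"],
    ["Hades", "Portal 2", "Celeste"],
)
# both == ["Portal 2", "Hades"]; the other two lists hold each
# gamer's unshared recommendations.
```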

Let Us Know What You Think

We’re happy that the Game Recommender Engine and the Game Together Engine are now live, and we hope you have fun with them. Use the comments below to let us know what you think, and whether there are other web apps based on this data that you’d like to see.