Player Churn Rate Prediction

Machine learning has become an integral part of the technology landscape, enabling businesses to extract valuable insights from their data. It has now become ubiquitous and forms the foundation of Artificial Intelligence (AI). 

For example, you may have heard of ChatGPT, which has lately been making the rounds on the Internet and is being both lauded for its (sometimes) uncanny capabilities and cursed for the headaches it is causing everyone from schools and universities to research publications and Q&A websites such as StackOverflow.

However, building machine learning models that are able to predict outcomes given some existing information (e.g. predicting whether a tumor is benign or malignant) traditionally requires significant expertise in coding and data science, making it inaccessible to many organizations. This is particularly true for smaller companies, which may be unable to hire data scientists or may not be able to purchase hardware that is powerful enough to run some of the more complex algorithms. 

Low-code machine learning is emerging as a way to combat these issues. With the help of low-code platforms, businesses can build and deploy machine learning applications without needing a team of experts. Even larger companies could benefit from such platforms, since some tasks could be off-loaded to the platform to free up time for more complex tasks to be performed by data scientists and machine learning experts.

In this article, a low-code machine learning platform will be investigated to see what sort of tools are available, how easy it is to use, what sort of results could be achieved, and if it could transform the way businesses approach data-driven decision-making.

The Problem

Numerous gaming companies aim not only to get people to play their games, but also to keep them playing for as long as possible. This is particularly important for free-to-play games, which more often than not include some form of micro-transactions. 

Perhaps one of the most common forms of micro-transactions is the purchase of in-game currency, which then allows players to purchase items that could facilitate level progression or enable customization. It is these in-game purchases that fund the game’s development. 

It is thus vital to track the rate at which players stop playing the game, known as the churn rate. The higher the churn rate, the more players are leaving the game and the less income is generated.

In this article, we will be using a real-world dataset, containing information about each level that a person has played. For example, there’s information on the amount of time played, whether the player won or lost the level, the level number, and so on. The target feature that needs to be predicted in our dataset is ‘Churn’, having a value of 1 if a player has not played the game for more than 2 weeks after completing the level, and 0 otherwise.
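To make this definition concrete, here is a minimal sketch of how such a label could be derived from raw play logs (using pandas on a toy log with hypothetical timestamps; the actual labelling used for this dataset may differ in its details):

import pandas as pd

# Toy play log: one row per level played, matching the dataset's
# UserID and ServerTime columns.
df = pd.DataFrame({
    "UserID": [1, 1, 1, 2, 2],
    "ServerTime": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-02-01",
        "2019-01-05", "2019-01-06",
    ]),
})
df = df.sort_values(["UserID", "ServerTime"])

# Time gap between this level and the same player's next recorded level.
gap = df.groupby("UserID")["ServerTime"].shift(-1) - df["ServerTime"]

# Churn = 1 if no further play within 2 weeks of completing the level.
# A player's final recorded row has no next play, so it is treated as
# churned here (a simplification: it may just be censored data).
df["Churn"] = ((gap > pd.Timedelta(weeks=2)) | gap.isna()).astype(int)
print(df)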

The user IDs have also been included, but they have been anonymised so as not to reveal the original players’ identities. Some fields have also been removed. Nevertheless, the dataset should provide a solid basis for seeing whether Actable AI’s tools are useful in predicting whether a player will churn. The web application, rather than the Google Sheets add-on, will be used in this article.

The dataset has a total of 789,879 rows (samples), which is quite substantial and should help to reduce effects such as model over-fitting.

The meaning of each feature is as follows:

  • `ServerTime`: the server’s timestamp when the level was played
  • `EndType`: the reason why the level ended (mainly ‘Win’ if the player won the game, and ‘Lose’ if the player lost the game)
  • `LevelType`: type of the level
  • `Level`: the level number
  • `SubLevel`: sub-level number
  • `Variant`: level variant
  • `LevelVersion`: level version
  • `NextCar`: unused (included to see how the platform handles features having only 1 label, as discussed later)
  • `AddMoves`: additional moves available
  • `DoubleMana`: unused (included to see how the platform handles features having only 1 label, as discussed later)
  • `StartMoves`: number of moves available at the beginning of the level
  • `ExtraMoves`: extra moves purchased
  • `UsedMoves`: moves used by the player
  • `UsedChangedCar`: unused (included to see how the platform handles features having only 1 label, as discussed later)
  • `WatchedVideo`: whether a video was watched, providing extra moves
  • `BuyMoreMoves`: number of times a player purchased more moves
  • `PlayTime`: time spent playing the level
  • `Scores`: score achieved by the player
  • `UsedCoins`: total coins used in the level
  • `MaxLevel`: maximum level reached by the player
  • `Platform`: device type
  • `UserID`: ID of the player
  • `RollingLosses`: number of successive losses by the player
  • `Churn`: 1 if the player has not played the game in more than 2 weeks, 0 otherwise
Exploratory Data Analysis

One of the first tasks that need to be done by data scientists - and perhaps one of the more tedious ones - involves exploratory data analysis (EDA). 

EDA is a data analysis approach that involves summarizing, visualizing, and understanding the main characteristics of a dataset. The goal of EDA is to gain insights into the data and identify any patterns, trends, or anomalies that may be present.

In performing EDA, analysts often use statistical methods to calculate summary statistics such as the mean, median, and standard deviation. They also create visualizations such as histograms, scatterplots, and box plots to better understand the distribution and relationships between variables. 

Clearly, EDA is an important step in the data analysis process as it can help analysts uncover potential issues with the data, such as missing values or outliers, and inform the selection of appropriate statistical models for subsequent analyses. It can also help in understanding relationships between features, and how to combine existing features to create new ones that are more descriptive, and which could potentially improve model performance.
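As a point of reference, the same kind of summary statistics and plots can be produced with a few lines of pandas (a sketch, assuming the dataset has been exported to a file named game_data_levels.csv; on the platform itself, none of this code is needed):

import pandas as pd

df = pd.read_csv("game_data_levels.csv", parse_dates=["ServerTime"])

# Mean, standard deviation, quartiles, etc. for numerical columns.
print(df.describe())

# Counts (as percentages) for each label of a categorical column.
print(df["EndType"].value_counts(normalize=True).mul(100).round(1))

# Distribution of a numerical feature.
df["PlayTime"].plot.hist(bins=50, title="PlayTime distribution")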

As already mentioned, Actable AI provides a set of tools that allow visualizations and computation of statistics. However, before using any of these tools, the data must be provided to the platform. There are three ways to do this, namely by uploading a CSV file, by uploading an Excel file, or by connecting directly to a database: 

Various options to use data in Actable AI.

Since my data was stored in a .csv file, I chose the ‘upload CSV’ option, after which I could view the data on the platform:

Viewing the uploaded CSV file in Actable AI. Image by author.

Rows can also be quickly filtered to focus on specific values by typing the desired value into the top-most row of the relevant column. The ‘Statistics’ tab provides statistics for each column in the dataset, where the mean and standard deviation of numerical columns are computed and the number (and percentage) of samples for each label of a categorical column is given:

Statistics of some columns in the dataset. Image by author.

In the image above, we can see that the predominant cause of a level ending (represented by `EndType`) is the player losing the game (63.6% of levels), versus 35.2% of levels being won. 

We can also see that each level is played for an average of 126.6 seconds (standard deviation: 133.2 seconds), and that the `UsedChangedCar` column appears to be useless, since it contains the same value in every row. 

Furthermore, our target value is highly imbalanced, with only 63 samples out of the first 10,000 rows (i.e. 0.6% of the data) having a churn value of 1 (i.e. a player has churned). 

A useful type of analysis is the correlation between features, especially the correlations of the predictor features with the target feature. This can be done using the ‘Correlation Analysis’ tool, where some useful insights can be obtained (the results can be viewed directly on the Actable AI platform here):

Positive and Negative correlations. Image by author.

In the chart above, the blue bars indicate the correlations of features with Churn when its value is equal to 1, while the orange bars indicate the correlations when Churn is equal to 0. There are a number of takeaways, such as players that lose a level being more susceptible to churning and, conversely, players who win a level tending to keep on playing. This can also be represented using heat maps:

Heatmap of the level end type with respect to Churn. Image by author.

It is evident that the majority of samples where a player has lost the game correspond to the case where Churn is equal to 1. That said, it should also be noted that the Spearman correlations are fairly low, indicating that each feature individually is only weakly correlated with our target. This means that it will probably be necessary to perform feature engineering, whereby the existing features are used to create new ones that capture more salient information, enabling a model to make more accurate predictions.
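For comparison, a rough equivalent of this correlation analysis can be computed directly with pandas (a sketch, using the same hypothetical CSV export as above):

import pandas as pd

df = pd.read_csv("game_data_levels.csv")

# Spearman rank correlation of each numerical feature with the target,
# sorted by absolute strength.
corr = (
    df.select_dtypes("number")
      .corrwith(df["Churn"], method="spearman")
      .drop("Churn")
      .sort_values(key=abs, ascending=False)
)
print(corr)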

Training a Classification Model

In the real world, we often have to create new features that are more correlated with the target feature, thereby enabling more reliable predictions. However, before creating new features, it is worth seeing what sort of performance we can achieve using just the original features in our dataset. To do so, we need to use the classification analytic, since we would like to predict whether a player has churned or not. In other words, we want to assign one of two labels to each row, based on a set of features. 

What this means is that we can predict whether a player will churn (stop playing) based on data such as the play time, the end type, how many moves were used to complete the level, and so on. It would thus enable the game developer to respond quickly if it is predicted that many people will stop playing the game. It can also help to determine if any levels are particularly troublesome, so that more engaging levels can be designed.

Launching the classification analytic in Actable AI presents us with several options, the most important of which are as follows:

  • Time Column: the column in our table corresponding to a date/timestamp. In this case, it’s called `ServerTime`.
  • Time Range: samples (rows) in our dataset can be filtered to only include those within a certain time range. Our data spans 4 months, from January 2019 to April 2019. However, it might be a good idea to exclude the first few weeks - say, the first month - so that we can reserve some historical data from which we can compute new time-based features (see the pandas sketch below). More on this when we create new features later in the article.
  • Predicted target: the column to be predicted, which is `Churn` in our case.
  • Predictors: the columns to be used for predicting the target. Most of the columns can be used here, although we can exclude any features that are redundant, such as `UsedChangedCar`, which we examined during the EDA above.
  • Sensitive groups: The documentation states that this refers to “Groups that contain sensitive information and need to be protected from biases.” We’ll keep it simple and ignore the field in this article.
  • Extra columns: any columns that need to be included with the displayed results, which were not already chosen as part of the predicted target or predictors. We can choose `ServerTime` so that we can see the date-time of each sample.
  • Filters: filtering of the samples in our dataset. For example, if we are more interested in players that have gone beyond level 100 of the game, we can specify that here.
  • Training time limit: for how long the models should be trained. I have set this to 1 hour, just to see what sort of performance can be attained in a relatively limited amount of time. 
  • Explain predictions: whether to compute Shapley values that explain the contribution of each column of a sample on the predicted result. More on this shortly. It will be interesting to see this, so we can enable this option.
  • Optimize for quality: whether to tune the model for improved performance. It is not compatible with the ‘explain predictions’ option, but it might be worth training another model with this option selected to try and maximize performance.
  • Cross validation: whether to split data into several folds to perform a more robust evaluation. We can keep it simple and omit this.

Options selected in the Actable AI web app. Image by author.
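For reference, the Time Range filtering mentioned in the list above boils down to something like the following (a pandas sketch; on the platform, this is handled entirely by the Time Range option):

import pandas as pd

df = pd.read_csv("game_data_levels.csv", parse_dates=["ServerTime"])

# Keep February 2019 onwards, reserving January as historical data
# from which time-windowed features can later be computed.
filtered = df[df["ServerTime"] >= "2019-02-01"]
print(len(df), "->", len(filtered))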

There are also some more advanced options (located in the ‘Advanced’ tab) as follows:

  • Optimized for: the metric used to optimize the model. A number of options are available, such as the traditional accuracy, micro/macro/weighted precision/recall/F1, and so on. A detailed explanation of these metrics is beyond the scope of this article, but more information can be found in Actable AI’s glossary. In this case, we can use ROC AUC, which should be more suitable than most other metrics given the imbalanced distribution of the target observed during the EDA above.
  • Feature pruning: whether to allow fewer features to be used when training models, so that any features that decrease performance or that are not useful can be omitted. This can make models simpler and could improve performance, so it is a useful option to enable.

Once done, we can go ahead and train the models by clicking on the ‘run’ button. 

After a few minutes, the results are generated and displayed (they can also be viewed here). A number of different metrics are computed, which is not only good practice but practically necessary if we truly want to understand our model, given that each metric focuses on certain aspects of a model’s performance. The first metric displayed is the optimisation metric, with a value of 0.675. 

Evaluation Metrics. Image by author.

This is not great, but recall that we did not optimize the model for quality. Moreover, during our EDA, we noticed that the features were quite weakly correlated with the target, so it is unsurprising that performance is perhaps not ground-breaking. 

This result also highlights the importance of understanding the metrics: we would normally be very happy with an accuracy of 99.7%. However, this is largely due to the imbalanced nature of the dataset, where even a dummy classifier that simply predicts the most common class achieves a very high score. A great article about this pitfall may be viewed here.
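The following sketch illustrates the pitfall on synthetic data: a dummy classifier that always predicts the majority class attains near-perfect accuracy on a heavily imbalanced target, while its ROC AUC remains at chance level:

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (rng.random(10_000) < 0.003).astype(int)  # ~0.3% positive class

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
print("Accuracy:", accuracy_score(y, clf.predict(X)))             # ~0.997
print("ROC AUC :", roc_auc_score(y, clf.predict_proba(X)[:, 1]))  # 0.5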

ROC and precision-recall curves are also shown, which again indicate that performance is a bit poor: 

ROC curve. Image by author.

Precision-recall curve. Image by author.

These curves are also useful to determine what threshold we could use in our final application. For example, if it is desired to minimize the number of false positives, then we can select a threshold where the model obtains a higher precision, and check what the corresponding recall will be like.

A confusion matrix is also displayed, which compares the predicted labels with the actual labels. However, this assumes a probability threshold of 0.5, which is not particularly useful in our case since we might want to use different thresholds as discussed above:

Confusion Matrix, using a probability threshold of 0.5. Image by author.
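As a concrete sketch of both ideas - picking a threshold from the precision-recall curve and then computing the confusion matrix at that threshold - here is a scikit-learn example on synthetic stand-ins for the validation labels and predicted probabilities:

import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_curve

# Synthetic stand-ins for validation labels and predicted probabilities.
rng = np.random.default_rng(0)
y_true = (rng.random(1_000) < 0.1).astype(int)
y_score = 0.4 * y_true + 0.6 * rng.random(1_000)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Lowest threshold achieving at least 80% precision
# (precision[i] corresponds to thresholds[i] for i < len(thresholds)).
ok = precision[:-1] >= 0.8
threshold = thresholds[ok][0] if ok.any() else 0.5
print("Chosen threshold:", threshold)

# Confusion matrix at the chosen threshold rather than the default 0.5.
print(confusion_matrix(y_true, (y_score >= threshold).astype(int)))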

Next, we can see the feature importance table, which is perhaps one of the more interesting features in the Actable AI app. This is because, as the name suggests, it demonstrates the importance of each feature for the best model obtained. P-values are also shown to determine the reliability of the result:

Feature Importance Table. Image by author.

Perhaps unsurprisingly, the most important feature is `EndType` (showing what caused the level to end, such as a win or a loss), followed by `MaxLevel` (the highest level played by a user, with higher numbers indicating that a player is quite engaged and active in the game). On the other hand, `UsedMoves` (the number of moves performed by a player) is practically useless, and `StartMoves` (the number of moves available to a player) could actually harm performance. This also makes sense, since the number of moves used by a player and the number available to a player aren’t highly informative by themselves; however, a comparison between them could be much more useful.
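The platform does not spell out exactly how these importances and p-values are computed, but permutation importance is a common approach: shuffle one column at a time and measure how much the validation metric drops. A minimal self-contained sketch with scikit-learn:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: only the first two columns carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2_000) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each column in turn and measure the drop in ROC AUC.
result = permutation_importance(model, X_val, y_val, scoring="roc_auc",
                                n_repeats=10, random_state=0)
print(result.importances_mean)  # columns 0 and 1 should dominate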

Next up is a table showing the estimated probabilities of each class (either 1 or 0 in this case), which are used to derive the predicted class (by default, the class having a probability over 0.5, i.e. the highest probability, is assigned as the predicted class). 

Table with original values, Shapley values, and predicted values. Image by author.

Since the ‘Explain predictions’ option was selected, Shapley values are also shown. Essentially, these values show the contribution of each feature to the probability of the predicted class. For instance, in the first row, we can see that a `RollingLosses` value of 36 decreases the probability of the predicted class (0, i.e. that the person will keep playing the game) for that player. 

Conversely, this means that the probability of the other class (1, i.e. that the player churns) is increased. This makes sense, because higher values of `RollingLosses` indicate that the player has lost many levels and is thus more likely to stop playing the game. On the other hand, low values of `RollingLosses` generally increase the probability of the negative class (i.e. that a player will not stop playing). 
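Shapley values like these can also be computed with the open-source shap library (a sketch on the same kind of synthetic model as in the previous snippet; the platform computes them for you when ‘Explain predictions’ is enabled):

import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the trained model and its data.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2_000) > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Per-sample, per-feature contributions to the predicted probability,
# relative to a baseline expectation.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])

# Depending on the shap version, this is an array or one array per class.
print(np.shape(shap_values))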

More advanced information can be viewed in the ‘Leaderboard’ tab, showing details of all the models trained, including their type (e.g. XGBoost, LightGBM, etc.), their metric score, training time, prediction time, hyperparameters, and the features (columns) used. All of these models were trained automatically, without us having to write any code: importing the right packages, handling data types, setting options, and so on were all taken care of by the platform. 

Information on the models trained. Image by author.

We can then use the best model to generate new predictions in one of two ways: we can manually fill in values for each column and set the probability threshold that determines which class is chosen, or we can use the provided API to run the model on our own data.

Live model page, where values of each column can be inserted to provide a prediction using the trained model. Image by author.

At this point, we could try improving the performance of the model. Perhaps one of the easiest ways is to select the ‘Optimize for quality’ option, and see how far we can go. I did just that, and got the following results (which you can also view here):

Evaluation Metrics when using the ‘Optimize for quality’ option. Image by author.

Focusing on the ROC AUC metric that we selected as the optimisation metric, performance improved from 0.675 to 0.709. This is quite a nice increase for such a simple change. But is there something else that we can do to improve performance further? 

There is indeed a way, but it involves some added complexity in the form of feature engineering: creating new features from existing ones that capture stronger patterns and are more highly correlated with the variable to be predicted.

Creating New Features

Normally, at this point we would try to create some new features that might be more useful in predicting our target. This is admittedly an advanced step that is done by professionals, but it goes to show that the Actable AI platform can handle both simple use cases and more complex ones. 

In our case, it might be very useful to summarize records over time. For example, we can create columns where each row is updated based on a player’s past play history. This can be done using two methods in the Actable AI platform, namely (1) calculated columns or (2) SQL Lab. 

Let’s start with calculated columns. You are presented with an interface where you can provide a name for the new column and an SQL expression that is used to create it. Other options are also provided, such as specification of the column’s data type.

Calculated columns. Image by author.

SQL Lab, as the name suggests, also involves writing SQL expressions. However, you are now given a console where you can write several queries. This provides greater flexibility when creating multiple columns and when more advanced options, such as filter windows, need to be used. The query can then be run, following which the generated table can be saved as a new dataset on which the desired tools and analytics can be applied.

SQL Lab. Image by author.

If you’re not hugely familiar with SQL, you could try using something like ChatGPT to generate the queries for you. In my limited experimentation it’s a bit hit-and-miss, though, so I suggest checking the results manually to verify that the desired output is being computed correctly. Thankfully, this can easily be done by checking the table that is displayed after the query is run in SQL Lab. Here’s the SQL code I used to generate the new columns:

SELECT
    *,
    SUM("PlayTime") OVER UserLevelWindow AS "time_spent_on_level",
    (a."Max_Level" - a."Min_Level") AS "levels_completed_in_last_7_days",
    COALESCE(CAST("total_wins_in_last_14_days" AS DECIMAL)/NULLIF("total_losses_in_last_14_days", 0), 0.0) AS "win_to_lose_ratio_in_last_14_days",
    COALESCE(SUM("UsedCoins") OVER User1DayWindow, 0) AS "UsedCoins_in_last_1_days",
    COALESCE(SUM("UsedCoins") OVER User7DayWindow, 0) AS "UsedCoins_in_last_7_days",
    COALESCE(SUM("UsedCoins") OVER User14DayWindow, 0) AS "UsedCoins_in_last_14_days",
    COALESCE(SUM("ExtraMoves") OVER User1DayWindow, 0) AS "ExtraMoves_in_last_1_days",
    COALESCE(SUM("ExtraMoves") OVER User7DayWindow, 0) AS "ExtraMoves_in_last_7_days",
    COALESCE(SUM("ExtraMoves") OVER User14DayWindow, 0) AS "ExtraMoves_in_last_14_days",
    AVG("RollingLosses") OVER User7DayWindow AS "RollingLosses_mean_last_7_days",
    AVG("MaxLevel") OVER PastWindow AS "MaxLevel_mean"
FROM (
    SELECT
        *,
        MAX("Level") OVER User7DayWindow AS "Max_Level",
        MIN("Level") OVER User7DayWindow AS "Min_Level",
        SUM(CASE WHEN "EndType" = 'Lose' THEN 1 ELSE 0 END) OVER User14DayWindow AS "total_losses_in_last_14_days",
        SUM(CASE WHEN "EndType" = 'Win' THEN 1 ELSE 0 END) OVER User14DayWindow AS "total_wins_in_last_14_days",
        SUM("PlayTime") OVER User7DayWindow AS "PlayTime_cumul_7_days",
        SUM("RollingLosses") OVER User7DayWindow AS "RollingLosses_cumul_7_days",
        SUM("PlayTime") OVER UserPastWindow AS "PlayTime_cumul"
    FROM "game_data_levels"
    WINDOW
        User7DayWindow AS (
            PARTITION BY "UserID"
            ORDER BY "ServerTime"
            RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
        ),
        User14DayWindow AS (
            PARTITION BY "UserID"
            ORDER BY "ServerTime"
            RANGE BETWEEN INTERVAL '14' DAY PRECEDING AND CURRENT ROW
        ),
        UserPastWindow AS (
            PARTITION BY "UserID"
            ORDER BY "ServerTime"
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        )
) AS a
WINDOW
    UserLevelWindow AS (
        PARTITION BY "UserID", "Level"
        ORDER BY "ServerTime"
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ),
    PastWindow AS (
        ORDER BY "ServerTime"
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ),
    User1DayWindow AS (
        PARTITION BY "UserID"
        ORDER BY "ServerTime"
        RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND CURRENT ROW
    ),
    User7DayWindow AS (
        PARTITION BY "UserID"
        ORDER BY "ServerTime"
        RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
    ),
    User14DayWindow AS (
        PARTITION BY "UserID"
        ORDER BY "ServerTime"
        RANGE BETWEEN INTERVAL '14' DAY PRECEDING AND CURRENT ROW
    )
ORDER BY "ServerTime";

Most of the feature names should be self-explanatory; for example, we now compute the total amount of time that a user has played the game (i.e. not just in the current session), the number of coins used in the last day, week, and two weeks, and so on. All of these are intended to provide historical context on the player’s journey through the game.
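For comparison, the same kind of time-windowed feature can be computed in pandas with a grouped rolling window; for example, a rough equivalent of SUM("UsedCoins") OVER User7DayWindow (a sketch; window edge handling may differ slightly from the SQL):

import pandas as pd

df = pd.read_csv("game_data_levels.csv", parse_dates=["ServerTime"])
df = df.sort_values(["UserID", "ServerTime"])

# Rolling 7-day sum of UsedCoins per player; the rows come back in the
# same order because the frame is sorted by UserID and ServerTime.
df["UsedCoins_in_last_7_days"] = (
    df.set_index("ServerTime")
      .groupby("UserID")["UsedCoins"]
      .rolling("7D")
      .sum()
      .to_numpy()
)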

Once satisfied with the features created, we can then save the table as a new dataset, and run a new model that should (hopefully) attain better performance.

Training a new (hopefully improved) Classification Model

Time to see whether the new columns are of any use. We can repeat the same steps as before, the only difference being that we now use the new dataset containing the additional features. The same settings are used as for the original model to enable a fair comparison, with the following results (which can also be viewed here):

Evaluation Metrics using the new columns. Image by author.

Again focusing on the ROC AUC, the score is much improved compared with the original value of 0.675; it is even better than that of the model optimized for quality (0.709). This demonstrates the importance of understanding your data and creating new features that provide richer information. 

It would now be interesting to see which of our new features were actually the most useful; again, we could use the feature importance table:

Feature importance table of the new model. Image by author.

It looks like the total number of losses in the last 2 weeks is quite important, which makes sense: the more often a player loses, the more likely they are to become frustrated and stop playing. The average maximum level across all users also seems to be important, which again makes sense, since it can be used to determine how far a player is from the majority of other players (a value much higher than the average indicates that a player is well immersed in the game, while a value much lower than the average could indicate that the player is not yet well motivated). 

These are only a few simple features that we could have created; others could improve performance further. I will leave it as an exercise for the reader to see what other features could be created. 

Training a model optimized for quality with the same time limit as before did not improve performance. However, this is perhaps understandable, since a greater number of features is being used, so more time may be needed for optimisation. As can be observed here, increasing the time limit to 6 hours indeed improves performance to 0.923 (in terms of the ROC AUC):

Evaluation metric results when using the new features and optimizing for quality. Image by author.
Conclusions

In conclusion, we have used the Actable AI platform to load a dataset, explore its features, create new features, and train classification models. The feature importances and the contribution of each feature value on the probability of the predicted class are also computed, which can be studied to determine what useful insights could be derived in order to improve model performance. They could also be used to help the game developer in improving level design to keep players more engaged, decrease the churn rate, and in turn increase revenue.

As mentioned above, there’s much more that can be done. The models can be trained for longer, more features can be created, different filters and optimisation metrics can be used, and other ways to define churn can be considered. 

Other tools could also be used, such as counterfactual analysis (to determine the effect on the outcome when one of the inputs changes), time-series forecasting (to determine if there are any seasonal trends), and causal inference (which considers interactions among features to determine whether a change in the predictor variables actually causes changes to the outcome).
