Build a model predicting which customers are likely to cancel their subscription by analyzing their usage and habits

LaurentVeyssier/Minimizing-Churn-of-subscription-product-through-usage-analysis

Minimizing-Churn-of-subscription-product-through-usage-analysis

Build a model that predicts which customers are likely to cancel their subscription by analyzing how they use the application and their habits.

This project illustrates how machine learning can optimize a digital app's revenue. It can be read alongside Directing customers to subscription and identify most important drivers, which I covered separately.

Problem statement

For many digital applications, subscriptions are the main source of revenue. They provide a continuous flow of revenue, making it possible to finance growth and new developments. Subscription-based companies want to reduce customer churn as much as possible and preserve customer lifetime value. Acquiring a customer is far more expensive than keeping an existing one, so retaining customers is extremely valuable.

To retain customers, companies must identify behavioral patterns that give early warning of customer disengagement (such as customers starting to lose interest in the service). Identifying the early signs and catalysts of disengagement is a tremendous opportunity to re-engage these customers with the product or service.

This business case proposes to build a model predicting which customers are likely to churn so that the company can focus on retaining them. For example, the company can develop new features that customers would be interested in, or provide information on value-added services the user was not aware of or has forgotten about.

This situation applies to a wide variety of business cases and subscription services. It assumes the company has data on how customers use the service: which features they use, how they interact with the various services, and so on.

Business case

In this project, the business case involves a digital app offering financial / banking services such as loans, credit cards, purchases and deposits to name a few. We assume that, by subscribing, customers have provided data on their financial situation and how they use the service. Demographic information is also likely to be available as it is acquired during the sign-up process.

Behavioral data (usage log)

The information available is not time-bound. It includes a log of activity that combines usage data (number of deposits made, number of purchases made, credit card used, app usage patterns and frequency...) and customer-specific information (income and revenues, referrals, type of mobile platform, housing situation...).

The dataset includes activity logs for 27,000 customers. It also includes the response variable: whether the customer churned or not.

Steps in the Notebook

  • Explore, clean and normalize the dataset
  • Construct the machine learning model
  • Train the model and evaluate its performance
  • Examine the key variables influencing churn, whether they drive it or reduce it
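The steps above can be sketched end to end with scikit-learn. This is only a minimal illustration, not the notebook's actual code: the data is synthetic (generated with `make_classification` as a stand-in for the usage log), and the split ratio, `max_iter` value and variable names are assumptions made for the example.

```python
# Illustrative sketch only: synthetic data stands in for the real usage log.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the cleaned dataset: 41 numeric features
# and a binary label (1 = churned, 0 = stayed).
X, y = make_classification(
    n_samples=2000, n_features=41, n_informative=10, random_state=0
)

# Hold out a test set so performance is measured on unseen customers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Normalize the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Recall: the share of actual churners that the model flags.
y_pred = model.predict(X_test)
recall = recall_score(y_test, y_pred)
print(f"recall on held-out data: {recall:.2f}")

# Signed coefficients indicate which features push a prediction
# toward churn (positive) or away from it (negative).
coefficients = model.named_steps["logisticregression"].coef_[0]
```

Because the features are standardized before fitting, the magnitudes of the coefficients are roughly comparable across variables, which is what makes them usable as a first look at churn drivers.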

Results

A logistic regression model is fit on the training data using scikit-learn (sklearn). The model detects 74% of the customers who cancel their subscription (recall metric), an acceptable performance considering our objective of detecting leavers and acting on them. The model is weaker on the precision metric (amongst predicted leavers, the percentage of users who have really left). This means the model generates many "false positives", resulting in unnecessary re-engagement actions. Yet the model performs well on our main objective, identifying 74% of the potential leavers.
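To make the two metrics concrete: recall is the share of actual leavers the model catches, while the share of predicted leavers who really left is usually called precision. The sketch below computes both from scratch; the function name and the ten labels are made up for illustration and are not taken from the project data.

```python
def recall_and_precision(y_true, y_pred):
    """Return (recall, precision) for the positive class (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # leavers caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # leavers missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up labels for ten customers (1 = churned, 0 = stayed).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

recall, precision = recall_and_precision(y_true, y_pred)
print(f"recall={recall:.2f}, precision={precision:.2f}")
# prints: recall=0.75, precision=0.50
```

In this toy case the model catches three of the four leavers, but half of the customers it flags would have stayed anyway, which mirrors the trade-off described above.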

What does the model do? The model identifies a hyperplane that separates leavers from non-leavers with the lowest error. In a multidimensional space with the decision variables as dimensions (activity log and demographic information), key variables are combined to form the best decision boundary. Out of the 41 variables available, the project identifies 20 that can predict, without performance deterioration, which users are starting to lose interest in the app. This makes it possible to focus on a limited number of key variables, and it suggests that more than half of the variables add little predictive value.
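One possible way to reduce 41 variables to 20 is recursive feature elimination (RFE). The text does not say which selection method the notebook uses, so RFE is an assumption here, and the data is again synthetic. The sketch also shows the hyperplane: the fitted weights `w` and intercept `b` define the decision boundary `w · x + b = 0`.

```python
# Sketch under assumptions: RFE is one possible selection method,
# and the data below is synthetic, not the project's dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=1000, n_features=41, n_informative=10, random_state=0
)
X_scaled = StandardScaler().fit_transform(X)

# Repeatedly fit the model and drop the weakest feature until 20 remain.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000), n_features_to_select=20
)
selector.fit(X_scaled, y)

# Column indices of the 20 retained features.
selected = [i for i, keep in enumerate(selector.support_) if keep]

# The model refit on the retained features defines the hyperplane
# w . x + b = 0 that separates predicted leavers from non-leavers.
reduced_model = selector.estimator_
weights = reduced_model.coef_[0]
intercept = reduced_model.intercept_[0]
print(f"kept {len(selected)} of {X.shape[1]} features")
```

Comparing the recall of the reduced model against the full model on held-out data would be the natural check that dropping variables does not hurt performance.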

Analyzing the key drivers of subscription cancellation or continued engagement can yield valuable information for refining the company's value proposition and the app's user experience, both of which have a direct impact on customer satisfaction and retention.
