Calculating churn probability is an important part of fighting churn because it supports three key use cases:

- Evaluating which behaviors are most important for engagement
- Calculating customer lifetime value
- Segmenting customers for high-cost interventions

In this post I give an introduction to logistic regression, the most common and versatile way to calculate churn probabilities. I assume you have already calculated metrics and created a data set by following the instructions in my previous posts (or in the book, Fighting Churn With Data).

### Model of Churn Probability

The logistic regression model for churn is built on two key concepts:

- More engagement causes a higher chance of retention, but there are diminishing returns at both very low and very high levels of engagement.
- Engagement follows a multiplicative model: you multiply each behavioral metric by a “weight” (or coefficient) and add up the results, so engagement is a weighted sum of behaviors.

First, the picture below shows the diminishing returns to engagement (and disengagement). The curve relating engagement to retention probability is “S” shaped: it flattens out at both ends, where extra engagement or disengagement changes the probability very little.
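A minimal sketch of the S-shaped curve, assuming the standard logistic (sigmoid) function that logistic regression uses to turn engagement into a retention probability. The engagement values and the step size of 1 are arbitrary illustrations. Equal steps in engagement move the probability a lot near the middle of the curve and very little at either end, which is the diminishing-returns effect.

```python
import math


def retention_probability(engagement: float) -> float:
    """Logistic (sigmoid) function: maps any engagement value into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-engagement))


def gain_from_one_more_unit(engagement: float) -> float:
    """How much the retention probability rises when engagement increases by 1."""
    return retention_probability(engagement + 1.0) - retention_probability(engagement)


# Compare the gain from the same +1 step at different points on the curve
for e in (-6.0, -3.0, 0.0, 3.0, 6.0):
    print(f"engagement {e:+.0f}: P(retain) = {retention_probability(e):.3f}, "
          f"gain from +1 = {gain_from_one_more_unit(e):.3f}")
```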

Next, the diagram below shows how the multiplicative model calculates engagement from behaviors and weights. Don’t worry if it looks complicated: you don’t have to keep track of all that multiplication yourself (that’s what computers are for!).
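The weighted sum takes only a few lines of Python. The metric names, weights and intercept below are hypothetical placeholders rather than results from a real model; in practice the weights come out of the logistic regression fit. The sketch also passes engagement through the S-shaped curve and converts retention probability to churn probability (churn = 1 − retention).

```python
import math

# Hypothetical engagement weights (coefficients), one per behavioral metric
WEIGHTS = {"logins_per_month": 0.4, "reports_created": 0.7, "support_tickets": -0.2}
INTERCEPT = -1.0  # hypothetical baseline engagement when every metric is zero


def engagement(metrics: dict[str, float]) -> float:
    """Multiply each metric by its weight and add the results, plus the intercept."""
    return INTERCEPT + sum(WEIGHTS[name] * value for name, value in metrics.items())


def churn_probability(metrics: dict[str, float]) -> float:
    """Map engagement to retention probability with the logistic curve, then flip to churn."""
    retention = 1.0 / (1.0 + math.exp(-engagement(metrics)))
    return 1.0 - retention


# Hypothetical customer metrics
customer = {"logins_per_month": 5.0, "reports_created": 2.0, "support_tickets": 1.0}
print(f"engagement = {engagement(customer):.2f}")
print(f"churn probability = {churn_probability(customer):.3f}")
```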

Of course, you are probably wondering how to determine the engagement weights. Finding the weights is what the logistic regression algorithm is for. The Python code is shown below.

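```
from sklearn.linear_model import LogisticRegression

# churn_data: one row per customer observation, one column per behavioral metric.
# churn_outcomes: the outcome for each row. For the weights to measure engagement,
# code it as retention (1 = retained, 0 = churned); if churn is coded as 1 the signs flip.
retain_reg = LogisticRegression(fit_intercept=True,
                                solver='liblinear', penalty='l1')
retain_reg.fit(churn_data, churn_outcomes)
# scikit-learn stores the fitted weights in coef_ (note the trailing underscore)
engagement_weights = retain_reg.coef_[0]
```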

For details, check out the full source code (listing 8.2 from Fighting Churn With Data) in my GitHub repository.
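To see what “finding the weights” means, here is a from-scratch sketch that fits a logistic regression by plain gradient ascent on the average log-likelihood. This is a swapped-in teaching technique, not what scikit-learn’s liblinear solver does, and it leaves out the L1 penalty used in the listing. The tiny data set is made up: one hypothetical metric, with outcomes coded as retention (1 = retained), where customers with higher metric values were retained more often.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: list[list[float]], ys: list[int],
                 learning_rate: float = 0.1, steps: int = 5000) -> tuple[float, list[float]]:
    """Fit an intercept and weights by gradient ascent on the average log-likelihood.

    ys are retention outcomes: 1 = retained, 0 = churned.
    """
    n = len(xs)
    n_features = len(xs[0])
    intercept = 0.0
    weights = [0.0] * n_features
    for _ in range(steps):
        grad_b = 0.0
        grad_w = [0.0] * n_features
        for x, y in zip(xs, ys):
            # Observed outcome minus the currently predicted retention probability
            error = y - sigmoid(intercept + sum(w * v for w, v in zip(weights, x)))
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        # Step uphill on the log-likelihood
        intercept += learning_rate * grad_b / n
        weights = [w + learning_rate * g / n for w, g in zip(weights, grad_w)]
    return intercept, weights


# Made-up data: one behavioral metric per customer and whether they were retained
metric = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
retained = [0, 0, 1, 0, 1, 0, 1, 1]
b, w = fit_logistic(metric, retained)
print(f"intercept = {b:.2f}, engagement weight = {w[0]:.2f}")
```

The fitted weight comes out positive because retention rises with the metric in this toy data. With the scikit-learn model from the listing, `retain_reg.predict_proba(churn_data)` returns one probability column per class, in the order given by `retain_reg.classes_`.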

### Churn Prediction Case Studies

With this in mind, let’s look at some real examples. The table below shows a result from a case study of Broadly, a SaaS product that businesses use to manage their customers and reviews. As I explain in my post on behavioral correlation, metrics are grouped into related areas. The top two groups for Broadly are metrics for 1) customers & messages and 2) reviews. The results show that the engagement weight for reviews is higher: 0.55, compared to 0.16 for customers & messages. This illustrates how you can use logistic regression to find the most engaging behavior areas in your own product.
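One way to read the Broadly weights, assuming they are ordinary logistic regression coefficients on the retention outcome: a one-unit increase in a group’s score multiplies the odds of retention by e raised to the weight, holding the other group fixed. The sketch uses only the two weights quoted in the case study; what “one unit” means depends on how the group scores were scaled, which is not stated here.

```python
import math

# Engagement weights for Broadly's two top metric groups (from the case study)
group_weights = {"customers & messages": 0.16, "reviews": 0.55}

# Rank the groups from most to least engaging
ranked = sorted(group_weights.items(), key=lambda item: item[1], reverse=True)

for group, weight in ranked:
    # Multiplicative effect of +1 unit of the group score on the odds of retention
    odds_multiplier = math.exp(weight)
    print(f"{group}: weight {weight:.2f}, retention odds x{odds_multiplier:.2f} per unit")
```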

The next trick is to use histograms to visualize the churn probabilities of all your customers. A histogram counts how many customers have a churn probability within each of a sequence of ranges, and shows the count for each range as the height of a bar. To illustrate this, three examples of churn probability histograms are shown below. Note that the exact churn rates are hidden for confidentiality reasons.
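The counting behind a histogram is simple to do by hand. This sketch assumes a hypothetical list of per-customer churn probabilities, all between 0 and 1, and ten equal-width ranges; it counts the customers in each range and prints a crude text bar for each count.

```python
def churn_histogram(probabilities: list[float], n_bins: int = 10) -> list[int]:
    """Count how many churn probabilities fall into each of n_bins equal ranges over [0, 1].

    Assumes every probability is between 0 and 1.
    """
    counts = [0] * n_bins
    for p in probabilities:
        # Map p to a range index; p == 1.0 goes into the last range instead of overflowing
        index = min(int(p * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical churn probabilities for a handful of customers
probs = [0.02, 0.04, 0.05, 0.08, 0.12, 0.15, 0.31, 0.55, 0.90, 1.0]
for i, count in enumerate(churn_histogram(probs)):
    low, high = i / 10, (i + 1) / 10
    print(f"{low:.1f}-{high:.1f}: {'#' * count}")
```

With matplotlib, `plt.hist(probs, bins=10, range=(0, 1))` draws an equivalent bar chart.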

As you can see, there are at least three patterns of customer churn probability. Unfortunately, explaining them in detail would be too much for a blog post! In future posts I will say more about using churn probability forecasts to segment customers and calculate customer lifetime value.

If you want more details about logistic regression and calculating churn probabilities, check out Chapter 8 of my book, Fighting Churn With Data. You can now get the entire book as an e-book through my publisher Manning’s Early Access Program (MEAP).