Customer Behavior Correlation - Fighting Churn With Data

This post explains how to understand customer behavior correlation and how it impacts your churn reduction strategies.

Why Should You Care About Correlation Between Customer Behaviors?

My last post was on understanding customer behavior and churn. I demonstrated that customer metrics show you a lot about customer health.

But many customer behaviors are closely related, so metrics of those behaviors will have a similar relationship to churn. And when you do cohort churn analysis on a typical company’s database of events, you will probably have dozens of metrics or more.

To fight churn effectively with your data, you need to do more than understand how individual customer behaviors are related to churn. You also need to understand how customer behaviors are correlated with each other. Once you do, instead of looking at how individual behaviors are related to churn, you can look at how groups of behaviors are related to churn. That turns the problem of having too much data into an asset, because groups of behaviors often show a clearer relationship to churn than individual behaviors alone.

Customer Behavior Correlation Scatter Plots

One of the easiest ways to understand correlation between behaviors is with scatter plots like the ones shown below. These show examples of metrics with different degrees of correlation from company case studies:

Klipfolio: Klipfolio is a Software as a Service (SaaS) company that allows businesses to create online dashboards of their key metrics. Klipfolio’s metrics include editing, saving and viewing dashboards and klips.
Broadly: Broadly is a SaaS company that helps businesses manage their online presence. Broadly’s metrics include adding customers and transactions, asking for reviews, and getting promoters.
Versature: Versature is a provider of cloud-based business communication solutions. Versature’s metrics include calls and other services.

(A big Thank You to these companies for allowing me to share their case study data with you!)

Scatter plots and correlation measurements show when customers who engage in one behavior often engage in another behavior. Two metrics or behaviors are correlated when a customer who has a high value in the first metric usually also has a high value in the second metric. And a different customer with a low value on the first metric usually also has a low value on the second. You can also describe correlation by saying that an increase in the first metric is associated with an increase in the second metric.

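Scatter plots like these are easy to create in Python:

import matplotlib.pyplot as plt

# metric1_series and metric2_series are pandas Series, one value per customer

plt.scatter(metric1_series, metric2_series, marker='.')

plt.title('Correlation = {}'.format(metric1_series.corr(metric2_series)))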

For more details on the code, see my GitHub repository.
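
To make the correlation number in the plot title more concrete, here is a minimal sketch that computes it for two pairs of metrics. The metric names and values are made up for illustration and do not come from any of the case studies. The only library call it relies on is pandas `Series.corr`, which computes the Pearson correlation coefficient by default.

```python
import pandas as pd

# Hypothetical monthly metrics for six customers (illustrative values only)
dashboard_edits = pd.Series([1, 3, 4, 8, 10, 15])
dashboard_views = pd.Series([5, 9, 14, 22, 33, 41])  # rises along with edits
support_tickets = pd.Series([2, 0, 3, 1, 2, 1])      # no clear pattern

# Series.corr uses the Pearson correlation coefficient by default
strong = dashboard_edits.corr(dashboard_views)
weak = dashboard_edits.corr(support_tickets)

print('edits vs. views:   {:.2f}'.format(strong))
print('edits vs. tickets: {:.2f}'.format(weak))
```

The first pair should come out close to 1, because customers with many edits also have many views. The second pair should come out much closer to 0, because the ticket counts do not move with the edit counts.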

Customer Behavior Correlation Heatmaps

Scatter plots are useful for understanding the relationships between pairs of metrics that you are interested in, but they are an inefficient way to investigate the correlations between the pairs in a large set of metrics. That’s because if you have even a moderate number of metrics there will be a much larger number of combinations. A correlation matrix is a table of all of the pairwise correlation coefficients between the metrics in a data set. This is a much more efficient way to look at a large number of correlations.
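
To see how quickly the number of pairs grows, note that N metrics have N * (N - 1) / 2 distinct pairs. The short sketch below prints that count for a few metric counts. The function name `count_metric_pairs` is just an illustrative helper, not something from the book or the repository.

```python
from math import comb


def count_metric_pairs(n_metrics):
    """Number of distinct pairs among n_metrics metrics: N * (N - 1) / 2."""
    return comb(n_metrics, 2)


for n_metrics in (5, 10, 20, 50):
    print('{} metrics -> {} pairs'.format(n_metrics, count_metric_pairs(n_metrics)))
```

Even twenty metrics would mean 190 separate scatter plots, which is why a single table of coefficients is easier to work with.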

Below is an example from the Klipfolio case study. It shows the correlation matrix with the metrics organized into correlated groups.

(Thank you to Klipfolio for allowing me to share their case study data in my book and this post!)

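Making a correlation matrix in Python is also pretty easy, if you already have the dataset:

import pandas as pd

# Load the dataset of customer metrics

churn_data = pd.read_csv(data_set_path)

# Pairwise correlation coefficients between all of the metric columns

corr = churn_data.corr()

corr.to_csv(save_name)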

But note that I don’t recommend making heatmaps in Python. If there are more than fifteen or twenty metrics, it’s too much: either the heatmap image must be enormous, or the metric names and correlation values are too small to read. So it’s not practical to explore large correlation heatmaps in static images. You should definitely inspect the correlation heatmap closely, but it is usually easier to view it in a spreadsheet application. Freeze the metric name row and column to make the matrix scrollable, and use conditional formatting to add the heatmap colors. For presentations you can export various formatted versions.

Grouping Correlated Customer Behaviors

So how do you handle it when you have a lot of correlated customer behaviors? I recommend grouping them and calculating an average score for each group. The concept is illustrated in the figure below. The diagram shows a small set of customer behavior metrics that have been converted into scores (also known as normalized values). The metrics are correlated, so they can be averaged together with a loading matrix, which is a table of weights that says how much each metric contributes to each group.
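
As a rough sketch of the idea, the code below converts a few metrics into scores and averages one correlated group with a loading matrix. Everything specific here is an assumption made for illustration: the metric names and values are invented, the scores are plain z-scores (subtract the mean, divide by the standard deviation), and the group membership is chosen by hand. The approach in the book may differ in its details.

```python
import pandas as pd

# Hypothetical metrics for five customers (illustrative values only)
metrics = pd.DataFrame({
    'dashboard_edits': [1, 3, 4, 8, 10],
    'dashboard_saves': [2, 5, 6, 12, 18],
    'dashboard_views': [5, 9, 14, 22, 33],
    'support_tickets': [2, 0, 3, 1, 2],
})

# Convert each metric to a score (assumed here to be a z-score)
scores = (metrics - metrics.mean()) / metrics.std()

# Loading matrix: one row per metric, one column per group.
# Each metric in a group gets weight 1 / (group size), so the
# matrix product below is a plain average of the group's scores.
loadings = pd.DataFrame(
    [[1 / 3, 0.0],
     [1 / 3, 0.0],
     [1 / 3, 0.0],
     [0.0, 1.0]],
    index=metrics.columns,
    columns=['dashboard_group', 'support_tickets'],
)

# (customers x metrics) times (metrics x groups) gives (customers x groups)
grouped_scores = scores.dot(loadings)
print(grouped_scores.round(2))
```

Writing the grouping as a matrix keeps it easy to inspect: each column of the loading matrix lists which metrics belong to a group, and a metric that is not correlated with anything else simply stays in a group of its own.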

After metrics have been grouped together with a loading matrix, it is useful to run metric cohort churn analyses on the grouped metrics. The picture below shows an example result from the Klipfolio case study.

This shows that grouped metrics can be a very effective way to understand how correlated customer behaviors relate to churn. For more detailed examples you’ll have to check out my book, Fighting Churn with Data, chapter 6.
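
For readers who want to try a cohort analysis on a grouped score, here is a minimal sketch. It runs on synthetic data generated inside the snippet, so the numbers mean nothing beyond illustration. It also assumes that cohorts are ten equal-sized groups of customers ordered by score, which is one common choice and not necessarily the exact procedure used in the case study. The column names `group_score` and `is_churn` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: a group score and a churn flag,
# generated so that higher scores make churn less likely
rng = np.random.default_rng(0)
n_customers = 1000
group_score = rng.normal(size=n_customers)
churn_probability = 1.0 / (1.0 + np.exp(1.5 + group_score))
customers = pd.DataFrame({
    'group_score': group_score,
    'is_churn': rng.random(n_customers) < churn_probability,
})

# Split customers into ten equal-sized cohorts, ordered by the group score
customers['cohort'] = pd.qcut(customers['group_score'], q=10, labels=False)

# Average score and churn rate within each cohort
cohort_summary = customers.groupby('cohort').agg(
    mean_score=('group_score', 'mean'),
    churn_rate=('is_churn', 'mean'),
)
print(cohort_summary.round(3))
```

Plotting `churn_rate` against `mean_score` gives the kind of cohort curve discussed above. Because the synthetic churn flag was built to depend on the score, the lowest cohort should show a clearly higher churn rate than the highest one.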

Grouping Correlated Behaviors vs. Dimension Reduction

If you have formal training in data science or statistics you’ll probably recognize that this post covers the idea and practice of dimension reduction. But because of the need to communicate the concepts to business people, I refer to it as Behavioral Grouping. That describes the key result in plain English. If you are formally trained you’ll also think that I stick to a very basic kind of dimension reduction. However, I caution you against thinking of this as a “dumbed down” approach. The approach is deliberately simple, true, and it is not “optimal” in the usual statistical sense. But it is optimized for explainability, and it is also excellent for robustness and out-of-sample predictive performance in the face of messy data and a problem that never stops changing (i.e. churn is “non-stationary”).

How Do You Find Groups of Correlated Customer Behaviors?

At this point you may be wondering how to discover groups of correlated behaviors, like the ones in the case study above. Unfortunately, that’s too much for one blog post. I’ll explain my techniques in a future post, or you’ll see them when chapter 6 of my book comes out in the early access electronic edition of Fighting Churn With Data. (At the time of this post, chapters 1-4 are in the ebook, and chapter 5 is coming out soon…)

Happy Churn Fighting!