How To Create a Churn Dataset

[Figure: Creating a churn dataset in the process of fighting churn with data]

The scenario in this blog post assumes you have already created some behavioral metrics (as described in User Metrics 101) and calculated some kind of churn rate measurement (see How to Calculate Churn With SQL). This post describes a preparation step for the churn analysis: You are going to create a churn dataset! That means collecting observations of customer metrics at times when customers churned or continued on the service, and combining them in a single table. Creating a dataset is a necessary step before you can analyze churn.

A Dataset of Churn Experiments

The essence of fighting churn with data is learning from the “natural experiments” that occur every time a customer chooses to stay with or churn from the service. A natural experiment in this context means a situation that tests an outcome you are interested in, but that you didn’t set up like a formal experiment. These experiments are the churns and renewals that have already occurred, and the results are waiting for you in your data warehouse! You might be asking yourself why you aren’t learning from the results already. Well, observing these “experiments” and reading the results can be a little tricky if you’ve never done it before. This post talks about the right way to observe the customer experiments that have already taken place in your own data, and how to combine them into a churn dataset that makes them useful for analysis.

As in most of these scenarios, the challenge is partly due to complexity and partly due to logistical considerations.  The complexity challenge in observing a lot of customers is that they are all at different points in their journey with your product.  So it doesn’t make sense to just look at all your customers right now, or at any single fixed point in time. You want to observe them all at the same point (or points) relative to their own lifecycle on the product, which makes them comparable.  If you do this incorrectly and observe at the wrong times it might distort your analysis and be counterproductive in the fight against churn. This post will teach you how to pick appropriate observation points in the customer life cycle.

This combined set of customer snapshots is called a dataset of customer churn and renewal observations, or simply the churn dataset. In case you are not already familiar with the term, a dataset is a term used in data science and statistics for a collection of data assembled for a particular analysis. When a collection of data is called a “dataset” it implies that the data is organized in a table with the same number of columns in every row, and in which every row contains complete information for one instance or observation of the phenomenon in question (meaning that separate rows are separate observations).

Churn Leading & Lagging Behaviors

[Figure: Leading and lagging behavior when creating a churn dataset]

In order to create a churn dataset you need to observe the “natural experiments” that occur when customers churn or continue to use a product. So you need to start by asking when to make the observation. Easy, right? Observe a customer when they have churned – isn’t that the point? Not quite. Ask this – what will a customer’s behavioral metrics for a media sharing app be once the customer has churned: Logins? Zero. Downloads? Zero. Likes? Zero. Because they’ve churned, all their behaviors on the product will have stopped.

So observe customers before they churned. Right! I call this observing customers with a lead time, which means making the observation before the thing you are really interested in (the renewal or churn). But how long before the churn should you observe a customer – a day before they churned? Maybe, but most likely you should observe what customers were doing even longer before they churned. That’s because the behavior of customers often changes in the time immediately before they churn. This is illustrated in the drawing above with a hypothetical example for a media sharing service: If someone is planning to churn, some behaviors are likely to be reduced in the period right before the churn while others may actually increase. In this hypothetical service, uploads may completely stop before churn since the customer doesn’t want to waste time contributing anything else; instead they focus on downloading content before their access to the service ends. Logins may even increase in the period before churn, before going to zero.

For some products, these kinds of changes in behavior may make likely churners easy to spot in the period before churn. But behaviors brought on by imminent churn are still not what you want to observe, because they won’t tell you why the customer chose to churn in the first place. You want to observe what the customer was like in the time before they decided to churn, because then you are observing what a customer looks like when they are still making up their mind. This is important because that is when you still have the best chance to influence them! I will emphasize the point:

TAKEAWAY The goal of churn analysis is to identify and understand customers who are still making up their mind about churn, because that is when you have the best chance of influencing them.

How do you know when customers are still making up their mind about churning or continuing with the product? You can’t know exactly, so you have to observe customers at a time when it is reasonable to expect them to be thinking about their next renewal: Not immediately after the last renewal, and not right before the upcoming renewal where they might churn. The exact amount of time depends on the service, but generally the longer the commitment and the more expensive the service, the longer the lead time should be:

  • For a monthly subscription, observe customers 1-2 weeks before the monthly renewal, or about 1/2 to 3/4 of the way through the current month.
  • For annual subscriptions with consumers or small businesses, observe customers about one month before the annual renewal.
  • For annual subscriptions with large businesses, observe anywhere from two to four months before the renewal; ninety days is typical.
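As a rough illustration of the guidelines above, here is a minimal sketch that turns a renewal date into an observation date. The plan names and the exact day counts are my own assumptions, picked from within the suggested ranges; you should tune them to your own service.

```python
from datetime import date, timedelta

# Hypothetical lead times, chosen from within the ranges suggested above.
LEAD_TIME_DAYS = {
    "monthly": 10,            # 1-2 weeks before a monthly renewal
    "annual_consumer": 30,    # about one month before an annual renewal
    "annual_enterprise": 90,  # two to four months; ninety days is typical
}

def observation_date(renewal_date: date, plan_type: str) -> date:
    """Return the date on which to observe a customer ahead of a renewal."""
    return renewal_date - timedelta(days=LEAD_TIME_DAYS[plan_type])

print(observation_date(date(2020, 6, 30), "monthly"))  # 2020-06-20
```

Keeping the lead times in one lookup table makes it easy to try a different lead time later without touching the rest of the logic.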

Renew, Renew, Renew, Renew, Churn!

Of course you don’t just want to observe customers who churn – you also want to observe customers who renew. That way you can compare churns and renewals and see the difference. In fact, you don’t just want to pick a few renewals – for the purpose of the analysis, you want to observe enough renewals that the renewals in your dataset are in proportion to the true retention rate.

TAKEAWAY When you create a churn dataset, try to include renewals in proportion to the true retention rate and churns in proportion to the true churn rate.

So for example if you have a 5% churn rate and a 95% retention rate you want to make a set of observations that are also around 5% churns and 95% renewals.    That may sound complicated to arrange, but in fact it’s very easy: You just observe every renewal for every account, as well as the churns. This will naturally lead to you having about the same proportion of renewals and churns in your observations as your true churn rate.
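To see why this works, here is a small arithmetic sketch. The three accounts and their renewal counts are made up for illustration; the point is only that counting one observation per renewal plus one per churn gives a churn share you can compare against your measured churn rate.

```python
# Hypothetical accounts: how many times each renewed, and whether it later churned.
accounts = [
    {"id": "a", "renewals": 19, "churned": True},
    {"id": "b", "renewals": 12, "churned": False},  # still active, no churn yet
    {"id": "c", "renewals": 7, "churned": True},
]

# One observation per renewal, plus one more for each churn.
renewal_observations = sum(a["renewals"] for a in accounts)
churn_observations = sum(1 for a in accounts if a["churned"])
total_observations = renewal_observations + churn_observations

churn_share = churn_observations / total_observations
print(f"{churn_observations} churns in {total_observations} observations: {churn_share:.1%}")
```

In this made-up example there are 2 churns among 40 observations, so churns make up 5% of the dataset and renewals the other 95%.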

If the subscription does not have a fixed term or it automatically renews, observations should be made based on when each payment is due. Payments are typically due at fixed periods of time after the subscription begins, every month for most consumer subscriptions. For consistency with the churn observations that have a lead time, you should also apply the same lead time before each renewal or payment. This scenario is illustrated in the drawing below: A subscription has periodic (e.g. monthly) payments and continues until cancelled. Each observation date is one lead time before a payment is due. The subscription finally ends when the last paid month ends, and the churn observation is made one lead time before the end of that month – presumably that was when the customer was making the final determination in their mind to churn or renew. This is why the title of this section is “Renew, Renew, Renew, Renew, Churn!”: typically you observe each account many times as they renew, and then only once when they churn.
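Here is a minimal sketch of picking the observation dates for one monthly subscription. It is not the code from the book; the function names, the 10-day lead time, and the assumption that a churned subscription ends exactly on a payment due date are all mine.

```python
import calendar
from datetime import date, timedelta
from typing import List, Optional, Tuple

def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of short months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def observation_sequence(
    start: date, end: Optional[date], today: date, lead_days: int = 10
) -> List[Tuple[date, bool]]:
    """Return (observation_date, is_churn) pairs for one subscription.

    `end` is the date the last paid month ended, or None if still active.
    Assumes `end` falls exactly on a payment due date.
    """
    last_day = end if end is not None else today
    observations = []
    n = 1
    while True:
        due = add_months(start, n)  # the n-th payment due date
        if due > last_day:
            break  # outcome of this payment is not known yet
        is_churn = end is not None and due >= end
        observations.append((due - timedelta(days=lead_days), is_churn))
        n += 1
    return observations

# A subscription that started January 15 and ended (unpaid) on May 15.
for obs_date, is_churn in observation_sequence(
    date(2020, 1, 15), date(2020, 5, 15), today=date(2020, 12, 31)
):
    print(obs_date, "churn" if is_churn else "renewal")
```

For this example the sequence is three renewal observations (February 5, March 5, April 5) followed by a single churn observation on May 5. Due dates are always computed from the start date rather than from the previous due date, so a subscription that starts on the 31st does not drift earlier after a short month.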

[Figure: Periodic observations when creating a churn dataset]

What if your product has subscriptions that are on different renewal or payment cycles?  For example, many products have both monthly and annual plans. There are actually multiple ways to handle this, but my advice is to observe all customers at the same frequency by assuming they are all on the same renewal or payment cycle.  The observation frequency to choose is the payment or renewal cycle which is the most common. Normally this is the time period that you use to quote your churn, so the choice should be obvious.

  • For a consumer subscription that reports a monthly churn rate, observe customers every month even if some renew or pay on annual contracts.
  • For a business subscription that reports an annual churn rate, observe customers every year even if some pay or renew on monthly or quarterly plans.

Remember: when in doubt, if it makes the most sense to quote your churn based on a particular time period (monthly, quarterly or annual) then that is probably the right time period to use when observing subscribers throughout their lifetime on the product.

If you think about it, in a way it doesn’t make sense to observe an annual customer midway through a year, because they don’t actually have a chance to churn at that point and they are probably not really thinking about it. But if you only observe the annual customers once a year, it complicates reproducing the churn and renewal rate in your data and will also make it harder to interpret the impact of customers being on the less common plan, a subject I will return to when you learn how to analyze the churn impact of plans in a later post.

Steps to Create a Churn Dataset

Here is an outline of the procedure I teach in the book to create a churn dataset for analysis. The details and code are too much for a blog post, so check out chapter 4 of my book for all the details and GitHub for the code samples. The main steps are as follows:

  1. Identify periods of time when each customer is subscribed to one or more subscriptions and when these periods end in churn. These are called active periods ending in churn.
  2. Identify periods of time when customers are subscribed to one or more subscriptions that are still ongoing at the present point in time (no churn, yet). These are called active periods that are ongoing.
  3. Using these active periods, pick sequences of observation dates for each customer using the payment or renewal cycle and lead times as described in the last section.  Keep track of which of these observations are made in the lead time before an actual churn.
  4. Use the sequences of observation dates to pick metrics saved in the data warehouse (as described in User Metrics 101). The metric values, along with the churn and observation details, are selected into a single churn dataset, with one observation of one customer per line of the file.
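To make the outline concrete, here is a toy end-to-end sketch of the four steps in plain Python. It is not the book's code: the account ids, dates, metric values, the 10-day lead time, and the idea of taking the latest metric value on or before each observation date are all assumptions for illustration. It also assumes the active periods (steps 1 and 2) have already been worked out, and that a churned period ends exactly on a payment due date.

```python
import calendar
from datetime import date, timedelta

LEAD_DAYS = 10            # assumed lead time for a monthly plan
TODAY = date(2020, 6, 1)  # assumed "present" for this example

# Steps 1-2: active periods as (account_id, start, churn_date), None = ongoing.
active_periods = [
    ("acct_1", date(2020, 1, 15), date(2020, 4, 15)),  # ended in churn
    ("acct_2", date(2020, 3, 1), None),                # still ongoing
]

# Hypothetical metric history per account: (metric_date, logins_per_month),
# sorted by date.
metrics = {
    "acct_1": [(date(2020, 2, 1), 20), (date(2020, 3, 1), 12), (date(2020, 4, 1), 3)],
    "acct_2": [(date(2020, 3, 15), 8), (date(2020, 4, 15), 9), (date(2020, 5, 15), 11)],
}

def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of short months."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def observation_dates(start, churn_date):
    """Step 3: one observation a lead time before each monthly due date."""
    last_day = churn_date or TODAY
    rows, n = [], 1
    while add_months(start, n) <= last_day:
        due = add_months(start, n)
        is_churn = churn_date is not None and due >= churn_date
        rows.append((due - timedelta(days=LEAD_DAYS), is_churn))
        n += 1
    return rows

def latest_metric(account_id, obs_date):
    """Step 4 helper: latest metric value on or before the observation date."""
    values = [v for d, v in metrics[account_id] if d <= obs_date]
    return values[-1] if values else None

# Step 4: one row per customer per observation.
dataset = [
    {"account_id": acct, "observation_date": obs, "is_churn": churn,
     "logins_per_month": latest_metric(acct, obs)}
    for acct, start, churn_date in active_periods
    for obs, churn in observation_dates(start, churn_date)
]

for row in dataset:
    print(row)
```

With this toy data the result has six rows: two renewals and one churn for acct_1, and three renewals for acct_2. In a real warehouse each step would more likely be a query over subscription and metric tables, but the shape of the output is the same.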

That’s all for now! Happy Churn Fighting! For more details on this subject and many more relating to churn and data analysis, check out the electronic early edition of my book, Fighting Churn With Data.