📌 TLDR

In this article, we have demonstrated the power of the beta-geometric distribution in predicting customer lifetime value, tackling the inherent challenges posed by variable churn rates and changing customer behaviors. Moreover, we not only estimated the customer retention rates but also forecasted the lifetime value, a crucial metric for any subscription-based business. The predictive accuracy, as indicated by MAPE, shows that the BG model can be a reliable tool in CLV predictions, offering businesses a more precise lens to view and strategize their customer retention and revenue generation strategies.

📊 Customer Lifetime Value (CLV or LTV)

Customer Lifetime Value (CLV or LTV) is a metric that represents the total net profit a company can expect to earn from a single customer throughout their time as a customer. It is an important metric because it helps businesses understand how much value different customers bring over their lifetime, and it can guide resource allocation, marketing strategies, and pricing policies. Here are some basic elements often used to calculate LTV:

| Metric | Description | Formula |
| --- | --- | --- |
| Average Purchase Value (APV) | Total revenue divided by the number of purchases during a specific period | \[ \frac{\text{Total Revenue}}{\text{Number of Purchases}} \] |
| Average Purchase Frequency (APF) | Average number of purchases made by a customer during a specific period | \[ \frac{\text{Number of Purchases}}{\text{Number of Unique Customers}} \] |
| Customer Value (CV) | Average purchase value multiplied by the average purchase frequency | \[ \text{APV} \times \text{APF} \] |
| Average Customer Lifespan (ACL) | Average number of periods a customer continues to buy from the company | \[ \frac{\text{Sum of Customer Lifespans}}{\text{Number of Customers}} \] |

Combining these elements, we can define the Customer Lifetime Value using the following formula:

\[ \text{CLV} = \text{CV} \times \text{ACL} \]
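
As a quick illustration of the formula above, the sketch below computes CLV from made-up inputs. All figures and variable names are hypothetical and are not taken from any real dataset.

```r
# Hypothetical inputs for one period (illustrative numbers only)
total_revenue <- 50000  # revenue in the period
n_purchases   <- 2000   # purchases in the period
n_customers   <- 500    # unique customers in the period
avg_lifespan  <- 6      # Average Customer Lifespan (ACL), in periods

apv <- total_revenue / n_purchases  # Average Purchase Value
apf <- n_purchases / n_customers    # Average Purchase Frequency
cv  <- apv * apf                    # Customer Value per period
clv <- cv * avg_lifespan            # CLV = CV x ACL

print(c(APV = apv, APF = apf, CV = cv, CLV = clv))
```

With these inputs, each customer spends 100 per period on average, so a six-period lifespan gives a CLV of 600.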

Alternatively, we can calculate the lifetime value of a customer from three common business metrics:

\[ \text{CLV} = \frac{\text{ARPU} \times \text{Gross Margin}}{\text{Churn Rate}} \]

| Metric | Description | Formula |
| --- | --- | --- |
| Average Revenue per User (ARPU) | Revenue generated per user or customer | \[\frac{\text{Total Revenue}}{\text{Number of Users (or Customers)}}\] |
| Gross Margin | Proportion of revenue retained as profit after accounting for the direct costs of producing goods and services | \[\frac{\text{Total Revenue - Cost of Goods Sold}}{\text{Total Revenue}}\] |
| Churn Rate | Percentage of customers that stopped using a company’s product or service during a certain timeframe | \[\frac{\text{Number of Customers Lost in a Period}}{\text{Number of Customers at the Start of the Period}}\] |

Multiplying ARPU by Gross Margin gives the gross profit per user per period. Dividing it by the churn rate yields the lifetime value, because with a constant churn rate the expected customer lifetime is 1 / Churn Rate periods.
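
A minimal sketch of this second formula, again with hypothetical monthly inputs chosen only for illustration:

```r
# Hypothetical monthly inputs (illustrative numbers only)
arpu         <- 10    # average revenue per user per month
gross_margin <- 0.70  # share of revenue kept after direct costs
churn_rate   <- 0.05  # share of customers lost per month

# Under a constant churn rate, the expected lifetime is 1 / churn_rate months
expected_lifetime <- 1 / churn_rate
clv <- (arpu * gross_margin) / churn_rate

print(c(expected_lifetime = expected_lifetime, clv = clv))
```

Here a 5% monthly churn implies an expected lifetime of 20 months, and a monthly gross profit of 7 per user gives a CLV of 140.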

🚫 Challenges

Calculating CLV isn’t straightforward. The formulas above rest on several simplifying assumptions:

  1. The churn rate is consistent month-to-month.

  2. All customers exhibit similar churn behaviors.

  3. The company’s offerings, encompassing both pricing and competitive dynamics, remain uniform.

The shifted-beta-geometric (sBG) model offers a more flexible approach to estimating and forecasting CLV: it lets churn probabilities differ across customers, so the cohort’s retention rate can change from month to month.

🔬 Shifted-Beta-Geometric Model

A hypothetical firm selected a user cohort, for instance, users on a specific plan (e.g. a monthly subscription) from a specific country, to analyze its monthly retention rates over recent months and refine its Customer Lifetime Value (CLV) predictions. Narrower cohorts, defined by additional subscriber attributes that can influence churn, tend to make the CLV prediction more precise.

The retention rates were recorded as follows:

| Month | Retention |
| --- | --- |
| 0 | 100% |
| 1 | 41% |
| 2 | 24% |
| 3 | 17% |
| 4 | 12% |
| 5 | 9% |
| 6 | 6% |

To forecast customer retention and churn, we use the shifted-beta-geometric (sBG) model proposed by Pete Fader and Bruce Hardie.

To do so, we define a few functions: churnBG and survivalBG, which use the shifted-beta-geometric distribution to calculate churn and survival probabilities, respectively, and MLL, which returns the negative log-likelihood that we minimize to estimate the model parameters α and β.

📉 Churn Rate

# Define a function to calculate churn using the Shifted Beta Geometric (sBG) distribution
churnBG <- Vectorize(function(alpha, beta, period) {
        # Calculate the churn probability for the first period
        t1 = alpha / (alpha + beta)
        result = t1
        # Recursively calculate the churn probability for later periods
        if (period > 1) {
                result = churnBG(alpha, beta, period - 1) * (beta + period - 2) / (alpha + beta + period - 1)
        }
        return(result)
        # The vectorize.args parameter specifies which argument(s) to vectorize over
}, vectorize.args = c("period"))

The churnBG function returns the probability that a customer churns in a given period, P(T = t). For the first period this is α / (α + β). For each later period it is derived recursively from the previous period’s probability: P(T = t) = P(T = t − 1) × (β + t − 2) / (α + β + t − 1). Because the multiplier is always below 1, the probability of churning in period t declines as t grows: customers with a high churn propensity leave early, and those who remain are increasingly the loyal ones.
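
The same recursion can also be unrolled into a loop, which avoids the repeated recursive calls. The sketch below is one possible way to do that; `churn_sbg` is a hypothetical helper name and is not part of the original code.

```r
# Hypothetical helper (not part of the original code): sBG churn probabilities
# P(T = 1), ..., P(T = n_periods), computed with a loop instead of recursion
churn_sbg <- function(alpha, beta, n_periods) {
  p <- numeric(n_periods)
  p[1] <- alpha / (alpha + beta)  # P(T = 1)
  for (t in seq_len(n_periods)[-1]) {
    # P(T = t) = P(T = t - 1) * (beta + t - 2) / (alpha + beta + t - 1)
    p[t] <- p[t - 1] * (beta + t - 2) / (alpha + beta + t - 1)
  }
  p
}

# With alpha = beta = 1 the probabilities reduce to 1 / (t * (t + 1))
print(churn_sbg(alpha = 1, beta = 1, n_periods = 4))
```

For alpha = beta = 1 this gives 1/2, 1/6, 1/12, and 1/20, which makes a convenient sanity check for any implementation of the recursion.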

📈 Survival Rate

# Define a function to calculate survival using the sBG distribution
survivalBG <- Vectorize(function(alpha, beta, period) {
        # Calculate the survival probability for the first period
        t1 = 1 - churnBG(alpha, beta, 1)
        result = t1
        # Recursively calculate the survival probability for later periods
        if(period > 1){
                result = survivalBG(alpha, beta, period - 1) - churnBG(alpha, beta, period)
        }
        return(result)
        # The vectorize.args parameter specifies which argument(s) to vectorize over
}, vectorize.args = c("period"))

The survivalBG function calculates the survival probability S(t): the probability that a customer is still subscribed at the end of period t.

Here’s the process:

  1. For the first period, survival is one minus the first-period churn probability: S(1) = 1 − P(T = 1).

  2. For each subsequent period, the churn probability of that period is subtracted from the previous survival probability: S(t) = S(t − 1) − P(T = t).

In essence, S(t) is the share of the original cohort expected to remain active after t periods, which is the quantity reported in the retention table above.
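
Fader and Hardie also give closed-form expressions for both quantities in terms of the beta function B: P(T = t) = B(α + 1, β + t − 1) / B(α, β) and S(t) = B(α, β + t) / B(α, β). The sketch below uses them to check the relationship between churn and survival. The helper names and the parameter values are illustrative assumptions, not part of the original code.

```r
# Hypothetical helpers (not part of the original code), using the closed forms
#   P(T = t) = B(alpha + 1, beta + t - 1) / B(alpha, beta)
#   S(t)     = B(alpha, beta + t) / B(alpha, beta)
# lbeta() is the log of the beta function in base R
churn_closed <- function(alpha, beta, t) {
  exp(lbeta(alpha + 1, beta + t - 1) - lbeta(alpha, beta))
}
survival_closed <- function(alpha, beta, t) {
  exp(lbeta(alpha, beta + t) - lbeta(alpha, beta))
}

alpha <- 0.7; beta <- 1.2; t <- 1:7  # arbitrary illustrative parameters
s <- survival_closed(alpha, beta, t)

# Survival equals one minus the cumulative churn probability
print(all.equal(s, 1 - cumsum(churn_closed(alpha, beta, t))))

# Implied period-over-period retention S(t) / S(t - 1), with S(0) = 1
print(round(s / c(1, s[-length(s)]), 3))
```

The implied period-over-period retention equals (β + t − 1) / (α + β + t − 1), so it rises with t. This is the pattern seen in the observed cohort, and a constant-churn model cannot reproduce it.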

🎯 Maximum Log-Likelihood

# Define the negative log-likelihood of the sBG model (optim() minimizes it);
# the function reads activeCust and lostCust from the calling environment
MLL <- function(alphabeta) {
        # Check that the activeCust and lostCust vectors are the same length
        if(length(activeCust) != length(lostCust)) {
                # Stop the function and print an error message if they are not
                stop("Variables activeCust and lostCust have different lengths: ",
                     length(activeCust), " and ", length(lostCust), ".")
        }
        # Define t as the number of periods
        t = length(activeCust) 
        # Define alpha and beta from the input vector alphabeta
        alpha = alphabeta[1]
        beta = alphabeta[2]
        # Calculate the negative log-likelihood
        return(-as.numeric(
                # Sum of the log-likelihoods for each period, considering churned customers
                sum(lostCust * log(churnBG(alpha, beta, 1:t))) +
                # Log-likelihood for the last period, considering the active customers
                activeCust[t]*log(survivalBG(alpha, beta, t))
        ))
}

The MLL function is the objective that we minimize to fit the model: given candidate values of alpha and beta, it returns the negative log-likelihood of the observed data. The alpha and beta that minimize it are the maximum likelihood estimates.

The log-likelihood measures how well the chosen alpha and beta explain the observed customer behavior. Each customer lost in period t contributes the log of the churn probability for that period, and each customer still active at the end of the last observed period contributes the log of the survival probability. In the sBG model, every customer has a constant per-period churn probability, and these probabilities vary across customers according to a beta distribution whose shape is set by alpha and beta.
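
For readers who want a version that does not depend on global variables, here is a self-contained sketch of the same fit. It differs from MLL in two ways that are my own choices, not the article's: it uses the closed-form beta-function expressions for the churn and survival probabilities, and it optimizes log(alpha) and log(beta) so both parameters stay positive. The data are the rounded retention percentages listed earlier, with an assumed cohort size of 1,000.

```r
# Negative log-likelihood of the sBG model in closed form.
# par holds log(alpha) and log(beta) so both parameters stay positive.
nll_sbg <- function(par, lost, active_last) {
  alpha <- exp(par[1])
  beta  <- exp(par[2])
  t <- seq_along(lost)
  log_churn <- lbeta(alpha + 1, beta + t - 1) - lbeta(alpha, beta)
  log_surv  <- lbeta(alpha, beta + length(lost)) - lbeta(alpha, beta)
  -(sum(lost * log_churn) + active_last * log_surv)
}

retention <- c(1.00, 0.41, 0.24, 0.17, 0.12, 0.09, 0.06, 0.05)
cohort <- 1000             # assumed cohort size
active <- cohort * retention
lost   <- -diff(active)    # customers lost in months 1..7

# Nelder-Mead (optim's default) starting from alpha = beta = 1
fit <- optim(c(0, 0), nll_sbg, lost = lost, active_last = active[length(active)])
est <- exp(fit$par)
names(est) <- c("alpha", "beta")
print(est)
```

Passing the data as arguments keeps the objective reusable across cohorts, and the log transform removes the need for explicit bounds on the parameters.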

🔮 Retention Prediction

We transformed the data to have additional columns indicating active and lost customers.

| month_lt | retention_rate | activeCust | lostCust |
| --- | --- | --- | --- |
| 0 | 1.000000 | 10000.00 | 0.00 |
| 1 | 0.412302 | 4123.02 | 5876.98 |
| 2 | 0.241252 | 2412.52 | 1710.50 |
| 3 | 0.170753 | 1707.53 | 704.99 |
| 4 | 0.124047 | 1240.47 | 467.06 |
| 5 | 0.093038 | 930.38 | 310.09 |
| 6 | 0.066704 | 667.04 | 263.34 |
| 7 | 0.052063 | 520.63 | 146.41 |
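
The transformation code itself is not shown, so the following is only one way the table above could be built in base R. The cohort size of 10,000 is read off the activeCust value at month 0.

```r
# Retention rates from the table above
ret_data_df <- data.frame(
  month_lt = 0:7,
  retention_rate = c(1, 0.412302, 0.241252, 0.170753,
                     0.124047, 0.093038, 0.066704, 0.052063)
)

cohort_size <- 10000  # matches activeCust at month 0 in the table
ret_data_df$activeCust <- cohort_size * ret_data_df$retention_rate
# Customers lost in a month = drop in active customers since the previous month
ret_data_df$lostCust <- c(0, -diff(ret_data_df$activeCust))

print(ret_data_df)
```

By construction, the customers lost over all months plus those still active at month 7 add up to the cohort size.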

Next, we use the historical data to predict retention over the 7 months. For each i from 1 to 7, we fit alpha and beta on the first i months of data and then use the survival function to predict retention for all 7 months, so we can see how the forecast changes as more months are observed.

# Load dplyr for the pipe, filter(), and between()
library(dplyr)
# Initialize a list to store one prediction data frame per fitting window
ret_data_preds <- vector('list', 7)
# For each i, fit the model on months 1..i and predict months 1..7
for (i in 1:7) {
        # Keep only the rows where 'month_lt' is between 1 and i
        ret_df_filt <- ret_data_df %>%
                filter(between(month_lt, 1, i))
        # MLL reads 'activeCust' and 'lostCust' from the global environment
        activeCust <- ret_df_filt$activeCust
        lostCust <- ret_df_filt$lostCust
        # Estimate alpha and beta by minimizing the negative log-likelihood
        opt <- optim(c(1, 1), MLL)
        # Predict retention for months 1..7 with the fitted parameters (month 0 is 1 by definition)
        retention_pred <- round(c(1, survivalBG(alpha = opt$par[1], beta = opt$par[2], 1:7)), 3)
        # Store the month, the number of months used for fitting, and the predicted retention
        df_pred <- data.frame(month_lt = 0:7,
                              fact_months = i,
                              retention_pred = retention_pred)
        ret_data_preds[[i]] <- df_pred
}
# Combine all the data frames in the list into a single data frame
ret_data_preds <- as.data.frame(do.call('rbind', ret_data_preds))

By merging the original dataset with the predicted retention, we compute the absolute percentage error for each observation: the absolute difference between the actual and predicted values, divided by the actual value. Averaging these errors gives the Mean Absolute Percentage Error (MAPE) for each number of months used in fitting.

Finally, we visualized these average MAPE values to show how the prediction error changes as more months of data are observed. The error drops quickly after the first month and keeps improving with each additional month of data.

The declining MAPE suggests that the model becomes more reliable as data accumulates. Even so, the model should be monitored and refitted as new data arrives to maintain its accuracy.
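
The merging and plotting code is not shown, so the sketch below is only an approximation of that step. It assumes the error is averaged over months 1 to 7 for each fitting window, uses closed-form sBG expressions with log-transformed parameters instead of the MLL function, and skips the rounding of predictions. The helper names are hypothetical.

```r
# Actual retention for months 1..7 (from the article's data), cohort of 10,000
actual <- c(0.412302, 0.241252, 0.170753, 0.124047, 0.093038, 0.066704, 0.052063)
cohort <- 10000
active <- cohort * actual
lost   <- -diff(c(cohort, active))

# Hypothetical helpers: closed-form sBG survival and negative log-likelihood
# (parameters are passed on the log scale so alpha and beta stay positive)
surv_sbg <- function(alpha, beta, t) exp(lbeta(alpha, beta + t) - lbeta(alpha, beta))
nll_sbg <- function(par, lost, active_last) {
  alpha <- exp(par[1]); beta <- exp(par[2]); t <- seq_along(lost)
  -(sum(lost * (lbeta(alpha + 1, beta + t - 1) - lbeta(alpha, beta))) +
      active_last * log(surv_sbg(alpha, beta, length(lost))))
}

# For i = 1..7: fit on months 1..i, predict months 1..7, average the absolute percentage errors
mape_by_months <- sapply(1:7, function(i) {
  fit  <- optim(c(0, 0), nll_sbg, lost = lost[1:i], active_last = active[i])
  pred <- surv_sbg(exp(fit$par[1]), exp(fit$par[2]), 1:7)
  mean(abs((actual - pred) / actual))
})

print(round(data.frame(fact_months = 1:7, mape = mape_by_months), 3))
```

Note that with a single month of data the two parameters are not separately identified (only the ratio alpha / (alpha + beta) is pinned down), so the first row depends on where the optimizer happens to stop.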

💰 Lifetime Value Prediction

Finally, suppose we want to know the average CLV of a user over their first 7 months. We can use the predicted retention rates to estimate the Lifetime Value (LTV) over that period, calculating both the monthly and cumulative LTV.

# Load dplyr for the pipe, filter(), select(), bind_rows(), and mutate()
library(dplyr)
# Define the monthly subscription price
p <- 10
# Fit on the first two months only; months 3 to 7 are predicted
ret_df_filt <- ret_data_df %>%
        filter(between(month_lt, 1, 2))
# MLL reads 'activeCust' and 'lostCust' from the global environment
activeCust <- ret_df_filt$activeCust
lostCust <- ret_df_filt$lostCust
# Estimate alpha and beta by minimizing the negative log-likelihood
opt <- optim(c(1, 1), MLL)
# Predict the retention rates for months 3 to 7 with the fitted parameters
retention_pred <- round(survivalBG(alpha = opt$par[1], beta = opt$par[2], 3:7), 3)
data_df_pred <- data.frame(month_lt = 3:7,
                           retention_pred = retention_pred)
# Combine actual retention (months 0 to 2) with predicted retention (months 3 to 7)
data_df_ltv <- ret_data_df %>%
        filter(between(month_lt, 0, 2)) %>%
        select(month_lt, retention_rate) %>%
        bind_rows(data_df_pred) %>%
        # - 'retention_rate_calc': actual retention where available, otherwise predicted
        # - 'ltv_monthly': expected revenue per acquired customer in that month
        # - 'ltv_cum': cumulative LTV up to and including that month
        mutate(retention_rate_calc = ifelse(is.na(retention_rate), retention_pred, retention_rate),
               ltv_monthly = retention_rate_calc * p,
               ltv_cum = round(cumsum(ltv_monthly), 2))

At the end of 7 months, the cumulative LTV is found to be 21.43 USD, which suggests that a customer is expected to bring a value of 21.43 USD on average over a span of 7 months:

| month_lt | retention_rate | retention_pred | retention_rate_calc | ltv_monthly | ltv_cum |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.000000 | NA | 1.000000 | 10.00000 | 10.00 |
| 1 | 0.412302 | NA | 0.412302 | 4.12302 | 14.12 |
| 2 | 0.241252 | NA | 0.241252 | 2.41252 | 16.54 |
| 3 | NA | 0.159 | 0.159000 | 1.59000 | 18.13 |
| 4 | NA | 0.115 | 0.115000 | 1.15000 | 19.28 |
| 5 | NA | 0.088 | 0.088000 | 0.88000 | 20.16 |
| 6 | NA | 0.070 | 0.070000 | 0.70000 | 20.86 |
| 7 | NA | 0.057 | 0.057000 | 0.57000 | 21.43 |
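
The arithmetic behind the table can be reproduced in a few lines of base R. This is a sketch that takes the retention values directly from the table rather than refitting the model.

```r
p <- 10  # monthly subscription price, as in the article

# Actual retention for months 0..2, predicted retention for months 3..7 (from the table above)
retention_rate_calc <- c(1, 0.412302, 0.241252, 0.159, 0.115, 0.088, 0.070, 0.057)

ltv_monthly <- retention_rate_calc * p          # expected revenue per acquired customer, by month
ltv_cum     <- round(cumsum(ltv_monthly), 2)    # running total

print(data.frame(month_lt = 0:7, ltv_monthly, ltv_cum))
```

Each monthly value is the price weighted by the probability that the customer is still subscribed, and the final cumulative value is the 21.43 USD quoted above.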

🔍 Model evaluation

In the final stage, we used the Mean Absolute Percentage Error (MAPE) to gauge the accuracy of the model’s retention predictions. We fitted the model on data from the 1st and 2nd months and computed the MAPE of the predicted values for the 3rd to 7th months.

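# MAPE of the predicted vs. actual retention for months 3 to 7
# (MAPE() is assumed to come from the MLmetrics package: MAPE(y_pred, y_true))
MAPE(data_df_ltv$retention_pred[4:8], ret_data_df$retention_rate[4:8])
## [1] 0.06803042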

In our case, the MAPE is 0.06803042, or about 6.8%: on average, the predictions deviate from the actual values by approximately 6.8%. A MAPE below 10% is often treated as a rule-of-thumb indicator of a good fit, which suggests that the model predicts the retention rates for the 3rd to the 7th months fairly accurately.
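
The figure can be checked by hand from the two tables above. The sketch below defines its own small `mape` helper (a hypothetical name) so that it does not depend on any package.

```r
# Hypothetical helper: MAPE = mean(|actual - predicted| / |actual|)
mape <- function(predicted, actual) mean(abs((actual - predicted) / actual))

actual    <- c(0.170753, 0.124047, 0.093038, 0.066704, 0.052063)  # months 3..7, observed
predicted <- c(0.159, 0.115, 0.088, 0.070, 0.057)                 # months 3..7, sBG fit on months 1..2

print(mape(predicted, actual))
```

The individual errors range from about 4.9% (month 6) to about 9.5% (month 7), and their average is the 6.8% reported above.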

🌟 Applications and Importance of Predicted Customer Lifetime Value (CLV)

By understanding the long-term worth of a customer, organizations can make more informed decisions in several areas:

  • A/B Testing: With a reliable estimate of CLV, businesses can run more informative A/B tests. By understanding the potential long-term revenue from different user segments or product features, they can prioritize the tests most likely to have a significant impact on CLV.

  • User Acquisition Strategy: Knowing the average CLV allows companies to determine how much they can responsibly spend on acquiring a new customer. For instance, if a customer’s predicted CLV is $100, a business might decide it’s feasible to spend up to $30 or $40 to acquire that customer, ensuring a positive return on investment.

  • Resource Allocation: Predicted CLV can guide businesses on where to allocate resources. Segments with higher CLVs might receive more attention, personalized services, or exclusive offers to ensure their continued loyalty.

  • Forecasting and Budgeting: With an understanding of the average CLV, businesses can make more accurate forecasts regarding future revenue. This is especially useful for budgeting and financial planning, ensuring that projections are grounded in data-driven insights.

In essence, a robust prediction of CLV acts as a compass, guiding businesses in making decisions that are not just focused on immediate gains, but on long-term profitability and growth. Embracing such analytical approaches can pave the way for sustained business growth and customer satisfaction.

Sources

  1. Fader, Pete & Hardie, Bruce. (2007). How to Project Customer Retention. Journal of Interactive Marketing.

  2. Bryl’, Serhii. (2018). LTV prediction for a recurring subscription with R. AnalyzeCore.
