Beryl Analytics Blog

Predicting Customer Lifetime Value: Models That Guide Where to Spend

By Beryl Analytics • 2 June 2026 • 8 min read

Customer lifetime value is the number that should sit behind almost every spending decision in a business, and yet most teams either ignore it or reduce it to a crude average that quietly misleads them. Averaging lifetime value across all customers tells you very little, because a small fraction of customers are usually worth many times the rest, and the cost of acquiring them differs just as widely. The useful version of CLV is predicted at the individual level: for this specific customer, how much value are they likely to generate, and how confident can we be in that estimate? This article covers the models that produce that prediction and, more importantly, how to spend against it.

What predicted CLV actually answers

Historical lifetime value looks backward and tells you what a customer has already spent. That is useful for reporting but useless for budgeting, because by the time the value has accrued, the spending decision is long past. Predicted CLV looks forward. It estimates the future value a customer will generate, which is the only thing you can actually act on when deciding how much to pay to acquire them or how hard to fight to keep them.

A good CLV prediction has three components: how likely the customer is to still be active, how often they will buy, and how much they will spend per purchase. Keeping these separate matters, because a customer who buys frequently but is drifting away is a very different problem from one who buys rarely but reliably. Collapsing them into one number hides the lever you would pull.

The probabilistic approach: BG/NBD and Gamma-Gamma

For businesses with non-contractual, repeat-purchase behavior, where customers can lapse silently without cancelling anything, a well-established pair of models does the heavy lifting. They are particularly strong for retail, e-commerce, and consumer apps.

BG/NBD for the buying pattern

The BG/NBD model (Beta-Geometric / Negative Binomial Distribution) predicts two things from a customer's purchase history: how many transactions they are likely to make in a future window, and the probability that they are still active at all. It needs only three inputs per customer: frequency (how many repeat purchases they have made), recency (the time between their first purchase and their most recent one), and the length of time you have observed them since that first purchase. A long silence after the most recent purchase, relative to how often that customer normally buys, is what signals churn. From those inputs, it separates customers who are simply slow buyers from those who have quietly churned, which an average can never do.
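To make the inputs and outputs concrete, here is a minimal Python sketch, assuming a transaction log of (customer_id, purchase_date) pairs and time measured in days. The function names and the parameter values r, alpha, a and b are hypothetical: in practice those four parameters come from a maximum-likelihood fit across all customers (the open-source lifetimes library is one tool that does this), not from hand-picked numbers. p_alive and expected_purchases follow the closed-form conditional expressions published with the model by Fader, Hardie and Lee (2005); the hypergeometric function is summed as a plain power series to keep the sketch dependency-free.

```python
from collections import defaultdict
from datetime import date


def rfm_summary(transactions, observation_end):
    """BG/NBD inputs per customer from (customer_id, purchase_date) pairs.

    frequency: repeat purchase days (distinct purchase days minus one)
    recency:   days from the first purchase to the most recent purchase
    T:         days from the first purchase to the end of observation
    """
    purchase_days = defaultdict(set)
    for customer_id, purchase_date in transactions:
        purchase_days[customer_id].add(purchase_date)  # same-day orders count once

    summary = {}
    for customer_id, days in purchase_days.items():
        first, last = min(days), max(days)
        summary[customer_id] = {
            "frequency": len(days) - 1,
            "recency": (last - first).days,
            "T": (observation_end - first).days,
        }
    return summary


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=100_000):
    """Gauss hypergeometric function 2F1(a, b; c; z) by its power series (0 <= z < 1)."""
    term = total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def p_alive(x, t_x, T, r, alpha, a, b):
    """Probability the customer is still active given frequency x, recency t_x and age T."""
    if x == 0:
        # BG/NBD only lets a customer drop out right after a repeat purchase,
        # so a customer with no repeat purchases is treated as alive.
        return 1.0
    odds_inactive = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_inactive)


def expected_purchases(t, x, t_x, T, r, alpha, a, b):
    """Expected number of purchases in the next t days (the closed form needs a != 1)."""
    z = t / (alpha + T + t)
    decay = ((alpha + T) / (alpha + T + t)) ** (r + x)
    bracket = 1.0 - decay * hyp2f1(r + x, b + x, a + b + x - 1, z)
    return (a + b + x - 1) / (a - 1) * bracket * p_alive(x, t_x, T, r, alpha, a, b)


# Illustration only: hypothetical parameters, not fitted to any data.
params = dict(r=0.5, alpha=10.0, a=1.2, b=3.0)
log = [("c1", date(2026, 1, 1)), ("c1", date(2026, 1, 15)), ("c1", date(2026, 2, 10)),
       ("c2", date(2026, 3, 20))]
for cid, s in rfm_summary(log, observation_end=date(2026, 5, 1)).items():
    x, t_x, T = s["frequency"], s["recency"], s["T"]
    print(cid, round(p_alive(x, t_x, T, **params), 3),
          round(expected_purchases(90, x, t_x, T, **params), 3))
```

Collapsing same-day orders into one purchase day is a common convention but an assumption here. Note also the model quirk visible in p_alive: a customer with no repeat purchases is always scored as active.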

Gamma-Gamma for the monetary value

BG/NBD tells you how often a customer will buy; the Gamma-Gamma model tells you how much each purchase will be worth. It assumes that the value of a customer's individual transactions varies around a personal mean spend, that those personal means differ from customer to customer, and that spend per order is unrelated to how often the customer buys. It estimates each customer's mean while accounting for the fact that customers with few orders give a noisier signal, so their estimates are pulled toward the average across all customers. Multiply the predicted number of future transactions by the predicted value per transaction, discount for time, and you have a probabilistic CLV for every customer with an interpretable basis.
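Continuing in Python, a minimal sketch of the Gamma-Gamma step and the final roll-up, assuming the model's three parameters (p, q and gamma, with q greater than 1) have already been fitted; the values in the usage lines are made up. expected_spend is the model's conditional mean, written as a weighted blend of the population mean and the customer's own average so the shrinkage is visible. discounted_clv is a hypothetical helper that accepts any function returning expected purchases over the next t days, such as a BG/NBD prediction; the 30-day month, 12-month horizon and 1% monthly discount rate are illustrative defaults.

```python
def expected_spend(x, m_x, p, q, gamma):
    """Gamma-Gamma conditional expected spend per transaction (requires q > 1).

    x:   number of repeat transactions behind the observed average
    m_x: the customer's observed average spend per repeat transaction
    """
    population_mean = gamma * p / (q - 1)
    # The more transactions we have seen, the more weight the customer's own average gets.
    weight = p * x / (p * x + q - 1)
    return (1 - weight) * population_mean + weight * m_x


def discounted_clv(cumulative_purchases, spend_per_purchase, months=12,
                   monthly_discount_rate=0.01, days_per_month=30, margin_rate=1.0):
    """Add up expected value month by month, discounting each month back to today.

    cumulative_purchases(t) must return expected purchases over the next t days,
    for example a BG/NBD prediction. Set margin_rate below 1.0 to work in
    contribution margin rather than revenue.
    """
    clv = 0.0
    for month in range(1, months + 1):
        purchases = (cumulative_purchases(month * days_per_month)
                     - cumulative_purchases((month - 1) * days_per_month))
        clv += purchases * spend_per_purchase * margin_rate / (1 + monthly_discount_rate) ** month
    return clv


# Illustration only: made-up parameters and a toy stand-in for a fitted purchase model.
spend = expected_spend(x=4, m_x=85.0, p=6.0, q=4.0, gamma=15.0)
toy_curve = lambda days: 0.05 * days
print(round(spend, 2), round(discounted_clv(toy_curve, spend), 2))
```

With no repeat transactions (x of zero), expected_spend falls back entirely to the population mean, which is the "noisier signal" adjustment described above taken to its limit.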

The appeal of this approach is that it works with minimal data, requires no labeled training set, and produces explainable outputs. The limitation is that it ignores everything except purchase timing and value. It does not know that a customer browsed three times last week, opened every email, or filed a support complaint.

The machine learning approach

When you have richer behavioral data, a supervised machine learning model can outperform the probabilistic approach by incorporating signals it cannot see. Here you frame CLV as a prediction problem: given everything you know about a customer in their first weeks, predict their value over the next 12 or 24 months. Useful features go well beyond transactions.

Behavioral signals: session frequency, feature usage, email engagement, app opens.
Acquisition context: the channel, campaign, and first product that brought them in, since these are strongly predictive of long-term value.
Early-purchase signals: first-order size, time to second purchase, and category breadth.
Support and satisfaction: ticket volume, resolution time, survey scores.
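The framing above, features from a customer's first weeks and a label from the months that follow, is easy to get wrong by letting later data leak into the features. A minimal Python sketch of a point-in-time split, assuming an order log of (customer_id, order_date, order_value) records; build_training_rows, the 28-day feature window, the 365-day value window and the handful of early-purchase features are illustrative choices, not a recommended set. Behavioral, acquisition and support signals would be joined on the same customer ID, computed only from data before each customer's cutoff.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_training_rows(orders, data_end, feature_days=28, value_days=365):
    """One row per customer for a supervised CLV model.

    orders: iterable of (customer_id, order_date, order_value).
    Features use only orders in the first `feature_days` after a customer's first order;
    the label is their total order value over the following `value_days`.
    Customers whose label window ends after `data_end` are skipped, because their
    label would be incomplete.
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, value in orders:
        by_customer[customer_id].append((order_date, value))

    rows = []
    for customer_id, history in by_customer.items():
        history.sort()
        first_date, first_value = history[0]
        cutoff = first_date + timedelta(days=feature_days)
        label_end = cutoff + timedelta(days=value_days)
        if label_end > data_end:
            continue  # too recent to have a full value window

        early = [(d, v) for d, v in history if d < cutoff]
        later_value = sum(v for d, v in history if cutoff <= d < label_end)
        rows.append({
            "customer_id": customer_id,
            "first_order_value": first_value,
            "early_order_count": len(early),
            "early_order_value": sum(v for _, v in early),
            # None when no second order arrived inside the feature window
            "days_to_second_order": (early[1][0] - first_date).days if len(early) > 1 else None,
            "label_value": later_value,
        })
    return rows


# Illustration only: a tiny hand-made order log.
orders = [("a", date(2025, 1, 1), 50.0), ("a", date(2025, 1, 10), 20.0),
          ("a", date(2025, 6, 1), 100.0), ("b", date(2025, 12, 1), 30.0)]
print(build_training_rows(orders, data_end=date(2026, 3, 1)))
```

Skipping customers with an incomplete label window is the design choice that matters most here: keeping them would teach the model that recent customers are worth less simply because less of their future has been observed.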

Gradient-boosted tree models tend to work well here because they handle mixed feature types and capture interactions without heavy tuning. The trade-off is that they need history to train on and ongoing care to avoid drifting as customer behavior changes. A practical pattern is to use the probabilistic model as a baseline and the ML model as the production system once you have enough labeled history, comparing the two so you always know what the added complexity buys you.
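One way to make that comparison concrete is to score both models on the same holdout customers whose actual value is now known. A minimal Python sketch, assuming three aligned lists (baseline predictions, ML predictions, actual value); the two metrics, mean absolute error and the share of actual value captured by each model's top decile, are illustrative choices. The second matters most when the prediction is used to rank customers for spend.

```python
def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def top_share_captured(predicted, actual, fraction=0.1):
    """Share of total actual value held by the customers a model ranks in its top `fraction`.

    Assumes total actual value is positive.
    """
    k = max(1, int(len(actual) * fraction))
    ranked = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    return sum(actual[i] for i in ranked[:k]) / sum(actual)


def compare_models(baseline, ml, actual, fraction=0.1):
    """Score both models on the same holdout customers."""
    return {
        name: {
            "mae": mean_absolute_error(predicted, actual),
            "top_share": top_share_captured(predicted, actual, fraction),
        }
        for name, predicted in (("baseline", baseline), ("ml", ml))
    }
```

If the ML model does not beat the baseline on the metric your spending decisions depend on, the extra pipeline is not yet paying for itself.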

Spending against predicted CLV

A CLV model that does not change a budget is a research project, not an analytics asset. The point of the prediction is to move money toward customers worth keeping and away from those who are not.

Steer acquisition spend

Most marketing teams optimize toward cost per acquisition, which treats every new customer as equal. They are not. Feeding predicted CLV back into acquisition lets you bid more for the customers and channels that produce high-value buyers and pull back from channels that deliver cheap but worthless signups. The right ceiling on what you pay to acquire a customer is a fraction of their predicted value, not a flat number applied to everyone.
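As a minimal Python sketch of that ceiling, assuming each new customer carries a predicted CLV (ideally in contribution margin) and an acquisition channel: max_cpa_by_channel and channels_to_cut are hypothetical helpers, and the 30% default fraction is a placeholder. The right fraction depends on your margins, payback target and how much you trust the predictions.

```python
from collections import defaultdict
from statistics import mean


def max_cpa_by_channel(customers, clv_fraction=0.3):
    """Ceiling on cost per acquisition for each channel: a fraction of the mean predicted
    CLV of the customers that channel has brought in.

    customers: iterable of (channel, predicted_clv).
    """
    clv_by_channel = defaultdict(list)
    for channel, predicted_clv in customers:
        clv_by_channel[channel].append(predicted_clv)
    return {channel: clv_fraction * mean(values) for channel, values in clv_by_channel.items()}


def channels_to_cut(ceilings, actual_cpa):
    """Channels where you currently pay more per customer than their predicted value supports."""
    return sorted(ch for ch, cpa in actual_cpa.items() if ch in ceilings and cpa > ceilings[ch])


# Illustration only: made-up channels and values.
new_customers = [("paid_search", 400.0), ("paid_search", 250.0), ("social", 60.0), ("social", 90.0)]
ceilings = max_cpa_by_channel(new_customers)
print(ceilings)
print(channels_to_cut(ceilings, {"paid_search": 80.0, "social": 40.0}))
```

The mean is used rather than the median because acquisition spend is additive across customers: a channel's occasional very valuable buyer does pay for some of its cheaper ones.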

Target retention where it pays

Retention budgets are finite, so you want to spend them on customers who are both valuable and at risk. Crossing predicted CLV with churn risk gives you a simple, powerful grid: high-value and high-risk customers get a personal call or a meaningful offer, while low-value high-risk customers get a cheap automated touch or nothing at all. This is where CLV prediction and predictive analytics for churn work best together, because value without risk and risk without value each tell only half the story.
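The grid can be written as a small routing function. A minimal Python sketch, assuming each customer has a predicted CLV and a churn probability from a separate churn model; the thresholds, the action labels for the two quadrants the grid above does not discuss, and the prioritize helper are hypothetical.

```python
def retention_action(predicted_clv, churn_risk, value_threshold=500.0, risk_threshold=0.5):
    """Map a customer to a retention treatment using the value x risk grid."""
    high_value = predicted_clv >= value_threshold
    high_risk = churn_risk >= risk_threshold
    if high_value and high_risk:
        return "personal_outreach"  # a call or a meaningful offer
    if high_risk:
        return "automated_touch"    # a cheap automated nudge, or nothing at all
    if high_value:
        return "protect"            # placeholder: valuable and stable
    return "no_action"              # placeholder: low value, low risk


def prioritize(customers, outreach_slots, **thresholds):
    """Fill limited outreach slots with the largest value at risk (CLV x churn risk)."""
    candidates = [c for c in customers
                  if retention_action(c["clv"], c["churn_risk"], **thresholds) == "personal_outreach"]
    candidates.sort(key=lambda c: c["clv"] * c["churn_risk"], reverse=True)
    return candidates[:outreach_slots]
```

Ranking by CLV times churn risk is a simplification: it measures how much value is exposed, not how likely the outreach is to change the outcome.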

Common mistakes to avoid

Using revenue instead of margin. A high-revenue customer who returns half their orders or floods support may be unprofitable. Build CLV on contribution margin.
Ignoring the confidence of the estimate. A prediction for a customer with one purchase is far less reliable than for one with twenty. Act more cautiously on thin evidence.
Treating CLV as fixed. It is a live estimate that should update as the customer behaves, not a label assigned once at signup.
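To show how a high-revenue customer can still be unprofitable, here is a minimal Python sketch of customer-level contribution margin. The Order fields and the flat cost per support ticket are simplifying assumptions, and returned stock is treated as a full loss; if returns are resold, add the recovered cost back.

```python
from dataclasses import dataclass


@dataclass
class Order:
    revenue: float
    cost_of_goods: float
    refunded: float = 0.0         # money returned to the customer
    fulfilment_cost: float = 0.0  # shipping, packing, payment fees


def contribution_margin(orders, support_tickets=0, cost_per_ticket=0.0):
    """Order margin after refunds, minus the cost of supporting the customer."""
    order_margin = sum(o.revenue - o.refunded - o.cost_of_goods - o.fulfilment_cost
                       for o in orders)
    return order_margin - support_tickets * cost_per_ticket


# Illustration only: 1,000 in revenue, but half of it refunded and support-heavy.
heavy_returner = [Order(200.0, 120.0, refunded=100.0, fulfilment_cost=15.0)] * 5
print(contribution_margin(heavy_returner, support_tickets=6, cost_per_ticket=12.0))  # -247.0
```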

Takeaways

Predict CLV per customer, not on average; the distribution is the whole point.
BG/NBD plus Gamma-Gamma is a strong, low-data baseline for repeat-purchase businesses.
Machine learning wins when you have rich behavioral and acquisition data to feed it.
The payoff comes from spending against the prediction: bid more for high-value acquisition and aim retention at valuable customers who are also at risk.

FAQ

How much history do I need before CLV predictions are reliable?

The probabilistic models can produce useful estimates with a few months of transaction data, though they sharpen as customers accumulate repeat purchases. ML models generally need labeled history covering at least one full value window, typically 12 to 24 months, plus the early period the features are drawn from, so they can learn the relationship between early signals and long-term value.

Should CLV be one number or a range?

Report a range. Two customers with the same expected value but very different certainty deserve different treatment, and a single point hides that. At minimum, flag low-confidence predictions so teams do not over-commit budget on thin evidence.
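One simple way to attach a range is to measure how far past predictions missed on a holdout period, separately for customers with thin and thick purchase histories, and reuse those misses as an interval. A minimal Python sketch under stated assumptions: the bucket edges, the 10th and 90th percentile choice and the min_repeat_purchases cut-off for the low-confidence flag are hypothetical, and this empirical approach is only one option (a fully Bayesian fit would produce intervals directly).

```python
from collections import defaultdict
from statistics import quantiles


def evidence_bucket(repeat_purchases):
    """Hypothetical buckets by how much purchase history a prediction rests on."""
    if repeat_purchases == 0:
        return "none"
    if repeat_purchases < 5:
        return "thin"
    return "established"


def fit_residual_bands(holdout):
    """10th and 90th percentile of (actual - predicted) per evidence bucket.

    holdout: iterable of (repeat_purchases, predicted_clv, actual_value) for customers
    whose prediction window has already played out.
    """
    residuals = defaultdict(list)
    for repeats, predicted, actual in holdout:
        residuals[evidence_bucket(repeats)].append(actual - predicted)
    bands = {}
    for bucket, values in residuals.items():
        if len(values) >= 2:  # statistics.quantiles needs two or more points before Python 3.13
            deciles = quantiles(values, n=10)
            bands[bucket] = (deciles[0], deciles[-1])
    return bands


def clv_range(predicted_clv, repeat_purchases, bands, min_repeat_purchases=2):
    """Point estimate plus an empirical range; raises KeyError if the bucket has no band."""
    low_offset, high_offset = bands[evidence_bucket(repeat_purchases)]
    return {
        "low": predicted_clv + low_offset,
        "point": predicted_clv,
        "high": predicted_clv + high_offset,
        "low_confidence": repeat_purchases < min_repeat_purchases,
    }
```

Because the bands come from absolute misses, they do not scale with the size of the prediction; with heavily skewed values, bands built on the ratio of actual to predicted value may behave better.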


Want analytics that actually moves the number?

Beryl Analytics builds predictive models, data pipelines, and dashboards that drive decisions for businesses across New Zealand and Australia. We ship to production and prove the return.

Talk to Beryl Analytics