Beryl Analytics Blog

Feature Engineering: The Highest-Leverage Step Most Analytics Teams Skip

By Beryl Analytics • 21 May 2026 • 8 min read

When a predictive model underperforms, the instinct is to reach for a more powerful algorithm. More often the real lever is the data going into the model rather than the model itself. Feature engineering, the work of turning raw business data into informative inputs, is consistently the highest-leverage step in building a useful model, and it is the step most teams skip in their rush to try the latest architecture. This article explains why feature work usually beats a fancier model, walks through the core techniques for tabular business data, and covers the single most dangerous mistake: target leakage.

Why features beat models for business data

For most tabular business problems (churn, demand, fraud, propensity), the link between the raw fields and the outcome is not something an algorithm can easily pick up unaided. A timestamp, a customer ID, and a transaction amount carry enormous signal, but only once you transform them into something a model can use. A more powerful algorithm cannot reliably recover signal that the inputs do not expose. A well-engineered feature makes the signal explicit, and at that point even a simple model performs well. This is why experienced practitioners spend the majority of their time on features and a minority on model selection. The leverage is in the inputs.

The core techniques

Encodings: making categories usable

Models work on numbers, so categorical fields like country, product category, or plan tier need encoding. The naive approach, assigning each category an arbitrary integer, implies an ordering that does not exist and tends to mislead models that treat the number as a magnitude, such as linear or distance-based models. Better options include one-hot encoding for low-cardinality fields (a column per category) and target-aware encodings for high-cardinality fields like postal code or product ID, where you replace the category with a summary of the outcome for that category. Target-aware encodings are powerful and also the most common source of leakage, so they must be computed carefully, a point we return to below.
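To make the two encodings concrete, here is a minimal sketch in plain Python, written without dependencies so it runs anywhere. The function names, the smoothing parameter, and the interleaved fold assignment are our own illustrative choices, not a standard API. The target encoding is computed out of fold, so no row's encoding ever uses that row's own label. It assumes the rows are not time-ordered; for time-dependent data you would encode each row from earlier rows only.

```python
from collections import defaultdict


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def out_of_fold_target_encode(categories, targets, n_folds=2, smoothing=1.0):
    """Encode each row with the smoothed target mean of its category,
    computed only from rows in the other folds."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Hypothetical fold assignment: interleave rows by position.
        held_out = [i for i in range(n) if i % n_folds == fold]
        fit_rows = [i for i in range(n) if i % n_folds != fold]
        prior = sum(targets[i] for i in fit_rows) / len(fit_rows)
        sums, counts = defaultdict(float), defaultdict(int)
        for i in fit_rows:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        for i in held_out:
            c = categories[i]
            # Shrink towards the prior; unseen categories fall back to it.
            encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded


# Low-cardinality field: one column per plan tier.
plan_columns, plan_matrix = one_hot(["basic", "pro", "basic", "enterprise"])

# High-cardinality field (made-up postal areas) encoded against a churn label.
encoded = out_of_fold_target_encode(["A", "A", "B", "B"], [1, 0, 1, 1])
```

The smoothing term pulls categories with few rows towards the overall rate, which keeps rare postal codes or product IDs from producing extreme values.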

Aggregations: turning history into signal

The single most valuable class of features in business data is aggregations over a customer's or entity's history. The raw fact that an order happened is weak. The fact that a customer placed twelve orders in the last 90 days, at a known average value and with a declining trend, is enormously predictive of churn. Build features like counts, sums, averages, minimums, maximums, and recency over relevant windows. For a churn model, features like days since last purchase, number of support tickets in the last month, and change in monthly spend often carry more signal than any demographic field. Aggregations convert a flat table into a behavioural profile, and behaviour predicts behaviour.
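As a rough sketch of what this looks like in practice, the function below summarises one customer's order history as of a prediction date. The data layout (a list of date and amount pairs), the feature names, and the 90-day default are assumptions made for illustration. Note that it ignores anything dated on or after the prediction date.

```python
from datetime import date, timedelta


def order_history_features(orders, as_of, window_days=90):
    """Summarise one customer's orders placed strictly before `as_of`.

    `orders` is a list of (order_date, amount) tuples, a hypothetical layout.
    """
    past = [(d, amount) for d, amount in orders if d < as_of]
    window_start = as_of - timedelta(days=window_days)
    recent = [amount for d, amount in past if d >= window_start]
    last_order = max((d for d, _ in past), default=None)
    return {
        "orders_in_window": len(recent),
        "spend_in_window": sum(recent),
        "avg_order_value": sum(recent) / len(recent) if recent else 0.0,
        "max_order_value": max(recent, default=0.0),
        "days_since_last_order": (as_of - last_order).days if last_order else None,
    }


orders = [
    (date(2026, 1, 10), 40.0),  # before the 90-day window
    (date(2026, 3, 1), 60.0),
    (date(2026, 4, 20), 20.0),
    (date(2026, 6, 1), 99.0),   # after the prediction date, so ignored
]
features = order_history_features(orders, as_of=date(2026, 5, 1))
```

Passing the prediction date in explicitly, rather than using today's date, is what lets you rebuild the same features for historical training rows.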

Time-based features: extracting structure from timestamps

A raw timestamp is nearly useless to a model, but it is dense with signal once decomposed. Extract day of week, hour of day, month, and whether it is a weekend or holiday. Compute durations: time since signup, time since last activity, average gap between purchases. Build trend features that capture acceleration or deceleration in behaviour. For forecasting, lag features (the value last week, last month) and rolling statistics are often the backbone of the whole model. Time is where most business signal lives, and most of it is locked up until you engineer it out.
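The sketch below shows both ideas in plain Python: decomposing a timestamp, and building lag and rolling features for a series. The function and feature names are illustrative, and holiday detection is left out because it needs a calendar for your market. The lag and rolling values at each position use earlier observations only.

```python
from datetime import datetime


def timestamp_features(ts, signup):
    """Decompose an event timestamp and measure time since signup."""
    return {
        "day_of_week": ts.weekday(),  # Monday is 0, Sunday is 6
        "hour_of_day": ts.hour,
        "month": ts.month,
        "is_weekend": ts.weekday() >= 5,
        "days_since_signup": (ts - signup).days,
    }


def lag_and_rolling(series, lag=1, window=3):
    """For each position, return the value `lag` steps back and the mean of
    the `window` values before it (None until enough history exists)."""
    rows = []
    for t in range(len(series)):
        lagged = series[t - lag] if t >= lag else None
        prior = series[max(0, t - window):t]  # excludes the current value
        rolling = sum(prior) / len(prior) if len(prior) == window else None
        rows.append({"lag": lagged, "rolling_mean": rolling})
    return rows


event = timestamp_features(
    datetime(2026, 5, 23, 14, 30), signup=datetime(2026, 5, 1, 9, 0)
)
weekly_sales = lag_and_rolling([10, 20, 30, 40])
```

Excluding the current value from the rolling window matters: a rolling mean that includes the value being predicted is a small leak in its own right.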

Target leakage: the mistake that ruins models silently

Target leakage is when a feature contains information that would not be available at prediction time, or that is derived from the outcome you are trying to predict. It is catastrophic precisely because it looks like success: your model scores brilliantly in testing and then fails completely in production, because the leaked signal it relied on does not exist when you actually need to make a prediction.

The classic case: predicting churn using a feature like account closed date. Of course it predicts churn perfectly; it is the churn. Anything causally downstream of the outcome leaks.
The subtle case: computing a target-aware encoding or an aggregate using the full dataset, including the rows you are trying to predict. The feature then quietly encodes the answer. Compute such features using only data available before the prediction point, and respect time order when you build training and test splits.
The temporal case: splitting data randomly when the problem is a forecast. If future rows leak into training, your evaluation is fiction. For any time-dependent problem, split by time, training on the past and testing on the future.
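A time-based split is simple to write down. The sketch below assumes each row is a dictionary with an event date under a hypothetical key; the cutoff date is arbitrary. Everything before the cutoff trains the model, everything from the cutoff onward tests it.

```python
from datetime import date


def split_by_time(rows, cutoff, date_key="event_date"):
    """Train on rows strictly before `cutoff`, test on rows at or after it."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    return train, test


rows = [
    {"event_date": date(2026, 1, 5), "churned": 0},
    {"event_date": date(2026, 2, 9), "churned": 1},
    {"event_date": date(2026, 3, 14), "churned": 0},
    {"event_date": date(2026, 4, 2), "churned": 1},
]
train, test = split_by_time(rows, cutoff=date(2026, 3, 1))

# Sanity check: no training row is dated on or after any test row.
latest_train = max(r["event_date"] for r in train)
earliest_test = min(r["event_date"] for r in test)
```

Any target-aware encoding or aggregate should then be fitted on the training rows alone and applied to the test rows, never fitted on both.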

The defence is a simple question asked of every feature: at the moment I make this prediction in production, will I genuinely have this value, computed only from information available at that moment? If the answer is no, the feature leaks. Building this discipline in from the start saves teams from the demoralising experience of a model that aces every test and fails every customer. If you want a second set of eyes on a model that looks too good to be true, Beryl Analytics audits feature pipelines specifically for leakage and production validity.
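If your pipeline records when each feature's source data was last observed, that question can be partly automated. The sketch below assumes such metadata exists, which is often not the case, and the feature names and timestamps are invented for illustration. It flags any feature whose source was recorded after the prediction moment.

```python
from datetime import datetime


def find_leaky_features(feature_timestamps, prediction_time):
    """Return the names of features whose source data was recorded
    after the prediction moment."""
    return sorted(
        name
        for name, observed_at in feature_timestamps.items()
        if observed_at > prediction_time
    )


prediction_time = datetime(2026, 5, 1, 0, 0)

# Hypothetical metadata: when each feature's latest source record was written.
feature_timestamps = {
    "days_since_last_purchase": datetime(2026, 4, 30, 23, 0),
    "support_tickets_last_month": datetime(2026, 4, 30, 23, 0),
    "account_closed_date": datetime(2026, 5, 20, 10, 0),
}
leaky = find_leaky_features(feature_timestamps, prediction_time)
```

A check like this catches timing problems only. A feature can pass it and still leak if it was computed from the target itself, so it complements the manual review rather than replacing it.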

A practical workflow

Approach feature engineering as a loop. Start by understanding the prediction moment: what is known, and when. Build a first set of features grounded in domain knowledge (the aggregations and time features a human expert would look at). Validate against a time-respecting split. Inspect which features the model relies on, and be suspicious of any single feature that dominates, because dominance is often a leakage smell. Iterate. In our experience this loop, run patiently, produces far larger gains than swapping algorithms, and it produces models that survive contact with production rather than collapsing on their first real prediction.
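The "dominant feature" check in that loop can be scripted. The sketch below takes whatever importance scores your model reports and flags any feature holding more than a chosen share of the total. The 50% threshold and the example scores are assumptions for illustration, not a rule; a flagged feature is a prompt to investigate, not proof of leakage.

```python
def dominant_features(importances, max_share=0.5):
    """Return the names of features whose share of total importance
    exceeds `max_share`."""
    total = sum(importances.values())
    if total <= 0:
        return []
    return sorted(
        name for name, score in importances.items() if score / total > max_share
    )


# Hypothetical importance scores from a fitted churn model.
importances = {
    "account_closed_flag": 0.82,
    "days_since_last_purchase": 0.10,
    "support_tickets_last_month": 0.08,
}
suspects = dominant_features(importances)
```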

Takeaways

For tabular business data, feature engineering usually beats reaching for a fancier model.
Encode categories thoughtfully, aggregate history into behavioural features, and decompose timestamps.
Aggregations over an entity's history are the single most valuable feature class for churn and demand.
Target leakage looks like success and ruins models in production; interrogate every feature for it.
Split by time for any time-dependent problem, and distrust any single dominant feature.

Frequently asked questions

How do I know if I have leakage?

A model that scores far better than seems plausible, or a single feature with overwhelming importance, are the two strongest warning signs. Trace each top feature back to its source and ask whether it would exist at prediction time.

Can automated feature tools replace this work?

They help generate candidates, but they do not understand your prediction moment or business context, and they can manufacture leakage at scale. Human judgement on what is available when remains essential. Contact us if you want help building a leakage-safe pipeline.

feature engineering, machine learning features, data preparation, predictive features

Want analytics that actually moves the number?

Beryl Analytics builds predictive models, data pipelines, and dashboards that drive decisions for businesses across New Zealand and Australia. We ship to production and prove the return.
