Demand Forecasting at SKU Level: Methods, Accuracy, and Failure Modes
Rolling averages fail at the edges. ML models surface assumptions you didn't know you were making. A guide to building SKU-level demand forecasts that actually hold up in production.
Why Aggregate Forecasts Hide the Problem
Most enterprises forecast at the category or product family level, then allocate forecasts to individual SKUs using historical ratios. This approach works acceptably when demand patterns are stable and homogeneous within categories. It fails when individual SKUs have distinct demand drivers, seasonal patterns, or promotional elasticities.
SKU-level forecasting is harder — more models to build, more data to manage, more edge cases to handle — but it is the level at which replenishment decisions are actually made. Accurate aggregate forecasts that produce inaccurate SKU-level allocations don't reduce stockouts or overstock; they just push the inaccuracy downstream.
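To make the downstream effect concrete, here is a minimal sketch of ratio-based allocation. The SKU names, shares, and demand figures are made up for illustration; the point is only that a category forecast with zero error can still produce large errors per SKU when the historical shares no longer hold.

```python
def allocate(category_forecast, historical_shares):
    """Split a category-level forecast across SKUs using fixed historical shares."""
    return {sku: category_forecast * share
            for sku, share in historical_shares.items()}

# Hypothetical numbers: the category forecast is exactly right in total,
# but demand has shifted between the two SKUs since the shares were computed.
category_forecast = 200.0
historical_shares = {"SKU-A": 0.5, "SKU-B": 0.5}
actual = {"SKU-A": 150.0, "SKU-B": 50.0}

sku_forecast = allocate(category_forecast, historical_shares)

# Error at the category level versus error at the level replenishment uses.
aggregate_error = abs(sum(actual.values()) - category_forecast)
sku_abs_error = {sku: abs(actual[sku] - sku_forecast[sku]) for sku in actual}

print(aggregate_error)  # 0.0
print(sku_abs_error)    # {'SKU-A': 50.0, 'SKU-B': 50.0}
```

In this example SKU-A would stock out and SKU-B would be overstocked, even though the category forecast looks perfect.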
Method Selection by Demand Characteristic
- Continuous, high-volume SKUs: gradient boosting or neural models with rich feature sets including external signals
- Seasonal products: models that explicitly represent seasonal patterns — SARIMA, Prophet, or gradient boosting with Fourier features
- Intermittent demand (slow movers, long-tail SKUs): Croston's method or its variants, or models trained with a Tweedie loss function
- New products without history: analogous product bootstrapping, curve-fitting to adoption patterns, or causal models based on attributes
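As one illustration of the intermittent-demand option in the list above, the following is a minimal sketch of the basic form of Croston's method. It smooths the size of non-zero demands and the interval between them separately, then forecasts their ratio as a per-period demand rate. The function name, the initialisation from the first non-zero observation, and the default `alpha` of 0.1 are assumptions made for this example, not recommendations.

```python
def croston_forecast(demand, alpha=0.1):
    """Per-period demand rate from basic Croston's method.

    demand: list of non-negative numbers, one per period.
    alpha:  smoothing constant in (0, 1]; 0.1 is an illustrative default.
    """
    size = None        # smoothed size of non-zero demands
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand

    for d in demand:
        if d > 0:
            if size is None:
                # Assumption: initialise from the first non-zero observation.
                size = float(d)
                interval = float(periods_since)
            else:
                # Estimates are updated only in periods with demand.
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1

    if size is None:
        return 0.0  # no demand observed, so nothing to forecast from
    return size / interval


# A SKU that sells 3 units every third period averages 1 unit per period.
print(croston_forecast([0, 0, 3, 0, 0, 3]))  # 1.0
```

Because the estimates only change when a sale occurs, the forecast does not decay during long runs of zeros. That is one of the limitations the variants of the method try to address.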
External Signals That Move the Needle
The biggest accuracy gains in demand forecasting often come not from more sophisticated models but from better input signals. Promotional calendars, price change schedules, weather forecasts, macroeconomic indicators, and competitor out-of-stock events are all signals that change demand but are absent from historical sales data alone.
Promotional uplift modelling deserves particular attention: the impact of promotions on demand is typically one of the largest and most predictable sources of demand variation, yet many forecasting systems treat it as noise rather than signal. A model that explicitly represents promotional uplift will usually outperform one that doesn't on promoted periods, whatever the base algorithm.
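A deliberately naive sketch of treating promotions as signal: estimate a multiplicative uplift factor from history and apply it to a baseline using the promotional calendar. The function names and figures are hypothetical, and the estimate ignores seasonality, pull-forward, and cannibalisation, all of which a production model would need to handle.

```python
from statistics import mean

def promo_uplift(sales, on_promo):
    """Ratio of mean sales in promoted periods to mean sales in other periods."""
    base = [s for s, p in zip(sales, on_promo) if not p]
    promo = [s for s, p in zip(sales, on_promo) if p]
    if not base or not promo or mean(base) == 0:
        return 1.0  # not enough evidence: assume no uplift
    return mean(promo) / mean(base)

def forecast_with_promo(baseline, uplift, promo_calendar):
    """Scale a flat baseline forecast in the periods flagged as promoted."""
    return [baseline * (uplift if p else 1.0) for p in promo_calendar]

# Hypothetical history: 100 units in normal weeks, 150 in promoted weeks.
sales = [100, 100, 150, 100, 150]
on_promo = [False, False, True, False, True]

uplift = promo_uplift(sales, on_promo)
print(uplift)                                           # 1.5
print(forecast_with_promo(100, uplift, [False, True]))  # [100.0, 150.0]
```

With gradient boosting or similar models, the same information would more commonly enter as features (promotion flag, discount depth, promotion type) rather than as a post-hoc multiplier.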
Measuring Forecast Accuracy the Right Way
MAPE (mean absolute percentage error) is the most commonly used forecast accuracy metric and one of the most misleading. It overweights low-volume SKUs (where percentage errors are inherently higher) and is undefined for zero-demand periods.
Use WMAPE (weighted mean absolute percentage error) as your primary accuracy metric, weighting each SKU by its revenue or volume. This gives high-volume SKUs appropriate weight in the accuracy calculation. Supplement it with bias metrics (are you systematically over- or under-forecasting?) and service level metrics (what fraction of demand are you meeting without stockout?) to get a complete picture of forecast quality.
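The difference between these metrics is easiest to see on a small example. The sketch below uses the volume-weighted form of WMAPE (total absolute error divided by total actual demand); the function names, the choice to skip zero-demand periods in MAPE, and the two-SKU data are assumptions made for illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, skipping periods with zero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual is zero")
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def wmape(actual, forecast):
    """Volume-weighted MAPE: total absolute error over total actual demand."""
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("WMAPE is undefined when total demand is zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total

def bias(actual, forecast):
    """Signed error over total demand; positive means over-forecasting."""
    return sum(f - a for a, f in zip(actual, forecast)) / sum(actual)

# Hypothetical pair of SKUs: a fast mover forecast within 5%,
# and a slow mover that sold 2 units against a forecast of 4.
actual = [1000, 2]
forecast = [950, 4]

print(round(mape(actual, forecast), 4))   # 0.525
print(round(wmape(actual, forecast), 4))  # 0.0519
print(round(bias(actual, forecast), 4))   # -0.0479
```

MAPE reports an error above 50% because the two-unit miss on the slow mover counts as much as the fast mover. WMAPE reports about 5%, which is closer to the error that matters for inventory, and the negative bias shows a slight net under-forecast.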
Ready to Apply This in Your Organisation?
SmartPath AI builds and deploys production AI systems for enterprises. Schedule a strategy session to discuss your specific use case.