Product demand forecasting has always been critical for deciding how much inventory to buy, especially for brick-and-mortar grocery stores. More accurate forecasting with machine learning could prevent overstocking of perishable goods and stockouts of popular items.
This study aims to forecast store sales for Corporación Favorita, a large Ecuador-based grocery retailer.
Data source: https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data
store data:
- store_nbr: the store at which the products are sold
- family: the type of product sold
- sales: the total sales for a product family at a particular store on a given date
- onpromotion: the total number of items in a product family that were being promoted at a store on a given date
- cluster: a grouping of similar stores
holiday data:
- type: holiday type
- locale: scope of the holiday
macro indicator:
- oil price: Ecuador is an oil-dependent country and its economic health is highly vulnerable to shocks in oil prices.
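The store, holiday and oil tables described above are later merged and cleaned. A minimal sketch of how that step might look with pandas is shown here. The tiny tables, the `oil_price` and `date` column names, the `"NoHoliday"` placeholder and the forward-fill strategy are all assumptions for illustration, not the exact procedure used in this study.

```python
import pandas as pd

# Tiny stand-in tables; all values are made up for illustration only.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03"]),
    "store_nbr": [1, 1, 1],
    "family": ["DAIRY", "DAIRY", "DAIRY"],
    "sales": [120.0, 95.0, 101.0],
    "onpromotion": [0, 2, 1],
})
stores = pd.DataFrame({"store_nbr": [1], "cluster": [13]})
holidays = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01"]),
    "type": ["Holiday"],
    "locale": ["National"],
})
oil = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-02", "2017-01-03"]),
    "oil_price": [52.3, None],  # hypothetical column name
})


def merge_and_clean(sales, stores, holidays, oil):
    """Join the tables on their shared keys and fill missing values."""
    df = sales.merge(stores, on="store_nbr", how="left")
    df = df.merge(holidays, on="date", how="left")
    df = df.merge(oil, on="date", how="left")
    df = df.sort_values("date").reset_index(drop=True)
    # Assumption: carry the last known oil price forward, then
    # backfill any gap at the very start of the series.
    df["oil_price"] = df["oil_price"].ffill().bfill()
    # Assumption: dates with no holiday entry get a placeholder label.
    df["type"] = df["type"].fillna("NoHoliday")
    df["locale"] = df["locale"].fillna("NoHoliday")
    return df


merged = merge_and_clean(sales, stores, holidays, oil)
print(merged)
```

Left joins keep every sales row even when a date has no holiday or oil entry, which is why the fill step is needed afterwards.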
- Data merging and cleaning (filling in missing values)
- Data visualisation
- Feature engineering (transforming categorical features)
- Modelling and prediction
- Multivariate Time Series model
- XGBoost algorithm
- The normalised root mean square error (RMSE) for XGBoost is 0.005, which indicates that the predicted and observed sales are close to each other.
- Sales are predicted for the test dataset (out-of-sample)
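For readers unfamiliar with the metric, a normalised RMSE can be computed as sketched below. The study does not state which normalisation was used, so dividing by the range of the observed values is an assumption here (dividing by the mean is another common choice), and the sample numbers are made up.

```python
import math


def nrmse(observed, predicted):
    """RMSE divided by the range of the observed values (assumed normalisation)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    rmse = math.sqrt(mse)
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return rmse / value_range


# Illustrative values only, not results from this study.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
score = nrmse(observed, predicted)
print(round(score, 4))
```

A value near 0 means the typical error is small relative to the spread of the observed sales.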
Most sales are made on Sunday:
Most sales are made in quarter 2:
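Findings like the two above can be derived by grouping sales on calendar features. The following is a minimal sketch with pandas; the `daily` table and its values are hypothetical and only chosen to show the mechanics, not to reproduce the study's figures.

```python
import pandas as pd

# Made-up daily totals for illustration only.
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-01-01", "2017-01-02", "2017-04-09", "2017-04-10"]
    ),
    "sales": [300.0, 100.0, 400.0, 150.0],
})


def sales_by_calendar(df):
    """Return total sales per day of week and per quarter."""
    out = df.copy()
    out["day_of_week"] = out["date"].dt.day_name()
    out["quarter"] = out["date"].dt.quarter
    by_day = out.groupby("day_of_week")["sales"].sum()
    by_quarter = out.groupby("quarter")["sales"].sum()
    return by_day, by_quarter


by_day, by_quarter = sales_by_calendar(daily)
top_day = by_day.idxmax()          # day name with the highest total
top_quarter = by_quarter.idxmax()  # quarter number with the highest total
print(top_day, top_quarter)
```

The same derived columns (`day_of_week`, `quarter`) can also serve as model inputs during feature engineering.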
Reference: Big thanks to Kashish Rastogi for the data visualisation dashboard.