How to Forecast Demand in Supply Chain with Python

    Imagine preparing for the holiday season, only to find warehouses filled with unsold goods or customers frustrated by stockouts. Demand forecasting helps mitigate such risks, turning uncertainty into actionable insights. It is a critical tool for supply chain management, enabling businesses to predict inventory needs, optimize resources, and stay competitive in dynamic markets.

    Demand forecasting refers to studying historical and current data to understand the internal and external factors affecting demand. The resulting trend model is then used to predict, or ‘forecast’, what demand will look like in the short or long term.

    There are several ways to determine demand forecasts, depending on the factors prioritized. Some organizations rely on historical data alone, while others incorporate real-time factors like promotions or market conditions. This article will explore how to implement demand forecasting models, focusing on the latest strategies and technologies that enable businesses to stay resilient amid market uncertainties.

    Implementing a Demand Forecasting Model Using Python

    1. Data Collection

    To effectively forecast demand, the first step is to gather historical sales data. The quality and richness of the data you collect will directly impact the accuracy of the model. The data should include key attributes such as:

    • Product IDs: Identifies the individual products whose demand will be forecasted.
    • Sales Quantities: The number of units sold per product on each date.
    • Dates: The date or time period associated with each sales record (daily, weekly, monthly, etc.).
    • Other Relevant Features: Additional variables that might affect demand, such as promotions, holidays, or special events. These external factors can significantly influence product demand and should be incorporated if available.

    Data can be collected from various sources like sales databases, inventory management systems, and customer orders. Ensuring the data is accurate and comprehensive is essential before moving to the next stage.
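
    As a minimal sketch, assuming the sales history lives in a CSV file (the file name and column names below are illustrative and should match your own sources):

    Python
    import pandas as pd

    # Load historical sales records; 'sales_history.csv' is illustrative
    data = pd.read_csv('sales_history.csv', parse_dates=['Date'])

    # Sort chronologically so later time-based operations behave correctly
    data = data.sort_values('Date').reset_index(drop=True)
    print(data[['Date', 'Product_ID', 'Demand']].head())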

    2. Data Preprocessing

    After collecting the data, preprocessing is the next critical step. The quality of the data plays a huge role in the model’s effectiveness. This stage includes:

    • Handling Missing Values: Incomplete data can skew the forecasting model’s performance. You can handle missing values in various ways:
      • Forward Fill: This method fills missing values with the previous available data point (common for time series).
      • Imputation: Replacing missing values with the mean, median, or a model-based approach.
      • Dropping Missing Data: If only a small number of records are missing, you may simply remove them, though this is only viable when the loss of data is negligible.
    Python
    data = data.ffill()  # Forward fill: propagate the last valid value
    • Encoding Categorical Variables: For categorical data such as product IDs or promotional events, encoding might be needed. Common methods include:
      • One-Hot Encoding: Converts categorical variables into binary columns for each category.
      • Label Encoding: Assigns a unique integer to each category (useful for ordinal data); a sketch follows this list.
    Python
    # One-hot encode product identifiers into binary indicator columns
    data = pd.get_dummies(data, columns=['Product_ID'])
    • Normalizing/Standardizing Numerical Features: For numerical features such as sales volume, normalization or standardization can help ensure that the data is on the same scale, which is especially important for machine learning algorithms like gradient boosting or neural networks.
    Python
    from sklearn.preprocessing import MinMaxScaler

    # Rescale demand into the [0, 1] range
    scaler = MinMaxScaler()
    data['Demand'] = scaler.fit_transform(data[['Demand']])
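
    For the label encoding mentioned above, a minimal scikit-learn sketch, shown as an alternative to the one-hot encoding (the new column name is illustrative):

    Python
    from sklearn.preprocessing import LabelEncoder

    # Alternative to one-hot encoding: map each category to an integer
    encoder = LabelEncoder()
    data['Product_ID_encoded'] = encoder.fit_transform(data['Product_ID'])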

    Data preprocessing ensures that the dataset is clean, complete, and in the right format for analysis.

    3. Exploratory Data Analysis (EDA)

    Exploratory Data Analysis (EDA) is a crucial stage where you visualize and analyze the dataset to gain insights that can inform the forecasting process. The goal is to identify patterns, trends, and anomalies that can impact the demand forecasting model.

    Plotting Sales Trends Over Time

    One of the first steps in EDA is to visualize how the demand fluctuates over time. This helps in identifying trends, seasonality, and periodic behaviors.

    Python
    import matplotlib.pyplot as plt
    # Plot demand over time
    plt.figure(figsize=(10, 6))
    plt.plot(data['Date'], data['Demand'], label='Demand')
    plt.title('Demand Over Time')
    plt.xlabel('Date')
    plt.ylabel('Demand')
    plt.show()

    Identifying Seasonal Patterns

    Seasonality refers to periodic fluctuations in demand that occur at regular intervals (e.g., higher demand in summer or around holidays). Detecting seasonality allows you to adjust your forecasting model accordingly.

    You can identify seasonality through:

    • Visual Inspection: Look for patterns in the plots over multiple periods.
    • Autocorrelation: Check for correlation between the demand at different time lags.

    For instance, using autocorrelation in Python:

    Python
    from pandas.plotting import autocorrelation_plot

    # Correlation of demand with lagged copies of itself
    autocorrelation_plot(data['Demand'])
    plt.title('Autocorrelation of Demand')
    plt.show()
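
    Another way to expose seasonality is a seasonal decomposition, which separates the series into trend, seasonal, and residual components. A minimal sketch using statsmodels, assuming daily data with a weekly cycle and no missing values:

    Python
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose

    # period=7 assumes daily data with weekly seasonality
    result = seasonal_decompose(data.set_index('Date')['Demand'], period=7)
    result.plot()
    plt.show()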

    Checking for Outliers

    Outliers are extreme data points that deviate significantly from other observations. These can distort the model’s performance, so it’s important to identify and decide how to handle them (e.g., remove or adjust them).

    Outliers can be detected using:

    • Box Plots: Visualize the distribution and detect any extreme values.
    • Statistical Tests: Use methods like Z-scores to flag outliers (sketched after the box plot below).
    Python
    plt.figure(figsize=(10, 6))
    plt.boxplot(data['Demand'])
    plt.title('Demand Distribution')
    plt.ylabel('Demand')
    plt.show()
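
    The Z-score method can be sketched as follows; the threshold of 3 standard deviations is a common convention rather than a fixed rule:

    Python
    import numpy as np

    # Flag observations more than 3 standard deviations from the mean
    z_scores = (data['Demand'] - data['Demand'].mean()) / data['Demand'].std()
    outliers = data[np.abs(z_scores) > 3]
    print(f"Flagged {len(outliers)} potential outliers")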

    Interdependence of Data Exploration and Preprocessing

    It’s important to note that the stages of data exploration and preprocessing are not independent of each other. Insights gathered from EDA can directly influence how you clean and transform the data. For example, if seasonal trends are observed during EDA, you may decide to:

    • Impute missing values differently (e.g., based on seasonal averages, as sketched below).
    • Remove outliers that could distort seasonal patterns.
    • Create new features like lag variables or rolling averages to better capture seasonality.

    Thus, both stages work in tandem to ensure the data is fully prepared for modeling.
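
    For instance, the seasonal imputation idea above might look like the following sketch, which fills gaps with the average demand for the same weekday (assuming daily records and a parsed Date column):

    Python
    # Fill missing demand with the mean for the same weekday, so imputed
    # values respect the weekly seasonal pattern
    data['Demand'] = (data.groupby(data['Date'].dt.dayofweek)['Demand']
                          .transform(lambda s: s.fillna(s.mean())))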

    4. Feature Engineering

    Feature engineering is a critical stage in building a demand forecasting model. It involves creating new features that help the model capture hidden patterns or trends in the data, which can improve forecasting accuracy. Below are key types of features commonly engineered for demand forecasting:

    Lag Features

    Lag features capture past sales data and use them as input for predicting future demand. These are crucial for time series models as they help incorporate the effect of historical demand on future sales.

    Python
    # Demand from the previous day
    data['lag_1'] = data['Demand'].shift(1)
    # Demand from 7 days earlier (e.g., the same weekday last week)
    data['lag_7'] = data['Demand'].shift(7)

    Here, shift(1) moves the demand column down by one row, allowing the model to use the demand from the previous day as a feature.

    Rolling Averages

    Rolling averages smooth out fluctuations in the demand data and are useful for capturing trends over time, especially for seasonal products. They can be created with a specific window size (e.g., 7 days or 30 days).

    Python
    # Create a 7-day rolling average feature
    data['rolling_avg_7'] = data['Demand'].rolling(window=7).mean()

    This creates a new feature representing the average demand over the last 7 days, which helps the model see the underlying trend rather than day-to-day noise.

    Promotional Events

    Promotions can significantly affect demand. You can create a binary feature that indicates whether a promotion was running on a specific date (1 for a promotion, 0 for no promotion).

    Python
    # Create a 'Promotion' column based on event dates
    data['Promotion'] = data['Date'].apply(lambda x: 1 if x in promo_dates else 0)

    Here, promo_dates would be a list of dates when promotions occurred.

    Holidays

    Holidays often lead to increased or decreased demand, depending on the type of product. You can add a feature that flags holidays, as demand can vary during these periods.

    Python
    # Add a column for holidays
    data['Holiday'] = data['Date'].apply(lambda x: 1 if x in holiday_dates else 0)

    Here, holiday_dates is a list of dates marked as holidays.

    By creating these features, you enhance the model’s ability to capture important relationships in the data, making the forecasting model more robust.

    5. Model Selection

    Selecting the right model is crucial for accurate demand forecasting. Depending on the nature of the data, different types of models may be appropriate. Here’s a breakdown of common model types:

    Time Series Models

    ARIMA (AutoRegressive Integrated Moving Average)
    • Used for univariate time series data with clear trends, leveraging past values and forecast errors to predict future values.
    • Parameters: p (lag order), d (degree of differencing), q (order of moving average).
    • Use case: Good for stationary data without seasonal components.

    SARIMA (Seasonal ARIMA)
    • Extends ARIMA to account for seasonality in time series data.
    • Parameters: p, d, q (ARIMA components) and P, D, Q (seasonal components), with the period of seasonality.
    • Use case: Effective when the data shows seasonal patterns (e.g., monthly or quarterly sales).

    Prophet (developed by Facebook)
    • Handles seasonality, holidays, and missing data.
    • Parameters: Change points, seasonalities, holidays.
    • Use case: Great for data with strong seasonal effects and missing values.
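
    To make this concrete, fitting a SARIMA model with statsmodels might look like the sketch below; the orders are illustrative placeholders to be tuned for your series, and a single product’s daily demand with weekly seasonality is assumed:

    Python
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Orders are placeholders; tune them (e.g., via AIC) for your data
    series = data.set_index('Date')['Demand']  # one product's daily demand
    sarima = SARIMAX(series,
                     order=(1, 1, 1),              # p, d, q
                     seasonal_order=(1, 1, 1, 7))  # P, D, Q, weekly period
    sarima_fit = sarima.fit(disp=False)
    print(sarima_fit.forecast(steps=7))  # demand for the next 7 days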

    Machine Learning Models

    Random Forest
    • An ensemble model that uses multiple decision trees to make predictions; suitable for capturing non-linear relationships in the data.
    • Parameters: number of trees (n_estimators), maximum depth (max_depth), minimum samples split (min_samples_split), minimum samples leaf (min_samples_leaf).
    • Use case: Can be used for both regression and classification tasks.

    Gradient Boosting (e.g., XGBoost, LightGBM)
    • Builds trees sequentially, with each tree trying to correct the errors of the previous one; powerful for time series forecasting when features are engineered.
    • Parameters: learning rate (eta), maximum depth (max_depth), subsample (subsample), number of estimators (n_estimators).
    • Use case: Effective for complex, non-linear relationships in large datasets.
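
    A brief scikit-learn sketch using the parameters listed above (the values are illustrative, not tuned), assuming the engineered feature columns from step 4:

    Python
    from sklearn.ensemble import RandomForestRegressor

    features = ['lag_1', 'lag_7', 'rolling_avg_7', 'Promotion', 'Holiday']
    train = data.dropna(subset=features)  # drop rows where lags are undefined

    # Hyperparameter values are illustrative
    rf = RandomForestRegressor(n_estimators=200, max_depth=10,
                               min_samples_split=5, min_samples_leaf=2)
    rf.fit(train[features], train['Demand'])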

    Deep Learning Models

    LSTM (Long Short-Term Memory)
    • A type of recurrent neural network (RNN) designed for sequence prediction; powerful for time series data involving long-term dependencies.
    • Parameters: number of layers, hidden units, dropout rate, sequence length.
    • Use case: Effective for capturing long-term dependencies and complex patterns in sequential data.

    GRU (Gated Recurrent Unit)
    • A variation of LSTM with fewer parameters, making it faster to train; also suitable for time series data.
    • Parameters: number of layers, hidden units, dropout rate, sequence length.
    • Use case: Good for problems where LSTM might be overkill or too slow.
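
    A minimal Keras sketch of the LSTM variant, assuming TensorFlow is installed and that input sequences have already been shaped as (samples, sequence length, features); the layer sizes are illustrative:

    Python
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense, Dropout

    seq_len, n_features = 7, 5  # illustrative dimensions

    # One LSTM layer with dropout, then a single unit predicting demand
    nn = Sequential([
        LSTM(64, input_shape=(seq_len, n_features)),
        Dropout(0.2),
        Dense(1),
    ])
    nn.compile(optimizer='adam', loss='mse')
    # nn.fit(X_train, y_train, epochs=20)  # X_train shape: (samples, 7, 5)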

    6. Model Training and Evaluation

    Once the features are engineered and the model is selected, the next step is to train the model and evaluate its performance.

    Train/Test Split

    The data is typically split into training and test sets to evaluate the model’s performance on unseen data, commonly at an 80/20 or 70/30 ratio. For time series, the split should be chronological rather than random, so the model is never trained on observations that come after the ones it is tested on.

    Python
    from sklearn.model_selection import train_test_split
    # Hold out the most recent 20% of rows; shuffle=False keeps the time order
    train_data, test_data = train_test_split(data, test_size=0.2, shuffle=False)

    Training the Model

    Once the data is split, the model is trained using the training data. For example, if using XGBoost, it might look like this:

    Python
    import xgboost as xgb
    
    # 'features' lists the engineered feature columns from step 4
    features = ['lag_1', 'lag_7', 'rolling_avg_7', 'Promotion', 'Holiday']

    # Train an XGBoost regressor
    model = xgb.XGBRegressor()
    model.fit(train_data[features], train_data['Demand'])

    Evaluation Metrics

    To evaluate the model’s performance, we use metrics such as:

    • Mean Squared Error (MSE): The average of the squared differences between predicted and actual values; larger errors are penalized more heavily.
    • Root Mean Squared Error (RMSE): The square root of MSE, which expresses the error in the original units of demand.
    • Mean Absolute Error (MAE): The average of the absolute differences between predicted and actual demand values.
    Python
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    import numpy as np
    # Predict on test data
    predictions = model.predict(test_data[features])
    # Calculate RMSE and MAE
    rmse = np.sqrt(mean_squared_error(test_data['Demand'], predictions))
    mae = mean_absolute_error(test_data['Demand'], predictions)
    print(f"RMSE: {rmse:.2f}, MAE: {mae:.2f}")

    7. Forecasting

    Once the model is trained and evaluated, you can use it to make future demand predictions. This is done by providing the model with the most recent data and allowing it to predict future demand values.

    Example of forecasting future demand:

    Python
    # Predict demand for the next 7 days; 'future_data' must contain the
    # same engineered feature columns, built for the future dates
    future_predictions = model.predict(future_data[features])
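
    Because lag and rolling-average features depend on demand values that do not yet exist, multi-step forecasts are often produced one step at a time, feeding each prediction back into the features. A minimal sketch of that recursive approach, assuming the features list from step 6 and, for simplicity, no planned promotions or holidays in the horizon:

    Python
    import numpy as np
    import pandas as pd

    # Recursive one-step-ahead forecasting for the next 7 days
    history = data['Demand'].tolist()
    future_predictions = []
    for _ in range(7):
        row = pd.DataFrame([{
            'lag_1': history[-1],                    # yesterday's demand
            'lag_7': history[-7],                    # demand 7 days ago
            'rolling_avg_7': np.mean(history[-7:]),  # last 7 days' average
            'Promotion': 0,  # assumption: no promotions planned
            'Holiday': 0,    # assumption: no holidays in the horizon
        }])
        pred = float(model.predict(row[features])[0])
        future_predictions.append(pred)
        history.append(pred)  # feed the prediction back in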

    8. Inventory Optimization

    The final step involves using the forecasted demand to optimize inventory levels. Accurate demand forecasts allow businesses to avoid stockouts (which lead to lost sales) or overstocking (which leads to excess storage costs).

    Inventory Adjustment

    • Safety Stock: Based on forecasted demand and lead time, safety stock can be calculated to buffer against uncertainties.
    • Reorder Points: Set reorder points based on forecasted demand and the time it takes to receive new stock.
    Python
    import numpy as np

    lead_time = 7  # replenishment lead time, in days
    demand_forecast = model.predict(future_data[features])
    avg_daily_demand = demand_forecast.mean()

    # Safety stock buffers demand variability during the lead time;
    # 1.65 corresponds to roughly a 95% service level, with the forecast's
    # spread used as a simple proxy for daily demand variability
    safety_stock = 1.65 * demand_forecast.std() * np.sqrt(lead_time)

    # Reorder point = expected demand during lead time + safety stock
    reorder_point = avg_daily_demand * lead_time + safety_stock

    With accurate demand forecasting, inventory optimization strategies can be applied to balance inventory levels, reducing costs and improving service levels.

    Conclusion

    Mastering demand forecasting in the supply chain is vital for optimizing operations and ensuring that businesses can meet customer needs while minimizing costs. By combining robust Python-based forecasting techniques with tools like ClicData, businesses can gain deeper insights into demand patterns and improve their decision-making processes.

    ClicData’s integration capabilities allow analysts to seamlessly bring Python forecasting models into dynamic BI dashboards, providing real-time, actionable insights. This combination of powerful machine learning and BI tools empowers companies to respond quickly to fluctuations in demand, adapt their supply chain strategies, and maintain a competitive edge in the market. As supply chains become more complex, leveraging advanced, AI-driven tools like ClicData ensures that businesses stay ahead of the curve, maintaining accuracy, agility, and cost-efficiency in their operations.