
How to Forecast Demand in Supply Chain with Python

By Telmo Silva on December 5, 2024

Imagine preparing for the holiday season, only to find warehouses filled with unsold goods or customers frustrated by stockouts. Demand forecasting helps eliminate such risks, turning uncertainty into actionable insights. It’s a critical capability for supply chain management, enabling businesses to predict inventory needs, optimize resources, and stay competitive in dynamic markets.

Demand forecasting refers to studying historical and current data to understand the internal and external factors that affect demand. A trend model fitted to that data is then used to predict, or ‘forecast’, what the market will look like in the short or long term.

There are several ways to determine demand forecasts, depending on the factors prioritized. Some organizations rely on historical data alone, while others incorporate real-time factors like promotions or market conditions. This article will explore how to implement demand forecasting models, focusing on the latest strategies and technologies that enable businesses to stay resilient amid market uncertainties.


Implementing a Demand Forecasting Model Using Python

1. Data Collection

To effectively forecast demand, the first step is to gather historical sales data. The quality and richness of the data you collect will directly impact the accuracy of the model. The data should include key attributes such as:

  • Product IDs: Identifies the individual products whose demand will be forecasted.
  • Sales Quantities: The number of units sold per product on each date.
  • Dates: The date or time period associated with each sales record (daily, weekly, monthly, etc.).
  • Other Relevant Features: Additional variables that might affect demand, such as promotions, holidays, or special events. These external factors can significantly influence product demand and should be incorporated if available.

Data can be collected from various sources like sales databases, inventory management systems, and customer orders. Ensuring the data is accurate and comprehensive is essential before moving to the next stage.
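
If your sales history lives in a CSV export, loading and ordering it might look like this minimal sketch (the sales.csv filename and column names here are illustrative):

Python
import pandas as pd

# Load historical sales data and put it in chronological order
data = pd.read_csv('sales.csv', parse_dates=['Date'])
data = data.sort_values('Date')
print(data.head())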

2. Data Preprocessing

After collecting the data, preprocessing is the next critical step. The quality of the data plays a huge role in the model’s effectiveness. This stage includes:

  • Handling Missing Values: Incomplete data can skew the forecasting model’s performance. You can handle missing values in various ways:
    • Forward Fill: This method fills missing values with the previous available data point (common for time series).
    • Imputation: Replacing missing values with the mean, median, or a model-based approach.
    • Dropping Missing Data: If only a small number of data points are missing, you may choose to remove them, though this is viable only when the gaps are few and unlikely to bias the series.
Python
data = data.ffill()  # Forward fill missing values
  • Encoding Categorical Variables: For categorical data such as product IDs or promotional events, encoding might be needed. Common methods include:
    • One-Hot Encoding: Converts categorical variables into binary columns for each category.
    • Label Encoding: Assigns a unique integer to each category (useful for ordinal data).
Python
data = pd.get_dummies(data, columns=['Product_ID'])
  • Normalizing/Standardizing Numerical Features: For numerical features such as sales volume, normalization or standardization can help ensure that the data is on the same scale, which is especially important for machine learning algorithms like gradient boosting or neural networks.
Python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
data['Demand'] = scaler.fit_transform(data[['Demand']])

Data preprocessing ensures that the dataset is clean, complete, and in the right format for analysis.

3. Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is a crucial stage where you visualize and analyze the dataset to gain insights that can inform the forecasting process. The goal is to identify patterns, trends, and anomalies that can impact the demand forecasting model.

Plotting Sales Trends Over Time

One of the first steps in EDA is to visualize how the demand fluctuates over time. This helps in identifying trends, seasonality, and periodic behaviors.

Python
import matplotlib.pyplot as plt
# Plot demand over time
plt.figure(figsize=(10, 6))
plt.plot(data['Date'], data['Demand'], label='Demand')
plt.title('Demand Over Time')
plt.xlabel('Date')
plt.ylabel('Demand')
plt.show()

Identifying Seasonal Patterns

Seasonality refers to periodic fluctuations in demand that occur at regular intervals (e.g., higher demand in summer or around holidays). Detecting seasonality allows you to adjust your forecasting model accordingly.

You can identify seasonality through:

  • Visual Inspection: Look for patterns in the plots over multiple periods.
  • Autocorrelation: Check for correlation between the demand at different time lags.

For instance, using autocorrelation in Python:

Python
from pandas.plotting import autocorrelation_plot

autocorrelation_plot(data['Demand'].dropna())
plt.title('Autocorrelation of Demand')
plt.show()
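
Beyond autocorrelation, a classical decomposition separates the series into trend, seasonal, and residual components. A minimal sketch with statsmodels, assuming a daily series with weekly seasonality (period of 7):

Python
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose demand into trend, seasonal, and residual components
series = data.set_index('Date')['Demand']
result = seasonal_decompose(series, model='additive', period=7)
result.plot()
plt.show()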

Checking for Outliers

Outliers are extreme data points that deviate significantly from other observations. These can distort the model’s performance, so it’s important to identify and decide how to handle them (e.g., remove or adjust them).

Outliers can be detected using:

  • Box Plots: Visualize the distribution and detect any extreme values.
  • Statistical Tests: Use methods like Z-scores to flag outliers (see the sketch after the box plot below).
Python
plt.figure(figsize=(10, 6))
plt.boxplot(data['Demand'])
plt.title('Demand Distribution')
plt.ylabel('Demand')
plt.show()
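
A minimal Z-score sketch, flagging observations more than three standard deviations from the mean (the threshold of 3 is a common convention, not a fixed rule):

Python
import numpy as np

# Flag demand values far from the mean as potential outliers
z_scores = (data['Demand'] - data['Demand'].mean()) / data['Demand'].std()
outliers = data[np.abs(z_scores) > 3]
print(f"{len(outliers)} potential outliers found")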

Interdependence of Data Exploration and Preprocessing

It’s important to note that the stages of data exploration and preprocessing are not independent of each other. Insights gathered from EDA can directly influence how you clean and transform the data. For example, if seasonal trends are observed during EDA, you may decide to:

  • Impute missing values differently (e.g., based on seasonal averages, as sketched below).
  • Remove outliers that could distort seasonal patterns.
  • Create new features like lag variables or rolling averages to better capture seasonality.

Thus, both stages work in tandem to ensure the data is fully prepared for modeling.
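
As an illustration of the first point, a seasonal imputation might fill each gap with the average demand for the same day of the week (a minimal sketch, assuming weekly seasonality):

Python
# Fill gaps with the average demand for the same day of the week
data['DayOfWeek'] = data['Date'].dt.dayofweek
data['Demand'] = data['Demand'].fillna(
    data.groupby('DayOfWeek')['Demand'].transform('mean')
)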

4. Feature Engineering

Feature engineering is a critical stage in building a demand forecasting model. It involves creating new features that help the model capture hidden patterns or trends in the data, which can improve forecasting accuracy. Below are key types of features commonly engineered for demand forecasting:

Lag Features

Lag features capture past sales data and use them as input for predicting future demand. These are crucial for time series models as they help incorporate the effect of historical demand on future sales.

Python
# Create a lag feature of the previous day's demand
data['lag_1'] = data['Demand'].shift(1)
# Create a lag feature of demand from 7 days earlier
data['lag_7'] = data['Demand'].shift(7)

Here, shift(1) moves the demand column down by one row, allowing the model to use the demand from the previous day as a feature.

Rolling Averages

Rolling averages smooth out fluctuations in the demand data and are useful for capturing trends over time, especially for seasonal products. They can be created with a specific window size (e.g., 7 days or 30 days).

Python
# Create a 7-day rolling average feature
data['rolling_avg_7'] = data['Demand'].rolling(window=7).mean()

This creates a new feature that represents the average demand over the last 7 days, which can help the model understand long-term trends.

Promotional Events

Promotions can significantly affect demand. You can create a binary feature that indicates whether a promotion was running on a specific date (1 for a promotion, 0 for no promotion).

Python
# Create a 'Promotion' column based on event dates
data['Promotion'] = data['Date'].apply(lambda x: 1 if x in promo_dates else 0)

Here, promo_dates would be a list of dates when promotions occurred.

Holidays

Holidays often lead to increased or decreased demand, depending on the type of product. You can add a feature that flags holidays, as demand can vary during these periods.

Python
# Add a column for holidays
data['Holiday'] = data['Date'].apply(lambda x: 1 if x in holiday_dates else 0)

Here, holiday_dates is a list of dates marked as holidays.

By creating these features, you enhance the model’s ability to capture important relationships in the data, making the forecasting model more robust.

5. Model Selection

Selecting the right model is crucial for accurate demand forecasting. Depending on the nature of the data, different types of models may be appropriate. Here’s a breakdown of common model types:

Time Series Models

ARIMA (AutoRegressive Integrated Moving Average)
  • Used for univariate time series data with clear trends, leveraging past values and past forecast errors to predict future values.
  • Parameters: p (lag order), d (degree of differencing), q (order of the moving average).
  • Use case: Good for stationary data without seasonal components.

SARIMA (Seasonal ARIMA)
  • Extends ARIMA to account for seasonality in time series data.
  • Parameters: p, d, q (ARIMA components) plus P, D, Q (seasonal components) and the period of seasonality.
  • Use case: Effective when the data shows seasonal patterns (e.g., monthly or quarterly sales).

Prophet (developed by Facebook)
  • Handles seasonality, holidays, and missing data out of the box.
  • Parameters: change points, seasonalities, holidays.
  • Use case: Great for data with strong seasonal effects and missing values.
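
As a minimal sketch, fitting a SARIMA model with statsmodels might look like the following; the (1, 1, 1) orders and weekly period are illustrative placeholders, not tuned values:

Python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Fit a SARIMA model; choose the orders from ACF/PACF plots or a grid search
endog = data.set_index('Date')['Demand']
sarima = SARIMAX(endog, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7))
sarima_fit = sarima.fit(disp=False)

# Forecast the next 7 periods
print(sarima_fit.forecast(steps=7))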

Machine Learning Models

Random Forest
  • An ensemble model that combines multiple decision trees to make predictions, suitable for capturing non-linear relationships in the data.
  • Parameters: number of trees (n_estimators), maximum depth (max_depth), minimum samples split (min_samples_split), minimum samples leaf (min_samples_leaf).
  • Use case: Can be used for both regression and classification tasks.

Gradient Boosting (e.g., XGBoost, LightGBM)
  • Builds trees sequentially, with each tree trying to correct the errors of the previous one. Powerful for time series forecasting when features are well engineered.
  • Parameters: learning rate (eta), maximum depth (max_depth), subsample (subsample), number of estimators (n_estimators).
  • Use case: Effective for complex, non-linear relationships in large datasets.
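
A minimal random forest sketch with scikit-learn, assuming the lag, rolling-average, and event features engineered earlier (the hyperparameter values are illustrative):

Python
from sklearn.ensemble import RandomForestRegressor

# Train on rows where all engineered features are available
features = ['lag_1', 'lag_7', 'rolling_avg_7', 'Promotion', 'Holiday']
train = data.dropna(subset=features + ['Demand'])

rf = RandomForestRegressor(n_estimators=200, max_depth=10, random_state=42)
rf.fit(train[features], train['Demand'])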

Deep Learning Models

LSTM (Long Short-Term Memory)
  • A type of recurrent neural network (RNN) designed for sequence prediction, powerful for time series data with long-term dependencies.
  • Parameters: number of layers, hidden units, dropout rate, sequence length.
  • Use case: Effective for capturing long-term dependencies and complex patterns in sequential data.

GRU (Gated Recurrent Unit)
  • A variation of the LSTM with fewer parameters, making it faster to train; also well suited to time series data.
  • Parameters: number of layers, hidden units, dropout rate, sequence length.
  • Use case: Good for problems where an LSTM might be overkill or too slow.
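
A minimal LSTM sketch with Keras, assuming a single (ideally scaled) demand series; the window length, layer size, and training settings are illustrative:

Python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

# Turn the series into windows: each sample is the previous `window` days,
# and the target is the next day's demand
def make_windows(series, window=14):
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X)[..., np.newaxis], np.array(y)

X, y = make_windows(data['Demand'].values)

lstm = Sequential([Input(shape=(X.shape[1], 1)), LSTM(32), Dense(1)])
lstm.compile(optimizer='adam', loss='mse')
lstm.fit(X, y, epochs=10, batch_size=32)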

6. Model Training and Evaluation

Once the features are engineered and the model is selected, the next step is to train the model and evaluate its performance.

Train/Test Split

The data is typically split into training and test sets to evaluate the model’s performance on unseen data. Common split ratios include 80/20 or 70/30, with 70-80% of the data used for training and the rest for testing. For time series, the split must preserve chronological order (hence shuffle=False below), so the most recent period serves as the test set.

Python
from sklearn.model_selection import train_test_split
# Split the data into training and test sets
train_data, test_data = train_test_split(data, test_size=0.2, shuffle=False)

Training the Model

Once the data is split, the model is trained using the training data. For example, if using XGBoost, it might look like this:

Python
import xgboost as xgb

# Features engineered earlier (lags, rolling averages, promotions, holidays)
features = ['lag_1', 'lag_7', 'rolling_avg_7', 'Promotion', 'Holiday']

# Train XGBoost model
model = xgb.XGBRegressor()
model.fit(train_data[features], train_data['Demand'])

Evaluation Metrics

To evaluate the model’s performance, we use metrics such as:

  • Mean Absolute Error (MAE): The average of the absolute differences between the predicted and actual demand values.
  • Mean Squared Error (MSE): The average of the squared differences between the predicted and actual values, penalizing larger errors more heavily.
  • Root Mean Squared Error (RMSE): The square root of MSE, which expresses the error in the original units.
Python
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
# Predict on test data
predictions = model.predict(test_data[features])
# Calculate the evaluation metrics
mae = mean_absolute_error(test_data['Demand'], predictions)
mse = mean_squared_error(test_data['Demand'], predictions)
rmse = np.sqrt(mse)
print(f"MAE: {mae:.2f}, MSE: {mse:.2f}, RMSE: {rmse:.2f}")

7. Forecasting

Once the model is trained and evaluated, you can use it to make future demand predictions. This is done by providing the model with the most recent data and allowing it to predict future demand values.

Example of forecasting future demand:

Python
# Predict future demand (for the next 7 days)
# future_data must contain the same engineered features as the training set
future_predictions = model.predict(future_data[features])
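
With lag-based features, multi-step forecasts are typically produced recursively: predict one day, append the prediction to the history, recompute the lag features, and repeat. A minimal sketch, assuming the features engineered earlier and no planned promotions or holidays over the horizon:

Python
import numpy as np
import pandas as pd

# Recursive one-step-ahead forecasting over a 7-day horizon
history = data['Demand'].dropna().tolist()
future_predictions = []
for _ in range(7):
    row = pd.DataFrame([{
        'lag_1': history[-1],
        'lag_7': history[-7],
        'rolling_avg_7': np.mean(history[-7:]),
        'Promotion': 0,  # assumed: no promotion planned
        'Holiday': 0,    # assumed: no holiday in the horizon
    }])
    pred = float(model.predict(row[features])[0])
    future_predictions.append(pred)
    history.append(pred)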

8. Inventory Optimization

The final step involves using the forecasted demand to optimize inventory levels. Accurate demand forecasts allow businesses to avoid stockouts (which lead to lost sales) or overstocking (which leads to excess storage costs).

Inventory Adjustment

  • Safety Stock: Based on forecasted demand and lead time, safety stock can be calculated to buffer against uncertainties.
  • Reorder Points: Set reorder points based on forecasted demand and the time it takes to receive new stock.
Python
import numpy as np

lead_time = 7  # in days
demand_forecast = model.predict(future_data[features])

# Safety stock: service-level z-score times demand variability over the
# lead time (a common textbook formula; z = 1.65 targets roughly 95% service)
safety_stock = 1.65 * data['Demand'].std() * np.sqrt(lead_time)

# Reorder point: expected daily demand during the lead time + safety stock
reorder_point = demand_forecast.mean() * lead_time + safety_stock

With accurate demand forecasting, inventory optimization strategies can be applied to balance inventory levels, reducing costs and improving service levels.

Conclusion

Mastering demand forecasting in the supply chain is vital for optimizing operations and ensuring that businesses can meet customer needs while minimizing costs. By combining robust Python-based forecasting techniques with tools like ClicData, businesses can gain deeper insights into demand patterns and improve their decision-making processes.

ClicData’s integration capabilities allow analysts to seamlessly bring Python forecasting models into dynamic BI dashboards, providing real-time, actionable insights. This combination of powerful machine learning and BI tools empowers companies to respond quickly to fluctuations in demand, adapt their supply chain strategies, and maintain a competitive edge in the market. As supply chains become more complex, leveraging advanced, AI-driven tools like ClicData ensures that businesses stay ahead of the curve, maintaining accuracy, agility, and cost-efficiency in their operations.
