How to Forecast Demand in Supply Chain with Python

    Imagine preparing for the holiday season, only to find warehouses filled with unsold goods or customers frustrated by stockouts. Demand forecasting helps mitigate such risks, turning uncertainty into actionable insights. It is a critical tool for supply chain management, enabling businesses to predict inventory needs, optimize resources, and stay competitive in dynamic markets.

    Demand forecasting refers to studying historical and current data to understand the internal and external factors affecting demand. The resulting trend model is then used to predict, or ‘forecast’, what demand will look like in the short or long term.

    There are several ways to determine demand forecasts, depending on the factors prioritized. Some organizations rely on historical data alone, while others incorporate real-time factors like promotions or market conditions. This article will explore how to implement demand forecasting models, focusing on the latest strategies and technologies that enable businesses to stay resilient amid market uncertainties.

    Implementing a Demand Forecasting Model Using Python

    1. Data Collection

    To effectively forecast demand, the first step is to gather historical sales data. The quality and richness of the data you collect will directly impact the accuracy of the model. The data should include key attributes such as:

    • Product IDs: Identifies the individual products whose demand will be forecasted.
    • Sales Quantities: The number of units sold per product on each date.
    • Dates: The date or time period associated with each sales record (daily, weekly, monthly, etc.).
    • Other Relevant Features: Additional variables that might affect demand, such as promotions, holidays, or special events. These external factors can significantly influence product demand and should be incorporated if available.

    Data can be collected from various sources like sales databases, inventory management systems, and customer orders. Ensuring the data is accurate and comprehensive is essential before moving to the next stage.
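
    As a minimal sketch, assuming the sales history lives in a CSV file (the file name and column names below are illustrative and should match your own sources):

    Python
    import pandas as pd

    # Load historical sales records; 'sales_history.csv' is illustrative
    data = pd.read_csv('sales_history.csv', parse_dates=['Date'])

    # Sort chronologically so later time-based operations behave correctly
    data = data.sort_values('Date').reset_index(drop=True)
    print(data[['Date', 'Product_ID', 'Demand']].head())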

    2. Data Preprocessing

    After collecting the data, preprocessing is the next critical step. The quality of the data plays a huge role in the model’s effectiveness. This stage includes:

    • Handling Missing Values: Incomplete data can skew the forecasting model’s performance. You can handle missing values in various ways:
      • Forward Fill: This method fills missing values with the previous available data point (common for time series).
      • Imputation: Replacing missing values with the mean, median, or a model-based approach.
      • Dropping Missing Data: If only a small number of records are missing, you may simply remove them, though this is only viable when the loss of data is negligible.
    Python
    data = data.ffill()  # Forward fill: propagate the last valid value
    • Encoding Categorical Variables: For categorical data such as product IDs or promotional events, encoding might be needed. Common methods include:
      • One-Hot Encoding: Converts categorical variables into binary columns for each category.
      • Label Encoding: Assigns a unique integer to each category (useful for ordinal data); a sketch follows this list.
    Python
    # One-hot encode product identifiers into binary indicator columns
    data = pd.get_dummies(data, columns=['Product_ID'])
    • Normalizing/Standardizing Numerical Features: For numerical features such as sales volume, normalization or standardization can help ensure that the data is on the same scale, which is especially important for machine learning algorithms like gradient boosting or neural networks.
    Python
    from sklearn.preprocessing import MinMaxScaler

    # Rescale demand into the [0, 1] range
    scaler = MinMaxScaler()
    data['Demand'] = scaler.fit_transform(data[['Demand']])
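
    For the label encoding mentioned above, a minimal scikit-learn sketch, shown as an alternative to the one-hot encoding (the new column name is illustrative):

    Python
    from sklearn.preprocessing import LabelEncoder

    # Alternative to one-hot encoding: map each category to an integer
    encoder = LabelEncoder()
    data['Product_ID_encoded'] = encoder.fit_transform(data['Product_ID'])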

    Data preprocessing ensures that the dataset is clean, complete, and in the right format for analysis.

    3. Exploratory Data Analysis (EDA)

    Exploratory Data Analysis (EDA) is a crucial stage where you visualize and analyze the dataset to gain insights that can inform the forecasting process. The goal is to identify patterns, trends, and anomalies that can impact the demand forecasting model.

    Plotting Sales Trends Over Time

    One of the first steps in EDA is to visualize how the demand fluctuates over time. This helps in identifying trends, seasonality, and periodic behaviors.

    Python
    import matplotlib.pyplot as plt
    # Plot demand over time
    plt.figure(figsize=(10, 6))
    plt.plot(data['Date'], data['Demand'], label='Demand')
    plt.title('Demand Over Time')
    plt.xlabel('Date')
    plt.ylabel('Demand')
    plt.show()

    Identifying Seasonal Patterns

    Seasonality refers to periodic fluctuations in demand that occur at regular intervals (e.g., higher demand in summer or around holidays). Detecting seasonality allows you to adjust your forecasting model accordingly.

    You can identify seasonality through:

    • Visual Inspection: Look for patterns in the plots over multiple periods.
    • Autocorrelation: Check for correlation between the demand at different time lags.

    For instance, using autocorrelation in Python:

    Python
    from pandas.plotting import autocorrelation_plot

    # Correlation of demand with lagged copies of itself
    autocorrelation_plot(data['Demand'])
    plt.title('Autocorrelation of Demand')
    plt.show()
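
    Another way to expose seasonality is a seasonal decomposition, which separates the series into trend, seasonal, and residual components. A minimal sketch using statsmodels, assuming daily data with a weekly cycle and no missing values:

    Python
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose

    # period=7 assumes daily data with weekly seasonality
    result = seasonal_decompose(data.set_index('Date')['Demand'], period=7)
    result.plot()
    plt.show()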

    Checking for Outliers

    Outliers are extreme data points that deviate significantly from other observations. These can distort the model’s performance, so it’s important to identify and decide how to handle them (e.g., remove or adjust them).

    Outliers can be detected using:

    • Box Plots: Visualize the distribution and detect any extreme values.
    • Statistical Tests: Use methods like Z-scores to flag outliers (sketched after the box plot below).
    Python
    plt.figure(figsize=(10, 6))
    plt.boxplot(data['Demand'])
    plt.title('Demand Distribution')
    plt.ylabel('Demand')
    plt.show()
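
    The Z-score method can be sketched as follows; the threshold of 3 standard deviations is a common convention rather than a fixed rule:

    Python
    import numpy as np

    # Flag observations more than 3 standard deviations from the mean
    z_scores = (data['Demand'] - data['Demand'].mean()) / data['Demand'].std()
    outliers = data[np.abs(z_scores) > 3]
    print(f"Flagged {len(outliers)} potential outliers")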

    Interdependence of Data Exploration and Preprocessing

    It’s important to note that the stages of data exploration and preprocessing are not independent of each other. Insights gathered from EDA can directly influence how you clean and transform the data. For example, if seasonal trends are observed during EDA, you may decide to:

    • Impute missing values differently (e.g., based on seasonal averages, as sketched below).
    • Remove outliers that could distort seasonal patterns.
    • Create new features like lag variables or rolling averages to better capture seasonality.

    Thus, both stages work in tandem to ensure the data is fully prepared for modeling.
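
    For instance, the seasonal imputation idea above might look like the following sketch, which fills gaps with the average demand for the same weekday (assuming daily records and a parsed Date column):

    Python
    # Fill missing demand with the mean for the same weekday, so imputed
    # values respect the weekly seasonal pattern
    data['Demand'] = (data.groupby(data['Date'].dt.dayofweek)['Demand']
                          .transform(lambda s: s.fillna(s.mean())))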

    4. Feature Engineering

    Feature engineering is a critical stage in building a demand forecasting model. It involves creating new features that help the model capture hidden patterns or trends in the data, which can improve forecasting accuracy. Below are key types of features commonly engineered for demand forecasting:

    Lag Features

    Lag features capture past sales data and use them as input for predicting future demand. These are crucial for time series models as they help incorporate the effect of historical demand on future sales.

    Python
    # Demand from the previous day
    data['lag_1'] = data['Demand'].shift(1)
    # Demand from 7 days earlier (e.g., the same weekday last week)
    data['lag_7'] = data['Demand'].shift(7)

    Here, shift(1) moves the demand column down by one row, allowing the model to use the demand from the previous day as a feature.

    Rolling Averages

    Rolling averages smooth out fluctuations in the demand data and are useful for capturing trends over time, especially for seasonal products. They can be created with a specific window size (e.g., 7 days or 30 days).

    Python
    # Create a 7-day rolling average feature
    data['rolling_avg_7'] = data['Demand'].rolling(window=7).mean()

    This creates a new feature representing the average demand over the last 7 days, which helps the model see the underlying trend rather than day-to-day noise.

    Promotional Events

    Promotions can significantly affect demand. You can create a binary feature that indicates whether a promotion was running on a specific date (1 for a promotion, 0 for no promotion).

    Python
    # Create a 'Promotion' column based on event dates
    data['Promotion'] = data['Date'].apply(lambda x: 1 if x in promo_dates else 0)

    Here, promo_dates would be a list of dates when promotions occurred.

    Holidays

    Holidays often lead to increased or decreased demand, depending on the type of product. You can add a feature that flags holidays, as demand can vary during these periods.

    Python
    # Add a column for holidays
    data['Holiday'] = data['Date'].apply(lambda x: 1 if x in holiday_dates else 0)

    Here, holiday_dates is a list of dates marked as holidays.

    By creating these features, you enhance the model’s ability to capture important relationships in the data, making the forecasting model more robust.

    5. Model Selection

    Selecting the right model is crucial for accurate demand forecasting. Depending on the nature of the data, different types of models may be appropriate. Here’s a breakdown of common model types:

    Time Series Models

    ARIMA (AutoRegressive Integrated Moving Average)
    • Used for univariate time series data with clear trends, leveraging past values and forecast errors to predict future values.
    • Parameters: p (lag order), d (degree of differencing), q (order of moving average).
    • Use case: Good for stationary data without seasonal components.

    SARIMA (Seasonal ARIMA)
    • Extends ARIMA to account for seasonality in time series data.
    • Parameters: p, d, q (ARIMA components) and P, D, Q (seasonal components), with the period of seasonality.
    • Use case: Effective when the data shows seasonal patterns (e.g., monthly or quarterly sales).

    Prophet (developed by Facebook)
    • Handles seasonality, holidays, and missing data.
    • Parameters: Change points, seasonalities, holidays.
    • Use case: Great for data with strong seasonal effects and missing values.
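
    To make this concrete, fitting a SARIMA model with statsmodels might look like the sketch below; the orders are illustrative placeholders to be tuned for your series, and a single product’s daily demand with weekly seasonality is assumed:

    Python
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Orders are placeholders; tune them (e.g., via AIC) for your data
    series = data.set_index('Date')['Demand']  # one product's daily demand
    sarima = SARIMAX(series,
                     order=(1, 1, 1),              # p, d, q
                     seasonal_order=(1, 1, 1, 7))  # P, D, Q, weekly period
    sarima_fit = sarima.fit(disp=False)
    print(sarima_fit.forecast(steps=7))  # demand for the next 7 days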

    Machine Learning Models

    Random Forest
    • An ensemble model that uses multiple decision trees to make predictions; suitable for capturing non-linear relationships in the data.
    • Parameters: number of trees (n_estimators), maximum depth (max_depth), minimum samples split (min_samples_split), minimum samples leaf (min_samples_leaf).
    • Use case: Can be used for both regression and classification tasks.

    Gradient Boosting (e.g., XGBoost, LightGBM)
    • Builds trees sequentially, with each tree trying to correct the errors of the previous one; powerful for time series forecasting when features are engineered.
    • Parameters: learning rate (eta), maximum depth (max_depth), subsample (subsample), number of estimators (n_estimators).
    • Use case: Effective for complex, non-linear relationships in large datasets.
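
    A brief scikit-learn sketch using the parameters listed above (the values are illustrative, not tuned), assuming the engineered feature columns from step 4:

    Python
    from sklearn.ensemble import RandomForestRegressor

    features = ['lag_1', 'lag_7', 'rolling_avg_7', 'Promotion', 'Holiday']
    train = data.dropna(subset=features)  # drop rows where lags are undefined

    # Hyperparameter values are illustrative
    rf = RandomForestRegressor(n_estimators=200, max_depth=10,
                               min_samples_split=5, min_samples_leaf=2)
    rf.fit(train[features], train['Demand'])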

    Deep Learning Models

    LSTM (Long Short-Term Memory)
    • A type of recurrent neural network (RNN) designed for sequence prediction; powerful for time series data involving long-term dependencies.
    • Parameters: number of layers, hidden units, dropout rate, sequence length.
    • Use case: Effective for capturing long-term dependencies and complex patterns in sequential data.

    GRU (Gated Recurrent Unit)
    • A variation of LSTM with fewer parameters, making it faster to train; also suitable for time series data.
    • Parameters: number of layers, hidden units, dropout rate, sequence length.
    • Use case: Good for problems where LSTM might be overkill or too slow.
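
    A minimal Keras sketch of the LSTM variant, assuming TensorFlow is installed and that input sequences have already been shaped as (samples, sequence length, features); the layer sizes are illustrative:

    Python
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense, Dropout

    seq_len, n_features = 7, 5  # illustrative dimensions

    # One LSTM layer with dropout, then a single unit predicting demand
    nn = Sequential([
        LSTM(64, input_shape=(seq_len, n_features)),
        Dropout(0.2),
        Dense(1),
    ])
    nn.compile(optimizer='adam', loss='mse')
    # nn.fit(X_train, y_train, epochs=20)  # X_train shape: (samples, 7, 5)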

    6. Model Training and Evaluation

    Once the features are engineered and the model is selected, the next step is to train the model and evaluate its performance.

    Train/Test Split

    The data is typically split into training and test sets to evaluate the model’s performance on unseen data, commonly at an 80/20 or 70/30 ratio. For time series, the split should be chronological rather than random, so the model is never trained on observations that come after the ones it is tested on.

    Python
    from sklearn.model_selection import train_test_split
    # Hold out the most recent 20% of rows; shuffle=False keeps the time order
    train_data, test_data = train_test_split(data, test_size=0.2, shuffle=False)

    Training the Model

    Once the data is split, the model is trained using the training data. For example, if using XGBoost, it might look like this:

    Python
    import xgboost as xgb
    
    # 'features' lists the engineered feature columns from step 4
    features = ['lag_1', 'lag_7', 'rolling_avg_7', 'Promotion', 'Holiday']

    # Train an XGBoost regressor
    model = xgb.XGBRegressor()
    model.fit(train_data[features], train_data['Demand'])

    Evaluation Metrics

    To evaluate the model’s performance, we use metrics such as:

    • Mean Squared Error (MSE): The average of the squared differences between predicted and actual values; larger errors are penalized more heavily.
    • Root Mean Squared Error (RMSE): The square root of MSE, which expresses the error in the original units of demand.
    • Mean Absolute Error (MAE): The average of the absolute differences between predicted and actual demand values.
    Python
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    import numpy as np
    # Predict on test data
    predictions = model.predict(test_data[features])
    # Calculate RMSE and MAE
    rmse = np.sqrt(mean_squared_error(test_data['Demand'], predictions))
    mae = mean_absolute_error(test_data['Demand'], predictions)
    print(f"RMSE: {rmse:.2f}, MAE: {mae:.2f}")

    7. Forecasting

    Once the model is trained and evaluated, you can use it to make future demand predictions. This is done by providing the model with the most recent data and allowing it to predict future demand values.

    Example of forecasting future demand:

    Python
    # Predict demand for the next 7 days; 'future_data' must contain the
    # same engineered feature columns, built for the future dates
    future_predictions = model.predict(future_data[features])
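
    Because lag and rolling-average features depend on demand values that do not yet exist, multi-step forecasts are often produced one step at a time, feeding each prediction back into the features. A minimal sketch of that recursive approach, assuming the features list from step 6 and, for simplicity, no planned promotions or holidays in the horizon:

    Python
    import numpy as np
    import pandas as pd

    # Recursive one-step-ahead forecasting for the next 7 days
    history = data['Demand'].tolist()
    future_predictions = []
    for _ in range(7):
        row = pd.DataFrame([{
            'lag_1': history[-1],                    # yesterday's demand
            'lag_7': history[-7],                    # demand 7 days ago
            'rolling_avg_7': np.mean(history[-7:]),  # last 7 days' average
            'Promotion': 0,  # assumption: no promotions planned
            'Holiday': 0,    # assumption: no holidays in the horizon
        }])
        pred = float(model.predict(row[features])[0])
        future_predictions.append(pred)
        history.append(pred)  # feed the prediction back in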

    8. Inventory Optimization

    The final step involves using the forecasted demand to optimize inventory levels. Accurate demand forecasts allow businesses to avoid stockouts (which lead to lost sales) or overstocking (which leads to excess storage costs).

    Inventory Adjustment

    • Safety Stock: Based on forecasted demand and lead time, safety stock can be calculated to buffer against uncertainties.
    • Reorder Points: Set reorder points based on forecasted demand and the time it takes to receive new stock.
    Python
    import numpy as np

    lead_time = 7  # replenishment lead time, in days
    demand_forecast = model.predict(future_data[features])
    avg_daily_demand = demand_forecast.mean()

    # Safety stock buffers demand variability during the lead time;
    # 1.65 corresponds to roughly a 95% service level, with the forecast's
    # spread used as a simple proxy for daily demand variability
    safety_stock = 1.65 * demand_forecast.std() * np.sqrt(lead_time)

    # Reorder point = expected demand during lead time + safety stock
    reorder_point = avg_daily_demand * lead_time + safety_stock

    With accurate demand forecasting, inventory optimization strategies can be applied to balance inventory levels, reducing costs and improving service levels.

    Conclusion

    Mastering demand forecasting in the supply chain is vital for optimizing operations and ensuring that businesses can meet customer needs while minimizing costs. By combining robust Python-based forecasting techniques with tools like ClicData, businesses can gain deeper insights into demand patterns and improve their decision-making processes.

    ClicData’s integration capabilities allow analysts to seamlessly bring Python forecasting models into dynamic BI dashboards, providing real-time, actionable insights. This combination of powerful machine learning and BI tools empowers companies to respond quickly to fluctuations in demand, adapt their supply chain strategies, and maintain a competitive edge in the market. As supply chains become more complex, leveraging advanced, AI-driven tools like ClicData ensures that businesses stay ahead of the curve, maintaining accuracy, agility, and cost-efficiency in their operations.