Sales analytics are key to understanding the impact of independent variables on consumer demand of a product or service. How does seasonality affect sales? What is the optimal price point for a particular product? How do competitors and broader market trends influence a products potential profit?
In this project, we will explore Mercedes C class sales between May 2006 and March 2009, investigating the impact of MSRP price changes on demand and the significance of seasonality. Utilizing the key insights from the investigation we will identify optimization opportunities to the pricing structure and its bottom line impact on net profitability.
What data did we use?
Mercedes sales data is available from Kaggle here, the data itself is somewhat sparse providing only MSRP and Units for particular models of car. Additional variables will be produced during the data exploration and feature engineering stage.
Exploratory Data Analysis
By undertaking exploration, we can begin to understand the data better and create additional features through one hot encoding dummy variables and third-party sources.
Missing Data Points
As we can see certain points within the data set are missing values, there are a couple of different ways we can progress when encountering missing data:
- Pairwise Deletion: When a variable is found to be missing data points the variable itself is removed from the analysis. In cases where there are a multitude of variables, this may be an option depending on the significance of the relationship between it and other variables
- Listwise Deletion: When a record is missing a data point the record is removed from the analysis. In cases where a large sample set is available removing a small percentage of records should not have a statistically significant impact on the veracity of the data
- Imputation: In certain cases, it may be possible to replace the missing data points through educated guesses. If there is an identifiable trend it may be plausible to impute the missing value.
By implementing time series interpolation we can fill in the blanks with our best guesses for the missing values.
The number of units sold on a monthly basis ranges between 3,032 and 7,646 reaching its peak in October 2007 declining to its lowest point in January 2009. An increased sale trend is apparent between late 2007 and late 2008 this coincides with the release of a new model of C class (w204).
During the analysis period the MSRP of C class models ranges from $36,650 to $38,917 the reduced price in 2007 may have been implemented to clear end of production older models prior to the release of their replacement.
Prior to modeling the data, we will introduce additional variables to attempt to explain more of the variance in our dependent variable (units).
Seasonal Dummy Variables
We create monthly dummy variables by introducing one hot encoding, this creates individual variables for 11 of the 12 months of the year giving them a value of 1 if the row date contains that variables month (see opposite).
New Model Variable
An additional binary variable will be included to account for the release of the new W204 model in late 2007.
By implementing linear regression models against this data we will attempt to quantify the impact of independent variable (price, model,
seasonality) against the dependent variable units. The coefficients of the independent variables allow us to identify the proportion of that impact.
Given the small sample size, it is challenging to produce a model which explains a majority of the variance in the dependent variable (price) improvements to the model may be achieved with greater granularity of time intervals or availability of actual retail price in place of MSRP.
The coefficients of the monthly seasonal dummies provide an insight into the variance in demand throughout the year. A significant dip in unit sales occurs in the first three months of the year.
A negative coefficient for the independent variable price indicates a negative relationship between price and demand, the more extreme a negative value of the coefficient is associate with a larger reduction in demand of a product.
An exponential transformation of the price variable is implemented prior to modeling which results in an exponential price curve y = x^2 * -0.000006 as the price increases an accelerated decrease in demand is forecast, and conversely a price decrease is forecast to provide a decelerated increase in demand.
When combined with other financial factors price elasticity of demand can be utilized to calculate what-if scenarios. If we assume an average return on sales at 4% we can estimate the average net profit, combining this with the estimated monthly units we can see its impact on total net profit.
An ARIMA model is used to provide a 12 month forecast of sales.
Click on the dashboard below to see it live:
All charts and dashboard are powered by ClicData data visualization and data management software.
About the author
Analysis by Ciaran Carroll, Senior Business Analyst with a strong background in marketing and pricing strategy.