FORECASTING IN EXCEL – PART 2
Classical Multiplicative Decomposition
Time series data is usually decomposed or grouped into three components: the trend (long term direction), the seasonality (systematic movements in time groupings e.g., quarterly), and irregularities (unsystematic short-term fluctuations). In our dataset, the impact of COVID-19 on SCOM’s share price would be the primary cause of irregularities.
We have to estimate and remove seasonal effects from the time series since such effects may hide the true movement of the item under study. One could argue that in our case, it would be inappropriate and misleading to perform seasonal adjustment as there are irregular components caused by the 2020 data. In the end, we might end up introducing an artificial seasonal element. Still, we will perform this step to illustrate how seasonality can be estimated.
When we used the moving average, we smoothened out the seasonality and irregularities as you can see in the two series. Now we need to extract the seasonality and irregularity from the original (Closing Price) series/data, a process known as classical decomposition.
In this process, we assume that the seasonal component is constant each year. There are two types of classical decomposition: additive and multiplicative decomposition. For this article, we will use the multiplicative decomposition in which the seasonal and irregular variations follow the movement direction of the trend component. In this case, the original or observed closing price (time series) is expressed as a product of the three components of the time series components, i.e:
[latex] O_t = T_t ast S_t ast I_t [/latex]
where T is the trend, S is the seasonality, I irregularity, and t denotes any given point in time.
The seasonally adjusted series thus becomes:
[latex] {SA}_t = O_t/S_t= T_t ast I_t [/latex]
With this model, both the trend and the original series have the same units whereas the seasonality and irregularity are unitless factors whose values hover around 1.
Back to our Excel sheet, we need to insert a new column t, as in the data point. Do this just before the Year column and let the first value be 1 and the last 626, representing individual points in the time series. Your chart should look like this:
In our data, the moving average becomes our trend (since it smoothened out the seasonality and irregularity) while the closing price is our original or observed values. We use these values to calculate the product of St and It. From the first equation, we can do this by dividing the observed value with the trend. On column H, add a header for StIt. We begin calculating the values for this volume from row 16 by dividing cell F16 by cell G16 and then flash fill the formula to the entire column.
Essentially, the values in column H tell us by how much the seasonality and irregularity components are above or below the observed values. For example, on day 15 (row 16), 1.92% of the closing price was caused by the S and I in the upward direction.
Seasonality Index
At this point, we have the product of the seasonality and irregularity components. However, we want to extract only the seasonality. If we were using data that does not include distortions as a result of a global pandemic, it would be possible to visually identify seasonality trends by simply plotting the time series of the stock’s price. However, due to these disruptions, we cannot clearly say that there is any discernible trend.
On the other hand, we can find averages for individual months through the periods we have (say, average January price in 2019, 2020, and 2021) and do the same for the quarterly prices. The quarterly averages show some tendency of seasonality as in the below image. I have used the AVERAGEIFS function to calculate both the overall and yearly averages.
From the image below, we conclude that there is a quarterly seasonality trend in which SCOM price is highest in the fourth quarter. This is true in both 2019 and 2020, and by looking at the first two quarterly performances of 2021, we can assume it will be true for 2021 as well. Thus, we have four seasons whose seasonality index we need to get from the product of S and I.
The idea here is, by getting the averages of StIt for each quarter in our entire time period, we are ridding the data of the irregularity component. To calculate these averages, you need to create a helper table on the side with each quarter listed. Then use the formula as follows AVERAGEIFS(StIt, quarter, “Q1”).
Now we need to go back to our main table and insert the seasonality component. For an observation in which the quarter is, say, Q1, we will insert 1.0038 which is our Q1 seasonality index. We use VLOOKUP to implement this.
(The reason we merged our cells was so that we can group our x-axis by time period. However, for the AVERAGEIFS and VLOOKUP, we need each cell to have its value. You can duplicate the table in a new sheet and unmerge the year, quarter, and month columns).
For the VLOOKUP, the table range is wherever you’ve calculated your quarterly seasonality indices and the index is 2 since you’re looking for actual value.
At this point, your duplicated table looks like so:
De-Seasonalize the Share Price
Now that we have our quarterly seasonal index, we need to seasonally adjust our data. There are several statistical reasons for de-seasonalizing data. A common reason is when you need to understand the secular trend as opposed to seasonal trend. In our case, we are doing this to perform a regression analysis which we will then use to create de-seasonalized future predictions and eventually go back to the seasonalized output.
Linear Regression
We need to perform a simple linear regression where the de-seasonalized data is our y variable while the t variable that we created earlier on becomes the x variable. Once again you need the Excel data analysis toolpak to do this so ensure it is enabled. Also, leave the confidence level as 95%.
Here is the regression output.
Understanding the Excel Linear Regression Output
Although we are only interested in the intercept and t coefficients, many of the other values are significant in helping us understand the viability of our model. For instance, the Adjusted R squared (R2) value of 67.25% tell us how much of our data’s variability can be explained by the t component. Generally, a higher R2 value is more desirable, although not always the case. In our case, for a time series analysis, 67.25% is well above the desirable value. Additionally, the significance value F, tells us how reliable the model that we have built is.
Given that our confidence level is 95%, a significance value less than 0.05 tells us that our model is statistically significant and, thus, we should go ahead and make our forecasts. Finally, the p-value of our independent variable, t, also tells us whether it is statistically significant in our analysis.
If you perform a multiple regression with the variables t, year, quarter, and month, you’ll notice that although the adjusted R2 value increases, the p-value for month goes above 0.05 – this shows that month is not statistically significant to our model as an independent variable. On the other hand, the quarter variable has the smallest p-value showing that it is the most significant independent variable to the model. In simpler terms, this model is saying that Safaricom’s share price is least influenced by the month we are in and more affected by the particular quarter.
Trend
We will use the y intercept and slope from the simple linear regression output to calculate the trend component. We use the formula y = mx +c to calculate our y; where m is the slope (coefficient of t = 0.0205), c is the y intercept (23.9185), and x is the value of t. You should reference m and c values from the regression output and lock them. The x value is to be referenced from the t column.
This is where we are:
Forecasting – Finally
We have come to the end and everything we have done so far was to prepare us for this. What we need in order to forecast is to multiply the seasonal component with the trend component. That is it. And then we drag it further into the future for as long as we need it to go. In this case, we will be predicting Safaricom’s share price for the rest of the year, i.e, from July to December 31, 2021.
When flash filling the dates to 31st December, click the autofill options and select only the weekdays. We could also choose to eliminate the holidays as there are only four more public holidays according to the website trading hours, but I find this to be a rather tedious and unnecessary process for what is an illustrative article. Flash fill for columns B (t), C (year), D (quarter), and E (month), such that t is now extended to 757. Then fill out the seasonality and trend component values for the rest of the year. With these values in place, you then forecast to the end of the year. Your table should look like below. I have given the forecasted section a shade of green to denote that these are not actual values.
Visualizing the Forecast
We need to see how well our forecasts are performing against the observed values. To do this, we add another series to our line chart. Assuming you have been doing your calculations on the unmerged table in a different sheet, you need to copy back these calculations (but as values) to the table from which the chart gets its values. You also need to ensure that the forecasted periods are merged as we did previously. You can do this by going to the view menu and running the macros. Once you table is filled, follow the same process where you select data and select the column containing our forecast from the first day 2019 and to the last day in 2021. (I will attach my workbook for a better understanding of this)
The chart output follows the trend of the data into the future but does not really predict the seasonal stock price. This kind of output was always expected for the type of data we worked with.
Conclusion
You will find that that there are very many ways to forecast in Excel apart from the tiringly (let’s admit it) lengthy process we have had here. In fact, Excel has a FORECAST and TREND function that can achieve nearly similar results. The choice of procedure, however, will always depend on the type of data you’ve got and how you choose to treat the three components of a time series data.
There is also a back and forth between the tables with merged and unmerged cells. I found that both tables are necessary – one for calculating most of the values and the other for plotting. If you happen to come across a simpler way of achieving this without working with two tables with practically the same data, please share it with me, I’ll appreciate it.
Also, this is the only time we work with Excel. Hence forth, I will be using Python and maybe sometimes R. I have included references where you can further explore many of the concepts covered here. I have also attached the workbook used here. Finally, honest feedback is always welcome on whichever platform you can find me – including the comment section here.
References
[1] W. Kenton, “Understanding Time Series,” Investopedia, Mar. 31, 2020. https://www.investopedia.com/terms/t/timeseries.asp.
[2] J. Fernando, “Moving Average (MA),” Investopedia, 2019.
[3] “6.4.2. What are Moving Average or Smoothing Techniques?,” www.itl.nist.gov.
[4] “6.4. Introduction to Time Series Analysis,” Nist.gov, 2019.
[5] 6.3 Classical decomposition | Forecasting: Principles and Practice (2nd ed).