One of the most popular applications of data analytics is in financial markets. In this series of posts, we will show you how Time Series Forecasting in Excel is done. We will be going through the different applications of data analysis (and data science) in financial forecasting.
There are several tools for this kind of work, but in this case, we will work with Excel and Python. Part 1 of this series focuses on data collection and exploratory data analysis using Excel. It then concludes with a primer on financial indicators and forecasting.
This post does not delve deep into the entire scope of data collection as that would require an entire series on its own. Instead, it will only focus on collecting financial data – specifically for Kenyan companies. The task at hand will be to forecast the share price of Safaricom for the last two quarters of 2021.
Where to get data
Usually, getting the kind of data we need here (ohlcv data) is a painless process because there are nearly hundreds of data portals including Nasdaq, Yahoo Finance, Google Finance API, WSJ, and MarketWatch. Access to this kind of information locally is only possible upon payment of a fee to the Nairobi Securities Exchange.
On the other hand, only a few of the data aggregation portals mentioned above have data for other markets, especially in developing economies. WSJ has the most complete data where you can customize your start and end dates.
Data Collection and Preparation for forecasting using Excel
You can download the dataset from WSJ and set the start date as 01/01/2019 and end date as 30/06/2021 (if you want to have the exact results in this post) or any other date (if you are reading this far past the date of posting). Below is an overview of what you would expect from the downloaded dataset.
OHLCV refers to:
- Open – the price at which the share open that day.
- High – the highest price it sold for on that day.
- Low – its lowest price.
- Close – the closing price
- Volume – the amount of shares that were traded on that day.
These are the columns (or attributes) for our data. In addition to the date on which the stock trade at that particular price, we will only use the closing price for our analysis and forecast. Therefore, the first thing we need to do is get rid of the other columns. You will only have two columns left: The Date column and Close which gets shifted to Column B.
WSJ provides the data with the most recent stock price going back. We need to sort it the other way. To do so, you need to select column A (click right above the column) and go to Sort and Filter. Select Sort Oldest to Newest. It will give you a prompt like below. This prompt is essentially asking whether the sort should also affect the next column. The default value is yes so press Enter. Ignore the chart you see to the right.
Performing Time Series Analysis
What we are doing here is known as a time series analysis. We have a variable that is observed over a certain successive period and is recorded nearly daily, except for weekends and public holidays when trading is closed. The end goal is to identify any long-term and seasonal trends with Safaricom’s share price. We have data from the beginning of 2019 to June 2021 (30 months or ten quarters) and we are trying to predict what it will look like 6 months from now.
First, we will group our periods into years, quarters, and months. In Cell C1, enter a new column title and name it Year. On cell C2, enter a new formula, =YEAR(A2). We are extracting the year from the date in the corresponding cell in column A. On Column D, input the new column header as Quarter. Then insert the formula, = “Q”&ROUNDUP(MONTH(A2)/3,0), in cell D2 and once again flash fill.
This formula does two things:
- Date manipulation
- Text manipulation.
First, we are extracting the month number from the date in cell A2 then dividing that number by 3 because a quarter is 3 months. After that, we’re rounding that quotient to zero decimal place. Finally, we concatenate the result of that operation to the letter Q which stands for quarter. For instance, the date on cell A2 is 2nd January, 2019. The MONTH function returns 1 which divided by 3 is 0.333. When we round that up to no decimal place we get 1 which is then appended to Q to give Q1. With that done, move to column E where we will extract the month name using the formula TEXT(A2, “mmmm”). Flash fill these formulas in their corresponding columns to the end of the dataset on row 627.
If any of these formulas seem strange or complicated to you, Sunesis offers a self-paced online Excel course that covers some of the basic to intermediate excel utilities (including the date and text manipulation functions used here) for just $20.
With the years, quarters and months extracted, the data is not visually appealing since there is a lot of repetition. More importantly, there is a programming necessity for having these cells merged. While we can use merge and center to do this, it would take long to achieve this since we have three columns each with hundreds of data. We will merge and center these cells using VBA code. Hit the ALT + F11 key combinations to open the MS Visual Basic for Applications window. Then select Insert > Module as in the screenshot below.
In the new console window, insert the following code:
Sub MergeSameCell() 'Updateby Extendoffice Dim Rng As Range, xCell As Range Dim xRows As Integer xTitleId = "KutoolsforExcel" Set WorkRng = Application.Selection Set WorkRng = Application.InputBox("Range", xTitleId, WorkRng.Address, Type:=8) Application.ScreenUpdating = False Application.DisplayAlerts = False xRows = WorkRng.Rows.Count For Each Rng In WorkRng.Columns For i = 1 To xRows - 1 For j = i + 1 To xRows If Rng.Cells(i, 1).Value Rng.Cells(j, 1).Value Then Exit For End If Next WorkRng.Parent.Range(Rng.Cells(i, 1), Rng.Cells(j - 1, 1)).Merge i = j - 1 Next Next Application.DisplayAlerts = True Application.ScreenUpdating = True End Sub
This article is not about VBA programming so I will not delve into what the above code does. All you need to know is that it merges all the cells with similar data given a range. To run it, hit F5; it will prompt you for a range. Select all the three columns that we just added: Year, Quarter, and Month. Once you hit Enter, your excel should look like below.
A moving average (MA) is one of the basic stock indicators commonly used by financial analysts and traders for technical analysis. In its simplest form, a moving average subdivides a data points into subsets and calculates the averages of these subsets. The point of obtaining these averages is to smoothen out the data and make it less responsive to temporary price fluctuations.
The price of a stock often experiences random variations (some of which might end up being outliers). When correctly utilized, MAs reduce the noise in the variations and reveal underlying seasonal and annual trends. Although the commonly used number of days for calculating the mean is 200 and 50, this is very much an arbitrary choice that depends on what you want to achieve. In our case, we have data spanning 30 months recorded every trading day (626 data points in total). This is a rather small dataset for illustrative purposes only and we will thus use 15 days to get the moving averages.
There are two types of moving averages: Exponential (EMA) moving averages and Simple MA (SMA), which is what is used in this article. I have attached an article in the reference page where you can read further on how MA is used for technical analysis.
In our dataset, the first 15 days begin from 2nd January and end on 22nd January, 2019 (or rows 2 to 16). The next 15 days data point start on 3rd and end on the 23rd. Fundamentally, to get our subsequent data points, we remove the first day from the current subset and add the next day after the subset.
So the first three data points are as follows:
- Rows 2 to 16,
- Rows 3 to 17, and
- Rows 4 to 18.
On a new column, enter the title as SMA (15) to mean simple moving average over 15 days. When calculating the moving average, we want to put the result on the last day of the subset, which is row 16 for the first subset.
We could calculate the averages manually by using the excel average function or use the data analysis toolpak’s moving average. The Data Analysis Toolpak must be enable for you to use it. Once you’ve selected the moving average option, you need to select the range, interval, and output.
The range is our entire close column, interval is 15, and for output select the first cell of the SMA column, F2. Do not select chart output or standard errors. There will be NAs for the first 14 values because there are not enough previous data points. Delete the cells with N/As to have a clean sheet. If you select any of the cells in the SMA column, you’ll find they are also using the AVERAGE function.
Data Analysis in Excel.
The first thing we want to do is have an overall view of the closing price over the duration. For this, we need to use a line chart. First, shift the close column to be the second last so that our period columns or independent variables (Date, Year, Quarter, Month) are stacked together and our dependent values (Close and SMA) are on the other side. You can do this by cutting the whole column B and pasting it before column F. Your columns now look like this:
Go to Insert menu and in the Charts group select line. If excel has automatically populated your chart with some values clear those out by using the Chart Design tab group (towards the end of the menu bar) and clicking Select Data. In the pop up, remove every value from both horizontal axis and legend entry tabs. Then click add on the left tab (Legend Entries Series) and select the close price values, i.e, cells E2 to E627. For the horizontal axis, we need to visualize the yearly, quarterly and monthly trends so select these tabs, i.e. cells B2 to D626. Once done, your data source needs to look like the screenshot below. The chart data range selects all the values for both horizontal and vertical axes.
Once you have a chart output, right click on the vertical axis and change the minimum value from 0.0 to 15.0 to enhance the trend output.
Some optional modifications that I have made to my chart are as follows:
- I have copied it into a new sheet so as to scale it out.
- The chart style is set to 7. I like this style because it also enhances the trends we are trying to visualize. Here is the final output.
Using data that includes 2020 performance was a risky choice, particularly for illustrative work. That was an abnormal year for most companies and abnormalities lead to outliers. However, as a data analyst or scientist, you will never have perfect data. We all wish we do but that is never the case. There’s a long running joke that 90% of data science work is data cleaning and preprocessing. It is not a joke.There are some subtle observations to be made from just the times series plot of the close price but I will choose to do this with the MA plot. Once again click on Select Data and add the MA column to the Legend series as we did Closing Price. Your data source should now look like this:
Whereas the chart output looks like this:
(Part 2 is coming soon)