Time series analysis is the study of historical, time-ordered data to extract meaningful characteristics and other insights that are useful in business. Time-series data is simply a sequence of observations stored in time order. Analysing it helps reveal time-based patterns in a set of data points, which can be critical for a business. Time series forecasting techniques can answer business questions such as what level of inventory to maintain, how much website traffic to expect in your e-store, or how many products will be sold next month. All of these are important time series problems to solve. For instance, large organisations like Facebook and Google must engage in capacity planning to allocate scarce resources, and in goal setting, as their user numbers grow rapidly. The basic objective of time series analysis is usually to find a model that describes the pattern of the time series and can be used for forecasting.
Classical time series forecasting techniques are built on statistical models that take a lot of tuning effort to reach high accuracy. When a forecasting model doesn’t perform as expected, someone has to tune the parameters of the method for the specific problem. Tuning these methods requires a thorough understanding of how the underlying time series models work. That level of forecasting is difficult for organisations without data science teams. And it might not seem profitable for an organisation to keep a group of experts on board if there is no need to build a complex forecasting platform or other services.
Facebook developed "Prophet", an open source forecasting tool available in both Python and R. It provides intuitive parameters that are easy to tune. Even someone who lacks deep expertise in time-series forecasting models can use it to generate meaningful predictions for a variety of business problems.
Excerpt from Facebook Prophet website:
“ Producing high quality forecasts is not an easy problem for either machines or for most analysts. We have observed two main themes in the practice of creating a variety of business forecasts:
Prophet builds a model by finding the best smooth curve, which can be represented as a sum of the following components:
y(t) = g(t) + s(t) + h(t) + ϵₜ
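To make the additive structure concrete, here is a minimal sketch that builds a synthetic series from the four terms. In the Prophet paper g(t) is the trend, s(t) the seasonality, h(t) the holiday effect and ϵₜ the error term. The shapes and numbers below are made up for illustration and say nothing about how Prophet estimates each component.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(730)                           # two years of daily time steps

g = 0.01 * t + 5.0                           # g(t): slow linear trend (illustrative)
s = 0.5 * np.sin(2 * np.pi * t / 7)          # s(t): weekly seasonality (illustrative)
h = np.where(t % 365 == 358, 2.0, 0.0)       # h(t): one made-up holiday spike per year
eps = rng.normal(scale=0.1, size=t.size)     # error term

y = g + s + h + eps                          # y(t) = g(t) + s(t) + h(t) + eps_t
print(y[:5])
```

Fitting a Prophet model is roughly the reverse exercise: given only y, it estimates the individual components.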
In this blog post, we will look at some of the useful functions in the fbprophet library by training a basic Prophet model on an example dataset. The tutorial covers the following topics.
Since Python is the programming language used here, the ways to install the Prophet package in a Python environment are described below.
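Like most Python libraries, fbprophet can be installed with pip. Prophet's major dependency is pystan, so install pystan with pip before using pip to install fbprophet. (From version 1.0 onwards the package is published as prophet; this tutorial uses the older fbprophet name throughout.)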
pip install pystan
pip install fbprophet
You can also install prophet in your conda environment.
conda install -c conda-forge fbprophet
After installation, let’s get started!
With the Python environment set up and the dependencies installed, let’s import the libraries we need, including fbprophet, for the forecasting that follows.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from fbprophet import Prophet
The dataset is then loaded as a pandas dataframe. It contains daily page views for the Wikipedia page of Peyton Manning and has been modified for presentation purposes in this article. You can access the dataset and the source code here.
df = pd.read_csv("peyton_manning.csv")
Looking at the first 10 and last 10 rows of the dataframe shows that it consists of two columns, “Date” and “Views”, where the number of views on each date has been recorded. The dataset has records from 2007 to 2016. The number of rows and columns can be obtained with a single pandas command, for example df.shape.
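The inspection commands themselves did not survive in this write-up, so here is a small sketch of how such a check could look. It runs on a made-up 30-row stand-in (df_demo is a hypothetical name) rather than the real CSV, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the real file: 30 days of made-up view counts.
df_demo = pd.DataFrame({
    "Date": pd.date_range("2007-12-10", periods=30, freq="D").strftime("%Y-%m-%d"),
    "Views": range(1000, 1030),
})

first_rows = df_demo.head(10)    # first 10 rows
last_rows = df_demo.tail(10)     # last 10 rows
n_rows, n_cols = df_demo.shape   # (number of rows, number of columns)

print(first_rows)
print(last_rows)
print(n_rows, n_cols)
```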
First, the Date column should be converted to datetime format before fitting our dataset to the model.
df['Date'] = pd.to_datetime(df['Date'])
Checking df.dtypes afterwards confirms the conversion:
Date     datetime64[ns]
Views             int64
dtype: object
We can visualise the variation in the data using the dataframe's plot function, which draws with Matplotlib.
df.plot(x = 'Date')
With the Date column on the x axis, the resulting plot does not look stationary. The distribution is right-skewed, with a few very large spikes, and the data looks noisy. Prophet does not strictly require a stationary series, but reducing this variability before fitting makes the data easier to model. This can be achieved mainly in two ways.
In this tutorial, the log transformation has been applied to all the values in the Views column.
df['Views'] = np.log(df['Views'])
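As a rough illustration of why the log transformation helps, the sketch below compares how far the largest value sits from the median before and after taking logs. The view counts are made up, with one deliberate spike.

```python
import numpy as np

views = np.array([500, 800, 1200, 900, 50000, 1100])  # made-up counts with one spike
log_views = np.log(views)

# How many times larger than the median is the biggest value?
ratio_raw = views.max() / np.median(views)
ratio_log = log_views.max() / np.median(log_views)
print(ratio_raw, ratio_log)

# np.exp undoes the transformation, so nothing is lost.
restored = np.exp(log_views)
```

Note that np.log is only defined for positive values, so a day with zero views would need separate handling.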
When the plot is drawn again, the variation in the data looks far more stable.
Before fitting our model to the Peyton Manning dataset, the “Date” and “Views” columns should be renamed to “ds” and “y” respectively. This naming is a convention that Prophet requires.
df.columns = ['ds','y']
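Assigning to df.columns relies on the columns being in exactly that order. A slightly more defensive alternative, sketched here on a tiny made-up dataframe (df_demo is a hypothetical name), is to rename by column name:

```python
import pandas as pd

# Tiny made-up frame with the same column names as the tutorial's dataset.
df_demo = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-18", "2016-01-19", "2016-01-20"]),
    "Views": [7.8, 8.0, 8.9],
})

# Rename by name, so the result does not depend on column order.
df_demo = df_demo.rename(columns={"Date": "ds", "Views": "y"})
print(df_demo.columns.tolist())
```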
When this is done, we are good to go ahead and train our prophet model.
We fit the model by instantiating a new Prophet object. Any settings for the forecasting procedure are passed into the constructor. Then you call its fit method and pass in the preprocessed dataset with the historical data.
model = Prophet()
model.fit(df)
Predictions are then made on a dataframe with a column ds containing the dates for which a prediction is to be made. You can get a suitable dataframe that extends a specified number of days into the future using the helper method Prophet.make_future_dataframe. By default it also includes the dates from the history, so we will see the model fit as well. The number of future dates to be predicted is set with the parameter “periods”.
future_dates = model.make_future_dataframe(periods=365)
The Peyton Manning dataset contains records from 2007 to 2016. If you examine the last rows of the future_dates dataframe, you will see that it now includes dates in 2017, which will be covered by the model's forecast.
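To see what such a dataframe looks like without fitting a model, the sketch below reproduces the idea with plain pandas: keep the historical dates and append the next 365 days. The 20-day history is made up, and this is a simplified imitation of make_future_dataframe, not the library's own code.

```python
import pandas as pd

# Made-up history: 20 daily dates ending on 2016-01-20.
history = pd.DataFrame({"ds": pd.date_range("2016-01-01", "2016-01-20", freq="D")})
periods = 365

last_date = history["ds"].max()
future_only = pd.date_range(start=last_date + pd.Timedelta(days=1),
                            periods=periods, freq="D")

# History first, then the new dates, mirroring the default behaviour described above.
future_demo = pd.DataFrame({"ds": history["ds"].tolist() + list(future_only)})
print(future_demo.tail())
```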
The predict method assigns each row in future_dates a predicted value, which it names yhat. If you pass in historical dates, it provides an in-sample fit. The prediction object here is a new dataframe that includes a column yhat with the forecast, as well as columns for components and uncertainty intervals.
prediction = model.predict(future_dates)
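Because the model was trained on log-transformed views, the values in yhat and its bounds are on the log scale. A minimal sketch of converting them back to page views, using a made-up two-row frame (prediction_demo is a hypothetical name) shaped like the relevant columns of the output:

```python
import numpy as np
import pandas as pd

# Made-up rows on the log scale, with the column names Prophet uses.
prediction_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2017-01-17", "2017-01-18"]),
    "yhat": [8.0, 8.1],
    "yhat_lower": [7.4, 7.5],
    "yhat_upper": [8.6, 8.7],
})

cols = ["yhat", "yhat_lower", "yhat_upper"]
# np.exp reverses the earlier np.log, giving values in page views again.
views_scale = prediction_demo[["ds"]].join(np.exp(prediction_demo[cols]))
print(views_scale)
```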
Plotting the prediction with model.plot(prediction) gives the following figure.
In the above figure, the black dots are the actual data points. The dark blue line is the model's forecast yhat, which extends into the 2016-2017 period (indicated by the red arrow). The light blue region is the uncertainty interval bounded by yhat_lower and yhat_upper.
You can also see the forecast components using the Prophet.plot_components method, called here as model.plot_components(prediction). By default you’ll see the trend, yearly seasonality, and weekly seasonality of the time series. If you include holidays, you’ll see those here, too.
Once the forecast is obtained from the model, its accuracy has to be measured using a relevant performance metric. Prophet includes a built-in function for cross validation, which measures forecast error using historical data. The forecast horizon (horizon), initial training period (initial) and the spacing between cutoff dates (period) should be specified.
Here cross-validation is done to assess prediction performance on a horizon of 365 days, starting with 730 days of training data in the first cutoff and then making predictions every 180 days. On this 8 year time series, this corresponds to 11 total forecasts.
from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(model, initial='730 days', period='180 days', horizon = '365 days')
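The count of 11 forecasts can be checked by hand. The sketch below steps backwards from the latest possible cutoff in jumps of period, keeping every cutoff that still leaves at least initial days of training data. The first and last dates of the series are assumptions (the article only gives the years), and the loop is a simplified version of the idea, not Prophet's own implementation.

```python
from datetime import date, timedelta

# Assumed date range of the series; the article only states 2007 to 2016.
first_day = date(2007, 12, 10)
last_day = date(2016, 1, 20)

initial = timedelta(days=730)
period = timedelta(days=180)
horizon = timedelta(days=365)

cutoffs = []
cutoff = last_day - horizon              # latest cutoff that leaves a full horizon to score
while cutoff >= first_day + initial:     # keep at least `initial` days of training data
    cutoffs.append(cutoff)
    cutoff -= period                     # step back by `period`
cutoffs.reverse()                        # oldest cutoff first

print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

Each cutoff corresponds to one forecast: the model is trained on data up to the cutoff and scored on the horizon that follows it.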
Thus the performance metrics can be calculated. The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), median absolute percent error (MDAPE) and coverage of the yhat_lower and yhat_upper estimates.
from fbprophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
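For readers who want to see what these statistics mean, here is a minimal sketch that computes them directly with NumPy on three made-up points. It takes plain averages over all points, whereas performance_metrics aggregates over a rolling window of the horizon, so the numbers are not comparable with df_p.

```python
import numpy as np

y = np.array([100.0, 200.0, 400.0])       # made-up actual values
yhat = np.array([110.0, 180.0, 400.0])    # made-up forecasts
yhat_lower = yhat - 15.0                  # made-up interval bounds
yhat_upper = yhat + 15.0

err = y - yhat
mse = np.mean(err ** 2)                   # mean squared error
rmse = np.sqrt(mse)                       # root mean squared error
mae = np.mean(np.abs(err))                # mean absolute error
ape = np.abs(err / y)                     # absolute percent error per point
mape = np.mean(ape)                       # mean absolute percent error
mdape = np.median(ape)                    # median absolute percent error
# Coverage: share of actual values that fall inside the interval.
coverage = np.mean((y >= yhat_lower) & (y <= yhat_upper))

print(mse, rmse, mae, mape, mdape, coverage)
```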
Cross validation performance metrics can be visualized with plot_cross_validation_metric, here shown for MAPE. Dots show the absolute percent error for each prediction in df_cv. The blue line shows the MAPE, where the mean is taken over a rolling window of the dots. We see for this forecast that errors around 5% are typical for predictions one month into the future, and that errors increase up to around 11% for predictions that are a year out.
from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mape')
It can also be visualised for other metrics such as rmse, mae and mse, which is done in the complete code. You can access the source code for this tutorial here.
There are many time-series models, such as ARIMA, exponential smoothing and seasonal naive (snaive), that can be used for forecasting from historical data. From the practical example, it seems that Prophet provides completely automated forecasts, just as its official documentation states. It’s fast and productive, which would be very useful if your organisation doesn’t have a solid data science team handling predictive analytics. It saves you time when answering forecasting questions from internal stakeholders or clients, without spending too much effort building a model based on classical time-series modelling techniques.