Can we predict stock prices with Prophet?

Facebook recently released a API package allowing access to its forecasting model called prophet. According to the underling post:

It's not your traditional ARIMA-style time series model. It's closer in spirit to a  Bayesian-influenced generalized additive model, a regression of smooth terms. The model is resistant   to the effects of outliers, and supports data collected over an irregular time scale (ingliding presence of missing data) without the need for interpolation. The underlying calculation engine is Stan; the R and Python packages simply provide a convenient interface.  

After reading it, I got really curious about the predictive performance of this method for stock prices. That is, can we predict stock price movements based on prophet? In this post I will investigate this research question using a database of prices for the SP500 components.

Before describing the code and results, it is noteworthy to point out that forecasting stock returns is really hard! There is a significant body of literature trying to forecast prices and to prove (or not) that financial markets are efficient in pricing publicly available information, including historical prices. This is the so called efficient market hypothesis. I have studied it, tried to trade for myself for a while when I was a Msc student, advised several graduate students on it, and the results are mostly the same: it is very difficult to find a trade signal that works well and is sustainable in real life.

This means that most of the variation in prices is due to random factors that cannot be anticipated. The explanation is simple, prices move according to investor’s expectation from available information. Every time that new (random) information, true or not, reaches the market, investor’s update their beliefs and trade accordingly. So, unless, new information or market expectation have a particular pattern, price changes will be mostly random.

Even with a body of evidence against our research, it is still interesting to see how we could apply prophet in a trading setup.

The data

First, let’s download stock prices for some components of the SP500 index since 2010.

library(BatchGetSymbols)
## Loading required package: rvest
## Loading required package: xml2
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
set.seed(100)

all.stocks <- GetSP500Stocks()$Ticker
my.stocks <- sample(all.stocks, 20)

first.date <- as.Date('2015-01-01')
last.date <- as.Date('2019-01-01')
df.stocks <- BatchGetSymbols(my.stocks, 
                             first.date = first.date, 
                             last.date = last.date)[[2]]
## 
## Running BatchGetSymbols for:
## 
##    tickers =FTV, ZBH, OXY, C, XLNX, VZ, BEN, WY, ROL, VTR, TSN, ABMD, MKC, MDLZ, CNP, PVH, ADBE, EXPD, L, UNM
##    Downloading data for benchmark ticker
## ^GSPC | yahoo (1|1) | Found cache file
## FTV | yahoo (1|20) | Found cache file - Got 62% of valid prices | OUT: not enough data (thresh.bad.data = 75%)
## ZBH | yahoo (2|20) | Found cache file - Got 100% of valid prices | OK!
## OXY | yahoo (3|20) | Found cache file - Got 100% of valid prices | Got it!
## C | yahoo (4|20) | Found cache file - Got 100% of valid prices | You got it!
## XLNX | yahoo (5|20) | Found cache file - Got 100% of valid prices | OK!
## VZ | yahoo (6|20) | Found cache file - Got 100% of valid prices | Got it!
## BEN | yahoo (7|20) | Found cache file - Got 100% of valid prices | Feels good!
## WY | yahoo (8|20) | Found cache file - Got 100% of valid prices | Looking good!
## ROL | yahoo (9|20) | Found cache file - Got 100% of valid prices | Got it!
## VTR | yahoo (10|20) | Found cache file - Got 100% of valid prices | OK!
## TSN | yahoo (11|20) | Found cache file - Got 100% of valid prices | Feels good!
## ABMD | yahoo (12|20) | Found cache file - Got 100% of valid prices | Feels good!
## MKC | yahoo (13|20) | Found cache file - Got 100% of valid prices | Got it!
## MDLZ | yahoo (14|20) | Found cache file - Got 100% of valid prices | Feels good!
## CNP | yahoo (15|20) | Found cache file - Got 100% of valid prices | Got it!
## PVH | yahoo (16|20) | Found cache file - Got 100% of valid prices | Looking good!
## ADBE | yahoo (17|20) | Found cache file - Got 100% of valid prices | OK!
## EXPD | yahoo (18|20) | Found cache file - Got 100% of valid prices | You got it!
## L | yahoo (19|20) | Found cache file - Got 100% of valid prices | Good job!
## UNM | yahoo (20|20) | Found cache file - Got 100% of valid prices | Looking good!

Now, let’s understand how prophet works. I was happy to see that the interface is quite simple, you offer a time series with input y and a date vector with ds. If no further custom option is set, you are good to go. My only complain with prophet is that that the function outputs lots of messages. They really should add a quiet option, so that the user doesn’t have to use capture.output to silent it. Have a look in the next example with a dummy series:

library(prophet)
## Loading required package: Rcpp
## Loading required package: rlang
## 
## Attaching package: 'rlang'
## The following object is masked from 'package:xml2':
## 
##     as_list
df.est <- data.frame(y = rnorm(100), ds = Sys.Date() + 1:100)

m <- prophet(df = df.est)
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

The next step is to think about how to structure a function for our research problem. Our study has two steps, first we will set a training (in-sample) period, estimate the model and make forecasts. After that, we use the out-of-sample data to test the accuracy of the model.

The whole procedure of estimating and forecasting will be encapsulated in a single R function. This is not the best way of doing it but, for our simple example, it will suffice. My function will take as input a dataframe and the number of out-of-sample forecasts. Based on the adjusted closing prices, we calculate returns and feed 1:(nrow(df)-nfor) rows for the estimation. The last nfor rows are used for testing the accuracy of the model. For example, if I have a vector with 1000 returns and nfor=5, I use observations from 1:995 for estimating the model and 996:1000 for testing the forecasts. The function returns a dataframe with the predictions for each horizon, its error, among other things. Here’s the function definition:

est.model.and.forecast <- function(df.in, nfor=5){
  # Estimated model using prophet and forecast it
  #
  # Args:
  #   df.in - A dataframe with columns price.adjusted and ref.date
  #   nfor - Number of out-of-sample forecasts
  #
  # Returns:
  #   A dataframe with forecasts and errors for each horizon.
  
  require(prophet)
  require(dplyr)
  
  my.ticker <- df.in$ticker[1]
  
  #cat('\nProcessing ', my.ticker)
  
  df.in <- df.in %>%
    select(ref.date, ret.adjusted.prices)
  
  names(df.in) <- c('ds', 'y')
  
  idx <- nrow(df.in) - nfor
  
  df.est <- df.in[1:idx, ]
  df.for <- df.in[(idx + 1):nrow(df.in), ]
  
  capture.output(
    m <- prophet(df = df.est)
  )
  
  # forecast 50 days ahead (it also includes non trading days)
  df.pred <- predict(m,
                     make_future_dataframe(m,
                                           periods = nfor + 50))
  
  df.pred$ds <- as.Date(df.pred$ds)
  df.for <- merge(df.for, df.pred, by = 'ds')
  df.for <- select(df.for, ds, y, yhat)
  
  # forecast statistics
  df.for$eps <- with(df.for,y - yhat)
  df.for$abs.eps <- with(df.for,abs(y - yhat))
  df.for$perc.eps <- with(df.for,(y - yhat)/y)
  df.for$nfor <- 1:nrow(df.for)
  df.for$ticker <- my.ticker
  
  return(df.for)
  
}

Let’s try it out using the by function to apply it for each stock in our sample. All results are later combined in a single dataframe with function do.call.

out.l <- by(data = df.stocks,
            INDICES = df.stocks$ticker, 
            FUN = est.model.and.forecast, nfor = 5)

my.result <- do.call(rbind, out.l)

Lets have a look in the resulting dataframe:

head(my.result)
##                ds            y         yhat         eps    abs.eps  perc.eps
## ABMD.1 2018-12-24 -0.031726969 -0.005299687 -0.02642728 0.02642728 0.8329596
## ABMD.2 2018-12-26  0.093781188 -0.002497698  0.09627889 0.09627889 1.0266333
## ABMD.3 2018-12-27  0.026769487 -0.005186820  0.03195631 0.03195631 1.1937587
## ABMD.4 2018-12-28  0.007919663 -0.001131558  0.00905122 0.00905122 1.1428795
## ABMD.5 2018-12-31  0.021592217 -0.003278144  0.02487036 0.02487036 1.1518206
## ADBE.1 2018-12-24 -0.017432945 -0.005505203 -0.01192774 0.01192774 0.6842070
##        nfor ticker
## ABMD.1    1   ABMD
## ABMD.2    2   ABMD
## ABMD.3    3   ABMD
## ABMD.4    4   ABMD
## ABMD.5    5   ABMD
## ADBE.1    1   ADBE

In this object you’ll find the forecasts (yhat), the actual values (y), the absolute and normalized error (abs.eps, perc.eps).

For ou first analysis, let’s have a look on the effect of the forecasting horizon over the absolute error distribution.

library(ggplot2)

p <- ggplot(my.result, aes(x=factor(nfor), 
                           y=abs.eps))
p <- p + geom_boxplot()

print(p)

We do find some positive dependency. As the horizon increases, the forecasting algorithm makes more mistakes. Surprisingly, this pattern is not found for nfor=5 and nfor=4. It might be interesting to add more data and check if this effect is robust.

Encopassing test

A simple and powerful test for verifying the accuracy of a prediction algorithm is the encompassing test. The idea is to estimate the following linear model with the real returns (\(R_t\)) and its predictions (\(\hat{R} _t\)).

\[y_t = \alpha + \beta\hat{y_t} + \epsilon _t\]

If the model provides good forecasts, we can expect that \(\alpha\) is equal to zero (no bias) and \(\beta\) is equal to 1. If both conditions are true, we have that \(R_t = \hat{R} _t + \epsilon _t\)$, meaning that our forecasting model provides an unbiased estimator of the predicted variable. In a formal research, we could use a Wald test to verify this hypothesis jointly.

First, lets find the result of the encompassing test for all forecasts.

lm.model <- lm(formula = y ~yhat, data = my.result)
summary(lm.model)
## 
## Call:
## lm(formula = y ~ yhat, data = my.result)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.055237 -0.012307 -0.000287  0.008360  0.086257 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 0.007954   0.004063   1.957   0.0533 .
## yhat        0.171995   1.132177   0.152   0.8796  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02675 on 93 degrees of freedom
## Multiple R-squared:  0.0002481,  Adjusted R-squared:  -0.0105 
## F-statistic: 0.02308 on 1 and 93 DF,  p-value: 0.8796

As you can see, it didn’t work very well. The constant is significant, which indicates a bias. The value of 0.1719945 is not very close to 1. But, it could be the case that the different horizon have different results. A longer horizon, with bad forecasts, will be affecting short horizons with good forecasts. Lets use dplyr to separate our model according to nfor.

models <- my.result %>%
  group_by(nfor) %>%
  do(ols.model = lm(data = ., formula = y ~ yhat ))

We report the results with texreg::screenreg.

texreg::screenreg(models$ols.model)
## 
## ==============================================================
##              Model 1    Model 2    Model 3  Model 4  Model 5  
## --------------------------------------------------------------
## (Intercept)  -0.04 ***   0.05 ***   0.01    -0.00     0.01 ***
##              (0.00)     (0.01)     (0.00)   (0.00)   (0.00)   
## yhat         -2.32 **   -0.14      -1.01    -0.19    -0.22    
##              (0.77)     (2.82)     (0.87)   (0.51)   (0.72)   
## --------------------------------------------------------------
## R^2           0.35       0.00       0.07     0.01     0.01    
## Adj. R^2      0.31      -0.06       0.02    -0.05    -0.05    
## Num. obs.    19         19         19       19       19       
## RMSE          0.01       0.02       0.01     0.01     0.01    
## ==============================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

Well, the R2 shows some evidence that shorter horizons have better results in the encompassing test. But, we got some negative betas! This means that, for some horizons, it might be better to take the opposite suggestion of the forecast!

Trading based on forecasts

In a practical trading applications, it might not be of interest to forecast actual returns. If you are trading according to these forecasts, you are probably more worried about the direction of the forecasts and not its nominal error. A model can have bad nominal forecasts, but be good in predicting the sign of the next price movement. If this is the case, you can still make money even though your model fails in the encompassing test.

Let’s try it out with a simple trading strategy for all different horizons:

  • buy in end of day t if forecast in t+1 is positive and sell at the end of t+1
  • short-sell in the end of day t when forecast for t+1 is negative and buy it back in the end of t+1

The total profit will be given by:

my.profit <- sum(with(my.result, (yhat>0)*y + (yhat<0)*-y))
print(my.profit)
## [1] -0.7584576

Not bad! Doesn’t look like much, but remember that we have a few trading days and this return might be due to a sistematic effect in the market. Let’s see how this result compares to random trading signals.

n.sim <- 10000

monkey.ret <- numeric(length = n.sim)
for (i in seq(n.sim)) {
  rnd.vec <- rnorm(length(my.result$y))
  
  monkey.ret[i] <- sum( (rnd.vec>0)*my.result$y + (rnd.vec<0)*-my.result$y )
  
} 

temp.df <- data.frame(monkey.ret, my.profit)
p <- ggplot(temp.df, aes(monkey.ret)) 
p <- p + geom_histogram()
p <- p + geom_vline(aes(xintercept =  my.profit),size=2)
p <- p + labs(x='Returns from random trading signals')
print(p)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The previous histogram shows the total return from randomnly generated signals in 10^{4} simulations. The vertical line is the result from using prophet. As you can see, it is a bit higher than the average of the distribution. The total return from prophet is lower than the return of the naive strategy in 99.72 percent of the simulations. This is not a bad result. But, notice that we didnt add trading or liquidity costs to the analysis, which will make the total returns worse.

The main results of this simple study are clear: prophet is bad at point forecasts for returns, specially for longer horizons, but does quite better in directional predictions. It might be interesting to test it further, with more data, adding trading costs, other forecasting setups, and see if the results hold.

Marcelo S. Perlin
Marcelo S. Perlin
Associate Professor

My research interests include data analysis, finance and cientometrics.

comments powered by Disqus