BatchGetSymbols is now parallel!

and faaaaast

BatchGetSymbols is my most downloaded package by any count. Computation time, however, has always been an issue. While downloading data for 10 or fewer stocks is fine, doing it for a large number of tickers, say the SP500 composition, gets very boring.

I’m glad to report that time is no longer an issue. Today I implemented a parallel option for BatchGetSymbols. If you have a high number of cores in your computer, you can seriously speed up the importation process. Importing the SP500 composition, over 500 stocks, is a breeze.

Give it a try. The new version is already available on GitHub:

devtools::install_github('msperlin/BatchGetSymbols')

It should be on CRAN soon.

How to use parallel

Very simple. Just set your parallel plan with future::plan and use the input do.parallel = TRUE in BatchGetSymbols. If you are not sure how many cores you have available, just run the following code to figure it out:

future::availableCores()
## system 
##     16
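If you would rather not hard-code the number of workers, one possible default is to derive it from the core count and keep one core free for the rest of the system. This is only a sketch: BatchGetSymbols does not require it, and the name n.workers is purely illustrative.

```r
# Sketch: derive a worker count from the available cores.
# Keeping one core free is an assumption, not a package rule.
n.cores <- future::availableCores()

# as.integer() drops the name attached to the core count;
# max() guards against single-core machines.
n.workers <- max(1L, as.integer(n.cores) - 1L)

print(n.workers)
```

The resulting value can then be passed as the workers argument of future::plan in place of a fixed number.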
#devtools::install_github('msperlin/BatchGetSymbols')
library(BatchGetSymbols)

# get tickers from SP500
df.sp500 <- GetSP500Stocks()
tickers <- df.sp500$Tickers
  
future::plan(future::multisession, 
             workers = 10) # use 10 workers (check future::availableCores() first)

# download data for 50 stocks  
l.out <- BatchGetSymbols(tickers = tickers[1:50], 
                         first.date = '2010-01-01', 
                         last.date = '2019-01-01',
                         do.parallel = TRUE, 
                         do.cache = FALSE)
## Progress: ─────────────────────────────────────────────────── 100%
dplyr::glimpse(l.out)
## List of 2
##  $ df.control: tibble [50 × 6] (S3: tbl_df/tbl/data.frame)
##   ..$ ticker              : chr [1:50] "MMM" "ABT" "ABBV" "ABMD" ...
##   ..$ src                 : chr [1:50] "yahoo" "yahoo" "yahoo" "yahoo" ...
##   ..$ download.status     : chr [1:50] "OK" "OK" "OK" "OK" ...
##   ..$ total.obs           : int [1:50] 2264 2264 1510 2264 2264 2264 2264 2264 2264 2264 ...
##   ..$ perc.benchmark.dates: num [1:50] 1 1 0.667 1 1 ...
##   ..$ threshold.decision  : chr [1:50] "KEEP" "KEEP" "OUT" "KEEP" ...
##  $ df.tickers:'data.frame':  106408 obs. of  10 variables:
##   ..$ price.open         : num [1:106408] 83.1 82.8 83.9 83.3 83.7 ...
##   ..$ price.high         : num [1:106408] 83.4 83.2 84.6 83.8 84.3 ...
##   ..$ price.low          : num [1:106408] 82.7 81.7 83.5 82.1 83.3 ...
##   ..$ price.close        : num [1:106408] 83 82.5 83.7 83.7 84.3 ...
##   ..$ volume             : num [1:106408] 3043700 2847000 5268500 4470100 3405800 ...
##   ..$ price.adjusted     : num [1:106408] 63.5 63.1 64 64.1 64.5 ...
##   ..$ ref.date           : Date[1:106408], format: "2010-01-04" "2010-01-05" ...
##   ..$ ticker             : chr [1:106408] "MMM" "MMM" "MMM" "MMM" ...
##   ..$ ret.adjusted.prices: num [1:106408] NA -0.006263 0.014182 0.000717 0.007047 ...
##   ..$ ret.closing.prices : num [1:106408] NA -0.006264 0.014182 0.000717 0.007046 ...
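The df.control table is worth a quick look after every batch, since it records which tickers were kept and which were dropped. As a rough sketch, the snippet below rebuilds the first four rows shown in the output above as a toy data frame, so it runs without any download, and lists the tickers flagged as OUT. With a real result you would use l.out$df.control instead of the toy table.

```r
# Toy stand-in for l.out$df.control, using the first four rows
# and column names from the glimpse() output above.
df.control <- data.frame(
  ticker               = c("MMM", "ABT", "ABBV", "ABMD"),
  download.status      = c("OK", "OK", "OK", "OK"),
  perc.benchmark.dates = c(1, 1, 0.667, 1),
  threshold.decision   = c("KEEP", "KEEP", "OUT", "KEEP"),
  stringsAsFactors     = FALSE
)

# Tickers that did not pass the data-availability threshold
dropped <- df.control$ticker[df.control$threshold.decision != "KEEP"]
print(dropped)

# Quick count of kept versus dropped tickers
print(table(df.control$threshold.decision))
```

Here ABBV is the one flagged as OUT, which matches its lower share of benchmark dates (0.667) in the output.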
Marcelo S. Perlin
Associate Professor

My research interests include data analysis, finance and scientometrics.
