BatchGetSymbols is now parallel!
and faaaaast
BatchGetSymbols is my most downloaded package by any count. Computation time, however, has always been an issue. While downloading data for 10 or fewer stocks is fine, doing it for a large number of tickers, say the SP500 composition, gets very boring.
I’m glad to report that time is no longer an issue. Today I implemented a parallel option for BatchGetSymbols. If you have a high number of cores in your computer, you can seriously speed up the importation process. Importing the SP500 composition, over 500 stocks, is a breeze.
Give it a try. The new version is already available on GitHub:
devtools::install_github('msperlin/BatchGetSymbols')
It should be on CRAN soon.
How to use parallel
Very simple. Just set your parallel plan with future::plan and use input do.parallel = TRUE in BatchGetSymbols. If you are not sure how many cores you have available, just run the following code to figure it out:
future::availableCores()
## system
## 16
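Using every available core can make the rest of the machine sluggish, so it may be worth leaving one free. A minimal sketch of that idea follows; pick_workers is a hypothetical helper written for this post, not a function from BatchGetSymbols or future.

```r
# Hypothetical helper, not part of BatchGetSymbols or future:
# keep `reserve` cores free, but always use at least one worker.
pick_workers <- function(n.cores, reserve = 1L) {
  max(1L, as.integer(n.cores) - as.integer(reserve))
}

pick_workers(16) # 15 on the 16-core machine above
pick_workers(1)  # 1, the count never drops to zero

# Apply it to the current machine (only if the future package is installed)
if (requireNamespace("future", quietly = TRUE)) {
  n.workers <- pick_workers(future::availableCores())
  future::plan(future::multisession, workers = n.workers)
}
```

Whether leaving a core free helps depends on what else is running, so treat the default of one reserved core as a starting point.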
#devtools::install_github('msperlin/BatchGetSymbols')
library(BatchGetSymbols)
# get tickers from SP500
df.sp500 <- GetSP500Stocks()
tickers <- df.sp500$Tickers
future::plan(future::multisession,
             workers = 10) # use 10 workers (check future::availableCores() first)
# download data for 50 stocks
l.out <- BatchGetSymbols(tickers = tickers[1:50],
                         first.date = '2010-01-01',
                         last.date = '2019-01-01',
                         do.parallel = TRUE,
                         do.cache = FALSE)
## Progress: ─────────────────────────────────────────────────── 100%
glimpse(l.out)
## List of 2
## $ df.control: tibble [50 × 6] (S3: tbl_df/tbl/data.frame)
## ..$ ticker : chr [1:50] "MMM" "ABT" "ABBV" "ABMD" ...
## ..$ src : chr [1:50] "yahoo" "yahoo" "yahoo" "yahoo" ...
## ..$ download.status : chr [1:50] "OK" "OK" "OK" "OK" ...
## ..$ total.obs : int [1:50] 2264 2264 1510 2264 2264 2264 2264 2264 2264 2264 ...
## ..$ perc.benchmark.dates: num [1:50] 1 1 0.667 1 1 ...
## ..$ threshold.decision : chr [1:50] "KEEP" "KEEP" "OUT" "KEEP" ...
## $ df.tickers:'data.frame': 106408 obs. of 10 variables:
## ..$ price.open : num [1:106408] 83.1 82.8 83.9 83.3 83.7 ...
## ..$ price.high : num [1:106408] 83.4 83.2 84.6 83.8 84.3 ...
## ..$ price.low : num [1:106408] 82.7 81.7 83.5 82.1 83.3 ...
## ..$ price.close : num [1:106408] 83 82.5 83.7 83.7 84.3 ...
## ..$ volume : num [1:106408] 3043700 2847000 5268500 4470100 3405800 ...
## ..$ price.adjusted : num [1:106408] 63.5 63.1 64 64.1 64.5 ...
## ..$ ref.date : Date[1:106408], format: "2010-01-04" "2010-01-05" ...
## ..$ ticker : chr [1:106408] "MMM" "MMM" "MMM" "MMM" ...
## ..$ ret.adjusted.prices: num [1:106408] NA -0.006263 0.014182 0.000717 0.007047 ...
## ..$ ret.closing.prices : num [1:106408] NA -0.006264 0.014182 0.000717 0.007046 ...
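The df.control table is the place to check which tickers were kept. As a small sketch, the code below rebuilds only the first four rows printed in the glimpse above (it is a toy copy, not the full 50-row table) and pulls out the tickers that were not marked KEEP. With a real download you would use l.out$df.control directly.

```r
# Toy copy of the first four rows of df.control shown in the output above
df.control <- data.frame(
  ticker               = c("MMM", "ABT", "ABBV", "ABMD"),
  download.status      = c("OK", "OK", "OK", "OK"),
  total.obs            = c(2264L, 2264L, 1510L, 2264L),
  perc.benchmark.dates = c(1, 1, 0.667, 1),
  threshold.decision   = c("KEEP", "KEEP", "OUT", "KEEP"),
  stringsAsFactors     = FALSE
)

# Tickers that did not pass the threshold check
dropped <- df.control[df.control$threshold.decision != "KEEP",
                      c("ticker", "total.obs", "perc.benchmark.dates")]
print(dropped)
```

Here ABBV is the only ticker flagged OUT: it has 1510 observations against 2264 for the others, which matches the 0.667 in perc.benchmark.dates.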