R | msperlin

Major update to BatchGetSymbols

I just released a long due update to package BatchGetSymbols. The files are under review in CRAN and you should get the update soon. Meanwhile, you can install the new version from Github: if (!require(devtools)) install.packages('devtools') devtools::install_github('msperlin/BatchGetSymbols') The main innovations are: Clever cache system: By default, every new download of data will be saved in a local file located in a directory chosen by user. Every new request of data is compared to the available local information. If data is missing, the function only downloads the piece of data that is missing.

Looking back in 2017 and plans for 2018

My blog in 2017 As we come close to the end of 2017, its time to look back. This has been a great year for me in many ways. This blog started as a way to write short pieces about using R for finance and promote my book in an organic way. Today, I’m very happy with my decision. Discovering and trying new writing styles keeps my interest very much alive. Academic research is very strict on what you can write and publish. It is satisfying to see that I can promote my work and have an impact in different ways, not only through the publication of academic papers.

Serving shiny apps in the internet with your own server

In this post I’ll share my experience in setting up my own virtual server for hosting shiny applications in Digital Ocean. First, context. I’m working in a academic project where we build a package for accessing financial data and corporate events directly from B3, the Brazilian financial exchange. The objective is to set a reproducible standard and facilite data acquisition of a large, and very interesting, dataset. The result is GetDFPData. Since many researchers and students in Brazil are not knowledgeable in R, we needed to make it easier for people to use the software.

Package GetDFPData

Package GetDFPData is being substituted by GetDFPData2. See this blog post for details. Financial statements of companies traded at B3 (formerly Bovespa), the Brazilian stock exchange, are available in its website. Accessing the data for a single company is straightforward. In the website one can find a simple interface for accessing this dataset. An example is given here. However, gathering and organizing the data for a large scale research, with many companies and many dates, is painful. Financial reports must be downloaded or copied individually and later aggregated.

The Brazilian Yield Curve

The latest version of GetTDData offers function get.yield.curve to download the current Brazilian yield curve directly from Anbima. The yield curve is a financial tool that, based on current prices of fixed income instruments, shows how the market perceives the future real, nominal and inflation returns. You can find more details regarding the use and definition of a yield curve in Investopedia. Unfortunately, function get.yield.curve only downloads the current yield curve from the website. Data for historical curves over five business days are not available in Anbima website. The new version of GetTDData is available in github and CRAN:

Studying CRAN package names

Setting a name for a CRAN package is an intimate process. Out of an infinite range of possibilities, an idea comes for a package and you spend at least a couple of days writing up and testing your code before submitting to CRAN. Once you set the name of the package, you cannot change it. Your choice index your effort and, it shouldn’t be a surprise that the name of the package can improve its impact. Looking at package names, one strategy that I commonly observe is to use small words, a verb or noun, and add the letter R to it.

My Book about using R in Finance

I am very please to announce that my book,Processing and Analyzing Financial Data with R, is finally out! This book is an english version of my previous title in portuguese. This is a long term project that I plan to keep on working over the years. You can find it in Amazon. Following great titles about R, I decided to also publish an online version with full content here. More details about the book, including table of contents, is availabe in its webpage.

Can we predict stock prices with Prophet?

Facebook recently released a API package allowing access to its forecasting model called prophet. According to the underling post: It's not your traditional ARIMA-style time series model. It's closer in spirit to a Bayesian-influenced generalized additive model, a regression of smooth terms. The model is resistant to the effects of outliers, and supports data collected over an irregular time scale (ingliding presence of missing data) without the need for interpolation. The underlying calculation engine is Stan; the R and Python packages simply provide a convenient interface. After reading it, I got really curious about the predictive performance of this method for stock prices.

Writing a R book and self-publishing it in Amazon

Many people, including my university colleagues and friends, have asked me about the process of writing a book and self publishing it in Amazon. You can find the details about the english version of the book here and here. Given so much interest, I’m going to report the whole process in this post. First, motivation. Why did I write a book? I am a university professor. Writing is a major part of my work and I really enjoy it. Think about it, it is a magical process. I press a specific and long combination of strokes in my keyboard and that translates into information distributed all over the world.

Using R to study tennis players

In the previous post about tennis, we studied how changes in ball’s composition in hard and grass courts affected the game back in 2000. In this post, we will analyse a different dataset from the same repository and look at the players winning records in ATP matches. The data I’m again using the great repository of tennis data of Jeff Sackmann. In this case, however, I’m using the ATP repository that contains ATP match data since 1968 until today. Again, I thank Jeff Sackmann for making this dataset publicly available.