I’m just about to leave for my vacation and, as usual, I’ll write about the highlights of 2019 and my plans for the year to come. First, let’s talk about my work in 2019.
Highlights of 2019

The year 2019 was not particularly fruitful in journal publications. I only had two: "Accessing Financial Reports and Corporate Events with GetDFPData", published in RBfin, and "A consumer credit risk structural model based on affordability: balance at risk", published in JCR. Both are papers I wrote back in 2017 and 2018, not new articles.
This post is deprecated due to changes in the package code. See the new post in this link. In the new edition of my R book, to be released in early 2020 (see current TOC, new packages and notification form), I'm giving special attention to its use in the classroom. For that, I've created class slides and R exercises in static and dynamic form. All the extra content will be freely available on the internet and distributed with package afedR. Anyone can use it, without the need to purchase the book (but of course it would help).
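As a quick, hedged illustration of how the extra content could be obtained, the line below installs the companion package from GitHub; the repository path is my assumption and is not stated in this excerpt.

```r
# a minimal sketch: installing the companion package from GitHub
# (the "msperlin/afedR" repository path is an assumption)
# install.packages("remotes")
remotes::install_github("msperlin/afedR")
```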
Back in 2017 I wrote the first international1 edition of my book “Analyzing Financial and Economic Data with R” (online version). While I was happy with the content of the book at the time of publication, today I know I can make it better. As of early 2019, I'm working on the new edition of the book, taking my time (and weekends!) to fix all issues, expand chapters and write new CRAN packages.
The current TOC is available here. Let me summarize the main changes from the previous edition:
Introduction

In my latest post I wrote about package GetEdgarData, which downloads structured data from the SEC. I've been working on this project and soon realized that the data available in the SEC/DERA section is not complete. For example, all Q4 statements are missing. This seems to be the way exchanges release financial documents; I've found the same problem here at the Brazilian exchange.
It came to my attention that there is an alternative way of fetching corporate data and adjusted prices: the SimFin project. From its own website:
Introduction

As of 2019-10-31, this package is discontinued and will no longer be updated. See this post for more details about the alternative, package simfinR.
Every company traded in the US stock market must report its quarterly and yearly documents to the SEC and to the public in general. This includes its accounting statements (10-K, 10-Q) and any other corporate event that is relevant to investors.
EDGAR is the interface where we can search for a company's filing information. By looking up a company's CIK code, one can find all of its previous filings.
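As a hedged sketch of what this lookup means in practice, the code below builds the EDGAR browse URL for a given CIK and fetches it with httr; the example CIK (Apple Inc.) and the use of httr are illustrative assumptions, not part of GetEdgarData.

```r
library(httr)

# illustrative only: EDGAR's company browse endpoint, filtered by form type
# the CIK below (Apple Inc.) is just an example
my_cik <- "0000320193"
my_url <- paste0(
  "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany",
  "&CIK=", my_cik,
  "&type=10-K",
  "&count=40"
)

# EDGAR asks automated clients to identify themselves in the request
res <- GET(my_url, user_agent("your_name your_email@example.com"))
status_code(res)  # 200 means the filing index page came back
```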
The shiny version of GetDFPData is currently hosted on a private server at DigitalOcean. A problem with the basic (5 USD) server I was using is the low amount of available memory (RAM and HD). Because of that, I had to limit all xlsx queries for the data, otherwise the shiny app would run out of memory. After upgrading R on the server, the xlsx option was no longer working.
Today I tried every trick in the book to keep the 5 USD server and get the code to work.
Introduction

Quandl is one of the best platforms for finding and downloading financial and economic time series. The collection of free databases is solid and I use it intensively in my research and class material.
But a couple of things about the native package Quandl have always bothered me:

- multiple series are always returned in the wide (column-oriented) format (why??);
- no local caching of data;
- no control for importing errors and status;
- not easy to work with within the tidyverse collection of packages.

As you might suspect, I decided to tackle the problem over the weekend.
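To make the first point concrete, here is a minimal sketch of the wide output and a manual reshape to the long format; the Quandl codes, the API key placeholder and the use of pivot_longer are assumptions for this example, not part of any package of mine.

```r
library(Quandl)
library(tidyverse)

# set your own API key (placeholder below)
Quandl.api_key("YOUR_API_KEY")

# asking for two series returns a single wide data frame:
# one Date column plus one column per series
df_wide <- Quandl(c("FRED/GDP", "FRED/UNRATE"))

# reshaping to the long (tidy) format by hand
df_long <- pivot_longer(df_wide,
                        cols = -Date,
                        names_to = "series",
                        values_to = "value")

glimpse(df_long)
```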
Update 2019-08-09: The shutdown has just been postponed to 2019-11-14. See the official release here.
Surprise, surprise. B3’s ftp site is still up and running.
Following my previous post regarding the shutdown of B3's ftp site and its impact on GetHFData, I'm happy to report that the site is up and running.
We can check it with code:
```r
library(GetHFData)
library(tidyverse)

df.ftp <- ghfd_get_ftp_contents(type.market = 'equity')

# check time difference
max(df.ftp$dates) - min(df.ftp$dates)
```

Let's download some trade data:

```r
df.trades <- ghfd_get_HF_data(my.assets = 'PETR3',
                              type.market = 'equity',
                              first.date = max(df.ftp$dates) - 3,
                              last.date = max(df.ftp$dates))
```
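Just as a sanity check of the output (the SessionDate column name below is my assumption about GetHFData's returned data frame and may differ):

```r
# quick look at the structure of the downloaded trades
glimpse(df.trades)

# number of trades per day -- column name is assumed and may differ
df.trades %>%
  group_by(SessionDate) %>%
  summarise(n_trades = n())
```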
Well, bad news travels fast.
Over the last couple of weeks I've received a few emails regarding B3's decision to shut down its ftp site. More specifically, users are eager to know how it will impact my data-grabbing packages on CRAN. I'll use this post to explain the situation for everyone.
The only package directly affected will be GetHFData, which uses the ftp site for downloading the raw zipped files with trades and quotes. Its main function will no longer work, as the files will no longer be available online.
One of the investment concepts that every long-term investor should know is the effect of consistency on corporate performance. The main idea is that older and profitable companies are likely to remain profitable and even improve their performance in the upcoming years. Likewise, companies with constant losses are likely to continue on the same path.
This idea is related to the Lindy effect. Quoting directly from Wikipedia:
The Lindy effect is a theory that the future life expectancy of some non-perishable things like a technology or an idea is proportional to their current age, so that every additional period of survival implies a longer remaining life expectancy.