Looking back at 2018 and plans for 2019
At the end of every year I plan to write about the highlights of the current year and set plans for the future. First, let’s talk about my work in 2018.
Highlights of 2018
Research-wise, my scientometrics paper “Is predatory publishing a real threat? Evidence from a large database study” was featured in many news outlets. Its altmetric page is doing great, with over 1100 downloads and a place in the top 5% of all research output measured by altmetric. This is, by far, the most impactful research piece I have ever written. It’s rewarding to see my work featured in the local and international media.
This year I also released the first version of GetDFPData, an R package for accessing a large database of financial information from B3, the Brazilian exchange. I’m glad to report that many people are using it for their own research. I can see this in the number of visits to the web interface and in the frequent emails I get about the package. The feedback from other researchers has been great but, of course, there are always ways to improve the code. I’ve been constantly developing it over time.
The GetDFPData package also had an impact on my own research. I’ve always been biased towards the topic of capital markets and now I’m doing research in corporate finance, mostly due to the new access to a large database of corporate events. Currently, I have three paper initiatives analyzing the effect of board formation on the financial performance of Brazilian companies. These will likely be published in 2019 or 2020.
In late 2018 I started my YouTube series padfeR, with video tutorials about using R for Finance and Economics. The idea is to have a greater impact and help those who are starting to use R. So far, all videos are in Portuguese, but I do have plans for doing it in English in the future. Hopefully I’ll find some time in 2019 to start it.
Overall, 2018 was a great year. I’m always thankful for having the opportunity of working in a job that I love and look forward to working (almost) every single day.
My blog posts in 2018
In November I changed the technology behind my blog from Jekyll to Hugo. I can’t stress enough how much I’m liking the Academic template, built with blogdown and hosted on my own server. It is far easier to write posts and maintain the website.
First, let’s see how many posts I have so far.
my.blog.folder <- '~/Dropbox/11-My Website/01-msperlin.com/content/post/'
post.files <- list.files(path = my.blog.folder, pattern = '.Rmd')
post.files
## [1] "2017-02-16-Writing-a-book.Rmd"
## [2] "2017-02-16-Writing-a-book.Rmd.lock~"
## [3] "2017-12-06-Package-GetDFPData.Rmd"
## [4] "2017-12-06-Package-GetDFPData.Rmd.lock~"
## [5] "2017-12-13-Serving-shiny-apps-internet.Rmd"
## [6] "2017-12-13-Serving-shiny-apps-internet.Rmd.lock~"
## [7] "2017-12-30-Looking-Back-2017.Rmd"
## [8] "2017-12-30-Looking-Back-2017.Rmd.lock~"
## [9] "2018-01-22-Update-BatchGetSymbols.Rmd"
## [10] "2018-01-22-Update-BatchGetSymbols.Rmd.lock~"
## [11] "2018-03-16-Writing_Papers_About_Pkgs.Rmd"
## [12] "2018-03-16-Writing_Papers_About_Pkgs.Rmd.lock~"
## [13] "2018-04-22-predatory-scientometrics.Rmd"
## [14] "2018-04-22-predatory-scientometrics.Rmd.lock~"
## [15] "2018-05-12-Investing-Long-Run.Rmd"
## [16] "2018-05-12-Investing-Long-Run.Rmd.lock~"
## [17] "2018-06-12-padfR-ed2.Rmd"
## [18] "2018-06-12-padfR-ed2.Rmd.lock~"
## [19] "2018-06-29-BenchmarkingSSD.Rmd"
## [20] "2018-06-29-BenchmarkingSSD.Rmd.lock~"
## [21] "2018-10-10-BatchGetSymbols-NewVersion.Rmd"
## [22] "2018-10-10-BatchGetSymbols-NewVersion.Rmd.lock~"
## [23] "2018-10-11-Update-GetLattesData.Rmd"
## [24] "2018-10-11-Update-GetLattesData.Rmd.lock~"
## [25] "2018-10-13-NewPackage-PkgsFromFiles.Rmd"
## [26] "2018-10-13-NewPackage-PkgsFromFiles.Rmd.lock~"
## [27] "2018-10-19-R-and-loops.Rmd"
## [28] "2018-10-19-R-and-loops.Rmd.lock~"
## [29] "2018-10-20-Linux-and-R.Rmd"
## [30] "2018-10-20-Linux-and-R.Rmd.lock~"
## [31] "2018-11-03-NewBlog.Rmd"
## [32] "2018-11-03-NewBlog.Rmd.lock~"
## [33] "2018-11-03-RstudioTricks.Rmd"
## [34] "2018-11-03-RstudioTricks.Rmd.lock~"
## [35] "2019-01-08-Looking-Back-2018.Rmd"
## [36] "2019-01-08-Looking-Back-2018.Rmd.lock~"
## [37] "2019-01-12-GetDFPData-ver14.Rmd"
## [38] "2019-01-12-GetDFPData-ver14.Rmd.lock~"
## [39] "2019-03-09-pafdR-promotion.Rmd"
## [40] "2019-03-09-pafdR-promotion.Rmd.lock~"
## [41] "2019-03-10-pafdR-promotion_2.Rmd"
## [42] "2019-03-10-pafdR-promotion_2.Rmd.lock~"
## [43] "2019-03-23-Bettina-Case.Rmd"
## [44] "2019-03-23-Bettina-Case.Rmd.lock~"
## [45] "2019-04-13-Parallel-BatchGetsymbols.Rmd"
## [46] "2019-04-13-Parallel-BatchGetsymbols.Rmd.lock~"
## [47] "2019-04-15-GetBCBData.Rmd"
## [48] "2019-04-15-GetBCBData.Rmd.lock~"
## [49] "2019-05-01-MeanVariance.Rmd"
## [50] "2019-05-01-MeanVariance.Rmd.lock~"
## [51] "2019-05-17-R-in-Brazil.Rmd"
## [52] "2019-05-17-R-in-Brazil.Rmd.lock~"
## [53] "2019-05-20-Lindy-Effect.Rmd"
## [54] "2019-05-20-Lindy-Effect.Rmd.lock~"
## [55] "2019-07-01-ftp-shutdown.Rmd"
## [56] "2019-07-01-ftp-shutdown.Rmd.lock~"
## [57] "2019-08-08-ftp-NOT-shutdown.Rmd"
## [58] "2019-08-08-ftp-NOT-shutdown.Rmd.lock~"
## [59] "2019-10-01-new-package-GetQuandlData.Rmd"
## [60] "2019-10-01-new-package-GetQuandlData.Rmd.lock~"
## [61] "2019-10-12-support-GetDFPData-shiny.Rmd"
## [62] "2019-10-12-support-GetDFPData-shiny.Rmd.lock~"
## [63] "2019-10-16-new-package-GetEdgarData.Rmd"
## [64] "2019-10-16-new-package-GetEdgarData.Rmd.lock~"
## [65] "2019-11-01-new-package-simfinR.Rmd"
## [66] "2019-11-01-new-package-simfinR.Rmd.lock~"
## [67] "2019-11-25-feedback-TOC-afedR.Rmd"
## [68] "2019-11-25-feedback-TOC-afedR.Rmd.lock~"
## [69] "2019-12-02-dynamic-exercises-afedR.Rmd"
## [70] "2019-12-02-dynamic-exercises-afedR.Rmd.lock~"
## [71] "2019-12-15-Looking-Back-2019.Rmd"
## [72] "2019-12-15-Looking-Back-2019.Rmd.lock~"
## [73] "2020-01-15-afedR-ed2-announcement.Rmd"
## [74] "2020-01-15-afedR-ed2-announcement.Rmd.lock~"
## [75] "2020-02-25-afedR-ed2-slides-available.Rmd"
## [76] "2020-02-25-afedR-ed2-slides-available.Rmd.lock~"
## [77] "2020-03-29-garch-tutorial-in-r.Rmd"
## [78] "2020-03-29-garch-tutorial-in-r.Rmd.lock~"
## [79] "2020-04-17-update-getdfpdata.Rmd"
## [80] "2020-04-17-update-getdfpdata.Rmd.lock~"
## [81] "2020-04-20-free-compiled-data-in-site.Rmd"
## [82] "2020-04-20-free-compiled-data-in-site.Rmd.lock~"
## [83] "2020-04-20-new-package-GetCVMData.Rmd"
## [84] "2020-04-20-new-package-GetCVMData.Rmd.lock~"
## [85] "2020-04-25-investments-costs.Rmd"
## [86] "2020-04-25-investments-costs.Rmd.lock~"
## [87] "2020-05-24-pirf-is-online.Rmd"
## [88] "2020-05-24-pirf-is-online.Rmd.lock~"
## [89] "2020-05-27-call-for-papers-rac.Rmd"
## [90] "2020-05-27-call-for-papers-rac.Rmd.lock~"
## [91] "2020-07-07-garch-tutorial-in-r-REVISED.Rmd"
## [92] "2020-07-07-garch-tutorial-in-r-REVISED.Rmd.lock~"
## [93] "2020-07-18-new_packages-GetFREData-GetDFPData2.Rmd"
## [94] "2020-07-18-new_packages-GetFREData-GetDFPData2.Rmd.lock~"
## [95] "2020-12-22-Looking-Back-2020.Rmd"
## [96] "2020-12-22-Looking-Back-2020.Rmd.lock~"
## [97] "2021-02-18-dynamic-exercises-adfeR.Rmd"
## [98] "2021-02-18-dynamic-exercises-adfeR.Rmd.lock~"
## [99] "2021-02-19-dynamic-exercises-adfeR.Rmd.lock~"
## [100] "2021-02-20-adfeR-ed3-announcement.Rmd"
## [101] "2021-02-20-adfeR-ed3-announcement.Rmd.lock~"
## [102] "2021-02-28-dynamic-exercises-afedR.Rmd"
## [103] "2021-02-28-dynamic-exercises-afedR.Rmd.lock~"
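The blog started in January 2017. The listing above has 103 entries, but roughly half of them are .Rmd.lock~ files left behind by the editor; counting only the actual .Rmd files, I wrote 51 posts over time. That feels alright. I’m not feeling forced to write, and I do it whenever I feel like I have something to share.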
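The lock files show up in the listing because the pattern '.Rmd' is an unanchored regular expression: the dot matches any character and nothing ties the match to the end of the file name. Below is a minimal sketch of a stricter listing. It uses a temporary folder with made-up file names (tmp.folder and example.files are assumptions for this illustration, not my actual blog folder).

```r
# Hypothetical folder and file names, created only for this illustration
tmp.folder <- file.path(tempdir(), 'post-example')
dir.create(tmp.folder, showWarnings = FALSE)
example.files <- c('2018-01-22-Post-A.Rmd',
                   '2018-01-22-Post-A.Rmd.lock~',
                   '2018-03-16-Post-B.Rmd')
invisible(file.create(file.path(tmp.folder, example.files)))

# Unanchored pattern: lock files are matched as well
loose.files <- list.files(path = tmp.folder, pattern = '.Rmd')

# Escaped dot plus end-of-string anchor: only names ending in '.Rmd'
strict.files <- list.files(path = tmp.folder, pattern = '\\.Rmd$')

length(loose.files)
length(strict.files)
```

With the anchored pattern, the length of the result can be read directly as the number of posts.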
Let’s get more information from the .Rmd files. I’ll write the function read_blog_files and use it for all post files.
library(tidyverse)

# Read the YAML front matter and the raw text of a single post
read_blog_files <- function(f.in) {
  my.front.matter <- rmarkdown::yaml_front_matter(f.in)

  df.out <- tibble(
    post.title = my.front.matter$title,
    post.date = lubridate::ymd(my.front.matter$date),
    post.month = as.Date(format(post.date, '%Y-%m-01')),
    tags = paste0(my.front.matter$tags, collapse = ';'),
    categories = paste0(my.front.matter$categories, collapse = ';'),
    content = paste0(read_lines(f.in), collapse = ' ')
  )

  return(df.out)
}

# Keep only real posts (drop editor lock files) and build the full paths
rmd.files <- post.files[grepl('\\.Rmd$', post.files)]
df.posts <- dplyr::bind_rows(purrr::map(paste0(my.blog.folder, rmd.files), read_blog_files))
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.3.3 ✔ purrr 0.3.4
## ✔ tibble 3.1.0 ✔ dplyr 1.0.4
## ✔ tidyr 1.1.2 ✔ stringr 1.4.0
## ✔ readr 1.4.0 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
glimpse(df.posts)
## Rows: 51
## Columns: 6
## $ post.title <chr> "Writing a R book and self-publishing it in Amazon", "Packa…
## $ post.date <date> 2017-02-16, 2017-12-06, 2017-12-13, 2017-12-30, 2018-01-22…
## $ post.month <date> 2017-02-01, 2017-12-01, 2017-12-01, 2017-12-01, 2018-01-01…
## $ tags <chr> "R;book;self-publish", "R;GetDFPData;corporate events;finan…
## $ categories <chr> "R;book;self-publish", "R;GetDFPData;B3", "R;shiny;webserve…
## $ content <chr> "--- title: \"Writing a R book and self-publishing it in Am…
First, we’ll look at the frequency of posts over time.
df.posts.2018 <- df.posts %>%
  filter(post.date > as.Date('2018-01-01'))

# geom_bar() counts the number of posts in each month
p <- ggplot(df.posts.2018, aes(x = post.month)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(y = 'Number of posts', x = '')

print(p)
It seems I average about one post a month. The blank spaces in the plot show that there were a couple of months in which I did not write.
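The monthly average can also be checked numerically instead of read off the plot. The sketch below is only an illustration: it uses a handful of post dates taken from the file listing above (not the full data), and the object names are made up for the example. The important step is building the full sequence of months, so that months without posts count as zero.

```r
# A few post dates taken from the file listing above (not the full set)
post.dates <- as.Date(c('2018-01-22', '2018-03-16', '2018-04-22',
                        '2018-05-12', '2018-06-12', '2018-06-29'))

# First day of the month of each post
post.months <- as.Date(format(post.dates, '%Y-%m-01'))

# Every month in the range, so months without posts are counted as zero
all.months <- seq(min(post.months), max(post.months), by = 'month')
posts.per.month <- table(factor(format(post.months, '%Y-%m'),
                                levels = format(all.months, '%Y-%m')))

posts.per.month
mean(as.vector(posts.per.month))
```

Without the explicit factor levels, empty months would be silently dropped and the average would be biased upwards.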
Checking 2018’s plans
At the end of 2017 my plans for 2018 were:
Work on the second edition of the Portuguese book.
Done! I’m glad to report that the second edition of the book was published in June 2018. It was great to review the book and add several new chapters and sections. As I mentioned in the publication post, this is the largest and longest project I have ever worked on and it is very satisfying to see it develop over time. Even more satisfying is receiving positive feedback from readers who are using the book to learn to code in R! Many teachers in Economics and Business are also starting to use it in the classroom.
The book will continue to be updated every couple of years. One of the greatest things about R, among many others, is that the language is continually evolving and changing. I have no doubt that there will always be new material to write about.
Start a portal for financial data in Brazil
Unfortunately, this project did not launch. I wrote a couple of R scripts for fetching and saving data automatically on my server, but it never became a webpage. I started to work on other projects and the website was not a priority.
Plans for 2019
New edition of “Processing and Analyzing Financial Data with R”
The international version of my book pafdR was published in January 2017. I feel it’s time to update it with the new chapters and structure from the second edition in Portuguese. There are many improvements to the book, with an emphasis on the tidyverse.
Work on my new book: “Investing For the Long Term” (provisory title)
There is a huge deficit of financial knowledge in Brazil, especially in saving and investing. I’ve been a long-term investor for most of my career as an academic and I feel there is a lot I can contribute to the topic of financial education by bringing data science into the problem of investing.
The book will be an introduction to investments for the common person in Brazil, with a heavy data-based approach. It will not be about trading strategies or anything related to short-term trading. The idea is to bring data analysis to the common long-term investor, showing how the financial market works and how one can build passive income by constantly buying good financial contracts.
I have no clue if it will be published in 2019. Unlike my previous book, I’m taking my time to write this one. No rush and no deadlines :).
Solidify my research agenda in Board Composition
As I mentioned before, my research agenda has shifted from capital markets to board composition. This is a very interesting topic with many implications for listed companies. I’m learning a lot from researching these topics.
Currently, I have four initiatives with different co-authors:
- Gender and board composition
- Politics and board composition
- Professors in the Board of Companies
- Board description of Brazilian Companies
Hopefully, these will be published in 2019 or 2020.