Accessing Financial Reports and Corporate Events Data using GetDFPData

Marcelo S. Perlin

IAG/PUC-RJ, Rio de Janeiro, May 25, 2018

Who am I?

Teacher and researcher (PPGA/UFRGS, Porto Alegre-RS)

Loves data analysis, programming, and tennis

Homepage and blog: https://msperlin.github.io/

Author of Processing and Analyzing Financial Data with R - Online (free) version here

Introduction

GetDFPData provides an open R interface to all financial statements and corporate events distributed by B3 and CVM

Why should we care?
- Academic research
- Retail investors and journalists
- The original corporate data from B3 is unstructured and commercial software is expensive
How does GetDFPData improves existing solutions?
- Its free and available to everyone with a CL (command line) or shiny (web) interface
- made by researchers for researchers: tidy (tabular) datasets that play nice with the tidyverse
- Local caching system improves access speed (it never grabs the same data from B3 twice)
- It offers many datasets not found in commercial software (family relations, debt structure, auditing report and more)

Motivation

A look into past local publications (Qualis A2, B1):

Papers by Journal
Journal	Surveyed	Total	Percent
Contabilidade Vista & Revista	19	80	0.24
Revista Brasileira de Finanças	9	85	0.11
Revista Contabilidade & Finanças	34	118	0.29
Revista Contabilidade e Organizações	22	82	0.27
Revista Contemporânea de Contabilidade	21	112	0.19

What we learn?
- Corporate finance studies ammount to 25% of the literature in Finance in Brazil
- 73% of published papers used Economatica! 30% used BMF and CVM systems.
- An average study used 179 companies within a 7 year period!
  - As an informed guess, this ammounts to 1253 copy and paste operations from B3’s website, per table!

Using `GetDFPData`

The Web Interface

A web version was developed and published in the internet as a shiny application.

http://www.msperlin.com/shiny/GetDFPData/

The CL (command line) R interface

Installation of `GetDFPData`

Install R
Install RStudio
Install package:

# Release version in CRAN
install.packages('GetDFPData')

Using `GetDFPData`

The starting point of GetDFPData is looking for information about current companies.

library(GetDFPData)
library(dplyr)

df.info <- gdfpd.get.info.companies(type.data = 'companies')

## Found cache file. Loading data..

glimpse(df.info)

## Observations: 694
## Variables: 16
## $ name.company         <chr> "521 PARTICIPAÇOES S.A. - EM LIQUIDAÇÃO E...
## $ id.company           <int> 16330, 16284, 108, 20940, 21725, 19313, 1...
## $ cnpj                 <dbl> 1.547749e+12, 1.851771e+12, 6.066481e+13,...
## $ date.registration    <date> 1997-07-11, 1997-05-30, 1969-05-16, 2007...
## $ date.constitution    <date> 1996-07-30, 1997-04-02, 1944-08-18, 1976...
## $ city                 <chr> "RIO DE JANEIRO", "RIO DE JANEIRO", "PORT...
## $ estate               <chr> "RJ", "RJ", "RS", "SP", "SP", "SP", "SP",...
## $ situation            <chr> "ATIVO", "ATIVO", "CANCELADA", "CANCELADA...
## $ situation.operations <chr> "LIQUIDAÇÃO EXTRAJUDICIAL", "FASE OPERACI...
## $ listing.segment      <chr> NA, "Tradicional", "Tradicional", "Tradic...
## $ main.sector          <chr> NA, "Financeiro e Outros", "Materiais Bás...
## $ sub.sector           <chr> NA, "Outros", "Siderurgia e Metalurgia", ...
## $ segment              <chr> NA, "Outros", "Siderurgia", "Açucar e Alc...
## $ tickers              <chr> NA, "QVQP3B", NA, NA, "ADHM3", "AELP3", "...
## $ first.date           <date> 1998-12-31, 2001-12-31, 2009-12-31, 2009...
## $ last.date            <date> 2017-12-31, 2017-12-31, 2009-12-31, 2009...

Finding names of companies

Every company in the database can be identified by its official name. Function gdfpd.search.company allow the user to search for the official name of a company.

Let’s find the official name of Ambev, one of the largest companies in Brazil:

my.names <- gdfpd.search.company('ambev')

## Found cache file. Loading data..
## 
## Found 2 companies:
## COMPANHIA DE BEBIDAS DAS AMÉRICAS-AMBEV | situation = CANCELADA | first date = 2010-12-31 | last date - 2012-12-31
## AMBEV S.A.                              | situation = ATIVO | first date = 2012-12-31 | last date - 2017-12-31

Its official name in Bovespa records is AMBEV S.A..

Downloading financial information for one company

name.companies <- 'AMBEV S.A.'
first.date <- '2012-12-31'
last.date  <- '2016-12-31'

df.reports <- gdfpd.GetDFPData(name.companies = name.companies, 
                               first.date = first.date,
                               last.date = last.date)

## Found cache file. Loading data..
## Fetching new files from Bovespa.
## 
## Downloading data for 1 companies
## First Date: 2012-12-31
## Laste Date: 2016-12-31
## Inflation index: dollar
## 
## Downloading inflation data
##  Caching inflation RDATA into tempdir()  Done
## 
## Inputs looking good! Starting download of files:
## 
## AMBEV S.A.
##  Available periods: 2016-12-31   2015-12-31  2014-12-31  2013-12-31  2012-12-31
## 
## 
## Processing 23264 - AMBEV S.A.
##  Finding info from Bovespa
##      Found BOV cache file
##  Processing 23264 - AMBEV S.A. | date 2016-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2015-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2014-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2013-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2012-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file

The Result

The resulting object from gdfpd.GetDFPData is a tibble, a data.frame type of object that allows for list columns (Müller and Wickham 2017):

glimpse(df.reports)

## Observations: 1
## Variables: 40
## $ company.name                     <chr> "AMBEV S.A."
## $ company.code                     <int> 23264
## $ cnpj                             <chr> "07526557000100"
## $ date.company.constitution        <date> 2005-07-08
## $ date.cvm.registration            <date> 2013-10-30
## $ company.tickers                  <chr> "ABEV3"
## $ min.date                         <date> 2012-12-31
## $ max.date                         <date> 2016-12-31
## $ n.periods                        <int> 5
## $ company.segment                  <chr> "Tradicional"
## $ current.stockholders             <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ current.stock.composition        <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.assets                        <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.liabilities                   <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.income                        <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.cashflow                      <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.assets.consolidated           <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.liabilities.consolidated      <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.income.consolidated           <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.cashflow.consolidated         <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.auditing.report               <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.dividends                <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.stockholders             <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.capital.issues           <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.mkt.value                <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.capital.increases        <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.capital.reductions       <list> [<>]
## $ history.stock.repurchases        <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.other.stock.events       <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.compensation             <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.compensation.summary     <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.transactions.related     <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.debt.composition         <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.governance.listings      <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.board.composition        <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.committee.composition    <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.family.relations         <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.family.related.companies <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.auditing                 <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.responsible.docs         <list> [<c("AMBEV S.A.", "AMBEV S.A...

All financial statements for the different quarters are available within df.reports. For example, the income statements for all desired quarters of AMBEV are:

df.income <- df.reports$fr.income[[1]]

glimpse(df.income)

## Observations: 130
## Variables: 6
## $ name.company       <chr> "AMBEV S.A.", "AMBEV S.A.", "AMBEV S.A.", "...
## $ ref.date           <date> 2016-12-31, 2016-12-31, 2016-12-31, 2016-1...
## $ acc.number         <chr> "3.01", "3.02", "3.03", "3.04", "3.04.01", ...
## $ acc.desc           <chr> "Receita de Venda de Bens e/ou Serviços", "...
## $ acc.value          <dbl> 20634193, -11263620, 9370573, 4006806, -399...
## $ acc.value.infl.adj <dbl> 6332420.75, -3456688.66, 2875732.09, 122964...

Downloading financial information for several companies

GetDFPData is specially designed for handling large scale download of data. Let’s build a case with three randomly selected active companies:

my.companies <- c("COMPANHIA CELG DE PARTICIPAÇÕES", "DOHLER S.A.", "KLABIN S.A.")

first.date <- '2012-01-31'
last.date  <- '2017-01-31'

df.reports <- gdfpd.GetDFPData(name.companies = my.companies, 
                               first.date = first.date,
                               last.date = last.date)

And now we can check the resulting tibble:

glimpse(df.reports )

## Observations: 3
## Variables: 40
## $ company.name                     <chr> "COMPANHIA CELG DE PARTICIPAÇ...
## $ company.code                     <int> 21393, 5207, 12653
## $ cnpj                             <chr> "08560444000193", "8468340800...
## $ date.company.constitution        <date> 2006-12-04, 1755-12-09, 1978...
## $ date.cvm.registration            <date> 2008-03-14, 1973-12-26, 1997...
## $ company.tickers                  <chr> "GPAR3", "DOHL3;DOHL4", "KLBN...
## $ min.date                         <date> 2012-12-31, 2012-12-31, 2012...
## $ max.date                         <date> 2016-12-31, 2016-12-31, 2016...
## $ n.periods                        <int> 5, 5, 5
## $ company.segment                  <chr> "Tradicional", "Tradicional",...
## $ current.stockholders             <list> [<c("COMPANHIA CELG DE PARTI...
## $ current.stock.composition        <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.assets                        <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.liabilities                   <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.income                        <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.cashflow                      <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.assets.consolidated           <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.liabilities.consolidated      <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.income.consolidated           <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.cashflow.consolidated         <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.auditing.report               <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.dividends                <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.stockholders             <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.capital.issues           <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.mkt.value                <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.capital.increases        <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.capital.reductions       <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.stock.repurchases        <list> [<>, <>, <c("KLABIN S.A.", "...
## $ history.other.stock.events       <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.compensation             <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.compensation.summary     <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.transactions.related     <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.debt.composition         <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.governance.listings      <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.board.composition        <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.committee.composition    <list> [<COMPANHIA CELG DE PARTICIP...
## $ history.family.relations         <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.family.related.companies <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.auditing                 <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.responsible.docs         <list> [<c("COMPANHIA CELG DE PARTI...

Integration with `tidyverse`

library(tidyverse)

# set assets tbl
tab.assets <- bind_rows(df.reports$fr.assets) %>%
  filter(acc.desc == 'Ativo Total') %>%
  group_by(name.company, ref.date) %>%
  summarise(Total.Assets = acc.value)

# plot it
p <- ggplot(tab.assets, aes(x = ref.date, y=Total.Assets)) +
  geom_col() + facet_wrap(~name.company, scales = 'free')
print(p)

Exporting financial data

The package also includes function gdfpd.export.DFP.data for exporting the financial data in the Excel or csv format.

Next we present an example of saving the output as an Excel file.

my.basename <- 'MyExcelData'
my.type.export <- 'xlsx' # only supported so far

gdfpd.export.DFP.data(df.reports = df.reports, 
                      base.file.name = my.basename,
                      type.export = my.type.export)

## File  MyExcelData.xlsx  already exists. Deleting it..
## Start Copying Dataframes to MyExcelData.xlsx
##  Copying table metadata - 3 rows, 5 columns
##  Copying table metadata - 3 rows, 5 columns
##  Copying table current.stockholders - 28 rows, 6 columns
##  Copying table current.stock.composition - 9 rows, 4 columns
##  Copying table fr.assets - 240 rows, 6 columns
##  Copying table fr.liabilities - 375 rows, 6 columns
##  Copying table fr.income - 400 rows, 6 columns
##  Copying table fr.cashflow - 316 rows, 6 columns
##  Copying table fr.assets.consolidated - 225 rows, 6 columns
##  Copying table fr.liabilities.consolidated - 390 rows, 6 columns
##  Copying table fr.income.consolidated - 504 rows, 6 columns
##  Copying table fr.cashflow.consolidated - 350 rows, 6 columns
##  Copying table fr.auditing.report - 15 rows, 3 columns
##  Copying table history.dividends - 149 rows, 12 columns
##  Copying table history.stockholders - 131 rows, 14 columns
##  Copying table history.capital.issues - 30 rows, 5 columns
##  Copying table history.mkt.value - 15 rows, 5 columns
##  Copying table history.capital.increases - 72 rows, 8 columns
##  Copying table history.capital.reductions - 6 rows, 12 columns
##  Copying table history.stock.repurchases - 16 rows, 12 columns
##  Copying table history.other.stock.events - 17 rows, 8 columns
##  Copying table history.compensation - 45 rows, 18 columns
##  Copying table history.compensation.summary - 45 rows, 9 columns
##  Copying table history.transactions.related - 48 rows, 13 columns
##  Copying table history.debt.composition - 27 rows, 9 columns
##  Copying table history.governance.listings - 15 rows, 5 columns
##  Copying table history.board.composition - 365 rows, 18 columns
##  Copying table history.committee.composition - 10 rows, 18 columns
##  Copying table history.family.relations - 83 rows, 10 columns
##  Copying table history.family.related.companies - 419 rows, 11 columns
##  Copying table history.auditing - 21 rows, 10 columns
##  Copying table history.responsible.docs - 25 rows, 5 columns
## 
## Export sucessful. Data available at file MyExcelData.xlsx

Example of usage: Family companies and financial performance

To illustrate the use of GetDFPData we select a topic that is relatively simple and current: the relationship between family companies and financial performance.

Methodology

In order to test if a founding-family company has a better/worst performance, we analyze a large sample of Boards of Directors in Brazil. To do this, we estimate the following (fixed effects panel data) model:

Where the main variables of interest are:

- Return on assets (ROA) and return on equity (ROE) for company i, year t.

- percentage of participation of family members in boards for company i, year t.

And the controls variables are:

- log of total assets, company i, year t.

- log of companies age, company i, year t.

- standard deviation of daily returns of most liquid stock, calculated by year and by company.

- ratio of total debt to total assets.

Example of usage: The Code

Our sample selection procedure follows the aforementioned studies. First, we’ll filter the data for only companies that have tickers, that is, are tradeable in the equity market, and those not belonging in the financial and utility sectors. This is a common filter in corporate finance studies as these sectors have specific dynamics related to their performance measured by ROE and ROA.

library(GetDFPData)
library(dplyr)
library(stringr)

# get info table
df.info <- gdfpd.get.info.companies(type.data = 'companies')

## Found cache file. Loading data..

# filter companies
df.info <- df.info %>%
  filter(!is.na(tickers),
         !(main.sector %in% c('Financeiro e Outros', 'Utilidade Pública')))

We find 242 companies with this simple filter. Now, let’s download data for the selected companies. Be aware that this is a lengthy download and may take a significant amount of time.

# set options
first.date <- '2012-01-31'
last.date  <- '2018-01-01'
my.companies <- df.info$name.company

# get data
df.reports <- gdfpd.GetDFPData(name.companies = my.companies,
                               first.date = first.date,
                               last.date = last.date)

The next step is the download of stock data with package BatchGetSymbols and the calculation of our volatility measure, . Before downloading the data, we need to find all tickers of all companies. In the case of more than one stock for a company, we keep the asset with most financial volume.

library(BatchGetSymbols)
library(lubridate)

get.ticker.df <- function(df.in) {
  # Gets ticker string and organizes it in another data_frame

  temp.split <- str_split(df.in$tickers, ';')[[1]]

  temp.df <- data_frame(name.company = df.in$name.company,
                        ticker = temp.split)
}

# get dataframe with tickers
my.l <- by(data = df.info,
           INDICES = df.info$name.company,
           FUN = get.ticker.df)

# bind into one dataframe
df.tickers <- do.call(bind_rows, my.l)

# add .SA for yahoo finance
my.tickers <- paste0(df.tickers$ticker, '.SA')

# get data
df.stocks <- BatchGetSymbols(tickers = my.tickers,
                             first.date = '2010-01-01',
                             last.date = '2017-01-01',
                             bench.ticker = '^BVSP',
                             thresh.bad.data = 0.5)[[2]]

# replace .SA (keep same notation)
df.stocks$ticker <- str_replace_all(df.stocks$ticker,
                                    fixed('.SA'), '')

# join with ticker df
df.stocks <- inner_join(df.tickers, df.stocks)

# find selected tickers by volume
tab.volume <- df.stocks %>%
  group_by(name.company, ticker) %>%
  summarise(mean.volume = mean(volume, na.rm = TRUE)) %>%
  group_by(name.company) %>%
  summarise(selected.ticker = ticker[which.max(mean.volume)])

df.stocks <- df.stocks[df.stocks$ticker %in% tab.volume$selected.ticker, ]

# calculate volatility measure
df.volat <- df.stocks %>%
  mutate(ref.date = as.Date(paste0(year(ref.date),'-12-31'))) %>%
  group_by(name.company, ref.date) %>%
  summarise(risk = sd(ret.adjusted.prices, na.rm = TRUE))

The last step in the data acquisition stage is to bind all financial reports in a single dataframe.

get.col <- function(df.in, my.col) {
  # function to bind all rows (dataframes) of a tibble
  return(do.call(rbind, df.in[[my.col]]))
}

# financial reports
df.fr <- bind_rows(get.col(df.reports, 'fr.income'),
                   get.col(df.reports, 'fr.assets'),
                   get.col(df.reports, 'fr.cashflow'),
                   get.col(df.reports, 'fr.liabilities'))

# board composition and family participation
df.board <- get.col(df.reports, 'history.board.composition')
df.family <- get.col(df.reports, 'history.family.relations')

Now that all raw data is available, we can proceed to calculate our variables of interest. The next code will process df.family and df.board, which contains information about board composition and family members of companies, and output a reference table that shows, for each company and year, the percentage of board seats held by family members.

ref.tab <- df.family %>%
  group_by(name.company, ref.date) %>%
  summarise(persons = str_c(unique(person.name),
                            collapse = ' ; ')) %>%
  merge(df.board) %>%
  mutate(dummy = person.name %in% str_split(persons,
                                            fixed(' ; '),
                                            simplify = TRUE)) %>%
  group_by(name.company, ref.date) %>%
  summarise(family.idx = mean(dummy)) %>%
  glimpse()

## Observations: 943
## Variables: 3
## $ name.company <chr> "ADVANCED DIGITAL HEALTH MEDICINA PREVENTIVA S.A....
## $ ref.date     <date> 2014-12-31, 2009-12-31, 2010-12-31, 2011-12-31, ...
## $ family.idx   <dbl> 0.20000000, 0.16666667, 0.20000000, 0.19230769, 0...

Finally, we build our final dataframe containing all variables necessary for our panel and OLS model:

get.acc <- function(acc.value, search.vec, my.str) {
  # get account value from fr.dataframe

  out.value <- acc.value[search.vec == my.str]

  # test if it exists
  if (length(out.value) == 0) out.value <- NA

  return(out.value)

}

# build final table
model.table <- df.fr %>%
  group_by(name.company, ref.date) %>%
  summarise(profit.loss = get.acc(acc.value, acc.number, '3.09'),
            total.assets = get.acc(acc.value, acc.number, '1'),
            capital = get.acc(acc.value, acc.number,  '2.03'),
            total.sales = get.acc(acc.value, acc.number, '3.01'),
            EBIT = get.acc(acc.value, acc.number, '3.05'),
            deprec.amort = get.acc(acc.value, acc.number, '6.01.01.02'),
            EBITDA = EBIT + deprec.amort,
            short.term.debt = get.acc(acc.value, acc.number, '2.01.04'),
            long.term.debt = get.acc(acc.value, acc.number, '2.02.01'),
            total.debt = short.term.debt + long.term.debt,
            size = log(total.assets),
            leverage = total.debt/total.assets,
            ROA = EBITDA/total.assets,
            ROE = profit.loss/capital) %>%
  inner_join(df.volat) %>%
  inner_join(ref.tab) %>%
  inner_join(df.info) %>%
  filter(total.assets > 0,
         total.sales > 0,
         capital > 0)  %>%
  inner_join(select(df.reports, company.name,
                    date.company.constitution,
                    date.cvm.registration),
             by = c('name.company' = 'company.name' )) %>%
  mutate(age = log(as.numeric(ref.date - date.company.constitution)/365)) %>%
  select(name.company, ref.date, ROA,
         ROE, size, age, risk, leverage, family.idx, main.sector) %>%
  glimpse()

## Observations: 506
## Variables: 10
## $ name.company <chr> "ALPARGATAS SA", "ALPARGATAS SA", "ALPARGATAS SA"...
## $ ref.date     <date> 2010-12-31, 2011-12-31, 2012-12-31, 2013-12-31, ...
## $ ROA          <dbl> 0.17146869, 0.15552614, 0.13210314, 0.11059302, 0...
## $ ROE          <dbl> 0.23129018, 0.20804121, 0.16911026, 0.16899388, 0...
## $ size         <dbl> 14.47189, 14.52877, 14.66921, 14.89599, 14.92580,...
## $ age          <dbl> 4.642624, 4.652211, 4.661732, 4.671137, 4.680456,...
## $ risk         <dbl> 0.01515056, 0.01973376, 0.01816129, 0.01482123, 0...
## $ leverage     <dbl> 0.114016614, 0.059653695, 0.057139840, 0.13141140...
## $ family.idx   <dbl> 0.20000000, 0.19230769, 0.19230769, 0.19230769, 0...
## $ main.sector  <chr> "Consumo Cíclico", "Consumo Cíclico", "Consumo Cí...

As a final step, we remove outliers from the data using a simple winsorization procedure with a 2.5% cut-off in both sides of the distribution:

winsorize <- function(x) {
  lim <- quantile(x, probs = c(0.025,0.975), na.rm = T)
  x[x<lim[1]] <- lim[1]
  x[x>lim[2]] <- lim[2]
  return(x)
}

model.table[,c('ROA','ROE','risk')] <- sapply(model.table[,c('ROA','ROE','risk')],
                                              winsorize)

A summary of the resulting dataset is given next:

summary(model.table)

##  name.company          ref.date               ROA          
##  Length:506         Min.   :2010-12-31   Min.   :-0.05480  
##  Class :character   1st Qu.:2011-12-31   1st Qu.: 0.04494  
##  Mode  :character   Median :2013-12-31   Median : 0.08376  
##                     Mean   :2013-11-23   Mean   : 0.08911  
##                     3rd Qu.:2015-12-31   3rd Qu.: 0.13820  
##                     Max.   :2016-12-31   Max.   : 0.23153  
##                                          NA's   :23        
##       ROE                size             age              risk        
##  Min.   :-0.74808   Min.   : 9.639   Min.   :0.3384   Min.   :0.01086  
##  1st Qu.: 0.01808   1st Qu.:13.491   1st Qu.:3.4971   1st Qu.:0.01985  
##  Median : 0.08990   Median :14.472   Median :3.9772   Median :0.02442  
##  Mean   : 0.06248   Mean   :14.535   Mean   :3.9126   Mean   :0.03129  
##  3rd Qu.: 0.16937   3rd Qu.:15.359   3rd Qu.:4.2611   3rd Qu.:0.03234  
##  Max.   : 0.43936   Max.   :20.276   Max.   :7.6095   Max.   :0.12451  
##                                                       NA's   :2        
##     leverage        family.idx      main.sector       
##  Min.   :0.0000   Min.   :0.00000   Length:506        
##  1st Qu.:0.1140   1st Qu.:0.07743   Class :character  
##  Median :0.2615   Median :0.12500   Mode  :character  
##  Mean   :0.2566   Mean   :0.15648                     
##  3rd Qu.:0.3736   3rd Qu.:0.18182                     
##  Max.   :0.7893   Max.   :0.71429                     
##

Model estimation and results

With the required dataset cleaned and structured, we proceed with the estimation of the models. First we set the model specification as a formula object:

control.vars <- '+ size + age + risk + leverage'
my.index <- c('name.company', 'ref.date')

my.formula.ROA.plm <- formula(paste0('ROA ~ family.idx + I(family.idx^2) ',
                                 control.vars,
                                 ' + factor(year(ref.date))'))

my.formula.ROE.plm <- formula(paste0('ROE ~ family.idx + I(family.idx^2) ',
                                 control.vars,
                                 ' + factor(year(ref.date))'))

my.formula.ROE.lm <- formula(paste0('ROE ~ family.idx + I(family.idx^2) ',
                                 control.vars,
                                 ' + factor(year(ref.date))',
                                 ' + factor(main.sector)') )

my.formula.ROA.lm <- formula(paste0('ROA ~ family.idx + I(family.idx^2) ',
                                 control.vars,
                                 ' + factor(year(ref.date))',
                                 ' + factor(main.sector)') )

We have four models, two panel data and two ordinary least squares. We can now estimate them using plm and lm:

library(plm)

my.plm.ROA <- plm(data = model.table,
                  formula = my.formula.ROA.plm,
                  index = my.index,
                  model = 'within')

my.lm.ROA <- lm(data = model.table,
                formula = my.formula.ROA.lm)

my.plm.ROE <- plm(data = model.table,
                  formula = my.formula.ROE.plm,
                  index = my.index,
                  model = 'within')

my.lm.ROE <- lm(data = model.table,
                formula = my.formula.ROE.lm)

The Results

library(stargazer)

stargazer(my.plm.ROA, my.lm.ROA, my.plm.ROE, my.lm.ROE,
          type = 'html',
          title = 'Estimation Results',
          model.names = F,
          column.labels = c('FE', 'OLS', 'FE', 'OLS'),
          style = 'aer',
          omit = 'factor*',
          omit.stat = c('f', 'ser'),
          add.lines = list(c('Year Dummies', 'Yes', 'Yes',
                             'Yes', 'Yes'),
                           c('Sector Dummies', 'No', 'Yes',
                             'No', 'Yes')))

**Estimation Results**

	ROA		ROE
	FE	OLS	FE	OLS
	(1)	(2)	(3)	(4)

family.idx	-0.010	-0.027	-0.356	0.374^*
	(0.106)	(0.079)	(0.346)	(0.195)

I(family.idx2)	0.151	0.063	0.116	-0.867^***
	(0.219)	(0.145)	(0.650)	(0.327)

size	0.062^***	-0.002	0.003	0.006
	(0.012)	(0.002)	(0.017)	(0.005)

age	0.040	0.000	0.162^*	-0.003
	(0.026)	(0.003)	(0.090)	(0.008)

risk	-0.242^**	-0.907^***	-0.957^**	-3.121^***
	(0.122)	(0.146)	(0.437)	(0.399)

leverage	-0.079^**	-0.004	-0.089	-0.187^***
	(0.031)	(0.018)	(0.101)	(0.049)

Constant		0.192^***		0.239^***
		(0.032)		(0.088)

Year Dummies	Yes	Yes	Yes	Yes
Sector Dummies	No	Yes	No	Yes
Observations	481	481	504	504
R²	0.290	0.183	0.178	0.266
Adjusted R²	0.108	0.151	-0.029	0.238

Notes:	^***Significant at the 1 percent level.
	^**Significant at the 5 percent level.
	^*Significant at the 10 percent level.

Conclusions

GetDFPData has a clear impact, facilitating access to a large dataset from B3
Researchers will benefit with more time for research and less time for data acquisition and cleaning
Retails investors can use the package for their own financial analysis
The package is in constant development
More details about the package in its academic paper

Thank you!! Questions?

References

Müller, Kirill, and Hadley Wickham. 2017. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.

Accessing Financial Reports and Corporate Events Data using GetDFPData

Who am I?

Introduction

Motivation

Using GetDFPData

The Web Interface

The CL (command line) R interface

Installation of GetDFPData

Using GetDFPData

Finding names of companies

Downloading financial information for one company

The Result

Downloading financial information for several companies

Integration with tidyverse

Exporting financial data

Example of usage: Family companies and financial performance

Methodology

Example of usage: The Code

Model estimation and results

The Results

Conclusions

Thank you!! Questions?

References

Using `GetDFPData`

Installation of `GetDFPData`

Using `GetDFPData`

Integration with `tidyverse`