Accessing Financial Reports and Corporate Events Data using GetDFPData

Marcelo Perlin, Guilherme Kirch, Daniel Vancin

Unisinos/Porto Alegre - January 19, 2018

Introduction

Financial statements are at the heart of corporate finance studies

Commercial data vendors offers paid solutions, but with some restrictions for large scale access

B3 provides public access to its datasets in three different systems: DFP, ITR, FRE and FCA, but with several limitation for large scale access

GetDFPData provides an open R interface to all financial statements distributed by B3 and CVM

Motivation

A look into past publications in Brazil:

Papers by Journal
Journal	Surveyed	Total	Percent
Contabilidade Vista & Revista	19	80	0.24
Revista Brasileira de Finanças	9	85	0.11
Revista Contabilidade & Finanças	34	118	0.29
Revista Contabilidade e Organizações	22	82	0.27
Revista Contemporânea de Contabilidade	21	112	0.19

What we learn?
- Corporate finance studies ammount to 25% of the literature in Finance in Brazil
- 73% of published papers used Economatica! 30% used BMF and CVM systems.
- An average study used 179 companies within a 7 year period!
  - As an informed guess, this ammounts to 1253 operations copy and paste operations!

Features of `GetDFPData`

Made by researchers, to researchers

Provides a reference table with up to date information about companies:
- available reports, sectors, governance listing segments, stock tickers
Cleans and adjusts the data
- Different accounting formats are merged using matching tools
- Errors in the data are fixed, when possible, or discarted
Always tabular format (tidy data principles (Wickham 2014)) and always the correct class in R
R and web/shiny interface
Open and free!

Using `GetDFPData`

The Web Interface

A web version was developed and published in the internet as a shiny application.

http://www.msperlin.com/shiny/GetDFPData/

The R interface

Installation of `GetDFPData`

Install R
Install RStudio
Install package

# Release version in CRAN
install.packages('GetDFPData')

Using `GetDFPData`

The starting point of GetDFPData is looking for information about current companies.

library(GetDFPData)
library(tibble)

df.info <- gdfpd.get.info.companies(type.data = 'companies')

## Found cache file. Loading data..

glimpse(df.info)

## Observations: 691
## Variables: 8
## $ name.company    <chr> "521 PARTICIPAÇOES S.A. - EM LIQUIDAÇÃO EXTRAJ...
## $ id.company      <int> 16330, 16284, 108, 20940, 21725, 19313, 18970,...
## $ situation       <chr> "ATIVO", "ATIVO", "CANCELADA", "CANCELADA", "A...
## $ listing.segment <chr> NA, "Tradicional", "Tradicional", "Tradicional...
## $ main.sector     <chr> NA, "Financeiro e Outros", "Materiais Básicos"...
## $ tickers         <chr> NA, "QVQP3B", NA, NA, "ADHM3", "AELP3", "TIET1...
## $ first.date      <date> 1998-12-31, 2001-12-31, 2009-12-31, 2009-12-3...
## $ last.date       <date> 2016-12-31, 2016-12-31, 2009-12-31, 2009-12-3...

The current number of active and inactive companies, as of 2018-01-19, is available in column situation. From it, we can see that there are 519 active companies and 167 cancelled.

We can also look at the distribution of sectors of traded companies at B3:

t.sector <- table(df.info$main.sector)
print(as.data.frame(sort(t.sector, decreasing = TRUE)))

##                               Var1 Freq
## 1              Financeiro e Outros  145
## 2                  Consumo Cíclico   92
## 3                 Bens Industriais   80
## 4                Utilidade Pública   74
## 5                Materiais Básicos   47
## 6              Consumo não Cíclico   35
## 7                Não Classificados   28
## 8                            Saúde   18
## 9          Construção e Transporte   14
## 10 Petróleo. Gás e Biocombustíveis   13
## 11        Tecnologia da Informação   10
## 12                Telecomunicações    8
## 13 Petróleo, Gás e Biocombustíveis    2

The distribution of current governance listings is also available:

t.listings <- table(df.info$listing.segment)
print(as.data.frame(sort(t.listings, decreasing = TRUE)))

##                             Var1 Freq
## 1                    Tradicional  359
## 2                   Novo Mercado  143
## 3 Corporate Governance - Level 1   27
## 4 Corporate Governance - Level 2   21
## 5                   Bovespa Mais   14
## 6         Bovespa Mais - Level 2    2

Tickers of companies are also available in column tickers. As an example, we can check all currently traded stocks of Gerdau:

tickers.Gerdau <- df.info$tickers[which(df.info$name.company == 'GERDAU S.A.')]
print(tickers.Gerdau)

## [1] "GGBR3;GGBR4"

Finding names of companies

Every company in the database can be identified by its official name. Function gitrd.search.company allow the user to search for the official name of a company.

Let’s find the official name of Ambev, one of the largest companies in Brazil:

my.names <- gdfpd.search.company('ambev')

## Found cache file. Loading data..
## 
## Found 2 companies:
## COMPANHIA DE BEBIDAS DAS AMÉRICAS-AMBEV | situation = CANCELADA | first date = 2010-12-31 | last date - 2012-12-31
## AMBEV S.A.                              | situation = ATIVO | first date = 2012-12-31 | last date - 2016-12-31

Its official name in Bovespa records is AMBEV S.A..

Downloading financial information for one company

name.companies <- 'AMBEV S.A.'
first.date <- '2012-12-31'
last.date  <- '2016-12-31'

df.reports <- gdfpd.GetDFPData(name.companies = name.companies, 
                               first.date = first.date,
                               last.date = last.date)

## Found cache file. Loading data..
## 
## Downloading data for 1 companies
## First Date: 2012-12-31
## Laste Date: 2016-12-31
## Inflation index: dollar
## 
## Downloading inflation data
##  Caching inflation RDATA into tempdir()  Done
## 
## Inputs looking good! Starting download of files:
## 
## AMBEV S.A.
##  Available periods: 2016-12-31   2015-12-31  2014-12-31  2013-12-31  2012-12-31
## 
## 
## Processing 23264 - AMBEV S.A.
##  Finding info from Bovespa
##      Found cache file DFP Cache Folder/23264_AMBEV S./GetDFPData_BOV_cache_23264_AMBE.rds
##  Processing 23264 - AMBEV S.A. | date 2016-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2015-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2014-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2013-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file
##  Processing 23264 - AMBEV S.A. | date 2012-12-31
##      Acessing DFP data | Found DFP cache file
##      Acessing FRE data | Found FRE cache file
##      Acessing FCA data | Found FCA cache file

The resulting object from gdfpd.GetDFPData is a tibble, a data.frame type of object that allows for list columns (Müller and Wickham 2017):

glimpse(df.reports)

## Observations: 1
## Variables: 33
## $ company.name                  <chr> "AMBEV S.A."
## $ company.code                  <int> 23264
## $ company.tickers               <chr> "ABEV3"
## $ min.date                      <date> 2012-12-31
## $ max.date                      <date> 2016-12-31
## $ n.periods                     <int> 5
## $ company.segment               <chr> "Tradicional"
## $ current.stockholders          <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ current.stock.composition     <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.assets                     <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.liabilities                <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.income                     <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.cashflow                   <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.assets.consolidated        <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.liabilities.consolidated   <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.income.consolidated        <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ fr.cashflow.consolidated      <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.dividends             <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.stockholders          <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.capital.issues        <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.mkt.value             <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.capital.increases     <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.capital.reductions    <list> [<>]
## $ history.stock.repurchases     <list> [<c("AMBEV S.A.", "AMBEV S.A.")...
## $ history.other.stock.events    <list> [<AMBEV S.A., 2015-12-31, 2013-...
## $ history.compensation          <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.compensation.summary  <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.transactions.related  <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.debt.composition      <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.governance.listings   <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.board.composition     <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.committee.composition <list> [<c("AMBEV S.A.", "AMBEV S.A.",...
## $ history.family.relations      <list> [<c("AMBEV S.A.", "AMBEV S.A.")...

All financial statements for the different quarters are available within df.reports. For example, the income statements for all desired quarters of AMBEV are:

df.income.long <- df.reports$fr.income[[1]]

glimpse(df.income.long)

## Observations: 130
## Variables: 6
## $ name.company       <chr> "AMBEV S.A.", "AMBEV S.A.", "AMBEV S.A.", "...
## $ ref.date           <date> 2016-12-31, 2016-12-31, 2016-12-31, 2016-1...
## $ acc.number         <chr> "3.01", "3.02", "3.03", "3.04", "3.04.01", ...
## $ acc.desc           <chr> "Receita de Venda de Bens e/ou Serviços", "...
## $ acc.value          <dbl> 20634193, -11263620, 9370573, 4006806, -399...
## $ acc.value.infl.adj <dbl> 6332420.75, -3456688.66, 2875732.09, 122964...

If desired, the wide format can be forced in the resulting financial reports with function gdfpd.convert.to.wide:

temp.df <- dplyr::filter(df.income.long, ref.date >= as.Date('2015-01-31'))
df.income.wide <- gdfpd.convert.to.wide(temp.df)

knitr::kable(df.income.wide)

acc.number	acc.desc	name.company	2015-12-31	2016-12-31
3.01	Receita de Venda de Bens e/ou Serviços	AMBEV S.A.	22106965	20634193
3.02	Custo dos Bens e/ou Serviços Vendidos	AMBEV S.A.	-11431627	-11263620
3.03	Resultado Bruto	AMBEV S.A.	10675338	9370573
3.04	Despesas/Receitas Operacionais	AMBEV S.A.	3371176	4006806
3.04.01	Despesas com Vendas	AMBEV S.A.	-3591038	-3999531
3.04.02	Despesas Gerais e Administrativas	AMBEV S.A.	-1277065	-1190964
3.04.03	Perdas pela Não Recuperabilidade de Ativos	AMBEV S.A.	0	0
3.04.04	Outras Receitas Operacionais	AMBEV S.A.	1974913	2631206
3.04.05	Outras Despesas Operacionais	AMBEV S.A.	-530970	-245490
3.04.06	Resultado de Equivalência Patrimonial	AMBEV S.A.	6795336	6811585
3.05	Resultado Antes do Resultado Financeiro e dos Tributos	AMBEV S.A.	14046514	13377379
3.06	Resultado Financeiro	AMBEV S.A.	-1304158	-2509282
3.06.01	Receitas Financeiras	AMBEV S.A.	3988762	2502321
3.06.02	Despesas Financeiras	AMBEV S.A.	-5292920	-5011603
3.07	Resultado Antes dos Tributos sobre o Lucro	AMBEV S.A.	12742356	10868097
3.08	Imposto de Renda e Contribuição Social sobre o Lucro	AMBEV S.A.	-318585	1678513
3.08.01	Corrente	AMBEV S.A.	1920185	2157328
3.08.02	Diferido	AMBEV S.A.	-2238770	-478815
3.09	Resultado Líquido das Operações Continuadas	AMBEV S.A.	12423771	12546610
3.10	Resultado Líquido de Operações Descontinuadas	AMBEV S.A.	0	0
3.10.01	Lucro/Prejuízo Líquido das Operações Descontinuadas	AMBEV S.A.	0	0
3.10.02	Ganhos/Perdas Líquidas sobre Ativos de Operações Descontinuadas	AMBEV S.A.	0	0
3.11	Lucro/Prejuízo do Período	AMBEV S.A.	12423771	12546610
3.99	Lucro por Ação - (Reais / Ação)	AMBEV S.A.	0	0
3.99.01	Lucro Básico por Ação	AMBEV S.A.	0	0
3.99.02	Lucro Diluído por Ação	AMBEV S.A.	0	0

Downloading financial information for several companies

GetDFPData is specially designed for handling large scale download of data. Let’s build a case with three randomly selected active companies:

my.companies <- c("COMPANHIA CELG DE PARTICIPAÇÕES", "DOHLER S.A.", "KLABIN S.A.")

first.date <- '2012-01-31'
last.date  <- '2017-01-31'

df.reports <- gdfpd.GetDFPData(name.companies = my.companies, 
                                  first.date = first.date,
                                  last.date = last.date)

And now we can check the resulting tibble:

glimpse(df.reports )

## Observations: 3
## Variables: 33
## $ company.name                  <chr> "KLABIN S.A.", "COMPANHIA CELG D...
## $ company.code                  <int> 12653, 21393, 5207
## $ company.tickers               <chr> "KLBN11;KLBN3;KLBN4", "GPAR3", "...
## $ min.date                      <date> 2012-12-31, 2012-12-31, 2012-12-31
## $ max.date                      <date> 2016-12-31, 2016-12-31, 2016-12-31
## $ n.periods                     <int> 5, 5, 5
## $ company.segment               <chr> "Corporate Governance - Level 2"...
## $ current.stockholders          <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ current.stock.composition     <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.assets                     <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.liabilities                <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.income                     <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.cashflow                   <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.assets.consolidated        <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.liabilities.consolidated   <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.income.consolidated        <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ fr.cashflow.consolidated      <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.dividends             <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.stockholders          <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.capital.issues        <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.mkt.value             <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.capital.increases     <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.capital.reductions    <list> [<>, <c("COMPANHIA CELG DE PART...
## $ history.stock.repurchases     <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.other.stock.events    <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.compensation          <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.compensation.summary  <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.transactions.related  <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.debt.composition      <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.governance.listings   <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.board.composition     <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.committee.composition <list> [<c("KLABIN S.A.", "KLABIN S.A....
## $ history.family.relations      <list> [<c("KLABIN S.A.", "KLABIN S.A....

A closer look at `gdfpd.GetDFPData`

Function gdfpd.GetDFPData provides many options to the user. Here we will describe and explain its features.

name.companies: Official names of companies to get financial reports (e.g. ‘ELETROPAULO METROPOLITANA EL.S.PAULO S.A’).
first.date: First date (YYYY-MM-DD) to download data. Character or Date. E.g. first.date = ‘2010-01-01’.
last.date: Last date (YYYY-MM-DD) to download data. Character or Date. E.g. last.date = ‘2017-01-01’.
inflation.index: Sets the inflation index to use for finding inflation adjusted values of all reports. Possible values are ‘dollar’ (default) or ‘IPCA’, the brazilian main inflation index. When using ‘IPCA’, the base date is set as the last date found in the DFP dataset.
max.levels: Sets the maximum number of levels of accounting items in financial reports such as assets and liabilities (default = 3). If set to Inf, will return all available accounting itens.
folder.out: Folder where to download and manipulate the zip files. Default = tempdir()
do.cache: Logical for using a cache system or not. Default = TRUE
cache.folder: Folder where to cache all processed information. Default = file.path(getwd(),‘DFP Cache Folder’)
max.dl.tries: Maximum number of attempts for dowloading files

Exporting financial data

The package also includes function gdfpd.export.DFP.data for exporting the financial data in the Excel or csv format.

Next we present an example of saving the output as an Excel file.

my.basename <- 'MyExcelData'
my.type.export <- 'xlsx' # only supported so far

gdfpd.export.DFP.data(df.reports = df.reports, 
                      base.file.name = my.basename,
                      type.export = my.type.export)

## File  MyExcelData.xlsx  already exists. Deleting it..
## Start Copying Dataframes to MyExcelData.xlsx
##  Copying table metadata - 3 rows, 5 columns
##  Copying table metadata - 3 rows, 5 columns
##  Copying table current.stock.composition - 9 rows, 4 columns
##  Copying table fr.assets - 240 rows, 6 columns
##  Copying table fr.liabilities - 375 rows, 6 columns
##  Copying table fr.income - 400 rows, 6 columns
##  Copying table fr.cashflow - 316 rows, 6 columns
##  Copying table fr.assets.consolidated - 225 rows, 6 columns
##  Copying table fr.liabilities.consolidated - 390 rows, 6 columns
##  Copying table fr.income.consolidated - 504 rows, 6 columns
##  Copying table fr.cashflow.consolidated - 350 rows, 6 columns
##  Copying table history.dividends - 146 rows, 12 columns
##  Copying table history.stockholders - 71 rows, 14 columns
##  Copying table history.capital.issues - 13 rows, 5 columns
##  Copying table history.mkt.value - 14 rows, 5 columns
##  Copying table history.capital.increases - 23 rows, 8 columns
##  Copying table history.capital.reductions - 3 rows, 12 columns
##  Copying table history.stock.repurchases - 8 rows, 12 columns
##  Copying table history.other.stock.events - 4 rows, 8 columns
##  Copying table history.compensation - 43 rows, 18 columns
##  Copying table history.compensation.summary - 43 rows, 9 columns
##  Copying table history.transactions.related - 48 rows, 13 columns
##  Copying table history.debt.composition - 27 rows, 9 columns
##  Copying table history.governance.listings - 15 rows, 5 columns
##  Copying table history.board.composition - 316 rows, 17 columns
##  Copying table history.committee.composition - 10 rows, 15 columns
##  Copying table history.family.relations - 18 rows, 8 columns
## 
## Export sucessful. Data available at file MyExcelData.xlsx

Usage Statistics of Shinny App

## Warning: Ignoring unknown parameters: binwidth, bins, pad

## Warning: Ignoring unknown parameters: binwidth, bins, pad

Conclusions

GetDFPData has a clear impact, facilitating access to a large dataset from B3
Researchers will benefit with more time for research and less time for data acquisition and cleaning
The software is in constant development

References

Müller, Kirill, and Hadley Wickham. 2017. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.

Wickham, Hadley. 2014. “Tidy Data.” The Journal of Statistical Software 59 (10). http://www.jstatsoft.org/v59/i10/.

Accessing Financial Reports and Corporate Events Data using GetDFPData

Introduction

Motivation

Features of GetDFPData

Using GetDFPData

The Web Interface

The R interface

Installation of GetDFPData

Using GetDFPData

Finding names of companies

Downloading financial information for one company

Downloading financial information for several companies

A closer look at gdfpd.GetDFPData

Exporting financial data

Usage Statistics of Shinny App

Conclusions

References

Features of `GetDFPData`

Using `GetDFPData`

Installation of `GetDFPData`

Using `GetDFPData`

A closer look at `gdfpd.GetDFPData`