Marcelo S. Perlin
IAG/PUC-RJ, Rio de Janeiro, May 25, 2018
Teacher and researcher (PPGA/UFRGS, Porto Alegre-RS)
Loves data analysis, programming, and tennis
Homepage and blog: https://msperlin.github.io/
Author of Processing and Analyzing Financial Data with R - Online (free) version here
GetDFPData
provides an open R interface to all financial statements and corporate events distributed by B3 and CVM
GetDFPData
improves existing solutions?
Journal | Surveyed | Total | Percent |
---|---|---|---|
Contabilidade Vista & Revista | 19 | 80 | 0.24 |
Revista Brasileira de Finanças | 9 | 85 | 0.11 |
Revista Contabilidade & Finanças | 34 | 118 | 0.29 |
Revista Contabilidade e Organizações | 22 | 82 | 0.27 |
Revista Contemporânea de Contabilidade | 21 | 112 | 0.19 |
GetDFPData
A web version was developed and published in the internet as a shiny application.
GetDFPData
# Release version in CRAN
install.packages('GetDFPData')
GetDFPData
The starting point of GetDFPData
is looking for information about current companies.
library(GetDFPData)
library(dplyr)
df.info <- gdfpd.get.info.companies(type.data = 'companies')
## Found cache file. Loading data..
glimpse(df.info)
## Observations: 694
## Variables: 16
## $ name.company <chr> "521 PARTICIPAÇOES S.A. - EM LIQUIDAÇÃO E...
## $ id.company <int> 16330, 16284, 108, 20940, 21725, 19313, 1...
## $ cnpj <dbl> 1.547749e+12, 1.851771e+12, 6.066481e+13,...
## $ date.registration <date> 1997-07-11, 1997-05-30, 1969-05-16, 2007...
## $ date.constitution <date> 1996-07-30, 1997-04-02, 1944-08-18, 1976...
## $ city <chr> "RIO DE JANEIRO", "RIO DE JANEIRO", "PORT...
## $ estate <chr> "RJ", "RJ", "RS", "SP", "SP", "SP", "SP",...
## $ situation <chr> "ATIVO", "ATIVO", "CANCELADA", "CANCELADA...
## $ situation.operations <chr> "LIQUIDAÇÃO EXTRAJUDICIAL", "FASE OPERACI...
## $ listing.segment <chr> NA, "Tradicional", "Tradicional", "Tradic...
## $ main.sector <chr> NA, "Financeiro e Outros", "Materiais Bás...
## $ sub.sector <chr> NA, "Outros", "Siderurgia e Metalurgia", ...
## $ segment <chr> NA, "Outros", "Siderurgia", "Açucar e Alc...
## $ tickers <chr> NA, "QVQP3B", NA, NA, "ADHM3", "AELP3", "...
## $ first.date <date> 1998-12-31, 2001-12-31, 2009-12-31, 2009...
## $ last.date <date> 2017-12-31, 2017-12-31, 2009-12-31, 2009...
Every company in the database can be identified by its official name. Function gdfpd.search.company
allow the user to search for the official name of a company.
Let’s find the official name of Ambev, one of the largest companies in Brazil:
my.names <- gdfpd.search.company('ambev')
## Found cache file. Loading data..
##
## Found 2 companies:
## COMPANHIA DE BEBIDAS DAS AMÉRICAS-AMBEV | situation = CANCELADA | first date = 2010-12-31 | last date - 2012-12-31
## AMBEV S.A. | situation = ATIVO | first date = 2012-12-31 | last date - 2017-12-31
Its official name in Bovespa records is AMBEV S.A..
name.companies <- 'AMBEV S.A.'
first.date <- '2012-12-31'
last.date <- '2016-12-31'
df.reports <- gdfpd.GetDFPData(name.companies = name.companies,
first.date = first.date,
last.date = last.date)
## Found cache file. Loading data..
## Fetching new files from Bovespa.
##
## Downloading data for 1 companies
## First Date: 2012-12-31
## Laste Date: 2016-12-31
## Inflation index: dollar
##
## Downloading inflation data
## Caching inflation RDATA into tempdir() Done
##
## Inputs looking good! Starting download of files:
##
## AMBEV S.A.
## Available periods: 2016-12-31 2015-12-31 2014-12-31 2013-12-31 2012-12-31
##
##
## Processing 23264 - AMBEV S.A.
## Finding info from Bovespa
## Found BOV cache file
## Processing 23264 - AMBEV S.A. | date 2016-12-31
## Acessing DFP data | Found DFP cache file
## Acessing FRE data | Found FRE cache file
## Acessing FCA data | Found FCA cache file
## Processing 23264 - AMBEV S.A. | date 2015-12-31
## Acessing DFP data | Found DFP cache file
## Acessing FRE data | Found FRE cache file
## Acessing FCA data | Found FCA cache file
## Processing 23264 - AMBEV S.A. | date 2014-12-31
## Acessing DFP data | Found DFP cache file
## Acessing FRE data | Found FRE cache file
## Acessing FCA data | Found FCA cache file
## Processing 23264 - AMBEV S.A. | date 2013-12-31
## Acessing DFP data | Found DFP cache file
## Acessing FRE data | Found FRE cache file
## Acessing FCA data | Found FCA cache file
## Processing 23264 - AMBEV S.A. | date 2012-12-31
## Acessing DFP data | Found DFP cache file
## Acessing FRE data | Found FRE cache file
## Acessing FCA data | Found FCA cache file
The resulting object from gdfpd.GetDFPData
is a tibble
, a data.frame type of object that allows for list columns (Müller and Wickham 2017):
glimpse(df.reports)
## Observations: 1
## Variables: 40
## $ company.name <chr> "AMBEV S.A."
## $ company.code <int> 23264
## $ cnpj <chr> "07526557000100"
## $ date.company.constitution <date> 2005-07-08
## $ date.cvm.registration <date> 2013-10-30
## $ company.tickers <chr> "ABEV3"
## $ min.date <date> 2012-12-31
## $ max.date <date> 2016-12-31
## $ n.periods <int> 5
## $ company.segment <chr> "Tradicional"
## $ current.stockholders <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ current.stock.composition <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.assets <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.liabilities <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.income <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.cashflow <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.assets.consolidated <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.liabilities.consolidated <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.income.consolidated <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.cashflow.consolidated <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ fr.auditing.report <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.dividends <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.stockholders <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.capital.issues <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.mkt.value <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.capital.increases <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.capital.reductions <list> [<>]
## $ history.stock.repurchases <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.other.stock.events <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.compensation <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.compensation.summary <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.transactions.related <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.debt.composition <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.governance.listings <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.board.composition <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.committee.composition <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.family.relations <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.family.related.companies <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.auditing <list> [<c("AMBEV S.A.", "AMBEV S.A...
## $ history.responsible.docs <list> [<c("AMBEV S.A.", "AMBEV S.A...
All financial statements for the different quarters are available within df.reports
. For example, the income statements for all desired quarters of AMBEV are:
df.income <- df.reports$fr.income[[1]]
glimpse(df.income)
## Observations: 130
## Variables: 6
## $ name.company <chr> "AMBEV S.A.", "AMBEV S.A.", "AMBEV S.A.", "...
## $ ref.date <date> 2016-12-31, 2016-12-31, 2016-12-31, 2016-1...
## $ acc.number <chr> "3.01", "3.02", "3.03", "3.04", "3.04.01", ...
## $ acc.desc <chr> "Receita de Venda de Bens e/ou Serviços", "...
## $ acc.value <dbl> 20634193, -11263620, 9370573, 4006806, -399...
## $ acc.value.infl.adj <dbl> 6332420.75, -3456688.66, 2875732.09, 122964...
GetDFPData
is specially designed for handling large scale download of data. Let’s build a case with three randomly selected active companies:
my.companies <- c("COMPANHIA CELG DE PARTICIPAÇÕES", "DOHLER S.A.", "KLABIN S.A.")
first.date <- '2012-01-31'
last.date <- '2017-01-31'
df.reports <- gdfpd.GetDFPData(name.companies = my.companies,
first.date = first.date,
last.date = last.date)
And now we can check the resulting tibble
:
glimpse(df.reports )
## Observations: 3
## Variables: 40
## $ company.name <chr> "COMPANHIA CELG DE PARTICIPAÇ...
## $ company.code <int> 21393, 5207, 12653
## $ cnpj <chr> "08560444000193", "8468340800...
## $ date.company.constitution <date> 2006-12-04, 1755-12-09, 1978...
## $ date.cvm.registration <date> 2008-03-14, 1973-12-26, 1997...
## $ company.tickers <chr> "GPAR3", "DOHL3;DOHL4", "KLBN...
## $ min.date <date> 2012-12-31, 2012-12-31, 2012...
## $ max.date <date> 2016-12-31, 2016-12-31, 2016...
## $ n.periods <int> 5, 5, 5
## $ company.segment <chr> "Tradicional", "Tradicional",...
## $ current.stockholders <list> [<c("COMPANHIA CELG DE PARTI...
## $ current.stock.composition <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.assets <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.liabilities <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.income <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.cashflow <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.assets.consolidated <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.liabilities.consolidated <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.income.consolidated <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.cashflow.consolidated <list> [<c("COMPANHIA CELG DE PARTI...
## $ fr.auditing.report <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.dividends <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.stockholders <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.capital.issues <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.mkt.value <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.capital.increases <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.capital.reductions <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.stock.repurchases <list> [<>, <>, <c("KLABIN S.A.", "...
## $ history.other.stock.events <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.compensation <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.compensation.summary <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.transactions.related <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.debt.composition <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.governance.listings <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.board.composition <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.committee.composition <list> [<COMPANHIA CELG DE PARTICIP...
## $ history.family.relations <list> [<>, <c("DOHLER S.A.", "DOHL...
## $ history.family.related.companies <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.auditing <list> [<c("COMPANHIA CELG DE PARTI...
## $ history.responsible.docs <list> [<c("COMPANHIA CELG DE PARTI...
tidyverse
library(tidyverse)
# set assets tbl
tab.assets <- bind_rows(df.reports$fr.assets) %>%
filter(acc.desc == 'Ativo Total') %>%
group_by(name.company, ref.date) %>%
summarise(Total.Assets = acc.value)
# plot it
p <- ggplot(tab.assets, aes(x = ref.date, y=Total.Assets)) +
geom_col() + facet_wrap(~name.company, scales = 'free')
print(p)
The package also includes function gdfpd.export.DFP.data
for exporting the financial data in the Excel or csv format.
Next we present an example of saving the output as an Excel file.
my.basename <- 'MyExcelData'
my.type.export <- 'xlsx' # only supported so far
gdfpd.export.DFP.data(df.reports = df.reports,
base.file.name = my.basename,
type.export = my.type.export)
## File MyExcelData.xlsx already exists. Deleting it..
## Start Copying Dataframes to MyExcelData.xlsx
## Copying table metadata - 3 rows, 5 columns
## Copying table metadata - 3 rows, 5 columns
## Copying table current.stockholders - 28 rows, 6 columns
## Copying table current.stock.composition - 9 rows, 4 columns
## Copying table fr.assets - 240 rows, 6 columns
## Copying table fr.liabilities - 375 rows, 6 columns
## Copying table fr.income - 400 rows, 6 columns
## Copying table fr.cashflow - 316 rows, 6 columns
## Copying table fr.assets.consolidated - 225 rows, 6 columns
## Copying table fr.liabilities.consolidated - 390 rows, 6 columns
## Copying table fr.income.consolidated - 504 rows, 6 columns
## Copying table fr.cashflow.consolidated - 350 rows, 6 columns
## Copying table fr.auditing.report - 15 rows, 3 columns
## Copying table history.dividends - 149 rows, 12 columns
## Copying table history.stockholders - 131 rows, 14 columns
## Copying table history.capital.issues - 30 rows, 5 columns
## Copying table history.mkt.value - 15 rows, 5 columns
## Copying table history.capital.increases - 72 rows, 8 columns
## Copying table history.capital.reductions - 6 rows, 12 columns
## Copying table history.stock.repurchases - 16 rows, 12 columns
## Copying table history.other.stock.events - 17 rows, 8 columns
## Copying table history.compensation - 45 rows, 18 columns
## Copying table history.compensation.summary - 45 rows, 9 columns
## Copying table history.transactions.related - 48 rows, 13 columns
## Copying table history.debt.composition - 27 rows, 9 columns
## Copying table history.governance.listings - 15 rows, 5 columns
## Copying table history.board.composition - 365 rows, 18 columns
## Copying table history.committee.composition - 10 rows, 18 columns
## Copying table history.family.relations - 83 rows, 10 columns
## Copying table history.family.related.companies - 419 rows, 11 columns
## Copying table history.auditing - 21 rows, 10 columns
## Copying table history.responsible.docs - 25 rows, 5 columns
##
## Export sucessful. Data available at file MyExcelData.xlsx
To illustrate the use of GetDFPData we select a topic that is relatively simple and current: the relationship between family companies and financial performance.
In order to test if a founding-family company has a better/worst performance, we analyze a large sample of Boards of Directors in Brazil. To do this, we estimate the following (fixed effects panel data) model:
\[ Performance_{i,t} = \phi_1 \times Family_{i,t} + \phi_2 \times Family_{i,t} ^2 + \beta \times Controls + \alpha_i + \mu_t + \epsilon_{i,t} \]
Where the main variables of interest are:
\(Performance_{i,t}\) - Return on assets (ROA) and return on equity (ROE) for company i, year t.
\(Family _{i,t}\) - percentage of participation of family members in boards for company i, year t.
And the controls variables are:
\(size _{i,t}\) - log of total assets, company i, year t.
\(age _{i,t}\) - log of companies age, company i, year t.
\(risk _{i,t}\) - standard deviation of daily returns of most liquid stock, calculated by year and by company.
\(leverage _{i,t}\) - ratio of total debt to total assets.
Our sample selection procedure follows the aforementioned studies. First, we’ll filter the data for only companies that have tickers, that is, are tradeable in the equity market, and those not belonging in the financial and utility sectors. This is a common filter in corporate finance studies as these sectors have specific dynamics related to their performance measured by ROE and ROA.
library(GetDFPData)
library(dplyr)
library(stringr)
# get info table
df.info <- gdfpd.get.info.companies(type.data = 'companies')
## Found cache file. Loading data..
# filter companies
df.info <- df.info %>%
filter(!is.na(tickers),
!(main.sector %in% c('Financeiro e Outros', 'Utilidade Pública')))
We find 242 companies with this simple filter. Now, let’s download data for the selected companies. Be aware that this is a lengthy download and may take a significant amount of time.
# set options
first.date <- '2012-01-31'
last.date <- '2018-01-01'
my.companies <- df.info$name.company
# get data
df.reports <- gdfpd.GetDFPData(name.companies = my.companies,
first.date = first.date,
last.date = last.date)
The next step is the download of stock data with package BatchGetSymbols
and the calculation of our volatility measure, \(risk _{i,t}\). Before downloading the data, we need to find all tickers of all companies. In the case of more than one stock for a company, we keep the asset with most financial volume.
library(BatchGetSymbols)
library(lubridate)
get.ticker.df <- function(df.in) {
# Gets ticker string and organizes it in another data_frame
temp.split <- str_split(df.in$tickers, ';')[[1]]
temp.df <- data_frame(name.company = df.in$name.company,
ticker = temp.split)
}
# get dataframe with tickers
my.l <- by(data = df.info,
INDICES = df.info$name.company,
FUN = get.ticker.df)
# bind into one dataframe
df.tickers <- do.call(bind_rows, my.l)
# add .SA for yahoo finance
my.tickers <- paste0(df.tickers$ticker, '.SA')
# get data
df.stocks <- BatchGetSymbols(tickers = my.tickers,
first.date = '2010-01-01',
last.date = '2017-01-01',
bench.ticker = '^BVSP',
thresh.bad.data = 0.5)[[2]]
# replace .SA (keep same notation)
df.stocks$ticker <- str_replace_all(df.stocks$ticker,
fixed('.SA'), '')
# join with ticker df
df.stocks <- inner_join(df.tickers, df.stocks)
# find selected tickers by volume
tab.volume <- df.stocks %>%
group_by(name.company, ticker) %>%
summarise(mean.volume = mean(volume, na.rm = TRUE)) %>%
group_by(name.company) %>%
summarise(selected.ticker = ticker[which.max(mean.volume)])
df.stocks <- df.stocks[df.stocks$ticker %in% tab.volume$selected.ticker, ]
# calculate volatility measure
df.volat <- df.stocks %>%
mutate(ref.date = as.Date(paste0(year(ref.date),'-12-31'))) %>%
group_by(name.company, ref.date) %>%
summarise(risk = sd(ret.adjusted.prices, na.rm = TRUE))
The last step in the data acquisition stage is to bind all financial reports in a single dataframe.
get.col <- function(df.in, my.col) {
# function to bind all rows (dataframes) of a tibble
return(do.call(rbind, df.in[[my.col]]))
}
# financial reports
df.fr <- bind_rows(get.col(df.reports, 'fr.income'),
get.col(df.reports, 'fr.assets'),
get.col(df.reports, 'fr.cashflow'),
get.col(df.reports, 'fr.liabilities'))
# board composition and family participation
df.board <- get.col(df.reports, 'history.board.composition')
df.family <- get.col(df.reports, 'history.family.relations')
Now that all raw data is available, we can proceed to calculate our variables of interest. The next code will process df.family
and df.board
, which contains information about board composition and family members of companies, and output a reference table that shows, for each company and year, the percentage of board seats held by family members.
ref.tab <- df.family %>%
group_by(name.company, ref.date) %>%
summarise(persons = str_c(unique(person.name),
collapse = ' ; ')) %>%
merge(df.board) %>%
mutate(dummy = person.name %in% str_split(persons,
fixed(' ; '),
simplify = TRUE)) %>%
group_by(name.company, ref.date) %>%
summarise(family.idx = mean(dummy)) %>%
glimpse()
## Observations: 943
## Variables: 3
## $ name.company <chr> "ADVANCED DIGITAL HEALTH MEDICINA PREVENTIVA S.A....
## $ ref.date <date> 2014-12-31, 2009-12-31, 2010-12-31, 2011-12-31, ...
## $ family.idx <dbl> 0.20000000, 0.16666667, 0.20000000, 0.19230769, 0...
Finally, we build our final dataframe containing all variables necessary for our panel and OLS model:
get.acc <- function(acc.value, search.vec, my.str) {
# get account value from fr.dataframe
out.value <- acc.value[search.vec == my.str]
# test if it exists
if (length(out.value) == 0) out.value <- NA
return(out.value)
}
# build final table
model.table <- df.fr %>%
group_by(name.company, ref.date) %>%
summarise(profit.loss = get.acc(acc.value, acc.number, '3.09'),
total.assets = get.acc(acc.value, acc.number, '1'),
capital = get.acc(acc.value, acc.number, '2.03'),
total.sales = get.acc(acc.value, acc.number, '3.01'),
EBIT = get.acc(acc.value, acc.number, '3.05'),
deprec.amort = get.acc(acc.value, acc.number, '6.01.01.02'),
EBITDA = EBIT + deprec.amort,
short.term.debt = get.acc(acc.value, acc.number, '2.01.04'),
long.term.debt = get.acc(acc.value, acc.number, '2.02.01'),
total.debt = short.term.debt + long.term.debt,
size = log(total.assets),
leverage = total.debt/total.assets,
ROA = EBITDA/total.assets,
ROE = profit.loss/capital) %>%
inner_join(df.volat) %>%
inner_join(ref.tab) %>%
inner_join(df.info) %>%
filter(total.assets > 0,
total.sales > 0,
capital > 0) %>%
inner_join(select(df.reports, company.name,
date.company.constitution,
date.cvm.registration),
by = c('name.company' = 'company.name' )) %>%
mutate(age = log(as.numeric(ref.date - date.company.constitution)/365)) %>%
select(name.company, ref.date, ROA,
ROE, size, age, risk, leverage, family.idx, main.sector) %>%
glimpse()
## Observations: 506
## Variables: 10
## $ name.company <chr> "ALPARGATAS SA", "ALPARGATAS SA", "ALPARGATAS SA"...
## $ ref.date <date> 2010-12-31, 2011-12-31, 2012-12-31, 2013-12-31, ...
## $ ROA <dbl> 0.17146869, 0.15552614, 0.13210314, 0.11059302, 0...
## $ ROE <dbl> 0.23129018, 0.20804121, 0.16911026, 0.16899388, 0...
## $ size <dbl> 14.47189, 14.52877, 14.66921, 14.89599, 14.92580,...
## $ age <dbl> 4.642624, 4.652211, 4.661732, 4.671137, 4.680456,...
## $ risk <dbl> 0.01515056, 0.01973376, 0.01816129, 0.01482123, 0...
## $ leverage <dbl> 0.114016614, 0.059653695, 0.057139840, 0.13141140...
## $ family.idx <dbl> 0.20000000, 0.19230769, 0.19230769, 0.19230769, 0...
## $ main.sector <chr> "Consumo Cíclico", "Consumo Cíclico", "Consumo Cí...
As a final step, we remove outliers from the data using a simple winsorization procedure with a 2.5% cut-off in both sides of the distribution:
winsorize <- function(x) {
lim <- quantile(x, probs = c(0.025,0.975), na.rm = T)
x[x<lim[1]] <- lim[1]
x[x>lim[2]] <- lim[2]
return(x)
}
model.table[,c('ROA','ROE','risk')] <- sapply(model.table[,c('ROA','ROE','risk')],
winsorize)
A summary of the resulting dataset is given next:
summary(model.table)
## name.company ref.date ROA
## Length:506 Min. :2010-12-31 Min. :-0.05480
## Class :character 1st Qu.:2011-12-31 1st Qu.: 0.04494
## Mode :character Median :2013-12-31 Median : 0.08376
## Mean :2013-11-23 Mean : 0.08911
## 3rd Qu.:2015-12-31 3rd Qu.: 0.13820
## Max. :2016-12-31 Max. : 0.23153
## NA's :23
## ROE size age risk
## Min. :-0.74808 Min. : 9.639 Min. :0.3384 Min. :0.01086
## 1st Qu.: 0.01808 1st Qu.:13.491 1st Qu.:3.4971 1st Qu.:0.01985
## Median : 0.08990 Median :14.472 Median :3.9772 Median :0.02442
## Mean : 0.06248 Mean :14.535 Mean :3.9126 Mean :0.03129
## 3rd Qu.: 0.16937 3rd Qu.:15.359 3rd Qu.:4.2611 3rd Qu.:0.03234
## Max. : 0.43936 Max. :20.276 Max. :7.6095 Max. :0.12451
## NA's :2
## leverage family.idx main.sector
## Min. :0.0000 Min. :0.00000 Length:506
## 1st Qu.:0.1140 1st Qu.:0.07743 Class :character
## Median :0.2615 Median :0.12500 Mode :character
## Mean :0.2566 Mean :0.15648
## 3rd Qu.:0.3736 3rd Qu.:0.18182
## Max. :0.7893 Max. :0.71429
##
With the required dataset cleaned and structured, we proceed with the estimation of the models. First we set the model specification as a formula object:
control.vars <- '+ size + age + risk + leverage'
my.index <- c('name.company', 'ref.date')
my.formula.ROA.plm <- formula(paste0('ROA ~ family.idx + I(family.idx^2) ',
control.vars,
' + factor(year(ref.date))'))
my.formula.ROE.plm <- formula(paste0('ROE ~ family.idx + I(family.idx^2) ',
control.vars,
' + factor(year(ref.date))'))
my.formula.ROE.lm <- formula(paste0('ROE ~ family.idx + I(family.idx^2) ',
control.vars,
' + factor(year(ref.date))',
' + factor(main.sector)') )
my.formula.ROA.lm <- formula(paste0('ROA ~ family.idx + I(family.idx^2) ',
control.vars,
' + factor(year(ref.date))',
' + factor(main.sector)') )
We have four models, two panel data and two ordinary least squares. We can now estimate them using plm
and lm
:
library(plm)
my.plm.ROA <- plm(data = model.table,
formula = my.formula.ROA.plm,
index = my.index,
model = 'within')
my.lm.ROA <- lm(data = model.table,
formula = my.formula.ROA.lm)
my.plm.ROE <- plm(data = model.table,
formula = my.formula.ROE.plm,
index = my.index,
model = 'within')
my.lm.ROE <- lm(data = model.table,
formula = my.formula.ROE.lm)
library(stargazer)
stargazer(my.plm.ROA, my.lm.ROA, my.plm.ROE, my.lm.ROE,
type = 'html',
title = 'Estimation Results',
model.names = F,
column.labels = c('FE', 'OLS', 'FE', 'OLS'),
style = 'aer',
omit = 'factor*',
omit.stat = c('f', 'ser'),
add.lines = list(c('Year Dummies', 'Yes', 'Yes',
'Yes', 'Yes'),
c('Sector Dummies', 'No', 'Yes',
'No', 'Yes')))
ROA | ROE | |||
FE | OLS | FE | OLS | |
(1) | (2) | (3) | (4) | |
family.idx | -0.010 | -0.027 | -0.356 | 0.374* |
(0.106) | (0.079) | (0.346) | (0.195) | |
I(family.idx2) | 0.151 | 0.063 | 0.116 | -0.867*** |
(0.219) | (0.145) | (0.650) | (0.327) | |
size | 0.062*** | -0.002 | 0.003 | 0.006 |
(0.012) | (0.002) | (0.017) | (0.005) | |
age | 0.040 | 0.000 | 0.162* | -0.003 |
(0.026) | (0.003) | (0.090) | (0.008) | |
risk | -0.242** | -0.907*** | -0.957** | -3.121*** |
(0.122) | (0.146) | (0.437) | (0.399) | |
leverage | -0.079** | -0.004 | -0.089 | -0.187*** |
(0.031) | (0.018) | (0.101) | (0.049) | |
Constant | 0.192*** | 0.239*** | ||
(0.032) | (0.088) | |||
Year Dummies | Yes | Yes | Yes | Yes |
Sector Dummies | No | Yes | No | Yes |
Observations | 481 | 481 | 504 | 504 |
R2 | 0.290 | 0.183 | 0.178 | 0.266 |
Adjusted R2 | 0.108 | 0.151 | -0.029 | 0.238 |
Notes: | ***Significant at the 1 percent level. | |||
**Significant at the 5 percent level. | ||||
*Significant at the 10 percent level. |
GetDFPData
has a clear impact, facilitating access to a large dataset from B3
Researchers will benefit with more time for research and less time for data acquisition and cleaning
Retails investors can use the package for their own financial analysis
The package is in constant development
More details about the package in its academic paper
Müller, Kirill, and Hadley Wickham. 2017. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.