Preface
Since you are reading this book, you are likely a data analyst looking for alternative and more efficient ways to add value to your organization, an undergraduate or graduate student in the first steps of learning data science, or an experienced researcher, looking for new computational tools. In any case, be assured that you are in the right place. This book will teach you how to use R and RStudio for data analysis in finance and economics.
The first version of the book originates from the class material I teach my postgraduate students at my university. By observing students learning and using R in the classroom, I frequently see the positive impact of technology on their careers. They spend less time doing repetitive and soul-crushing spreadsheet data chores, and more time thinking about their analysis and learning new tools. This book is my humble attempt to go beyond the local classroom and reach an international audience.
Another motivation for writing this book is my personal experience using code from other researchers. Usually, the code is not well-organized, lacks clarity, and, possibly, only works on the computer of its author! After repeated frustration, I realized that figuring out the code of other researchers would often take more time than writing the procedure myself. These cases hurt the development of science, as one of its basic principles is the reproducibility of experiments. As researchers are expected to be good writers, it should also be expected that their code is in a proper format and readable by other people. With this book, I will tackle this problem by presenting a code structure focused on scientific reproducibility, organization, and usability.
In this book, we will not work on the advanced uses of R. The content will be limited to simple and practical examples. One challenge I had while writing this book was defining the boundary between introductory and advanced material. Wherever possible, I increased the level of complexity gradually. For readers interested in learning advanced features and the inner workings of R, I suggest the books by Venables et al. (2004), Teetor (2011) and Wickham (2019).
This is what you’ll learn from this book:
- Using R and RStudio
- In chapter 01 we will discuss the use of R as a programming platform designed to solve data-related problems in finance and economics. In chapter 02 we will explore basic commands and functionalities that will increase your productivity as a data analyst.
- Importing financial and economic data
- In chapters 04 and 05 we will learn to import data from local files, such as an Excel spreadsheet, or the internet, using specialized packages that can download financial and economic data such as stock prices, economic indices, the US yield curve, corporate financial statements, and many others.
- Cleaning, structuring and analyzing the data with R
- In chapters 06 and 07 we will concentrate our study on the ecosystem of basic and advanced classes of objects within R. We will learn to manipulate objects such as numeric vectors, dates and whole tables. In chapters 08 and 09 we’ll learn to use programming to solve data-related problems such as cleaning and structuring messy data. In chapter 11 we will learn applications of the most common econometric models used in finance and economics, including linear regression, generalized linear models, ARIMA models and others.
- Creating a visual analysis of data
- In chapter 10 we’ll learn to use functions from package {ggplot2} (Wickham, Chang, et al. 2023) to create clever visualizations of our datasets, including the most popular applications in finance and economics, time series and statistical plots.
- Reporting your results
- In chapter 12 we will see how to report our data analysis using specialized packages and the RMarkdown technology. It includes the topic of presenting and exporting tables, figures and models to a written report.
- Writing better and faster code
- In the last chapter of the book we discuss best programming practices with R. We will look at how to profile code and search for bottlenecks, and how to improve execution time with caching strategies using package {memoise} (Wickham et al. 2021), C++ code with {Rcpp} (Eddelbuettel et al. 2023) and parallel computing with {furrr} (Vaughan and Dancho 2022).
Conventions
The format of the book was chosen to maximize learnability and memorization. Here are the conventions used throughout the text:
- Packages
- Every R package used in the text will have the textual format of {package}. The first time an R package shows up in the text, a formal citation will also be available.
- Functions
- Functions are formatted as dplyr::glimpse(), with the information of which package the function belongs to. This notation is simply a copy of real R code, that is, you can call functions using the same structure. The first time the function is referenced, the package name will be included, except for packages that are pre-loaded in an R session ({base}, {utils} and others).
- Code
- All R code will be presented in boxes, with the code output prefixed by the string R>. Inline comments are set with the symbol #. Anything on the right side of # is not evaluated by R. Here’s an example, showing the contents of a list in R:
R> [[1]]
R> [1] "xx"
R>
R> [[2]]
R> [1] 1 2 3 4 5
R>
R> [[3]]
R> [1] "dec"
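The code that generated the output above is not shown here. As a minimal sketch, assuming the list was built from three unnamed elements (the object name my_list is hypothetical), the following lines would print the same contents and also illustrate the package::function() notation and inline comments described in the conventions:

```r
# build an unnamed list with three elements
# (assumption: values chosen to match the printed output above)
my_list <- list("xx", 1:5, "dec")

# print the list; each element is shown under its position, [[1]], [[2]], ...
print(my_list)

# the package::function() notation also works for pre-loaded packages
n_elements <- base::length(my_list)  # text after '#' is not evaluated by R
print(n_elements)
```

Calling a function with its package prefix is optional for pre-loaded packages such as {base}, but it makes the origin of each function explicit.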
Supplement Material
All the material used in the book, including code examples separated by chapters, is publicly available on the internet and distributed with an R package called {afedR3} (M. S. Perlin 2023a). It includes data files and several functions that can make it easier to run the examples of the book. If you plan to write some code as you read the book, this package will greatly help your journey.
To install the book package on your computer, you need to execute a couple of lines of code in R. For that, copy and paste the following commands into the RStudio prompt (bottom left of the screen, with a “>” sign) and press enter after each command. Be aware you’ll need R and RStudio installed on your computer (see section 1.4 for details).
# install devtools dependency
install.packages('devtools')
# install book package
devtools::install_github('msperlin/afedR3')
This code first installs package {devtools} (Wickham et al. 2022), a required dependency for installing a package from GitHub, which is where the book bundle is hosted. After that, the call to devtools::install_github('msperlin/afedR3') installs the book package on your computer. You can safely ignore any warning messages about long paths during installation.
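If you are unsure whether the installation worked, or want to avoid reinstalling packages you already have, a check along these lines may help. This is only a sketch: missing_pkgs() is a hypothetical helper written for this example and is not part of {afedR3} or {devtools}.

```r
# hypothetical helper (not part of any package): returns the names of
# the packages in 'pkgs' that are not yet installed on this computer
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# which of the two packages still need to be installed?
to_install <- missing_pkgs(c("devtools", "afedR3"))
print(to_install)

# install only what is missing, and only in an interactive session
if (interactive()) {
  if ("devtools" %in% to_install) install.packages("devtools")
  if ("afedR3" %in% to_install) devtools::install_github("msperlin/afedR3")
}
```

An empty result from missing_pkgs() means both packages are already available and nothing needs to be installed.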
After installing package {afedR3} (M. S. Perlin 2023a), you can, although it is not necessary, copy all book files to a local folder by executing the following command in R:
afedR3::bookfiles_get(path_to_copy = '~/afedR3')
The previous code will unzip the book file into your “Documents/afedR3” folder, as the tilde (~) is a shortcut to your “Documents” directory (on Windows; on Linux and macOS it points to your home directory). If you prefer the old-fashioned way of using an internet page, you can find and download the package zip file from GitHub.
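To see where the tilde points on your own computer, you can ask R to expand it. The sketch below uses only functions from {base}; the object names home_dir and book_dir are chosen for this example.

```r
# expand the tilde shortcut to a full path (the result depends on your system)
home_dir <- path.expand("~")
print(home_dir)

# full path of the folder used in the previous command
book_dir <- file.path(home_dir, "afedR3")
print(book_dir)

# if the book files were already copied, list the contents of the folder
if (dir.exists(book_dir)) {
  print(list.files(book_dir))
}
```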
A suggestion, before you read the rest of the book: go to the book website and search for the related links page at the bottom. There you will find all internet addresses highlighted in the text, including the links for the installation of R and RStudio.
Content for Instructors
If you are an R instructor, you’ll find plenty of material you can use with your classes. I made sure you get everything you need:
- Over 100 exercises
- Every chapter in this book includes exercises that your students can practice, with solutions available in the web version of the book. Also, all exercises are available in the exams format, meaning that you can compile the same exercises in PDF or HTML. Moreover, you can export the exercises to e-learning platforms such as Moodle and Blackboard. See this blog post for instructions on how to use it with your students.
- Web version
- The first seven chapters of the book are freely available at https://www.msperlin.com/afedr, which is more than enough material for an introductory class on R and data analysis.
All of this content is released under the MIT license, so feel free to use and abuse it, as long as you give credit to the original author. You can find the content within the book package {afedR3} (see previous instructions on installation) or directly at the book site.
I hope you enjoy this book and find it useful for your work.
Good reading!
Marcelo S. Perlin