1 Introduction
In the digital era, information is abundant and accessible. From the ever-changing price of financial contracts to the unstructured data of social media websites, the high volume of information creates a strong need for data analysis in the workplace. A company or organization benefits immensely when it can build a bridge between raw information from its environment and strategic decision-making. Undoubtedly, this is a prolific time for professionals skilled in using the right tools for acquiring, storing, and analyzing data.
In particular, datasets related to Economics and Finance are widely available to the public. International and local institutions, such as central banks, government research agencies, financial exchanges, and many others, provide their data publicly, either by legal obligation or to foster research. Whether you are looking into statistics for a particular country or a company, most information is just a few clicks away.
Not surprisingly, a graduate student or a data analyst is expected to have learned at least one programming language that allows them to do their work more efficiently. Learning how to program is becoming a requirement in the job market. This is where R comes into play. In the next sections, I will explain what R is and why you should use it.
1.1 What is R
R is a programming language specially designed to solve statistical problems and display graphical representations of data. R is a modern implementation of S, a programming language originally created at Bell Laboratories (then part of AT&T). The base code of R was developed by two academics, Ross Ihaka and Robert Gentleman, resulting in the programming platform we have today. For anyone curious about the name, the letter R was chosen because it is the first letter of both creators' names.
Today, R is almost synonymous with data analysis, with a large user base and consolidated modules. Researchers from various fields, from economics to biology, are likely to find in R significant preexisting code that facilitates their analysis. On the business side, large and established companies, such as Google and Microsoft, have adopted R as an internal language for data analysis. R is maintained by the R Foundation and the R Consortium, a collective effort to fund projects for extending the programming language.
1.2 Why Choose R
Learning a new programming language requires a lot of time and effort. Perhaps you’re wondering why you should choose R and invest time in learning it. Here are the main arguments.
First, R is a mature and stable platform, continuously supported and intensively used in the industry. When choosing R, you will have the computational background not only for an academic career but also to work as a data analyst in private organizations. Due to its open license, you can use R anywhere. Also, the strong support from the community means it is very unlikely the R platform will ever fade away or be substituted. Depending on your career choices, R might be the only programming language you ever need to learn.
Learning R is easy. My experience in teaching R allows me to confidently state that students, even those with no programming experience, have no problem learning the language and using it to create their own code. The language is intuitive, and certain rules and functions can be extended to different cases. Once you understand how the software expects you to think, it becomes easy to move across different modules and functionalities.
The engine of R and the interface of RStudio create a highly productive environment. The graphical interface provided by RStudio facilitates the use of R and increases productivity by introducing new features to the platform. By combining both, users have at their disposal many tools that facilitate the development of research scripts and other projects.
R packages allow the user to do many different things with R. We will soon learn that R offers several modules that can be installed over the internet whenever necessary. These modules extend the basic language of R and enable the most diverse functionalities. Besides basic data tasks such as reading and writing, you can, for example, use R to build and publish a blog, send emails, create exams, write random jokes and poems (seriously!), and much more. The existing external modules in R are truly an impressive achievement of the community.
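As a minimal sketch of what working with these modules looks like, the snippet below checks whether a package is available locally before asking CRAN for it. The helper name `is_installed` is hypothetical and introduced here only for illustration; `requireNamespace()` and `install.packages()` are base R functions. The installation line is commented out because it needs an internet connection.

```r
# Hypothetical helper (not from the book): TRUE if the package can be loaded
is_installed <- function(pkg_name) {
  requireNamespace(pkg_name, quietly = TRUE)
}

# package 'stats' ships with every R installation, so this check should pass
print(is_installed('stats'))

# to fetch a package from CRAN (requires internet), you would run:
# install.packages('Rcpp')
```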
R is compatible with different operating systems and can interface with different programming languages. If you need to execute code from another programming language, such as C++, Python, or Julia, it is easy to integrate it with R. Therefore, the user is not restricted to a single programming language and can easily use features and functions from others. For example, C++ is well known for its superior speed in numerical tasks. From an R script, you can use package {Rcpp} (Eddelbuettel et al. 2023) to write a C++ function and effortlessly use it within your R code.
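To make the {Rcpp} workflow concrete, here is a small sketch, assuming that package {Rcpp} and a C++ compiler (RTools on Windows) are already installed. The function name `sum_cpp` is hypothetical; `Rcpp::cppFunction()` is the package's function for compiling a C++ function from a string. If {Rcpp} is missing, the code only prints a message.

```r
# Assumption: package Rcpp and a C++ compiler are available on this machine
if (requireNamespace('Rcpp', quietly = TRUE)) {
  # compile a small C++ function and expose it to the R session
  Rcpp::cppFunction('
    double sum_cpp(NumericVector x) {
      double total = 0;
      for (int i = 0; i < x.size(); i++) {
        total += x[i];
      }
      return total;
    }
  ')

  # call the compiled function like any other R function
  print(sum_cpp(c(1, 2, 3.5)))
} else {
  message('Package Rcpp is not installed; skipping the C++ example.')
}
```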
R is free! The main software and all its packages are free. A generous license motivates the adoption of the R language in a business environment, where obtaining individual and collective licenses of commercial software can be costly. This means you can take R anywhere you go, regardless of whether you have a budget for software or not.
1.3 What Can You Do With R and RStudio?
R is a fairly complete programming language, and most computational problems can be solved with it. Given the adoption of R in different areas of knowledge, the list is extensive. In finance and economics, I can highlight the following possibilities:
Substitute and improve data-intensive tasks from spreadsheet-like software;
Develop routines for managing investment portfolios and executing financial orders;
Create tools for calculating and reporting economic indices such as inflation and unemployment;
Perform empirical data research using statistical techniques, such as econometric models and hypothesis testing;
Create dynamic websites with the {shiny} (Chang et al. 2021) package, allowing anyone in the world to use a computational tool created by you;
Automate the process of writing technical reports with the RMarkdown and Quarto technologies.
Moreover, public access to packages developed by users further expands these capabilities. The CRAN Task Views website offers panels for the topics of Finance and Econometrics. There you can find the main packages to perform specific operations such as importing financial data from the internet, estimating econometric models, and calculating different risk estimates, among many other possibilities. Reading these pages and getting to know these packages is essential for those who intend to work in Finance and Economics.
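To give a flavour of one of the tasks listed above (calculating an economic index), the sketch below compounds monthly inflation rates into an accumulated rate and a price index using only base R. The inflation values are made up for illustration and the object names are hypothetical.

```r
# hypothetical monthly inflation rates, in decimal form (made-up values)
monthly_inflation <- c(0.004, 0.006, 0.002, 0.005, 0.003, 0.001)

# compound the monthly rates to obtain the accumulated inflation
accumulated_inflation <- prod(1 + monthly_inflation) - 1

# build a price index that starts from a base value of 100
price_index <- 100 * cumprod(1 + monthly_inflation)

# report the accumulated inflation (in percent) and the index values
print(round(100 * accumulated_inflation, 2))
print(round(price_index, 2))
```

The same logic in a spreadsheet would need one helper column per step; here each step is a single vectorized call.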
1.4 Installing R
Before going any further, let’s install the required software on your computer. The most practical way to install R is to point your favourite internet browser to the R website and click the Download link on the left side of the page, as shown in Figure 1.1.
The next screen gives you a choice of the mirror to download the installation files. The CRAN repository (Comprehensive R Archive Network) is mirrored in various parts of the world. You can choose one of the links from the location nearest to you. If undecided, just select the mirror 0-Cloud (see Figure 1.2), which will automatically take you to the nearest location.
The next step involves selecting your operating system, likely to be Windows. From now on, due to the greater popularity of this platform, we will focus on installing R on Windows. Instructions for installing R on other operating systems can be easily found online. Regardless of the underlying platform, using R is about the same. There are a few exceptions, especially when R interacts with the file system. Throughout the book, special care was taken to choose functions that work the same way in different operating systems, and the few exceptions are highlighted. So, even if you are using a Mac or a flavor of Linux, you can take full advantage of the material presented here.
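As an illustration of functions that behave the same way across operating systems, the sketch below builds file paths with `file.path()` and writes to the session's temporary folder with `tempdir()`, both from base R. The folder and file names are hypothetical.

```r
# build a path without hard-coding the separator (hypothetical names)
my_file <- file.path('data', 'prices', 'stocks.csv')
print(my_file)

# tempdir() returns a writable folder on any operating system
temp_file <- file.path(tempdir(), 'example.txt')
writeLines('hello from R', temp_file)
print(readLines(temp_file))

# type of the current operating system, either "unix" or "windows"
print(.Platform$OS.type)
```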
After clicking the link Download R for Windows, as in Figure 1.3, the next screen will show the following download options: base, contrib, old.contrib and RTools. The first (base) should be selected. It contains the download link to the executable installation file of R for Windows.
Some R packages require local compilation of their source files. For that, you need RTools, a bundle of compilers and utilities, which you can safely install from the CRAN website.
After clicking the link base, the next screen will show the link to download the R installation file. After downloading the file, open it and follow the steps in the installation screen. At this time, no special configuration is required. I suggest keeping all the default choices and simply accepting the displayed dialogue screens. After the installation of R, it is strongly recommended to install RStudio, which will be addressed next.
Be aware that R has a consistent release schedule. Every four months a new version of R is released, fixing bugs and implementing new solutions. There are two main types of releases, major and minor. For example, today, 2023-02-23, the latest version of R is 4.2.2. The first digit (“4”) indicates the major release while all others are of the minor type. Generally, the minor changes are very specific and, possibly, will have little impact on your work.
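Once R is installed, you can check which version you are running from within R itself. The sketch below uses the base R objects `R.version.string` and `R.version`, and the function `getRversion()`; the object name `current_version` and the version used in the comparison are arbitrary examples.

```r
# full version string of the running R session
print(R.version.string)

# major and minor components, stored as character strings
print(R.version$major)
print(R.version$minor)

# getRversion() returns an object that supports comparisons
current_version <- getRversion()
print(current_version >= '4.0.0')
```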
However, unlike minor releases, major releases are fully reflected in the R package ecosystem. Every time you install a new major version of R, you will have to reinstall all packages. The problem here is that a new major release often comes with package incompatibility issues. My advice is: every time a new major release of R comes out, wait a few months before installing it on your machine. That way, package authors will have more time to update their code, minimizing the possibility of compatibility problems.
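Because packages must be reinstalled after a major upgrade, it can help to save the list of installed packages beforehand. The following is one possible sketch using base R only; the object names are hypothetical, a temporary file stands in for a permanent location, and the final installation line is commented out because it requires internet access.

```r
# names of all packages installed in the current library paths
my_packages <- rownames(installed.packages())

# save the list to a text file (a temporary file here, for illustration)
backup_file <- tempfile(fileext = '.txt')
writeLines(my_packages, backup_file)

# after upgrading R, read the list back and find what is missing
saved_packages <- readLines(backup_file)
missing_packages <- setdiff(saved_packages, rownames(installed.packages()))
print(missing_packages)

# reinstall the missing packages (requires internet):
# install.packages(missing_packages)
```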
1.5 Installing RStudio
The base installation of R includes its own GUI (graphical user interface), where we can write and execute code. However, this native interface has several limitations. RStudio Desktop substitutes the original GUI and makes access to R more practical and efficient. One way to understand this relationship is through an analogy with cars. While R is the engine of the programming language, RStudio is the body and instrument panel, which significantly improves the user experience. With RStudio you’ll have syntax highlighting, creation of projects, and much more.
The installation of RStudio is simpler than that of R. Direct your favourite browser to the Posit (formerly RStudio) website, click on Download RStudio and then on Download RStudio Desktop. After that, just select the installation file for the operating system on which you will work. This option is probably WINDOWS Vista 7/8/10. Note that RStudio is also available for Mac and Linux.
I emphasize that using RStudio is not essential to develop programs in R. Other interfaces are available and can be used. However, in my experience, RStudio is the interface that offers a vast range of features for the language and is widely used, which justifies its choice. If you want to explore other programming interfaces for R, one that I really enjoy and use is Microsoft’s VSCode.
1.6 Resources on the Web
The R community is vibrant and engaging. There are many authors, such as myself, who constantly release material about R in their blogs. It includes announcements of new packages, analyses of real-world datasets, curiosities, rants, and tutorials. R-Bloggers is a website that aggregates these blogs, making it easier for anyone to access and participate. I strongly recommend signing up for the R-Bloggers feed in RSS, Facebook or Twitter. Not only will you be informed of what is happening in the R community, but you will also learn a lot by reading other people’s code and articles.
Learning and using R can be a social experience. Several conferences and user groups are available in many countries. You can find the complete list in this link. I also suggest looking on social platforms for local R groups in your region.
1.7 Structure and Organization
This book presents a practical approach to using R in finance and economics. To get the most out of it, I suggest you first try to understand what the code does and, after that, use it on your own computer. Whenever you find a piece of code that you do not understand, go on and study it. At first, it might seem like a daunting task but, with time, be confident that the learning process will get a lot easier as the code blocks will start to make sense and connect to each other.
Learning to program in a new platform is like learning a foreign spoken language: using it in day-to-day problems is imperative to create fluency. All the code and data used in this book are available with the installation of package {afedR3} (M. S. Perlin 2023a) (see the preface for instructions on how to install it). I suggest you test the code on your computer and play with it, modifying the examples and checking the effect of changes in the outputs. Whenever you have a computational problem, try using R to solve it. You’ll stumble and make mistakes at first. But I guarantee that, soon enough, you’ll be able to write complex data tasks effortlessly.
Throughout the book, every demonstration of code will have two parts: the R code and its output. The output is nothing more than the textual result of the commands on the screen. All code inputs and outputs will be marked in the text with a special format. See the following example:
# create a list
this_list <- list('abc', 1:5, 'dec')

# print list
print(this_list)
R> [[1]]
R> [1] "abc"
R>
R> [[2]]
R> [1] 1 2 3 4 5
R>
R> [[3]]
R> [1] "dec"
For the previous chunk of code, lines this_list <- list('abc', 1:5, 'dec') and print(this_list) are actual commands given to R. The output of this simple piece of code is the on-screen presentation of the contents of object this_list.
The code can also be spatially organized using newlines. This is a common strategy around arguments of functions. The next chunk of code is equivalent to the previous one and will run the exact same way. Notice how we used a new line to vertically align the arguments of function list. You’ll soon see that, throughout the book, this type of vertical alignment is constantly used.
# create a list
this_list <- list('abc',
                  1:5,
                  'dec')

# print list
print(this_list)
R> [[1]]
R> [1] "abc"
R>
R> [[2]]
R> [1] 1 2 3 4 5
R>
R> [[3]]
R> [1] "dec"
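If you want to convince yourself that the layout of the code does not change the result, a quick check with the base R function `identical()` can help. This is only a small sketch; the object names `list_one_line` and `list_multi_line` are hypothetical.

```r
# single-line version of the call
list_one_line <- list('abc', 1:5, 'dec')

# same call, with one argument per line
list_multi_line <- list('abc',
                        1:5,
                        'dec')

# identical() returns TRUE when both objects have exactly the same contents
print(identical(list_one_line, list_multi_line))
```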
The code also follows a well-defined structure. One decision in writing computer code is how to name objects and how to structure the code itself. It is recommended to follow a clear pattern, so the code is easy to maintain over time and can be used and understood by others. For this book, a mixture of the author’s personal choices and the coding style suggested by Google was used. Readers, however, may choose the structure they find more efficient and aesthetically pleasing. Like many things in life, this is a choice. We will return to code structure in chapter 13.
1.8 Exercises
Q.3
Why is R special when compared to other programming languages, such as Python, C++, JavaScript and others?
Q.5
Consider the following alternatives about R and RStudio:
I - R was developed in 2018 and is an innovative and unstable project;
II - RStudio is an alternative programming language to R;
III - R is not compatible with different programming languages;
Which alternatives are correct?
Q.6
Once you have R and RStudio installed, head over to the CRAN package website and look for technologies you use in your work. For example, if you use Google Sheets extensively in your work, you will soon discover that there is a package in CRAN called googlesheets4 that interacts with spreadsheets in the cloud.
Q.8
Use Google to search for R groups in your region. Check if the meetings are frequent and, if you don’t have a major impediment, go to one of these meetings and make new friends.
Q.9
Go to the R-Bloggers website and look for a topic of interest to you, such as football (soccer) or investments. Read at least three of the blog posts you find.
Q.10
If you work in an institution with data infrastructure, talk to the person in charge of the IT department and verify what technologies are used. Check if, through R, it is possible to access all tables in the databases. For now, there is no need to write code. Just check if this possibility exists.