Libraries and Packages

Let's inspect and break down the first two lines of code we saw in the previous lesson.

Both of these lines load things that other people have created into our R environment. The first line loads a package called tidyverse. Loading a package gives us access to all of its contents. The second line loads the iris dataset.

Let's look at the first line. The keyword in the first line is library. However, to understand what a library is, we also need to learn what a package and a function are. Here are the definitions of these three things:

A library is the location (a folder on our computer) where installed packages are stored.
A package is a collection of functions, data and documentation that extends the functionality of R.
A function transforms one or more inputs by following a set of instructions and produces an output.
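To make the three definitions concrete, here is a minimal sketch that touches each one in turn. It uses the stats package only because it ships with R. The variable names are illustrative and do not come from the lesson.

```r
# Library: the folder(s) on this computer where installed packages live.
library_paths <- .libPaths()
print(library_paths)

# Package: "stats" ships with R. Load it, then peek at a few of the
# functions it contains.
library(stats)
print(head(ls("package:stats")))

# Function: median() takes an input (a vector of numbers), follows its
# instructions, and produces an output (the middle value).
values <- c(2, 4, 6)
middle <- median(values)
print(middle)  # 4
```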

Let's look at the diagram below to see how these three things relate.

Libraries and packages diagram

Let's start on the inside and work our way out. In the diagram we see three functions. Each function may serve a different purpose, take different inputs and produce different outputs. We will cover functions in more detail in the next lesson, so for now we will just note that a function is often designed to do one thing. As a result, we may need to use multiple functions to accomplish our task. For this reason, multiple functions are collected together into a package, which is why the package is the middle oval in the diagram.
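As a small illustration of needing more than one function for a single task, the sketch below finds the average petal length in the iris dataset and rounds it. The task and the variable names are assumptions chosen for this example. Both functions used here come from the base package that is always available in R.

```r
# Each function does one job; the task needs both of them.
petal_lengths <- iris$Petal.Length           # input: 150 measurements

average_length <- mean(petal_lengths)        # job 1: compute the average
rounded_average <- round(average_length, 2)  # job 2: round to 2 decimal places

print(rounded_average)
```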

A library is the location where a package is located. To make an analogy: a package is like a book, and a library is the shelf where the book is stored. We might "buy" a new book (install a package) and put it on our shelf (in a library). When we want to read it (use the package's functions), we need to take it off the shelf (by using the library function to load it).

R has a central software repository called the Comprehensive R Archive Network (CRAN). This is where the tidyverse package and many other packages are stored. So there is actually one more hidden line of code that is run before these two: install.packages("tidyverse"). When this line is run, R searches this online repository for a package called tidyverse. Once it finds the package, it saves it into a library on our own computer, where R can find and access it more easily. A package only needs to be installed once, but it has to be loaded with library in every new R session.
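The install-once, load-every-session pattern can be sketched as follows. Note that install_if_missing is a hypothetical helper written for this example, not a function that comes with R or the tidyverse. The sketch runs it on the stats package, which ships with R, so nothing is actually downloaded.

```r
# Hypothetical helper: install a package from CRAN only if it is not
# already sitting in one of our libraries.
install_if_missing <- function(pkg) {
  already_installed <- requireNamespace(pkg, quietly = TRUE)
  if (!already_installed) {
    # Downloads the package and saves it into a library on this computer.
    install.packages(pkg)
  }
  invisible(already_installed)
}

# "stats" is already installed, so the download step is skipped.
was_installed <- install_if_missing("stats")
print(was_installed)

# Installing is done once; loading is done in every new session.
library(stats)
```

For this course, the equivalent calls would be install_if_missing("tidyverse") followed by library(tidyverse).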

Now that we have this background information, let's look at the first two lines of code again. The first line loads the tidyverse package into R so the functions within the package can be used. We will learn all about the tidyverse in the Tidyverse section of this course.

The second line loads the iris dataset we have been looking at into R's environment. This is also how we have been loading the dataset in the code editors in the previous lessons. Throughout the courses here at Digi Cafe, we will load the packages and data for you, so you can focus on learning the concepts and skills. However, knowing what environment the code runs in provides useful context for understanding the code. You can always find out what environment the code is running in by checking the Code environment dropdown menu at the top of any lesson with a code editor.
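Here is a rough sketch of what those two lines might look like when run on your own computer. The exact form of the second line is an assumption (data(iris) is one common way to load a built-in dataset), and the tidyverse line is guarded so the sketch still runs where the package has not been installed.

```r
# Line 1: load the tidyverse package, if it is installed on this computer.
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
}

# Line 2 (assumed form): load the built-in iris dataset.
data(iris)

# Check what was loaded: 150 rows (flowers) and 5 columns.
print(dim(iris))
print(head(iris, 3))

# See which packages are currently attached in this R session.
print(search())
```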

Now that we understand what a package and a library are, let's finish by learning about the third item, functions, in our next lesson!