Data Structures

We have been looking at dataframes up till now. This is indeed the data structure we will use the most in this course; however, there are two more that we will use: lists and vectors.

Lists

Have you ever created a grocery list or a to-do list? In R, such collections are aptly represented by a list. A list is an ordered sequence of items. True to R's flexible nature, the elements in a list can be of any datatype. This means you can mix and match different datatypes, even including other lists within a list. Lists are fundamental tools in R and are defined using the list function. Below are examples showcasing lists that contain varied datatypes. We'll delve deeper into these datatypes in the datatypes lesson:

  • list_1: is a list with a named element x
  • list_2: is a list with two numbers.
  • list_3: is a list with both numbers and text.

A quick note: a list has a more complicated structure than a single value, so it does not print out very compactly. Because a list is not handled well when combined with label text inside a single print or cat call, we mark the separation between the lists by adding a separate print call before each one.
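The three lists described above can be sketched as follows. The values are illustrative assumptions, chosen only to match the descriptions (a named element x, two numbers, and four items mixing numbers and text).

```r
# Illustrative values only; they are assumptions, not the lesson's original data.
list_1 <- list(x = 5)            # one named element, x
list_2 <- list(1, 2)             # two numbers
list_3 <- list(1, 2, "a", "b")   # numbers and text, four items in total

# A separate print call before each list labels the output.
print("list_1:")
print(list_1)
print("list_2:")
print(list_2)
print("list_3:")
print(list_3)
```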

We can determine the number of elements in a list using the built-in function length. An empty list has a length of zero, a list with one element has a length of one, and so on. We add the escape sequence \n we learned in the previous lesson to the end of each cat call so that the printout of the next function call starts on a new line.
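A minimal sketch of length in use, assuming a hypothetical empty list and the same illustrative four-item list_3:

```r
# Hypothetical example lists; the values are assumptions for illustration.
empty_list <- list()
list_3 <- list(1, 2, "a", "b")

# "\n" ends each line so the next printout starts on a new line.
cat(length(empty_list), "\n")  # length of an empty list is 0
cat(length(list_3), "\n")      # four elements, so the length is 4
```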

Think of a book's table of contents. Each chapter or section has a specific page number associated with it. If we want to read a particular chapter, we look up its page number in the table of contents and then turn to that page. In this analogy, the page number is like an "index" that tells us where to find the chapter within the book.

Similarly, in programming, a list can be thought of as a book, and each item in the list is like a chapter. The "index" tells us the position of an item in the list, helping us to quickly access or modify it. For lists without named elements, such as list_2 and list_3, R starts counting at 1: the first item is at index 1, the second item is at index 2, and so on up to the last item at index length(list). For example, list_3 in the first code editor has 4 items, so its length is 4 and its last index is also 4. We can access the value at a specific index by using the double square brackets [[ ]] and the index number.
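As a short sketch of indexing with double square brackets, again assuming illustrative values for list_3:

```r
# Illustrative list; the values are assumptions.
list_3 <- list(1, 2, "a", "b")

first_item <- list_3[[1]]               # indexing starts at 1
last_item <- list_3[[length(list_3)]]   # the last index equals the length

print(first_item)  # [1] 1
print(last_item)   # [1] "b"

# The same brackets can be used to modify an item in place.
list_3[[2]] <- 20
print(list_3[[2]])  # [1] 20
```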

For lists with named elements, such as list_1, we can access the value of a named element by using the dollar sign $ and the name of the element, just as we accessed a column of data in a dataframe in the previous lesson.
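A brief sketch of access by name, assuming list_1 holds an illustrative value of 5 under the name x. Passing the name as a string inside double square brackets is shown as an equivalent alternative that the lesson text does not cover.

```r
# Illustrative named list; the value 5 is an assumption.
list_1 <- list(x = 5)

print(list_1$x)       # [1] 5
print(list_1[["x"]])  # same element, accessed by name inside [[ ]]
```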

Vectors

In R, the technical definition of a vector is a one-dimensional array that can hold elements of a single datatype. A simpler way to think about it is that a vector is a simpler version of the list, where all of the elements must be the same datatype. Let's see an example. We can make a vector with the c function:
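A minimal sketch of creating vectors with c, using made-up values. The last example illustrates the single-datatype rule: when numbers and text are mixed, R converts every element to text.

```r
# Hypothetical vectors; names and values are assumptions for illustration.
prices <- c(3.5, 4, 2.75)       # a numeric vector
coffees <- c("latte", "mocha")  # a character vector

print(prices)
print(coffees)

# Mixing datatypes does not fail; everything is converted to one type.
mixed <- c(1, "a")
print(class(mixed))  # [1] "character"
```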

The index of a vector works the same way as the index of a list. The only difference is that we use single square brackets [ ] instead of double square brackets.
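For instance, with a hypothetical numeric vector, single square brackets pick out elements by position:

```r
# Hypothetical vector; the values are assumptions.
prices <- c(3.5, 4, 2.75)

print(prices[1])               # [1] 3.5
print(prices[length(prices)])  # [1] 2.75
```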

Vectors can be modified after their creation. To add elements to a vector, pass the existing vector together with the new elements to the c function, which collects them into a new vector. To remove an item, index the vector with the position of the element we want to remove, with a minus sign in front of it. In both cases the result is a new vector, so assign it back to the variable to keep the change.
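A short sketch of both operations on a hypothetical vector. Each step builds a new vector, which is assigned back to the same name:

```r
# Hypothetical vector; the values are assumptions.
prices <- c(3.5, 4, 2.75)

# Add an element: combine the old vector and the new value with c.
prices <- c(prices, 5)
print(length(prices))  # [1] 4

# Remove the second element: a negative index drops that position.
prices <- prices[-2]
print(length(prices))  # [1] 3
```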

DataFrames

Let's take what we learned about indexing and apply this to dataframes. Each column in a dataframe can be considered as a vector, and all the columns must have the same length. For example, in a dataframe with the columns Coffee and Price, Coffee cannot have 4 rows while Price has 3 rows; they need the same number of rows to align. We can access a column of data in a dataframe using the dollar sign $ and the name of the column. We can also access a single element using the square brackets [ ] with the row index first and the column second, separated by a comma, indexing the row in the same way as we did with vectors.
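The following sketch assumes a small dataframe with the Coffee and Price columns mentioned above. The name coffee_df and the row values are made up for illustration.

```r
# Hypothetical dataframe; the name coffee_df and its values are assumptions.
coffee_df <- data.frame(
  Coffee = c("latte", "mocha", "espresso"),
  Price = c(3.5, 4, 2.75)
)

# Access a whole column with $ and the column name.
print(coffee_df$Price)

# Access one element with [row, column].
print(coffee_df[2, "Price"])  # [1] 4
print(coffee_df[2, 2])        # same element, column given by position
```

Building the dataframe from columns of different lengths, such as 4 coffees and 3 prices, would make data.frame stop with an error.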