Data Structures
We have been looking at dataframes up to now. This is the data structure we will use the most in this course; however, there are two more that we will use: lists and vectors.
Lists
Have you ever created a grocery list or a to-do list?
In R, such collections are aptly represented by a list.
A list is an ordered sequence of items. True to R's flexible nature,
the elements in a list can be of any datatype. This means you can mix
and match different datatypes, even including other lists within a list.
Lists are fundamental tools in R and are defined using the list()
function. Below are examples of lists that contain varied datatypes.
We'll delve deeper into these datatypes in the datatypes lesson:
- list_1: is a list with a named element x
- list_2: is a list with two numbers.
- list_3: is a list with both numbers and text.
A quick note: a list has a nested structure, so it does not print out
very compactly. We mark the separation between the lists by adding a
print call with a label before each one, since the list itself is not
handled well within a print or cat function.
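The three lists described above might be created as in the sketch below. The values are hypothetical, since the lesson only states what kind of data each list holds; list_3 is given 4 items to match the length mentioned later in this lesson.

```r
# Hypothetical values chosen for illustration only.
list_1 <- list(x = 10)                    # one named element, x
list_2 <- list(1, 2)                      # two numbers
list_3 <- list(1, 2, "apple", "banana")   # numbers and text mixed

# A labelled print call before each list keeps the output readable.
# Typing a list's name on its own line displays the list.
print("list_1:")
list_1
print("list_2:")
list_2
print("list_3:")
list_3
```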
We can determine the number of elements in a list using the
built-in function length().
An empty list has a length of zero, a list with one
element has a length of one, and so on.
We add the escape sequence \n, which we learned in the previous lesson,
to the end of each cat call so that the printout of the next function
call starts on a new line.
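As a minimal sketch of the points above, the following counts the elements of an empty list and of a two-element list. The variable names empty_list and list_2 and the values are assumptions made for illustration.

```r
empty_list <- list()      # a list with no elements
list_2 <- list(1, 2)      # hypothetical list with two numbers

# The trailing "\n" ends the line so the next printout starts on a new line.
cat("Length of empty_list:", length(empty_list), "\n")   # length is 0
cat("Length of list_2:", length(list_2), "\n")           # length is 2
```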
Think of a book's table of contents. Each chapter or section has a specific page number associated with it. If we want to read a particular chapter, we look up its page number in the table of contents and then turn to that page. In this analogy, the page number is like an "index" that tells us where to find the chapter within the book.
Similarly, in programming, a list can be thought of as a book, and
each item in the list is like a chapter. The "index" tells us the
position of an item in the list, helping us to quickly access or
modify it. For lists without named elements, such as list_2
and list_3, R starts counting at 1: the first item is at index 1,
the second item is at index 2, and so on up to the last item in the
list at index length(list). For example, in list_3 in the first code
editor, there are 4 items and thus its length is 4. Therefore, the last
index in the list is also 4. We can access the value at a specific
index by using double square brackets [[ ]] and the index number.
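The indexing rules above can be sketched as follows. The contents of list_3 are hypothetical; only its length of 4 comes from the lesson.

```r
list_3 <- list(1, 2, "apple", "banana")   # hypothetical contents, 4 items

first_item <- list_3[[1]]                 # R starts counting at 1
last_item  <- list_3[[length(list_3)]]    # the last index equals the length

print(first_item)
print(last_item)

# The same brackets can be used to modify an item in place.
list_3[[2]] <- 20
print(list_3[[2]])
```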
For lists with named elements, such as list_1, we can
access the value of a named element by using the dollar sign $
and the name of the element, just as we
accessed a column of data in a dataframe in the previous lesson.
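A short sketch of access by name, assuming list_1 holds a single hypothetical value under the name x:

```r
list_1 <- list(x = 10)    # hypothetical value stored under the name x

print(list_1$x)           # access by name with the dollar sign
print(list_1[["x"]])      # double brackets also accept the name as text
```

Both forms return the stored value itself rather than a smaller list.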
Vectors
In R, the technical definition of a vector is a one-dimensional
array that can hold elements of a single datatype. A simpler way to
think about it is that a vector is a simpler version of the list,
where all of the elements must be the same datatype.
Let's see an example. We can make a vector with the c() function:
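A minimal sketch of creating vectors with c(). The names and values are made up for illustration; the last line shows what happens when datatypes are mixed.

```r
prices <- c(3.5, 4.0, 2.75)               # a vector of numbers
drinks <- c("Espresso", "Latte", "Mocha") # a vector of text

print(prices)
print(drinks)

# A vector holds a single datatype, so mixing a number with text
# converts every element to text.
mixed <- c(1, "two")
print(class(mixed))   # "character"
```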
The index of a vector works the same way as the index of a list.
The only difference is that we use single square brackets [ ]
instead of double square brackets.
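For instance, with a hypothetical vector of prices, single square brackets pick out elements by position:

```r
prices <- c(3.5, 4.0, 2.75)             # hypothetical values

first_price <- prices[1]                # first element
last_price  <- prices[length(prices)]   # last element

print(first_price)
print(last_price)
```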
Vectors can be modified after they are created. To add elements, pass the existing vector together with the new elements to c(), which collects them into a new vector. To remove an item, index the vector with the position of the element we want to remove, with a minus sign in front of it. In both cases the result is a new vector, so assign it back to the variable to keep the change.
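Adding and removing elements might look like the sketch below, again using a hypothetical prices vector.

```r
prices <- c(3.5, 4.0, 2.75)   # hypothetical starting values

# Add an element: combine the old vector and the new value with c().
prices <- c(prices, 5.25)
print(prices)                 # 3.50 4.00 2.75 5.25

# Remove the element at position 2 with a negative index.
prices <- prices[-2]
print(prices)                 # 3.50 2.75 5.25
```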
DataFrames
Let's take what we learned about indexing and apply it to dataframes.
Each column in a dataframe can be considered a list or vector, where
all the columns have the same length. For example, in the dataframe below,
Coffee cannot have 4 rows while Price has 3 rows; they need the same
number of rows to align. We can access a column of data in a dataframe
using the dollar sign $ and the name of the column.
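A small sketch of such a dataframe follows. The Coffee and Price column names come from the text above; the name coffee_df and the values in each column are assumptions for illustration.

```r
# Both columns have 3 entries, so the rows line up.
coffee_df <- data.frame(
  Coffee = c("Espresso", "Latte", "Mocha"),
  Price = c(2.5, 3.75, 4.0),
  stringsAsFactors = FALSE    # keep Coffee as plain text in older R versions
)

print(coffee_df$Coffee)       # the Coffee column as a vector
print(coffee_df$Price)        # the Price column as a vector
```

If one column were given 4 values and the other 3, data.frame() would stop with an error about differing numbers of rows.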
We can also access a single element of a dataframe by its row and
column using square brackets [ ], giving the row first and the column
second, separated by a comma. The row is indexed in the
same way as we did with vectors.
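As a sketch, using a hypothetical coffee_df with made-up values, the row goes before the comma and the column after it:

```r
coffee_df <- data.frame(
  Coffee = c("Espresso", "Latte", "Mocha"),   # hypothetical values
  Price = c(2.5, 3.75, 4.0),
  stringsAsFactors = FALSE
)

# Row 2 of the Price column, selected by column name and by column position.
latte_price <- coffee_df[2, "Price"]
same_price  <- coffee_df[2, 2]
print(latte_price)

# Leaving the column blank after the comma returns the whole row.
first_row <- coffee_df[1, ]
print(first_row)
```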