Mutate
We have learned enough to start applying multiple concepts together, such as the pipe operator, the summary statistic functions from the previous lesson, and, in this lesson, mutate. We will also learn about white space and how to use it.
The mutate function has to be one of the most used functions in the Tidyverse. Despite the confusing name, this function does not shapeshift or deform data! Rather, all it does is add a new column to the dataframe (or overwrite an existing column if the name you give is already in use).
Say we want to add another column to the iris dataframe, which we will name Sepal.Length_2, that is equal to twice the Sepal.Length. We can do so with the following mutate function.
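A minimal sketch of that call, assuming the dplyr package is installed and that the pipe operator in question is %>% (which dplyr makes available). The object name iris_doubled is only an illustrative choice, not something from the lesson.

```r
library(dplyr)  # provides mutate() and the %>% pipe

# Add a column named Sepal.Length_2 that holds twice the Sepal.Length
# (iris_doubled is a hypothetical name for the result)
iris_doubled <- iris %>% mutate(Sepal.Length_2 = Sepal.Length * 2)

# Look at the first few rows; the new column appears on the far right
head(iris_doubled)
```

The original iris dataframe is left untouched; the new column exists only in the object the result is saved to.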
Just as with the summarise function, mutate comes after the dataframe name and the pipe operator. Within the function's argument, the ordering goes: the new variable name on the left of the equal sign, and an operation on an existing variable on the right of the equal sign.
mutate(new_column = existing_column + c)
We can make multiple variables at the same time; we need only separate them with commas.
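For instance, two new columns can be created in a single call. This is a sketch assuming dplyr is loaded; Sepal.Width_2 and iris_two are hypothetical names picked for illustration.

```r
library(dplyr)

# Two new columns in one mutate() call, separated by a comma
iris_two <- iris %>% mutate(Sepal.Length_2 = Sepal.Length * 2, Sepal.Width_2 = Sepal.Width * 2)

head(iris_two)
```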
White space
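Unlike some other programming languages, such as Python, R does not treat indentation or extra spaces as part of the code. As such, we can make code more readable by pushing text to a new line and indenting so that similar code is stacked on top of each other. The one thing to watch is that a line should end where R can tell more is coming, such as after a pipe operator, a comma, or an open parenthesis.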
Because white space is not evaluated, code can be presented in easier-to-read formats that avoid horizontal scrolling. There is no hard limit to what can be put on one line, but we have found that keeping R code to a maximum of 80 characters per line makes it easier to read; the exact number that works for you depends on your screen size, resolution, and personal preference.
In the following code editor we have the same content as in the previous code editor; we have just added white space to make it easier to read. We added it by pressing the enter (aka return) key to make new lines, using the tab key to indent, and lining up the closing parenthesis with the first character of the function name.
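A sketch of what such a layout might look like, using a mutate call that creates two columns. It assumes dplyr is loaded; Sepal.Width_2 and iris_two are hypothetical names used only for illustration.

```r
library(dplyr)

# The pipe stays at the end of the first line so R knows the
# expression continues on the next line
iris_two <- iris %>%
  mutate(
    Sepal.Length_2 = Sepal.Length * 2,
    Sepal.Width_2  = Sepal.Width * 2
  )  # closing parenthesis lines up with the "m" in mutate

head(iris_two)
```

R reads this exactly as it would the same call written on a single line; only the readability changes.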
Piping multiple functions
Now, suppose we want to double the Sepal.Length and then also compute the mean of the modified variable. Naturally, we may accomplish this in two separate steps as illustrated below:
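The two-step version might look roughly like this, assuming dplyr is loaded. The names iris_2, two_step, and mean_Sepal.Length_2 are illustrative choices rather than names from the lesson.

```r
library(dplyr)

# Step 1: save the modified dataframe as an intermediate object
iris_2 <- iris %>% mutate(Sepal.Length_2 = Sepal.Length * 2)

# Step 2: compute the mean of the new column from that saved object
two_step <- iris_2 %>% summarise(mean_Sepal.Length_2 = mean(Sepal.Length_2))

two_step
```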
But there is a simpler way: chaining these two operations with the pipe operator.
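A sketch of the chained version, assuming dplyr is loaded and %>% is the pipe being used. The names chained and mean_Sepal.Length_2 are hypothetical.

```r
library(dplyr)

chained <- iris %>%
  mutate(Sepal.Length_2 = Sepal.Length * 2) %>%            # add the doubled column
  summarise(mean_Sepal.Length_2 = mean(Sepal.Length_2))    # then take its mean

chained
```

No intermediate dataframe is saved along the way; the output of mutate is passed straight into summarise.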
Here, we can finally see the efficiency of the pipe operator! By using two pipe operators in a single chain, we no longer need to save an intermediate object, and the code is more streamlined overall.
The iris dataframe on the left is channeled to the mutate function on the right. The mutate function then appends a new column, Sepal.Length_2, to the dataframe. This modified dataframe is subsequently conveyed to the summarise function via the second pipe operator. We have also added white space in this chain of operations to make the code easier to read.
Practice exercise
Fill in the rest of the code within the mutate function to find the standard deviation of Petal.Length + 1 by calling the summarise function on a column you will create called Petal.Length_2.