Digi Cafe - Beginner Data Science in R

We have now learned enough to start applying several concepts together: the pipe operator, the summary statistic functions from the previous lesson, and, in this lesson, mutate. We will also learn about white space and how to use it.

The mutate function is one of the most used functions in the Tidyverse. Despite the confusing name, this function does not shapeshift or deform data! Rather, it adds a new column to the dataframe (or overwrites an existing column if the new name matches one that is already there).

Say we want to add another column to the iris dataframe, which we will name Sepal.Length_2, that is equal to twice the Sepal.Length. We can do so with the following mutate function.
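As a minimal sketch of that call, assuming dplyr is installed and that the lesson uses the magrittr pipe `%>%` (the native `|>` pipe would work the same way here); the object name `iris_doubled` is only illustrative:

```r
library(dplyr)

# iris ships with base R; mutate() comes from dplyr.
# New column name on the left of "=", operation on the right.
iris_doubled <- iris %>% mutate(Sepal.Length_2 = Sepal.Length * 2)

# Show the first rows, including the new column at the end
head(iris_doubled)
```

The original iris dataframe is left unchanged; the new column exists only in the object the result is assigned to.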

Just as with the summarise function, mutate comes after the dataframe name and the pipe operator. Within the function's arguments, the ordering goes: the new variable name on the left of the equal sign, and the operation on an existing variable on the right of the equal sign.

We can make multiple variables at the same time; we need only separate them with a comma.
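For example, a sketch that creates two columns in one mutate call. The second column, `Sepal.Width_2`, and the object name `iris_multi` are made up for illustration, and the `%>%` pipe is assumed:

```r
library(dplyr)

# Two new columns in a single mutate() call, separated by a comma
iris_multi <- iris %>% mutate(Sepal.Length_2 = Sepal.Length * 2, Sepal.Width_2 = Sepal.Width * 2)

head(iris_multi)
```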

White space

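Unlike some programming languages, such as Python, R does not treat indentation or extra spaces as part of its syntax. As such, we can make code more readable by pushing text to a new line and indenting so that similar code is stacked on top of each other. The one thing to watch is where a line breaks: R stops reading as soon as an expression is complete, so a pipe operator must sit at the end of a line, not at the start of the next one.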

Because R does not evaluate white space, code can be laid out in easier-to-read formats that avoid horizontal scrolling. There is no hard limit to what can be put on one line, but we have found that keeping R code to a maximum of 80 characters per line makes it easier to read; the exact number that works for you depends on your screen size, resolution, and personal preference.

In the following code editor we have the same content as in the previous code editor; we have just added white space to make it easier to read. We added the white space by pressing the enter (aka return) key to make new lines, using the tab key to indent, and lining up the closing parenthesis with the first character of the function name.
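A sketch of what such a layout might look like for a mutate call that creates two columns. The column `Sepal.Width_2` and the object name `iris_multi` are illustrative, and the `%>%` pipe is assumed:

```r
library(dplyr)

# One argument per line, indented inside mutate();
# the closing parenthesis lines up with the "m" of mutate.
# The pipe stays at the END of its line so R keeps reading.
iris_multi <- iris %>%
  mutate(
    Sepal.Length_2 = Sepal.Length * 2,
    Sepal.Width_2  = Sepal.Width * 2
  )

head(iris_multi)
```

R evaluates this exactly as it would if the whole call sat on a single line; only the readability changes.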

Piping multiple functions

Now, suppose we want to double the Sepal.Length and then also compute the mean of the modified variable. Naturally, we may accomplish this in two separate steps as illustrated below:
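A sketch of the two-step approach, assuming the `%>%` pipe; the object names `iris_doubled` and `iris_mean` and the summary column name `mean_Sepal.Length_2` are illustrative:

```r
library(dplyr)

# Step 1: add the doubled column and save the intermediate dataframe
iris_doubled <- iris %>% mutate(Sepal.Length_2 = Sepal.Length * 2)

# Step 2: summarise the saved dataframe
iris_mean <- iris_doubled %>% summarise(mean_Sepal.Length_2 = mean(Sepal.Length_2))

iris_mean
```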

But there is a simpler way by chaining these two operations using the pipe operator.
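A sketch of the chained version, again assuming the `%>%` pipe; the names `chained_mean` and `mean_Sepal.Length_2` are illustrative:

```r
library(dplyr)

# mutate() and summarise() in one chain: no intermediate dataframe is saved,
# only the final one-row summary
chained_mean <- iris %>%
  mutate(Sepal.Length_2 = Sepal.Length * 2) %>%
  summarise(mean_Sepal.Length_2 = mean(Sepal.Length_2))

chained_mean
```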

Here, we can finally see the efficiency of the pipe operator! By using two pipe operators in a single chain, we no longer need to save an intermediate object, and the code is more streamlined overall.

The iris dataframe on the left is channeled to the mutate function on the right. The mutate function then appends a new column, Sepal.Length_2, to the dataframe. This modified dataframe is subsequently conveyed to the summarise function via the second pipe operator. We have also added white space in this chain of operations to make the code easier to read.

Practice exercise

Fill in the rest of the code within the mutate function to find the standard deviation of Petal.Length + 1. Create a column called Petal.Length_2, then call the summarise function on it.
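Once you have tried it yourself, you can compare against one possible solution. This is a sketch, not necessarily the exact code the exercise expects; the `%>%` pipe and the names `petal_sd` and `sd_Petal.Length_2` are assumptions:

```r
library(dplyr)

# Create Petal.Length_2, then pass the result to summarise()
petal_sd <- iris %>%
  mutate(Petal.Length_2 = Petal.Length + 1) %>%
  summarise(sd_Petal.Length_2 = sd(Petal.Length_2))

petal_sd
```

Adding a constant shifts every value by the same amount, so the result should match the standard deviation of the original Petal.Length column.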
