The Pipe Operator
We are kicking off the Tidyverse section of the course with a very important and widely used concept called the pipe operator. This operator is part of the magrittr package, which is included in the Tidyverse collection of packages. The pipe operator helps us to clearly and sequentially express multiple operations, making complex data cleaning and transformation tasks much more readable and manageable. As the pipe operator is so useful, we will be using it throughout the rest of the course.
The pipe operator is denoted by the symbol %>%. You might already be familiar with some basic operators like + for addition and * for multiplication. These operators work by taking the items immediately to their left and right as inputs. While operators and functions may seem similar because they both take inputs and return outputs, they are distinct in terms of their capabilities and uses.
For instance, the expression a + b involves the use of the addition operator +, which takes a and b as inputs and returns their sum. This is a straightforward operation. On the other hand, we could create a function sum(a, b) to perform the same task. However, functions offer more flexibility by allowing us to include additional actions, such as printing a statement, within the same block of code. This demonstrates the key difference between an operator and a function: an operator performs a single, predefined operation, while a function can be designed to perform multiple actions and operations.
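As a rough sketch of this difference, the snippet below adds two numbers first with the + operator and then with a small function that also prints a statement. The function name add_and_report and the values of a and b are made up for illustration; they are not part of the course material.

```r
a <- 2
b <- 3

# The addition operator: takes the values on its left and right.
a + b

# A hypothetical function (the name is ours) that performs the same
# addition but also prints a statement along the way.
add_and_report <- function(a, b) {
  total <- a + b
  message("Adding ", a, " and ", b)
  total
}

result <- add_and_report(a, b)
result
```

Both approaches return the same sum; the function simply has room for the extra step.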
The pipe operator, however, works slightly differently from the typical operators we just discussed. Instead of taking inputs from both its left and right, the pipe operator takes the output of the item on its left and uses it as the input for the function on its right. In doing so, it makes the code more readable and concise.
Let's see an example using the dim function on the iris dataset. The dim function returns the dimensions of a dataframe, where the first value indicates the number of rows, and the second value indicates the number of columns (we can think of this like its height and width).
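The example presumably looks something like the sketch below. It assumes the magrittr package is installed (loading the tidyverse also makes %>% available); the variable name iris_dims is ours and is only there so the result can be reused.

```r
library(magrittr)  # provides %>%

# Pass the iris dataframe into dim() with the pipe operator.
iris_dims <- iris %>% dim()
iris_dims  # 150 rows and 5 columns
```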
Okay! Let's break down what just happened. The output from the left side of the pipe operator was used as input for the function on the right side. In this case, there was no computation on the left side of the pipe operator, so the iris dataframe passed through unchanged. This means we could achieve the same result by simply writing dim(iris).
So, why use the pipe operator in this case? It might seem unnecessary from this example as it takes up more space and doesn't appear to make the code clearer or faster. However, the true power of the pipe operator shines when performing a sequence of operations. It helps make complex data cleaning and transformation tasks much more readable and understandable, as we will see in the upcoming lessons.
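To give a small taste of what a sequence looks like, here is a sketch that chains two steps: take the last ten rows of iris with tail, then ask for their dimensions. The choice of tail and the variable names nested and piped are ours, purely for illustration, and the code again assumes magrittr is installed.

```r
library(magrittr)  # provides %>%

# Nested calls are read from the inside out.
nested <- dim(tail(iris, 10))

# Piped calls are read from left to right, one step at a time.
# Here iris becomes the first argument of tail(), so tail(10) acts
# like tail(iris, 10), and that result is then passed to dim().
piped <- iris %>% tail(10) %>% dim()

identical(nested, piped)
```

Both versions give the same answer; the piped version simply lists the steps in the order they happen.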
Practice exercise
Use the pipe operator and the head function to display the first six rows of the iris dataset. Note that the iris dataset is already loaded in the session.