Digi Cafe - Beginner Data Science in R

We are kicking off the Tidyverse section of the course with a very important and widely used concept called the pipe operator. This operator is part of the magrittr package, which is included in the Tidyverse collection of packages. The pipe operator helps us to clearly and sequentially express multiple operations, making complex data cleaning and transformation tasks much more readable and manageable. As the pipe operator is so useful, we will be using it throughout the rest of the course.

The pipe operator is denoted by the symbol %>%. You might already be familiar with some basic operators like + for addition and * for multiplication. These operators work by taking the items immediately to their left and right as inputs. Operators and functions are similar in that both take inputs and return outputs, but they differ in how we write and use them.

For instance, the expression a + b uses the addition operator +, which takes a and b as inputs and returns their sum. This is a straightforward operation. On the other hand, we could call the built-in function sum(a, b) to perform the same task. If we write a function of our own, we gain more flexibility, because its body can include additional actions, such as printing a statement, within the same block of code. (Behind the scenes, R implements operators as functions too, but that detail does not matter for this course.) The practical difference is that an operator like + performs a single, predefined operation written between its inputs, while a function we define can be designed to perform multiple actions and operations.
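The comparison above can be sketched in a few lines of R. The values of a and b and the helper add_and_report are made up for illustration; they are not part of the course material.

```r
a <- 2
b <- 3

# The + operator sits between its two inputs
a + b
# [1] 5

# The built-in sum() function takes the same inputs as arguments
sum(a, b)
# [1] 5

# A function we write ourselves can do more than one thing.
# add_and_report is a hypothetical helper, named here only for illustration.
add_and_report <- function(x, y) {
  total <- x + y
  message("Adding ", x, " and ", y)  # an extra action: print a statement
  total                              # the value the function returns
}

result <- add_and_report(a, b)
result
# [1] 5
```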

The pipe operator, however, works slightly differently from the typical operators we just discussed. Instead of combining the values on its left and right, the pipe operator takes the output of the expression on its left and passes it as the first argument to the function on its right. Because the operations are written in the order they happen, the code becomes easier to read.
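As a small sketch of this behaviour, each piped line below gives the same result as the ordinary function call above it. The numbers are arbitrary, and the example assumes the magrittr package is installed (loading the tidyverse also makes %>% available).

```r
library(magrittr)  # provides the %>% operator

# The value on the left becomes the first argument of the function on the right
sqrt(16)
# [1] 4
16 %>% sqrt()
# [1] 4

# Any further arguments stay inside the parentheses
round(3.14159, 2)
# [1] 3.14
3.14159 %>% round(2)
# [1] 3.14
```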

Let's see an example using the dim function on the iris dataset. The dim function returns the dimensions of a dataframe, where the first value indicates the number of rows, and the second value indicates the number of columns (we can think of this like its height and width).
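A minimal version of this example might look like the following. It assumes magrittr is installed; the variable names piped and direct are only for illustration.

```r
library(magrittr)  # provides %>%; loading the tidyverse also makes it available

# iris ships with R, so it is available without any extra loading
piped <- iris %>% dim()
piped
# [1] 150   5

# The same call written without the pipe
direct <- dim(iris)
identical(piped, direct)
# [1] TRUE
```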

Okay! Let's break down what happens when we run iris %>% dim(). The output from the left side of the pipe operator is used as input for the function on the right side. In this case, there is no computation on the left side of the pipe operator, so the iris dataframe passes through unchanged. This means we could achieve the same result by simply writing dim(iris).

So, why use the pipe operator in this case? It might seem unnecessary from this example as it takes up more space and doesn't appear to make the code clearer or faster. However, the true power of the pipe operator shines when performing a sequence of operations. It helps make complex data cleaning and transformation tasks much more readable and understandable, as we will see in the upcoming lessons.
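To give a rough idea of what a sequence of operations looks like, here is a short sketch that uses only base R functions together with the pipe. The choice of subset and nrow is an assumption made for illustration; the upcoming lessons may use different functions.

```r
library(magrittr)

# Nested calls are read from the inside out
nested <- nrow(subset(iris, Species == "setosa"))

# Piped calls are read from top to bottom, one step per line
piped <- iris %>%
  subset(Species == "setosa") %>%  # keep only the setosa rows
  nrow()                           # then count them

piped
# [1] 50
identical(nested, piped)
# [1] TRUE
```

Both versions do the same work. The piped version simply lists the steps in the order they are carried out.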

Practice exercise

Use the pipe operator and the head function to display the first six rows of the iris dataset. Note that the iris dataset is already loaded in the session.
