Conditional Expressions with Filter

In the previous lesson we used the select function to subset data by columns. Its counterpart is the filter function, which subsets data by rows.
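To make the column/row distinction concrete, here is a minimal sketch. It assumes the dplyr package is installed and uses R's built-in iris dataset (150 rows, 5 columns). The variable names cols and rows are purely illustrative.

```r
library(dplyr)

# select() keeps a subset of columns: every row survives
cols <- select(iris, Sepal.Length, Species)
dim(cols)   # 150 rows, 2 columns

# filter() keeps a subset of rows: every column survives
rows <- filter(iris, Species == "setosa")
dim(rows)   # 50 rows, 5 columns
```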

filter works with conditional expressions, which evaluate to booleans. We briefly mentioned booleans in the Datatypes lesson; to recap, a boolean (called a logical in R) can take only one of two values: TRUE or FALSE. Conditional expressions are how we get these booleans out of our data.

Conditional Expressions

A conditional expression is a logical statement that evaluates to either TRUE or FALSE. When filtering a dataframe, the expression is evaluated for every row, producing a mask of TRUE and FALSE values. filter then keeps only the rows where the condition is TRUE.

The first conditional expression we will look at is the equality expression, written ==, which checks whether two values are equal. For example, if the variable one holds the value 1, the expression one == 1 checks whether one is equal to 1, which is TRUE in this case. If the two values being compared differ, the expression evaluates to FALSE.
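A quick sketch of == outside of filter, using the variable one from the paragraph above. The vector widths and the name mask are illustrative only; they show how == applied to a vector produces one TRUE/FALSE per element, which is the kind of mask filter relies on.

```r
one <- 1

one == 1   # TRUE: the two values match
one == 2   # FALSE: the two values differ

# On a vector, == compares element by element,
# giving one TRUE/FALSE per element (a mask)
widths <- c(0.1, 0.2, 0.3)
mask <- widths == 0.2
mask           # FALSE TRUE FALSE
widths[mask]   # keeps only the element where the mask is TRUE: 0.2
```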

Below we show this and the other conditional expressions in the context of filter. The other expressions are:

  • greater than >
  • greater than or equal to >=
  • less than <
  • less than or equal to <=
  • not equal !=
  • included in %in%

Equality ==

The following subsets the data to only rows where Petal.Width is equal to 0.2. Note that we wrap the result in the head function so that only the first few rows are printed.
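A minimal sketch of this filter, assuming dplyr is installed and using the built-in iris data. The name width_02 is just an illustrative variable for the filtered result.

```r
library(dplyr)

# Keep only rows whose Petal.Width is exactly 0.2
width_02 <- filter(iris, Petal.Width == 0.2)

# head() prints only the first six rows
head(width_02)
```

As a general caution, exact == comparisons on decimal numbers can be thrown off by floating-point rounding, so take care when the value being compared comes from a calculation rather than directly from the data.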

We can also subset the data to only rows where Species is versicolor.
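A sketch of the same idea with a string, again assuming dplyr and the built-in iris data; the variable name versicolor is illustrative. Note the quotes around "versicolor": it is a string value, not a column name.

```r
library(dplyr)

# Compare the Species column against the string "versicolor"
versicolor <- filter(iris, Species == "versicolor")
head(versicolor)
```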

Greater than >

The greater than expression subsets the data to only rows where the column is greater than the value on the right hand side. In this example we subset the data to rows where Petal.Width is greater than 1.0. So, for example, 1.1, 1.01, and even the very close value 1.0000001 would all be included, but 1.0 would not.
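A sketch of both the boundary behaviour described above and the filter itself, assuming dplyr and the built-in iris data. The name wide is an illustrative variable.

```r
library(dplyr)

# The boundary values from the text: only 1.0 itself fails
c(1.1, 1.01, 1.0000001, 1.0) > 1.0   # TRUE TRUE TRUE FALSE

# Keep rows where Petal.Width is strictly greater than 1.0
wide <- filter(iris, Petal.Width > 1.0)
head(wide)
```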

Greater than or equal >=

The greater than or equal to expression works like >, but it also counts the value on the right hand side as a match. So in this case, rows where Petal.Width is exactly 1.0 would be included.
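A sketch contrasting >= with the previous example, assuming dplyr and the built-in iris data; wide_or_equal is an illustrative name. Because >= accepts everything > accepts plus exact matches, it can never return fewer rows.

```r
library(dplyr)

# With >=, the boundary value 1.0 now passes
c(1.1, 1.0, 0.9) >= 1.0   # TRUE TRUE FALSE

# Keep rows where Petal.Width is 1.0 or larger
wide_or_equal <- filter(iris, Petal.Width >= 1.0)
head(wide_or_equal)
```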

Less than < and Less than or equal <=

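These work like > and >= but in the opposite direction: < keeps rows where the column is strictly less than the value on the right hand side, and <= also keeps rows equal to it. Using them is left as the practice exercise at the end of this lesson.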

Not equal to !=

The not equal expression is the opposite of ==. Since Species == "versicolor" subsets the data to only rows where Species is versicolor, Species != "versicolor" subsets the data to rows where Species is anything other than versicolor.
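A sketch of != on the iris data, assuming dplyr is installed; not_versicolor is an illustrative name. Since iris has 50 rows of each species, removing versicolor should leave the 100 setosa and virginica rows.

```r
library(dplyr)

# Keep every row whose Species is not versicolor
not_versicolor <- filter(iris, Species != "versicolor")

# Only setosa and virginica should remain
table(not_versicolor$Species)
```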

Included in %in%

The %in% expression evaluates to TRUE whenever the value on the left hand side is included in the vector of items on the right hand side, which makes it quite versatile. In the following example we subset the data to rows where Species is either setosa or versicolor. To do so, we first make a vector of the strings to check for, then use it with filter and the %in% expression.
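A sketch of the two steps just described, assuming dplyr and the built-in iris data. The names species_to_keep and setosa_or_versicolor are illustrative.

```r
library(dplyr)

# Step 1: the vector of values we want to keep, built with c()
species_to_keep <- c("setosa", "versicolor")

# Step 2: a row is kept when its Species appears in species_to_keep
setosa_or_versicolor <- filter(iris, Species %in% species_to_keep)
head(setosa_or_versicolor)
```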

Of course, since there are only three species in the iris dataset, a simpler approach would have been to exclude virginica with !=. The %in% approach can be a lifesaver, though, when there are many more than three unique values!
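A sketch checking that the two approaches agree on iris, assuming dplyr is installed; via_in and via_not_equal are illustrative names. This equivalence only holds because iris has exactly three species; with more categories the two filters would differ.

```r
library(dplyr)

species_to_keep <- c("setosa", "versicolor")

via_in        <- filter(iris, Species %in% species_to_keep)
via_not_equal <- filter(iris, Species != "virginica")

# With only three species, both filters select the same rows
identical(via_in, via_not_equal)   # TRUE
```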

Practice exercise

Use filter to subset the iris dataframe to rows where Petal.Length is less than or equal to 3.0.