Conditional Expressions with Filter
In the previous lesson we saw the select
function
used to subset data across columns. The counterpart to this
is the filter
function, which is used to subset
data across rows.
filter
operates on conditional expressions. These
conditional expressions in turn evaluate to booleans. We briefly
mentioned booleans in the Datatypes lesson.
To refresh, booleans are a datatype that can only be one of two values:
TRUE
or FALSE
. To get these booleans from
data we need to use something called conditional expressions.
Conditional Expressions
A conditional expression is a logical statement that evaluates to
either TRUE
or FALSE
; in the context of filtering
dataframes, this essentially creates a mask of TRUE
and
FALSE
values which is used to filter the dataframe,
selecting only the rows where the condition is TRUE
.
The first conditional expression we will look at is the equal expression.
This is denoted by ==
. And it checks whether
two values are equal. For example, if we have a variable
one with a value of 1, the expression
one == 1
checks if one is equal to
1, which is TRUE
in this case.
If the two values being compared are not the same, the expression
will evaluate to FALSE
.
We'll next give examples of this and other conditional expressions
in context of filter
. Other expressions include:
- greater than
>
- greater than or equal to
>=
- less than
<
- less than or equal to
<=
- not equal
!=
- included in
%in%
Equality ==
The following subsets the data to only rows where Petal.Width
is equal to 0.2. Note that we are adding on a head
function here so we are not printing out too much in the output.
We can also subset the data to only rows where Species is versicolor.
Greater than >
The greater expression subsets the data to only rows where the column is greater than the value on the right hand side. In this example we subset the data to rows where Petal.Width is greater than 1.0. So for example, 1.1, 1.01, and even the very close value 1.0000001 would all be included, but 1.0 would not be.
Greater than or equal >=
The greater than or equal to expression works the same as
>
, however it also includes the value on the
right hand side as within the valid range.
So in this case, 1.0 would be included.
Less than <
and Less than or equal <=
These work the same as >
and >=
but in the opposite direction. These are included as a practice exercise.
Not equal to !=
The not equal expression is the opposite of ==
.
So since Species == "versicolor"
subsets the data to
only rows where species is equal to versicolor,
Species != "versicolor"
will subset the data to
Species that are not equal to versicolor.
Included in %in%
The %in% expression evaluates to true whenever the content on the
left hand side is included in the list of items on the right hand side.
As such it is quite a versatile expression.
In the following example we subset the data to rows where Species
is either setosa or versicolor. To do so we first make
the list of strings to check for, then we use the filter
function and %in%
expression.
Of course, since there are only three species in the iris dataset an easier way to have done this would have been to exclude virginica. This method can be a life saver though if there were many more than three unique items!
Practice exercise
Use filter
to subset the iris dataframe to rows where
Petal.Length is less than or equal to 3.0.