Conditional Expressions with Query
In the previous lesson we saw the loc
method
used to subset data across columns. The counterpart to this
is the query
method, which is used to subset
data across rows. In addition, bracket notation as we learned in
the previous lesson may be used when we index the row instead
of the column.
query
operates on conditional expressions. These
conditional expressions in turn evaluate to booleans. We briefly
mentioned booleans in the Datatypes lesson.
To refresh, booleans are a datatype that can only be one of two values:
True
or False
. To get these booleans from
data we need to use something called conditional expressions.
Conditional Expressions
A conditional expression is a logical statement that evaluates to
either True
or False
; in the context of filtering
dataframes, this essentially creates a mask of True
and
False
values which is used to filter the dataframe,
selecting only the rows where the condition is True
.
The first conditional expression we will look at is the equal expression.
This is denoted by ==
. And it checks whether
two values are equal. For example, if we have a variable
one with a value of 1, the expression
one == 1
checks if one is equal to
1, which is True
in this case.
If the two values being compared are not the same, the expression
will evaluate to False
.
We'll next give examples of this and other conditional expressions
in context of filter
. Other expressions include:
- greater than
>
- greater than or equal to
>=
- less than
<
- less than or equal to
<=
- not equal
!=
- included in
isin
Equality ==
The following subsets the data to only rows where Petal.Width
is equal to 0.2. Note that we are adding on a head
function here so we are not printing out too much in the output.
Note: query
accepts a string conditional expression. It will
know column names in the dataframe, however some columns that have
special characters (such as a period) need to be accessed using a backtick.
We can also subset the data to only rows where Species is versicolor.
We can achieve the same filtering using bracket notation. Inside the
brackets, we are creating a boolean list that will filter the
dataframe by rows where the condition is True
.
Greater >
The greater expression subsets the data to only rows where the column is greater than the value on the right hand side. In this example we subset the data to rows where Petal.Width is greater than 1.0. So for example, 1.1, 1.01, and even the very close value 1.0000001 would all be included, but 1.0 would not be.
Greater than or equal >=
The greater than or equal to expression works the same as
>
, however it also includes the value on the
right hand side as within the valid range.
So in this case, 1.0 would be included.
Less than and Less than or equal < and <=
These work the same as >
and >=
but in the opposite direction. These are included as a practice exercise.
Not equal to !=
The not equal expression is the opposite of ==
.
So since Species == "versicolor"
subsets the data to
only rows where species is equal to versicolor,
Species != "versicolor"
will subset the data to
Species that are not equal to versicolor.
Included in isin
The isin method evaluates to true whenever the content on the
left hand side is included in the list of items passed as an argument to the method.
As such it is quite a versatile expression.
In the following example we subset the data to rows where Species
is either setosa or versicolor. To do so we first make
the list of strings to check for, then we use the
isin
method.
Of course, since there are only three species in the iris dataset an easier way to have done this would have been to exclude virginica. This method can be a life saver though if there were many more than three unique items!
Practice exercise
Use query
to subset the iris dataframe to rows where
Petal.Length is less than or equal to 3.0.